6 Translation efficiency changes in CDS

The ribosome footprint (ribo-seq) and mRNA (RNA-seq) profiling allowed to determine changes in translation efficiency between two treatment groups. We employed a contrast model that calculated the logarithmic differences of the ratios between two groups in Ribo-seq and RNA-seq, with the null hypothesis defined as:

\[\begin{equation} H_0: \Bigg| log\left(\frac{treatment 2}{treatment 1}\right)_{ribo-seq} - log\left(\frac{treatment 2}{treatment 1}\right)_{RNA-seq} \Bigg| = 0, \tag{6.1} \end{equation}\] where the treatment 2 and 1 groups could be DUX-pulse+\(IFN_\gamma\) and \(IFN_\gamma\), respectively, or DUX-pulse and untreated.

In this chapter, we examined the effects of DUX4 on translation efficiency in CDS, specifically comparing DUX-pulse+\(IFN_\gamma\) to \(IFN_\gamma\). Here, we illustrate the steps and code used to discover any changes in translation efficiency.

6.1 Steps

To discover changes in translation efficiency induced by DUX4, we followed these steps:

  1. We defined genomic features that include gene-based CDS, as well as transcript-based features such as 5’ UTR, 13 nts up/downstream from translation start sites, first coding exons, and 3’ UTR. We then counted p-sites within these genomic features, as described in Chapter 4.
  2. Conducted non-specific filtering by excluding 92 histone1 (H1) and histone 2 (H2) variants, as well as DUX4- and \(IFN_\gamma\)-induced genes (as described in Chapter 5). We applied loose base-mean filtering to both RNA-seq and ribo-seq data.
  3. Pre-estimated size factor separately for RNA-seq and ribo-seq libraries using DESeq2 and p-site counts (ribo-seq) and gene counts (RNA-seq), irrespective of the genomic features.
  4. Applied loose base-mean non-specific filtering to avoid very low counts (noise)
  5. Performed hypothesis testing by applying the contrast model described above to DESeq2, using adjusted p-values < 0.05 and |logFC| > 1 as the thresholds for significant translation efficiency changes.
  6. Conducted GO term analysis for down-regulated translation changes.

6.2 Flowchart

knitr::include_graphics("./images/flowchart-translation-changes.png")
Flowchart to examine the translational efficiency changes induced by DUX4.

Figure 3.5: Flowchart to examine the translational efficiency changes induced by DUX4.

6.3 Load libraries and define parameters

Note that the main directory is named pkg_dir, and we have set it to our local path. The code chunks in this chapter are extracted from here.

library(DESeq2)
library(readxl)
library(writexl)
library(tidyverse)
library(corrr)
library(plyranges)
library(wesanderson)
library(goseq)
library(org.Hs.eg.db)
library(hg38.HomoSapiens.Gencode.v35)
data(gene.anno)
txdb <- hg38.HomoSapiens.Gencode.v35

library(BiocParallel)
bp_param=MulticoreParam(workers = 4L)
register(bp_param, default=TRUE)
# define parameters
pkg_dir <- "~/CompBio/DUX4-IFNg-ribosome-footprints"
stats_dir <- file.path(pkg_dir, "stats", "translation_changes", "CDS")
source(file.path(pkg_dir, "scripts", "06A-tools_TE.R"))
source(file.path(pkg_dir, "scripts", "tools.R"))
load(file.path(pkg_dir, "data", "dds_cds_by_gene.rda"))

my_color <- wesanderson::wes_palette("Darjeeling1", n=5)[2:5]
names(my_color) <- levels(dds_cds_by_gene$treatment)

# define comparison groups
list_comp <- list(S1 = c("untreated", "DOX_pulse"),
                  S2 = c("untreated", "IFNg"),
                  S3 = c("untreated", "DOX_pulse_IFNg"),
                  S4 = c("IFNg", "DOX_pulse"),
                  S5 = c("DOX_pulse", "DOX_pulse_IFNg"),
                  S6 = c("IFNg", "DOX_pulse_IFNg" ))

Next, we loaded the data sets of ribosome footprints and mRNA profiling as we as the genes to be remove from the analysis of translation efficiency changes.

#
# load rse by genes
#
load(file.path(pkg_dir, "data", "rse_cds_by_gene.rda"))
load(file.path(pkg_dir, "data", "rse_cds_mRNA.rda")) # cds by genes

#
# load DUX4_altered and INFg_altered
#
load(file.path(pkg_dir, "data", "DUX4_induced_v2.rda")) # LFC > 1
load(file.path(pkg_dir, "data", "IFNg_induced_v2.rda")) # LFC > 1

#
# identify H1 and H2 variants and PSM subunits
#
histone_variants <- .histone_variants(gene.anno)
exclude_gene <- c(histone_variants$gene_id, DUX4_induced_v2$ensembl)
psm_name <- readxl::read_xlsx(file.path(pkg_dir, "extdata", "Gene Lists_Proteasome_MHC-I.xlsx"),
                              sheet=1, range="A4:B49") 
PSM <- as.data.frame(gene.anno) %>% dplyr::select(gene_name, gene_id) %>%
  dplyr::filter(gene_name %in% psm_name$Symbol)

6.4 DESeq2: DUX4+\(IFN_\gamma\) vs. \(IFN_\gamma\) in CDS

The code chunks below perform DESeq2’s differential analysis using a designated contrast model (as mentioned above) to discover translation efficiency changes in DUX4+\(IFN_\gamma\) relative \(IFN_\gamma\) in CDS.

# S6: use .comb_RNA_Ribo to combine mRNA and ribosome profilings and 
# define design matrix to be used by DESeq2
dds_S6 <- .comb_RNA_Ribo(rse_rna = rse_cds_mRNA, 
                         rse_ribo = rse_cds_by_gene, 
                         treatments=list_comp[["S6"]],
                         rna_mean_filter=10, ribo_mean_filter=5)
dds_S6 <- dds_S6[!rownames(dds_S6) %in% exclude_gene]        
dds_S6 <- DESeq(dds_S6)
#resultsNames(dds_S6)
S6_ribo_over_rna <- results(dds_S6, 
                            name="protocolRibo.treatmentDOX_pulse_IFNg", 
                            alpha=0.05) # translation efficiency changes
S6_rna <- results(dds_S6, 
                  name="treatment_DOX_pulse_IFNg_vs_IFNg", alpha=0.05)  
S6_ribo <- results(dds_S6, 
                   contrast=list(c("treatment_DOX_pulse_IFNg_vs_IFNg",
                                   "protocolRibo.treatmentDOX_pulse_IFNg")),
                   alpha=0.05)

Next we tidied the result and made a scatter plot for the log fold changes between DUX-pulse and \(IFN_\gamma\) in ribo-seq and in RNA-seq.

# tidy the results and exclude DUX4-induced genes
tidy_S6_v2 <- .tidy_res_ribo_over_rna(dds=dds_S6, 
                                      inter_res=S6_ribo_over_rna, 
                                      ribo_res=S6_ribo, rna_res=S6_rna,
                                      alpha=0.05, lfc_threshold=1) %>%
             dplyr::filter(!ensembl %in% DUX4_induced_v2$ensembl) %>%
             dplyr::mutate(IFNg_induced_v2 = ensembl %in% IFNg_induced_v2$ensembl)

# scatter plot between log2FC in ribo-seq and in RNA-seq
res <- tidy_S6_v2
msg_up   <- sprintf("log2(Ribo / RNA) > 1: %4.0f", sum(res$status == "up"))
msg_down <- sprintf("log2(Ribo / RNA) < -1: %4.0f", sum(res$status == "down"))  
pearson <- cor(res$rna_lfc, res$ribo_lfc)             
label_x_pos <- 0 
label_y_pos <- -2.5

ggplot(res, aes(x=rna_lfc, y=ribo_lfc)) +
  geom_point(size=2, alpha=0.5, aes(color=status, shape=IFNg_induced_v2)) +
  theme_bw() +
  annotate("text", x=label_x_pos, y=label_y_pos, label=paste0("Pearson = ", format(pearson, digit=2)),
           hjust = 0, vjust=1) +
  annotate("text", x=label_x_pos, y=label_y_pos-0.5, label=msg_up, color="red", 
           hjust = 0, vjust=1) +
  annotate("text", x=label_x_pos, y=label_y_pos-1, label=msg_down, 
           color="blue", hjust=0, vjust=1) +
  scale_color_manual(values=c("gray75", "red", "blue")) + 
  scale_shape_manual(values=c(19, 1)) +
  theme(panel.grid.minor = element_blank(), 
        plot.title = element_text(hjust = 0.5), 
        legend.position="bottom", legend.box="vertical") +
  labs(x=bquote(~"RNA-seq:" ~log[2]~"(DOX_pulse_IFNg / IFNg)"), 
       y=bquote(~"Ribo-seq:" ~log[2]~"(DOX_pulse_IFNg / IFNg)"), 
       title="Ribo-seq vs. RNA-seq (exclude DUX4-induced)", shape="IFNg-induced") 
Scatter plot of log fold changes between DUX-pulse and IFN in ribo-seq and in RNA-seq. Blue dots indicate down-regulated translation efficiency changes and red up-regulated.

Figure 6.1: Scatter plot of log fold changes between DUX-pulse and IFN in ribo-seq and in RNA-seq. Blue dots indicate down-regulated translation efficiency changes and red up-regulated.

6.5 GO term analysis

# take down-regulated translation efficiency and use goseq to find the
# affected GO terms. Display the top 10.
interested <- tidy_S6_v2 %>% dplyr::filter(status=="down")
universal <- tidy_S6_v2 %>% pull(ensembl) %>% str_replace("\\..*", "")
selected <- interested %>% pull(ensembl) %>% str_replace("\\..*", "")
enriched_go <- .do_goseq(universal=universal, selected_gene=selected,  p_value=0.3,
                         return.DEInCat=FALSE, dds=dds_S6)
knitr::kable(enriched_go[1:10, ], caption="Top 10 enriched GO terms determined by down-regulated translation efficiency induced by DUX4." )
Table 6.1: Top 10 enriched GO terms determined by down-regulated translation efficiency induced by DUX4.
category over_represented_pvalue numDEInCat numInCat term ontology padj
GO:0019219 0.0000025 60 2702 regulation of nucleobase-containing compound metabolic process BP 0.0232240
GO:0051252 0.0000033 56 2466 regulation of RNA metabolic process BP 0.0232240
GO:0006355 0.0000562 49 2239 regulation of DNA-templated transcription BP 0.1278336
GO:1903506 0.0000578 49 2241 regulation of nucleic acid-templated transcription BP 0.1278336
GO:2001141 0.0000625 49 2248 regulation of RNA biosynthetic process BP 0.1278336
GO:0045605 0.0000644 3 6 negative regulation of epidermal cell differentiation BP 0.1278336
GO:0045683 0.0000644 3 6 negative regulation of epidermis development BP 0.1278336
GO:0006351 0.0000857 50 2339 DNA-templated transcription BP 0.1346224
GO:0097659 0.0000872 50 2340 nucleic acid-templated transcription BP 0.1346224
GO:0032774 0.0001018 50 2355 RNA biosynthetic process BP 0.1414823