6 Translation efficiency changes in CDS

The ribosome footprint (ribo-seq) and mRNA (RNA-seq) profiling allowed to determine changes in translation efficiency between two treatment groups. We employed a contrast model that calculated the logarithmic differences of the ratios between two groups in Ribo-seq and RNA-seq, with the null hypothesis defined as:

\[\begin{equation} H_0: \Bigg| log\left(\frac{treatment 2}{treatment 1}\right)_{ribo-seq} - log\left(\frac{treatment 2}{treatment 1}\right)_{RNA-seq} \Bigg| = 0, \tag{6.1} \end{equation}\] where the treatment 2 and 1 groups could be DUX-pulse+\(IFN_\gamma\) and \(IFN_\gamma\), respectively, or DUX-pulse and untreated.

In this chapter, we examined the effects of DUX4 on translation efficiency in CDS, specifically comparing DUX-pulse+\(IFN_\gamma\) to \(IFN_\gamma\). Here, we illustrate the steps and code used to discover any changes in translation efficiency.

6.1 Steps

To discover changes in translation efficiency induced by DUX4, we followed these steps:

We defined genomic features that include gene-based CDS, as well as transcript-based features such as 5’ UTR, 13 nts up/downstream from translation start sites, first coding exons, and 3’ UTR. We then counted p-sites within these genomic features, as described in Chapter 4.
Conducted non-specific filtering by excluding 92 histone1 (H1) and histone 2 (H2) variants, as well as DUX4- and \(IFN_\gamma\)-induced genes (as described in Chapter 5). We applied loose base-mean filtering to both RNA-seq and ribo-seq data.
Pre-estimated size factor separately for RNA-seq and ribo-seq libraries using DESeq2 and p-site counts (ribo-seq) and gene counts (RNA-seq), irrespective of the genomic features.
Applied loose base-mean non-specific filtering to avoid very low counts (noise)
Performed hypothesis testing by applying the contrast model described above to DESeq2, using adjusted p-values < 0.05 and |logFC| > 1 as the thresholds for significant translation efficiency changes.
Conducted GO term analysis for down-regulated translation changes.

6.2 Flowchart

knitr::include_graphics("./images/flowchart-translation-changes.png")

Figure 3.5: Flowchart to examine the translational efficiency changes induced by DUX4.

6.3 Load libraries and define parameters

Note that the main directory is named pkg_dir, and we have set it to our local path. The code chunks in this chapter are extracted from here.

library(DESeq2)
library(readxl)
library(writexl)
library(tidyverse)
library(corrr)
library(plyranges)
library(wesanderson)
library(goseq)
library(org.Hs.eg.db)
library(hg38.HomoSapiens.Gencode.v35)
data(gene.anno)
txdb <- hg38.HomoSapiens.Gencode.v35

library(BiocParallel)
bp_param=MulticoreParam(workers = 4L)
register(bp_param, default=TRUE)

# define parameters
pkg_dir <- "~/CompBio/DUX4-IFNg-ribosome-footprints"
stats_dir <- file.path(pkg_dir, "stats", "translation_changes", "CDS")
source(file.path(pkg_dir, "scripts", "06A-tools_TE.R"))
source(file.path(pkg_dir, "scripts", "tools.R"))
load(file.path(pkg_dir, "data", "dds_cds_by_gene.rda"))

my_color <- wesanderson::wes_palette("Darjeeling1", n=5)[2:5]
names(my_color) <- levels(dds_cds_by_gene$treatment)

# define comparison groups
list_comp <- list(S1 = c("untreated", "DOX_pulse"),
                  S2 = c("untreated", "IFNg"),
                  S3 = c("untreated", "DOX_pulse_IFNg"),
                  S4 = c("IFNg", "DOX_pulse"),
                  S5 = c("DOX_pulse", "DOX_pulse_IFNg"),
                  S6 = c("IFNg", "DOX_pulse_IFNg" ))

Next, we loaded the data sets of ribosome footprints and mRNA profiling as we as the genes to be remove from the analysis of translation efficiency changes.

#
# load rse by genes
#
load(file.path(pkg_dir, "data", "rse_cds_by_gene.rda"))
load(file.path(pkg_dir, "data", "rse_cds_mRNA.rda")) # cds by genes

#
# load DUX4_altered and INFg_altered
#
load(file.path(pkg_dir, "data", "DUX4_induced_v2.rda")) # LFC > 1
load(file.path(pkg_dir, "data", "IFNg_induced_v2.rda")) # LFC > 1

#
# identify H1 and H2 variants and PSM subunits
#
histone_variants <- .histone_variants(gene.anno)
exclude_gene <- c(histone_variants$gene_id, DUX4_induced_v2$ensembl)
psm_name <- readxl::read_xlsx(file.path(pkg_dir, "extdata", "Gene Lists_Proteasome_MHC-I.xlsx"),
                              sheet=1, range="A4:B49") 
PSM <- as.data.frame(gene.anno) %>% dplyr::select(gene_name, gene_id) %>%
  dplyr::filter(gene_name %in% psm_name$Symbol)

6.4 DESeq2: DUX4+\(IFN_\gamma\) vs. \(IFN_\gamma\) in CDS

The code chunks below perform DESeq2’s differential analysis using a designated contrast model (as mentioned above) to discover translation efficiency changes in DUX4+\(IFN_\gamma\) relative \(IFN_\gamma\) in CDS.

# S6: use .comb_RNA_Ribo to combine mRNA and ribosome profilings and 
# define design matrix to be used by DESeq2
dds_S6 <- .comb_RNA_Ribo(rse_rna = rse_cds_mRNA, 
                         rse_ribo = rse_cds_by_gene, 
                         treatments=list_comp[["S6"]],
                         rna_mean_filter=10, ribo_mean_filter=5)
dds_S6 <- dds_S6[!rownames(dds_S6) %in% exclude_gene]        
dds_S6 <- DESeq(dds_S6)
#resultsNames(dds_S6)
S6_ribo_over_rna <- results(dds_S6, 
                            name="protocolRibo.treatmentDOX_pulse_IFNg", 
                            alpha=0.05) # translation efficiency changes
S6_rna <- results(dds_S6, 
                  name="treatment_DOX_pulse_IFNg_vs_IFNg", alpha=0.05)  
S6_ribo <- results(dds_S6, 
                   contrast=list(c("treatment_DOX_pulse_IFNg_vs_IFNg",
                                   "protocolRibo.treatmentDOX_pulse_IFNg")),
                   alpha=0.05)

Next we tidied the result and made a scatter plot for the log fold changes between DUX-pulse and \(IFN_\gamma\) in ribo-seq and in RNA-seq.

# tidy the results and exclude DUX4-induced genes
tidy_S6_v2 <- .tidy_res_ribo_over_rna(dds=dds_S6, 
                                      inter_res=S6_ribo_over_rna, 
                                      ribo_res=S6_ribo, rna_res=S6_rna,
                                      alpha=0.05, lfc_threshold=1) %>%
             dplyr::filter(!ensembl %in% DUX4_induced_v2$ensembl) %>%
             dplyr::mutate(IFNg_induced_v2 = ensembl %in% IFNg_induced_v2$ensembl)

# scatter plot between log2FC in ribo-seq and in RNA-seq
res <- tidy_S6_v2
msg_up   <- sprintf("log2(Ribo / RNA) > 1: %4.0f", sum(res$status == "up"))
msg_down <- sprintf("log2(Ribo / RNA) < -1: %4.0f", sum(res$status == "down"))  
pearson <- cor(res$rna_lfc, res$ribo_lfc)             
label_x_pos <- 0 
label_y_pos <- -2.5

ggplot(res, aes(x=rna_lfc, y=ribo_lfc)) +
  geom_point(size=2, alpha=0.5, aes(color=status, shape=IFNg_induced_v2)) +
  theme_bw() +
  annotate("text", x=label_x_pos, y=label_y_pos, label=paste0("Pearson = ", format(pearson, digit=2)),
           hjust = 0, vjust=1) +
  annotate("text", x=label_x_pos, y=label_y_pos-0.5, label=msg_up, color="red", 
           hjust = 0, vjust=1) +
  annotate("text", x=label_x_pos, y=label_y_pos-1, label=msg_down, 
           color="blue", hjust=0, vjust=1) +
  scale_color_manual(values=c("gray75", "red", "blue")) + 
  scale_shape_manual(values=c(19, 1)) +
  theme(panel.grid.minor = element_blank(), 
        plot.title = element_text(hjust = 0.5), 
        legend.position="bottom", legend.box="vertical") +
  labs(x=bquote(~"RNA-seq:" ~log[2]~"(DOX_pulse_IFNg / IFNg)"), 
       y=bquote(~"Ribo-seq:" ~log[2]~"(DOX_pulse_IFNg / IFNg)"), 
       title="Ribo-seq vs. RNA-seq (exclude DUX4-induced)", shape="IFNg-induced")

Scatter plot of log fold changes between DUX-pulse and IFN in ribo-seq and in RNA-seq. Blue dots indicate down-regulated translation efficiency changes and red up-regulated.

Figure 6.1: Scatter plot of log fold changes between DUX-pulse and IFN in ribo-seq and in RNA-seq. Blue dots indicate down-regulated translation efficiency changes and red up-regulated.

6.5 GO term analysis

# take down-regulated translation efficiency and use goseq to find the
# affected GO terms. Display the top 10.
interested <- tidy_S6_v2 %>% dplyr::filter(status=="down")
universal <- tidy_S6_v2 %>% pull(ensembl) %>% str_replace("\\..*", "")
selected <- interested %>% pull(ensembl) %>% str_replace("\\..*", "")
enriched_go <- .do_goseq(universal=universal, selected_gene=selected,  p_value=0.3,
                         return.DEInCat=FALSE, dds=dds_S6)
knitr::kable(enriched_go[1:10, ], caption="Top 10 enriched GO terms determined by down-regulated translation efficiency induced by DUX4." )

Table 6.1: Top 10 enriched GO terms determined by down-regulated translation efficiency induced by DUX4.
category	over_represented_pvalue	numDEInCat	numInCat	term	ontology	padj
GO:0019219	0.0000025	60	2702	regulation of nucleobase-containing compound metabolic process	BP	0.0232240
GO:0051252	0.0000033	56	2466	regulation of RNA metabolic process	BP	0.0232240
GO:0006355	0.0000562	49	2239	regulation of DNA-templated transcription	BP	0.1278336
GO:1903506	0.0000578	49	2241	regulation of nucleic acid-templated transcription	BP	0.1278336
GO:2001141	0.0000625	49	2248	regulation of RNA biosynthetic process	BP	0.1278336
GO:0045605	0.0000644	3	6	negative regulation of epidermal cell differentiation	BP	0.1278336
GO:0045683	0.0000644	3	6	negative regulation of epidermis development	BP	0.1278336
GO:0006351	0.0000857	50	2339	DNA-templated transcription	BP	0.1346224
GO:0097659	0.0000872	50	2340	nucleic acid-templated transcription	BP	0.1346224
GO:0032774	0.0001018	50	2355	RNA biosynthetic process	BP	0.1414823

5 mRNA profiling and DUX4/IFNg induced genes

7 5’ UTR Free Energy Evaluation