1 About

This book provides detailed descriptions and R code intended to facilitate transparency and reproducibility in our ribosome footprint profiling and related analysis for our publication: DUX4 orchestrates translational reprograming by broadly suppressing translation.

1.1 Samples and treatments

A total of 12 samples were prepared for RNA-seq and Ribo-seq, seperately, and each comprising four distinct treatments: untreated, DUX4-pulse, IFN-gamma, and DUX4-pulse+INF-gamma, with triplicates for each treatment. The processed data and profilings reside at the repository’s data directory.

1.2 Softwear requirement

  • R (>=4.0.3): the tidyverse project, knitr, bookdown, rmarkdown
  • Bioconductor: DESeq2, goseq, GenomicAlignment, GenomicFeature, ribosomeProfilingQC, and etc.

1.3 data

The data directory in this repository contains the processed data sets used for our analyses:

ribosome footprints

  • dds_cds_by_gene.rda: a DESeqDataSet instance containing p-site profiling on CDS regions and metadata, including size factor and treatments
  • rse_[5UTR_1stExon|3UTR|TSS|1st_exon]_by_tx.rda: RangedSummarisedExperiment instances containing the metadata and transcript-based p-site profiling on different genomic features, including 5’UTR+1st exons, 13 nt up/downstream from transcription sites, 1st exons, and 3’ UTR regions. Note that the size factors are inherited from CDS-based profiling instance dds_cds_by_data

mRNA profiling:

  • rse_cds_mRNA.rda: a list of RangedSummarixedExperiment instances containing mRNA counts (mRNA) for gene-based CDS along with the sizeFactors and metadata.
  • rse_[5UTR_1stExon|3UTR|TSS|1st_exon]_by_tx_mRNA.rda: lists of RangedSummarisedExperiment instances containing transcript-based mRNA counts (mRNA) for different genomic features, including 5’UTR+1st exons, 13 nt up/downstream from transcription sites, 1st exons, and 3’ UTR regions. Note that through out the analyses for this project, the sizeFactors of these transcript-based mRNA counts are inherited from the CDS-based profiling instance rse_cds_mRNA

MISC:

  • DUX4_induced_v2.rda: DUX4 induced genes
  • IFNg_induced_v2.rda: IFN_gamma induced genes

1.4 Annotation

We collected the annotation from Gencode version 35 and made a Bioconductor-based TxDb package.

1.4.1 Make TxDb annotation package for gencode v35

The code chunk below demonstrates how to create a customized TxDB package form GTF file, specifically called hg38.HomoSapiens.Gencode.v35, tailored to our bioinformatics analysis.

The steps include:

  1. Transform the downloaded Gencode v35 GTF file into a GRange instance
  2. Convert the GRange instance into a TxDB package
  3. (Optional) Include the gene annotation (gene_name, gene_type, gene_ID, and gene_type) as a DataFrame instance in the data folder in the package. This step not necessary if building an Ensembl DB package (EnsDb)
library(rtracklayer)
library(GenomicFeatures)

## Define the destination and package name of your TxDB package
pkg_name <- "hg38.HomoSapiens.Gencode.v35"
dest_dir <- "/fh/fast/tapscott_s/CompBio/hg38"

## Where is my GTF file
gtf_file <- "/fh/fast/tapscott_s/CompBio/genome_reference/GRCh38/Annotation/gencode.v35.annotation.gtf"

## Import the GTF file into a GRange instance
gencode <- rtracklayer::import.gff(gtf_file)

## Define metadata: version, source, and etc.
organism <- "human"
release <- "v35"
dataSource <- paste0("ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_",
                     organism, "/", release)
metadata <- data.frame(
  name=c("Organism", "Resource URL", "Resource GTF file",
         "Taxonomy ID", "miRBase build ID", "Data source"),
  value=c("Homo sapiens", dataSource, gtf_file, NA, NA, dataSource))

## Prepare the metadata
metadata <- GenomicFeatures:::.prepareGFFMetadata(gtf_file, dataSource,
                                                  organism="Homo sapiens")

## Combine the GRange instance and metadata into a TxDB instance
txdb <- GenomicFeatures:::makeTxDbFromGRanges(gr=gencode,
                                              metadata=metadata)

## Build a TxDb package
makeTxDbPackage(txdb, version="4.2.2", author="Chao-Jen Wong",
                pkgname=pkg_name, destDir=dest_dir, license="Artistic-2.0",
                provider="Gencode", providerVersion=release,
                maintainer="Chao-Jen Wong <cwon2@fredhutch.org>")

1.4.2 Build EnsDb package using AnnotationHub

In retrospect, I would use AnnotationHub() and GenomicFeatures::makeEnsembldbPackage() to make an EnsDB package instead of TxDB because EnsDB has slots/functions to retrieve the gene information. Below is an example:

#'
#' EnsDb.Hsapiens.v92: 
#'
library(AnnotationHub)
library(GenomicFeatures)
ah <- AnnotationHub()
query(ah, c("hsapiens"))
edb <- ah[["AH60977"]]
seqlevelsStyle(edb) <- "NCBI"
makeEnsembldbPackage(ensdb=dbfile(dbconn(edb)), version="1.0.0",
                     maintainer="Chao-Jen Wong <cwon2@fredhutch.org>",
                     author="Chao-Jen Wong",
                     destDir="/fh/fast//tapscott_s/CompBio/hg38")

1.5 Additional scripts

The scripts directory contains the R code and shell scripts performing preprocessing and bioinformatics analysis for the manuscripts.