1 About
This book provides detailed descriptions and R code intended to facilitate transparency and reproducibility in our ribosome footprint profiling and related analysis for our publication: DUX4 orchestrates translational reprograming by broadly suppressing translation.
1.1 Samples and treatments
A total of 12 samples were prepared separately for RNA-seq and Ribo-seq. They cover four distinct treatments (untreated, DUX4-pulse, IFN-gamma, and DUX4-pulse+IFN-gamma), with triplicates for each treatment. The processed data and profiles reside in the repository’s data directory.
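The design described above can be written down as a small sample table. This is only a sketch: the treatment labels, column names, and sample names below are hypothetical and are not taken from the repository's metadata.

```r
## Hypothetical sample sheet: 4 treatments x 3 replicates = 12 samples
treatments <- c("untreated", "DUX4_pulse", "IFN_gamma", "DUX4_pulse_IFN_gamma")
sample_info <- data.frame(
  treatment = factor(rep(treatments, each = 3), levels = treatments),
  replicate = rep(1:3, times = length(treatments))
)
## Sample names are an assumption made for illustration only
sample_info$sample_name <- paste0(sample_info$treatment, "_rep",
                                  sample_info$replicate)
## Count the replicates per treatment
table(sample_info$treatment)
```

The same layout applies to both assays, since RNA-seq and Ribo-seq libraries were prepared separately for the 12 samples.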
1.2 Software requirements
- R (>= 4.0.3): tidyverse, knitr, bookdown, rmarkdown
- Bioconductor: DESeq2, goseq, GenomicAlignments, GenomicFeatures, ribosomeProfilingQC, etc.
1.3 Data
The data directory in this repository contains the processed data sets used for our analyses:
Ribosome footprints:
- dds_cds_by_gene.rda: a DESeqDataSet instance containing p-site profiling on CDS regions and metadata, including size factors and treatments
- rse_[5UTR_1stExon|3UTR|TSS|1st_exon]_by_tx.rda: RangedSummarizedExperiment instances containing the metadata and transcript-based p-site profiling on different genomic features, including 5’UTR+1st exons, 13 nt up/downstream from transcription start sites (TSS), 1st exons, and 3’ UTR regions. Note that the size factors are inherited from the CDS-based profiling instance.
mRNA profiling:
: a list of RangedSummarizedExperiment instances containing mRNA counts for gene-based CDS, along with the sizeFactors and metadata.
: lists of RangedSummarizedExperiment instances containing transcript-based mRNA counts for different genomic features, including 5’UTR+1st exons, 13 nt up/downstream from transcription start sites (TSS), 1st exons, and 3’ UTR regions. Note that throughout the analyses for this project, the sizeFactors of these transcript-based mRNA counts are inherited from the CDS-based profiling instance rse_cds_mRNA.
: DUX4-induced genes
: IFN-gamma-induced genes
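The size-factor inheritance noted above can be illustrated with toy numbers. The sketch below assumes median-of-ratios size factors (the default approach in DESeq2) estimated on CDS counts only, and then reuses them to normalize counts on another feature. All matrices, values, and names here are hypothetical and do not come from the repository's data.

```r
## Toy count matrices (hypothetical values): rows are features, columns are samples
cds_counts <- matrix(c(10, 20, 40,
                       100, 200, 400,
                       5, 10, 20),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(paste0("gene", 1:3), paste0("s", 1:3)))
utr_counts <- matrix(c(2, 4, 8,
                       6, 12, 24),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(paste0("tx", 1:2), paste0("s", 1:3)))

## Median-of-ratios size factors: for each sample, the median ratio of its
## counts to the per-feature geometric mean across samples
median_of_ratios <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  keep <- is.finite(log_geo_means)  # drop features with a zero count in any sample
  apply(counts[keep, , drop = FALSE], 2, function(k) {
    exp(median(log(k) - log_geo_means[keep]))
  })
}

## Estimate size factors on the CDS counts only
cds_size_factors <- median_of_ratios(cds_counts)

## Reuse ("inherit") the CDS size factors when normalizing another feature
utr_normalized <- sweep(utr_counts, 2, cds_size_factors, "/")
```

Reusing the CDS-based factors keeps every feature on the same per-sample scale, rather than re-estimating depth from short regions such as UTRs where counts are sparse.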
1.4 Annotation
We collected the annotation from Gencode version 35 and made a Bioconductor-based TxDb package.
1.4.1 Make TxDb annotation package for gencode v35
The code chunk below demonstrates how to create a customized TxDb package, named hg38.HomoSapiens.Gencode.v35, from a GTF file, tailored to our bioinformatics analysis.
The steps include:
- Transform the downloaded Gencode v35 GTF file into a GRanges instance
- Convert the GRanges instance into a TxDb package
- (Optional) Include the gene annotation (gene_name, gene_id, and gene_type) as an object in the data folder of the package. This step is not necessary if building an Ensembl DB package (EnsDb)
## Define the destination and package name of your TxDb package
pkg_name <- "hg38.HomoSapiens.Gencode.v35"
dest_dir <- "/fh/fast/tapscott_s/CompBio/hg38"
## Where is my GTF file
gtf_file <- "/fh/fast/tapscott_s/CompBio/genome_reference/GRCh38/Annotation/gencode.v35.annotation.gtf"
## Import the GTF file into a GRanges instance
gencode <- rtracklayer::import.gff(gtf_file)
## Define metadata: version, source, etc.
organism <- "human"
release <- "v35"
dataSource <- paste0("ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_",
                     organism, "/", release)
metadata <- data.frame(
  name=c("Organism", "Resource URL", "Resource GTF file",
         "Taxonomy ID", "miRBase build ID", "Data source"),
  value=c("Homo sapiens", dataSource, gtf_file, NA, NA, dataSource))
## Prepare the metadata (this replaces the data frame defined above)
metadata <- GenomicFeatures:::.prepareGFFMetadata(gtf_file, dataSource,
                                                  organism="Homo sapiens")
## Combine the GRanges instance and metadata into a TxDb instance
txdb <- GenomicFeatures::makeTxDbFromGRanges(gr=gencode, metadata=metadata)
## Build a TxDb package
GenomicFeatures::makeTxDbPackage(txdb, version="4.2.2", author="Chao-Jen Wong",
                                 pkgname=pkg_name, destDir=dest_dir, license="Artistic-2.0",
                                 provider="Gencode", providerVersion=release,
                                 maintainer="Chao-Jen Wong <cwon2@fredhutch.org>")
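The optional third step, bundling gene annotation with the package, is not shown in the chunk above. A minimal sketch of one way to do it follows. It assumes the imported GTF has been converted to a data frame (for example with as.data.frame(gencode)) and carries the standard GENCODE attribute columns; the function name, object name, and output file name are hypothetical, and the toy input uses made-up identifiers.

```r
## Keep one row per gene with its ID, name, and type.
## `gtf_df` is assumed to be a data.frame view of the imported GTF,
## e.g. as.data.frame(gencode), with columns type, gene_id, gene_name, gene_type.
extract_gene_annotation <- function(gtf_df) {
  genes <- gtf_df[gtf_df$type == "gene",
                  c("gene_id", "gene_name", "gene_type")]
  genes <- genes[!duplicated(genes$gene_id), ]
  rownames(genes) <- genes$gene_id
  genes
}

## Toy input standing in for as.data.frame(gencode); IDs are made up
toy_gtf <- data.frame(
  type = c("gene", "transcript", "exon", "gene"),
  gene_id = c("GENE_ID_1", "GENE_ID_1", "GENE_ID_1", "GENE_ID_2"),
  gene_name = c("GENE_A", "GENE_A", "GENE_A", "GENE_B"),
  gene_type = c("protein_coding", "protein_coding", "protein_coding", "lncRNA"),
  stringsAsFactors = FALSE
)
gene_anno <- extract_gene_annotation(toy_gtf)

## In the package source tree, the table could then be stored with, for example:
## save(gene_anno, file = file.path(dest_dir, pkg_name, "data", "gene_anno.rda"))
```

Keeping the table keyed by gene_id makes it straightforward to join gene names and types onto gene-level results later.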
1.4.2 Build EnsDb package using AnnotationHub
In retrospect, I would use AnnotationHub() and ensembldb::makeEnsembldbPackage() to make an EnsDb package instead of a TxDb package, because EnsDb has slots/functions to retrieve the gene information. Below is an example:
#' EnsDb.Hsapiens.v92:
library(AnnotationHub)
library(ensembldb)
ah <- AnnotationHub()
## List the available human records
query(ah, c("hsapiens"))
edb <- ah[["AH60977"]]
seqlevelsStyle(edb) <- "NCBI"
## Build the package from the EnsDb SQLite file
makeEnsembldbPackage(ensdb=dbfile(dbconn(edb)), version="1.0.0",
                     maintainer="Chao-Jen Wong <cwon2@fredhutch.org>",
                     author="Chao-Jen Wong")
1.5 Additional scripts
The scripts directory contains the R code and shell scripts that perform the preprocessing and bioinformatics analysis for the manuscript.