Longitudinal measures of RNA-seq expression and disease activity in FSHD muscle biopsies
2020-02-19
Chapter 1 Introduction
This gitbook contains the data analysis narrative and reproducible R code supporting the manuscript “Longitudinal measures of RNA-seq expression and disease activity in FSHD muscle”, a follow-up to the earlier study of MRI-informed muscle biopsies that correlated MRI with pathology and DUX4 target gene expression in FSHD.
This book has nine parts:
- Introduction
- Pre-processing and gene counts
- The one-year follow-up visit
- Changes over two visits
- Classes of RNA-seq biopsy samples
- Markers discriminate mildly affected FSHDs from the controls
- PAX7 score and disease activity
- Immune cell infiltration
- Biopsy quality control
The analysis was performed using R (>= 3.5.1) and Bioconductor (>= 3.7) packages. Some non-essential code is long and tedious, so it is hidden from the book; all of it is included in the original Rmd files, which can be found on our GitHub master branch.
1.1 Package structure on the master branch
The master branch includes the preprocessed datasets (R/Bioconductor format), original scripts, supplemental tables, and Rmd files that make this gitbook.
- data (*.rda)
  |- sanitized.dds: a DESeqDataSet instance of the RNA-seq gene expression matrix
  |- cluster_df: a data.frame instance of k-means clustering results
  |- FSHD_markers: a DataFrame instance of 53 DUX4-targeted FSHD robust biomarkers
  |- marker_path_scores: sample marker scores (DUX4/inflammatory/extracellular matrix/cell cycle) based on gene expression, along with MRI characteristics and histology scores
  |- mri_pathology: a data.frame instance of MRI characteristics and histopathology scores
  |- pax7_targets: a data.frame instance of the PAX7-targeted gene expression matrix
  |- year2.dds: a DESeqDataSet instance of the follow-up visit gene expression matrix
- inst
  |- gitbook: Rmd files that make this gitbook
  |- extdata: supplemental tables
  |- scripts: shell and original R scripts
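The datasets listed above are stored as .rda files, so they can be restored into an R session with base R's load(). The following is a minimal sketch, assuming the repository has been cloned locally. The helper name load_rda_dir and the path in the usage comment are hypothetical and are not part of the repository.

```r
# Load every .rda file found in a directory into an environment.
# `load_rda_dir` is a hypothetical helper, not a function shipped with the repository.
load_rda_dir <- function(data_dir, envir = globalenv()) {
  rda_files <- list.files(data_dir, pattern = "\\.rda$", full.names = TRUE,
                          ignore.case = TRUE)
  # load() returns the names of the objects it restored from each file
  loaded <- unlist(lapply(rda_files, load, envir = envir))
  invisible(loaded)
}

# Usage (the path is an assumption; point it at the data directory of your clone):
# objs <- load_rda_dir("path/to/local/clone/data")
# print(objs)
```

Working with the restored DESeqDataSet instances (for example sanitized.dds) additionally requires DESeq2 to be installed and attached.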
Note that the sanitized.rlg instance is used throughout this book but is not included in the data directory. The user can generate this rlog-transformed object from sanitized.dds using DESeq2::rlog():
library(DESeq2)
sanitized.rlg <- rlog(sanitized.dds, blind=TRUE)
1.2 Gene Expression Omnibus series
The RNA-seq data series deposited in GEO are:
- initial visit: GSE115650
- one-year follow-up visit: GSE140261
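For readers who want the deposited files rather than the preprocessed .rda objects, the two series can be retrieved with the Bioconductor package GEOquery. This is only a sketch: the helper fetch_geo_supp is hypothetical, it needs network access, and the source does not state which supplementary files each series provides.

```r
# GEO accessions as listed in this section
geo_series <- c(initial_visit = "GSE115650", follow_up_visit = "GSE140261")

# Hypothetical helper: download the supplementary files of each series
# into `dest_dir` using GEOquery::getGEOSuppFiles() (requires network access).
fetch_geo_supp <- function(accessions, dest_dir = tempdir()) {
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("Please install GEOquery from Bioconductor first.")
  }
  lapply(accessions, function(acc) {
    # One sub-directory per accession is created under dest_dir
    GEOquery::getGEOSuppFiles(acc, makeDirectory = TRUE, baseDir = dest_dir)
  })
}

# Usage (not run here because it downloads data):
# supp_files <- fetch_geo_supp(geo_series)
```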
1.3 System requirements
- R 3.5.1 or above, with CRAN packages such as ggplot2 and pheatmap
- Bioconductor (v3.7 or above) packages such as DESeq2, GenomicAlignments, and goseq
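The requirements above can be checked before running the Rmd files. The sketch below uses base R only and covers just the packages named in this section; the full analysis may need others, so treat the package vector as an assumption rather than a complete list.

```r
# Stop early if the running R is older than the version stated above
stopifnot(getRversion() >= "3.5.1")

# Packages named in this section only (not an exhaustive list)
required_pkgs <- c("ggplot2", "pheatmap", "DESeq2", "GenomicAlignments", "goseq")

# requireNamespace() returns TRUE when a package is installed and loadable
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages named in this section are installed.")
}
```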