Chapter 1 Introduction

This gitbook contains narratives of the data analysis and reproducible R code supporting the manuscript “Longitudinal measures of RNA-seq expression and disease activity in FSHD muscle”. The manuscript is a follow-up to the earlier study of MRI-informed muscle biopsies, which correlated MRI characteristics with pathology and DUX4 target gene expression in FSHD.

This book has nine parts:

  1. Introduction
  2. Pre-processing and gene counts
  3. The one-year follow-up visit
  4. Changes over two visits
  5. Classes of RNA-seq biopsy samples
  6. Markers that discriminate mildly affected FSHD samples from controls
  7. PAX7 score and disease activity
  8. Immune cell infiltration
  9. Biopsy quality control

The analysis uses R (>= 3.5.1) and Bioconductor (>= 3.7) packages. Some non-essential code is long and tedious, so I hide it from the rendered book; all of it is included in the original Rmd files on the master branch of our GitHub repository.

1.1 Package structure on the master branch

The master branch includes the preprocessed datasets (R/Bioconductor format), original scripts, supplemental tables, and Rmd files that make this gitbook.

- data (*.rda)
  |- sanitized.dds: a DESeqDataSet instance of the RNA-seq gene expression matrix
  |- cluster_df: a data.frame instance of k-means clustering results
  |- FSHD_markers: a DataFrame instance of 53 robust DUX4-target FSHD biomarkers
  |- marker_path_scores: per-sample marker scores (DUX4/inflammation/extracellular
                         matrix/cell cycle) based on gene expression, along with MRI
                         characteristics and histology scores
  |- mri_pathology: a data.frame instance of MRI characteristics and histopathology scores
  |- pax7_targets: a data.frame instance of the PAX7 target gene expression matrix
  |- year2.dds: a DESeqDataSet instance of the follow-up visit gene expression
                matrix
  
- inst
  |- gitbook: Rmd files that make this gitbook
  |- extdata: supplemental tables
  |- scripts: shell and original R scripts
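To work with these objects outside the book, one option is to load every .rda file from a local clone of the master branch. The sketch below assumes the working directory is the repository root, so the files sit in data/; load_book_data() is a hypothetical helper, not part of the repository. Objects such as sanitized.dds are S4 DESeqDataSet instances, so DESeq2 should be attached before calling methods on them.

```r
# Hypothetical helper (not part of the repository): load every *.rda file in
# data_dir into a fresh environment and return that environment.
# Assumes data_dir points at the data/ directory of a local clone.
load_book_data <- function(data_dir = "data", envir = new.env()) {
  rda_files <- list.files(data_dir, pattern = "\\.rda$", full.names = TRUE)
  if (length(rda_files) == 0) {
    stop("No .rda files found in ", data_dir)
  }
  for (f in rda_files) {
    load(f, envir = envir)  # an .rda file may hold one or more objects
  }
  envir
}

# Usage (from the repository root):
# library(DESeq2)                        # needed for DESeqDataSet methods
# book_data <- load_book_data("data")
# ls(book_data)                          # names of the loaded objects
```

Loading into a separate environment keeps the global workspace clean; objects can be pulled out with get() or book_data$name.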

Note that the sanitized.rlg object is used throughout this book but is not included in the data directory. You can generate this rlog-transformed object from sanitized.dds with DESeq2::rlog():

library(DESeq2)
# sanitized.dds must already be loaded from the data directory (see 1.1)
sanitized.rlg <- rlog(sanitized.dds, blind = TRUE)

1.2 Gene Expression Omnibus series

The RNA-seq data are deposited in GEO as two series:
- initial visit: GSE115650
- one-year follow-up visit: GSE140261
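The series can also be retrieved programmatically. The sketch below assumes the Bioconductor package GEOquery is installed and NCBI GEO is reachable; the helper names fetch_geo_supplements() and fetch_geo_samples() and the use of tempdir() as the download location are illustrative assumptions. What the supplementary files actually contain (for example, count tables) should be checked on each series page.

```r
# Accessions listed above.
geo_series <- c(initial_visit = "GSE115650", follow_up = "GSE140261")

# Hypothetical helper: download the supplementary files of one GEO series into
# dest/<accession>/ with GEOquery::getGEOSuppFiles(). Requires network access.
fetch_geo_supplements <- function(accession, dest = tempdir()) {
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("Package 'GEOquery' is required")
  }
  GEOquery::getGEOSuppFiles(accession, makeDirectory = TRUE, baseDir = dest)
}

# Hypothetical helper: read the series matrix and return the sample
# annotations (titles, characteristics) as a data.frame.
fetch_geo_samples <- function(accession) {
  gse <- GEOquery::getGEO(accession, GSEMatrix = TRUE)
  Biobase::pData(gse[[1]])
}

# Usage:
# supp <- lapply(geo_series, fetch_geo_supplements)
# samples <- fetch_geo_samples(geo_series[["follow_up"]])
```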

1.3 System requirements

  • R 3.5.1 or above: ggplot2, pheatmap, etc.
  • Bioconductor (v3.7 or above) packages: DESeq2, GenomicAlignments, goseq, etc.
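A quick check that a session meets these requirements is sketched below. The default package vector covers only the examples named above (the full dependency list is longer), and check_requirements() is a hypothetical helper. Missing packages could then be installed with BiocManager::install(), which, as far as I know, targets Bioconductor 3.8 and later.

```r
# Hypothetical helper: report whether the running R session meets the minimum
# R version and which of the named packages cannot be loaded.
# The default package list covers only the examples named in this section.
check_requirements <- function(min_r = "3.5.1",
                               pkgs = c("ggplot2", "pheatmap", "DESeq2",
                                        "GenomicAlignments", "goseq")) {
  r_ok <- getRversion() >= min_r
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  list(r_ok = isTRUE(r_ok), missing = pkgs[!available])
}

req <- check_requirements()
if (!req$r_ok) warning("This book assumes R >= 3.5.1")
if (length(req$missing) > 0) {
  message("Missing packages: ", paste(req$missing, collapse = ", "))
  # e.g. BiocManager::install(req$missing)
}
```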