Elevated plasma complement components in facioscapulohumeral dystrophy
2021-11-26
Preface
The purpose of this book is to support the transparency and reproducibility of the statistical analysis for our manuscript, "Elevated plasma complement components in facioscapulohumeral dystrophy". This book contains details of our analysis approaches and executable R code; the figures and tables in the manuscript were rendered on the fly by running this book’s source Rmd files. This book is built with R Markdown, knitr (Xie 2015), and bookdown (Xie 2020).
For the scientific description, we recommend that readers consult the publication (link here).
0.1 System requirements and software
- R \(\geq\) 4.0.3
- The tidyverse project packages
- BiocStyle, rmarkdown, bookdown, knitr, and tinytex packages
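As a rough way to check a local setup against the list above, the following sketch tests the R version and reports which packages are not installed. The package vector is an assumption: `tidyverse` stands in for "the tidyverse project packages", and the remaining names are taken from the list in their installed (lowercase where applicable) form.

```r
# Check the running R version against the minimum stated above.
r_ok <- getRversion() >= "4.0.3"

# Assumed package names; "tidyverse" is the meta-package that pulls in
# the core tidyverse packages. BiocStyle is distributed via Bioconductor.
pkgs <- c("tidyverse", "BiocStyle", "rmarkdown", "bookdown", "knitr", "tinytex")

# requireNamespace() returns TRUE/FALSE without attaching the package.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- names(installed)[!installed]

if (!r_ok) {
  message("R >= 4.0.3 is required; found ", as.character(getRversion()))
}
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```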
0.2 Samples and datasets
In this repository, we prepared four datasets for the first, second, discovery, and combined cohorts. They are all R data.frame instances, with rows representing samples and columns representing the complement components.
The data folder of our GitHub repository contains:
| Dataset | Description |
|---|---|
| table_1.rda | first cohort |
| table_2.rda | second cohort |
| table_3.rda | discovery cohort, including selected samples from the first and second cohorts |
| table_4_update.rda | comprehensive set of the first and second cohorts and re-run samples |
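An `.rda` file restores objects under the names they were saved with, and those names are not stated here. The sketch below shows one way to inspect such a file without assuming them. It writes a toy data.frame to a temporary file so that it runs anywhere; the object name `toy`, the column names, and the values are hypothetical and are not taken from the study data.

```r
# Toy stand-in for a cohort table: rows are samples, columns are
# complement components (hypothetical names and values).
toy <- data.frame(
  comp_A = c(1.2, 0.9, 1.5),
  comp_B = c(0.4, 0.6, 0.5),
  row.names = c("s1", "s2", "s3")
)

# Save to a temporary .rda file so the example is self-contained.
rda_file <- tempfile(fileext = ".rda")
save(toy, file = rda_file)

# load() returns the names of the restored objects; loading into a fresh
# environment avoids overwriting anything in the workspace.
env <- new.env()
obj_names <- load(rda_file, envir = env)
cohort <- get(obj_names[1], envir = env)

str(cohort)
```

For the real datasets, `rda_file` would be replaced with the path to a file in the data folder, for example `table_1.rda`.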
0.3 Methods
We applied the following methods to the first, second, and discovery cohorts.
- \(t\)-test. For the first and second cohorts, we compared FSHD samples with controls. We then examined whether the two cohorts give consistent or different signatures.
- Normalization. For each complement component (column), we scaled the protein levels by z-score.
- Composite score. The per-sample composite z-score is the sum of the z-scores of all the complement components.
- Partial composite score. For each sample, the partial composite score is the sum of the z-scores of selected complement components.
- Discovery cohort. The discovery cohort consists of samples from the first and second cohorts. We performed principal component analysis to classify different groups of FSHD samples and to identify the components that better discriminate FSHD samples from controls.
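The steps listed above can be sketched on simulated data as follows. This is a minimal illustration, not the analysis code of the manuscript: the data are random, the component names (`comp_A` to `comp_D`) and the selected subset are hypothetical, and applying the \(t\)-test to the composite score is only one possible choice. Note that `t.test()` defaults to Welch's unequal-variance test; which variant the study used is not stated here.

```r
set.seed(1)

# Simulated data: 20 samples x 4 hypothetical components (not study data).
n <- 20
group <- factor(rep(c("Control", "FSHD"), each = n / 2))
X <- matrix(
  rnorm(n * 4), nrow = n,
  dimnames = list(paste0("s", seq_len(n)), paste0("comp_", LETTERS[1:4]))
)

# Normalization: z-score each column (complement component).
Z <- scale(X)

# Composite score: per-sample sum of z-scores over all components.
composite <- rowSums(Z)

# Partial composite score: sum over a selected subset (arbitrary here).
selected <- c("comp_A", "comp_C")
partial_composite <- rowSums(Z[, selected, drop = FALSE])

# t-test comparing FSHD with controls, here on the composite score.
tt <- t.test(composite ~ group)

# PCA on the already z-scored matrix; the rotation matrix holds the
# loadings, i.e. how much each component contributes to each PC.
pca <- prcomp(Z, center = FALSE, scale. = FALSE)
head(pca$x[, 1:2])
pca$rotation[, 1:2]
```

Because every column of `Z` is centered, the composite scores sum to zero across samples, so a sample's score reads as its position relative to the cohort average.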
0.3.1 Normalization
z-score: Let \(x_{i, j}\) be the entries of the matrix \(X\), where \(i \in \{1, 2, \dots, I\}\) denotes the sample and \(j \in \{1, 2, \dots, J\}\) the complement component. For each complement component \(j\), \(\overline{X}_j\) denotes the sample mean and \(SD_j\) the standard deviation. For each sample \(i\) and complement component \(j\), the normalized score is given by \(z_{i, j} = \frac{x_{i, j} - \overline{X}_j}{SD_j}\). The z-score gives the number of standard deviations by which a value lies above or below the observed sample mean of its component.
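To make the formula concrete, the sketch below computes the z-scores by hand on a small toy matrix and compares the result with R's built-in `scale()`. The matrix values and component names are made up, and the standard deviation is assumed to be the sample standard deviation (denominator \(I - 1\)), which is what `sd()` and `scale()` use.

```r
# Toy matrix: 4 samples (rows, index i) x 2 hypothetical components
# (columns, index j).
X <- matrix(
  c(1, 2, 3, 4,
    10, 20, 30, 40),
  ncol = 2,
  dimnames = list(paste0("s", 1:4), c("comp_A", "comp_B"))
)

col_mean <- colMeans(X)       # mean of each component j
col_sd   <- apply(X, 2, sd)   # sample SD of each component j

# z_{i,j} = (x_{i,j} - mean_j) / SD_j, applied column by column.
Z <- sweep(sweep(X, 2, col_mean, "-"), 2, col_sd, "/")

# scale() centers and scales each column in the same way by default.
all.equal(Z, scale(X), check.attributes = FALSE)
```

After normalization each column has mean 0 and standard deviation 1, so components measured on different scales contribute comparably to a sum of z-scores.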
References
Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. http://yihui.name/knitr/.
Xie, Yihui. 2020. Bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown.