Preface

The purpose of this book is to support the transparency and reproducibility of the statistical analysis in our manuscript, "Elevated plasma complement components in facioscapulohumeral dystrophy". This book documents our analysis approaches and contains automatically executable R code; the figures and tables in the manuscript were rendered on the fly by running this book's source Rmd files. This book is built with R Markdown, knitr (Xie 2015), and bookdown (Xie 2020).

For the scientific description, we recommend that readers consult the publication (link here).

0.1 System requirements and software

R \(\geq\) 4.0.3
The tidyverse project packages
BiocStyle, rmarkdown, bookdown, knitr, and tinytex packages
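One way to confirm these requirements before building the book is sketched below. It is a minimal sketch, not part of the book's source: the helper name `check_requirements` is hypothetical, and `tidyverse` stands in for the tidyverse project packages.

```r
# Hypothetical helper: check the R version and the packages listed above.
# "tidyverse" is used as a stand-in for the tidyverse project packages (assumption).
required_pkgs <- c("tidyverse", "BiocStyle", "rmarkdown", "bookdown",
                   "knitr", "tinytex")

check_requirements <- function(pkgs, min_r = "4.0.3") {
  # getRversion() returns a version object that compares against a string
  r_ok <- getRversion() >= min_r
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
  pkg_ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  list(r_ok = r_ok, packages = pkg_ok)
}

status <- check_requirements(required_pkgs)
if (!status$r_ok) warning("R >= 4.0.3 is required")
missing_pkgs <- names(status$packages)[!status$packages]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

BiocStyle is distributed through Bioconductor, so it is installed with `BiocManager::install("BiocStyle")` rather than `install.packages()`.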

0.2 Samples and datasets

In this repository, we provide four datasets, one each for the first, second, discovery, and combined cohorts. Each is an R data.frame, with rows representing samples and columns representing complement components.

The data folder of our GitHub repository contains the following files:

Table 0.1: Description of datasets in the data folder
Dataset	Description
table_1.rda	first cohort
table_2.rda	second cohort
table_3.rda	discovery cohort: selected samples from the first and second cohorts
table_4_update.rda	combined cohort: all samples from the first and second cohorts, plus re-run samples
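Because the name under which each .rda file stores its data.frame is not documented here, a small helper that loads a file into a private environment can be convenient. This is a sketch under the assumption that each file holds exactly one object; `load_rda` is a hypothetical name, not a function from this repository.

```r
# Hypothetical helper: load the single object stored in an .rda file and
# return it, without needing to know the object's name in advance.
load_rda <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the names it created
  if (length(obj_names) != 1) {
    stop("expected exactly one object in ", path,
         ", found ", length(obj_names))
  }
  env[[obj_names]]
}
```

For example, `first_cohort <- load_rda(file.path("data", "table_1.rda"))` would load the first cohort, assuming the data folder sits at the repository root.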

0.3 Methods

We applied the following methods to the first, second, and discovery cohorts.

\(t\)-test. For the first and second cohorts, we compared FSHD samples with controls. We then examined whether the two cohorts give consistent or different signatures.
Normalization. For each complement component (column), we scaled the protein levels to z-scores.
Composite score. The per-sample composite z-score is the sum of the z-scores of all complement components.
Partial composite score. For each sample, the partial composite score is the sum of the z-scores of a selected subset of complement components.
Discovery cohort. The discovery cohort consists of selected samples from the first and second cohorts. We performed principal component analysis (PCA) to classify different groups of FSHD samples and to identify the components that better discriminate FSHD samples from controls.
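The \(t\)-test and PCA steps above can be sketched in base R as follows. This is an illustrative sketch, not the book's own code: it assumes a cohort data.frame with a hypothetical `group` column holding "FSHD" or "Control" and one numeric column per complement component, and it relies on the `t.test()` default (Welch's unequal-variance test), which may differ from the settings used for the manuscript.

```r
# Illustrative sketch: `df` is assumed to hold one cohort, with a hypothetical
# `group` column ("FSHD" or "Control") and one numeric column per component.

# Names of the numeric (complement component) columns.
component_columns <- function(df) names(df)[vapply(df, is.numeric, logical(1))]

# Per-component two-sample t-test, FSHD vs. controls.
# t.test() defaults to Welch's unequal-variance test (assumed here).
component_t_tests <- function(df, group_col = "group") {
  is_fshd <- df[[group_col]] == "FSHD"
  is_ctrl <- df[[group_col]] == "Control"
  rows <- lapply(component_columns(df), function(comp) {
    tt <- t.test(df[[comp]][is_fshd], df[[comp]][is_ctrl])
    data.frame(component = comp,
               mean_diff = unname(tt$estimate[1] - tt$estimate[2]),  # FSHD minus control
               t_stat = unname(tt$statistic),
               p_value = tt$p.value)
  })
  do.call(rbind, rows)
}

# PCA on the components; scale. = TRUE standardizes each column to z-scores.
component_pca <- function(df) {
  prcomp(df[, component_columns(df), drop = FALSE],
         center = TRUE, scale. = TRUE)
}
```

Setting `scale. = TRUE` makes `prcomp()` work on per-component z-scores rather than raw protein levels, so components measured on large scales do not dominate; the `rotation` element of the result holds the loadings that show which complement components drive each principal component.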

0.3.1 Normalization

z-score: Let \(x_{i, j}\) be the entries of the matrix \(X\), where \(i \in \{1, 2, \dots, I\}\) indexes the samples and \(j \in \{1, 2, \dots, J\}\) the complement components. For each complement component \(j\), \(\overline{X}_j\) denotes the sample mean and \(SD_j\) the standard deviation. For sample \(i\) and complement component \(j\), the normalized score is given by \(z_{i, j} = \frac{x_{i, j} - \overline{X}_j}{SD_j}\). The z-score gives the number of standard deviations by which a value lies above or below the sample mean of its component.
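The z-score definition, together with the composite and partial composite scores from the Methods list, translates directly into base R. A minimal sketch, assuming a numeric matrix with samples in rows and complement components in columns; the function names are hypothetical, and \(SD_j\) is assumed to be the sample standard deviation with denominator \(I - 1\), as computed by R's `sd()`.

```r
# Hypothetical helpers mirroring the formulas; X has samples in rows (i)
# and complement components in columns (j).
z_normalize <- function(X) {
  X <- as.matrix(X)
  means <- colMeans(X)   # \overline{X}_j for each component j
  sds <- apply(X, 2, sd) # SD_j, sample SD with I - 1 denominator (assumption)
  # z_{i,j} = (x_{i,j} - \overline{X}_j) / SD_j, applied column-wise
  sweep(sweep(X, 2, means, "-"), 2, sds, "/")
}

# Composite score for sample i: sum over all components j of z_{i,j}.
composite_score <- function(Z) rowSums(Z)

# Partial composite score: sum of z_{i,j} over a selected subset of
# components, given by column names or indices.
partial_composite_score <- function(Z, components) {
  rowSums(Z[, components, drop = FALSE])
}
```

Under that assumption, `z_normalize(X)` returns the same values as base R's `scale(X)`; the explicit version simply mirrors the formula term by term.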

References

Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman & Hall/CRC. http://yihui.name/knitr/.

Xie, Yihui. 2020. Bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown.