Chapter 9 Biopsy Quality Control
This chapter scores the blood, fat, and muscle content of each biopsy from marker-gene expression, measured as \(\log_{10}(TPM + 1)\).
9.1 Blood, fat and muscle cell markers
# Marker genes for each cell type, annotated with their GENCODE IDs
markers <- tibble(cell_type = c(rep("blood", 3), rep("fat", 3), rep("muscle", 3)),
                  gene_name = c("HBA1", "HBA2", "HBB",
                                "FASN", "LEP", "SCD",
                                "ACTA1", "TNNT3", "MYH1")) %>%
  dplyr::mutate(gencode_id = get_ensembl(gene_name, rse = sanitized.dds)) %>%
  dplyr::mutate(cell_type = factor(cell_type))
## HBA1 HBA2 HBB FASN LEP SCD ACTA1 TNNT3 MYH1
knitr::kable(markers)
cell_type | gene_name | gencode_id |
---|---|---|
blood | HBA1 | ENSG00000206172.8 |
blood | HBA2 | ENSG00000188536.12 |
blood | HBB | ENSG00000244734.3 |
fat | FASN | ENSG00000169710.7 |
fat | LEP | ENSG00000174697.4 |
fat | SCD | ENSG00000099194.5 |
muscle | ACTA1 | ENSG00000143632.14 |
muscle | TNNT3 | ENSG00000130595.16 |
muscle | MYH1 | ENSG00000109061.9 |
9.2 Define score
For each cell type, the marker score of a sample is \(\frac{1}{n}\sum_{i=1}^{n}\log_{10}(TPM_i + 1)\), where \(i\) indexes the marker genes of that cell type and \(n\) is the number of those genes.
lv <- levels(markers$cell_type)
# For each cell type, average log10(TPM + 1) over its marker genes per sample
celltype_tpm <- sapply(lv, function(type) {
  id <- markers %>%
    dplyr::filter(cell_type == type) %>%
    dplyr::pull(gencode_id)
  sub <- sanitized.dds[id]
  colMeans(log10(assays(sub)[["TPM"]] + 1))
})
# Attach sample annotations to the samples-by-cell-type score matrix
celltype_tpm <- as.data.frame(celltype_tpm) %>%
  rownames_to_column(var = "sample_name") %>%
  add_column(pheno_type = sanitized.dds$pheno_type,
             classes = sanitized.dds$new_cluster_name,
             classes_color = sanitized.dds$cluster_color)
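To make the scoring concrete, the following is a minimal sketch on a toy matrix. The gene names, sample names, and TPM values are invented for illustration and are not taken from the dataset; only the formula follows the definition above.

```r
# Toy TPM matrix: rows are marker genes, columns are samples.
# All names and values are invented for illustration only.
toy_tpm <- matrix(c(9, 99, 999,   # sample S1
                    0,  9,  99),  # sample S2
                  nrow = 3,
                  dimnames = list(c("geneA", "geneB", "geneC"),
                                  c("S1", "S2")))

# Marker score per sample: mean over genes of log10(TPM + 1)
# S1: (1 + 2 + 3) / 3 = 2;  S2: (0 + 1 + 2) / 3 = 1
toy_score <- colMeans(log10(toy_tpm + 1))
toy_score
```

Because the mean is taken on the log scale, a single highly expressed marker raises the score less than it would on the raw TPM scale.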
9.3 Visualization
Each sample receives a blood, fat, and muscle content score. The boxplot below shows that the FSHD biopsies in the Muscle-Low class contain more fat and blood, and less muscle, than the rest of the biopsies.
# Reshape to long format: one row per sample and cell type
tmp <- celltype_tpm %>%
  gather(key = cell_type, value = score, -sample_name,
         -pheno_type, -classes, -classes_color) %>%
  dplyr::mutate(cell_type = factor(cell_type))
# One color per class, ordered by class
color_manual <- celltype_tpm %>%
  group_by(classes) %>%
  summarise(color = unique(classes_color))
ggplot(tmp, aes(x = classes, y = score, color = classes)) +
  geom_boxplot() +
  facet_wrap(~ cell_type, scales = "free", nrow = 2) +
  theme_minimal() +
  theme(axis.text.x = element_blank(), legend.justification = c(1, 0),
        legend.position = c(1, 0)) +
  scale_color_manual(values = color_manual$color) +
  labs(y = "Avg. log10(TPM + 1)", x = "FSHD classes",
       title = "Blood, fat, muscle content score by TPM")
9.4 Determine “outliers”
Open question: should outliers be identified with a permutation test, for example by computing a p-value against a constructed distribution of mean scores?
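One possible reading of the permutation idea is sketched below; it is an illustration under stated assumptions, not a settled method. It compares the muscle scores of the Muscle-Low class with those of the Control class, using values copied from the table in section 9.6, and builds the null distribution of the difference in group means by enumerating every relabeling of the 14 samples. The choice of Control as the reference group, the one-sided direction, and the names `mean_diff`, `null_diff`, and `p_value` are assumptions made for this example.

```r
# Muscle scores copied from the table in section 9.6
muscle_low <- c(1.146418, 1.444503, 3.556671, 1.548513, 1.356221, 1.415840)
control    <- c(3.862732, 3.404674, 3.643485, 3.655246,
                3.707235, 3.427649, 3.692780, 3.562354)

scores <- c(muscle_low, control)
n_low  <- length(muscle_low)

# Test statistic: mean of the candidate group minus mean of the rest
mean_diff <- function(idx) mean(scores[idx]) - mean(scores[-idx])
observed <- mean_diff(seq_len(n_low))

# Exact null distribution: every way of labeling n_low of the 14 samples
null_diff <- combn(length(scores), n_low, FUN = mean_diff)

# One-sided p-value: share of relabelings with a difference at least as low
# as the observed one (the tolerance guards against floating-point ties)
p_value <- mean(null_diff <= observed + 1e-12)
p_value
```

With larger groups, exhaustive enumeration becomes impractical, and the null distribution would have to be approximated from random relabelings instead.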
9.5 Determine “bad” biopsies?
9.6 Tables
tb <- dplyr::select(celltype_tpm, -classes_color) %>%
  arrange(classes)
knitr::kable(tb, caption="Blood, fat and muscle content index.")
sample_name | blood | fat | muscle | pheno_type | classes |
---|---|---|---|---|---|
01-0041 | 2.585593 | 0.1184236 | 3.862732 | Control | Control |
01-0042 | 2.604758 | 0.3822024 | 3.404674 | Control | Control |
01-0043 | 1.515110 | 0.2021819 | 3.643485 | Control | Control |
01-0044 | 2.256298 | 0.2361598 | 3.655246 | Control | Control |
01-0045 | 2.485550 | 0.3499366 | 3.707235 | Control | Control |
01-0046 | 3.001476 | 0.2935164 | 3.427649 | Control | Control |
01-0048 | 2.154881 | 0.3454669 | 3.692780 | Control | Control |
01-0049 | 2.199308 | 0.3796754 | 3.562354 | Control | Control |
01-0023 | 1.668277 | 0.5237397 | 3.177025 | FSHD | Mild |
01-0032 | 3.282519 | 0.5813589 | 3.790697 | FSHD | Mild |
01-0033 | 2.471344 | 0.2054422 | 3.418840 | FSHD | Mild |
01-0034 | 2.912924 | 0.2439875 | 3.968929 | FSHD | Mild |
01-0036 | 2.292336 | 0.3440663 | 3.634696 | FSHD | Mild |
32-0011 | 3.213565 | 0.2619783 | 3.981536 | FSHD | Mild |
32-0012 | 3.013254 | 0.6171513 | 3.480103 | FSHD | Mild |
32-0014 | 4.299332 | 0.1537942 | 3.158579 | FSHD | Mild |
32-0015 | 2.891994 | 0.3012466 | 4.061947 | FSHD | Mild |
32-0018 | 3.617742 | 0.1898964 | 3.597990 | FSHD | Mild |
32-0019 | 2.280754 | 0.3573577 | 3.984106 | FSHD | Mild |
01-0023b | 2.273485 | 0.7225922 | 3.462251 | FSHD | Mild |
01-0026b | 3.076727 | 0.3619324 | 3.636448 | FSHD | Mild |
01-0036b | 2.757293 | 0.3583505 | 3.653238 | FSHD | Mild |
32-0013b | 2.824210 | 0.6457411 | 3.511864 | FSHD | Mild |
32-0015b | 2.741116 | 0.3704249 | 4.023872 | FSHD | Mild |
32-0017b | 3.553026 | 0.3916796 | 3.089687 | FSHD | Mild |
01-0021 | 2.883033 | 0.4052833 | 3.240677 | FSHD | Moderate |
01-0022-1 | 2.603183 | 1.5294705 | 3.530700 | FSHD | Moderate |
01-0024 | 3.011862 | 0.4650832 | 3.470005 | FSHD | Moderate |
01-0026 | 2.393720 | 0.4211012 | 3.681180 | FSHD | Moderate |
01-0030 | 3.123735 | 1.2808079 | 3.732009 | FSHD | Moderate |
01-0035 | 2.397119 | 0.9189982 | 3.448796 | FSHD | Moderate |
01-0038 | 2.478129 | 0.3810781 | 3.103982 | FSHD | Moderate |
32-0002 | 2.912887 | 0.5254690 | 3.199493 | FSHD | Moderate |
32-0007 | 2.481612 | 0.9001335 | 3.186218 | FSHD | Moderate |
32-0008 | 3.944364 | 1.7901614 | 2.653674 | FSHD | Moderate |
32-0010 | 3.251255 | 0.2792022 | 4.161168 | FSHD | Moderate |
32-0013 | 3.600465 | 0.9677029 | 3.692289 | FSHD | Moderate |
01-0022b | 3.330682 | 1.7365568 | 3.796104 | FSHD | Moderate |
01-0024b | 3.183899 | 0.2378188 | 3.327007 | FSHD | Moderate |
01-0028b | 2.163574 | 0.5596671 | 3.756815 | FSHD | Moderate |
01-0030b | 2.371968 | 0.6290769 | 3.702877 | FSHD | Moderate |
01-0033b | 1.301926 | 0.5558120 | 3.672376 | FSHD | Moderate |
01-0034b | 2.695132 | 0.5194755 | 4.035298 | FSHD | Moderate |
01-0035b | 2.638491 | 1.0832977 | 3.332846 | FSHD | Moderate |
32-0006b | 3.557061 | 1.0401034 | 3.582340 | FSHD | Moderate |
32-0009b | 2.429373 | 1.5374834 | 3.066060 | FSHD | Moderate |
32-0014b | 3.373019 | 0.8191356 | 3.226881 | FSHD | Moderate |
01-0027 | 2.861061 | 1.4552309 | 3.341357 | FSHD | IG-High |
32-0017 | 3.645956 | 1.6709002 | 3.011805 | FSHD | IG-High |
01-0025b | 3.760612 | 1.5639605 | 4.035969 | FSHD | IG-High |
01-0027b | 3.458966 | 1.8465812 | 3.437264 | FSHD | IG-High |
01-0029b | 4.002951 | 1.8184439 | 3.068390 | FSHD | IG-High |
32-0007b | 2.808018 | 1.7246814 | 3.398351 | FSHD | IG-High |
32-0018b | 3.137937 | 1.1417576 | 2.966068 | FSHD | IG-High |
01-0025 | 2.279839 | 0.6538198 | 3.906637 | FSHD | High |
01-0029 | 3.213923 | 1.1415584 | 3.201525 | FSHD | High |
32-0003 | 2.920859 | 1.1202031 | 3.604562 | FSHD | High |
32-0004 | 2.000568 | 1.2400798 | 3.354000 | FSHD | High |
32-0005 | 4.325426 | 1.2125240 | 3.661033 | FSHD | High |
32-0006 | 2.389504 | 1.1906286 | 3.322845 | FSHD | High |
32-0009 | 3.518911 | 2.3004965 | 2.868475 | FSHD | High |
32-0002b1 | 3.798370 | 0.6701894 | 3.311355 | FSHD | High |
32-0005b | 4.165111 | 0.8942373 | 4.057378 | FSHD | High |
01-0037 | 3.820057 | 2.5362662 | 1.146418 | FSHD | Muscle-Low |
32-0016 | 4.296978 | 2.5212618 | 1.444503 | FSHD | Muscle-Low |
01-0037b | 3.680353 | 1.3811564 | 3.556671 | FSHD | Muscle-Low |
32-0010b | 4.148983 | 2.2940258 | 1.548513 | FSHD | Muscle-Low |
32-0012b | 4.738981 | 2.6884333 | 1.356221 | FSHD | Muscle-Low |
32-0016b | 4.944297 | 0.8406186 | 1.415840 | FSHD | Muscle-Low |
# Mean blood, fat, and muscle score for each class
tb2 <- tb %>%
  group_by(classes) %>%
  summarise(blood_mean = mean(blood),
            fat_mean = mean(fat),
            muscle_mean = mean(muscle))
knitr::kable(tb2)
classes | blood_mean | fat_mean | muscle_mean |
---|---|---|---|
Control | 2.350372 | 0.2884454 | 3.619519 |
Mild | 2.892347 | 0.3900435 | 3.625400 |
Moderate | 2.823931 | 0.8446781 | 3.481763 |
IG-High | 3.382214 | 1.6030794 | 3.322743 |
High | 3.179168 | 1.1581930 | 3.476423 |
Muscle-Low | 4.271608 | 2.0436270 | 1.744694 |