Create a results table containing CRISPR scores, Wilcoxon rank-sum test results, and t-test results. The output of the `gimap` package is a set of genetic interaction scores, _each of which is the distance between the observed CRISPR score and the expected CRISPR score._ The expected CRISPR score is the value we would expect if the two genes were unrelated to each other. The further an observed CRISPR score is from its expected value, the more we suspect a genetic interaction. This can be true in a positive way (a CRISPR knockout pair caused more cell proliferation than expected) or in a negative way (a CRISPR knockout pair caused more cell lethality than expected).
The genetic interaction scores are based on a linear model calculated for each sample, where `observed_crispr_single` is the outcome variable and `expected_crispr_single` is the predictor variable. For each sample: `lm(observed_crispr_single ~ expected_crispr_single)`
Using `y = mx + b`, we can fill in the following values:
* `y` = observed CRISPR score
* `x` = expected CRISPR score
* `m` = slope from linear model for this sample
* `b` = intercept from linear model for this sample
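The per-sample fit described above can be sketched in base R. This is only an illustration on simulated data for one hypothetical sample, not the package's internal code; the data frame `single_df` and the simulated slope and intercept are assumptions made for the example.

```r
# Simulated single-targeting constructs for ONE hypothetical sample.
# The true slope (0.9) and intercept (0.1) are arbitrary illustration values.
set.seed(42)
n <- 200
expected_crispr_single <- rnorm(n, mean = -0.5, sd = 1)
observed_crispr_single <- 0.1 + 0.9 * expected_crispr_single + rnorm(n, sd = 0.2)
single_df <- data.frame(observed_crispr_single, expected_crispr_single)

# Fit y = mx + b with observed as the outcome and expected as the predictor
fit <- lm(observed_crispr_single ~ expected_crispr_single, data = single_df)

# b = intercept, m = slope for this sample
b <- coef(fit)[["(Intercept)"]]
m <- coef(fit)[["expected_crispr_single"]]
```

With real data, one such fit would be made for each sample, giving each sample its own `m` and `b`.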
The intercept and slope from this linear model are used to adjust the CRISPR scores for each sample:
`single_target_gi_score = observed single crispr - (intercept + slope * expected single crispr)`
`double_target_gi_score = double crispr score - (intercept + slope * expected double crispr)`
These single- and double-target genetic interaction scores are calculated at the construct level and are then summarized using a t-test to see whether the distribution of the set of double-targeting constructs is significantly different from the overall distribution of single-targeting constructs. After multiple testing correction, FDR values are reported. A low FDR value for a double construct means high suspicion of paralogs.
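The adjustment and testing steps above can be sketched as follows. This is a rough illustration on simulated data, not the `gimap` implementation: the gene pair labels, the data frames `single` and `double`, and the use of Benjamini-Hochberg via `p.adjust()` as the multiple testing correction are all assumptions made for the example.

```r
set.seed(1)

# Simulated single-targeting constructs and the per-sample linear model
single <- data.frame(expected = rnorm(300))
single$observed <- 0.05 + 0.95 * single$expected + rnorm(300, sd = 0.2)
fit <- lm(observed ~ expected, data = single)
intercept <- coef(fit)[["(Intercept)"]]
slope <- coef(fit)[["expected"]]

# Single-target GI score: observed minus the model's prediction
single_gi <- single$observed - (intercept + slope * single$expected)

# Simulated double-targeting constructs for three hypothetical gene pairs;
# only pairC is given a true negative interaction (shift of -1.5)
shift <- c(pairA = 0, pairB = 0, pairC = -1.5)
double <- data.frame(pair = rep(names(shift), each = 16), expected = rnorm(48))
double$observed <- 0.05 + 0.95 * double$expected +
  unname(shift[double$pair]) + rnorm(48, sd = 0.2)

# Double-target GI score uses the same intercept and slope
double$gi <- double$observed - (intercept + slope * double$expected)

# t-test per pair: double-targeting constructs vs. all single-targeting constructs
p_values <- sapply(split(double$gi, double$pair),
                   function(gi) t.test(gi, single_gi)$p.value)

# Multiple testing correction (Benjamini-Hochberg assumed here)
fdr <- p.adjust(p_values, method = "BH")
```

In this simulation, `fdr` is a named vector with one value per gene pair, and the pair with the built-in shift should receive a much lower FDR than the two unshifted pairs.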
calc_gi(.data = NULL, gimap_dataset, use_lfc = FALSE, stats_by_rep = FALSE)
Data can be piped in with tidyverse pipes from function to function, but the data must still be a `gimap_dataset`.
A special dataset structure that is set up using the `setup_data()` function.
Should log fold change be used to calculate GI scores instead of CRISPR scores? If you do not have negative controls or CRISPR scores, you will need to set this to `TRUE`.
Should statistics be calculated per rep or should replicates be collapsed?
A gimap dataset with statistics and genetic interaction scores calculated. Overall results in the returned object can be obtained using `gimap_dataset$overall_results`, whereas target-level genetic interaction scores can be retrieved using `gimap_dataset$gi_scores`.
# \donttest{
gimap_dataset <- get_example_data("gimap") %>%
  gimap_filter() %>%
  gimap_annotate(cell_line = "HELA") %>%
  gimap_normalize(
    timepoints = "day",
    missing_ids_file = tempfile()
  ) %>%
  calc_gi()
#> Annotating Data
#> Downloading: Achilles_common_essentials.csv
#> Normalizing Log Fold Change
#> Calculating Genetic Interaction scores
saveRDS(gimap_dataset, file.path(tempdir(), "gimap_dataset_final.RDS"))
# }