Quick Start for gimap

For more background on gimap and the calculations done here, read here

Set Up

First let’s create a folder we will save files to.

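library(gimap)
library(dplyr) # provides %>% and select()

# Folder where output files will be written
output_dir <- "output_timepoints"
dir.create(output_dir, showWarnings = FALSE)

# Load the example count data
example_data <- get_example_data("count")

Setting up data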

We’re going to set up three datasets that we will provide to the setup_data() function to create a gimap dataset object.

  • counts - the counts generated from pgPEN
  • pg_ids - the IDs that correspond to the rows of the counts and specify the construct
  • sample_metadata - metadata that describes the columns of the counts including their timepoints

counts <- example_data %>%
  select(c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC")) %>%
  as.matrix() # counts are passed along as a matrix

pg_ids are the unique IDs, listed in the same order as the rows of the count data.

pg_ids <- example_data %>%
  dplyr::select("id") # assumes the ID column of the example data is named "id"

Sample metadata is the information that describes the samples and is sorted the same order as the columns in the count data.

sample_metadata <- data.frame(
  col_names = c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC"),
  day = as.numeric(c("0", "5", "22", "22", "22")),
  rep = as.factor(c("RepA", "RepA", "RepA", "RepB", "RepC"))
)
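
Since the metadata is expected to follow the column order of the counts, it can help to confirm that the two line up before going further. The sketch below uses small hypothetical stand-ins (toy_counts, toy_metadata) rather than the real example data, and the check is plain base R, not a gimap function.

```r
# Hypothetical toy stand-ins for counts and sample_metadata
toy_counts <- matrix(
  c(10, 12, 3, 20, 25, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("Day00_RepA", "Day05_RepA", "Day22_RepA"))
)
toy_metadata <- data.frame(
  col_names = c("Day00_RepA", "Day05_RepA", "Day22_RepA"),
  day = c(0, 5, 22),
  stringsAsFactors = FALSE
)

# TRUE only if every metadata row names the count column in the same position
order_matches <- identical(colnames(toy_counts), toy_metadata$col_names)
stopifnot(order_matches)
```

With the real objects, the same comparison would be between colnames(counts) and sample_metadata$col_names.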

We’ll need to provide counts, pg_ids and sample_metadata to setup_data().

gimap_dataset <- setup_data(
  counts = counts,
  pg_ids = pg_ids,
  sample_metadata = sample_metadata
)

It’s ideal to run quality checks first. The run_qc() function will create a report we can look at to assess this.

run_qc(gimap_dataset,
       output_file = "example_qc_report.Rmd",
       overwrite = TRUE,
       quiet = TRUE)

You can take a look at an example QC report here.

gimap_dataset <- gimap_dataset %>%
  gimap_filter() %>%
  gimap_annotate(cell_line = "HELA") %>%
  gimap_normalize(
    timepoints = "day"
  ) %>%
  calc_gi()

Example output

The genetic interaction results are stored in gimap_dataset$gi_scores, a table with the following columns:

  • rep - indicates which sample the data come from. Note that the pretreatment sample is used in the calculation, so its statistics are not reported.
  • pgRNA_target - the gene(s) targeted by the original pgRNAs for these data
  • mean_expected_cs - the average expected genetic interaction score
  • mean_gi_score - the average observed genetic interaction score
  • target_type - describes whether the CRISPR design targets two genes (“gene_gene”), a gene and a non-targeting control (“gene_ctrl”), or a control and a gene (“ctrl_gene”).
  • p_val - p values from testing whether a double knockout construct differs significantly in its genetic interaction score from the single targets.
  • fdr - false discovery rate (FDR) corrected p values

gimap_dataset$gi_scores %>%
  dplyr::arrange(fdr) %>%
  head() %>%
  knitr::kable(format = "html")

rep              pgRNA_target     mean_expected_cs  mean_observed_cs  mean_gi_score  target_type  p_val    fdr
Day22_RepA_late  NDEL1_NDE1       -1.8731292        -2.673450         -1.292412      gene_gene    0.0e+00  0.0000270
Day22_RepA_late  PFN2_PFN1        -1.5811362        -2.360606         -1.172104      gene_gene    1.0e-07  0.0000598
Day22_RepA_late  CNOT8_CNOT7      -0.2525986        -2.298083         -1.985598      gene_gene    9.0e-07  0.0003051
Day22_RepC_late  CNOT8_CNOT7      -0.6030327        -2.222123         -1.684182      gene_gene    4.0e-07  0.0004501
Day22_RepC_late  SHMT2_SHMT1      -1.0214662        -1.737076         -0.927188      gene_gene    9.0e-07  0.0004535
Day22_RepC_late  AKIRIN1_AKIRIN2  -1.7072333        -2.379270         -1.123690      gene_gene    5.9e-06  0.0006682
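
A table like this is often narrowed down to its strongest hits. As a minimal sketch, the block below retypes four columns of the six rows shown above into a data frame and filters them in base R. The cutoffs (fdr_cutoff, score_cutoff) are hypothetical values picked for illustration, not recommendations from gimap.

```r
# Rows of the example output above, retyped by hand for illustration
top_hits <- data.frame(
  rep = c("Day22_RepA_late", "Day22_RepA_late", "Day22_RepA_late",
          "Day22_RepC_late", "Day22_RepC_late", "Day22_RepC_late"),
  pgRNA_target = c("NDEL1_NDE1", "PFN2_PFN1", "CNOT8_CNOT7",
                   "CNOT8_CNOT7", "SHMT2_SHMT1", "AKIRIN1_AKIRIN2"),
  mean_gi_score = c(-1.292412, -1.172104, -1.985598,
                    -1.684182, -0.927188, -1.123690),
  fdr = c(0.0000270, 0.0000598, 0.0003051,
          0.0004501, 0.0004535, 0.0006682),
  stringsAsFactors = FALSE
)

# Hypothetical cutoffs, chosen only for this example
fdr_cutoff <- 0.0005
score_cutoff <- -1

# Keep rows that pass both the FDR and the score cutoff
strong_hits <- top_hits[top_hits$fdr < fdr_cutoff &
                          top_hits$mean_gi_score < score_cutoff, ]

# Target pairs that pass in at least one replicate
strong_pairs <- unique(strong_hits$pgRNA_target)
print(strong_pairs)
```

On the real results, the same subsetting could be applied to gimap_dataset$gi_scores directly.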

Plot the results

You can remove any samples from these plots by altering the reps_to_drop argument.

plot_exp_v_obs_scatter(gimap_dataset, reps_to_drop = "Day05_RepA_early")

plot_rank_scatter(gimap_dataset, reps_to_drop = "Day05_RepA_early")

plot_volcano(gimap_dataset, reps_to_drop = "Day05_RepA_early", facet_rep = FALSE)

Here’s how you can save plots like the above.
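
One possible approach, assuming the gimap plotting functions return ggplot2 objects (an assumption, not confirmed in this guide), is ggplot2::ggsave(). The sketch below uses a stand-in plot built from made-up points so that it runs on its own; the file name toy_plot.png is hypothetical.

```r
library(ggplot2)

# Same output folder as created during set up
output_dir <- "output_timepoints"
dir.create(output_dir, showWarnings = FALSE)

# Stand-in plot; in practice this would be, e.g., the result of plot_volcano()
toy_plot <- ggplot(
  data.frame(x = c(-2, 0, 1), y = c(5, 1, 2)),
  aes(x = x, y = y)
) +
  geom_point()

# Hypothetical file name; width and height are in inches by default
plot_file <- file.path(output_dir, "toy_plot.png")
ggsave(plot_file, plot = toy_plot, width = 6, height = 4)
```

Passing the plot explicitly through the plot argument avoids relying on whichever plot happened to be displayed last.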

Plot specific target pair

We can pick out a specific pair to plot.

# "NDEL1_NDE1" is top result so let's plot that
plot_targets_bar(gimap_dataset, target1 = "NDEL1", target2 = "NDE1")
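
The target1 and target2 values are the two halves of the pgRNA_target name. A small base R sketch of splitting such a name, assuming the gene symbols themselves contain no underscore:

```r
pair_name <- "NDEL1_NDE1"

# Split on the underscore that joins the two gene symbols
targets <- strsplit(pair_name, "_", fixed = TRUE)[[1]]
target1 <- targets[1]
target2 <- targets[2]
```

The resulting target1 and target2 strings can then be passed to plot_targets_bar() as in the call above.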

Saving data to a file

We can save the whole gimap dataset as an RDS file, or just the genetic interaction scores as a TSV file.

saveRDS(gimap_dataset, "gimap_dataset_final.RDS")
readr::write_tsv(gimap_dataset$gi_scores, "gi_scores.tsv")
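
To pick the results up again in a later session, the files can be read back in. This is a minimal sketch using base R readRDS() and read.delim() on a hypothetical toy object written to a temporary folder. With the real files you would point these calls at "gimap_dataset_final.RDS" and "gi_scores.tsv" (readr::read_tsv() is an alternative for the TSV).

```r
# Hypothetical stand-in for a dataset object holding a gi_scores table
toy_dataset <- list(
  gi_scores = data.frame(
    pgRNA_target = c("NDEL1_NDE1", "PFN2_PFN1"),
    mean_gi_score = c(-1.292412, -1.172104),
    stringsAsFactors = FALSE
  )
)

rds_file <- file.path(tempdir(), "toy_dataset.RDS")
tsv_file <- file.path(tempdir(), "toy_gi_scores.tsv")

# Write both formats (base R counterparts of the calls above)
saveRDS(toy_dataset, rds_file)
write.table(toy_dataset$gi_scores, tsv_file,
            sep = "\t", quote = FALSE, row.names = FALSE)

# Read them back in
reloaded_dataset <- readRDS(rds_file)
reloaded_scores <- read.delim(tsv_file, stringsAsFactors = FALSE)
```

The RDS file restores the full R object, while the TSV only carries the scores table and is easier to open in other tools.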

Session Info

This is just for provenance purposes.
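
A common way to record this, sketched here on the assumption that base R's sessionInfo() is sufficient for your needs:

```r
# Record the R version, platform, and attached package versions
session_details <- sessionInfo()
print(session_details)
```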

