Quick Start for gimap

gimap analyzes dual-targeting CRISPR screening data to help identify genetic interactions (e.g. cooperativity, synthetic lethality) in models of disease and other biological contexts. It works with functional genomic data generated by the pgPEN (paired guide RNAs for genetic interaction mapping) approach and quantifies the growth effects of single and paired gene knockouts after a CRISPR library is applied. Many CRISPR screen types can be used for this analysis; this review describes them (https://www.nature.com/articles/s43586-021-00093-4). For the use of pgPEN and GI mapping in a paired gRNA format, see this paper (https://pubmed.ncbi.nlm.nih.gov/34469736/).

library(gimap)
#> Warning: replacing previous import 'dplyr::group_rows' by
#> 'kableExtra::group_rows' when loading 'gimap'
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

Set Up

First, let’s create a folder to save files to and load the example count data.

output_dir <- "output_timepoints"
dir.create(output_dir, showWarnings = FALSE)
example_data <- get_example_data("count")
#> Rows: 33170 Columns: 8
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (3): id, seq_1, seq_2
#> dbl (5): Day00_RepA, Day05_RepA, Day22_RepA, Day22_RepB, Day22_RepC
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.

Setting up data

We’re going to set up three datasets that we will provide to the setup_data() function to create a gimap dataset object.

  • counts - the counts generated from pgPEN
  • pg_ids - the IDs that correspond to the rows of the counts and specify the construct
  • sample_metadata - metadata that describes the columns of the counts including their timepoints

counts <- example_data %>%
  select(c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC")) %>%
  as.matrix()

pg_ids are the unique IDs, listed in the same order as the rows of the count data.

pg_ids <- example_data %>%
  dplyr::select("id")

Sample metadata describes the samples, with its rows in the same order as the columns of the count data.

sample_metadata <- data.frame(
  col_names = c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC"),
  day = as.numeric(c("0", "5", "22", "22", "22")),
  rep = as.factor(c("RepA", "RepA", "RepA", "RepB", "RepC"))
)
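
Because set-up depends on ordering, it can be worth checking that the three pieces line up before passing them to setup_data(). The following is a minimal sketch in base R using small made-up data; the toy_* objects and their values are hypothetical and are not part of gimap.

```r
# Toy stand-ins for counts, pg_ids and sample_metadata
# (hypothetical values, for illustration only)
toy_counts <- matrix(
  c(10, 20, 30, 40, 50, 60),
  nrow = 2,
  dimnames = list(NULL, c("Day00_RepA", "Day05_RepA", "Day22_RepA"))
)
toy_pg_ids <- data.frame(
  id = c("construct_1", "construct_2"),
  stringsAsFactors = FALSE
)
toy_metadata <- data.frame(
  col_names = c("Day00_RepA", "Day05_RepA", "Day22_RepA"),
  day = c(0, 5, 22),
  stringsAsFactors = FALSE
)

# There should be exactly one ID per row of the count matrix
rows_match <- nrow(toy_pg_ids) == nrow(toy_counts)

# Metadata rows should be listed in the same order as the count columns
cols_match <- identical(toy_metadata$col_names, colnames(toy_counts))

# Stop early if either check fails
stopifnot(rows_match, cols_match)
```

The same two checks can be applied to the real counts, pg_ids and sample_metadata objects by swapping in their names.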

We’ll need to provide counts, pg_ids and sample_metadata to setup_data().

gimap_dataset <- setup_data(
  counts = counts,
  pg_ids = pg_ids,
  sample_metadata = sample_metadata
)

Before analysis, it’s best to run quality checks. The run_qc() function creates a report we can use to assess data quality.

run_qc(gimap_dataset,
       output_file = "example_qc_report.Rmd",
       overwrite = TRUE,
       quiet = TRUE)

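You can take a look at an example QC report here.

Next, we filter, annotate, and normalize the data, then calculate CRISPR scores and genetic interaction scores.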

gimap_dataset <- gimap_dataset %>%
  gimap_filter() %>%
  gimap_annotate(cell_line = "HELA") %>%
  gimap_normalize(
    timepoints = "day"
  ) %>%
  calc_crispr() %>%
  calc_gi()
#> Annotating Data
#> Rows: 1884 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): gene, gene_symbol, entrez_id
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Normalizing Log Fold Change
#> 
#> Calculating CRISPR score
#> 
#> Calculating Genetic Interaction scores

We can save the processed dataset as an RDS file.

saveRDS(gimap_dataset, "gimap_dataset_final.RDS")
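
If you would rather keep the file inside the output folder created during set up, one option is to build the path with file.path() and then confirm that the object reloads with readRDS(). This is a small sketch with a hypothetical stand-in object (toy_dataset), and it uses a temporary directory so it can be run anywhere.

```r
# Hypothetical stand-in for the processed gimap dataset
toy_dataset <- list(scores = c(a = 0.5, b = -1.2))

# Build the path inside an output folder rather than the working directory
# (a temporary directory is used here so the sketch leaves no files behind)
output_dir <- file.path(tempdir(), "output_timepoints")
dir.create(output_dir, showWarnings = FALSE)
rds_path <- file.path(output_dir, "gimap_dataset_final.RDS")

saveRDS(toy_dataset, rds_path)

# Reload later and confirm the object round-trips unchanged
reloaded <- readRDS(rds_path)
stopifnot(identical(reloaded, toy_dataset))
```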

Session Info

This is just for provenance purposes.

sessionInfo()
#> R version 4.4.0 (2024-04-24)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS 15.0.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/New_York
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.1.4 gimap_0.1.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyr_1.3.1        sass_0.4.9         utf8_1.2.4         generics_0.1.3    
#>  [5] xml2_1.3.6         stringi_1.8.4      hms_1.1.3          digest_0.6.37     
#>  [9] magrittr_2.0.3     RColorBrewer_1.1-3 evaluate_1.0.1     grid_4.4.0        
#> [13] timechange_0.3.0   fastmap_1.2.0      jsonlite_1.8.9     backports_1.5.0   
#> [17] purrr_1.0.2        fansi_1.0.6        viridisLite_0.4.2  scales_1.3.0      
#> [21] textshaping_0.4.0  jquerylib_0.1.4    cli_3.6.3          crayon_1.5.3      
#> [25] rlang_1.1.4        bit64_4.5.2        munsell_0.5.1      withr_3.0.1       
#> [29] cachem_1.1.0       yaml_2.3.10        parallel_4.4.0     tools_4.4.0       
#> [33] tzdb_0.4.0         colorspace_2.1-1   ggplot2_3.5.1      kableExtra_1.4.0  
#> [37] broom_1.0.7        curl_5.2.3         vctrs_0.6.5        R6_2.5.1          
#> [41] lifecycle_1.0.4    lubridate_1.9.3    snakecase_0.11.1   stringr_1.5.1     
#> [45] bit_4.5.0          fs_1.6.4           htmlwidgets_1.6.4  vroom_1.6.5       
#> [49] ragg_1.3.3         janitor_2.2.0      pkgconfig_2.0.3    desc_1.4.3        
#> [53] pkgdown_2.1.1      pillar_1.9.0       bslib_0.8.0.9000   gtable_0.3.6      
#> [57] glue_1.8.0         systemfonts_1.1.0  xfun_0.48          tibble_3.2.1      
#> [61] tidyselect_1.2.1   rstudioapi_0.17.1  knitr_1.48         htmltools_0.5.8.1 
#> [65] rmarkdown_2.28     svglite_2.1.3      readr_2.1.5        pheatmap_1.0.12   
#> [69] compiler_4.4.0