R/02-gimap_filter.R
qc_filter_plasmid.Rd
This function flags pgRNAs that have low log2 CPM values in the plasmid (Day 0) sample and reports how many there are. If more than one column is specified as the plasmid sample, all replicates are pooled to find the lower outlier, and a construct is flagged if any plasmid replicate has a log2 CPM value below the cutoff.
qc_filter_plasmid(
gimap_dataset,
cutoff = NULL,
filter_plasmid_target_col = NULL
)
gimap_dataset: The special gimap_dataset object from the `setup_data` function, which contains the log2 CPM transformed data.
cutoff: Default is NULL. The cutoff for low log2 CPM values in the plasmid sample; if not specified, the lower outlier (the lower quartile minus 1.5 * the interquartile range) is used.
filter_plasmid_target_col: Default is NULL, in which case only the first column is selected. Use this parameter to specify the plasmid column(s) to select.
A named list with two elements: `filter`, which specifies which pgRNAs have low plasmid log2 CPM (the column of interest is `plasmid_cpm_filter`), and `reportdf`, a report data frame giving the number and percent of pgRNAs that have a low plasmid log2 CPM.
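The default cutoff and the "any replicate" rule described above can be sketched in base R. This is a minimal illustration, not the gimap implementation: the matrix `log2_cpm`, its values, and the `report` data frame are made up, and the use of R's default `quantile()` method is an assumption.

```r
# Hypothetical toy matrix of log2 CPM values: 6 pgRNAs x 2 plasmid replicates.
# These numbers are invented for illustration and are not gimap data.
log2_cpm <- matrix(
  c(5.1, 5.3,
    4.9, 5.0,
    5.2, 4.8,
    0.4, 5.1,
    5.0, 5.2,
    4.7, 4.9),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("pgRNA_", 1:6), c("plasmid_rep1", "plasmid_rep2"))
)

# Pool all replicate values, then take the lower quartile minus
# 1.5 * IQR as the lower-outlier cutoff (assumes quantile()'s default method).
pooled <- as.vector(log2_cpm)
q1 <- unname(quantile(pooled, 0.25))
cutoff <- q1 - 1.5 * IQR(pooled)

# Flag a pgRNA when any of its plasmid replicates falls below the cutoff.
plasmid_cpm_filter <- apply(log2_cpm, 1, function(x) any(x < cutoff))

# Summarise counts and percentages, similar in spirit to `reportdf`.
n <- table(factor(plasmid_cpm_filter, levels = c(FALSE, TRUE)))
report <- data.frame(
  below_cutoff = names(n),
  n = as.vector(n),
  percent = round(100 * as.vector(n) / sum(n), 2)
)
print(report)
```

In this toy data only `pgRNA_4` is flagged, because one of its two replicates lies far below the pooled distribution. Passing an explicit `cutoff` to `qc_filter_plasmid()` replaces the quartile-based value computed here.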
# \donttest{
gimap_dataset <- get_example_data("gimap", data_dir = tempdir())
qc_filter_plasmid(gimap_dataset)
#> $filter
#> # A tibble: 33,170 × 1
#> plasmid_cpm_filter
#> <lgl>
#> 1 FALSE
#> 2 FALSE
#> 3 FALSE
#> 4 FALSE
#> 5 FALSE
#> 6 FALSE
#> 7 FALSE
#> 8 FALSE
#> 9 FALSE
#> 10 FALSE
#> # ℹ 33,160 more rows
#>
#> $reportdf
#> Plasmid_log2cpmBelowCutoff n percent
#> 1 FALSE 32873 99.1
#> 2 TRUE 297 0.9
#>
# or to specify a cutoff value to be used in the filter rather than the lower
# outlier default
qc_filter_plasmid(gimap_dataset, cutoff = 2)
#> $filter
#> # A tibble: 33,170 × 1
#> plasmid_cpm_filter
#> <lgl>
#> 1 FALSE
#> 2 FALSE
#> 3 FALSE
#> 4 FALSE
#> 5 FALSE
#> 6 FALSE
#> 7 FALSE
#> 8 FALSE
#> 9 FALSE
#> 10 FALSE
#> # ℹ 33,160 more rows
#>
#> $reportdf
#> Plasmid_log2cpmBelowCutoff n percent
#> 1 FALSE 32898 99.18
#> 2 TRUE 272 0.82
#>
# or to specify a different column (or set of columns) to select
qc_filter_plasmid(gimap_dataset, filter_plasmid_target_col = 1:2)
#> $filter
#> # A tibble: 33,170 × 1
#> plasmid_cpm_filter
#> <lgl>
#> 1 FALSE
#> 2 FALSE
#> 3 FALSE
#> 4 FALSE
#> 5 FALSE
#> 6 FALSE
#> 7 FALSE
#> 8 FALSE
#> 9 FALSE
#> 10 FALSE
#> # ℹ 33,160 more rows
#>
#> $reportdf
#> Plasmid_log2cpmBelowCutoff n percent
#> 1 FALSE 31269 94.27
#> 2 TRUE 1901 5.73
#>
# or to specify a cutoff value that will be used in the filter rather than
# the lower outlier default as well as to specify a different column (or set
# of columns) to select
qc_filter_plasmid(gimap_dataset,
cutoff = 1.75,
filter_plasmid_target_col = 1:2
)
#> $filter
#> # A tibble: 33,170 × 1
#> plasmid_cpm_filter
#> <lgl>
#> 1 FALSE
#> 2 FALSE
#> 3 FALSE
#> 4 FALSE
#> 5 FALSE
#> 6 FALSE
#> 7 FALSE
#> 8 FALSE
#> 9 FALSE
#> 10 FALSE
#> # ℹ 33,160 more rows
#>
#> $reportdf
#> Plasmid_log2cpmBelowCutoff n percent
#> 1 FALSE 30818 92.91
#> 2 TRUE 2352 7.09
#>
# }