Reference Data

Reference data used by modules.

Ensembl Data

Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data. Ensembl tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species.


10X Genomics

The Hutch has several packages from 10X Genomics.

  • Cell Ranger
  • Cell Ranger ATAC
  • Cell Ranger ARC
  • Space Ranger

Reference data for 10X Genomics tools is located


Kraken 2 is the newest version of Kraken, a taxonomic classification system using exact k-mer matches to achieve high accuracy and fast classification speeds. This classifier matches each k-mer within a query sequence to the lowest common ancestor (LCA) of all genomes containing the given k-mer. The k-mer assignments inform the classification algorithm.
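The k-mer-to-LCA idea described above can be illustrated with a small toy sketch. This is not Kraken 2's implementation: the real tool uses minimizers and a compact hash table, and resolves ties between paths differently. The taxonomy, the genome strings, and the function names (`build_db`, `classify`) below are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (the root has no parent).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Escherichia": "Bacteria",
    "E. coli": "Escherichia",
    "Salmonella": "Bacteria",
}

def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path

def lca(a, b):
    """Lowest common ancestor of two taxa in the toy tree."""
    ancestors = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors:
            return taxon
    raise ValueError("taxa share no ancestor")

def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_db(genomes, k):
    """Map each k-mer to the LCA of all genomes that contain it."""
    db = {}
    for taxon, seq in genomes.items():
        for kmer in set(kmers(seq, k)):
            db[kmer] = lca(db[kmer], taxon) if kmer in db else taxon
    return db

def classify(read, db, k):
    """Pick the taxon whose root-to-taxon path collects the most k-mer hits.

    Simplification: ties between different paths are broken arbitrarily.
    Returns None when no k-mer of the read is in the database.
    """
    hits = Counter(db[kmer] for kmer in kmers(read, k) if kmer in db)
    if not hits:
        return None

    def path_score(taxon):
        # Hits on a taxon and on all of its ancestors support that taxon.
        return sum(hits[t] for t in lineage(taxon))

    # Prefer the higher score, then the deeper (more specific) taxon.
    return max(hits, key=lambda t: (path_score(t), len(lineage(t))))

# Made-up sequences, not real genomes.
toy_db = build_db({"E. coli": "ACGTACGGA", "Salmonella": "ACGTTTGCA"}, k=4)
print(classify("ACGTACG", toy_db, k=4))
```

In this toy database the k-mer shared by both genomes maps to their common ancestor, while k-mers unique to one genome map to that genome, so a read containing mostly unique k-mers is assigned at the more specific level.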



STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays.

Modules: STAR/2.7.10b-GCC-12.2.0



AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence.



CHOPCHOP (version 3) is a tool for selecting target sites for CRISPR/Cas9, CRISPR/Cpf1, CRISPR/Cas13 or NICKASE/TALEN-directed mutagenesis.


GATK (Broad Institute) Funcotator Data

The Genome Analysis Toolkit (GATK) is a software package developed at the Broad Institute to analyse next-generation resequencing data.


Bowtie2 Index data




PCGR - Personal Cancer Genome Reporter


RSeQC (Picard-Style) Interval Files





tcrdist3 is an open-source Python package that enables a broad array of T cell receptor (TCR) sequence analyses. Other applications for analyzing TCR data include DeepTCR, clusTCR and pubtcrs.