gizmo Bio Modules 18.04

  • ADMIXTURE/1.3.0-x86_64 Software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm.
  • AGAT/0.9.2-GCC-11.2.0 AGAT: Another GTF/GFF Analysis Toolkit. Suite of tools to handle gene annotations in any GTF/GFF format.
  • ASE/3.22.0-foss-2020b ASE is a Python package providing an open source Atomic Simulation Environment in the Python scripting language. From version 3.20.1 we also include the ase-ext package, which contains optional reimplementations in C of functions in ASE. ASE uses it automatically when installed.
  • AUGUSTUS/3.4.0-foss-2020b AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences
  • AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0 AlphaFold can predict protein structures with atomic accuracy even where no similar structure is known
  • AlphaPulldown/2.0.3-foss-2023a-CUDA-12.1.1 AlphaPulldown is a Python package that streamlines protein-protein interaction screens and high-throughput modelling of higher-order oligomers using AlphaFold-Multimer
  • ArrayFire/3.8.1-foss-2019b-CUDA-10.2.89 ArrayFire is a general-purpose library that simplifies the process of developing software that targets parallel and massively-parallel architectures including CPUs, GPUs, and other hardware acceleration devices.
  • Arriba/2.4.0-GCC-12.2.0 Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data. It was developed for use in a clinical research setting, so short runtimes and high sensitivity were important design criteria.
  • Arrow/17.0.0-gfbf-2024a Apache Arrow (incl. PyArrow Python bindings), a cross-language development platform for in-memory data.
  • ArviZ/0.16.1-foss-2023a Exploratory analysis of Bayesian models with Python
  • BAli-Phy/4.0-beta8-gfbf-2022b BAli-Phy estimates multiple sequence alignments and evolutionary trees from DNA, amino acid, or codon sequences.
  • BBMap/38.97-GCC-10.2.0 BBMap short read aligner, and other bioinformatic tools.
  • BCFtools/1.19-GCC-13.2.0 Samtools is a suite of programs for interacting with high-throughput sequencing data. BCFtools
  • BEAST/10.5.0-beta3-GCC-12.3.0-CUDA-12.1.1 BEAST is a cross-platform program for Bayesian analysis of molecular sequences using MCMC. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability.
  • BEDOPS/2.4.41-foss-2021b BEDOPS is an open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data of arbitrary scale. Tasks can be easily split by chromosome for distributing whole-genome analyses across a computational cluster.
  • BEDTools/2.31.0-GCC-12.3.0 BEDTools: a powerful toolset for genome arithmetic. The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM.
  • BGEN-enkre/1.1.7-GCC-11.2.0 This repository contains a reference implementation of the BGEN format, written in C++. The library can be used as the basis for BGEN support in other software, or as a reference for developers writing their own implementations of the BGEN format. Please cite: Band, G. and Marchini, J., “BGEN: a binary file format for imputed genotype and haplotype data”, bioRxiv 308296; doi: https://doi.org/10.1101/308296
  • BLAST+/2.16.0-gompi-2023b Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
  • BLAT/3.5-GCC-8.3.0 BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more.
  • BPCells/0.3.0-foss-2024a BPCells is a package for high-performance single-cell analysis on RNA-seq and ATAC-seq datasets.
  • BUStools/0.40.0-foss-2019b bustools is a program for manipulating BUS files for single cell RNA-Seq datasets. It can be used to error correct barcodes, collapse UMIs, produce gene count or transcript compatibility count matrices, and is useful for many other tasks. See the kallisto | bustools website for examples and instructions on how to use bustools as part of a single-cell RNA-seq workflow.
  • BWA/0.7.18-GCCcore-13.3.0 Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome.
  • BamTools/2.5.2-GCC-12.2.0 BamTools provides both a programmer’s API and an end-user’s toolkit for handling BAM files.
  • BaseSpaceCLI/1.5.1 BaseSpace is a powerful website where biologists and informaticians can easily store, analyze, and share genetic data. BaseSpace is a commercial product from Illumina.
  • Beagle/5.2.1 Beagle is a software package for phasing genotypes and for imputing ungenotyped markers.
  • Beast/10.5.0-beta3-GCC-12.3.0-beagle-lib-4.0.1-CUDA-12.1.1 BEAST is a cross-platform program for Bayesian analysis of molecular sequences using MCMC. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability.
  • Beast2/2.7.7-GCC-12.3.0-beagle-lib-4.0.1-CUDA-12.1.1 BEAST is a cross-platform program for Bayesian MCMC analysis of molecular sequences. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability.
  • BeautifulSoup/4.12.3-GCCcore-13.3.0 Beautiful Soup is a Python library designed for quick turnaround projects like screen-scraping.
  • Bio-DB-HTS/3.01-GCC-13.3.0 Read files using HTSlib including BAM/CRAM, Tabix and BCF database files
  • Bio-SearchIO-hmmer/1.7.3-GCC-10.2.0 Code to parse output from hmmsearch, hmmscan, phmmer and nhmmer, compatible with both version 2 and version 3 of the HMMER package from http://hmmer.org.
  • BioPerl/1.7.8-GCCcore-12.2.0 Bioperl is the product of a community effort to produce Perl code which is useful in biology. Examples include Sequence objects, Alignment objects and database searching objects.
  • Biopython/1.84-foss-2023b Biopython is a set of freely available tools for biological computation written in Python by an international team of developers. It is a distributed collaborative effort to develop Python libraries and applications which address the needs of current and future work in bioinformatics.
  • Bismark/0.24.1-GCC-12.2.0 A tool to map bisulfite converted sequence reads and determine cytosine methylation states
  • Bowtie/1.3.0-GCC-10.2.0 Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome.
  • Bowtie2/2.5.4-GCC-13.2.0 Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
  • CD-HIT/4.8.1-foss-2019b CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.
  • CITE-seq-Count/1.4.4-foss-2023b-Python-3.11.5 A Python package for counting antibody tags from a CITE-seq and/or cell hashing experiment.
  • CNVkit/0.9.9-foss-2021b-R-4.2.0 A command-line toolkit and Python library for detecting copy number variants and alterations genome-wide from high-throughput sequencing.
  • CRISPRCasTyper/1.2.1-foss-2020a-Python-3.8.2 Detect CRISPR-Cas genes and arrays, and predict the subtype based on both Cas genes and CRISPR repeat sequence.
  • CRISPResso2/2.3.1-foss-2023b CRISPResso2 is a software pipeline designed to enable rapid and intuitive interpretation of genome editing experiments.
  • Cbc/2.10.5-foss-2022b Cbc (Coin-or branch and cut) is an open-source mixed integer linear programming solver written in C++. It can be used as a callable library or using a stand-alone executable.
  • CellBender/0.2.2-foss-2021b-CUDA-11.4.1 CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
  • CellProfiler/4.2.7-foss-2023a CellProfiler is a free open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically.
  • CellRanger/10.0.0 Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.
  • CellRanger-ARC/2.1.0 Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell Multiome ATAC + Gene Expression sequencing data to generate a variety of analyses pertaining to gene expression, chromatin accessibility and their linkage. Furthermore, since the ATAC and gene expression measurements are on the very same cell, we are able to perform analyses that link chromatin accessibility and gene expression.
  • CellRanger-ATAC/2.0.0 Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.
  • CellRank/2.0.2-foss-2023a CellRank is a toolkit to uncover cellular dynamics based on Markov state modeling of single-cell data. It contains two main modules: kernels compute cell-cell transition probabilities and estimators generate hypotheses based on these.
  • CellTypist/1.6.2-foss-2023a A tool for semi-automatic cell type annotation
  • Cellpose/0.6.5-fosscuda-2020b Cellpose is an anatomical segmentation algorithm written in Python 3 by Carsen Stringer and Marius Pachitariu.
  • Cgl/0.60.8-foss-2023b The COIN-OR Cut Generation Library (Cgl) is a collection of cut generators that can be used with other COIN-OR packages that make use of cuts, such as, among others, the linear solver Clp or the mixed integer linear programming solvers Cbc or BCP. Cgl uses the abstract class OsiSolverInterface (see Osi) to use or communicate with a solver. It does not directly call a solver.
  • Clair3/1.0.4-foss-2022a Clair3 is a germline small variant caller for long-reads. Clair3 makes the best of two major method categories: pileup calling handles most variant candidates with speed, and full-alignment tackles complicated candidates to maximize precision and recall. Clair3 runs fast and has superior performance, especially at lower coverage. Clair3 is simple and modular for easy deployment and integration.
  • Clp/1.17.9-foss-2023b Clp (Coin-or linear programming) is an open-source linear programming solver. It is primarily meant to be used as a callable library, but a basic, stand-alone executable version is also available.
  • Clustal-Omega/1.2.4-GCC-8.3.0 Clustal Omega is a multiple sequence alignment program for proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. Evolutionary relationships can be seen via viewing Cladograms or Phylograms
  • ClustalW2/2.1-foss-2019b ClustalW2 is a general purpose multiple sequence alignment program for DNA or proteins.
  • Cluster-Buster/0.0-GCC-12.2.0 Cluster-Buster is a program for finding interesting functional regions, such as transcriptional enhancers, in DNA sequences.
  • Cogent_NGS_Immune_Profiler/v2.0-foss-2024a Cogent NGS Immune Profiler (CogentIP) is software designed to analyze sequence data stored in FASTQ files generated by Illumina sequencers from libraries prepared using certain Takara Bio immune profiling kits.
  • CoinUtils/2.11.10-GCC-13.2.0 CoinUtils (Coin-OR Utilities) is an open-source collection of classes and functions that are generally useful to more than one COIN-OR project.
  • Control-FREEC/11.5-GCC-8.3.0 Copy number and genotype annotation from whole genome and whole exome sequencing data.
  • CrossMap/0.7.3-foss-2023b CrossMap is a program for genome coordinates conversion between different assemblies (such as hg18 (NCBI36) <=> hg19 (GRCh37)). It supports commonly used file formats including BAM, CRAM, SAM, Wiggle, BigWig, BED, GFF, GTF and VCF.
  • DANPOS3/3.1.1-foss-2024a-Python-3.12.3 A toolkit for Dynamic Analysis of Nucleosome and Protein Occupancy by Sequencing, version 3
  • DBD-mysql/4.051-GCC-13.3.0 Perl binding for MySQL
  • DB_File/1.859-GCCcore-12.3.0 Perl5 access to Berkeley DB version 1.x.
  • DIAMOND/2.0.13-GCC-11.2.0 Accelerated BLAST compatible local sequence aligner
  • DeepCell/0.11.1-foss-2021b-CUDA-11.4.1 deepcell-tf is a deep learning library for single-cell analysis of biological images. This library allows users to apply pre-existing models to imaging data as well as to develop new deep learning models for single-cell analysis.
  • DeepTCR/2.1.27-foss-2021b-CUDA-11.4.1 DeepTCR is a python package that has a collection of unsupervised and supervised deep learning methods to parse TCRSeq data.
  • Delly/0.9.1-gompi-2020b Delly is an integrated structural variant (SV) prediction method that can discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends, split-reads and read-depth to sensitively and accurately delineate genomic rearrangements throughout the genome.
  • DeltaLake/0.15.1-gfbf-2023a Native Delta Lake Python binding based on delta-rs with Pandas integration. The Delta Lake project aims to unlock the power of the Deltalake for as many users and projects as possible by providing native low-level APIs aimed at developers and integrators, as well as a high-level operations API that lets you query, inspect, and operate your Delta Lake with ease.
  • DendroPy/5.0.1-GCCcore-13.2.0 A Python library for phylogenetics and phylogenetic computing: reading, writing, simulation, processing and manipulation of phylogenetic trees (phylogenies) and characters.
  • DiMSum/1.2.9-foss-2021b-R-4.2.0 An error model and pipeline for analyzing deep mutational scanning (DMS) data and diagnosing common experimental pathologies.
  • EIGENSOFT/7.2.1-foss-2019b The EIGENSOFT package combines functionality from our population genetics methods (Patterson et al. 2006) and our EIGENSTRAT stratification correction method (Price et al. 2006). The EIGENSTRAT method uses principal components analysis to explicitly model ancestry differences between cases and controls along continuous axes of variation; the resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The EIGENSOFT package has a built-in plotting script and supports multiple file formats and quantitative phenotypes.
  • EMAN2/2.3-foss-2019b-Python-2.7.16 EMAN2 is the successor to EMAN1. It is a broadly based greyscale scientific image processing suite with a primary focus on processing data from transmission electron microscopes.
  • EMBOSS/6.6.0-foss-2023b EMBOSS is ‘The European Molecular Biology Open Software Suite’. EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community.
  • EPACTS/3.3.2-foss-2020b EPACTS is a versatile software pipeline to perform various statistical tests for identifying genome-wide association from sequence data through a user-friendly interface, both to scientific analysts and to method developers.
  • Eigen/3.4.0-GCCcore-11.3.0 Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
  • Enrich2/1.3.1-foss-2020b-Python-2.7.18 Enrich2 is a general software tool for processing, analyzing, and visualizing data from deep mutational scanning experiments.
  • FASTX-Toolkit/0.0.14-GCCcore-8.3.0 The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
  • FLAIR/2.0-foss-2023a FLAIR (Full-Length Alternative Isoform analysis of RNA) for the correction, isoform definition, and alternative splicing analysis of noisy reads. FLAIR has primarily been used for nanopore cDNA, native RNA, and PacBio sequencing reads.
  • FLASH/2.2.00-foss-2022b FLASH (Fast Length Adjustment of SHort reads) is a very fast and accurate software tool to merge paired-end reads from next-generation sequencing experiments. FLASH is designed to merge pairs of reads when the original DNA fragments are shorter than twice the length of reads. The resulting longer reads can significantly improve genome assemblies. They can also improve transcriptome assembly when FLASH is used to merge RNA-seq data.
  • FastQC/0.12.1-Java-11 FastQC is a quality control application for high throughput sequence data. It reads in sequence data in a variety of formats and can either provide an interactive application to review the results of several different QC checks, or create an HTML based report which can be integrated into a pipeline.
  • FastTree/2.1.11-GCCcore-11.3.0 FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory.
  • FlashPCA2/2.0-GCC-10.2.0 FlashPCA performs fast principal component analysis (PCA) of single nucleotide polymorphism (SNP) data.
  • Flax/0.8.4-gfbf-2023a-CUDA-12.1.1 Flax is a high-performance neural network library and ecosystem for JAX that is designed for flexibility: Try new forms of training by forking an example and by modifying the training loop, not by adding features to a framework.
  • FreeTDS/1.3.6-GCCcore-11.2.0 FreeTDS is a set of libraries for Unix and Linux that allows your programs to natively talk to Microsoft SQL Server and Sybase databases.
  • GATK/4.4.0.0-GCCcore-12.2.0-Java-17 The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
  • GCTA/1.92.2beta GCTA (Genome-wide Complex Trait Analysis) was originally designed to estimate the proportion of phenotypic variance explained by all genome-wide SNPs for complex traits (the GREML method), and has subsequently been extended for many other analyses to better understand the genetic architecture of complex traits.
  • GDAL/3.10.0-foss-2024a GDAL is a translator library for raster geospatial data formats that is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation. As a library, it presents a single abstract data model to the calling application for all supported formats. It also comes with a variety of useful commandline utilities for data translation and processing.
  • GEOS/3.12.2-GCC-13.3.0 GEOS (Geometry Engine
  • GISTIC/2.0.23-GCCcore-8.3.0 GISTIC is a tool to identify genes targeted by somatic copy-number alterations (SCNAs) that drive cancer growth. By separating SCNA profiles into underlying arm-level and focal alterations, GISTIC estimates the background rates for each category as well as defines the boundaries of SCNA regions.
  • GMP/6.3.0-GCCcore-13.3.0 GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers.
  • GRIDSS/2.13.2-foss-2021b GRIDSS is a module software suite containing tools useful for the detection of genomic rearrangements. GRIDSS includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. GRIDSS calls variants based on alignment-guided positional de Bruijn graph genome-wide break-end assembly, split read, and read pair evidence.
  • GROMACS/2024.4-foss-2023b GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. This is a CPU only build, containing both MPI and threadMPI binaries for both single and double precision. It also contains the gmxapi extension for the single precision MPI build.
  • GSEA/4.3.2 Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
  • Garnett/20220903-foss-2021b-R-4.2.2 Garnett is a software package that facilitates automated cell type classification from single-cell expression data.
  • GenomeSTRiP/2.00.1958-GCCcore-8.3.0-Java-11 Genome STRiP (Genome STRucture In Populations) is a suite of tools for discovery and genotyping of structural variation using whole-genome sequencing data. The methods used in Genome STRiP are designed to find shared variation using data from multiple individuals. Genome STRiP looks both across and within a set of sequenced genomes to detect variation.
  • Globus-CLI/3.29.0-GCCcore-12.2.0 A Command Line Wrapper over the Globus SDK for Python, which provides an interface to Globus services from the shell, and is suited to both interactive and simple scripting use cases.
  • GoPeaks/1.0.0 GoPeaks is a peak caller designed for CUT&TAG/CUT&RUN sequencing data.
  • HDF/4.3.0-GCCcore-13.3.0 HDF (also known as HDF4) is a library and multi-object file format for storing and managing data between machines.
  • HDF5/1.14.5-gompi-2024a HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.
  • HH-suite/3.3.0-gompic-2020b The HH-suite is an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs).
  • HISAT2/2.2.1-gompi-2021b HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome).
  • HLA-HD/1.6.1-GCC-11.2.0 HLA-HD (HLA typing from High-quality Dictionary) can accurately determine HLA alleles with 6-digit precision from NGS data (fastq format). RNA-Seq data can also be applied.
  • HMMER/3.4-gompi-2023a HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs). Compared to BLAST, FASTA, and other sequence alignment and database search tools based on older scoring methodology, HMMER aims to be significantly more accurate and more able to detect remote homologs because of the strength of its underlying mathematical models. In the past, this strength came at significant computational expense, but in the new HMMER3 project, HMMER is now essentially as fast as BLAST.
  • HOME/1.0.0-foss-2019b-Python-3.7.4 HOME (histogram of methylation) is a python package for differential methylation region (DMR) identification. The method uses histogram of methylation features and the linear Support Vector Machine (SVM) to identify DMRs from whole genome bisulfite sequencing (WGBS) data.
  • HOMER/5.1-foss-2023a-R-4.3.2 HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis. It is a collection of command line programs for unix-style operating systems written in Perl and C++. HOMER was primarily written as a de novo motif discovery algorithm and is well suited for finding 8-20 bp motifs in large scale genomics data. HOMER contains many useful tools for analyzing ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and numerous other types of functional genomics sequencing data sets.
  • HTSeq/0.11.3-foss-2020b HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
  • HTSlib/1.21-GCC-13.3.0 A C library for reading/writing high-throughput sequencing data. This package includes the utilities bgzip and tabix
  • Hail/0.2.64-foss-2020b Hail is an open-source, general-purpose, Python-based data analysis tool with additional data types and methods for working with genomic data.
  • HiC-Pro/3.1.0-foss-2021b HiC-Pro was designed to process Hi-C data, from raw fastq files (paired-end Illumina data) to the normalized contact maps.
  • HyPhy/2.5.60-gompi-2022a HyPhy (Hypothesis Testing using Phylogenies) is an open-source software package for the analysis of genetic sequences (in particular the inference of natural selection) using techniques in phylogenetics, molecular evolution, and machine learning
  • IGV/2.15.4-Java-11 This package contains command line utilities for preprocessing, computing feature count density (coverage), sorting, and indexing data files.
  • IGVTools/2.4.16-Java-1.8 This package contains command line utilities for preprocessing, computing feature count density (coverage), sorting, and indexing data files. See also http://www.broadinstitute.org/software/igv/igvtools_commandline.
  • ISL/0.24-GCCcore-11.2.0 isl is a library for manipulating sets and relations of integer points bounded by linear constraints.
  • IgBLAST/1.22.0-x64-linux IgBLAST facilitates the analysis of immunoglobulin and T cell receptor variable domain sequences.
  • Infernal/1.1.4-foss-2021b Infernal (“INFERence of RNA ALignment”) is for searching DNA sequence databases for RNA structure and sequence similarities.
  • JAGS/4.3.2-foss-2022b JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation
  • Jansson/2.14-GCC-12.3.0 Jansson is a C library for encoding, decoding and manipulating JSON data. Its main features and design principles are: simple and intuitive API and data model; comprehensive documentation; no dependencies on other libraries; full Unicode support (UTF-8); extensive test suite.
  • Jellyfish/2.3.0-GCC-10.2.0 Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA.
  • Kalign/3.4.0-GCCcore-12.3.0 Kalign is a fast multiple sequence alignment program for biological sequences.
  • Kent_tools/20201201-linux.x86_64 Jim Kent’s tools: collection of tools used by the UCSC genome browser.
  • Keras/2.4.3-foss-2020b Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow.
  • Kraken2/2.1.3-gompi-2022b Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts by other bioinformatics software to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.
  • LAME/3.100-GCCcore-10.2.0 LAME is a high quality MPEG Audio Layer III (MP3) encoder licensed under the LGPL.
  • LUMPY/0.3.1-foss-2020b A probabilistic framework for structural variant discovery.
  • Levenshtein/0.25.1-GCCcore-13.2.0 Python extension for computing string edit distances and similarities.
  • LoomXpy/0.4.2-foss-2022b Python package (compatible with SCope) to create .loom files and extend them with other data e.g.: SCENIC regulons
  • MACHINA/1.2-GCC-13.2.0 MACHINA is a computational framework for inferring migration patterns between a primary tumor and metastases using DNA sequencing data.
  • MACS2/2.2.9.1-foss-2022b Model Based Analysis for ChIP-Seq data
  • MACS3/3.0.1-gfbf-2023a Model Based Analysis for ChIP-Seq data
  • MAESTRO/1.2.1-foss-2019b-Python-3.7.4 MAESTRO (Model-based AnalysEs of Single-cell Transcriptome and RegulOme) is a comprehensive single-cell RNA-seq and ATAC-seq analysis suite built using snakemake. MAESTRO combines several dozen tools and packages to create an integrative pipeline, which enables scRNA-seq and scATAC-seq analysis from raw sequencing data (fastq files) all the way through alignment, quality control, cell filtering, normalization, unsupervised clustering, differential expression and peak calling, celltype annotation and transcription regulation analysis.
  • MAFFT/7.526-GCC-13.2.0-with-extensions MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼30,000 sequences), etc.
  • MAGeCK/0.5.9.5-gfbf-2022b Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout (MAGeCK) is a computational tool to identify important genes from the recent genome-scale CRISPR-Cas9 knockout screens (or GeCKO) technology. MAGeCK is developed by Wei Li and Han Xu from Dr. Xiaole Shirley Liu’s lab at Dana-Farber Cancer Institute, and is being actively updated by Wei Li lab from Children’s National Medical Center.
  • MAGeCK-VISPR/0.5.5-Python-3.7.4 MAGeCK-VISPR is a comprehensive quality control, analysis and visualization workflow for CRISPR/Cas9 screens. The workflow combines the MAGeCK algorithm to identify essential genes from CRISPR/Cas9 screens considering multiple conditions with VISPR to interactively explore results and quality control in a web-based frontend.
  • MEGAHIT/1.2.9-GCCcore-13.3.0 An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
  • MEGAN/6.25.3-Java-17 MEGAN is a comprehensive toolbox for interactively analyzing microbiome data
  • MEME/5.5.1-gompi-2021b The MEME Suite allows you to: discover motifs using MEME, DREME (DNA only) or GLAM2 on groups of related DNA or protein sequences; search sequence databases with motifs using MAST, FIMO, MCAST or GLAM2SCAN; compare a motif to all motifs in a database of motifs; associate motifs with Gene Ontology terms via their putative target genes; and analyse motif enrichment using SpaMo or CentriMo.
  • METIS/5.1.0-GCCcore-12.2.0 METIS is a set of serial programs for partitioning graphs, partitioning finite element meshes, and producing fill reducing orderings for sparse matrices. The algorithms implemented in METIS are based on the multilevel recursive-bisection, multilevel k-way, and multi-constraint partitioning schemes.
  • MMseqs2/13-45111-gompi-2021b MMseqs2: ultra fast and sensitive search and clustering suite
  • MPC/1.3.1-GCCcore-12.2.0 GNU MPC is a C library for the arithmetic of complex numbers with arbitrarily high precision and correct rounding of the result. It extends the principles of the IEEE-754 standard for fixed precision real floating point numbers to complex numbers, providing well-defined semantics for every operation. At the same time, speed of operation at high precision is a major design goal.
  • MPFR/4.2.1-GCCcore-13.2.0 The MPFR library is a C library for multiple-precision floating-point computations with correct rounding.
  • MUMPS/5.6.1-foss-2022b-metis A parallel sparse direct solver
  • MUMmer/4.0.0rc1-GCCcore-12.3.0 MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. AMOS makes use of it.
  • MUSCLE/5.1.0-GCCcore-12.3.0 MUSCLE is one of the best-performing multiple alignment programs according to published benchmark tests, with accuracy and speed that are consistently better than CLUSTALW. MUSCLE can align hundreds of sequences in seconds. Most users learn everything they need to know about MUSCLE in a few minutes; only a handful of command-line options are needed to perform common alignment tasks.
  • Magic-BLAST/1.5.0-Linux_x86_64 Magic-BLAST is a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome.
  • MariaDB/11.7.0-GCC-13.3.0 MariaDB is an enhanced, drop-in replacement for MySQL. Included engines: MyISAM, Aria, InnoDB, RocksDB, TokuDB, OQGraph, Mroonga.
  • MathWorksServiceHost/2024.13.0.2 MathWorks Service Host is a collection of background processes that provide required services to MATLAB and other MathWorks products. Starting from MATLAB Release 2024a, MATLAB requires MathWorks Service Host.
  • MaxQuant/2.7.3.0 MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. It is specifically aimed at high-resolution MS data. Several labeling techniques as well as label-free quantification are supported.
  • Metal/2020-05-05-GCC-10.2.0 Metal
  • MiXCR/3.0.3-Java-1.8 MiXCR processes big immunome data from raw sequences to quantitated clonotypes
  • MinCED/0.4.2-GCCcore-9.3.0-Java-11 Mining CRISPRs in Environmental Datasets
  • Monocle3/1.3.1-foss-2021b-R-4.2.2 Single-cell transcriptome sequencing (sc-RNA-seq) experiments allow us to discover new cell types and help us understand how they arise in development. The Monocle 3 package provides a toolkit for analyzing single-cell gene expression experiments.
  • MoreRONN/4.9-foss-2019b MoreRONN is the spiritual successor of RONN and is useful for surveying disorder in proteins as well as designing expressible constructs for X-ray crystallography.
  • MotionCor2/1.4.2-gcccuda-2020b MotionCor2 corrects anisotropic image motion at the single pixel level across the whole frame, suitable for both single particle and tomographic images. Iterative, patch-based motion detection is combined with spatial and temporal constraints and dose weighting. Cite publication: Shawn Q. Zheng, Eugene Palovcak, Jean-Paul Armache, Yifan Cheng and David A. Agard (2016) Anisotropic Correction of Beam-induced Motion for Improved Single-particle Electron Cryo-microscopy, Nature Methods, submitted. bioRxiv: https://biorxiv.org/content/early/2016/07/04/061960
  • MrBayes/3.2.7a-gompi-2021b MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models.
  • MuSE/2.0.1-GCC-11.2.0 An accurate and ultra-fast somatic mutation calling tool for whole-genome sequencing (WGS) and whole-exome sequencing (WES) data from heterogeneous tumor samples.
  • MultiQC/1.21-foss-2023a Aggregate results from bioinformatics analyses across many samples into a single report. MultiQC searches a given directory for analysis logs and compiles an HTML report. It’s a general use tool, perfect for summarising the output from numerous bioinformatics tools.
  • MutSig/2 MutSig stands for “Mutation Significance”. MutSig analyzes lists of mutations discovered in DNA sequencing, to identify genes that were mutated more often than expected by chance given background mutation processes.
  • NAMD/2.14-foss-2020a-mpi NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems.
  • NGS/2.11.2-GCCcore-11.2.0 NGS is a new, domain-specific API for accessing reads, alignments and pileups produced from Next Generation Sequencing.
  • NextGenMap/0.5.5-GCC-11.2.0 NextGenMap is a flexible highly sensitive short read mapping tool that handles much higher mismatch rates than comparable algorithms while still outperforming them in terms of runtime.
  • NextPolish/1.4.1-GCC-13.3.0 NextDenovo is a string graph-based de novo assembler for long reads.
  • OpenMM/8.0.0-foss-2022a-CUDA-11.7.0 OpenMM is a toolkit for molecular simulation.
  • OptiType/1.3.5-foss-2019b-Python-2.7.16 OptiType is a novel HLA genotyping algorithm based on integer linear programming, capable of producing accurate 4-digit HLA genotyping predictions from NGS data by simultaneously selecting all major and minor HLA Class I alleles.
  • Osi/0.108.9-GCC-13.2.0 Osi (Open Solver Interface) provides an abstract base class to a generic linear programming (LP) solver, along with derived classes for specific solvers. Many applications may be able to use the Osi to insulate themselves from a specific LP solver. That is, programs written to the OSI standard may be linked to any solver with an OSI interface and should produce correct results. The OSI has been significantly extended compared to its first incarnation. Currently, the OSI supports linear programming solvers and has rudimentary support for integer programming.
  • PDBFixer/1.7-foss-2020b PDBFixer is an easy to use application for fixing problems in Protein Data Bank files in preparation for simulating them.
  • PEAR/0.9.11-GCC-11.3.0 PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger. It is fully parallelized and can run with as little as a few kilobytes of memory.
  • PHASE/2.1.2-GCCcore-8.3.0 PHASE is a program implementing the method for reconstructing haplotypes from population data
  • PLINK/2.00-alpha1-x86_64 Whole-genome association analysis toolset
  • PLINK2/20210826-linux_x86_64 PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. The focus of PLINK is purely on analysis of genotype/phenotype data.
  • PLUMED/2.7.2-foss-2020b PLUMED is an open source library for free energy calculations in molecular systems which works together with some of the most popular molecular dynamics engines. Free energy calculations can be performed as a function of many order parameters with a particular focus on biological problems, using state of the art methods such as metadynamics, umbrella sampling and Jarzynski-equation based steered MD. The software, written in C++, can be easily interfaced with both fortran and C/C++ codes.
  • ParMETIS/4.0.3-gompi-2022a ParMETIS is an MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs, meshes, and for computing fill-reducing orderings of sparse matrices. ParMETIS extends the functionality provided by METIS and includes routines that are especially suited for parallel AMR computations and large scale numerical simulations. The algorithms implemented in ParMETIS are based on the parallel multilevel k-way graph-partitioning, adaptive repartitioning, and parallel multi-constrained partitioning schemes.
  • PhyML/3.3.20220408-foss-2023a PhyML is a software package that uses modern statistical approaches to analyse alignments of nucleotide or amino acid sequences in a phylogenetic framework.
  • Porechop/0.2.4-GCCcore-12.3.0 Porechop is a tool for finding and removing adapters from Oxford Nanopore reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads. Porechop performs thorough alignments to effectively find adapters, even at low sequence identity
  • PostgreSQL/16.4-GCCcore-13.3.0 PostgreSQL is a powerful, open source object-relational database system. It is fully ACID compliant, has full support for foreign keys, joins, views, triggers, and stored procedures (in multiple languages). It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR, VARCHAR, DATE, INTERVAL, and TIMESTAMP. It also supports storage of binary large objects, including pictures, sounds, or video. It has native programming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl, ODBC, among others, and exceptional documentation.
  • ProteinMPNN/1.0.1-20230627-foss-2022a-CUDA-11.7.0 A deep learning based protein sequence design method is described that is widely applicable to current design challenges and shows outstanding performance in both in silico and experimental tests.
  • PyRosetta/4.387-gompi-2023a PyRosetta is an interactive Python-based interface to the powerful Rosetta molecular modeling suite. It enables users to design their own custom molecular modeling algorithms using Rosetta sampling methods and energy functions.
  • PyStan/3.5.0-foss-2021b Python interface to Stan, a package for Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo.
  • PyTables/3.9.2-foss-2023a PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data. PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast, yet extremely easy to use tool for interactively browsing, processing and searching very large amounts of data. One important feature of PyTables is that it optimizes memory and disk resources so that data takes much less space (especially if on-the-fly compression is used) than other solutions such as relational or object oriented databases.
  • PyTorch/2.1.2-foss-2022b Tensors and Dynamic neural networks in Python with strong GPU acceleration. PyTorch is a deep learning framework that puts Python first.
  • PyTorch-bundle/2.1.2-foss-2023a-CUDA-12.1.1 PyTorch with compatible versions of official Torch extensions.
  • Pyomo/6.4.4-foss-2021b Pyomo is a Python-based open-source software package that supports a diverse set of optimization capabilities for formulating and analyzing optimization models.
  • Pysam/0.22.1-GCC-13.3.0 Pysam is a python module for reading and manipulating Samfiles. It’s a lightweight wrapper of the samtools C-API. Pysam also includes an interface for tabix.
  • QIIME2/2020.11 QIIME is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data.
  • QUAST/5.1.0rc1-foss-2020b QUAST evaluates genome assemblies by computing various metrics. It works both with and without reference genomes. The tool accepts multiple assemblies, thus is suitable for comparison.
  • Qhull/2020.2-GCCcore-12.3.0 Qhull computes the convex hull, Delaunay triangulation, Voronoi diagram, halfspace intersection about a point, furthest-site Delaunay triangulation, and furthest-site Voronoi diagram. The source code runs in 2-d, 3-d, 4-d, and higher dimensions. Qhull implements the Quickhull algorithm for computing the convex hull.
  • R-bundle-Bioconductor/3.20-foss-2024a-R-4.4.2 Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data.
  • R-keras/2.2.5.0-foss-2019b-Python-3.7.4-R-3.6.2 Interface to ‘Keras’ https://keras.io, a high-level neural networks ‘API’.
  • RAxML/8.2.12-GCC-10.2.0-pthreads-avx2 RAxML search algorithm for maximum likelihood based inference of phylogenetic trees.
  • RAxML-NG/1.0.3-GCC-10.2.0 RAxML-NG is a phylogenetic tree inference tool which uses the maximum-likelihood (ML) optimality criterion. Its search heuristic is based on iteratively performing a series of Subtree Pruning and Regrafting (SPR) moves, which allows it to quickly navigate to the best-known ML tree.
  • RELION/3.1.2-fosscuda-2020b RELION (for REgularised LIkelihood OptimisatioN) is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy.
  • RFdiffusion/1.1.0-foss-2022a-CUDA-11.7.0 RFdiffusion is an open source method for structure generation, with or without conditional information (a motif, target etc). It can perform a whole range of protein design challenges as we have outlined in the RFdiffusion paper.
  • RMBlast/2.14.1-gompi-2023a RMBlast is a RepeatMasker compatible version of the standard NCBI BLAST suite. The primary difference between this distribution and the NCBI distribution is the addition of a new program ‘rmblastn’ for use with RepeatMasker and RepeatModeler.
  • RNA-SeQC/2.4.2-foss-2021b Fast, efficient RNA-Seq metrics for quality control and process optimization
  • ROSE/1-GCCcore-8.3.0-Python-2.7.16 To create stitched enhancers, and to separate super-enhancers from typical enhancers using sequencing data (.bam) given a file of previously identified constituent enhancers (.gff)
  • RPostgreSQL/0.7-6-foss-2023b Database interface and ‘PostgreSQL’ driver for ‘R’. This package provides a Database Interface ‘DBI’ compliant driver for ‘R’ to access ‘PostgreSQL’ database systems.
  • RSEM/1.3.3-foss-2019b RNA-Seq by Expectation-Maximization
  • RSeQC/5.0.1-foss-2021b RSeQC provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data. Some basic modules quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while RNA-seq specific modules evaluate sequencing saturation, mapped reads distribution, coverage uniformity, strand specificity, transcript level RNA integrity etc.
  • Racon/1.5.0-GCCcore-13.3.0 Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads.
  • Redis/7.0.12-GCC-12.2.0 Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. Redis provides data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes, and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.
  • Regenie/3.1.2-GCC-11.2.0 Regenie is a C++ program for whole genome regression modelling of large genome-wide association studies. It is developed and supported by a team of scientists at the Regeneron Genetics Center.
  • RevBayes/1.1.1-GCC-10.2.0 RevBayes provides an interactive environment for statistical computation in phylogenetics. It is primarily intended for modeling, simulation, and Bayesian inference in evolutionary biology, particularly phylogenetics.
  • SAMtools/1.21-GCC-13.3.0 SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
  • SCOTCH/7.0.3-gompi-2022b Software package and libraries for sequential and parallel graph partitioning, static mapping, and sparse matrix block ordering, and sequential mesh and hypergraph partitioning.
  • SHAPEIT4/4.2.2-foss-2020b SHAPEIT4 is a fast and accurate method for estimation of haplotypes (aka phasing) for SNP array and high coverage sequencing data.
  • SPAdes/4.1.0-GCC-13.3.0 Genome assembler for single-cell and isolates data sets
  • SQANTI3/1.0-foss-2019b-Python-3.7.4 SQANTI3 is the first module of the Functional IsoTranscriptomics (FIT) framework, that also includes IsoAnnot and tappAS. Used for new long read-defined transcriptome.
  • SRA-Toolkit/3.1.1-gompi-2023b The SRA Toolkit, and the source-code SRA System Development Kit (SDK), will allow you to programmatically access data housed within SRA and convert it from the SRA format
  • STAR/2.7.11b-GCC-13.2.0 STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays.
  • STAR-Fusion/1.12.0-foss-2022b STAR-Fusion uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads. STAR-Fusion further processes the output generated by the STAR aligner to map junction reads and spanning reads to a reference annotation set.
  • SVG/2.84-foss-2019b-Perl-5.30.0 Perl binding for SVG
  • SVIM/2.0.0-foss-2024a SVIM (pronounced swim) is a structural variant caller for third-generation sequencing reads. It is able to detect and classify the following six classes of structural variation: deletions, insertions, inversions, tandem duplications, interspersed duplications and translocations.
  • SVclone/1.1.2-foss-2021b Cluster structural variants of similar cancer cell fraction (CCF).
  • SYMPHONY/5.7.2-foss-2023b SYMPHONY is an open-source solver for mixed-integer linear programs (MILPs) written in C.
  • Salmon/1.10.1-GCC-12.2.0 Salmon is a wicked-fast program to produce highly accurate, transcript-level quantification estimates from RNA-seq data.
  • Sambamba/1.0.1-GCC-13.2.0 Sambamba is a high performance modern robust and fast tool (and library), written in the D programming language, for working with SAM and BAM files. Current functionality is an important subset of samtools functionality, including view, index, sort, markdup, and depth.
  • Seaborn/0.13.2-gfbf-2023b Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
  • SeqAn/2.4.0-GCCcore-8.3.0 SeqAn is an open source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data.
  • SeqKit/2.10.1 SeqKit
  • SeqLib/1.2.0-GCC-11.2.0 C++ interface to HTSlib, BWA-MEM and Fermi.
  • SeqPrep/1.3.2-GCCcore-8.3.0 Tool for stripping adaptors and/or merging paired reads with overlap into single reads.
  • Seurat/5.1.0-foss-2023b-R-4.4.0 Seurat is an R package designed for QC, analysis, and exploration of single cell RNA-seq data.
  • ShapeMapper2/2.3-GCC-13.3.0 ShapeMapper automates the calculation of RNA chemical probing reactivities from mutational profiling (MaP) experiments, in which chemical adducts on RNA are detected as internal mutations in cDNA through reverse transcription and read out by massively parallel sequencing.
  • SlamDunk/0.4.3-foss-2021b SlamDunk is a novel, fully automated software tool for automated, robust, scalable and reproducible SLAMseq data analysis.
  • Sniffles/2.6.2-gfbf-2024a A fast structural variant caller for long-read sequencing, Sniffles2 accurately detects SVs at the germline, somatic and population level for PacBio and Oxford Nanopore read data.
  • SoX/14.4.2-GCCcore-11.3.0 Sound eXchange, the Swiss Army knife of audio manipulation
  • SpaceRanger/2.0.0-GCC-11.2.0 Space Ranger is a set of analysis pipelines that process Visium spatial RNA-seq output and brightfield microscope images in order to detect tissue, align reads, generate feature-spot matrices, perform clustering and gene expression analysis, and place spots in spatial context on the slide image.
  • SparK/2.6.2-GCCcore-10.2.0 SparK
  • SpectrA/1.0.1-GCC-11.2.0 Spectra stands for Sparse Eigenvalue Computation Toolkit as a Redesigned ARPACK. It is a C++ library for large scale eigenvalue problems, built on top of Eigen, an open source linear algebra library.
  • Stacks/2.53-foss-2019b Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.
  • StringTie/2.2.3-GCC-12.3.0 StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts
  • Subread/2.0.3-GCC-11.2.0 High performance read alignment, quantification and mutation discovery
  • TGS-GapCloser/1.2.1-GCCcore-13.3.0 A gap-closing software tool that uses error-prone long reads generated by third-generation-sequence techniques (Pacbio, Oxford Nanopore, etc.) or preassembled contigs to fill N-gap in the genome assembly.
  • TOPAZ/0.2.4-foss-2020b Topaz is a pipeline for particle picking in cryo-electron micrographs using neural networks and positive-unlabeled learning
  • TRF/4.09.1-GCCcore-11.3.0 Tandem Repeats Finder: a program to analyze DNA sequences.
  • TRUST4/1.0.7-GCC-11.2.0 Tcr Receptor Utilities for Solid Tissue (TRUST) is a computational tool to analyze TCR and BCR sequences using unselected RNA sequencing data, profiled from solid tissues, including tumors. TRUST4 performs de novo assembly on V, J, C genes including the hypervariable complementarity-determining region 3 (CDR3) and reports consensus of BCR/TCR sequences. TRUST4 then realigns the contigs to IMGT reference gene sequences to report the corresponding information. TRUST4 supports both single-end and paired-end sequencing data with any read length.
  • TagDust/2.33-GCCcore-8.3.0 Raw sequences produced by next generation sequencing (NGS) machines may contain adapter, linker, barcode and fingerprint sequences. TagDust2 is a program to extract and correctly label the sequences to be mapped in downstream pipelines.
  • Telescope/1.0.3-gfbf-2022b Single locus resolution of Transposable ELEment expression using next-generation sequencing.
  • Theano/1.1.2-foss-2020b-PyMC Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
  • TopHat/2.1.1-foss-2019b TopHat is a fast splice junction mapper for RNA-Seq reads.
  • Tracer/1.7.1 Tracer is a program for analysing the trace files generated by Bayesian MCMC runs (that is, the continuous parameter values sampled from the chain). It can be used to analyse runs of BEAST, MrBayes, LAMARC and possibly other MCMC programs.
  • Trim_Galore/0.6.7-GCCcore-11.2.0 Trim Galore! is a wrapper script to automate quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions for RRBS sequence files (for directional, non-directional (or paired-end) sequencing).
  • Trimmomatic/0.39-Java-11 Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-ended data. The selection of trimming steps and their associated parameters are supplied on the command line.
  • Trinity/2.12.0-foss-2020b Trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads.
  • UMI-tools/1.1.4-foss-2023b Tools for handling Unique Molecular Identifiers in NGS data sets
  • Uni-Core/0.0.3-foss-2023a-CUDA-12.1.1 An efficient distributed PyTorch framework
  • VCFtools/0.1.16-GCC-11.2.0 The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.
  • VEP/113.3-GCC-13.3.0 Variant Effect Predictor (VEP) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. Includes EnsEMBL-XS, which provides pre-compiled replacements for frequently used routines in VEP.
  • VSEARCH/2.21.1-GCC-11.2.0 VSEARCH supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads.
  • VarScan/2.4.4-Java-11 Variant calling and somatic mutation/CNV detection for next-generation sequencing data
  • ViennaRNA/2.5.1-foss-2021b The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.
  • WhatsHap/2.2-foss-2023a WhatsHap is software for phasing genomic variants using DNA sequencing reads, also called read-based phasing or haplotype assembly. It is especially suitable for long reads, but also works well with short reads.
  • WiggleTools/1.2.4-GCC-8.3.0 The WiggleTools package allows genomewide data files to be manipulated as numerical functions, equipped with all the standard functional analysis operators (sum, product, product by a scalar, comparators), and derived statistics (mean, median, variance, stddev, t-test, Wilcoxon’s rank sum test, etc).
  • XML-Compile/1.63-GCCcore-11.2.0 Perl module for compilation based XML processing
  • XML-LibXML/2.0210-GCCcore-13.3.0 Perl binding for libxml2
  • XeniumRanger/4.0.0 The Xenium In Situ software suite is a set of software applications for analyzing and visualizing in situ gene expression data produced by the Xenium Analyzer. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.
  • alleleCount/4.2.1-GCC-11.2.0 The alleleCount package primarily exists to prevent code duplication between some other projects, specifically AscatNGS and Battenberg. As of v4 the perl code wraps the C implementation of allele counting code for BAM/CRAM processing.
  • ancestry/1.0.0-GCCcore-8.3.0-Python-2.7.16 Fast individual ancestry inference from DNA sequence data leveraging allele frequencies from multiple populations. iAdmix: using population allele frequencies for computing individual admixture estimates.
  • angsd/0.933-GCC-8.3.0 Program for analysing NGS data.
  • anndata/0.11.3-foss-2024a anndata is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray
  • annovar/20200607-GCCcore-11.2.0-Perl-5.34.0 ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, hg38, as well as mouse, worm, fly, yeast and many others).
  • arcasHLA/0.2.0-foss-2019b-Python-3.7.4 arcasHLA performs high resolution genotyping for HLA class I and class II genes from RNA sequencing, supporting both paired and single-end samples.
  • bam-readcount/1.0.1-GCC-12.2.0 Count DNA sequence reads in BAM files
  • bam2fastx/1.3.0 Conversion of PacBio BAM files into gzipped fasta and fastq files, including splitting of barcoded data
  • bam2wig/1.5 Conversion of a BAM alignment to wiggle and bigwig coverage files, with flexible reporting options.
  • basicfiltering/1.0.7-foss-2020a-Python-3.8.2 Basic filtering for variant allele frequency, variant reads, and tumor-normal variant allele frequency ratio.
  • bcl-convert/4.0.3-2el7.x86_64 The Illumina BCL Convert v4.0 is a standalone local software app that converts the Binary Base Call (BCL) files produced by Illumina sequencing systems to FASTQ files.
  • bcl2fastq2/2.20.0-foss-2019b bcl2fastq Conversion Software both demultiplexes data and converts BCL files generated by Illumina sequencing systems to standard FASTQ file formats for downstream analysis.
  • bgen/4.1.3-GCCcore-10.2.0 A BGEN file format reader. It fully supports the BGEN format specifications 1.2 and 1.3.
  • biom-format/2.1.16-foss-2023b The BIOM file format (canonically pronounced biome) is designed to be a general-use format for representing biological sample by observation contingency tables. BIOM is a recognized standard for the Earth Microbiome Project and is a Genomics Standards Consortium supported project.
  • bsddb3/6.2.9-GCCcore-10.2.0 bsddb3 is a nearly complete Python binding of the Oracle/Sleepycat C API for the Database Environment, Database, Cursor, Log Cursor, Sequence and Transaction objects.
  • bx-python/0.13.0-foss-2023b The bx-python project is a Python library and associated set of scripts to allow for rapid implementation of genome scale analyses.
  • cDNA_Cupcake/12.4.0-foss-2019b-Python-3.7.4 cDNA_Cupcake is a miscellaneous collection of Python and R scripts used for analyzing sequencing data.
  • cas-offinder/2.4.1-foss-2023b Cas-OFFinder is an OpenCL-based, ultrafast and versatile program that searches for potential off-target sites of CRISPR/Cas-derived RNA-guided endonucleases (RGEN).
  • cellranger/2.1.1 Single Cell Analysis Pipelines
  • cellsnp-lite/1.2.3-GCC-13.2.0 Cellsnp-lite is a C/C++ tool for efficient genotyping bi-allelic SNPs on single cells. You can use cellsnp-lite after read alignment to obtain the snp x cell pileup UMI or read count matrices for each allele of given or detected SNPs.
  • clusTCR/1.0.2-foss-2019b-Python-3.7.4 Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity.
  • cooltools/0.7.1-foss-2024a cooltools provides a suite of computational tools with a paired python API and command line access, which facilitates workflows either on high-performance computing clusters or via custom analysis notebooks. As part of the Open2C ecosystem, cooltools also provides detailed introductions to key concepts in Hi-C-data analysis with interactive notebook documentation.
  • cromwell/87 Scientific workflow engine designed for simplicity & scalability.
  • ctffind/4.1.14-fosscuda-2020b Program for finding CTFs of electron micrographs.
  • cutadapt/5.0-GCCcore-13.2.0 Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
  • cuteSV/2.1.2-foss-2024a cuteSV uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to analyze the signatures to implement sensitive SV detection.
  • dask/2024.9.1-gfbf-2024a Dask natively scales Python. Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love.
  • datamash/1.8-GCCcore-11.2.0 GNU datamash performs basic numeric, textual and statistical operations on input data files
  • deepTools/3.5.4.post1-gfbf-2022b deepTools is a suite of python tools particularly developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq.
  • delly/0.8.3 DELLY2: Structural variant discovery by integrated paired-end and split-read analysis
  • dill/0.3.9-GCCcore-13.3.0 dill extends Python’s pickle module for serializing and de-serializing Python objects to the majority of the built-in Python types. Serialization is the process of converting an object to a byte stream, and the inverse of which is converting a byte stream back to a Python object hierarchy.
  • dms_tools2/2.6.11-foss-2020b dms_tools2 is a software package for analyzing deep mutational scanning data. It is tailored to analyze libraries created using comprehensive codon mutagenesis of protein-coding of genes.
  • dorado/0.8.0-foss-2023a-CUDA-12.1.1 Dorado is a high-performance, easy-to-use, open source basecaller for Oxford Nanopore reads.
  • e3nn/0.3.3-foss-2022a-CUDA-11.7.0 Euclidean neural networks (e3nn) is a python library based on pytorch to create equivariant neural networks for the group O(3).
  • easel/0.48-GCC-12.2.0 Easel supports computational analysis of biological sequences using probabilistic models.
  • edlib/1.3.9.post1-GCC-13.3.0 Lightweight, super fast library for sequence alignment using edit (Levenshtein) distance.
  • eggnog-mapper/2.1.7-foss-2021b EggNOG-mapper is a tool for fast functional annotation of novel sequences. It uses precomputed orthologous groups and phylogenies from the eggNOG database (http://eggnog5.embl.de) to transfer functional information from fine-grained orthologs only. Common uses of eggNOG-mapper include the annotation of novel genomes, transcriptomes or even metagenomic gene catalogs.
  • einops/0.8.0-GCCcore-13.2.0 Flexible and powerful tensor operations for readable and reliable code. Supports numpy, pytorch, tensorflow, jax, and others.
  • epiScanpy/0.4.0-foss-2023a EpiScanpy is a toolkit to analyse single-cell open chromatin (scATAC-seq) and single-cell DNA methylation (for example scBS-seq) data. EpiScanpy is the epigenomic extension of the very popular scRNA-seq analysis tool Scanpy (Genome Biology, 2018) [Wolf18].
  • factera/1.4.4-foss-2019b-Perl-5.30.0 FACTERA (Fusion And Chromosomal Translocation Enumeration and Recovery Algorithm) is a tool for detection of genomic fusions in paired-end targeted (or genome-wide) sequencing data.
  • faiss/1.7.3-foss-2021b-CUDA-11.4.1 FAISS (Facebook AI Similarity Search) is a library that allows developers to quickly search for embeddings of multimedia documents that are similar to each other.
  • fastNGSadmix/dda93a4-GCC-10.2.0 Program for inferring admixture proportions and doing PCA with a single NGS sample. Inferences are based on a reference panel.
  • fastp/0.23.4-GCC-13.2.0 A tool designed to provide fast all-in-one preprocessing for FastQ files. This tool is developed in C++ with multithreading supported to afford high performance.
  • fastq-tools/0.8.3-GCC-11.2.0 This package provides a number of small and efficient programs to perform common tasks with high throughput sequencing data in the FASTQ format. All of the programs work with typical FASTQ files as well as gzipped FASTQ files.
  • fermi-lite/20190320-GCCcore-11.2.0 Standalone C library for assembling Illumina short reads in small regions.
  • fgbio/2.0.2 A set of tools to analyze genomic data with a focus on Next Generation Sequencing.
  • fhR/4.4.0-foss-2023b R is a free software environment for statistical computing and graphics.
  • fhSeurat/4.1.1-foss-2021b-R-4.2.0 Seurat is an R package designed for QC, analysis, and exploration of single cell RNA-seq data. fhSeurat module has additional Bioconductor packages for single-cell analysis.
  • freebayes/1.3.2-GCCcore-8.3.0 Bayesian haplotype-based polymorphism discovery and genotyping.
  • gffread/0.12.7-GCCcore-12.3.0 GFF/GTF parsing utility providing format conversions, region filtering, FASTA sequence extraction and more.
  • ggVennDiagram/3484e8-foss-2019b-R-4.0.2 A set of functions to generate high-resolution Venn and Euler plots. Includes handling for several special cases, including two-case scaling, and extensive customization of plot shape and structure.
  • giggle/master-foss-2020b GIGGLE is a genomics search engine that identifies and ranks the significance of shared genomic loci between query features and thousands of genome interval files.
  • gmpy2/2.1.5-GCC-12.3.0 GMP/MPIR, MPFR, and MPC interface to Python 2.6+ and 3.x
  • h5netcdf/1.2.0-foss-2023a A Python interface for the netCDF4 file-format that reads and writes local or remote HDF5 files directly via h5py or h5pyd, without relying on the Unidata netCDF library.
  • h5py/3.12.1-foss-2024a HDF5 for Python (h5py) is a general-purpose Python interface to the Hierarchical Data Format library, version 5. HDF5 is a versatile, mature scientific software library designed for the fast, flexible storage of enormous amounts of data.
  • hyperfreq/1.2.0-foss-2020b Hypermutation analysis software using BetaRat distribution for Bayesian analysis of the relative probability ratio (RPR) of observing mutations in two contexts. Includes Alnclst, for clustering pre-aligned nucleotide sequences.
  • iVar/1.3.2-GCC-11.2.0 iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing.
  • infercnvpy/0.4.3-foss-2023a Infer copy number variation (CNV) from scRNA-seq data. Plays nicely with Scanpy.
  • interop/1.1.10-foss-2019b-Python-3.7.4 The Illumina InterOp libraries are a set of common routines used for reading InterOp metric files produced by Illumina sequencers including NextSeq 1k/2k. These libraries are backwards compatible and capable of supporting prior releases of the software, with one exception: GA systems have been excluded.
  • intervene/0.6.4-foss-2019b-Python-3.7.4 Intervene is a tool for intersection and visualization of multiple genomic region sets
  • itpp/4.3.1-foss-2019b IT++ is a C++ library of mathematical, signal processing and communication classes and functions. Its main use is in simulation of communication systems and for performing research in the area of communications.
  • jax/0.4.25-gfbf-2023a-CUDA-12.1.1 Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
  • kallisto/0.50.1-foss-2022b kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
  • king/2.2.5 KING is a toolset that makes use of high-throughput SNP data typically seen in a genome-wide association study (GWAS) or a sequencing project. Applications of KING include family relationship inference and pedigree error checking, quality control, population substructure identification, forensics, gene mapping, etc.
  • kneaddata/0.12.0-foss-2022a KneadData is a tool designed to perform quality control on metagenomic and metatranscriptomic sequencing data, especially data from microbiome experiments.
  • leidenalg/0.10.1-foss-2022b Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
  • libStatGen/1.0.15-GCCcore-10.2.0 Useful set of classes for creating statistical genetic programs.
  • libcerf/2.4-GCC-13.2.0 libcerf is a self-contained numeric library that provides an efficient and accurate implementation of complex error functions, along with Dawson, Faddeeva, and Voigt functions.
  • libgtextutils/0.7-GCCcore-8.3.0 libgtextutils is a dependency of fastx-toolkit and is provided via the same upstream
  • loompy/3.0.8-foss-2024a Python implementation of the Loom file format, an efficient file format for large omics datasets
  • lpsolve/5.5.2.11-GCC-10.2.0 Mixed Integer Linear Programming (MILP) solver
  • magma/2.7.2-foss-2023a-CUDA-12.1.1 The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures, starting with current Multicore+GPU systems.
  • manta/1.6.0 Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs. Manta discovers, assembles and scores large-scale SVs, medium-sized indels and large insertions within a single efficient workflow.
  • medaka/1.2.3-foss-2019b-Python-3.7.4 medaka is a tool to create a consensus sequence from nanopore sequencing data.
  • minimap2/2.29-GCCcore-13.3.0 Minimap2 is a fast sequence mapping and alignment program that can find overlaps between long noisy reads, or map long reads or their assemblies to a reference genome optionally with detailed alignment (i.e. CIGAR). At present, it works efficiently with query sequences from a few kilobases to ~100 megabases in length at an error rate of ~15%. Minimap2 outputs in the PAF or the SAM format. On limited test data sets, minimap2 is over 20 times faster than most other long-read aligners. It will replace BWA-MEM for long reads and contig alignment.
  • monocle3/0.2.2-foss-2019b-R-4.0.2 Single-cell transcriptome sequencing (sc-RNA-seq) experiments allow us to discover new cell types and help us understand how they arise in development. The Monocle 3 package provides a toolkit for analyzing single-cell gene expression experiments.
  • mpath/1.1.3-GCCcore-11.3.0 For now it’s quite simple: the get_path_info() method returns information about a given path, which can be either a directory or a file path.
  • mrcfile/1.3.0-fosscuda-2020b mrcfile is a Python implementation of the MRC2014 file format, which is used in structural biology to store image and volume data. It allows MRC files to be created and opened easily using a very simple API, which exposes the file’s header and data as numpy arrays. The code runs in Python 2 and 3 and is fully unit-tested. This library aims to allow users and developers to read and write standard-compliant MRC files in Python as easily as possible, and with no dependencies on any compiled libraries except numpy. You can use it interactively to inspect files, correct headers and so on, or in scripts and larger software packages to provide basic MRC file I/O functions.
  • msisensor-pro/1.3.0-GCC-13.3.0 MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next generation sequencing data. It accepts the whole genome sequencing, whole exome sequencing and target region (panel) sequencing data as input.
  • ncbi-vdb/3.1.1-gompi-2023b The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
  • ncdf4/1.17-foss-2019b ncdf4: Interface to Unidata netCDF (version 4 or earlier) format data files
  • netCDF/4.9.2-gompi-2023b NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.
  • netMHCpan/4.1b The NetMHCpan software predicts binding of peptides to any known MHC molecule using artificial neural networks (ANNs).
  • netcdf4-python/1.6.4-foss-2023a Python/numpy interface to netCDF.
  • nf-core/3.3.1-foss-2024a Python package with helper tools for the nf-core community.
  • nullarbor/2.0.20191013 Pipeline to generate complete public health microbiology reports from sequenced isolates
  • numexpr/2.7.1-foss-2019b-Python-2.7.16 The numexpr package evaluates multiple-operator array expressions many times faster than NumPy can. It accepts the expression as a string, analyzes it, rewrites it more efficiently, and compiles it on the fly into code for its internal virtual machine (VM). Due to its integrated just-in-time (JIT) compiler, it does not require a compiler at runtime.
  • oncosnpseq/2.01 OncoSNP-SEQ is an analytical tool for characterising copy number alterations and loss-of-heterozygosity (LOH) events in cancer samples from whole genome sequencing data.
  • ont-guppy-cpu/2.3.7 Guppy is a production basecaller provided by Oxford Nanopore, and uses a command-line interface.
  • openpyxl/3.1.2-GCCcore-12.3.0 A Python library to read/write Excel 2010 xlsx/xlsm files
  • packmol/20.2.2-GCC-10.2.0 Packing Optimization for Molecular Dynamics Simulations
  • parallel-fastq-dump/0.6.7-gompi-2020b parallel fastq-dump wrapper
  • pblat/2.5.1-GCC-11.2.0 Parallel blat based on Jim Kent’s blat
  • philosopher/3.3.11 Philosopher provides easy access to third-party tools and custom algorithms allowing users to develop proteomics analysis, from Peptide Spectrum Matching to annotated protein reports. Philosopher is also tuned for Open Search analysis, providing a modified version of the prophets for peptide validation and protein inference. To date, Philosopher is the only proteomics toolkit that allows you to process and analyze closed and open search results.
  • picard/2.25.1-Java-11 A set of tools (in Java) for working with next generation sequencing data in the BAM format.
  • pipseeker/3.2.0 PIPseeker™ analyzes single-cell data obtained with Fluent BioSciences’ proprietary PIPseq™ 3ʹ Single Cell RNA (scRNA-seq) Kits.
  • plink/1.9-20200616 Whole-genome association analysis toolset
  • plinkliftover/0.3.0-foss-2022b PLINKLiftOver is a utility enabling liftOver to work on genomics files from PLINK, allowing one to update the coordinates from one genome reference version to another.
  • polars/0.20.2-gfbf-2023a Polars is a blazingly fast DataFrame library for manipulating structured data. The core is written in Rust and this module provides its interface for Python.
  • popscle/0.1-beta-foss-2019b A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools
  • pplacer/1.1.alpha19 Pplacer places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment. Pplacer is designed to be fast, to give useful information about uncertainty, and to offer advanced visualization and downstream analysis.
  • prodigal/2.6.3-GCCcore-11.2.0 Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program developed at Oak Ridge National Laboratory and the University of Tennessee.
  • prokka/1.14.5-gompi-2020b Prokka is a software tool for the rapid annotation of prokaryotic genomes.
  • pugixml/1.14-GCCcore-12.3.0 pugixml is a light-weight C++ XML processing library
  • pyBigWig/0.3.23-gfbf-2023b A python extension, written in C, for quick access to bigBed files and access to and creation of bigWig files.
  • pyEGA3/5.0.2-GCCcore-12.3.0 A basic Python-based EGA download client
  • pyGenomeTracks/3.5-foss-2019b-Python-3.7.4 pyGenomeTracks aims to produce high-quality genome browser tracks that are highly customizable.
  • pySCENIC/20250316-foss-2022b pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
  • pybedtools/0.9.1-foss-2023a pybedtools wraps and extends BEDTools and offers feature-level manipulations from within Python.
  • pycisTarget/1.0.2-foss-2022b pycistarget is a python module to perform motif enrichment analysis in sets of regions with different tools and identify high confidence TF cistromes.
  • pyclone/0.13.1-foss-2019b-Python-2.7.16 PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination.
  • pyclone-vi/0.1.0-foss-2019b-Python-3.7.4 PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination.
  • pyfaidx/0.8.1.2-GCCcore-13.3.0 pyfaidx: efficient pythonic random access to fasta subsequences
  • pyrovelocity/0.4.0-beta.4-foss-2023a-CUDA-12.1.1 pyrovelocity is a library for probabilistic inference in minimal models approximating gene expression dynamics from, possibly multimodal, single-cell sequencing data. It provides posterior estimates of gene expression parameters, predictive estimates of gene expression states, and local estimates of cell state transition probabilities.
  • rMATS-turbo/4.1.2-foss-2021b rMATS turbo is the C/Cython version of rMATS (refer to http://rnaseq-mats.sourceforge.net).
  • redis-py/4.6.0-foss-2022b The Python interface to the Redis key-value store.
  • samblaster/0.1.26-GCC-10.2.0 samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files.
  • savvy/2.0.1-GCC-10.2.0 Interface to various variant calling formats.
  • scArches/0.6.1-foss-2023a Single-cell architecture surgery (scArches) is a package for reference-based analysis of single-cell data.
  • scCODA/0.1.9-foss-2023a scCODA allows for identification of compositional changes in high-throughput sequencing count data, especially cell compositions from scRNA-seq.
  • scGPT/0.2.1-foss-2023a-CUDA-12.1.1 scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI.
  • scVelo/0.3.1-foss-2023a scVelo is a scalable toolkit for estimating and analyzing RNA velocities in single cells using dynamical modeling.
  • scanpy/1.10.4-foss-2024a Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
  • scenicplus/1.0.0-foss-2022b SCENIC+ is a python package to build enhancer driven gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
  • scib/1.1.4-foss-2023a Benchmarking atlas-level data integration in single-cell genomics.
  • scib-metrics/0.5.1-foss-2023a Accelerated and Python-only metrics for benchmarking single-cell integration outputs
  • scikit-bio/0.6.0-foss-2023a scikit-bio is an open-source, BSD-licensed Python 3 package providing data structures, algorithms and educational resources for bioinformatics.
  • scikit-learn/1.6.1-gfbf-2024a Scikit-learn integrates machine learning algorithms in the tightly-knit scientific Python world, building upon numpy, scipy, and matplotlib. As a machine-learning module, it provides versatile tools for data mining and analysis in any field of science and engineering. It strives to be simple and efficient, accessible to everybody, and reusable in various contexts.
  • scipy/1.4.1-foss-2019b-Python-3.7.4 SciPy is a collection of mathematical algorithms and convenience functions built on the Numpy extension for Python.
  • scrublet/0.2.3-foss-2021b Python code for identifying doublets in single-cell RNA-seq data
  • scvi-tools/1.1.2-foss-2023a scvi-tools (single-cell variational inference tools) is a package for probabilistic modeling and analysis of single-cell omics data, built on top of PyTorch and AnnData.
  • seq2HLA/2.3-foss-2019b-Python-2.7.16 In-silico method written in Python and R to determine HLA genotypes of a sample. seq2HLA takes standard RNA-Seq sequence reads in fastq format as input, uses a bowtie index comprising all HLA alleles and outputs the most likely HLA class I and class II genotypes (in 4 digit resolution), a p-value for each call, and the expression of each class.
  • seqtk/1.3-GCC-10.2.0 Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.
  • seqtools/4.44.1-foss-2019b The SeqTools package contains three tools for visualising sequence alignments: Blixem, Dotter and Belvu.
  • sequenza-utils/3.0.0-GCCcore-8.3.0-Python-3.7.4 Sequenza is software for the estimation and quantification of purity/ploidy and copy number alteration in sequencing experiments of tumor samples. Sequenza-utils provides command-line programs to transform common NGS file formats.
  • shrinkwrap/1.1.0-GCCcore-10.2.0 A std::streambuf wrapper for compression formats.
  • snippy/4.6.0-foss-2019b-Perl-5.30.0 Snippy finds SNPs between a haploid reference genome and your NGS sequence reads. It will find both substitutions (snps) and insertions/deletions (indels). Rapid haploid variant calling and core genome alignment.
  • soap-hla/1.0.0-GCCcore-12.2.0 SOAP-HLA is a sequencing data analysis pipeline that types all of the HLA genes in the IMGT/HLA database with high accuracy, using capture sequencing data or WGS data.
  • spams/2.6.5.4-foss-2021b SPAMS (SPArse Modeling Software) is an optimization toolbox for solving various sparse estimation problems.
  • spglib-python/1.16.0-fosscuda-2020b Spglib for Python. Spglib is a library for finding and handling crystal symmetries written in C.
  • splitpipe/1.3.1-foss-2023b splitpipe tool from Parse Biosciences. The pipeline takes FASTQ files and delivers processed data in the form of a cell-gene count matrix, which serves as the input for various open-source tools such as scanpy and Seurat. Process sequencing results with our pipeline. A Parse Biosciences login ID is required to download.
  • spoa/4.1.0-GCC-13.3.0 Spoa (SIMD POA) is a C++ implementation of the partial order alignment (POA) algorithm which is used to generate consensus sequences
  • starcode/1.4-GCC-11.2.0 Starcode is a DNA sequence clustering software. Starcode clustering is based on all pairs search within a specified Levenshtein distance (allowing insertions and deletions), followed by a clustering algorithm: Message Passing, Spheres or Connected Components.
  • statsmodels/0.14.4-gfbf-2024a Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests.
  • svaba/1.3.0-GCC-13.3.0 SvABA
  • sympy/1.12-gfbf-2023a SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python and does not require any external libraries.
  • tabix/0.2.6-GCCcore-8.3.0 Generic indexer for TAB-delimited genome position files
  • tbl2asn/20220427-linux64 Tbl2asn is a command-line program that automates the creation of sequence records for submission to GenBank
  • unixODBC/2.3.12-GCC-13.2.0 unixODBC provides a uniform interface between application and database driver
  • vcflib/1.0.1-GCCcore-8.3.0 vcflib provides methods to manipulate and interpret sequence variation as it can be described by VCF. The Variant Call Format (VCF) is a flat-file, tab-delimited textual format intended to concisely describe reference-indexed genetic variations between individuals.
  • velocyto.R/0.6-foss-2019b-R-4.0.2 velocyto (velox + κύτος, quick cell) is a package for the analysis of expression dynamics in single cell RNA seq data. In particular, it enables estimations of RNA velocities of single cells by distinguishing unspliced and spliced mRNAs in standard single-cell RNA sequencing protocols.
  • vt/0.57721-foss-2019b A tool set for short variant discovery in genetic sequence data.
  • wandb/0.16.1-GCC-12.3.0 CLI and Python API for Weights and Biases (wandb), a tool for visualizing and tracking your machine learning experiments.
  • xarray/2023.9.0-gfbf-2023a xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures.
  • zarr/2.18.4-foss-2024a Zarr is a Python package providing an implementation of compressed, chunked, N-dimensional arrays, designed for use in parallel computing.