Bio Modules 24.04

AGAT/1.4.0-GCC-12.3.0 AGAT: Another GTF/GFF Analysis Toolkit. Suite of tools to handle gene annotations in any GTF/GFF format.
AlphaFold/2.3.2-foss-2023a-CUDA-12.1.1 AlphaFold can predict protein structures with atomic accuracy even where no similar structure is known
Arriba/2.4.0-GCC-12.2.0 Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data. It was developed for the use in a clinical research setting. Therefore, short runtimes and high sensitivity were important design criteria.
BAli-Phy/4.0-beta8-gfbf-2022b easyconfig BAli-Phy estimates multiple sequence alignments and evolutionary trees from DNA, amino acid, or codon sequences.
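BCFtools/1.19-GCC-13.2.0 easyconfig Samtools is a suite of programs for interacting with high-throughput sequencing data. BCFtools reads and writes BCF2/VCF/gVCF files and calls, filters and summarises SNP and short indel sequence variants.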
BEDTools/2.31.0-GCC-12.3.0 BEDTools: a powerful toolset for genome arithmetic. The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM.
BLAST+/2.14.0-gompi-2022b Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
BWA/0.7.17-GCCcore-12.2.0 Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome.
BamTools/2.5.2-GCC-12.3.0 BamTools provides both a programmer’s API and an end-user’s toolkit for handling BAM files.
Beast/10.5.0-beta3-GCC-12.3.0-beagle-lib-4.0.1-CUDA-12.1.1 BEAST is a cross-platform program for Bayesian analysis of molecular sequences using MCMC. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability.
Beast2/2.7.7-GCC-12.3.0-beagle-lib-4.0.1-CUDA-12.1.1 easyconfig BEAST is a cross-platform program for Bayesian MCMC analysis of molecular sequences. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability.
Bio-DB-HTS/3.01-GCC-12.2.0 easyconfig Read files using HTSlib including BAM/CRAM, Tabix and BCF database files
BioPerl/1.7.8-GCCcore-12.3.0 Bioperl is the product of a community effort to produce Perl code which is useful in biology. Examples include Sequence objects, Alignment objects and database searching objects.
Biopython/1.84-foss-2023b Biopython is a set of freely available tools for biological computation written in Python by an international team of developers. It is a distributed collaborative effort to develop Python libraries and applications which address the needs of current and future work in bioinformatics.
Bismark/0.24.1-GCC-12.2.0 easyconfig A tool to map bisulfite converted sequence reads and determine cytosine methylation states
Bowtie2/2.5.4-GCC-13.2.0 easyconfig Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
CITE-seq-Count/1.4.4-foss-2023b-Python-3.11.5 easyconfig A Python package for counting antibody TAGS from a CITE-seq and/or cell hashing experiment.
CRISPResso2/2.3.1-foss-2023b easyconfig CRISPResso2 is a software pipeline designed to enable rapid and intuitive interpretation of genome editing experiments.
Cassiopeia/2.0.0-foss-2023a A Package for Cas9-Enabled Single Cell Lineage Tracing Tree Reconstruction.
Cbc/2.10.11-foss-2023a Cbc (Coin-or branch and cut) is an open-source mixed integer linear programming solver written in C++. It can be used as a callable library or using a stand-alone executable.
CellBender/0.3.0-foss-2023a CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
CellRank/2.0.2-foss-2023a-CUDA-12.1.1 CellRank is a toolkit to uncover cellular dynamics based on Markov state modeling of single-cell data. It contains two main modules: kernels compute cell-cell transition probabilities, and estimators generate hypotheses based on these.
Cgl/0.60.8-foss-2023b The COIN-OR Cut Generation Library (Cgl) is a collection of cut generators that can be used with other COIN-OR packages that make use of cuts, such as, among others, the linear solver Clp or the mixed integer linear programming solvers Cbc or BCP. Cgl uses the abstract class OsiSolverInterface (see Osi) to use or communicate with a solver. It does not directly call a solver.
Clp/1.17.9-foss-2023b easyconfig Clp (Coin-or linear programming) is an open-source linear programming solver. It is primarily meant to be used as a callable library, but a basic, stand-alone executable version is also available.
CoinUtils/2.11.10-GCC-12.3.0 CoinUtils (Coin-OR Utilities) is an open-source collection of classes and functions that are generally useful to more than one COIN-OR project.
Eigen/3.4.0-GCCcore-13.3.0 Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
FASTA/36.3.8i-GCC-12.2.0 The FASTA programs find regions of local or global (new) similarity between protein or DNA sequences, either by searching Protein or DNA databases, or by identifying local duplications within a sequence.
FastQC/0.12.1-Java-11 FastQC is a quality control application for high throughput sequence data. It reads in sequence data in a variety of formats and can either provide an interactive application to review the results of several different QC checks, or create an HTML based report which can be integrated into a pipeline.
Flax/0.8.4-gfbf-2023a-CUDA-12.1.1 Flax is a high-performance neural network library and ecosystem for JAX that is designed for flexibility: Try new forms of training by forking an example and by modifying the training loop, not by adding features to a framework.
GATK/4.4.0.0-GCCcore-12.2.0-Java-17 The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
GEOS/3.12.1-GCC-13.2.0 easyconfig GEOS (Geometry Engine - Open Source) is a C++ port of the Java Topology Suite (JTS).
GMP/6.3.0-GCCcore-13.2.0 GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers.
HH-suite/3.3.0-gompi-2023a The HH-suite is an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs).
HMMER/3.4-gompi-2024a HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs). Compared to BLAST, FASTA, and other sequence alignment and database search tools based on older scoring methodology, HMMER aims to be significantly more accurate and more able to detect remote homologs because of the strength of its underlying mathematical models. In the past, this strength came at significant computational expense, but in the new HMMER3 project, HMMER is now essentially as fast as BLAST.
HTSlib/1.19.1-GCC-13.2.0 easyconfig A C library for reading/writing high-throughput sequencing data. This package includes the utilities bgzip and tabix
IgBLAST/1.22.0-x64-linux easyconfig IgBLAST facilitates the analysis of immunoglobulin and T cell receptor variable domain sequences.
InChI/1.07.1-GCC-13.3.0 The IUPAC International Chemical Identifier (InChI TM) is a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources thus enabling easier linking of diverse data compilations.
Infernal/1.1.4-foss-2022b Infernal (“INFERence of RNA ALignment”) is for searching DNA sequence databases for RNA structure and sequence similarities.
JAGS/4.3.2-foss-2023b easyconfig JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation
Kalign/3.4.0-GCCcore-12.3.0 Kalign is a fast multiple sequence alignment program for biological sequences.
Kent_tools/468-GCC-12.3.0 easyconfig Kent utilities: collection of tools used by the UCSC genome browser.
Kraken2/2.1.3-gompi-2022b easyconfig Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts by other bioinformatics software to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.
MACS2/2.2.9.1-foss-2022b easyconfig Model Based Analysis for ChIP-Seq data
MACS3/3.0.0-foss-2022b easyconfig Model Based Analysis for ChIP-Seq data
METIS/5.1.0-GCCcore-12.3.0 METIS is a set of serial programs for partitioning graphs, partitioning finite element meshes, and producing fill reducing orderings for sparse matrices. The algorithms implemented in METIS are based on the multilevel recursive-bisection, multilevel k-way, and multi-constraint partitioning schemes.
MPC/1.3.1-GCCcore-13.2.0 Gnu Mpc is a C library for the arithmetic of complex numbers with arbitrarily high precision and correct rounding of the result. It extends the principles of the IEEE-754 standard for fixed precision real floating point numbers to complex numbers, providing well-defined semantics for every operation. At the same time, speed of operation at high precision is a major design goal.
MPFR/4.2.1-GCCcore-13.2.0 The MPFR library is a C library for multiple-precision floating-point computations with correct rounding.
MUMPS/5.6.1-foss-2022b-metis A parallel sparse direct solver
MUMmer/4.0.0rc1-GCCcore-12.3.0 MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. AMOS makes use of it.
MUSCLE/5.1.0-GCCcore-12.3.0 easyconfig MUSCLE is one of the best-performing multiple alignment programs according to published benchmark tests, with accuracy and speed that are consistently better than CLUSTALW. MUSCLE can align hundreds of sequences in seconds. Most users learn everything they need to know about MUSCLE in a few minutes: only a handful of command-line options are needed to perform common alignment tasks.
MultiQC/1.21-foss-2023a easyconfig Aggregate results from bioinformatics analyses across many samples into a single report.

MultiQC searches a given directory for analysis logs and compiles an HTML report. It’s a general-use tool, perfect for summarising the output from numerous bioinformatics tools.

OpenMM/8.0.0-foss-2023a-CUDA-12.1.1 easyconfig OpenMM is a toolkit for molecular simulation.
Osi/0.108.9-GCC-13.2.0 easyconfig Osi (Open Solver Interface) provides an abstract base class to a generic linear programming (LP) solver, along with derived classes for specific solvers. Many applications may be able to use the Osi to insulate themselves from a specific LP solver. That is, programs written to the OSI standard may be linked to any solver with an OSI interface and should produce correct results. The OSI has been significantly extended compared to its first incarnation. Currently, the OSI supports linear programming solvers and has rudimentary support for integer programming.
Porechop/0.2.4-GCCcore-12.3.0 easyconfig Porechop is a tool for finding and removing adapters from Oxford Nanopore reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads. Porechop performs thorough alignments to effectively find adapters, even at low sequence identity
PyTorch/2.1.2-foss-2023a Tensors and Dynamic neural networks in Python with strong GPU acceleration. PyTorch is a deep learning framework that puts Python first.
PyTorch-bundle/2.1.2-foss-2023a-CUDA-12.1.1 easyconfig PyTorch with compatible versions of official Torch extensions.
Pysam/0.22.1-GCC-13.3.0 Pysam is a python module for reading and manipulating Samfiles. It’s a lightweight wrapper of the samtools C-API. Pysam also includes an interface for tabix.
Qhull/2020.2-GCCcore-13.2.0 Qhull computes the convex hull, Delaunay triangulation, Voronoi diagram, halfspace intersection about a point, furthest-site Delaunay triangulation, and furthest-site Voronoi diagram. The source code runs in 2-d, 3-d, 4-d, and higher dimensions. Qhull implements the Quickhull algorithm for computing the convex hull.
R-bundle-Bioconductor/3.18-foss-2023a-R-4.3.2 easyconfig Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data.
RAPIDS/24.4-foss-2023a-CUDA-12.1.1 RAPIDS provides unmatched speed with familiar APIs that match the most popular PyData libraries. Built on state-of-the-art foundations like NVIDIA CUDA and Apache Arrow, it unlocks the speed of GPUs with code you already know.
RDKit/2024.03.5-foss-2024a RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python.
SAMtools/1.19.2-GCC-13.2.0 easyconfig SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
SCOTCH/7.0.3-gompi-2023a Software package and libraries for sequential and parallel graph partitioning, static mapping, and sparse matrix block ordering, and sequential mesh and hypergraph partitioning.
STAR/2.7.11b-GCC-13.2.0 easyconfig STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays.
STAR-Fusion/1.12.0-foss-2022b easyconfig STAR-Fusion uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads. STAR-Fusion further processes the output generated by the STAR aligner to map junction reads and spanning reads to a reference annotation set.
SVclone/1.1.2-foss-2022b easyconfig Cluster structural variants of similar cancer cell fraction (CCF).
SYMPHONY/5.7.2-foss-2023b easyconfig SYMPHONY is an open-source solver for mixed-integer linear programs (MILPs) written in C.
Sambamba/1.0.1-GCC-13.2.0 easyconfig Sambamba is a high-performance, modern, robust and fast tool (and library), written in the D programming language, for working with SAM and BAM files. Its current functionality is an important subset of samtools functionality, including view, index, sort, markdup, and depth.
Seaborn/0.13.2-gfbf-2023b easyconfig Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
Shapely/2.0.1-gfbf-2023a Shapely is a BSD-licensed Python package for manipulation and analysis of planar geometric objects. It is based on the widely deployed GEOS (the engine of PostGIS) and JTS (from which GEOS is ported) libraries.
Sniffles/2.5.2-GCC-13.3.0 easyconfig A fast structural variant caller for long-read sequencing. Sniffles2 accurately detects SVs at the germline, somatic and population level for PacBio and Oxford Nanopore read data.
Telescope/1.0.3-20230222-gfbf-2022b easyconfig Single locus resolution of Transposable ELEment expression using next-generation sequencing.
UMI-tools/1.1.4-foss-2023b easyconfig Tools for handling Unique Molecular Identifiers in NGS data sets
anndata/0.10.7-foss-2023b easyconfig anndata is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray
bam-readcount/1.0.1-GCC-12.2.0 easyconfig Count DNA sequence reads in BAM files
biom-format/2.1.15-foss-2023a The BIOM file format (canonically pronounced biome) is designed to be a general-use format for representing biological sample by observation contingency tables. BIOM is a recognized standard for the Earth Microbiome Project and is a Genomics Standards Consortium supported project.
cooler/0.10.2-foss-2023a Cooler is a support library for a storage format, also called cooler, used to store genomic interaction data of any size, such as Hi-C contact matrices.
cutadapt/4.4-GCCcore-12.2.0 Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
easel/0.48-GCC-12.2.0 easyconfig Easel supports computational analysis of biological sequences using probabilistic models.
edlib/1.3.9.post1-GCC-13.3.0 Lightweight, super fast library for sequence alignment using edit (Levenshtein) distance.
einops/0.7.0-GCCcore-12.3.0 Flexible and powerful tensor operations for readable and reliable code. Supports numpy, pytorch, tensorflow, jax, and others.
fastp/0.23.4-GCC-13.2.0 easyconfig A tool designed to provide fast all-in-one preprocessing for FastQ files. This tool is developed in C++ with multithreading supported to afford high performance.
fhR/4.4.0-foss-2023b easyconfig R is a free software environment for statistical computing and graphics.
gffread/0.12.7-GCCcore-12.2.0 GFF/GTF parsing utility providing format conversions, region filtering, FASTA sequence extraction and more.
gmpy2/2.1.5-GCC-12.3.0 GMP/MPIR, MPFR, and MPC interface to Python 2.6+ and 3.x
jax/0.4.25-gfbf-2023a-CUDA-12.1.1 Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
kallisto/0.50.1-foss-2022b easyconfig kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
libcerf/2.3-GCCcore-12.2.0 libcerf is a self-contained numeric library that provides an efficient and accurate implementation of complex error functions, along with Dawson, Faddeeva, and Voigt functions.
loompy/3.0.7-foss-2023a Python implementation of the Loom file format, an efficient file format for large omics datasets
magma/2.7.2-foss-2023a-CUDA-12.1.1 The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures, starting with current Multicore+GPU systems.
minimap2/2.26-GCCcore-12.3.0 Minimap2 is a fast sequence mapping and alignment program that can find overlaps between long noisy reads, or map long reads or their assemblies to a reference genome optionally with detailed alignment (i.e. CIGAR). At present, it works efficiently with query sequences from a few kilobases to ~100 megabases in length at an error rate ~15%. Minimap2 outputs in the PAF or the SAM format. On limited test data sets, minimap2 is over 20 times faster than most other long-read aligners. It will replace BWA-MEM for long reads and contig alignment.
ncbi-vdb/3.1.1-gompi-2023b easyconfig The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
pyEGA3/5.0.2-GCCcore-12.3.0 A basic Python-based EGA download client
pybedtools/0.9.1-foss-2023a pybedtools wraps and extends BEDTools and offers feature-level manipulations from within Python.
pyfaidx/0.8.1.1-GCCcore-12.3.0 pyfaidx: efficient pythonic random access to fasta subsequences
samblaster/0.1.26-GCC-13.2.0 easyconfig samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files.
scGPT/0.2.1-foss-2023a-CUDA-12.1.1 easyconfig scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI.
scVelo/0.3.1-foss-2023a scVelo is a scalable toolkit for estimating and analyzing RNA velocities in single cells using dynamical modeling.
scanpy/1.10.1-foss-2023b easyconfig Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
scikit-bio/0.6.0-foss-2023a scikit-bio is an open-source, BSD-licensed Python 3 package providing data structures, algorithms and educational resources for bioinformatics.
scvi-tools/1.1.2-foss-2023a-CUDA-12.1.1 scvi-tools (single-cell variational inference tools) is a package for probabilistic modeling and analysis of single-cell omics data, built on top of PyTorch and AnnData.
splitpipe/1.2.1-foss-2023b splitpipe tool from Parse Biosciences. The pipeline takes FASTQ files and delivers processed data in the form of a cell-gene count matrix, which serves as the input for various open-source tools such as scanpy and Seurat.

A Parse Biosciences login ID is required to download it.

starcode/1.4-GCC-13.2.0 Starcode is a DNA sequence clustering software. Starcode clustering is based on all pairs search within a specified Levenshtein distance (allowing insertions and deletions), followed by a clustering algorithm: Message Passing, Spheres or Connected Components.
statsmodels/0.14.1-gfbf-2023a Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests.
sympy/1.12-gfbf-2023a SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python and does not require any external libraries.
wandb/0.16.1-GCC-12.3.0 CLI and Python API for Weights and Biases (wandb), a tool for visualizing and tracking your machine learning experiments.
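
Each entry in this listing starts with a `name/version` specifier followed by a free-text description. As a minimal sketch for anyone who wants to process the listing programmatically, the Python below splits one line into those parts. The `ModuleEntry` type and the `parse_entry` helper are hypothetical names introduced for illustration, not part of any module system. The sketch assumes that the specifier never contains spaces and that the word "easyconfig" directly after it is a link label rather than part of the description. It does not try to separate the software version from the toolchain suffix, because that boundary is not stated in the listing.

```python
from typing import NamedTuple


class ModuleEntry(NamedTuple):
    name: str         # text before the slash, e.g. "STAR"
    version: str      # text after the slash, e.g. "2.7.11b-GCC-13.2.0"
    description: str  # free-text remainder of the line


def parse_entry(line: str) -> ModuleEntry:
    """Split one listing line into name, version string and description."""
    spec, _, description = line.strip().partition(" ")
    name, sep, version = spec.partition("/")
    if not sep or not name or not version:
        raise ValueError(f"not a module entry: {line!r}")
    # Assumption: a leading "easyconfig" is a link label, not description text.
    description = description.removeprefix("easyconfig ").strip()
    return ModuleEntry(name, version, description)


entry = parse_entry(
    "STAR/2.7.11b-GCC-13.2.0 easyconfig STAR aligns RNA-seq reads "
    "to a reference genome using uncompressed suffix arrays."
)
print(entry.name, entry.version)
```

Lines that are not entries, such as the continuation paragraphs under MultiQC and splitpipe, raise `ValueError`, so a caller can skip them or append them to the previous entry's description. `str.removeprefix` requires Python 3.9 or later.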