chorus Bio Modules 24.04
- AGAT/1.4.0-GCC-12.3.0 AGAT: Another GTF/GFF Analysis Toolkit. Suite of tools to handle gene annotations in any GTF/GFF format.
- AlphaFold/2.3.2-foss-2023a-CUDA-12.1.1 AlphaFold can predict protein structures with atomic accuracy even where no similar structure is known
- Arriba/2.4.0-GCC-12.2.0 Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data. It was developed for the use in a clinical research setting. Therefore, short runtimes and high sensitivity were important design criteria.
- BAli-Phy/4.0-beta8-gfbf-2022b easyconfig BAli-Phy estimates multiple sequence alignments and evolutionary trees from DNA, amino acid, or codon sequences.
- BCFtools/1.19-GCC-13.2.0 easyconfig Samtools is a suite of programs for interacting with high-throughput sequencing data. BCFtools
- BEDTools/2.31.0-GCC-12.3.0 BEDTools: a powerful toolset for genome arithmetic. The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM.
- BLAST+/2.14.1-gompi-2023a Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
- BWA/0.7.17-GCCcore-12.2.0 Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome.
- BamTools/2.5.2-GCC-12.3.0 BamTools provides both a programmer’s API and an end-user’s toolkit for handling BAM files.
- Beast/10.5.0-beta3-GCC-12.3.0-beagle-lib-4.0.1-CUDA-12.1.1 BEAST is a cross-platform program for Bayesian analysis of molecular sequences using MCMC. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability.
- Beast2/2.7.7-GCC-12.3.0-beagle-lib-4.0.1-CUDA-12.1.1 easyconfig BEAST is a cross-platform program for Bayesian MCMC analysis of molecular sequences. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability.
- BindCraft/1.1.0-foss-2023a Simple binder design pipeline using AlphaFold2 backpropagation, MPNN, and PyRosetta. Select your target and let the script do the rest of the work and finish once you have enough designs to order!
- Bio-DB-HTS/3.01-GCC-13.3.0 Read files using HTSlib including BAM/CRAM, Tabix and BCF database files
- BioPerl/1.7.8-GCCcore-12.2.0 Bioperl is the product of a community effort to produce Perl code which is useful in biology. Examples include Sequence objects, Alignment objects and database searching objects.
- Biopython/1.84-foss-2024a easyconfig Biopython is a set of freely available tools for biological computation written in Python by an international team of developers. It is a distributed collaborative effort to develop Python libraries and applications which address the needs of current and future work in bioinformatics.
- Bismark/0.24.1-GCC-12.2.0 easyconfig A tool to map bisulfite converted sequence reads and determine cytosine methylation states
- Bowtie2/2.5.4-GCC-13.2.0 easyconfig Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
- CITE-seq-Count/1.4.4-foss-2023b-Python-3.11.5 easyconfig A Python package for counting antibody TAGS from a CITE-seq and/or cell hashing experiment.
- CRISPResso2/2.3.1-foss-2023b easyconfig CRISPResso2 is a software pipeline designed to enable rapid and intuitive interpretation of genome editing experiments.
- Cassiopeia/2.0.0-foss-2023a A Package for Cas9-Enabled Single Cell Lineage Tracing Tree Reconstruction.
- Cbc/2.10.11-foss-2023a Cbc (Coin-or branch and cut) is an open-source mixed integer linear programming solver written in C++. It can be used as a callable library or using a stand-alone executable.
- CellBender/0.3.0-foss-2023a CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
- CellRanger/8.0.0 easyconfig Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.
- CellRanger-ATAC/2.1.0 Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.
- CellRank/2.0.2-foss-2023a-CUDA-12.1.1 CellRank is a toolkit to uncover cellular dynamics based on Markov state modeling of single-cell data. It contains two main modules: kernels compute cell-cell transition probabilities, and estimators generate hypotheses based on these.
- Cgl/0.60.8-foss-2023b The COIN-OR Cut Generation Library (Cgl) is a collection of cut generators that can be used with other COIN-OR packages that make use of cuts, such as, among others, the linear solver Clp or the mixed integer linear programming solvers Cbc or BCP. Cgl uses the abstract class OsiSolverInterface (see Osi) to use or communicate with a solver. It does not directly call a solver.
- CheckM2/1.1.0-foss-2024a Assessing the quality of metagenome-derived genome bins using machine learning
- Clp/1.17.9-foss-2023a Clp (Coin-or linear programming) is an open-source linear programming solver. It is primarily meant to be used as a callable library, but a basic, stand-alone executable version is also available.
- Cogent_NGS_Immune_Profiler/v2.0-foss-2024a Cogent NGS Immune Profiler (CogentIP) is software designed to analyze sequence data stored in FASTQ files generated by Illumina sequencers from libraries prepared using certain Takara Bio immune profiling kits.
- CoinUtils/2.11.10-GCC-12.3.0 CoinUtils (Coin-OR Utilities) is an open-source collection of classes and functions that are generally useful to more than one COIN-OR project.
- CrossMap/0.7.3-foss-2023b easyconfig CrossMap is a program for genome coordinates conversion between different assemblies (such as hg18 (NCBI36) <=> hg19 (GRCh37)). It supports commonly used file formats including BAM, CRAM, SAM, Wiggle, BigWig, BED, GFF, GTF and VCF.
- DIAMOND/2.1.11-GCC-13.3.0 Accelerated BLAST compatible local sequence aligner
- DendroPy/4.6.1-GCCcore-12.3.0 A Python library for phylogenetics and phylogenetic computing: reading, writing, simulation, processing and manipulation of phylogenetic trees (phylogenies) and characters.
- EPA-ng/0.3.8-GCC-12.3.0 EPA-ng performs evolutionary placement of sequences on a reference phylogenetic tree.
- ESM-2/2.0.0-foss-2023a-CUDA-12.1.1 ESM-2 outperforms all tested single-sequence protein language models across a range of structure prediction tasks. ESMFold harnesses the ESM-2 language model to generate accurate structure predictions end to end directly from the sequence of a protein.
- Eigen/3.4.0-GCCcore-13.3.0 Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
- FASTA/36.3.8i-GCC-12.2.0 The FASTA programs find regions of local or global (new) similarity between protein or DNA sequences, either by searching Protein or DNA databases, or by identifying local duplications within a sequence.
- FLAIR/2.0-foss-2023a easyconfig FLAIR (Full-Length Alternative Isoform analysis of RNA) for the correction, isoform definition, and alternative splicing analysis of noisy reads. FLAIR has primarily been used for nanopore cDNA, native RNA, and PacBio sequencing reads.
- FastQC/0.12.1-Java-11 FastQC is a quality control application for high throughput sequence data. It reads in sequence data in a variety of formats and can either provide an interactive application to review the results of several different QC checks, or create an HTML based report which can be integrated into a pipeline.
- FastTree/2.1.11-GCCcore-12.3.0 FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory.
- Flax/0.8.4-gfbf-2023a Flax is a high-performance neural network library and ecosystem for JAX that is designed for flexibility: Try new forms of training by forking an example and by modifying the training loop, not by adding features to a framework.
- GATK/4.4.0.0-GCCcore-12.2.0-Java-17 The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
- GEOS/3.12.2-GCC-13.3.0 GEOS (Geometry Engine, Open Source) is a C++ port of the JTS geometry library.
- GMP/6.3.0-GCCcore-13.3.0 GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers.
- GROMACS/2024.4-foss-2023b GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. This is a CPU-only build, containing both MPI and threadMPI binaries for both single and double precision. It also contains the gmxapi extension for the single precision MPI build.
- HH-suite/3.3.0-gompi-2023a The HH-suite is an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs).
- HMMER/3.4-gompi-2024a HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs). Compared to BLAST, FASTA, and other sequence alignment and database search tools based on older scoring methodology, HMMER aims to be significantly more accurate and more able to detect remote homologs because of the strength of its underlying mathematical models. In the past, this strength came at significant computational expense, but in the new HMMER3 project, HMMER is now essentially as fast as BLAST.
- HTSlib/1.21-GCC-13.3.0 A C library for reading/writing high-throughput sequencing data. This package includes the utilities bgzip and tabix
- IgBLAST/1.22.0-x64-linux easyconfig IgBLAST facilitates the analysis of immunoglobulin and T cell receptor variable domain sequences.
- InChI/1.07.1-GCC-13.3.0 The IUPAC International Chemical Identifier (InChI TM) is a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources thus enabling easier linking of diverse data compilations.
- Infernal/1.1.4-foss-2022b Infernal (“INFERence of RNA ALignment”) is for searching DNA sequence databases for RNA structure and sequence similarities.
- JAGS/4.3.2-foss-2024a JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation
- Kalign/3.4.0-GCCcore-12.3.0 Kalign is a fast multiple sequence alignment program for biological sequences.
- Kent_tools/468-GCC-12.3.0 easyconfig Kent utilities: collection of tools used by the UCSC genome browser.
- Kraken2/2.1.3-gompi-2022b easyconfig Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts by other bioinformatics software to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.
- LightGBM/4.6.0-foss-2024a A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
- MACS2/2.2.9.1-foss-2022b easyconfig Model Based Analysis for ChIP-Seq data
- MACS3/3.0.1-gfbf-2023a Model Based Analysis for ChIP-Seq data
- MAFFT/7.520-GCC-12.3.0-with-extensions MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼30,000 sequences), etc.
- METIS/5.1.0-GCCcore-12.3.0 METIS is a set of serial programs for partitioning graphs, partitioning finite element meshes, and producing fill reducing orderings for sparse matrices. The algorithms implemented in METIS are based on the multilevel recursive-bisection, multilevel k-way, and multi-constraint partitioning schemes.
- MPC/1.3.1-GCCcore-13.2.0 Gnu Mpc is a C library for the arithmetic of complex numbers with arbitrarily high precision and correct rounding of the result. It extends the principles of the IEEE-754 standard for fixed precision real floating point numbers to complex numbers, providing well-defined semantics for every operation. At the same time, speed of operation at high precision is a major design goal.
- MPFR/4.2.1-GCCcore-13.2.0 The MPFR library is a C library for multiple-precision floating-point computations with correct rounding.
- MUMPS/5.6.1-foss-2022b-metis A parallel sparse direct solver
- MUMmer/4.0.0rc1-GCCcore-12.3.0 MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. AMOS makes use of it.
- MUSCLE/5.1.0-GCCcore-12.3.0 easyconfig MUSCLE is one of the best-performing multiple alignment programs according to published benchmark tests, with accuracy and speed that are consistently better than CLUSTALW. MUSCLE can align hundreds of sequences in seconds. Most users learn everything they need to know about MUSCLE in a few minutes; only a handful of command-line options are needed to perform common alignment tasks.
- MultiQC/1.21-foss-2023a easyconfig Aggregate results from bioinformatics analyses across many samples into a single report. MultiQC searches a given directory for analysis logs and compiles an HTML report. It’s a general-use tool, perfect for summarising the output from numerous bioinformatics tools.
- OpenMM/8.0.0-foss-2023a-CUDA-12.1.1 easyconfig OpenMM is a toolkit for molecular simulation.
- Osi/0.108.9-GCC-12.3.0 Osi (Open Solver Interface) provides an abstract base class to a generic linear programming (LP) solver, along with derived classes for specific solvers. Many applications may be able to use the Osi to insulate themselves from a specific LP solver. That is, programs written to the OSI standard may be linked to any solver with an OSI interface and should produce correct results. The OSI has been significantly extended compared to its first incarnation. Currently, the OSI supports linear programming solvers and has rudimentary support for integer programming.
- PICRUSt2/2.5.2-foss-2023a PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) is a software for predicting functional abundances based only on marker gene sequences.
- ParMETIS/4.0.3-gompi-2023a ParMETIS is an MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs, meshes, and for computing fill-reducing orderings of sparse matrices. ParMETIS extends the functionality provided by METIS and includes routines that are especially suited for parallel AMR computations and large scale numerical simulations. The algorithms implemented in ParMETIS are based on the parallel multilevel k-way graph-partitioning, adaptive repartitioning, and parallel multi-constrained partitioning schemes.
- Porechop/0.2.4-GCCcore-12.3.0 easyconfig Porechop is a tool for finding and removing adapters from Oxford Nanopore reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads. Porechop performs thorough alignments to effectively find adapters, even at low sequence identity
- PyRosetta/4.release-387-gompi-2023a PyRosetta is an interactive Python-based interface to the powerful Rosetta molecular modeling suite. It enables users to design their own custom molecular modeling algorithms using Rosetta sampling methods and energy functions.
- PyTorch/2.1.2-foss-2023a-CUDA-12.1.1 easyconfig Tensors and Dynamic neural networks in Python with strong GPU acceleration. PyTorch is a deep learning framework that puts Python first.
- PyTorch-bundle/2.1.2-foss-2023a-CUDA-12.1.1 easyconfig PyTorch with compatible versions of official Torch extensions.
- Pysam/0.22.1-GCC-13.3.0 Pysam is a python module for reading and manipulating Samfiles. It’s a lightweight wrapper of the samtools C-API. Pysam also includes an interface for tabix.
- QIIME2/2024.5.0-foss-2023a QIIME 2 is a powerful, extensible, and decentralized microbiome bioinformatics platform that is free, open source, and community developed.
- Qhull/2020.2-GCCcore-12.2.0 Qhull computes the convex hull, Delaunay triangulation, Voronoi diagram, halfspace intersection about a point, furthest-site Delaunay triangulation, and furthest-site Voronoi diagram. The source code runs in 2-d, 3-d, 4-d, and higher dimensions. Qhull implements the Quickhull algorithm for computing the convex hull.
- R-bundle-Bioconductor/3.20-foss-2024a-R-4.4.2 easyconfig Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data.
- RAPIDS/24.4-foss-2023a-CUDA-12.1.1 RAPIDS provides unmatched speed with familiar APIs that match the most popular PyData libraries. Built on state-of-the-art foundations like NVIDIA CUDA and Apache Arrow, it unlocks the speed of GPUs with code you already know.
- RDKit/2024.03.5-foss-2024a RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python.
- SAMtools/1.21-GCC-13.3.0 SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
- SCOTCH/7.0.4-gompi-2023b Software package and libraries for sequential and parallel graph partitioning, static mapping, and sparse matrix block ordering, and sequential mesh and hypergraph partitioning.
- STAR/2.7.11b-GCC-13.2.0 easyconfig STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays.
- STAR-Fusion/1.12.0-foss-2022b easyconfig STAR-Fusion uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads. STAR-Fusion further processes the output generated by the STAR aligner to map junction reads and spanning reads to a reference annotation set.
- SVclone/1.1.2-foss-2022b easyconfig Cluster structural variants of similar cancer cell fraction (CCF).
- SYMPHONY/5.7.2-foss-2023b easyconfig SYMPHONY is an open-source solver for mixed-integer linear programs (MILPs) written in C.
- Sambamba/1.0.1-GCC-13.2.0 easyconfig Sambamba is a high-performance, modern, robust and fast tool (and library), written in the D programming language, for working with SAM and BAM files. Current functionality is an important subset of samtools functionality, including view, index, sort, markdup, and depth.
- Seaborn/0.13.2-gfbf-2023a Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
- Shapely/2.0.1-gfbf-2023a Shapely is a BSD-licensed Python package for manipulation and analysis of planar geometric objects. It is based on the widely deployed GEOS (the engine of PostGIS) and JTS (from which GEOS is ported) libraries.
- Sniffles/2.5.2-GCC-13.3.0 easyconfig A fast structural variant caller for long-read sequencing. Sniffles2 accurately detects SVs at the germline, somatic and population level for PacBio and Oxford Nanopore read data.
- Telescope/1.0.3-20230222-gfbf-2022b easyconfig Single locus resolution of Transposable ELEment expression using next-generation sequencing.
- UMI-tools/1.1.4-foss-2023b easyconfig Tools for handling Unique Molecular Identifiers in NGS data sets
- UniFrac/1.4-foss-2023a UniFrac is the de facto repository for high-performance phylogenetic diversity calculations. The methods in this repository are based on an implementation of the Strided State UniFrac algorithm which is faster, and uses less memory than Fast UniFrac. Strided State UniFrac supports Unweighted UniFrac, Weighted UniFrac, Generalized UniFrac, Variance Adjusted UniFrac and meta UniFrac, in both double and single precision (fp32). This repository also includes Stacked Faith (manuscript in preparation), a method for calculating Faith’s PD that is faster and uses less memory than the Fast UniFrac-based reference implementation.
- VEP/113.3-GCC-13.3.0 Variant Effect Predictor (VEP) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. Includes EnsEMBL-XS, which provides pre-compiled replacements for frequently used routines in VEP.
- VSEARCH/2.25.0-GCC-12.3.0 VSEARCH supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads.
- anndata/0.11.3-foss-2024a easyconfig anndata is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray
- bam-readcount/1.0.1-GCC-12.2.0 easyconfig Count DNA sequence reads in BAM files
- biom-format/2.1.15-foss-2023a The BIOM file format (canonically pronounced biome) is designed to be a general-use format for representing biological sample by observation contingency tables. BIOM is a recognized standard for the Earth Microbiome Project and is a Genomics Standards Consortium supported project.
- bx-python/0.13.0-foss-2023b easyconfig The bx-python project is a Python library and associated set of scripts to allow for rapid implementation of genome scale analyses.
- castor/1.8.2-foss-2023a Efficient phylogenetic analyses on massive phylogenies comprising up to millions of tips. Functions include pruning, rerooting, calculation of most-recent common ancestors, calculating distances from the tree root and calculating pairwise distances. Calculation of phylogenetic signal and mean trait depth (trait conservatism), ancestral state reconstruction and hidden character prediction of discrete characters, simulating and fitting models of trait evolution, fitting and simulating diversification models, dating trees, comparing trees, and reading/writing trees in Newick format.
- cooler/0.10.2-foss-2023a Cooler is a support library for a storage format, also called cooler, used to store genomic interaction data of any size, such as Hi-C contact matrices.
- cooltools/0.7.1-foss-2024a easyconfig cooltools provides a suite of computational tools with a paired python API and command line access, which facilitates workflows either on high-performance computing clusters or via custom analysis notebooks. As part of the Open2C ecosystem, cooltools also provides detailed introductions to key concepts in Hi-C-data analysis with interactive notebook documentation.
- cutadapt/4.9-GCCcore-12.3.0 Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
- dorado/0.9.1-foss-2023a-CUDA-12.1.1 Dorado is a high-performance, easy-to-use, open source basecaller for Oxford Nanopore reads.
- easel/0.48-GCC-12.2.0 easyconfig Easel supports computational analysis of biological sequences using probabilistic models.
- edlib/1.3.9.post1-GCC-13.3.0 Lightweight, super fast library for sequence alignment using edit (Levenshtein) distance.
- einops/0.7.0-GCCcore-12.3.0 Flexible and powerful tensor operations for readable and reliable code. Supports numpy, pytorch, tensorflow, jax, and others.
- fastp/0.23.4-GCC-13.2.0 easyconfig A tool designed to provide fast all-in-one preprocessing for FastQ files. This tool is developed in C++ with multithreading supported to afford high performance.
- fhR/4.4.0-foss-2023b easyconfig R is a free software environment for statistical computing and graphics.
- gappa/0.8.5-GCC-12.3.0 gappa is a collection of commands for working with phylogenetic data. Its main focus is evolutionary placements of short environmental sequences on a reference phylogenetic tree. Such data is typically produced by tools like EPA-ng, RAxML-EPA or pplacer and usually stored in jplace files.
- gffread/0.12.7-GCCcore-12.2.0 GFF/GTF parsing utility providing format conversions, region filtering, FASTA sequence extraction and more.
- gmpy2/2.1.5-GCC-13.2.0 GMP/MPIR, MPFR, and MPC interface to Python 2.6+ and 3.x
- jax/0.4.25-gfbf-2023a Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
- kallisto/0.50.1-foss-2022b easyconfig kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
- libcerf/2.3-GCCcore-12.3.0 libcerf is a self-contained numeric library that provides an efficient and accurate implementation of complex error functions, along with Dawson, Faddeeva, and Voigt functions.
- loompy/3.0.8-foss-2024a Python implementation of the Loom file format, an efficient file format for large omics datasets
- magma/2.7.2-foss-2023a-CUDA-12.1.1 The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures, starting with current Multicore+GPU systems.
- minimap2/2.26-GCCcore-12.3.0 Minimap2 is a fast sequence mapping and alignment program that can find overlaps between long noisy reads, or map long reads or their assemblies to a reference genome optionally with detailed alignment (i.e. CIGAR). At present, it works efficiently with query sequences from a few kilobases to ~100 megabases in length at an error rate ~15%. Minimap2 outputs in the PAF or the SAM format. On limited test data sets, minimap2 is over 20 times faster than most other long-read aligners. It will replace BWA-MEM for long reads and contig alignment.
- ncbi-vdb/3.1.1-gompi-2023b easyconfig The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
- nf-core/2.14.1-foss-2024a Python package with helper tools for the nf-core community.
- prodigal/2.6.3-GCCcore-13.3.0 Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program developed at Oak Ridge National Laboratory and the University of Tennessee.
- pyBigWig/0.3.23-gfbf-2023b easyconfig A python extension, written in C, for quick access to bigBed files and access to and creation of bigWig files.
- pyEGA3/5.0.2-GCCcore-12.3.0 A basic Python-based EGA download client
- pybedtools/0.9.1-foss-2023a pybedtools wraps and extends BEDTools and offers feature-level manipulations from within Python.
- pyfaidx/0.8.1.2-GCCcore-13.3.0 pyfaidx: efficient pythonic random access to fasta subsequences
- samblaster/0.1.26-GCC-13.2.0 easyconfig samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files.
- scGPT/0.2.1-foss-2023a-CUDA-12.1.1 easyconfig scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI.
- scVelo/0.3.1-foss-2023a scVelo is a scalable toolkit for estimating and analyzing RNA velocities in single cells using dynamical modeling.
- scanpy/1.10.4-foss-2024a easyconfig Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
- scikit-bio/0.6.0-foss-2023a scikit-bio is an open-source, BSD-licensed Python 3 package providing data structures, algorithms and educational resources for bioinformatics.
- scvi-tools/1.1.2-foss-2023a-CUDA-12.1.1 scvi-tools (single-cell variational inference tools) is a package for probabilistic modeling and analysis of single-cell omics data, built on top of PyTorch and AnnData.
- skani/0.2.2-GCCcore-12.3.0 skani
- smithlab/1.2-GCC-13.3.0 easyconfig A C library for reading/writing high-throughput sequencing data. This package includes the utilities bgzip and tabix
- spams/2.6.5.4-foss-2023b easyconfig SPAMS (SPArse Modeling Software) is an optimization toolbox for solving various sparse estimation problems.
- spektral/1.3.1-foss-2023a-CUDA-12.1.1 Spektral is a Python library for graph deep learning, based on the Keras API and TensorFlow 2
- splitpipe/1.2.1-foss-2023b splitpipe tool from Parse Biosciences. The pipeline takes FASTQ files and delivers processed data in the form of a cell-gene count matrix, which serves as the input for various open-source tools such as scanpy and Seurat. A Parse Biosciences login ID is required to download.
- starcode/1.4-GCC-13.2.0 Starcode is a DNA sequence clustering software. Starcode clustering is based on all pairs search within a specified Levenshtein distance (allowing insertions and deletions), followed by a clustering algorithm: Message Passing, Spheres or Connected Components.
- statsmodels/0.14.4-gfbf-2024a Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests.
- sympy/1.12-gfbf-2023a SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python and does not require any external libraries.
- wandb/0.16.1-GCC-12.3.0 CLI and Python API for Weights and Biases (wandb), a tool for visualizing and tracking your machine learning experiments.
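
The module identifiers in this list appear to follow a `Name/version-toolchain-suffix` pattern, for example `AlphaFold/2.3.2-foss-2023a-CUDA-12.1.1`. The Python sketch below splits an identifier into those parts. It is an illustration only: the `ModuleId` type, the `parse_module` function, and the `TOOLCHAINS` tuple are hypothetical names, the toolchain list is simply what occurs in the entries above, and identifiers without a recognised toolchain (such as `FastQC/0.12.1-Java-11`) keep everything after the slash as the version.

```python
import re
from typing import NamedTuple, Optional


class ModuleId(NamedTuple):
    """Hypothetical container for the parts of a module identifier."""
    name: str
    version: str
    toolchain: Optional[str]
    suffix: Optional[str]


# Assumption: only the toolchain names that occur in the list above.
TOOLCHAINS = ("GCCcore", "GCC", "foss", "gfbf", "gompi")

# Matches "-<toolchain>-<toolchain version>", e.g. "-foss-2023a".
_TOOLCHAIN_RE = re.compile(r"-(%s)-([^-]+)" % "|".join(TOOLCHAINS))


def parse_module(identifier: str) -> ModuleId:
    """Split 'Name/version-toolchain-suffix' into its parts."""
    name, _, rest = identifier.partition("/")
    match = _TOOLCHAIN_RE.search(rest)
    if match is None:
        # No known toolchain: treat the whole remainder as the version.
        return ModuleId(name, rest, None, None)
    version = rest[:match.start()]
    toolchain = f"{match.group(1)}-{match.group(2)}"
    suffix = rest[match.end():].lstrip("-") or None
    return ModuleId(name, version, toolchain, suffix)


if __name__ == "__main__":
    for ident in (
        "AlphaFold/2.3.2-foss-2023a-CUDA-12.1.1",
        "BAli-Phy/4.0-beta8-gfbf-2022b",
        "CellRanger/8.0.0",
    ):
        print(parse_module(ident))
```

Searching for the toolchain first, rather than splitting on every hyphen, keeps hyphenated names and versions such as `BAli-Phy` and `4.0-beta8` intact.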