Below is a list of the software applications installed at CCR and used in various bioinformatics & genomics disciplines.  We do not maintain documentation for each software package but, where possible, have provided a link to the developer's website for your reference.


If you're looking for something that is not on this list, it may still be installed.  To see all the software installed on the cluster, run the following command when logged in:

module avail
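
The full listing can be long, so it may help to narrow it by keyword. The following is a minimal sketch, assuming a Bash-compatible shell and a module system that writes its listing to standard error (common, but not guaranteed on every installation). The helper name filter_modules and the keyword samtools are illustrative only and are not part of CCR's setup.

```shell
# Hypothetical helper: keep only the lines of a module listing that
# mention the given keyword (case-insensitive).
filter_modules() {
    grep -i -- "$1"
}

# Many module systems print `module avail` output to stderr, so it is
# redirected to stdout before filtering. The guard skips this step on
# machines where the `module` command is not defined.
if command -v module >/dev/null 2>&1; then
    module avail 2>&1 | filter_modules samtools
fi
```

If the search prints nothing, the package may be installed under a different name, so trying a shorter keyword is a reasonable next step.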


If you'd like to have software installed, please submit a ticket to CCR Help.

  • Bedtools - Listed under Next Generation Sequence Analysis.

  • Bioconductor - provides tools for the analysis and comprehension of high-throughput genomic data.  Bioconductor uses the R statistical programming language, and is open source and open development.

  • Bowtie - an ultrafast, memory-efficient short read aligner.  It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour.  Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small.

  • BLAST (Basic Local Alignment Search Tool) - BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance.


  • BWA - A software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.

  • Cufflinks - Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.

  • EIGENSOFT - A software suite that combines population genetics methods with the EIGENSTRAT stratification correction method.

  • EMBOSS - "The European Molecular Biology Open Software Suite" is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community. 

  • ensembl-vep - A tool used to determine the effect of variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

  • GATK - Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

  • HTSeq - A Python package that provides infrastructure to process data from high-throughput sequencing assays.

  • lumpy-sv - A general probabilistic framework for structural variant discovery.

  • MACS - "Model-based Analysis of ChIP-Seq" is a tool for identifying binding sites in ChIP-Seq data. It empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites.

  • MATLAB - A high-performance language for technical computing.  It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notations.

  • MrBayes -  A program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.

  • MuTect - Software developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next-generation sequencing data of cancer genomes.

  • NgsRelate - A tool used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods.

  • picard - A set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

  • plink - A free, open-source whole genome association analysis tool set, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

  • R statistical package - R is a software environment for statistical computing and graphics.

  • Rosetta & PyRosetta - The Rosetta software suite includes algorithms for computational modeling and analysis of protein structures.  It is free for non-commercial use but must be licensed by the research group.  More details can be found on their website.

  • Samtools - A suite of programs for interacting with high-throughput sequencing data. It consists of three separate repositories:
    • BCFtools - Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
    • samtools - Reading/writing/editing/indexing/viewing SAM/BAM/CRAM formats
    • HTSlib - A C library for reading/writing high-throughput sequencing data

  • speedseq - A flexible framework for rapid genome analysis and interpretation.

  • STAR - "Spliced Transcripts Alignment to a Reference" is a tool for aligning reads from mRNA-Seq experiments to reference genomes.

  • tophat - A fast splice junction mapper for RNA-Seq reads.  It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. 

  • velvet - A de novo genomic assembler specifically designed for short read sequencing technologies.

  • VCFtools - A program package designed for working with VCF files. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.