BAMStats: an interactive desktop GUI tool for summarising Next Generation Sequencing alignments



Downloads

Precompiled executables and source code can be downloaded from the SourceForge download page.

The latest source code can be downloaded with:
svn checkout http://svn.code.sf.net/p/bamstats/code/trunk bamstats

Introduction

Mapping is a prerequisite for most next-generation sequencing workflows, and the SAM/BAM file format (1) is the de facto standard for storing the resulting large sequence alignments. BAMStats is a simple software tool, built on the Picard Java API (2), that calculates and graphically displays various metrics derived from SAM/BAM files that are useful for quality control (QC) assessment.

Implementation & availability

BAMStats is written in the Java programming language (Java 1.6) and is available both as precompiled executables and as source code from the SourceForge download page. BAMStats is released as open source software under the terms of the GNU General Public License.

Usage and system requirements

BAMStats is distributed as two jar executables, one providing a command line interface (CLI) and the other a graphical user interface (GUI). For example, to run the GUI with 6 GB of memory allocated:

java -Xmx6g -jar BAMStats-GUI-1.25.jar

To run the command line tool (here with 4 GB of memory allocated):

java -Xmx4g -jar BAMStats-1.25.jar -i <bam file>

Between 2 and 8 GB of memory, depending on reference size, is typically required (specified with the -Xmx flag).

Figure 1. BAMStats screenshot. In this example, a BAM file containing the mapping of AB SOLiD reads to a reference composed of 26 separate scaffolds is displayed. The panel on the right summarises the descriptive statistics for the selected scaffold (scaffold 7).

Input

BAMStats accepts sorted SAM or BAM files. A BED- or GTF-formatted feature file describing specific regions of interest (e.g. exons or bait regions) can also be loaded alongside the SAM/BAM file if required (Fig. 2).
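
BED and GTF feature files use different coordinate conventions: BED start positions are 0-based with an exclusive end, whereas GTF coordinates are 1-based and inclusive, as are SAM alignment positions. A tool that accepts both formats therefore has to normalise them before matching features to alignments. The Java sketch below shows one way to do this, assuming tab-delimited input; the `FeatureParser` and `Feature` classes and their methods are hypothetical illustrations, not part of BAMStats or the Picard API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not BAMStats code): normalise BED/GTF lines to 1-based, inclusive intervals. */
public class FeatureParser {

    /** A region on a reference sequence, stored as 1-based, inclusive coordinates. */
    static final class Feature {
        final String reference;
        final int start; // 1-based, inclusive
        final int end;   // 1-based, inclusive
        final String name;

        Feature(String reference, int start, int end, String name) {
            this.reference = reference;
            this.start = start;
            this.end = end;
            this.name = name;
        }

        int length() {
            return end - start + 1;
        }
    }

    /** BED columns: chrom, chromStart (0-based), chromEnd (exclusive), optional name. */
    static Feature parseBed(String line) {
        String[] f = line.split("\t");
        int start0 = Integer.parseInt(f[1]);
        int endExclusive = Integer.parseInt(f[2]);
        // Adding one to the start turns [start0, end) into [start0 + 1, end].
        String name = f.length > 3 ? f[3] : f[0] + ":" + (start0 + 1) + "-" + endExclusive;
        return new Feature(f[0], start0 + 1, endExclusive, name);
    }

    /** GTF columns: seqname, source, feature type, start, end (1-based, inclusive), ... */
    static Feature parseGtf(String line) {
        String[] f = line.split("\t");
        // The feature type column (e.g. "exon") is used as the name here for simplicity.
        return new Feature(f[0], Integer.parseInt(f[3]), Integer.parseInt(f[4]), f[2]);
    }

    /** Parses all lines, skipping blank lines, '#' comments and BED track/browser headers. */
    static List<Feature> parseAll(List<String> lines, boolean isBed) {
        List<Feature> features = new ArrayList<Feature>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")
                    || line.startsWith("track") || line.startsWith("browser")) {
                continue;
            }
            features.add(isBed ? parseBed(line) : parseGtf(line));
        }
        return features;
    }

    public static void main(String[] args) {
        Feature bed = parseBed("scaffold7\t99\t200\tbait1");
        Feature gtf = parseGtf("scaffold7\tsrc\texon\t100\t200\t.\t+\t.\tgene_id \"g1\";");
        // Both lines describe the same bases once converted to 1-based coordinates.
        System.out.println(bed.start + "-" + bed.end + " " + bed.length());
        System.out.println(gtf.start + "-" + gtf.end + " " + gtf.length());
    }
}
```

Running `main` prints `100-200 101` twice: the BED interval `99 200` and the GTF interval `100 200` describe the same 101 bp region. Converting to a single convention at load time keeps later overlap arithmetic free of off-by-one adjustments.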

Output

BAMStats provides descriptive statistics for coverage, start positions, MAPQ values, mapped read lengths and edit distances. Metrics are reported per reference sequence, as named in the SAM/BAM file. If a suitable BED or GTF file is provided, metrics can also be generated for individual features such as exons or bait regions.
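
As an illustration of how per-feature coverage statistics of this kind can be derived, the Java sketch below builds per-base depth across a region from aligned read intervals using a difference array, then summarises it with mean, median, minimum and maximum. It assumes each alignment has already been reduced to a 1-based, inclusive reference span, ignoring deletions, spliced reads and clipping, which a real implementation working from CIGAR strings would need to handle. The `CoverageStats` class and its methods are hypothetical, and this is not necessarily how BAMStats computes its metrics.

```java
import java.util.Arrays;

/** Hypothetical sketch: per-base coverage over a region and its summary statistics. */
public class CoverageStats {

    /**
     * Per-base depth for positions regionStart..regionEnd (1-based, inclusive).
     * Each alignment is given as {start, end}, also 1-based and inclusive.
     */
    static int[] depth(int regionStart, int regionEnd, int[][] alignments) {
        int len = regionEnd - regionStart + 1;
        int[] diff = new int[len + 1];
        for (int[] aln : alignments) {
            // Clip the alignment to the region and skip it if they do not overlap.
            int s = Math.max(aln[0], regionStart);
            int e = Math.min(aln[1], regionEnd);
            if (s > e) {
                continue;
            }
            diff[s - regionStart] += 1;     // depth rises where the read starts
            diff[e - regionStart + 1] -= 1; // and falls just after it ends
        }
        // A running sum over the difference array gives the depth at each base.
        int[] depth = new int[len];
        int running = 0;
        for (int i = 0; i < len; i++) {
            running += diff[i];
            depth[i] = running;
        }
        return depth;
    }

    /** Returns {mean, median, min, max}; assumes a non-empty array. */
    static double[] summarise(int[] values) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double sum = 0;
        for (int v : sorted) {
            sum += v;
        }
        double median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        return new double[] {sum / n, median, sorted[0], sorted[n - 1]};
    }

    public static void main(String[] args) {
        // Three reads overlapping a hypothetical 10 bp feature at positions 101..110.
        int[][] reads = {{95, 104}, {101, 110}, {106, 115}};
        int[] d = depth(101, 110, reads);
        System.out.println(Arrays.toString(d));
        System.out.println(Arrays.toString(summarise(d)));
    }
}
```

Running `main` prints `[2, 2, 2, 2, 1, 2, 2, 2, 2, 2]` followed by `[1.9, 2.0, 1.0, 2.0]`. The difference array keeps the cost proportional to the number of alignments plus the region length, rather than to the total number of aligned bases.
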
GUI output is illustrated in Figs. 1 and 2. Spreadsheets can be exported in Excel format.
CLI output can be written either as text to standard output or as HTML for display as a web page.
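
The HTML output option amounts to rendering per-reference statistics into a web page. The Java sketch below is a hypothetical illustration only: the `HtmlReport` class, its column layout (mean and median coverage) and its number formatting are assumptions, not the actual layout of BAMStats reports. Reference names are escaped because they come from the SAM/BAM header and may legally contain characters such as `&`.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: render per-reference summary values as an HTML table. */
public class HtmlReport {

    /** Escapes the characters that are significant in HTML text. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** One row per reference sequence; values are {mean coverage, median coverage}. */
    static String toHtml(Map<String, double[]> statsByReference) {
        StringBuilder html = new StringBuilder();
        html.append("<table>\n<tr><th>Reference</th><th>Mean coverage</th><th>Median coverage</th></tr>\n");
        for (Map.Entry<String, double[]> e : statsByReference.entrySet()) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
            html.append("<tr><td>").append(escape(e.getKey()))
                .append("</td><td>").append(String.format(Locale.ROOT, "%.2f", e.getValue()[0]))
                .append("</td><td>").append(String.format(Locale.ROOT, "%.2f", e.getValue()[1]))
                .append("</td></tr>\n");
        }
        html.append("</table>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves the order in which references are added.
        Map<String, double[]> stats = new LinkedHashMap<String, double[]>();
        stats.put("scaffold7", new double[] {1.9, 2.0});
        System.out.print(toHtml(stats));
    }
}
```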