PacBio offers a full suite of tools to analyze single molecule sequencing data. Bundled together in the SMRT Analysis package, the tools support a broad range of applications including:
- De novo assembly
- Genome finishing and scaffolding
- DNA base modification detection
- Bacterial methylome and motif analysis
- Minor variant detection
- Compound mutations and phasing of distant SNPs
- Highly accurate consensus calling with variant detection
Here are some of the software algorithms included in SMRT Analysis.
The Hierarchical Genome Assembly Process (HGAP) generates high-quality (>99.999% accurate) de novo assemblies from a single PacBio library prep. HGAP consists of three stages: pre-assembly, de novo assembly with Celera Assembler, and assembly polishing with Quiver.
Quiver is a highly accurate consensus and variant caller that can generate 99.999% accurate consensus sequences using local realignment and the full range of quality scores associated with PacBio reads.
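To put the 99.999% figure in context, consensus accuracy is often expressed on the Phred scale, where the quality value is -10 times the base-10 logarithm of the error rate. The short sketch below is only an illustration of that arithmetic; the function names are hypothetical and are not part of SMRT Analysis.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy in [0, 1) to a Phred-scaled quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def expected_errors(accuracy: float, length: int) -> float:
    """Expected number of erroneous bases in a sequence of the given length."""
    return (1.0 - accuracy) * length


if __name__ == "__main__":
    # 99.999% accuracy is roughly QV50, i.e. about one error per 100,000 bases.
    print(round(accuracy_to_phred(0.99999), 3))
    # Expected errors in a hypothetical 5 Mb bacterial genome at that accuracy.
    print(round(expected_errors(0.99999, 5_000_000), 3))
```

On this scale, each additional 10 QV points corresponds to a tenfold reduction in the error rate.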
BLASR (Basic Local Alignment with Successive Refinement) rapidly maps reads to genomes by finding the highest scoring local alignment or set of local alignments between the read and the genome. BLASR is optimized for PacBio's extraordinarily long reads and takes advantage of their rich quality values to map with high accuracy.
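As a rough illustration of what "highest scoring local alignment" means, the sketch below uses the textbook Smith-Waterman dynamic program. This is not BLASR's implementation, which is built to scale to long reads and whole genomes; the function name and the scoring parameters (match, mismatch, gap) are assumptions chosen for the example.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Return (best score, 0-based start in ref, aligned read, aligned ref)."""
    rows, cols = len(read) + 1, len(ref) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below zero, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # match or mismatch
                score[i - 1][j] + gap,      # gap in the reference
                score[i][j - 1] + gap,      # gap in the read
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    aligned_read, aligned_ref = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if read[i - 1] == ref[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            aligned_read.append(read[i - 1])
            aligned_ref.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_read.append(read[i - 1])
            aligned_ref.append("-")
            i -= 1
        else:
            aligned_read.append("-")
            aligned_ref.append(ref[j - 1])
            j -= 1

    return best, j, "".join(reversed(aligned_read)), "".join(reversed(aligned_ref))


if __name__ == "__main__":
    print(smith_waterman("ACGT", "TTACGTTT"))
```

The full dynamic program costs time proportional to the read length times the reference length, which is why practical long-read mappers first narrow the search to candidate regions before refining the alignment.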
AHA (A Hybrid Assembler) uses PacBio's exceptionally long reads to improve existing assemblies and fill in gaps.
Allora (A Long Read Assembler) is a PacBio de novo assembly algorithm optimized for bacterial and BAC assembly using ultra long reads. Allora uses a traditional overlap-layout-consensus approach based on the open source software package AMOS, tuned to PacBio's long reads and quality values.
Rare and Compound Variants
A minor variant and compound mutation detection algorithm that finds individual and correlated SNPs in mixed samples and reports the percent of reads containing each combination.
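The reporting step described above, the percent of reads containing each combination, can be sketched as a simple tally. This toy example assumes the reads are already aligned to a common coordinate system and represented as plain strings; the function name and inputs are hypothetical, and the actual algorithm must also account for sequencing error, which this sketch ignores.

```python
from collections import Counter


def combination_frequencies(aligned_reads, positions):
    """Percent of reads carrying each combination of bases at the given positions.

    Reads that do not cover every requested position are skipped.
    """
    counts = Counter(
        tuple(read[p] for p in positions)
        for read in aligned_reads
        if all(p < len(read) for p in positions)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {combo: 100.0 * n / total for combo, n in counts.items()}


if __name__ == "__main__":
    # Four toy reads; positions 0 and 3 are the variant sites of interest.
    reads = ["ACGT", "ACGT", "TCGA", "ACGA"]
    for combo, pct in sorted(combination_frequencies(reads, (0, 3)).items()):
        print(combo, pct)
```

Because each read is tallied as a whole, the output distinguishes variants that co-occur on the same molecule from variants that merely appear at similar overall frequencies.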
3rd Party Tools
PacBio redistributes selected 3rd party tools that work well with single molecule long reads:
- GATK's Unified Genotyper for haploid and diploid variant calling
- Celera® Assembler for scalable genome assembly of PacBio long reads and Circular Consensus Sequencing reads, or of PacBio long reads combined with short reads from Illumina®, 454®, Ion Torrent® or Sanger®.
More information and video explanations of each algorithm are available on DevNet.