Background

Rastair is a command-line tool for the simultaneous detection of genetic variants and methylated positions from short-read sequencing data generated using a "mod-C→T" method, such as TAPS or Illumina's 5-Base technology.

Traditional bisulfite sequencing (BS-seq) converts all unmodified cytosines (C) to thymines (T). This results in reads that differ substantially from the reference and are thus harder to align. Converting most C to T also reduces the information available for variant identification. While several tools have been developed to overcome this problem, genetic variant calls from BS-seq remain substantially worse than those derived from whole-genome sequencing data.

In contrast, mod-C→T methods affect only around 60M positions in the human genome, equivalent to approx. 2% of all nucleotides. This leads to greatly improved sequencing quality, higher mapping rates, and better yield from low-input DNA. It also makes it possible to identify genetic variation, in addition to epigenetic changes, with much higher accuracy. Rastair implements a fast and accurate algorithm to simultaneously provide such high-quality variant and methylation calls.

Performance

Rastair SNP calls on TAPS+ data

Rastair achieves similar variant-calling accuracy for SNP positions from TAPS+ and 5-Base data as state-of-the-art tools achieve on "pure" whole-genome sequencing data, and significantly better accuracy than other tools built for TAPS+ or bisulfite-seq data.

Meanwhile, rastair is significantly faster than other callers with comparable accuracy:

[Figure: Calling times]

Rastair on 5-Base data

Rastair produces substantially fewer false positives than Illumina's DRAGEN 5-Base pipeline, at comparable sensitivity:

Figure 2A (variant call overlap): based on the file "Demo-5base-gDNA-Sample9-NA12878-100ng-B-F01.hard-filtered.vcf.gz" provided by Illumina.
Figure 2B (methylation overlap): based on the file "Demo-5base-gDNA-Sample9-NA12878-100ng-B-F01.CX_report.txt.gz" provided by Illumina.

The Venn diagram on the left shows the overlap of SNPs called by rastair, Illumina's DRAGEN 5-Base pipeline, and the "Genome In A Bottle" truth set. Rastair produces fewer false positives, at the expense of slightly lower sensitivity. F1 scores: DRAGEN 0.899, rastair 0.906.
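For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below shows the computation from call counts; the function name and the counts are hypothetical and chosen only for illustration, they are not the counts behind the figure above.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall, computed from call counts."""
    if true_positives == 0:
        # No correct calls: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not taken from the Venn diagram).
example = f1_score(true_positives=90, false_positives=10, false_negatives=10)
print(round(example, 3))
```

Because F1 weighs false positives and false negatives together, a caller can score higher overall with slightly lower sensitivity if it produces sufficiently fewer false positives.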

On the right, we plot the agreement in estimated beta between rastair (y-axis) and DRAGEN (x-axis). The straight line off the diagonal with an intercept at DRAGEN beta=0.5 represents heterozygous C>T (and G>A) SNPs, where Ts (and As) that are in fact genetic variants are incorrectly counted as methylation. Rastair corrects for this, thus lowering the estimated beta at those loci. There is also a subset of positions for which DRAGEN estimates full methylation (beta=1) and rastair estimates beta=0: these are homozygous C>T/G>A SNPs.
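The shape of that off-diagonal line follows from simple arithmetic. The sketch below is a toy model, not rastair's or DRAGEN's algorithm: it assumes reads are sampled evenly from both alleles and that conversion is perfect, and the function name `expected_naive_beta` is hypothetical. It computes the beta one would expect if every T at a reference C were counted as methylation.

```python
def expected_naive_beta(true_beta: float, alt_allele_fraction: float) -> float:
    """Expected fraction of T reads at a reference C when reads carrying a
    C>T SNP are counted as methylation signal.

    true_beta: methylation level on the allele(s) that still carry a C.
    alt_allele_fraction: 0.0 (no SNP), 0.5 (heterozygous), 1.0 (homozygous).
    """
    # Ts that come from the variant allele itself, regardless of methylation.
    t_from_snp = alt_allele_fraction
    # Ts that come from converted (methylated) Cs on the remaining allele(s).
    t_from_methylation = (1.0 - alt_allele_fraction) * true_beta
    return t_from_snp + t_from_methylation


genotypes = [("no SNP", 0.0), ("heterozygous C>T", 0.5), ("homozygous C>T", 1.0)]
for label, fraction in genotypes:
    betas = [expected_naive_beta(b, fraction) for b in (0.0, 0.5, 1.0)]
    print(label, betas)
```

Under these assumptions, a heterozygous site yields a naive beta of 0.5 + 0.5 × true beta, which is a line with an intercept at 0.5, and a homozygous site always yields a naive beta of 1.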

License

Rastair is free for academic and other non-commercial use, and the code is available on Bitbucket. You can read the details of the license here.

Info

For commercial entities that would like to use rastair beyond internal evaluation, please contact enquiries@innovation.ox.ac.uk quoting reference 24811.

Quick start

Installation

We provide pre-built binaries for Linux (x86), Mac (Apple Silicon) and Mac (Intel). We also provide a Docker image. Conda integration is still a work in progress, but is planned for the near future. For build instructions and more details, see the installation page.

Usage

Call methylation at all CpG positions (including CpGs formed by SNPs) from a BAM file and write the output as a tabix-indexed BED file:

rastair call --bed output.bed.gz -r reference.fasta.gz input.bam

Tip

By default, rastair will use all available CPU cores. You can restrict this with -@ 1.

Rastair can also produce variant and methylation calls in VCF format:

rastair call --vcf output.vcf.gz -r reference.fasta.gz input.bam

For a more in-depth look at different use cases of rastair with practical examples, see the examples section. For an explanation of the output file formats, see the BED and VCF sections.

Get help

You can file an issue or ask a question on our issue tracker on Bitbucket.

Citing rastair

A publication for rastair is in progress. We will update this page with a reference to the bioRxiv preprint shortly.