Background
Rastair is a command-line tool for the simultaneous detection of genetic variants and methylated positions from short-read sequencing data generated with a "mod-C→T" method, such as TAPS or Illumina's 5-Base technology.
Traditional bisulfite sequencing (BS-seq) converts all unmodified cytosine (C) to thymine (T). This results in reads that differ substantially from the reference and are thus harder to align. Converting most C to T also reduces the information available for variant identification. While several tools have been developed to overcome this problem, genetic variant calls from BS-seq remain substantially worse than those derived from whole-genome sequencing data.
In contrast, mod-C→T methods affect only around 60M positions in the human genome, roughly 2% of all nucleotides. This leads to greatly improved sequencing quality, higher mapping rates, and better yield from low-input DNA. It also makes it possible to identify genetic variation, in addition to epigenetic changes, with much higher accuracy. Rastair implements a fast and accurate algorithm that provides such high-quality variant and methylation calls simultaneously.
Latest Updates
Performance
Rastair SNP calls on TAPS+ data
Rastair achieves similar SNP-calling accuracy on TAPS+ and 5-Base data as state-of-the-art tools achieve on "pure" whole-genome sequencing data, and significantly better accuracy than other tools built for TAPS+ or BS-seq data.
Rastair is also significantly faster than other callers with comparable accuracy.

Rastair on 5-Base data
Rastair produces substantially fewer false positives, at comparable sensitivity, than Illumina's DRAGEN 5-Base pipeline:
[Figure: two panels. Left: variant call overlap. Right: methylation overlap.]
The Venn diagram on the left shows the overlap of SNPs called by rastair, Illumina's DRAGEN 5-Base pipeline, and the "Genome In A Bottle" truth set. Rastair produces fewer false positives, at the expense of slightly lower sensitivity. F1 score for DRAGEN: 0.899. F1 score for rastair: 0.906.
On the right, we plot the agreement in estimated beta between rastair (y-axis) and DRAGEN (x-axis). The straight line off the diagonal, with an intercept at DRAGEN beta=0.5, represents heterozygous C>T (and G>A) SNPs, where Ts (and As) that are in fact genetic variants are incorrectly counted as methylation. Rastair corrects for this, thus lowering the estimated beta at those loci. There is also a subset of positions where DRAGEN estimates full methylation (beta=1) and rastair estimates beta=0: these are homozygous C>T/G>A SNPs.
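The shape of that off-diagonal line can be reproduced with a little arithmetic. The sketch below is an illustration only, not rastair's or DRAGEN's algorithm. It assumes a naive estimator beta = T / (C + T), and a heterozygous site where exactly half of the reads carry the variant T allele. The function names `naive_beta` and `het_snp_naive_beta` are hypothetical.

```shell
#!/bin/sh
# Illustrative arithmetic only; NOT rastair's or DRAGEN's actual algorithm.
# In a mod-C->T assay a modified C is read as T, so a naive estimate is
#   beta = T / (C + T)

# naive_beta C_READS T_READS
# Prints the fraction of reads showing T, or "NA" when there is no coverage.
naive_beta() {
    awk -v c="$1" -v t="$2" 'BEGIN {
        if (c + t == 0) { print "NA"; exit }
        printf "%.2f\n", t / (c + t)
    }'
}

# het_snp_naive_beta TRUE_BETA
# Expected naive estimate at a heterozygous C>T SNP, assuming half of the
# reads carry the variant T allele and the C allele is methylated at TRUE_BETA.
het_snp_naive_beta() {
    awk -v m="$1" 'BEGIN { printf "%.2f\n", 0.5 + 0.5 * m }'
}

naive_beta 15 15         # het SNP, unmethylated C allele: prints 0.50
naive_beta 0 30          # homozygous C>T SNP: prints 1.00
het_snp_naive_beta 0.5   # C allele half methylated: prints 0.75
```

Under these assumptions the naive estimate never drops below 0.5 at a heterozygous C>T SNP, and reaches 1 at a homozygous one, even when no methylation is present.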
License
Rastair is free for academic and other non-commercial use, and the code is available on Bitbucket. You can read the details of the license here.
For commercial entities that would like to use rastair beyond internal evaluation, please contact enquiries@innovation.ox.ac.uk quoting reference 24811.
Quick start
Installation
We provide pre-built binaries for Linux (x86), Mac (Apple Silicon) and Mac (Intel). We also provide a Docker image. Conda integration is still a work in progress, but will happen soon. For build instructions and more details, see the installation page.
Usage
Call methylation at all CpG positions (including CpGs formed by SNPs) from a BAM file and write the output as a tabix-indexed BED file:
rastair call --bed output.bed.gz -r reference.fasta.gz input.bam
Rastair can also produce variant and methylation calls in VCF format:
rastair call --vcf output.vcf.gz -r reference.fasta.gz input.bam
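To produce both outputs for one sample, the two commands above can be wrapped in a small helper. This is a minimal sketch: the function name `call_both`, the `RASTAIR` override variable, and the output naming scheme are assumptions for illustration, and only the `rastair call` options shown above are used.

```shell
#!/bin/sh
# Hypothetical wrapper around the two documented invocations.
# Set RASTAIR=echo to print the commands instead of running them (dry run).
RASTAIR="${RASTAIR:-rastair}"

# call_both REFERENCE BAM
# Writes <sample>.bed.gz and <sample>.vcf.gz to the current directory,
# where <sample> is the BAM file name without its .bam extension.
call_both() {
    ref="$1"
    bam="$2"
    prefix="$(basename "$bam" .bam)"
    "$RASTAIR" call --bed "${prefix}.bed.gz" -r "$ref" "$bam" &&
        "$RASTAIR" call --vcf "${prefix}.vcf.gz" -r "$ref" "$bam"
}
```

For example, `call_both reference.fasta.gz input.bam` would run both calls in turn and stop if the first one fails.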
For a more in-depth look at different use cases of rastair with practical examples, see the examples section. For an explanation of the output file formats, see the BED and VCF sections.
Get help
You can file an issue or ask a question on our issue tracker on Bitbucket.
Citing rastair
A publication for rastair is in progress. We will update this page with a reference to the bioRxiv preprint shortly.

