BED Format

Rastair can output BED files of two different kinds:

  1. CpG sites: A file containing all CpG sites with their methylation status. Generated using the call command and specifying --bed (or using convert).
  2. Per-read methylation: A file containing the methylation status of each CpG site for each read. Generated using the per-read command.

CpG Sites

The BED file for CpG sites contains the following columns:

| Column | Description |
|---|---|
| chrom | Chromosome name |
| start | Start position of the CpG site (0-based) |
| end | End position of the CpG site (1-based, i.e. exclusive in 0-based BED coordinates) |
| name | Name of the CpG site (e.g., "CpG1") |
| beta_est | Estimated beta value for methylation (empty string if not present) |
| strand | Strand information (e.g., "+", "-") |
| unmod | Number of unmethylated reads |
| mod | Number of methylated reads |
| no_snp | Number of reads not counted as SNPs |
| snp | Number of reads counted as SNPs |
| coverage | Total coverage at the CpG site |
| genotype | Genotype call: C/C, C/T, G/G, G/A, T/T, or A/A |
| gt_p_score | P-value for the genotype call |
| gt_conf_score | Confidence score for the genotype call |
| cpg | REF if the CpG site occurs in the reference genome, NEW if it is a de-novo CpG site |
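
As an illustration of how these columns might be consumed, here is a minimal Python sketch that parses one row of the CpG sites file. It assumes that fields are tab-separated and appear in the order listed above, and that the line passed in is a data row rather than a header; none of this is confirmed here, so check it against your own output. The names `CpgSite`, `parse_cpg_line`, and `raw_methylated_fraction` are hypothetical helpers, not part of Rastair.

```python
from dataclasses import dataclass
from typing import Optional

# Column order as listed in the table above (assumed to match the file).
CPG_COLUMNS = [
    "chrom", "start", "end", "name", "beta_est", "strand",
    "unmod", "mod", "no_snp", "snp", "coverage",
    "genotype", "gt_p_score", "gt_conf_score", "cpg",
]


@dataclass
class CpgSite:
    chrom: str
    start: int                 # 0-based start
    end: int                   # end as written in the file
    name: str
    beta_est: Optional[float]  # None when the column is empty
    strand: str
    unmod: int
    mod: int
    coverage: int
    genotype: str
    cpg: str                   # "REF" or "NEW"


def parse_cpg_line(line: str) -> CpgSite:
    """Parse one data row; assumes tab-separated fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(CPG_COLUMNS):
        raise ValueError(
            f"expected {len(CPG_COLUMNS)} fields, got {len(fields)}"
        )
    row = dict(zip(CPG_COLUMNS, fields))
    return CpgSite(
        chrom=row["chrom"],
        start=int(row["start"]),
        end=int(row["end"]),
        name=row["name"],
        # beta_est is documented as an empty string when not present.
        beta_est=float(row["beta_est"]) if row["beta_est"] else None,
        strand=row["strand"],
        unmod=int(row["unmod"]),
        mod=int(row["mod"]),
        coverage=int(row["coverage"]),
        genotype=row["genotype"],
        cpg=row["cpg"],
    )


def raw_methylated_fraction(site: CpgSite) -> Optional[float]:
    """Plain mod / (mod + unmod) ratio; this is NOT Rastair's beta_est."""
    total = site.mod + site.unmod
    return site.mod / total if total else None
```

The raw ratio is only a sanity check computed from the read counts; how `beta_est` itself is estimated is not described here, so the two values should not be expected to match exactly.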

Per-Read Methylation

The BED file for per-read methylation contains the following columns:

| Column | Description |
|---|---|
| chr | Chromosome name |
| start | Start position |
| end | End position |
| read_id | Name of the read |
| mapq | Mapping quality (MAPQ) of the read |
| orientation | Orientation of the read, either + or - |
| insert_size | Absolute fragment length (non-directional) |
| read_length | Read length |
| flag | Flag of the read (decimal, same as in BAM) |
| num_cpg | Number of CpGs in the read |
| num_mod | Number of modified CpGs |
| mod_cpgs | Positions in the read of modified CpGs |
| unmod_cpgs | Positions in the read of unmodified CpGs |
| snp_cpgs | Positions in the read that are SNPs (mutated) |
| mod_denovos | Positions in the read of modified de-novo CpGs (CpGs created by mutation) |
| unmod_denovos | Positions in the read of unmodified de-novo CpGs (CpGs created by mutation) |

Note: Positions within reads take indels into account, so they are relative to the read, not to the reference genome. If --count-clipped is set, the positions also count leading clipped bases.
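
To show how the per-read columns could be used, the following Python sketch parses one row and computes the fraction of modified CpGs in a read from `num_mod` and `num_cpg`. It assumes tab-separated fields in the order listed above and no header line. The position-list columns (`mod_cpgs` and the columns after it) are kept as raw strings because their exact encoding is not specified here. The function names are hypothetical and not part of Rastair.

```python
from typing import Optional

# Column order as listed in the per-read table (assumed to match the file).
PER_READ_COLUMNS = [
    "chr", "start", "end", "read_id", "mapq", "orientation",
    "insert_size", "read_length", "flag", "num_cpg", "num_mod",
    "mod_cpgs", "unmod_cpgs", "snp_cpgs", "mod_denovos", "unmod_denovos",
]

# Columns described as counts, positions, or numeric codes.
INT_COLUMNS = [
    "start", "end", "mapq", "insert_size",
    "read_length", "flag", "num_cpg", "num_mod",
]


def parse_per_read_line(line: str) -> dict:
    """Parse one data row; assumes tab-separated fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(PER_READ_COLUMNS):
        raise ValueError(
            f"expected {len(PER_READ_COLUMNS)} fields, got {len(fields)}"
        )
    row = dict(zip(PER_READ_COLUMNS, fields))
    for key in INT_COLUMNS:
        row[key] = int(row[key])
    # Position-list columns stay as strings: their format is an open question.
    return row


def read_modified_fraction(row: dict) -> Optional[float]:
    """Fraction num_mod / num_cpg, or None for reads without CpGs."""
    if row["num_cpg"] == 0:
        return None
    return row["num_mod"] / row["num_cpg"]
```

Because the listed positions are relative to the read, mapping them back to reference coordinates would require the read's alignment (CIGAR) from the original BAM; the sketch does not attempt this.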