Intro to Variant Call Format
|
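A VCF (Variant Call Format) file is a tab-delimited table with variants (SNPs, indels, or other variants) in rows and samples in columns.
FORMAT fields contain variant-by-sample data related to genotype calls, such as the genotype (GT), genotype quality (GQ), and read depth (DP).
INFO fields contain statistics and annotations about each variant as a whole, such as total read depth (DP) or alternate allele frequency (AF).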
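To make the row/column layout concrete, here is a minimal Python sketch (standard library only, not the VariantAnnotation API). It splits one hand-written VCF record into its fixed columns, a dictionary of INFO values, and a per-sample dictionary of FORMAT values, then counts alternate alleles in each genotype call. The sample names, the `rs_example` ID, and all values are made up. The parser also ignores the `##` meta-information lines and multi-valued fields, which a real VCF reader must handle.

```python
# A hand-written header line and one data line (tab-separated), for illustration only.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB"
RECORD = "chr1\t1042\trs_example\tA\tG\t50\tPASS\tDP=27;AF=0.25\tGT:DP\t0/1:14\t0/0:13"


def parse_info(info: str) -> dict:
    """Split an INFO string like 'DP=27;AF=0.25;DB' into a dict; flags map to True."""
    if info == ".":
        return {}
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def parse_record(header: str, record: str) -> dict:
    """Return the 8 fixed columns, the parsed INFO field, and FORMAT values per sample."""
    cols = header.lstrip("#").split("\t")
    vals = record.split("\t")
    fixed = dict(zip(cols[:8], vals[:8]))        # CHROM ... INFO (variant-level)
    fmt_keys = vals[8].split(":")                # FORMAT column, e.g. GT:DP
    samples = {                                  # one dict per sample column
        name: dict(zip(fmt_keys, sample.split(":")))
        for name, sample in zip(cols[9:], vals[9:])
    }
    return {"fixed": fixed, "info": parse_info(fixed["INFO"]), "samples": samples}


def alt_allele_count(gt: str) -> int:
    """Count non-reference alleles in a GT string such as '0/1' or '1|1'; '.' is ignored."""
    alleles = gt.replace("|", "/").split("/")
    return sum(1 for a in alleles if a not in (".", "0"))


rec = parse_record(HEADER, RECORD)
print(rec["info"])                # {'DP': '27', 'AF': '0.25'}
print(rec["samples"]["sampleA"])  # {'GT': '0/1', 'DP': '14'}
print({s: alt_allele_count(d["GT"]) for s, d in rec["samples"].items()})
```

The INFO dictionary has one entry per variant, while the FORMAT values repeat once per sample. This mirrors the variant-level versus variant-by-sample split described above.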
|
Bioconductor basics
|
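FaFile (from Rsamtools) creates a pointer to a reference genome FASTA file on your computer without reading the sequence into memory.
An index file (for example a .fai index for a FASTA file or a .tbi index for a bgzip-compressed VCF) allows quick access to specific regions of a large file without reading it from the beginning.
GRanges stores genomic positions (chromosome, start, end, and strand) for any type of feature, such as SNPs or exons.
DNAStringSet stores a collection of DNA sequences.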
SummarizedExperiment stores the results of a set of assays across a set of samples.
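The principle behind an index file can be sketched in plain Python. This uses the standard library only and is not how FaFile is implemented internally; it only illustrates the idea. For each sequence the sketch records the numbers a samtools-style .fai index keeps: sequence length, byte offset of the first base, bases per line, and bytes per line, keyed by sequence name. It then uses them to seek straight to a region instead of reading the file from the start. The file name `toy.fa`, the sequence `chrT`, and the function names are hypothetical.

```python
import os
import tempfile


def build_fai(path):
    """Map name -> (length, offset, linebases, linewidth), .fai style.
    Assumes all sequence lines except the last have equal length."""
    index = {}
    name = None
    offset = 0
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b">"):
                name = line[1:].split()[0].decode()
                # [length, byte offset of first base, bases per line, bytes per line]
                index[name] = [0, offset + len(line), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:  # first sequence line fixes the line layout
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
            offset += len(line)
    return {k: tuple(v) for k, v in index.items()}


def fetch(path, fai, name, start, end):
    """Return bases start..end (1-based, inclusive) by seeking directly to them."""
    length, offset, linebases, linewidth = fai[name]
    start0, end0 = start - 1, min(end, length)  # 0-based, half-open

    def byte_pos(i):
        # whole lines before base i, plus the column of base i within its line
        return offset + (i // linebases) * linewidth + i % linebases

    with open(path, "rb") as fh:
        fh.seek(byte_pos(start0))
        raw = fh.read(byte_pos(end0 - 1) - byte_pos(start0) + 1)
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


with tempfile.TemporaryDirectory() as d:
    fa = os.path.join(d, "toy.fa")
    with open(fa, "w", newline="\n") as fh:
        fh.write(">chrT toy sequence\nACGTACGTAC\nGGGGCCCCTT\nAAT\n")
    fai = build_fai(fa)
    toy_index = fai["chrT"]
    toy_seq = fetch(fa, fai, "chrT", 8, 14)
    print(toy_index)  # (23, 19, 10, 11)
    print(toy_seq)    # TACGGGG
```

Only the requested bytes are read, so the cost of a lookup does not depend on how large the file is. This is the property that makes indexed access to whole genomes practical.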
|
Importing a VCF into Bioconductor
|
Compress the VCF with bgzip and index it with indexTabix if you plan to import only certain genomic ranges.
Use filterVcf to write a filtered subset of variants to a new file without importing all of the data into R.
Use ScanVcfParam to specify which fields, samples, and genomic ranges readVcf should import.
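The streaming idea behind filterVcf can be sketched in Python: test each variant as it is read, write the survivors to a new file, and never hold the whole file in memory. The sketch also restricts variants to a genomic range, as ScanVcfParam can. This is an analogy under stated assumptions, not the VariantAnnotation implementation. The function and predicate names are hypothetical and the input is plain uncompressed text. The region test also scans every line rather than using a tabix index to jump to the region.

```python
import io
from typing import Callable, Iterable, TextIO


def filter_vcf_stream(lines: Iterable[str], out: TextIO,
                      keep: Callable[[list], bool]) -> int:
    """Copy header lines, plus only the records for which keep(fields) is True.
    Reads one line at a time, so memory use does not grow with file size."""
    kept = 0
    for line in lines:
        if line.startswith("#"):  # meta-information and column header lines
            out.write(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if keep(fields):
            out.write(line)
            kept += 1
    return kept


def in_region(chrom, start, end):
    """Hypothetical predicate: CHROM matches and POS lies in start..end (1-based, inclusive)."""
    return lambda f: f[0] == chrom and start <= int(f[1]) <= end


def min_qual(q):
    """Hypothetical predicate: QUAL is at least q; a missing QUAL ('.') fails."""
    return lambda f: f[5] != "." and float(f[5]) >= q


toy_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t60\tPASS\t.\n"
    "chr1\t250\t.\tC\tT\t12\tPASS\t.\n"
    "chr2\t120\t.\tG\tA\t80\tPASS\t.\n"
)
region, quality = in_region("chr1", 1, 200), min_qual(30)
filtered = io.StringIO()
n_kept = filter_vcf_stream(toy_vcf, filtered, lambda f: region(f) and quality(f))
print(n_kept)  # 1
```

In real use `toy_vcf` and `filtered` would be open file handles. The `StringIO` objects only keep the sketch self-contained.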
|
Running statistics on SNP markers
|
|
Working with genome annotations
|
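Genome annotations can be stored either as GRanges objects imported with rtracklayer or as a TxDb object imported with GenomicFeatures.
Functions that find overlaps between GRanges objects, such as findOverlaps (whose maxgap argument allows a distance between ranges), can be used to identify genes that contain or lie near SNPs.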
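The logic of finding genes near SNPs can be sketched in plain Python. A gene is reported for a SNP when the gene's interval overlaps the SNP position widened by `max_gap` bases on each side. This is a simplified illustration, not the GenomicRanges implementation, which uses a dedicated interval data structure and has its own conventions for maxgap, strand, and zero-width ranges. The gene and SNP names, the coordinates, and the function name are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def genes_near_snps(genes, snps, max_gap=0):
    """genes: (name, chrom, start, end), 1-based inclusive; snps: (snp_id, chrom, pos).
    Return {snp_id: [gene names]} for genes overlapping pos - max_gap .. pos + max_gap."""
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))
    for ranges in by_chrom.values():
        ranges.sort()  # sort genes by start within each chromosome
    starts = {c: [r[0] for r in rs] for c, rs in by_chrom.items()}

    hits = {}
    for snp_id, chrom, pos in snps:
        lo, hi = pos - max_gap, pos + max_gap
        ranges = by_chrom.get(chrom, [])
        # only genes that start at or before the window end can overlap the window
        n = bisect_right(starts.get(chrom, []), hi)
        hits[snp_id] = [name for start, end, name in ranges[:n] if end >= lo]
    return hits


genes = [("geneA", "chr1", 1000, 2000),
         ("geneB", "chr1", 5000, 5600),
         ("geneC", "chr2", 300, 900)]
snps = [("snp1", "chr1", 1500), ("snp2", "chr1", 4900), ("snp3", "chr2", 50)]

near0 = genes_near_snps(genes, snps)
near200 = genes_near_snps(genes, snps, max_gap=200)
print(near0)    # {'snp1': ['geneA'], 'snp2': [], 'snp3': []}
print(near200)  # {'snp1': ['geneA'], 'snp2': ['geneB'], 'snp3': []}
```

With `max_gap=0` only genes that contain the SNP are reported. Widening the window picks up `geneB`, which starts 100 bases downstream of `snp2`.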
The predictCoding function in VariantAnnotation identifies amino acid changes caused by SNPs in coding regions, using a TxDb and the reference genome sequence.
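What predictCoding works out for a single SNP can be sketched in Python under strong simplifying assumptions. The sketch assumes the coding sequence (CDS) is already extracted, is on the + strand, starts at the start codon, and uses the standard genetic code. predictCoding itself maps variants from genomic to transcript coordinates using the TxDb, and also handles minus-strand transcripts and indels; none of that is reproduced here. The CDS, positions, and function name below are invented. The consequence labels are chosen to resemble, not reproduce, predictCoding's output.

```python
# Standard genetic code, codons enumerated in TCAG order for each of the 3 positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def snp_effect(cds, cds_pos, alt):
    """cds: + strand coding sequence starting at the start codon;
    cds_pos: 1-based SNP position within the CDS; alt: alternate base."""
    i = cds_pos - 1
    codon_start = i - i % 3  # first base of the codon containing the SNP
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:i % 3] + alt + ref_codon[i % 3 + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        consequence = "synonymous"
    elif alt_aa == "*":  # '*' marks a stop codon
        consequence = "nonsense"
    else:
        consequence = "nonsynonymous"
    return ref_codon, alt_codon, ref_aa, alt_aa, consequence


cds = "ATGGCTTGGAAATAA"  # ATG GCT TGG AAA TAA -> M A W K stop
print(snp_effect(cds, 6, "C"))  # ('GCT', 'GCC', 'A', 'A', 'synonymous')
print(snp_effect(cds, 9, "A"))  # ('TGG', 'TGA', 'W', '*', 'nonsense')
print(snp_effect(cds, 4, "A"))  # ('GCT', 'ACT', 'A', 'T', 'nonsynonymous')
```

A SNP at the third position of a codon is often synonymous because of redundancy in the genetic code. The first example shows this, while the other two change the encoded amino acid.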
|