This lesson is being piloted (Beta version)

Variant Analysis Workshop

Attendees will learn how to read, filter, and interrogate variant calls stored in Variant Call Format (VCF) files, using Bioconductor packages in R. Analyses may include some or all of the following topics: assessing minor allele frequency, missing data rate, heterozygosity, and/or linkage disequilibrium; working with genome annotations as TxDb objects; discovering what genes are near SNPs of interest; and identifying functional consequences of variants. Although we will not perform genome-wide association studies (GWAS) in this workshop, we will cover many tasks that one might perform before or after GWAS.

Before the workshop starts, please follow the setup instructions.

This workshop was developed by Lindsay Clark at HPCBio, Roy J. Carver Biotechnology Center, University of Illinois, Urbana-Champaign, using a website template developed by The Carpentries.

To register for this workshop on December 2 and 4, 2020, please see our Eventbrite page. Recordings of the workshop are available on Box.

For guidelines on how to help improve this lesson, please see the contribution guidelines.

Prerequisites

  • Beginner knowledge of bioinformatics concepts
    • Single nucleotide polymorphisms (SNPs)
    • Genome annotations
    • Variant and genotype calling
  • Beginner knowledge of R and Bioconductor
    • Installing packages from Bioconductor
    • Finding help pages
    • Importing data
    • Using functions
    • Indexing and subsetting data
    • Making simple functions
    • Scatter plots with ggplot2

Schedule

Setup: Download files required for the lesson
00:00  1. Intro to Variant Call Format: How are genotypes and metadata stored in a VCF?
00:35  2. Bioconductor basics: What are the Bioconductor classes for the types of data we would find in a VCF?
01:40  3. Importing a VCF into Bioconductor: How can I import and filter a VCF with Bioconductor?
02:40  4. Running statistics on SNP markers: How can I analyze a SNP dataset in R?
03:20  5. Working with genome annotations: What genes are near my SNPs of interest?
04:00  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.