Summary and Schedule
Nextflow is workflow management software which enables the writing of scalable and reproducible scientific workflows. It can integrate various software package and environment management systems such as Docker, Singularity, and Conda. It allows for existing pipelines written in common scripting languages, such as R and Python, to be seamlessly coupled together. It implements a Domain Specific Language (DSL) that simplifies the implementation and running of workflows on cloud or high-performance computing (HPC) infrastructures.
This lesson also introduces nf-core: a community-driven platform which provides peer-reviewed, best-practice analysis pipelines written in Nextflow.
This lesson motivates the use of Nextflow and nf-core as development tools for building and sharing reproducible data science workflows.
Lesson objectives
- The learner will understand the fundamental components of a Nextflow script, including channels, processes and operators.
- The learner will write a multi-step workflow script to align, quantify, and perform QC on RNA-Seq data in Nextflow DSL2.
- The learner will be able to write a Nextflow configuration file to alter the computational resources allocated to a process.
- The learner will use nf-core to run a community curated pipeline.
Prerequisites
This is an intermediate lesson and assumes familiarity with the core materials covered in the Software Carpentry Lessons. In particular, learners need to be familiar with the material covered in The Unix Shell. It is helpful to be familiar with another programming language, to the level of Plotting and Programming in Python or R for Reproducible Scientific Analysis, although this lesson does not specifically rely on Python or R. No previous knowledge of Nextflow, other workflow software, or Groovy is required.
Setup Instructions | Download files required for the lesson | |
Duration: 00h 00m | 1. Getting Started with Nextflow |
What is a workflow and what are workflow management systems? Why should I use a workflow management system? What is Nextflow? What are the main features of Nextflow? What are the main components of a Nextflow script? How do I run a Nextflow script? |
Duration: 00h 40m | 2. Workflow parameterisation |
How can I change the data a workflow uses? How can I parameterise a workflow? How can I add my parameters to a file? |
Duration: 01h 05m | 3. Channels |
How do I move data around in Nextflow? How do I handle different types of input, e.g. files and parameters? How do I create a Nextflow channel? How can I use pattern matching to select input files? How do I change the way inputs are handled? |
Duration: 01h 45m | 4. Processes |
How do I run tasks/processes in Nextflow? How do I get data, files and values, into a process? |
Duration: 02h 30m | 5. Processes Part 2 |
How do I get data, files, and values, out of processes? How do I handle grouped input and output? How can I control when a process is executed? How do I control resources, such as number of CPUs and memory, available to processes? How do I save output/results from a process? |
Duration: 03h 10m | 6. Workflow |
How do I connect channels and processes to create a workflow? How do I invoke a process inside a workflow? |
Duration: 03h 50m | 7. Operators |
How do I perform operations, such as filtering, on channels? What are the different kinds of operations I can perform on channels? How do I combine operations? How can I use a CSV file to process data into a Channel? |
Duration: 04h 30m | 8. Reporting |
How do I get information about my pipeline run? How can I see what commands I ran? How can I create a report from my run? |
Duration: 04h 55m | 9. Nextflow configuration |
What is the difference between the workflow implementation and the workflow configuration? How do I configure a Nextflow workflow? How do I assign different resources to different processes? How do I separate and provide configuration for different computational systems? How do I change configuration settings from the default settings provided by the workflow? |
Duration: 05h 40m | 10. Workflow caching and checkpointing |
How can I restart a Nextflow workflow after an error? How can I add new data to a workflow without starting from the beginning? Where can I find intermediate data and results? |
Duration: 06h 10m | 11. Simple RNA-Seq pipeline |
How can I create a Nextflow pipeline from a series of Unix commands and input data? How do I log my pipeline's parameters? How can I manage my pipeline software requirements? How do I know when my pipeline has finished? How do I see how many resources my pipeline has used? |
Duration: 07h 10m | 12. Deploying nf-core pipelines |
Where can I find best-practice Nextflow bioinformatic pipelines? How do I run nf-core pipelines? How do I configure nf-core pipelines to use my data? How do I reference nf-core pipelines? |
Duration: 07h 50m | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Setup
There are two options for setting up your computer to complete the exercises in this workshop:
- Running locally on your personal computer
- Running the exercises in your browser using a remote environment called Gitpod.
Running locally on your personal computer
Training directory
Each learner should set up a training folder, e.g. nf-training.
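A minimal sketch of this step, assuming the folder is named nf-training as in the example and is created in the current directory:

```shell
# Create the training folder (no error if it already exists) and move into it.
mkdir -p nf-training
cd nf-training
```

The later download commands can then be run from inside this folder.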
There are three items that you need to download:
- The training software.
- The training dataset.
- The workshop scripts.
Training software
The software and versions required for this training are listed below:
Software | Version |
---|---|
Nextflow | 20.10.0 |
nf-core/tools | 1.12.1 |
salmon | 1.5 |
fastqc | 0.11 |
multiqc | 1.10 |
python | 3.8 |
conda
The simplest way to install the software for this course is using conda.
To install conda, see the conda installation instructions.
An environment file, environment.yml, is provided and can be downloaded as follows:
BASH
# You can use either wget or curl to download content from the web via the command line.
# wget
wget https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml
# curl
curl -L -o environment.yml https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml
To create the training environment run:
Then activate the environment by running
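Both steps can be sketched as follows, assuming the environment is named nf-training (an assumption; adjust it if you prefer a different name) and that environment.yml has been downloaded to the current directory:

```shell
# Assumed environment name, chosen to match the training folder.
ENV_NAME="nf-training"

if command -v conda >/dev/null 2>&1 && [ -f environment.yml ]; then
    # Create the environment from the downloaded file.
    conda env create -n "$ENV_NAME" -f environment.yml
    # Activate it in the current shell (conda must be initialised for your shell).
    conda activate "$ENV_NAME"
else
    echo "conda or environment.yml not found; install conda and download the file first" >&2
fi
```

In an interactive terminal where conda is already set up, the two conda lines can be run directly without the surrounding check.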
Training scripts
To aid in the delivery of the lesson, the scripts mentioned in each episode can be found in the respective episode folders in the GitHub repository: https://github.com/carpentries-incubator/workflows-nextflow/tree/main/episodes/files/scripts
To get the scripts associated with each episode, you will need to download the scripts folder from the GitHub repository.
Below is a series of commands to download and unpack the scripts folder.
BASH
# get the git repository as a zip file
wget https://github.com/carpentries-incubator/workflows-nextflow/archive/main.zip
# or
curl -L -o main.zip https://github.com/carpentries-incubator/workflows-nextflow/archive/main.zip
# unzip the script file
unzip main.zip 'workflows-nextflow-main/episodes/files/scripts*' -d .
# mv the scripts folder to the nf-training folder
mv workflows-nextflow-main/episodes/files/scripts .
# remove the zip file and the git repo
rm -r workflows-nextflow-main main.zip
The Nextflow scripts for each episode can be found in the respective episode folders inside the scripts folder.
Data
Inside the nf-training folder, download the workshop dataset from Figshare: https://figshare.com/articles/dataset/RNA-seq_training_dataset/14822481
BASH
wget --content-disposition https://ndownloader.figshare.com/files/28531743
# or curl
curl -L -o data.tar.gz https://ndownloader.figshare.com/files/28531743
Unpack gzipped tar file:
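A minimal sketch of the unpacking step, assuming the archive was saved as data.tar.gz (the name used in the curl command; wget --content-disposition uses whatever name the server supplies, so it may differ):

```shell
ARCHIVE="data.tar.gz"   # assumed file name; change it if your download is named differently

if [ -f "$ARCHIVE" ]; then
    # x = extract, z = decompress gzip, v = list files as they are extracted, f = archive file
    tar -xzvf "$ARCHIVE"
else
    echo "$ARCHIVE not found; check the name of the downloaded file" >&2
fi
```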
Visual Studio Code editor setup
Any text editor can be used to write Nextflow scripts. A recommended code editor is Visual Studio Code.
Go to Visual Studio Code and you should see a download button. The button or buttons should be specific to your platform and the download package should be installable.
Nextflow language support in Visual Studio Code
You can add Nextflow language support in Visual Studio Code by clicking the install button on the Nextflow language extension.
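If the VS Code command-line launcher (code) is on your PATH, the extension can also be installed from a terminal. This is only a sketch: it assumes the extension's Marketplace identifier is nextflow.nextflow, which you should verify on the extension page before running it.

```shell
# Assumed Marketplace identifier of the Nextflow language extension.
EXT_ID="nextflow.nextflow"

if command -v code >/dev/null 2>&1; then
    # Install the extension using the VS Code command-line launcher.
    code --install-extension "$EXT_ID"
else
    echo "VS Code 'code' launcher not found on PATH; use the install button instead" >&2
fi
```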
Nextflow install without conda
Nextflow can be used on any POSIX-compatible system (Linux, macOS, etc.), and on Windows through WSL. It requires Bash 3.2 (or later) and Java 11 (or later, up to 22) to be installed.
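To check these requirements before installing, something like the following can be used. It only prints the versions it finds and does not install anything:

```shell
# Print the Bash version (Nextflow needs 3.2 or later).
bash --version | head -n 1

# Print the Java version if Java is installed (Nextflow needs 11 or later).
# java -version writes to stderr, so redirect it to stdout before piping.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
else
    echo "java not found on PATH" >&2
fi
```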
Nextflow installation
Install the latest version of Nextflow by copying and pasting the following snippet into a terminal window:
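A sketch of the installation, assuming the installer script published at https://get.nextflow.io is used; it downloads a nextflow launcher into the current directory:

```shell
# Download the Nextflow installer script and run it with bash.
curl -s https://get.nextflow.io | bash

# If the launcher was created, confirm it works by printing its version.
if [ -x ./nextflow ]; then
    ./nextflow -version || echo "nextflow could not start; check your Java installation" >&2
fi
```

To run nextflow from any directory, move the launcher to a folder that is on your PATH.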
Running exercises remotely in your web browser through Gitpod
Gitpod is a cloud-based computing environment that is accessed using your web-browser. You can click the button below to open up a Gitpod instance ready for training. This Gitpod environment comes with the tools necessary for the exercises already installed. You’ll be presented with a VSCode-like interface in your browser, which has a file explorer panel on the left, a main panel in which to view and edit files, and a panel below that includes a terminal in which to run unix commands.
Gitpod sessions automatically close after some period of inactivity. To open your session again, go to the Gitpod Dashboard where you can find and reopen any session.
Gitpod gives each user a usage allocation of 10 hours per month (50 if you connect your LinkedIn account).