Getting Started with Nextflow


  • A workflow is a sequence of tasks that process a set of data.
  • A workflow management system (WfMS) is a computational platform that provides an infrastructure for the set-up, execution and monitoring of workflows.
  • Nextflow is a workflow management system that comprises both a runtime environment and a domain-specific language (DSL).
  • Nextflow scripts comprise channels for controlling inputs and outputs, and processes for defining workflow tasks.
  • You run a Nextflow script using the nextflow run command.
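
The points above can be sketched as a minimal script with one channel and one process. This is only an illustration: the file name hello.nf, the process name SAY_HELLO and the sample names are made up, and DSL2 (the default in recent Nextflow releases) is assumed.

```nextflow
// hello.nf (hypothetical file name used for illustration)

// A process defines one workflow task.
process SAY_HELLO {
    input:
    val name

    output:
    stdout

    script:
    """
    echo "Hello, ${name}"
    """
}

workflow {
    // A channel carries the input values into the process.
    names_ch = Channel.of('Alice', 'Bob')

    SAY_HELLO(names_ch)

    // Print whatever each task wrote to standard output.
    SAY_HELLO.out.view()
}

// Run from a terminal with: nextflow run hello.nf
```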

Workflow parameterisation


  • Pipeline parameters are specified by prepending the prefix params to a variable name, separated by a dot character.
  • To specify a pipeline parameter on the command line for a Nextflow run, use the --variable_name syntax.
  • You can add parameters to a JSON formatted file and pass them to the script using the -params-file option.
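
As a rough sketch of how these pieces fit together, the script below declares two parameters with defaults and uses them. The parameter names greeting and name, the script name params_demo.nf and the file my_params.json are assumptions chosen for the example.

```nextflow
// params_demo.nf (hypothetical file name)

// Default values; each can be overridden at launch time.
params.greeting = 'Hello'
params.name     = 'world'

workflow {
    Channel.of("${params.greeting}, ${params.name}!").view()
}

// Override a single parameter on the command line:
//   nextflow run params_demo.nf --name Nextflow
//
// Or supply several at once from a JSON file:
//   nextflow run params_demo.nf -params-file my_params.json
// where my_params.json (hypothetical) contains:
//   { "greeting": "Hi", "name": "Nextflow" }
```

Note the double dash for pipeline parameters (--name) versus the single dash for Nextflow's own options (-params-file).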

Channels


  • Channels must be used to import data into Nextflow.
  • Nextflow has two different kinds of channels: queue channels and value channels.
  • Data in value channels can be used multiple times in a workflow.
  • Data in queue channels are consumed when they are used by a process or an operator.
  • Channel factory methods, such as Channel.of, are used to create channels.
  • Channel factory methods have optional parameters, e.g. checkIfExists, that can be used to alter the creation and behaviour of a channel.
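
The two channel kinds and a factory option might look like this in practice. The glob data/*.fq.gz and the value 'GRCh38' are placeholders, not paths or values taken from this lesson.

```nextflow
workflow {
    // Queue channel: emits each item once, in sequence.
    numbers_ch = Channel.of(1, 2, 3)
    numbers_ch.view { n -> "queue item: ${n}" }

    // Value channel: holds a single value that can be read repeatedly.
    genome_ch = Channel.value('GRCh38')
    genome_ch.view { g -> "value: ${g}" }

    // Factory method with an optional parameter: with checkIfExists set,
    // the run stops with an error if no file matches the (hypothetical) glob.
    reads_ch = Channel.fromPath('data/*.fq.gz', checkIfExists: true)
    reads_ch.view()
}
```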

Processes


  • A Nextflow process is an independent step in a workflow.
  • Processes contain up to five definition blocks: directives, inputs, outputs, a when clause and, finally, a script block.
  • The script block contains the commands you would like to run.
  • A process should have a script but the other four blocks are optional.
  • Inputs are defined in the input block with a type qualifier and a name.
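
A minimal sketch of a process using directives, an input block, an output block and a script block follows. The process name COUNT_LINES, the output file name and the input glob are assumptions for illustration only.

```nextflow
process COUNT_LINES {
    // Directives: optional settings that affect how the task is run.
    cpus 1
    tag "${reads}"

    // Input block: a type qualifier (path) followed by a name (reads).
    input:
    path reads

    output:
    path 'line_count.txt'

    // Script block: the commands to run.
    script:
    """
    wc -l < ${reads} > line_count.txt
    """
}

workflow {
    // Hypothetical input location.
    reads_ch = Channel.fromPath('data/*.fq')

    COUNT_LINES(reads_ch)
    COUNT_LINES.out.view()
}
```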

Processes Part 2


  • Outputs from a process are defined using the output block.
  • You can group input and output data from a process using the tuple qualifier.
  • The execution of a process can be controlled using the when declaration and conditional statements.
  • Files produced within a process and defined as output can be saved to a directory using the publishDir directive.
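
The sketch below combines the tuple qualifier, a when declaration and the publishDir directive in one process. The process name COPY_READS, the 'ctrl' prefix, the results/copied directory and the input glob are all invented for the example.

```nextflow
process COPY_READS {
    // Copy declared outputs into this (hypothetical) results directory.
    publishDir 'results/copied', mode: 'copy'

    // tuple groups a sample identifier with its file.
    input:
    tuple val(sample_id), path(reads)

    output:
    tuple val(sample_id), path("${sample_id}.copy.txt")

    // Only run the task for samples whose id starts with 'ctrl'.
    when:
    sample_id.startsWith('ctrl')

    script:
    """
    cp ${reads} ${sample_id}.copy.txt
    """
}

workflow {
    // Build [sample_id, file] tuples from a hypothetical set of text files.
    samples_ch = Channel.fromPath('data/*.txt')
        .map { f -> tuple(f.baseName, f) }

    COPY_READS(samples_ch)
    COPY_READS.out.view()
}
```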

Workflow


  • A Nextflow workflow is defined by invoking processes inside the workflow scope.
  • A process is invoked like a function inside the workflow scope, passing any required input parameters as arguments, e.g. FASTQC(reads_ch).
  • Process outputs can be accessed using the out attribute for the respective process object or assigning the output to a Nextflow variable.
  • Multiple outputs from a single process can be accessed using the list syntax [] and its index, or by referencing a named process output.
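
To illustrate the different ways of reaching process outputs, here is a small sketch with a process that declares two outputs. The process name MAKE_FILES, the emit names upper and lower, and the sample ids are assumptions made for the example.

```nextflow
process MAKE_FILES {
    input:
    val sample_id

    // emit gives each output a name that the workflow can reference.
    output:
    path "${sample_id}.upper.txt", emit: upper
    path "${sample_id}.lower.txt", emit: lower

    script:
    """
    echo "${sample_id}" | tr '[:lower:]' '[:upper:]' > ${sample_id}.upper.txt
    echo "${sample_id}" | tr '[:upper:]' '[:lower:]' > ${sample_id}.lower.txt
    """
}

workflow {
    ids_ch = Channel.of('SampleA', 'SampleB')

    // Invoke the process like a function and keep its outputs in a variable.
    results = MAKE_FILES(ids_ch)

    // First output, accessed by index through the out attribute.
    MAKE_FILES.out[0].view { f -> "by index: ${f.name}" }

    // Second output, accessed by name through the assigned variable.
    results.lower.view { f -> "by name: ${f.name}" }
}
```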

Operators


  • Nextflow operators are methods that allow you to modify, set or view channels.
  • Operators can be separated into several groups: filtering, transforming, splitting, combining, forking and maths operators.
  • To use an operator, use the dot notation after the Channel object, e.g. my_ch.view().
  • You can parse text items emitted by a channel, that are formatted using the CSV format, using the splitCsv operator.
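
A short sketch of chaining operators with dot notation, including splitCsv on a CSV-formatted text item. The column names sample_id and condition and the inline CSV text are made up for the example.

```nextflow
workflow {
    // Filtering and transforming: keep even numbers, then multiply by ten.
    Channel.of(1, 2, 3, 4, 5, 6)
        .filter { n -> n % 2 == 0 }
        .map { n -> n * 10 }
        .view { n -> "even x10: ${n}" }

    // Splitting: parse CSV text emitted by a channel into one record per row.
    // header: true uses the first line as column names.
    Channel.of('sample_id,condition\nS1,control\nS2,treated')
        .splitCsv(header: true)
        .view { row -> "${row.sample_id} -> ${row.condition}" }
}
```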

Reporting


  • Nextflow can produce a custom execution report with run information using the log command.
  • You can generate a report using the -t option specifying a template file.

Nextflow configuration


  • Nextflow configuration can be managed using a Nextflow configuration file.
  • Nextflow configuration files are plain text files containing a set of properties.
  • You can define process specific settings, such as cpus and memory, within the process scope.
  • You can assign different resources to different processes using the process selectors withName or withLabel.
  • You can define a profile for different configurations using the profiles scope. These profiles can be selected when launching a pipeline execution by using the -profile command-line option.
  • Nextflow configuration settings are evaluated in the order they are read in.
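
A configuration file along these lines would cover the process scope, selectors and profiles. The process name FASTQC is borrowed from the earlier example, while the label big_mem, the resource values and the queue name are assumptions, not recommendations.

```nextflow
// nextflow.config

process {
    // Defaults applied to every process.
    cpus   = 1
    memory = '2 GB'

    // Override resources for one process by name.
    withName: 'FASTQC' {
        cpus = 2
    }

    // Override resources for every process carrying this (hypothetical) label.
    withLabel: 'big_mem' {
        memory = '16 GB'
    }
}

profiles {
    standard {
        process.executor = 'local'
    }
    cluster {
        process.executor = 'slurm'
        process.queue    = 'short'   // hypothetical queue name
    }
}

// Select a profile at launch time, for example:
//   nextflow run main.nf -profile cluster
```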

Workflow caching and checkpointing


  • Nextflow automatically keeps track of all the processes executed in your pipeline via checkpointing.
  • Nextflow caches intermediate data in task directories within the work directory.
  • Nextflow caching and checkpointing allow re-entrancy into a workflow after a pipeline error or when using new data, skipping steps that have already been successfully executed.
  • Re-entrancy is enabled using the -resume option.

Simple RNA-Seq pipeline


  • Nextflow can combine tasks (processes) and manage data flows using channels in a single pipeline/workflow.
  • A workflow can be parameterised using params. The values of the parameters can be captured in a log file using log.info.
  • Nextflow can handle a workflow’s software requirements using several technologies, including the conda package and environment manager.
  • Workflow steps are connected via their inputs and outputs using Channels.
  • Intermediate pipeline results can be transformed using Channel operators such as combine.
  • Nextflow can execute an action when the pipeline completes the execution using the workflow.onComplete event handler to print a confirmation message.
  • Nextflow is able to produce multiple reports and charts providing several runtime metrics and execution information using the command line options -with-report, -with-trace, -with-timeline and produce a graph using -with-dag.
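
Some of these points can be sketched in a few lines: logging parameter values with log.info, combining channels with the combine operator, and printing a message from the workflow.onComplete handler. The parameter defaults, sample ids and index file name are placeholders, and the handler is written in the classic top-level script style.

```nextflow
// Hypothetical defaults for illustration.
params.reads  = 'data/*_{1,2}.fq.gz'
params.outdir = 'results'

// Record the parameter values in the run log.
log.info """\
    R N A - S E Q   P I P E L I N E
    ===============================
    reads  : ${params.reads}
    outdir : ${params.outdir}
    """.stripIndent()

workflow {
    samples_ch = Channel.of('S1', 'S2')
    index_ch   = Channel.of('transcriptome.idx')

    // combine pairs every sample id with the index entry.
    samples_ch.combine(index_ch).view()
}

// Runs once when the pipeline finishes.
workflow.onComplete {
    log.info(workflow.success ? "Done! Results are in ${params.outdir}" : 'Pipeline failed')
}

// Reports can be requested at launch time, for example:
//   nextflow run main.nf -with-report -with-trace -with-timeline -with-dag
```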

Deploying nf-core pipelines


  • nf-core is a community-led project to develop a set of best-practice pipelines built using the Nextflow workflow management system.
  • The nf-core tool (nf-core) is a suite of helper tools that aims to help people run and develop nf-core pipelines.
  • nf-core pipelines can be found using nf-core list, or by checking the nf-core website.
  • nf-core launch nf-core/<pipeline> can be used to write a parameter file for an nf-core pipeline. This can be supplied to the pipeline using the -params-file option.
  • An nf-core workflow is run using nextflow run nf-core/<pipeline> syntax.
  • nf-core pipelines can be reconfigured by using custom config files and/or adding command line parameters.