Using Nextflow to Create Bioinformatics Pipelines (Part 1: Why Use Nextflow?)

Faraz Ahmed
Mar 23, 2022
nextflow.io

Howdy folks! I am writing this to document and share my journey as I learn Nextflow and bring it into my bioinformatics workflows.

What is Nextflow?

Nextflow is a domain-specific language (DSL) and workflow manager developed by the friendly folks at Seqera Labs. It was designed with common bioinformatics datasets and data structures in mind, and it has many tricks up its sleeve that allow for fast prototyping and deployment on almost any platform, from a laptop to an HPC cluster or the cloud.

Who is it for?

If you need to process hundreds of DNA/RNA sequencing samples every week, then Nextflow is for you!

Many bioinformatics tasks, such as running FastQC, are executed sequentially by default, which eats up a lot of wall-clock time while you wait to start the next step of your workflow. For example, to run FastQC on all of your FASTQ files, you might use a for loop like the one below:

# Run FastQC on each gzipped FASTQ file in the current directory, one at a time
for fq in *.fastq.gz; do fastqc "$fq"; done

Note that this loop is sequential: it processes only one file at a time. If FastQC takes 10 minutes per FASTQ file, the total runtime adds up very quickly as you process hundreds of samples; 100 files alone would take 1,000 minutes, or almost 17 hours.
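To make that cost concrete, here is a rough back-of-the-envelope sketch in shell. It assumes every file takes the same fixed time and that up to SLOTS files can be processed at once (SLOTS=1 corresponds to the sequential for loop). `estimate_minutes` is a hypothetical helper written for illustration; real runtimes will vary with file size and the CPUs and memory available.

```shell
#!/usr/bin/env bash
# Idealized runtime estimate for running FastQC over many files.
# Assumptions (illustrative only): every file takes the same time, and
# SLOTS files can run at once (1 = the sequential for loop).

# estimate_minutes FILES MINUTES_PER_FILE SLOTS
# Prints the estimated wall-clock time in minutes.
estimate_minutes() {
    local files=$1 minutes_per_file=$2 slots=$3
    # Number of "rounds" needed is ceil(files / slots), via integer arithmetic
    local rounds=$(( (files + slots - 1) / slots ))
    echo $(( rounds * minutes_per_file ))
}

# Sequential: 100 files at 10 minutes each
estimate_minutes 100 10 1   # prints 1000
# Same workload with 8 files processed at a time
estimate_minutes 100 10 8   # prints 130
```

In this idealized model, processing 8 independent files at a time cuts the estimate from 1,000 minutes to 130, which illustrates why running per-sample steps in parallel matters so much at scale.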
