Using nextflow to create Bioinformatics Pipelines (Part 1 — Why use Nextflow?)

3 min read · Mar 23, 2022

Howdy folks! I am writing this to document and share my journey as I learn Nextflow and bring it into my bioinformatics workflows.

What is nextflow?

Nextflow is a Domain-Specific Language (DSL) developed by the friendly folks at Seqera Labs. It was created specifically to handle common bioinformatics datasets and data structures, and it has many tricks up its sleeve that allow for fast prototyping and deployment on a wide range of platforms.

Who is it for?

If you need to process hundreds of sequenced DNA/RNA samples every week, then Nextflow is for you!

Many bioinformatics tasks, such as running fastqc, process one input at a time by default, so a lot of compute time is spent waiting before the next step in your workflow can start. For example, to run fastqc on all of your fastq files, you might use a for loop like the one below.

for i in *.fastq.gz; do fastqc "$i"; done

Note that this loop is sequential: it processes only one file at a time. If fastqc takes 10 minutes per fastq file, the total run time adds up very quickly once you are processing hundreds of samples.
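To put rough numbers on that, here is a quick back-of-the-envelope sketch in bash. The 10 minutes per file comes from the example above; the sample count (200) and the number of files processed at once (8) are made-up values for illustration, not measurements.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope runtime estimate. All numbers are illustrative assumptions.
samples=200          # hypothetical number of fastq files
minutes_per_file=10  # per-file fastqc time used in the example above
workers=8            # hypothetical number of files processed at the same time

# Sequential loop: every file waits for the previous one to finish.
sequential=$(( samples * minutes_per_file ))

# Parallel run: files go through in batches of $workers (ceiling division).
batches=$(( (samples + workers - 1) / workers ))
parallel=$(( batches * minutes_per_file ))

echo "sequential: ${sequential} min"
echo "parallel (${workers} at a time): ${parallel} min"
```

With these assumed values, the sequential loop needs 2000 minutes (over 33 hours), while running 8 files at a time would need about 250 minutes. Real speedups depend on how many cores and how much memory and disk bandwidth you have, so treat this as an idealized estimate.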

Written by Faraz Ahmed
