Nextflow goes to university!

  • Marcel Ribeiro-Dantas
  • 24 July 2023

The Nextflow project originated within an academic research group, so perhaps it’s no surprise that education is an essential part of the Nextflow and nf-core communities. Over the years, we have established several regular training resources: we have a weekly online seminar series called nf-core/bytesize and run hugely popular bi-annual Nextflow and nf-core community training online. In 2022, Seqera established a new community and growth team, funded in part by a Chan Zuckerberg Initiative “Essential Open Source Software for Science” grant. We are all former bioinformatics researchers from academia, and part of our mission is to build resources and programs to support academic institutions. We want to help provide leading-edge, high-quality Nextflow and nf-core training for Master’s and Ph.D. students in bioinformatics and related fields.

We recently held one of our first such projects, a collaboration with the Bioinformatics Multidisciplinary Environment (BioME) at the Federal University of Rio Grande do Norte (UFRN) in Brazil. UFRN is one of the largest universities in Brazil, with over 40,000 enrolled students, and hosts one of the country’s best-ranked bioinformatics programs, which attracts students from all over Brazil. The BioME department runs courses for Master’s and Ph.D. students, including a flexible course dedicated to cutting-edge bioinformatics techniques. As part of this, we were invited to run an 8-day Nextflow and nf-core graduate course. Participants attended 5 days of training seminars and presented a Nextflow project at the end of the course. Upon successful completion of the course, participants received graduate program course credits as well as a Seqera Labs certificate recognizing their knowledge and hands-on experience 😎.

The course participants included one undergraduate student, Master’s students, Ph.D. students, and postdocs with very diverse backgrounds. While some had prior Nextflow and nf-core experience and had already attended Nextflow training, others had never used it. Unsurprisingly, they all chose very different project topics to work on and present to the rest of the group. At the end of the course, eleven students chose to undergo the final project evaluation for the Seqera certification. They all passed with flying colors!

Picture with some of the students that attended the course

Final projects

Final hands-on projects are very useful, not only for practicing new skills but also for having a tangible deliverable at the end of the course. A project can be the first step of a long journey with Nextflow, especially if it lives on after the course concludes. Participants were given complete freedom to design a project that was relevant to them and their interests. Many students were very satisfied with their projects and intend to continue working on them after the course ends.

Euryale 🐍

João Vitor Cavalcante, along with collaborators, had developed and published a Snakemake pipeline for Sensitive Taxonomic Classification and Flexible Functional Annotation of Metagenomic Shotgun Sequences called MEDUSA. During the course, after seeing the huge potential of Nextflow, he decided to fully translate this pipeline to Nextflow, but with a new name: Euryale. You can check the result here 😍 Why Euryale? In Greek mythology, Euryale was one of the three gorgons, a sister to Medusa 🤓

Bringing Nanopore to Google Batch ☁️

The Customer Workflows Group at Oxford Nanopore Technologies (ONT) has adopted Nextflow to develop and distribute general-purpose pipelines for its customers. One of these pipelines, wf-alignment, takes a FASTQ directory and a reference directory and outputs a minimap2 alignment, along with samtools stats and an HTML report. Both the samtools stats and the HTML report generated by this pipeline are well suited for Nextflow Tower’s Reports feature. However, Danilo Imparato noticed that the pipeline lacked support for using Google Cloud as a compute environment and decided to address this limitation in his final project, which included fixing a few bugs specific to running it on Google Cloud and making the reports available on Nextflow Tower 🤯

Nextflow applied to Economics! 🤩

Galileu Nobre is studying Economic Sciences and decided to convert his scripts into a Nextflow pipeline for his final project. The goal of the pipeline is to estimate the demand for health services in Brazil based on data from the 2019 PNS (National Health Survey). The pipeline (a) filters this database to keep only the variables needed for the analysis and (b) runs a descriptive analysis of the data distribution to investigate which models are most applicable. In the end, two regression models, Poisson and Negative Binomial, are used to estimate the demand. His work is an excellent example of applying Nextflow to fields outside of traditional bioinformatics 😉.

Whole Exome Sequencing 🧬

For her final project, Rafaella Ferraz used nf-core/tools to write a whole-exome sequencing analysis pipeline from scratch. She applied her new skills using nf-core modules and sub-workflows to achieve this and was able to launch and monitor her pipeline using Nextflow Tower. Kudos to Rafaella! 👏🏻

RNA-Seq with contamination 🧫

In her final project, Iara Souza developed a bioinformatics pipeline for analyzing RNA-Seq data that requires an extra pre-filtering step. She needed this for data from RNA-Seq experiments performed in cell culture, where there is a high probability of contamination of the target transcriptome with the host transcriptome. Iara learned how to use nf-core/tools and benefited from all the “batteries included” that come with it 🔋😬

SARS-CoV-2 Genome assembly and lineage classification 🦠

Diego Teixeira has been working with SARS-CoV-2 genome assembly and lineage classification. As his final project, he wrote a Nextflow pipeline that brings together all the tools and analyses he had been using, allowing him to be much more efficient in his work and to have a reproducible pipeline that can easily be shared with collaborators.

In the nf-core project, there are almost a thousand modules ready to plug into your pipeline, together with dozens of full-featured pipelines. However, in many situations, you’ll need a custom pipeline. With that in mind, it’s very useful to master Nextflow scripting so that you can take advantage of everything that is available, both when building new pipelines and when modifying public ones.

Exciting experience!

It was an amazing experience to see what each participant had worked on for their final projects! 🤯 They were all able to master the skills required to write Nextflow pipelines in real-life scenarios, and these pipelines can continue to be used well after the end of the course. For people just starting their adventure with Nextflow, it can feel overwhelming to use nf-core tools with all the associated best practices, but the students surprised me by using nf-core tools from the very beginning and delivering projects that followed the best practices almost perfectly 🤩

We’d love to help out with more university bioinformatics courses like this. If you think your institution could benefit from such an experience, please don’t hesitate to reach out to us at community@seqera.io. We would love to hear from you!
