This two-part blog series aims to help users understand Nextflow’s powerful caching mechanism. Part one describes how it works, whilst part two will focus on execution provenance and troubleshooting.
Task execution caching and checkpointing are essential features of any modern workflow manager, and Nextflow provides an automated caching mechanism with every workflow execution. When using the -resume flag, successfully completed tasks are skipped and the previously cached results are used in downstream tasks. However, understanding the specifics of how it works, and debugging situations where the behaviour is not as expected, is a common source of frustration.
The mechanism works by assigning a unique ID to each task. This unique ID is used to create a separate execution directory, called the working directory, where the task is executed and its results are stored. A task’s unique ID is generated as a 128-bit hash number obtained from a composition of the task’s:
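To make the idea of content-addressed task IDs concrete, here is a minimal Python sketch. It is not Nextflow's code. The component strings are hypothetical placeholders, since the actual list of hashed attributes is in the full post. MD5 is swapped in only because it yields a 128-bit digest. The two-level work/ab/cdef… directory layout is an assumption for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def task_id(*components: str) -> str:
    """Return a 128-bit hex digest over the given task components (hypothetical stand-in)."""
    h = hashlib.md5()  # MD5 used only because it produces 128 bits; not Nextflow's actual hash
    for part in components:
        data = part.encode("utf-8")
        # length-prefix each component so ("ab", "c") and ("a", "bc") hash differently
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def work_dir(base: str, tid: str) -> PurePosixPath:
    # assumed layout: a 2-character prefix directory followed by the rest of the ID
    return PurePosixPath(base) / tid[:2] / tid[2:]


# hypothetical task components, for illustration only
first = task_id("script: echo hello", "input: sample1.fq")
rerun = task_id("script: echo hello", "input: sample1.fq")
changed = task_id("script: echo hello!", "input: sample1.fq")

print(first == rerun)    # True: same components give the same ID, so a resumed run can reuse the directory
print(first == changed)  # False: any change produces a new ID and therefore a new directory
print(work_dir("work", first))
```

The key property is determinism. Because identical components always produce the same digest, a resumed run can find the earlier working directory. Changing any component, even by one character, sends the task to a fresh directory.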
The ability to create components, libraries or module files has been among the most requested features over the years.
The implementation of this feature has opened up the possibility of many fantastic improvements to Nextflow and its syntax. We are extremely excited, as it results in a radically new way of writing Nextflow applications! So much so that we are referring to these changes as DSL 2.
Since this is still a preview technology, and above all to avoid breaking any existing applications, you will need to add the following line at the beginning of your workflow script to enable the new syntax:
A module file simply consists of one …
We are excited to announce the new Nextflow 19.04.0 stable release!
This version includes numerous bug fixes, enhancements and new features.
In this release, we are making the new interactive rich output, based on ANSI escape characters, the default logging option. This produces a much more readable and easy-to-follow log of the running workflow.
The ANSI log is implicitly disabled when Nextflow is launched in the background, i.e. when using the -bg option. It can also be explicitly disabled using the -ansi-log false option or by setting the NXF_ANSI_LOG=false variable in your launching environment.
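The three disabling conditions can be summarised as a small decision function. This Python sketch only models the rules stated above, and the function and parameter names are hypothetical. It treats any one of the three conditions as sufficient to turn the ANSI log off. How Nextflow itself resolves conflicting settings, such as an explicit option contradicting the environment variable, is not covered.

```python
import os
from typing import Mapping, Optional


def ansi_log_enabled(background: bool,
                     ansi_log_option: Optional[bool] = None,
                     env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical model: the ANSI log is on by default unless a disabling condition applies."""
    if background:                                       # launched with -bg
        return False
    if ansi_log_option is False:                         # -ansi-log false
        return False
    if env.get("NXF_ANSI_LOG", "").lower() == "false":   # NXF_ANSI_LOG=false in the environment
        return False
    return True                                          # default: rich ANSI output


print(ansi_log_enabled(background=False, env={}))                         # True
print(ansi_log_enabled(background=True, env={}))                          # False
print(ansi_log_enabled(background=False, env={"NXF_ANSI_LOG": "false"}))  # False
```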
Support for the NCBI SRA archive was introduced in the previous edge release. Given the very positive reaction, we are graduating this feature into the stable release for general availability.
It's time for the monthly Nextflow release for March, edge version 19.03. This is another great release with some cool new features, bug fixes and improvements.
This release sees the introduction of the long-awaited Sequence Read Archive (SRA) channel factory. The SRA is a key public repository for sequencing data and is run in coordination between the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ).
This feature dates all the way back to 2015 and was worked on during a 2018 Nextflow hackathon. It was brought to the fore again thanks to the release of Phil Ewels' excellent SRA Explorer. The SRA channel factory allows users to pull read data in FASTQ format directly from SRA by referencing a study, accession ID or even a keyword. It works in a similar way to fromFilePairs, returning a sample ID …
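For readers unfamiliar with fromFilePairs, a simplified Python model shows the shape of what it emits. Files that share a sample prefix are grouped into a (sample ID, [read 1, read 2]) tuple. This is not Nextflow's implementation. The *_1/*_2.fastq.gz naming pattern, the function name and the choice to drop incomplete pairs are all assumptions for illustration. Note also that the SRA factory downloads its files from the archive rather than matching local names.

```python
import re
from collections import defaultdict

# hypothetical naming pattern: <sample>_1.fastq.gz and <sample>_2.fastq.gz
PAIR_RE = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fastq\.gz$")


def group_pairs(filenames):
    """Group read files into (sample_id, [mate1, mate2]) tuples, fromFilePairs-style."""
    groups = defaultdict(dict)
    for name in filenames:
        m = PAIR_RE.match(name)
        if m:  # files that do not match the pattern are ignored
            groups[m["sample"]][m["mate"]] = name
    # keep complete pairs only, ordered by sample ID, with mate 1 before mate 2
    return [(sample, [mates["1"], mates["2"]])
            for sample, mates in sorted(groups.items())
            if set(mates) == {"1", "2"}]


files = ["SRR001_2.fastq.gz", "SRR001_1.fastq.gz", "SRR002_1.fastq.gz", "notes.txt"]
print(group_pairs(files))
# [('SRR001', ['SRR001_1.fastq.gz', 'SRR001_2.fastq.gz'])]
```

Grouping by a captured sample prefix, rather than relying on input order, keeps mates together even when the files arrive shuffled.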
Google Cloud and WuXi NextCODE are dedicated to advancing the state of the art in biomedical informatics, especially through open source, which allows developers to collaborate broadly and deeply.
WuXi NextCODE is itself a user of Nextflow, and Google Cloud has many customers that use Nextflow. Together, we’ve collaborated to deliver Google Cloud Platform (GCP) support for Nextflow using the Google Pipelines API. Pipelines API is a managed computing service that allows the execution of containerized workloads …
Today marks an important milestone in the Nextflow project. We are thrilled to announce three important changes to better meet users’ needs and ground the project on a solid foundation upon which to build a vibrant ecosystem of tools and data analysis applications for genomic research and beyond.
Nextflow was originally licensed as GPLv3 open source software more than five years ago. GPL is designed to promote the adoption and spread of open source software and culture. On the other hand, it also has some controversial side effects, such as its impact on derivative works and the legal implications that make the use of GPL-released software a headache in many organisations. We have previously discussed these concerns in this blog post and, after community feedback, have opted to change the project license to Apache 2.0.
This is a popular permissive free software license written by …
One key feature of Nextflow is the ability to automatically pull and execute a workflow application directly from a sharing platform such as GitHub. We realised this was critical to allow users to properly track code changes and releases and, above all, to enable the seamless sharing of workflow projects.
Nextflow never wanted to implement its own centralised workflow registry, because we thought that for a registry to be viable, and therefore useful, it should be technology agnostic and driven by a consensus among the wider user community.
This is exactly what the Dockstore project is designed for, and for this reason we are thrilled to announce that Dockstore has added support for Nextflow workflows in its latest release! …
Over the past week there was some discussion on social media regarding the Nextflow license and its impact on users' workflow applications.
Konrad Rudolph 👨🔬💻 (@klmr), July 10, 2018: "… don’t use Nextflow, yo. https://t.co/Paip5W1wgG"
Jeff Gentry (@geoffjentry), July 10, 2018: "GPL is generally considered toxic to companies due to fear of the viral nature of the license."
Nextflow aims to ease the development of large-scale, reproducible workflows, allowing developers to focus on the main application logic and to rely on the best community tools and practices.
For this reason we are very excited to announce that the latest Nextflow version (0.30.0) finally provides built-in support for Conda.
Conda is a popular package manager that simplifies the installation of software packages and the configuration of complex software environments. Above all, it provides access to large tool and software package collections maintained by domain-specific communities such as Bioconda and BioBuild.
The native integration with Nextflow allows researchers to develop workflow applications in a rapid and easily repeatable manner, reusing community tools, whilst taking advantage of the configuration flexibility, portability and scalability provided by Nextflow.
Nextflow automatically creates and activates the Conda environment(s) given the dependencies specified by each process.
Dependencies are specified by using the …
Nextflow is growing up. The past week marked five years since the first commit of the project on GitHub. Like a parent reflecting on their child attending school for the first time, we know reaching this point hasn’t been an entirely solo journey, despite Paolo's best efforts!
A lot has happened recently and we thought it was time to highlight some of the recent evolutions. We also take the opportunity to extend the warmest of thanks to all those who have contributed to the development of Nextflow as well as the fantastic community of users who consistently provide ideas, feedback and the occasional late night banter on the Gitter channel.
Here are a few neat developments churning out of the birthday cake mix.
nf-core is a community effort to provide a home for high-quality, production-ready, curated analysis pipelines built using Nextflow. The project has been initiated and is being …
This is a guest post authored by Maxime Garcia from the Science for Life Laboratory in Sweden. Max describes how they deploy complex cancer data analysis pipelines using Nextflow and Singularity. We are very happy to share their experience with the Nextflow community.
Cancer Analysis Workflow (CAW for short) is a Nextflow-based analysis pipeline developed for the analysis of tumour/normal pairs. It is developed in collaboration with two infrastructures within Science for Life Laboratory: the National Genomics Infrastructure (NGI), specifically the Stockholm Genomics Applications Development Facility, and the National Bioinformatics Infrastructure Sweden (NBIS).
CAW is based on GATK Best Practices for the preprocessing of FastQ files, then uses various variant calling tools to look for somatic SNVs and small indels (MuTect1, MuTect2, Strelka, Freebayes), (GATK HaplotypeCaller), for structural variants …
The latest Nextflow release (0.26.0) includes built-in support for AWS Batch, a managed computing service that allows the execution of containerised workloads over the Amazon EC2 Container Service (ECS).
This feature allows the seamless deployment of Nextflow pipelines in the cloud by offloading process executions as managed Batch jobs. The service takes care of spinning up the required computing instances on demand, scaling the number and composition of instances up and down to best accommodate the actual workload resource needs at any point in time.
AWS Batch shares Nextflow’s vision of workflow containerisation, i.e. each compute task is executed in its own Docker container. This dramatically simplifies workflow deployment, which only requires downloading a few container images. This common design background made support for AWS Batch a natural extension for Nextflow.
Batch is organised into Compute Environments, Job queues, Job definitions and …
Last week saw the inaugural Nextflow meeting organised at the Centre for Genomic Regulation (CRG) in Barcelona. The event combined talks, demos, a tutorial/workshop for beginners as well as two hackathon sessions for more advanced users.
Nearly 50 participants attended over the two days which included an entertaining tapas course during the first evening!
One of the main objectives of the event was to bring together Nextflow users to work together on common interest projects. There were several proposals for the hackathon sessions and in the end five diverse ideas were chosen for communal development, ranging from new pipelines to the addition of new features in Nextflow.
The proposals and outcomes of each of the projects, which can be found in the issues section of this GitHub repository, have been summarised below.
The HTML tracing project aims to generate a rendered version of the Nextflow trace file to …
The Common Workflow Language (CWL) is a specification for defining workflows in a declarative manner. It has been implemented to varying degrees by different software packages. Nextflow and CWL share a common goal of enabling portable reproducible workflows.
We are currently investigating the automatic conversion of CWL workflows into Nextflow scripts to increase the portability of workflows. This work is being developed as the cwl2nxf project, currently in early prototype stage.
The first phase of the project was to determine mappings from CWL to Nextflow and to familiarize ourselves with how the current converter implementation supports a number of CWL-specific features.
Inputs in the CWL workflow file are initially parsed as channels or other Nextflow input types. Each step specified in the workflow is then parsed independently. At the time of writing, subworkflows are not supported: each step must be a CWL CommandLineTool …
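The "no subworkflows" restriction amounts to a simple check over the parsed workflow. This Python sketch is a hypothetical illustration, not code from cwl2nxf. It assumes the CWL document has already been loaded into a dictionary (for example from YAML) and that steps use the map form with inline run definitions. References to external .cwl files are rejected here because they would need to be loaded before they could be checked.

```python
from typing import Any, Dict, List


def check_steps(workflow: Dict[str, Any]) -> List[str]:
    """Return the step IDs a converter with this limitation could translate.

    Raises ValueError for any step whose run process is not an inline
    CommandLineTool, e.g. a nested Workflow (a subworkflow).
    """
    if workflow.get("class") != "Workflow":
        raise ValueError("top-level document must have class: Workflow")
    accepted = []
    for step_id, step in workflow.get("steps", {}).items():
        run = step.get("run")
        if isinstance(run, str):
            # external file reference: this sketch does not load it
            raise ValueError(f"step {step_id!r}: external run file {run!r} must be loaded first")
        cls = run.get("class") if isinstance(run, dict) else None
        if cls != "CommandLineTool":
            raise ValueError(f"step {step_id!r}: unsupported run class {cls!r}")
        accepted.append(step_id)
    return accepted


flat = {"class": "Workflow",
        "steps": {"align": {"run": {"class": "CommandLineTool"}},
                  "qc": {"run": {"class": "CommandLineTool"}}}}
print(check_steps(flat))  # ['align', 'qc']
```

A workflow whose step runs another Workflow fails the check at that step, which mirrors the limitation described above.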
We are excited to announce the first Nextflow workshop that will take place at the Barcelona Biomedical Research Park building (PRBB) on 14-15th September 2017.
This event is open to everybody who is interested in the problem of computational workflow reproducibility. Leading experts and users will discuss the current state of the Nextflow technology and how it can be applied to manage -omics analyses in a reproducible manner. Best practices will be introduced on how to deploy real-world large-scale genomic applications for precision medicine.
During the hackathon, organized for the second day, participants will have the opportunity to learn how to write self-contained, replicable data analysis pipelines along with Nextflow expert developers.
10.00 Welcome & introduction
Comparative Bioinformatics, CRG, Spain