Running Nextflow on AWS Batch

Lee Pang

Genomics Specialist, Amazon Web Services, USA

The amount of genomic sequence data has increased exponentially year over year since the introduction and continuous improvement of next-generation sequencing (NGS) techniques nearly a decade ago. While this data was traditionally processed on on-premises computing clusters, the scale of recent datasets and the processing throughput needed for clinical sequencing applications can easily exceed the capacity and availability of such hardware. In contrast, the cloud offers virtually unlimited, on-demand computing resources that can be leveraged for efficient, performant, and cost-effective genomics analysis workloads.

Nextflow is a highly scalable, reactive, open-source workflow framework that runs on infrastructure ranging from personal laptops to on-premises HPC clusters to the cloud, using services like AWS Batch, a fully managed batch processing service from Amazon Web Services.

This tutorial will walk you through how to set up AWS infrastructure and Nextflow to run genomics analysis pipelines in the cloud. You will learn how to create AWS Batch Compute Environments and Job Queues that leverage Amazon EC2 Spot and On-Demand instances. Using these resources, you will build an architecture that runs Nextflow entirely on AWS Batch in a cost-effective and scalable fashion. You will also learn how to process large genomic datasets from public and private S3 buckets. By the end of the session, you will have the resources you need to build a genomics workflow environment with Nextflow on AWS, both from scratch and using automated mechanisms like CloudFormation.


Lee is a Technical Business Development Manager specializing in Genomics and Life Sciences workloads on AWS. He has over 10 years of hands-on experience as a practicing research scientist and software engineer in bioinformatics, computational systems biology, and data science, developing tools ranging from high-throughput pipelines for *omics data processing to HIPAA-compliant software for clinical data capture and analysis.


To attend Nextflow Camp 2019, register at this link.