Parallel and scalable workflow for the analysis of RNA modifications using Oxford Nanopore direct RNA sequencing

Luca Cozzuto

Bioinformatics Research Scientist, Centre for Genomic Regulation (CRG), Spain

The direct RNA sequencing platform offered by Oxford Nanopore Technologies measures RNA molecules directly, without conversion to complementary DNA (cDNA), and is therefore, in principle, capable of detecting any RNA modification present in the molecule being sequenced. Although the technology has been publicly available since 2017, the complexity of the raw current intensity data generated by nanopore sequencing, together with the lack of systematic and reproducible pipelines for the analysis of direct RNA sequencing datasets, has greatly hindered access to this technology for the general user. Here we provide a scalable and parallelizable in silico workflow for the analysis of direct RNA sequencing reads, which converts raw current intensities into multiple types of processed data, providing run quality metrics, per-gene counts, RNA modification predictions and polyA tail length predictions. The workflow, which is built using the Nextflow framework and distributed with Docker and Singularity containers, can be executed on any Unix-compatible OS on a single computer, a cluster or the cloud, without installing any additional software or dependencies. Moreover, the workflow is easily extensible, as it can incorporate updated software versions or newly released algorithms in a modular manner. We expect that our pipeline will greatly simplify the analysis of direct RNA sequencing datasets, make it accessible to non-experts in bioinformatics, and thus boost our understanding of the epitranscriptome at single-molecule resolution.


Luca Cozzuto is a bioinformatician specializing in NGS data who studied biotechnology at the University of Naples Federico II. He obtained his PhD at the European School of Molecular Medicine and since 2010 has been part of the Bioinformatics Core at the Centre for Genomic Regulation in Barcelona. He is mainly involved in providing data analysis, pipeline development and training to researchers.


