Working with files
Opening files
To access and work with files, use the file()
method, which returns a file system object given a file path string:
myFile = file('some/path/to/my_file.file')
The file()
method can reference both files and directories, depending on what the string path refers to in the file system.
When using the wildcard characters *
, ?
, []
and {}
, the argument is interpreted as a glob path matcher and the file()
method returns a list object holding the paths of files whose names match the specified pattern, or an empty list if no match is found:
listOfFiles = file('some/path/*.fa')
Note
The file()
method does not return a list if only one file is matched. Use the files()
method to always return a list.
Note
A double asterisk (**
) in a glob pattern works like *
but also searches through subdirectories.
By default, wildcard characters do not match directories or hidden files. For example, if you want to include hidden files in the result list, enable the hidden
option:
listWithHidden = file('some/path/*.fa', hidden: true)
Note
To compose paths, instead of string interpolation, use the resolve()
method or the /
operator:
def dir = file('s3://bucket/some/data/path')
def sample1 = dir.resolve('sample.bam') // correct
def sample2 = dir / 'sample.bam'
def sample3 = file("$dir/sample.bam") // correct (but verbose)
def sample4 = "$dir/sample.bam" // incorrect
Getting file attributes
The file()
method returns a Path, which has several methods for retrieving metadata about the file:
def path = file('/some/path/file.txt')
assert path.baseName == 'file'
assert path.extension == 'txt'
assert path.name == 'file.txt'
assert path.parent == '/some/path'
Tip
In Groovy, any method that looks like get*()
can also be accessed as a field. For example, myFile.getName()
is equivalent to myFile.name
, myFile.getBaseName()
is equivalent to myFile.baseName
, and so on.
See the Path reference for the list of available methods.
Reading and writing
Reading and writing an entire file
Given a file variable, created with the file()
method as shown previously, reading a file is as easy as getting the file’s text
property, which returns the file content as a string:
print myFile.text
Similarly, you can save a string to a file by assigning it to the file’s text
property:
myFile.text = 'Hello world!'
Binary data can managed in the same way, just using the file property bytes
instead of text
. Thus, the following example reads the file and returns its content as a byte array:
binaryContent = myFile.bytes
Or you can save a byte array to a file:
myFile.bytes = binaryContent
Note
The above assignment overwrites any existing file contents, and implicitly creates the file if it doesn’t exist.
Warning
The above methods read and write the entire file contents at once, in a single variable or buffer. For this reason, when dealing with large files it is recommended that you use a more memory efficient approach, such as reading/writing a file line by line or using a fixed size buffer.
Appending to a file
In order to append a string value to a file without erasing existing content, you can use the append()
method:
myFile.append('Add this line\n')
Or use the left shift operator, a more idiomatic way to append text content to a file:
myFile << 'Add a line more\n'
Reading a file line by line
In order to read a text file line by line you can use the method readLines()
provided by the file object, which returns the file content as a list of strings:
myFile = file('some/my_file.txt')
allLines = myFile.readLines()
for( line : allLines ) {
println line
}
This can also be written in a more idiomatic syntax:
file('some/my_file.txt')
.readLines()
.each { println it }
Warning
The method readLines()
reads the entire file at once and returns a list containing all the lines. For this reason, do not use it to read big files.
To process a big file, use the method eachLine()
, which reads only a single line at a time into memory:
count = 0
myFile.eachLine { str ->
println "line ${count++}: $str"
}
Advanced file reading
The classes Reader
and InputStream
provide fine-grained control for reading text and binary files, respectively.
The method newReader()
creates a Reader object for the given file that allows you to read the content as single characters, lines or arrays of characters:
myReader = myFile.newReader()
String line
while( line = myReader.readLine() ) {
println line
}
myReader.close()
The method withReader()
works similarly, but automatically calls the close()
method for you when you have finished processing the file. So, the previous example can be written more simply as:
myFile.withReader {
String line
while( line = it.readLine() ) {
println line
}
}
The methods newInputStream()
and withInputStream()
work similarly. The main difference is that they create an InputStream object useful for writing binary data.
See the Path reference for the list of available methods.
Advanced file writing
The Writer
and OutputStream
classes provide fine-grained control for writing text and binary files, respectively, including low-level operations for single characters or bytes, and support for big files.
For example, given two file objects sourceFile
and targetFile
, the following code copies the first file’s content into the second file, replacing all U
characters with X
:
sourceFile.withReader { source ->
targetFile.withWriter { target ->
String line
while( line=source.readLine() ) {
target << line.replaceAll('U','X')
}
}
}
See the Path reference for the list of available methods.
Filesystem operations
Methods for performing filesystem operations such as copying, deleting, and directory listing are documented in the Path reference.
Listing directories
The simplest way to list a directory is to use list()
or listFiles()
, which return a collection of first-level elements (files and directories) of a directory:
for( def file : file('any/path').list() ) {
println file
}
Additionally, the eachFile()
method allows you to iterate through the first-level elements only (just like listFiles()
). As with other each*()
methods, eachFile()
takes a closure as a parameter:
myDir.eachFile { item ->
if( item.isFile() ) {
println "${item.getName()} - size: ${item.size()}"
}
else if( item.isDirectory() ) {
println "${item.getName()} - DIR"
}
}
Copying files
In general, you should not need to manually copy files, because Nextflow will automatically stage files in and out of the task environment based on the definition of process inputs and outputs. Ideally, any operation which transforms files should be encapsulated in a process, in order to leverage Nextflow’s staging capabilities as much as possible.
Remote files
Nextflow can work with many kinds of remote files and objects using the same interface as for local files. The following protocols are supported:
HTTP(S) / FTP (
http://
,https://
,ftp://
)Amazon S3 (
s3://
)Azure Blob Storage (
az://
)Google Cloud Storage (
gs://
)
To reference a remote file, simple specify the URL when opening the file:
pdb = file('http://files.rcsb.org/header/5FID.pdb')
You can then access it as a local file as described previously:
println pdb.text
Note
Not all operations are supported for all protocols. In particular, writing and directory listing are not supported for HTTP(S) and FTP paths.
Note
Additional configuration may be required to work with cloud object storage (e.g. to authenticate with a private bucket). Refer to the respective page for each cloud storage provider for more information.