Scripts

The Nextflow scripting language is an extension of the Groovy programming language. Groovy is a powerful programming language for the Java virtual machine. The Nextflow syntax has been specialized to ease the writing of computational pipelines in a declarative manner.

Nextflow can execute any piece of Groovy code or use any library for the JVM platform.

For a detailed description of the Groovy programming language, refer to the official Groovy documentation.

Below you can find a crash course in the most important language constructs used in the Nextflow scripting language.

Warning

Nextflow uses UTF-8 as the default character encoding for source files. Make sure to use UTF-8 encoding when editing Nextflow scripts with your preferred text editor.

Warning

Nextflow scripts have a maximum size of 64 KiB. To avoid this limit for large pipelines, consider moving pipeline components into separate files and including them as modules.

Groovy basics

Hello world

Printing something is as easy as calling the print or println method.

println "Hello, World!"

The only difference between the two is that the println method implicitly appends a newline character to the printed string.

Variables

To define a variable, simply assign a value to it:

x = 1
println x

x = new java.util.Date()
println x

x = -3.1499392
println x

x = false
println x

x = "Hi"
println x

Lists

A List object can be defined by placing the list items in square brackets:

myList = [1776, -1, 33, 99, 0, 928734928763]

You can access a given item in the list with square-bracket notation (indexes start at 0):

println myList[0]

In order to get the length of the list use the size method:

println myList.size()
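
As a small sketch building on the list above, the following shows a few other common list operations. Negative indexes and the << operator are standard Groovy features, not something specific to Nextflow:

```groovy
myList = [1776, -1, 33, 99, 0, 928734928763]

assert myList[0] == 1776            // indexes start at 0
assert myList[-1] == 928734928763   // negative indexes count from the end
assert myList.size() == 6

myList << 42                        // append an item with the left shift operator
assert myList.size() == 7
assert myList.contains(42)
```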

Learn more about lists in the Groovy documentation.

Maps

Maps are used to store associative arrays (also known as dictionaries). They are unordered collections of heterogeneous, named data:

scores = ["Brett": 100, "Pete": "Did not finish", "Andrew": 86.87934]

Note that each of the values stored in the map can be of a different type. The value for Brett is an integer, the value for Pete is a string, and the value for Andrew is a floating-point number.

We can access the values in a map in two main ways:

println scores["Pete"]
println scores.Pete

To add data to or modify a map, the syntax is similar to adding values to a list:

scores["Pete"] = 3
scores["Cedric"] = 120

You can also use the + operator to add two maps together:

new_scores = scores + ["Pete": 3, "Cedric": 120]

When adding two maps, the first map is copied and then appended with the keys from the second map. Any conflicting keys are overwritten by the second map.

Tip

Copying a map with the + operator is a safer way to modify maps in Nextflow, specifically when passing maps through channels. This way, a new instance of the map will be created, and any references to the original map won’t be affected.
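
The following minimal sketch illustrates the copy semantics described above: the + operator returns a new map and leaves the original untouched. The variable name newScores is only illustrative:

```groovy
scores = ["Brett": 100, "Pete": "Did not finish"]

// the right-hand map wins for conflicting keys ("Pete")
newScores = scores + ["Pete": 3, "Cedric": 120]

assert newScores == ["Brett": 100, "Pete": 3, "Cedric": 120]

// the original map is not modified
assert scores["Pete"] == "Did not finish"
assert !scores.containsKey("Cedric")
```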

Learn more about maps in the Groovy documentation.

Multiple assignment

An array or a list object can be used to assign to multiple variables at once:

(a, b, c) = [10, 20, 'foo']
assert a == 10 && b == 20 && c == 'foo'

The three variables on the left of the assignment operator are initialized by the corresponding item in the list.

Read more about multiple assignment in the Groovy documentation.
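
Two further uses of multiple assignment, shown here as a brief sketch: swapping two variables, and assigning from a list that is shorter than the number of variables (in Groovy the excess variables are set to null):

```groovy
(a, b) = [1, 2]

// the right-hand list is built first, so this swaps the two values
(a, b) = [b, a]
assert a == 2 && b == 1

// fewer items than variables: the remaining variable is null
(x, y, z) = ['only', 'two']
assert x == 'only' && y == 'two' && z == null
```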

Conditional Execution

One of the most important features of any programming language is the ability to execute different code under different conditions. The simplest way to do this is to use the if construct:

x = Math.random()
if( x < 0.5 ) {
    println "You lost."
}
else {
    println "You won!"
}
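
Conditions can also be chained with else if, and a simple two-way choice can be written with the ternary operator (which appears in later examples on this page). The function name classify and the thresholds below are hypothetical, chosen only for illustration:

```groovy
def classify(x) {
    if( x < 0.25 )
        return 'low'
    else if( x < 0.75 )
        return 'medium'
    else
        return 'high'
}

assert classify(0.1) == 'low'
assert classify(0.5) == 'medium'
assert classify(0.9) == 'high'

// ternary operator: condition ? value-if-true : value-if-false
outcome = (0.3 < 0.5) ? 'You lost.' : 'You won!'
assert outcome == 'You lost.'
```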

Strings

Strings can be defined by enclosing text in single or double quotes (' or " characters):

println "he said 'cheese' once"
println 'he said "cheese!" again'

Strings can be concatenated with +:

a = "world"
print "hello " + a + "\n"

String interpolation

There is an important difference between single-quoted and double-quoted strings: Double-quoted strings support variable interpolations, while single-quoted strings do not.

In practice, double-quoted strings can contain the value of an arbitrary variable by prefixing its name with the $ character, or the value of any expression by using the ${expression} syntax, similar to Bash/shell scripts:

foxtype = 'quick'
foxcolor = ['b', 'r', 'o', 'w', 'n']
println "The $foxtype ${foxcolor.join()} fox"

x = 'Hello'
y = 'World'
println '$x + $y'

This code prints:

The quick brown fox
$x + $y
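
As a further sketch, the ${expression} form can hold method calls and other expressions, not just variable names. The variable names and values here are made up for the example:

```groovy
name = 'sample_1'
reads = [120, 340, 95]

// any expression can appear inside ${...} in a double-quoted string
msg = "Sample ${name.toUpperCase()} has ${reads.sum()} reads in ${reads.size()} files"
assert msg == 'Sample SAMPLE_1 has 555 reads in 3 files'

// the same text in single quotes is left untouched
raw = 'Sample ${name}'
assert raw.contains('$')
```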

Multi-line strings

A block of text that spans multiple lines can be defined by delimiting it with triple single or double quotes:

text = """
    hello there James
    how are you today?
    """

Note

As before, multi-line strings inside double quotes support variable interpolation, while single-quoted multi-line strings do not.

As in Bash/shell scripts, terminating a line in a multi-line string with a \ character prevents a newline character from separating that line from the one that follows:

myLongCmdline = """
    blastp \
    -in $input_query \
    -out $output_file \
    -db $blast_database \
    -html
    """

result = myLongCmdline.execute().text

In the preceding example, blastp and its -in, -out, -db and -html switches and their arguments are effectively a single line.
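
The effect of the trailing backslash can be checked directly. This is a minimal sketch using short placeholder strings instead of a real command line:

```groovy
// each backslash is the last character on its line, so the newline after it is dropped
joined = """one \
two \
three"""
assert joined == 'one two three'

// without backslashes the newlines are kept
separate = """one
two
three"""
assert separate.readLines() == ['one', 'two', 'three']
```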

Warning

When using backslashes to continue a multi-line command, do not put any spaces after the backslash. Otherwise, the Groovy lexer interprets it as an escaped space instead of a line continuation, which makes your script incorrect. It also prints this warning:

unknown recognition error type: groovyjarjarantlr4.v4.runtime.LexerNoViableAltException

Regular expressions

Regular expressions are the Swiss Army knife of text processing. They provide the programmer with the ability to match and extract patterns from strings.

Regular expressions are available via the ~/pattern/ syntax and the =~ and ==~ operators.

Use =~ to check whether a given pattern occurs anywhere in a string:

assert 'foo' =~ /foo/       // passes: the pattern occurs in the string
assert 'foobar' =~ /foo/    // passes: the pattern occurs in the string

Use ==~ to check whether a string matches a given regular expression pattern exactly:

assert 'foo' ==~ /foo/          // passes: the whole string matches
assert !('foobar' ==~ /foo/)    // 'foobar' ==~ /foo/ evaluates to false

It is worth noting that the ~ operator creates a Java Pattern object from the given string, while the =~ operator creates a Java Matcher object.

x = ~/abc/
println x.class
// prints java.util.regex.Pattern

y = 'some string' =~ /abc/
println y.class
// prints java.util.regex.Matcher
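
A Pattern created with ~ can be stored and reused. The sketch below filters a list of made-up file names with a hypothetical FASTQ pattern, using the standard Java matcher(...).matches() call for a whole-string match and =~ for a partial match:

```groovy
fastqPattern = ~/.+\.f(ast)?q(\.gz)?/
fileNames = ['reads_1.fastq.gz', 'reads_2.fq', 'aligned.bam']

// whole-string match against the precompiled pattern
fastqFiles = fileNames.findAll { fastqPattern.matcher(it).matches() }
assert fastqFiles == ['reads_1.fastq.gz', 'reads_2.fq']

// partial match: the pattern only needs to occur somewhere in the name
withReads = fileNames.findAll { it =~ /reads/ }
assert withReads == ['reads_1.fastq.gz', 'reads_2.fq']
```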

Regular expression support is imported from Java. Java’s regular expression language and API are documented in the Pattern class.

You may also be interested in this post: Groovy: Don’t Fear the RegExp.

String replacement

To replace pattern occurrences in a given string, use the replaceFirst and replaceAll methods:

x = "colour".replaceFirst(/ou/, "o")
println x
// prints: color

y = "cheesecheese".replaceAll(/cheese/, "nice")
println y
// prints: nicenice
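
Because these are the standard Java String methods, the replacement text can refer to capturing groups as $1, $2, and so on. A short sketch with made-up values follows; note the single quotes around the replacement, which prevent Groovy from treating $1 as string interpolation:

```groovy
version = '2.7.3'

// reorder the three captured groups
reversed = version.replaceAll(/(\d+)\.(\d+)\.(\d+)/, '$3.$2.$1')
assert reversed == '3.7.2'

// strip a suffix by replacing it with an empty string
sampleId = 'sample_001_R1.fastq.gz'.replaceFirst(/_R[12]\.fastq\.gz/, '')
assert sampleId == 'sample_001'
```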

Capturing groups

You can match a pattern that includes groups. First create a matcher object with the =~ operator. Then, you can index the matcher object to find the matches: matcher[0] returns a list representing the first match of the regular expression in the string. The first list element is the string that matches the entire regular expression, and the remaining elements are the strings that match each group.

Here’s how it works:

programVersion = '2.7.3-beta'
m = programVersion =~ /(\d+)\.(\d+)\.(\d+)-?(.+)/

assert m[0] == ['2.7.3-beta', '2', '7', '3', 'beta']
assert m[0][1] == '2'
assert m[0][2] == '7'
assert m[0][3] == '3'
assert m[0][4] == 'beta'

Applying some syntactic sugar, you can do the same in just one line of code:

programVersion = '2.7.3-beta'
(full, major, minor, patch, flavor) = (programVersion =~ /(\d+)\.(\d+)\.(\d+)-?(.+)/)[0]

println full    // 2.7.3-beta
println major   // 2
println minor   // 7
println patch   // 3
println flavor  // beta
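
Indexing a matcher that found no match throws an exception, so it can help to test the matcher first. The sketch below wraps this in a hypothetical helper named parseVersion that returns null when the text contains no version:

```groovy
def parseVersion(String text) {
    def m = text =~ /(\d+)\.(\d+)\.(\d+)/
    // a matcher evaluates to false when the pattern does not occur in the text
    if( !m )
        return null
    def (full, major, minor, patch) = m[0]
    return [major: major.toInteger(), minor: minor.toInteger(), patch: patch.toInteger()]
}

assert parseVersion('tool v2.7.3-beta') == [major: 2, minor: 7, patch: 3]
assert parseVersion('no version here') == null
```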

Removing part of a string

You can remove part of a String value using a regular expression pattern. The first match found is replaced with an empty String:

// define the regexp pattern
wordStartsWithGr = ~/(?i)\s+Gr\w+/

// apply and verify the result
assert ('Hello Groovy world!' - wordStartsWithGr) == 'Hello world!'
assert ('Hi Grails users' - wordStartsWithGr) == 'Hi users'

Remove the first 5-character word from a string:

// removing the word leaves both of its surrounding spaces in place
assert ('Remove first match of 5 letter word' - ~/\b\w{5}\b/) == 'Remove  match of 5 letter word'

Remove the first number with its trailing whitespace from a string:

assert ('Line contains 20 characters' - ~/\d+\s+/) == 'Line contains characters'

Functions

Functions can be defined using the following syntax:

def <function name> ( arg1, arg2, ... ) {
    <function body>
}

For example:

def foo() {
    'Hello world'
}

def bar(alpha, omega) {
    alpha + omega
}

The above snippet defines two simple functions that can be invoked in the workflow script as foo(), which returns 'Hello world', and bar(10, 20), which returns the sum of two parameters (30 in this case).

Functions implicitly return the result of the last statement. Additionally, the return keyword can be used to explicitly exit from a function and return the specified value. For example:

def fib( x ) {
    if( x <= 1 )
        return x

    fib(x-1) + fib(x-2)
}
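
To make the behavior concrete, the sketch below repeats the fib function and checks a few of its results:

```groovy
def fib( x ) {
    if( x <= 1 )
        return x              // explicit early return

    fib(x-1) + fib(x-2)       // implicit return of the last expression
}

assert fib(0) == 0
assert fib(1) == 1
assert fib(10) == 55
assert (0..6).collect { fib(it) } == [0, 1, 1, 2, 3, 5, 8]
```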

Closures

Briefly, a closure is a block of code that can be passed as an argument to a function. Thus, you can define a chunk of code and then pass it around as if it were a string or an integer.

More formally, you can create functions that are defined as first-class objects.

square = { it * it }

The curly brackets around the expression it * it tell the script interpreter to treat this expression as code. The it identifier is an implicit variable that represents the value that is passed to the function when it is invoked.

Once compiled, the function object is assigned to the variable square, just like the variable assignments shown previously. Now we can do something like this:

println square(9)

and get the value 81.

This is not very interesting until we find that we can pass the function square as an argument to other functions or methods. Some built-in functions take a function like this as an argument. One example is the collect method on lists:

[ 1, 2, 3, 4 ].collect(square)

This expression says: Create a list with the values 1, 2, 3 and 4, then call its collect method, passing in the closure we defined above. The collect method runs through each item in the list, calls the closure on the item, then puts the result in a new list, resulting in:

[ 1, 4, 9, 16 ]

For more methods that you can call with closures as arguments, see the Groovy GDK documentation.
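
As a brief sketch, here are three such methods applied to the same data: collect, findAll and inject. All three are standard Groovy collection methods:

```groovy
square = { it * it }

// transform every item
squares = [1, 2, 3, 4].collect(square)
assert squares == [1, 4, 9, 16]

// keep only the items for which the closure returns true
evenSquares = squares.findAll { it % 2 == 0 }
assert evenSquares == [4, 16]

// fold the items into a single value, starting from 0
total = squares.inject(0) { acc, v -> acc + v }
assert total == 30
```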

By default, closures take a single parameter called it, but you can also create closures with multiple, custom-named parameters. For example, the method Map.each() can take a closure with two arguments, to which it binds the key and the associated value for each key-value pair in the Map. Here, we use the obvious variable names key and value in our closure:

printMapClosure = { key, value ->
    println "$key = $value"
}

[ "Yue" : "Wu", "Mark" : "Williams", "Sudha" : "Kumari" ].each(printMapClosure)

Prints:

Yue = Wu
Mark = Williams
Sudha = Kumari

Closures can also access variables outside of their scope, and they can be used anonymously, that is without assigning them to a variable. Here is an example that demonstrates both of these things:

myMap = ["China": 1, "India": 2, "USA": 3]

result = 0
myMap.keySet().each { result += myMap[it] }

println result

A closure can also declare local variables that exist only for the lifetime of the closure:

result = 0
myMap.keySet().each {
  def count = myMap[it]
  result += count
}

Warning

Local variables should be declared using a qualifier such as def or a type name, otherwise they will be interpreted as global variables, which could lead to a race condition.
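
The difference can be observed through the script binding, which holds the global variables of a Groovy script. This is a minimal sketch that assumes the code runs at the top level of a script, where binding is available; the variable names are illustrative:

```groovy
// no qualifier: the assignment creates a script-level (global) variable
leaky = { total = 10 }

// def: the variable exists only inside the closure
safe = { def subtotal = 10 }

leaky()
safe()

assert binding.hasVariable('total')
assert !binding.hasVariable('subtotal')
```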

Learn more about closures in the Groovy documentation.

Syntax sugar

Groovy provides several forms of “syntax sugar”, or shorthands that can make your code easier to read.

Some programming languages require every statement to be terminated by a semi-colon. In Groovy, semi-colons are optional, but they can still be used to write multiple statements on the same line:

println 'Hello!' ; println 'Hello again!'

When calling a function, the parentheses around the function arguments are optional:

// full syntax
printf('Hello %s!\n', 'World')

// shorthand
printf 'Hello %s!\n', 'World'

This shorthand is especially useful when calling a function with a closure parameter:

// full syntax
[1, 2, 3].each({ println it })

// shorthand
[1, 2, 3].each { println it }

If the last argument is a closure, the closure can be written outside of the parentheses:

// full syntax
[1, 2, 3].inject('result:', { accum, v -> accum + ' ' + v })

// shorthand
[1, 2, 3].inject('result:') { accum, v -> accum + ' ' + v }

Note

In some cases, you might not be able to omit the parentheses because it would be syntactically ambiguous. You can use the groovysh REPL console to play around with Groovy and figure out what works.

Implicit variables

Script implicit variables

The following variables are implicitly defined in the script global execution scope:

baseDir: The directory where the main workflow script is located. Deprecated since version 20.04.0: use projectDir instead.
launchDir: The directory where the workflow is run. New in version 20.04.0.
moduleDir: The directory where a module script is located for DSL2 modules, or the same as projectDir for a non-module script. New in version 20.04.0.
nextflow: Dictionary-like object representing Nextflow runtime information (see Nextflow metadata).
params: Dictionary-like object holding workflow parameters specified in the config file or as command line options.
projectDir: The directory where the main script is located. New in version 20.04.0.
secrets: Dictionary-like object holding workflow secrets. Read the Secrets page for more information. New in version 24.02.0-edge.
workDir: The directory where task temporary files are created.
workflow: Dictionary-like object representing workflow runtime information (see Runtime metadata).

Configuration implicit variables

The following variables are implicitly defined in the Nextflow configuration file:

baseDir: The directory where the main workflow script is located. Deprecated since version 20.04.0: use projectDir instead.
launchDir: The directory where the workflow is run. New in version 20.04.0.
projectDir: The directory where the main script is located. New in version 20.04.0.

Process implicit variables

The following variables are implicitly defined in the task object of each process:

attempt: The current task attempt.
hash: The task unique hash ID. Available only in exec: blocks.
index: The task index (corresponds to task_id in the execution trace).
name: The current task name. Available only in exec: blocks.
process: The current process name.
workDir: The task unique directory. Available only in exec: blocks.

The task object also contains the values of all process directives for the given task, which allows you to access these settings at runtime. For example:

process foo {
  script:
  """
  some_tool --cpus $task.cpus --mem $task.memory
  """
}

In the above snippet, task.cpus holds the value of the cpus directive and task.memory holds the current value of the memory directive, as set in the workflow configuration file.

See Process directives for details.
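
The task.attempt variable is typically combined with a retry strategy so that a failed task is resubmitted with more resources. The following is a sketch only: the process name align, the resource values and some_tool are placeholders, while errorStrategy, maxRetries and the closure form of memory are described under Process directives:

```groovy
process align {
    cpus 4
    // the closure is evaluated for each attempt: 2 GB, then 4 GB, then 6 GB
    memory { 2.GB * task.attempt }
    errorStrategy 'retry'
    maxRetries 2

    script:
    """
    some_tool --cpus $task.cpus --mem ${task.memory.toGiga()}
    """
}
```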

Implicit functions

The following functions are available in Nextflow scripts:

branchCriteria( closure ): Create a branch criteria to use with the branch operator.
error( message = null ): Throw a script runtime error with an optional error message.
exit( exitCode = 0, message = null ): Stop the pipeline execution and return an exit code and optional error message. Deprecated since version 22.06.0-edge: use error() instead.
file( filePattern, options = [:] ): Get one or more files from a path or glob pattern. Returns a Path or list of Paths if there are multiple files. See Files and I/O.
files( filePattern, options = [:] ): Convenience method for file() that always returns a list.
groupKey( key, size ): Create a grouping key to use with the groupTuple operator.
multiMapCriteria( closure ): Create a multi-map criteria to use with the multiMap operator.
sendMail( params ): Send an email. See Mail & Notifications.
tuple( collection ): Create a tuple object from the given collection.
tuple( ... args ): Create a tuple object from the given arguments.

Implicit classes

The following classes are imported by default in Nextflow scripts:

java.lang.*
java.util.*
java.io.*
java.net.*
groovy.lang.*
groovy.util.*
java.math.BigInteger
java.math.BigDecimal
java.nio.file.Path

Additionally, Nextflow imports several new classes which are described below.

Channel

The Channel class provides the channel factory methods. See Channel factories for more information.

Duration

A Duration represents some duration of time.

You can create a duration by adding a time unit suffix to an integer, e.g. 1.h. The following suffixes are available:

Unit	Description
`ms`, `milli`, `millis`	Milliseconds
`s`, `sec`, `second`, `seconds`	Seconds
`m`, `min`, `minute`, `minutes`	Minutes
`h`, `hour`, `hours`	Hours
`d`, `day`, `days`	Days

You can also create a duration with Duration.of():

// integer value (milliseconds)
oneSecond = Duration.of(1000)

// simple string value
oneHour = Duration.of('1h')

// complex string value
complexDuration = Duration.of('1day 6hours 3minutes 30seconds')

Durations can be compared like numbers, and they support basic arithmetic operations:

a = 1.h
b = 2.h

assert a < b
assert a + a == b
assert b - a == a
assert a * 2 == b
assert b / 2 == a

The following methods are available for a Duration object:

getDays(), toDays(): Get the duration value in days (rounded down).
getHours(), toHours(): Get the duration value in hours (rounded down).
getMillis(), toMillis(): Get the duration value in milliseconds.
getMinutes(), toMinutes(): Get the duration value in minutes (rounded down).
getSeconds(), toSeconds(): Get the duration value in seconds (rounded down).
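
As a short sketch of these methods, the values below follow from the rounding behavior described above. The code relies on the Duration class and the numeric suffixes that Nextflow provides, so it is meant to run inside a Nextflow script rather than plain Groovy:

```groovy
runtime = Duration.of('1day 6hours 3minutes 30seconds')

assert runtime.toDays() == 1        // rounded down
assert runtime.toHours() == 30      // 24 + 6
assert runtime.toMinutes() == 1803  // 30 * 60 + 3

// durations compare like numbers
assert runtime > 1.d
assert runtime < 2.d
```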

MemoryUnit

A MemoryUnit represents a quantity of bytes.

You can create a memory unit by adding a unit suffix to an integer, e.g. 1.GB. The following suffixes are available:

Unit	Description
`B`	Bytes
`KB`	Kilobytes
`MB`	Megabytes
`GB`	Gigabytes
`TB`	Terabytes
`PB`	Petabytes
`EB`	Exabytes
`ZB`	Zettabytes

Note

Technically speaking, a kilobyte is equal to 1000 bytes, whereas 1024 bytes is called a “kibibyte” and abbreviated as “KiB”, and so on for the other units. In practice, however, kilobyte is commonly understood to mean 1024 bytes, and Nextflow follows this convention in its implementation as well as this documentation.

You can also create a memory unit with MemoryUnit.of():

// integer value (bytes)
oneKilobyte = MemoryUnit.of(1024)

// string value
oneGigabyte = MemoryUnit.of('1 GB')

Memory units can be compared like numbers, and they support basic arithmetic operations:

a = 1.GB
b = 2.GB

assert a < b
assert a + a == b
assert b - a == a
assert a * 2 == b
assert b / 2 == a

The following methods are available for a MemoryUnit object:

getBytes(), toBytes(): Get the memory value in bytes (B).
getGiga(), toGiga(): Get the memory value in gigabytes (rounded down), where 1 GB = 1024 MB.
getKilo(), toKilo(): Get the memory value in kilobytes (rounded down), where 1 KB = 1024 B.
getMega(), toMega(): Get the memory value in megabytes (rounded down), where 1 MB = 1024 KB.
toUnit( unit ): Get the memory value in terms of a given unit (rounded down). The unit can be one of: 'B', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB'.
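
The conversions can be sketched as follows, using the 1024-based convention noted above. As with Duration, this assumes the code runs inside a Nextflow script, where MemoryUnit and the numeric suffixes are available:

```groovy
requested = MemoryUnit.of('1 GB')

assert requested.toMega() == 1024
assert requested.toKilo() == 1024 * 1024
assert requested.toBytes() == 1024L * 1024 * 1024
assert requested.toUnit('MB') == 1024

// arithmetic returns another memory unit
assert requested * 2 == 2.GB
```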

ValueObject

ValueObject is an AST transformation for classes and enums, which simply combines AutoClone and Immutable. It is useful for defining custom “record” types.

Files and I/O

Opening files

To access and work with files, use the file() method, which returns a file system object given a file path string:

myFile = file('some/path/to/my_file.file')

The file() method can reference both files and directories, depending on what the string path refers to in the file system.

When using the wildcard characters *, ?, [] and {}, the argument is interpreted as a glob path matcher and the file() method returns a list object holding the paths of files whose names match the specified pattern, or an empty list if no match is found:

listOfFiles = file('some/path/*.fa')

Note

The file() method does not return a list if only one file is matched. Use the files() method to always return a list.

Note

A double asterisk (**) in a glob pattern works like * but also searches through subdirectories.

By default, wildcard characters do not match directories or hidden files. For example, if you want to include hidden files in the result list, enable the hidden option:

listWithHidden = file('some/path/*.fa', hidden: true)

Note

To compose paths, instead of string interpolation, use the resolve() method or the / operator:

def dir = file('s3://bucket/some/data/path')
def sample1 = dir.resolve('sample.bam')         // correct
def sample2 = dir / 'sample.bam'                // correct
def sample3 = file("$dir/sample.bam")           // correct (but verbose)
def sample4 = "$dir/sample.bam"                 // incorrect

The following options are available for the file() method:

checkIfExists: When true, throws an exception if the specified path does not exist in the file system (default: false)
followLinks: When true, follows symbolic links when traversing a directory tree, otherwise treats them as files (default: true)
glob: When true, interprets characters *, ?, [] and {} as glob wildcards, otherwise handles them as normal characters (default: true)
hidden: When true, includes hidden files in the resulting paths (default: false)
maxDepth: Maximum number of directory levels to visit (default: no limit)
type: Type of paths returned, can be 'file', 'dir' or 'any' (default: 'file')
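
A few of these options in use, as a hedged sketch to be run inside a Nextflow script. The paths, including genome.fa, are placeholders, and the try/catch is only there so the example does not stop when the placeholder file is missing:

```groovy
// files() always returns a list, even for zero or one match
fastaFiles = files('some/path/*.fa')
assert fastaFiles instanceof List

// checkIfExists makes a missing input fail early instead of later in the pipeline
try {
    reference = file('some/path/genome.fa', checkIfExists: true)
}
catch( Exception e ) {
    println "Missing input: ${e.message}"
}

// only directories, descending at most two levels
sampleDirs = files('some/path/**', type: 'dir', maxDepth: 2)
```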

Getting file attributes

The file() method returns a Path, so any method defined for Path can also be used in a Nextflow script.

Additionally, the following methods are also defined for Paths in Nextflow:

exists()

Returns true if the file exists.

getBaseName()

Gets the file name without its extension, e.g. /some/path/file.tar.gz -> file.tar.

getExtension()

Gets the file extension, e.g. /some/path/file.txt -> txt.

getName()

Gets the file name, e.g. /some/path/file.txt -> file.txt.

getSimpleName()

Gets the file name without any extension, e.g. /some/path/file.tar.gz -> file.

getParent()

Gets the file parent path, e.g. /some/path/file.txt -> /some/path.

getScheme()

Gets the file URI scheme, e.g. s3://some-bucket/foo.txt -> s3.

isDirectory()

Returns true if the file is a directory.

isEmpty()

Returns true if the file is empty or does not exist.

isFile()

Returns true if it is a regular file (i.e. not a directory).

isHidden()

Returns true if the file is hidden.

isLink()

Returns true if the file is a symbolic link.

lastModified()

Returns the file last modified timestamp in Unix time (i.e. milliseconds since January 1, 1970).

size()

Gets the file size in bytes.

toUriString()

Gets the file path along with the protocol scheme:

def ref = file('s3://some-bucket/foo.txt')

assert ref.toString() == '/some-bucket/foo.txt'
assert "$ref" == '/some-bucket/foo.txt'
assert ref.toUriString() == 's3://some-bucket/foo.txt'

Tip

In Groovy, any method that looks like get*() can also be accessed as a field. For example, myFile.getName() is equivalent to myFile.name, myFile.getBaseName() is equivalent to myFile.baseName, and so on.
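
Putting the property shorthand together with the methods above gives the following sketch, intended for a Nextflow script. The path is a placeholder and does not need to exist, since these methods only inspect the path itself; the extension value assumes that getExtension() returns the part after the last dot:

```groovy
def archive = file('/some/path/file.tar.gz')

assert archive.name == 'file.tar.gz'
assert archive.baseName == 'file.tar'
assert archive.simpleName == 'file'
assert archive.extension == 'gz'
assert archive.parent.toString() == '/some/path'
```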

Reading and writing

Reading and writing an entire file

Given a file variable, created with the file() method as shown previously, reading a file is as easy as getting the file’s text property, which returns the file content as a string:

print myFile.text

Similarly, you can save a string to a file by assigning it to the file’s text property:

myFile.text = 'Hello world!'

Binary data can be managed in the same way, just using the file property bytes instead of text. Thus, the following example reads the file and returns its content as a byte array:

binaryContent = myFile.bytes

Or you can save a byte array to a file:

myFile.bytes = binaryContent

Note

The above assignment overwrites any existing file contents, and implicitly creates the file if it doesn’t exist.

Warning

The above methods read and write the entire file contents at once, in a single variable or buffer. For this reason, when dealing with large files it is recommended that you use a more memory efficient approach, such as reading/writing a file line by line or using a fixed size buffer.
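
The text and bytes properties can be tried out safely on a temporary file. In this sketch, Files.createTempFile from the Java standard library stands in for a path returned by file(), which is an assumption made so the example does not touch any real data:

```groovy
import java.nio.file.Files

// assumption: a temporary file stands in for a path returned by file()
myFile = Files.createTempFile('example', '.txt')

myFile.text = 'Hello world!'
assert myFile.text == 'Hello world!'

// assigning again replaces the previous content
myFile.text = 'Bye'
assert myFile.text == 'Bye'

// the same works for binary data: 72 and 105 are the bytes for "Hi"
myFile.bytes = [72, 105] as byte[]
assert myFile.bytes.length == 2
assert myFile.text == 'Hi'

Files.delete(myFile)
```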

Appending to a file

In order to append a string value to a file without erasing existing content, you can use the append() method:

myFile.append('Add this line\n')

Or use the left shift operator, a more idiomatic way to append text content to a file:

myFile << 'Add a line more\n'
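
Both forms can be checked with a short sketch. A temporary file created with Files.createTempFile is used here as a stand-in for a path returned by file(); the variable name notes is illustrative:

```groovy
import java.nio.file.Files

notes = Files.createTempFile('notes', '.txt')
notes.text = 'first line\n'

// both calls add to the end of the file without erasing it
notes.append('second line\n')
notes << 'third line\n'

assert notes.readLines() == ['first line', 'second line', 'third line']

Files.delete(notes)
```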

Reading a file line by line

In order to read a text file line by line you can use the method readLines() provided by the file object, which returns the file content as a list of strings:

myFile = file('some/my_file.txt')
allLines = myFile.readLines()
for( line : allLines ) {
    println line
}

This can also be written in a more idiomatic syntax:

file('some/my_file.txt')
    .readLines()
    .each { println it }

Warning

The method readLines() reads the entire file at once and returns a list containing all the lines. For this reason, do not use it to read big files.

To process a big file, use the method eachLine(), which reads only a single line at a time into memory:

count = 0
myFile.eachLine { str ->
    println "line ${count++}: $str"
}
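
As a further sketch, eachLine() can be used to pick out only the lines of interest while reading. The example collects the header names from a small FASTA-like file, written to a temporary file so that it is self-contained; the file content is made up:

```groovy
import java.nio.file.Files

fasta = Files.createTempFile('example', '.fa')
fasta.text = '>seq1\nACGT\n>seq2\nGGCC\n'

headers = []
fasta.eachLine { line ->
    // header lines start with '>'; keep the name without that marker
    if( line.startsWith('>') )
        headers << line.substring(1)
}

assert headers == ['seq1', 'seq2']

Files.delete(fasta)
```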

Advanced file reading

The classes Reader and InputStream provide fine-grained control for reading text and binary files, respectively.

The method newReader() creates a Reader object for the given file that allows you to read the content as single characters, lines or arrays of characters:

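myReader = myFile.newReader()
String line
// compare against null explicitly: an empty line evaluates to false in Groovy and would end the loop early
while( (line = myReader.readLine()) != null ) {
    println line
}
myReader.close()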

The method withReader() works similarly, but automatically calls the close() method for you when you have finished processing the file. So, the previous example can be written more simply as:

myFile.withReader {
    String line
    // readLine() returns null at the end of the file
    while( (line = it.readLine()) != null ) {
        println line
    }
}

The methods newInputStream() and withInputStream() work similarly. The main difference is that they create an InputStream object useful for reading binary data.

The following methods are useful for reading files:

eachByte( closure ): Iterates over the file byte by byte, applying the specified closure.
eachLine( closure ): Iterates over the file line by line, applying the specified closure.
getBytes(): Returns the file content as a byte array.
getText(): Returns the file content as a string value.
newInputStream(): Returns an InputStream object to read a binary file.
newReader(): Returns a Reader object to read a text file.
readLines(): Reads the file line by line and returns the content as a list of strings.
withInputStream( closure ): Opens a file for reading and lets you access it with an InputStream object.
withReader( closure ): Opens a file for reading and lets you access it with a Reader object.
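
For binary data, withInputStream() makes it possible to read just the part of a file that is needed. The sketch below reads the first two bytes of a temporary file and checks them against the gzip magic number (0x1f 0x8b); the file content is fabricated for the example:

```groovy
import java.nio.file.Files

binFile = Files.createTempFile('example', '.bin')
binFile.bytes = [0x1f, 0x8b, 0x08, 0x00] as byte[]

// read only the first two bytes; the stream is closed automatically
header = new byte[2]
binFile.withInputStream { stream -> stream.read(header) }

// bytes are signed in Java, so mask with 0xff before comparing
isGzip = (header[0] & 0xff) == 0x1f && (header[1] & 0xff) == 0x8b
assert isGzip

Files.delete(binFile)
```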

Advanced file writing

The Writer and OutputStream classes provide fine-grained control for writing text and binary files, respectively, including low-level operations for single characters or bytes, and support for big files.

For example, given two file objects sourceFile and targetFile, the following code copies the first file’s content into the second file, replacing all U characters with X:

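sourceFile.withReader { source ->
    targetFile.withWriter { target ->
        String line
        // readLine() returns null at the end of the file and strips the line terminator
        while( (line = source.readLine()) != null ) {
            target << line.replaceAll('U', 'X') + '\n'
        }
    }
}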

The following methods are available for writing to files:

append( text ): Appends a string value to a file without replacing existing content.
newOutputStream(): Creates an OutputStream object that allows you to write binary data to a file.
newPrintWriter(): Creates a PrintWriter object that allows you to write formatted text to a file.
newWriter(): Creates a Writer object that allows you to save text data to a file.
setBytes( bytes ): Writes a byte array to a file. Equivalent to setting the bytes property.
setText( text ): Writes a string value to a file. Equivalent to setting the text property.
withOutputStream( closure ): Applies the specified closure to an OutputStream object, closing it when finished.
withPrintWriter( closure ): Applies the specified closure to a PrintWriter object, closing it when finished.
withWriter( closure ): Applies the specified closure to a Writer object, closing it when finished.
write( text ): Writes a string to a file, replacing any existing content.
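
As an illustration of the with*() variants, the following sketch writes a small tab-separated report with withPrintWriter(), which closes the writer when the closure returns. The file is temporary and the sample names and counts are made up:

```groovy
import java.nio.file.Files

report = Files.createTempFile('report', '.tsv')
counts = [sampleA: 10, sampleB: 20]

report.withPrintWriter { out ->
    out.println('sample\treads')
    counts.each { name, n -> out.println("${name}\t${n}") }
}

assert report.readLines() == ['sample\treads', 'sampleA\t10', 'sampleB\t20']

Files.delete(report)
```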

Filesystem operations

The following methods are available for manipulating files and directories in a filesystem:

copyTo( target )

Copies a source file or directory to a target file or directory.

When copying a file to another file: if the target file already exists, it will be replaced.

file('/some/path/my_file.txt').copyTo('/another/path/new_file.txt')

When copying a file to a directory: the file will be copied into the directory, replacing any file with the same name.

file('/some/path/my_file.txt').copyTo('/another/path')

When copying a directory to another directory: if the target directory already exists, the source directory will be copied into the target directory, replacing any sub-directory with the same name. If the target path does not exist, it will be created automatically.

file('/any/dir_a').copyTo('/any/dir_b')

The result of the above example depends on the existence of the target directory. If the target directory exists, the source is copied into the target directory, resulting in the path /any/dir_b/dir_a. If the target directory does not exist, the source is copied to the target name, resulting in the path /any/dir_b.

Note

The copyTo() method follows the semantics of the Linux command cp -r <source> <target>, with the following caveat: while Linux tools often treat paths ending with a slash (e.g. /some/path/name/) as directories, and those not (e.g. /some/path/name) as regular files, Nextflow (due to its use of the Java files API) views both of these paths as the same file system object. If the path exists, it is handled according to its actual type (i.e. as a regular file or as a directory). If the path does not exist, it is treated as a regular file, with any missing parent directories created automatically.

delete()

Deletes the file or directory at the given path, returning true if the operation succeeds, and false otherwise:

myFile = file('some/file.txt')
result = myFile.delete()
println result ? "OK" : "Cannot delete: $myFile"

If a directory is not empty, it will not be deleted and delete() will return false.

deleteDir()

Deletes a directory and all of its contents.

file('any/path').deleteDir()

getPermissions()

Returns a file’s permissions using the symbolic notation, e.g. 'rw-rw-r--'.

list()

Returns the first-level elements (files and directories) of a directory as a list of strings.

listFiles()

Returns the first-level elements (files and directories) of a directory as a list of Paths.

mkdir()

Creates a directory at the given path, returning true if the directory is created successfully, and false otherwise:

myDir = file('any/path')
result = myDir.mkdir()
println result ? "OK" : "Cannot create directory: $myDir"

If the parent directories do not exist, the directory will not be created and mkdir() will return false.

mkdirs()

Creates a directory at the given path, including any nonexistent parent directories:

file('any/path').mkdirs()

mklink( linkName, options = [:] )

Creates a filesystem link to a given path:

myFile = file('/some/path/file.txt')
myFile.mklink('/user/name/link-to-file.txt')

Available options:

hard: When true, creates a hard link, otherwise creates a soft (aka symbolic) link (default: false).
overwrite: When true, overwrites any existing file with the same name, otherwise throws a FileAlreadyExistsException (default: false).

moveTo( target )

Moves a source file or directory to a target file or directory. Follows the same semantics as copyTo().

renameTo( target )

Renames a file or directory:

file('my_file.txt').renameTo('new_file_name.txt')

setPermissions( permissions )

Sets a file’s permissions using the symbolic notation:

myFile.setPermissions('rwxr-xr-x')

setPermissions( owner, group, other )

Sets a file’s permissions using the numeric notation, i.e. as three digits representing the owner, group, and other permissions:

myFile.setPermissions(7,5,5)

Listing directories

The simplest way to list a directory is to use list() or listFiles(), which return a collection of first-level elements (files and directories) of a directory:

for( def item : file('any/path').list() ) {
    println item
}

Additionally, the eachFile() method allows you to iterate through the first-level elements only (just like listFiles()). As with other each*() methods, eachFile() takes a closure as a parameter:

myDir.eachFile { item ->
    if( item.isFile() ) {
        println "${item.getName()} - size: ${item.size()}"
    }
    else if( item.isDirectory() ) {
        println "${item.getName()} - DIR"
    }
}

The following methods are available for listing and traversing directories:

eachDir( closure ): Iterates through first-level directories only.
eachDirMatch( nameFilter, closure ): Iterates through directories whose names match the given filter.
eachDirRecurse( closure ): Iterates through directories depth-first (regular files are ignored).
eachFile( closure ): Iterates through first-level files and directories.
eachFileMatch( nameFilter, closure ): Iterates through files and directories whose names match the given filter.
eachFileRecurse( closure ): Iterates through files and directories depth-first.
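
A recursive traversal can be sketched as follows. The example builds a small temporary directory tree with the Java Files API (an assumption made so the example is self-contained), then uses eachFileRecurse() to collect the names of all regular files:

```groovy
import java.nio.file.Files

root = Files.createTempDirectory('listing')
Files.createDirectories(root.resolve('sub'))
root.resolve('a.txt').text = 'a'
root.resolve('sub').resolve('b.txt').text = 'b'

found = []
root.eachFileRecurse { item ->
    // the closure is called for directories too, so keep regular files only
    if( Files.isRegularFile(item) )
        found << item.fileName.toString()
}

assert found.sort() == ['a.txt', 'b.txt']

// clean up the temporary tree
root.deleteDir()
```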

Fetching HTTP/FTP files

Nextflow integrates seamlessly with the HTTP(S) and FTP protocols for handling remote resources the same as local files. Simply specify the resource URL when opening the file:

pdb = file('http://files.rcsb.org/header/5FID.pdb')

Then, you can access it as a local file as described previously:

println pdb.text

The above one-liner prints the content of the remote PDB file. Previous sections provide code examples showing how to stream or copy the content of files.

Note

Write and list operations are not supported for HTTP(S) and FTP files.

Splitting and counting records

The following methods are defined for Paths for splitting and counting records:

countFasta(): Counts the number of records in a FASTA file. See the splitFasta operator for available options.
countFastq(): Counts the number of records in a FASTQ file. See the splitFastq operator for available options.
countJson(): Counts the number of records in a JSON file. See the splitJson operator for available options.
countLines(): Counts the number of lines in a text file. See the splitText operator for available options.
splitCsv(): Splits a CSV file into a list of records. See the splitCsv operator for available options.
splitFasta(): Splits a FASTA file into a list of records. See the splitFasta operator for available options.
splitFastq(): Splits a FASTQ file into a list of records. See the splitFastq operator for available options.
splitJson(): Splits a JSON file into a list of records. See the splitJson operator for available options.
splitText(): Splits a text file into a list of lines. See the splitText operator for available options.
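
As a hedged sketch of these methods inside a Nextflow script, the following writes a tiny sample sheet and reads it back with splitCsv(). The file name samplesheet.csv and its columns are hypothetical, and the header option is the one documented for the splitCsv operator:

```groovy
// hypothetical sample sheet, written here so the example is self-contained
def sheet = file('samplesheet.csv')
sheet.text = 'sample,fastq\nA,a.fq\nB,b.fq\n'

// with header: true, each record is keyed by the column names in the first line
def rows = sheet.splitCsv(header: true)

assert rows.size() == 2
assert rows[0].sample == 'A'
assert rows[1].fastq == 'b.fq'

// three lines of text, including the header line
assert sheet.countLines() == 3
```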