Input and output

Input files

FASTQ files

AdapterRemoval v3 expects standard FASTQ files as input specified using the --in-file1 and --in-file2 options. DNA sequences and quality scores are expected to be written on a single line, and DNA sequences are expected to contain only standard bases A, C, G, T, and N (ambiguous bases). AdapterRemoval v3 supports reading of FASTQ files with Phred+33, Phred+64, and Solexa encoded quality scores. Input may optionally be gzip compressed.

When specifying paired reads, the FASTQ record names are expected to be identical excepting for the last two characters. The last two characters typically indicate the mate number, and consist of a matching separator character followed by either 1 or 2. Examples include /1 and /2, and .1 and .2.

Table of adapters

The table used with --adapter-table is expected to be a plain-text table containing one or two whitespace-separated columns, with each line containing the same number of columns:

AGATCGGAAGAGCACACGTCTGAACTCCAGTCA  AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA   AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG

Column one corresponds to --adapter1 and column two corresponds to --adapter2. Only one column is required in single-end mode, while both columns are required in paired-end mode.

Table of barcodes

The table used with --barcode-table is expected to be a plain-text table containing two or three whitespace-separated columns, with each line containing the same number of columns:

sample_1 ATGCGGA TGAATCT
sample_2 ATGGATT ATAGTGA
sample_7 CAAAACT TCGCTGC

Sample names in the first column may only contain letters, numbers and underscores _. The barcodes in the second column are expected to be found at the 5' of mate 1 reads and the barcode in the third column is expected to be found at the 5' of mate 2 reads.

If --barcode-orientation explicit is used, the table is expected to contain a fourth column specifying the orientation of each barcode pair:

sample_1 ATGCGGA TGAATCT forward
sample_1 TGAATCT ATGCGGA reverse
sample_2 ATGGATT ATAGTGA reverse
sample_7 CAAAACT TCGCTGC forward

The following names for orientations are supported: forward, reverse, fwd, rev, +, and -. This table is used as is, and the "missing" orientations are not automatically generated.

Output files

During a normal run, AdapterRemoval will generate a combination of trimmed reads and report files. For a description of how this is controlled, see Detailed overview. In the following examples, the value ${prefix} is used to represent the path set using --out-prefix.

FASTQ output

When not performing demultiplexing, AdapterRemoval will generate the following files:

  • ${prefix}.r1.fastq.gz

  • ${prefix}.r2.fastq.gz (paired-end mode)

  • ${prefix}.singleton.fastq.gz (paired-end mode)

  • ${prefix}.merged.fastq.gz (paired-end mode, if read merging is enabled)

If demultiplexing is enabled, the following files will be generated with --out-prefix, with the key ${sample} additionally replaced with each sample name:

  • ${prefix}.unidentified.r1.fastq.gz

  • ${prefix}.unidentified.r2.fastq.gz (paired-end mode)

  • ${prefix}.${sample}.r1.fastq.gz

  • ${prefix}.${sample}.r2.fastq.gz (paired-end mode)

  • ${prefix}.${sample}.singleton.fastq.gz (paired-end mode)

  • ${prefix}.${sample}.merged.fastq.gz (paired-end mode, if read merging is enabled)

The unidentified files contain any reads for which no sample could be identified using the supplied barcode sequences. Note that no trimming, filtering, or merging is performed if --demultiplex-only is used.

SAM/BAM files

When using --out-format with .sam, .sam.gz, or .bam, AdapterRemoval defaults to writing a single file containing all output, including discarded reads. For example, for BAM output:

  • ${prefix}.bam

Or when demultiplexing is enabled, replacing the key ${sample} with each sample name:

  • ${prefix}.json

  • ${prefix}.html

  • ${prefix}.unidentified.bam

  • ${prefix}.${sample}.bam

This can be overridden as described above, using the individual --out options. To exclude discarded reads from the output, use --out-discarded /dev/null.

SAM/BAM tags

When demultiplexing, AdapterRemoval will assign read-groups based on the barcodes, using the provided sample names. For samples with multiple barcodes, a read-group will be created for each barcode or barcode pair. When --barcode-orientation is used, AdapterRemoval will additionally record the orientation of the barcodes using or tags in the corresponding @RG headers.

Additional read-group information can be set using --read-group, which also adds read-group information when not demultiplexing.

HTML report

When using the --out-html option or when using --out-prefix, AdapterRemoval writes a single-file HTML report using the default filename ${prefix}.html.

An example of the HTML report generated by AdapterRemoval can be found here: Example HTML report

Warning

The HTML report currently requires an active internet connection, due to depending on external JavaScript files for drawing figures.

JSON report

When using the --out-json option or when using --out-prefix, AdapterRemoval writes a JSON report using the default filename ${prefix}.json.

An example of the JSON report generated by AdapterRemoval can be found here: Example JSON report. A full JSON schema defining the content of the report can be found here: Report JSON schema