Getting started

To run AdapterRemoval on single-end FASTQ data, specify the location of the FASTQ file(s) using the --file1 command-line option:

adapterremoval3 --file1 myreads_1.fastq.gz

To run AdapterRemoval on paired-end FASTQ data, specify the location of the mate 1 and mate 2 FASTQ files using the --file1 and --file2 command-line options:

adapterremoval3 --file1 myreads_1.fastq.gz --file2 myreads_2.fastq.gz

The files may be uncompressed or gzip-compressed. When run in this manner, AdapterRemoval will save the trimmed reads in the current working directory, using filenames starting with 'your_output'. This behavior may be changed using the --basename option, or using specific options for each output file. See the Input and output page for more information about files generated by AdapterRemoval.

More examples of common usage may be found in the Example usage section of the documentation.

A note on specifying adapters

AdapterRemoval uses the expected adapter sequences as part of the trimming/alignment process (see Detailed overview). It is therefore extremely important to specify the correct adapter sequences when running AdapterRemoval on a dataset that does not use the default adapters. Failure to do so will result in the wrong sequences being trimmed, and actual adapter sequences being left in the resulting "trimmed" reads.

By default, AdapterRemoval is set up to trim the published Illumina TruSeq sequences, which should be applicable to most Illumina data. These defaults correspond to the following command-line options:

adapterremoval3 --adapter1 AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --adapter2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

For BGISEQ/DNBSEQ/MGISEQ data, the published BGI adapter sequences should be used:

adapterremoval3 --adapter1 AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA --adapter2 AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG

Adapter sequences are specified in the read orientation when using the --adapter1 and --adapter2 command-line options, directly corresponding to the sequence that is observed in the FASTQ files produced by the base calling software. If we were processing data generated using the above TruSeq adapters, we would therefore expect to find those sequences as-is in our FASTQ files (assuming that the read lengths are sufficiently long and that insert sizes are sufficiently short):

$ grep "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" file1.fastq
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACAAGAAT
CTGGAGTTCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAA
GGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGCAAATTGAAAACAC

$ grep "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" file2.fastq
CAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAAAAGAAAAACATCTTG
GAACTCCAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAAAAATAGA
GAACTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAACATAAGACCTA
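
The same check can be run on gzip-compressed files and reduced to a single count. The following is a minimal sketch and not part of AdapterRemoval: the function name count_adapter_reads is hypothetical, and it assumes four-line FASTQ records (no wrapped sequence lines) and a gzip that passes uncompressed input through unchanged when given -cdf, as GNU gzip does.

```shell
# Hypothetical helper, not part of AdapterRemoval: count the reads in a
# FASTQ file that contain an exact copy of the given adapter sequence.
# Assumes four-line FASTQ records; accepts plain or gzip-compressed files.
count_adapter_reads() {
    adapter="$1"
    fastq="$2"
    # "gzip -cdf" decompresses gzip input and copies other input unchanged.
    # The sequence is the 2nd line of each 4-line record, so header and
    # quality lines are never searched.
    gzip -cdf "$fastq" |
        awk -v a="$adapter" 'NR % 4 == 2 && index($0, a) { n++ } END { print n + 0 }'
}
```

For example, `count_adapter_reads AGATCGGAAGAGCACACGTCTGAACTCCAGTCA myreads_1.fastq.gz` prints the number of mate 1 reads that contain the complete adapter 1 sequence. Only exact, full-length matches are counted, so reads carrying a partial adapter at their 3' end or an adapter with sequencing errors are missed; the count is a sanity check, not an estimate of adapter contamination.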

For paired-end data, the --identify-adapters mode may be used to verify the choice of adapters, by attempting to reconstruct the adapter sequence directly from the FASTQ reads. See the Example usage section for a demonstration of this functionality.

An 'N' in an adapter sequence is treated as a wildcard. An N will align against any other base, including other Ns, but Ns do not affect the score of the resulting alignment and are not counted for the purpose of filters such as --minadapteroverlap.
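
When checking a FASTQ file by hand for an adapter that contains Ns, the wildcard behavior can be approximated with grep. The sketch below is only an illustration of the matching rule. It is not how AdapterRemoval aligns adapters, since grep requires an exact match at every non-N position. The helper name adapter_to_pattern is hypothetical.

```shell
# Hypothetical helper, not part of AdapterRemoval: convert an adapter
# sequence into a grep pattern in which each N matches any single base
# call (A, C, G, T, or N). All other characters are left unchanged.
adapter_to_pattern() {
    printf '%s\n' "$1" | sed 's/N/[ACGTN]/g'
}
```

For example, `grep -e "$(adapter_to_pattern AGTCACNNNNNNATCTCG)" file1.fastq` searches for AGTCAC and ATCTCG separated by exactly six arbitrary bases, a pattern that is present in the first mate 1 read shown in the grep output above.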