Installation guide

We recommend using Miniconda with bioconda to install amplimap and its requirements (such as Python 3.6 and read aligners). If you have Docker, you can use our Dockerfile instead; see Installing amplimap through Docker.

If your machine already has all of the required software installed you can also install amplimap through pip. For more details, see Installing amplimap through pip.

Installing amplimap through Conda

1. Install Miniconda 3

Download and install Miniconda with Python 3 from the conda website.

2. Install amplimap environment

Download amplimap’s environment.yml file and use it to install amplimap and its requirements into a new conda environment:

conda env create --file environment.yml

If you want to run germline variant calling and annotation, you need to download and install Annovar manually. Also download the relevant indices for the reference genome you want to use, and add the directory containing the Annovar scripts to your PATH environment variable.
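As a minimal sketch of the PATH step, assuming Annovar was unpacked into a hypothetical directory ~/tools/annovar (substitute your own location), the following adds that directory to PATH for the current session and checks that Annovar's table_annovar.pl script can be found:

```shell
# Hypothetical install location; change this to wherever you unpacked Annovar.
ANNOVAR_DIR="${ANNOVAR_DIR:-$HOME/tools/annovar}"

# Prepend the Annovar directory to PATH for the current shell session.
export PATH="$ANNOVAR_DIR:$PATH"

# Report whether the Annovar wrapper script can now be found.
if command -v table_annovar.pl >/dev/null 2>&1; then
  echo "Annovar scripts found on PATH"
else
  echo "table_annovar.pl not found; check ANNOVAR_DIR" >&2
fi
```

To make the change permanent, the export line could be added to your shell startup file (for example ~/.bashrc).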

3. Activate amplimap environment

Load the amplimap environment by running this command:

conda activate amplimap

You need to run this command once per terminal session, i.e. each time you open a new terminal window.

Your command line prompt should now start with (amplimap). Run amplimap --version to confirm that the correct version of amplimap has been installed and activated.
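In a script, where the prompt is not visible, the same check can be sketched with the CONDA_DEFAULT_ENV variable that conda sets on activation. The function name amplimap_env_status is hypothetical, and the sketch assumes the environment is called amplimap as in the command above:

```shell
# Print whether the conda environment named "amplimap" is active.
# Relies on CONDA_DEFAULT_ENV, which conda sets when an environment is activated.
amplimap_env_status() {
  if [ "${CONDA_DEFAULT_ENV:-}" = "amplimap" ]; then
    echo "active"
  else
    echo "inactive"
  fi
}

amplimap_env_status
```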

4. Set up your reference genome and indices

Download the DNA (FASTA) file for the reference genome that you want to use, for example from the Ensembl FTP or iGenomes. When in doubt we recommend using the primary_assembly file from Ensembl, for example Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz.

Once you have downloaded this file you need to prepare it for use in amplimap:

# decompress the file (if it ends in .gz)
gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
# index FASTA file
samtools faidx Homo_sapiens.GRCh38.dna.primary_assembly.fa
# create dictionary for GATK
picard CreateSequenceDictionary R=Homo_sapiens.GRCh38.dna.primary_assembly.fa
# build bwa index, if you want to use bwa:
bwa index Homo_sapiens.GRCh38.dna.primary_assembly.fa
# build bowtie2 index, if you want to use bowtie2:
bowtie2-build Homo_sapiens.GRCh38.dna.primary_assembly.fa Homo_sapiens.GRCh38.dna.primary_assembly.fa
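
Index building can take a long time, so it may help to confirm afterwards that the expected files exist. The following is a rough sketch only: the helper name missing_index_files is hypothetical, and it checks just the .fai file written by samtools faidx and the .amb, .ann, .bwt, .pac and .sa files written by bwa index. The Picard dictionary and bowtie2 files are not checked, since their names depend on the tool version and genome size.

```shell
# Print the path of every expected index file that does not exist yet.
# Usage: missing_index_files /path/to/reference.fa
missing_index_files() {
  fasta="$1"
  # .fai comes from "samtools faidx"; the rest come from "bwa index".
  for ext in fai amb ann bwt pac sa; do
    if [ ! -e "$fasta.$ext" ]; then
      echo "$fasta.$ext"
    fi
  done
}

# Example call (commented out; prints nothing when all files are present):
# missing_index_files Homo_sapiens.GRCh38.dna.primary_assembly.fa
```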

5. Update amplimap configuration

Finally, we recommend that you add the paths of the reference genome files to your config_default.yaml. This way, you don’t need to specify these paths in every single directory-specific config.yaml. To find out where this file is located run:

amplimap --basedir

Open the file config_default.yaml at this location and look for the settings under paths: corresponding to the indices you created.

Replace these with the full paths to your files. If you haven’t generated one of the files leave the corresponding setting empty. For example, if you generated indices for bwa and bowtie2 (but not STAR or Annovar) and always used the same FASTA filename as the prefix:

paths:
  hg38:
    bwa: "/home/user/amplimap/Homo_sapiens.GRCh38.dna.primary_assembly.fa"
    bowtie2: "/home/user/amplimap/Homo_sapiens.GRCh38.dna.primary_assembly.fa"
    fasta: "/home/user/amplimap/Homo_sapiens.GRCh38.dna.primary_assembly.fa"

If you are working with a different reference genome change hg38: to the appropriate abbreviation (e.g. mm10:) and also update the line genome_name: "hg38" below.
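
To illustrate, here is a sketch that writes such a fragment for mm10 into a scratch file, so it can be compared against your config_default.yaml. The FASTA path and the scratch filename are hypothetical, and placing genome_name under general: is an assumption based on the Docker example later in this guide.

```shell
# Hypothetical reference location; replace with the full path to your own FASTA.
REF="/home/user/amplimap/Mus_musculus.GRCm38.dna.primary_assembly.fa"

# Write the example fragment to a scratch file (not the real config).
cat > mm10_paths_example.yaml <<EOF
paths:
  mm10:
    bwa: "$REF"
    fasta: "$REF"
general:
  genome_name: "mm10"
EOF

cat mm10_paths_example.yaml
```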

If you are using Annovar make sure you also provide the path to its indices directory under paths: and adjust the protocols/operations under annotate: annovar: protocols: to match the indices you have downloaded.

Save the file and confirm that the settings are being read correctly by looking at the output of amplimap --print-config.

6. Run amplimap!

Now you are ready to run amplimap! Prepare a working directory (see Input and the working directory), change into it using cd and then run amplimap to get started.

If you get a message about the command not being found, please make sure you have activated the conda environment as described in step 3.

Installing amplimap through Docker

We also have a Docker image available. To use this, install Docker and then prefix your amplimap commands with docker run koelling/amplimap, forwarding directories from your host into the docker container using Docker’s -v parameter.

For example, here are some commands you could use to prepare indices for an E. coli reference genome FASTA located under ~/references/ecoli.fasta and then run amplimap on some example data located in ~/data/example_wd:

# download the docker image (only need to run this once)
docker pull koelling/amplimap

# check version
docker run koelling/amplimap amplimap --version

# build indices for ~/references/ecoli.fasta
docker run -v ~/references:/references koelling/amplimap samtools faidx /references/ecoli.fasta
docker run -v ~/references:/references koelling/amplimap picard CreateSequenceDictionary R=/references/ecoli.fasta
docker run -v ~/references:/references koelling/amplimap bwa index /references/ecoli.fasta

# run amplimap with working directory ~/data/example_wd
docker run -v ~/references:/references -v ~/data:/data koelling/amplimap amplimap --working-directory=/data/example_wd coverages pileups variants

Note that in this example you would have to provide the paths to your reference genome in the ~/data/example_wd/config.yaml file:

paths:
  ecoli:
    bwa: "/references/ecoli.fasta"
    fasta: "/references/ecoli.fasta"
general:
  genome_name: "ecoli"

You can avoid having to specify these paths every time by running a shell inside the Docker container and adding your reference genome to your config_default.yaml, as described in 5. Update amplimap configuration.

To get a bash shell inside the Docker container:

docker run -t -i koelling/amplimap /bin/bash

To annotate variant calls, also install Annovar inside the Docker container and add the path to the Annovar indices to your config. Make sure you also add the directory containing the Annovar Perl scripts to your PATH so that amplimap can find them.
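
The docker run commands earlier in this section repeat the same -v options each time. As a convenience, they could be wrapped in a small shell function. The name amplimap_docker is hypothetical, and the host directories ~/references and ~/data are the ones assumed in the example above.

```shell
# Run a command inside the amplimap container with the two host directories
# from the example above mounted at /references and /data.
amplimap_docker() {
  docker run \
    -v "$HOME/references:/references" \
    -v "$HOME/data:/data" \
    koelling/amplimap "$@"
}

# Example calls (commented out so that sourcing this file has no side effects):
# amplimap_docker amplimap --version
# amplimap_docker amplimap --working-directory=/data/example_wd coverages pileups variants
```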

Installing amplimap through pip

If you already have all of the required external software available (see Requirements), you can install amplimap directly through pip. Note that this requires Python 3.5 or 3.6: versions lower than 3.5 are not supported, and Python 3.7 does not currently work due to problems with the pysam package.

If you do not have the dependencies and the right version of Python available please see Installing amplimap through Conda.
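
Before running pip, the Python version can be checked against this requirement. The following is a sketch only; the function name python_version_status is hypothetical, and it simply pattern-matches a version string such as 3.6.8.

```shell
# Print "supported" if the given Python version string is 3.5.x or 3.6.x,
# otherwise print "unsupported" and return a non-zero status.
python_version_status() {
  case "$1" in
    3.5|3.5.*|3.6|3.6.*) echo "supported" ;;
    *) echo "unsupported"; return 1 ;;
  esac
}

# Example call (commented out; requires python3 on PATH):
# python_version_status "$(python3 -c 'import platform; print(platform.python_version())')"
```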

# you may need to use `pip` instead of `pip3`
pip3 install amplimap

If this does not work, you can try to install it manually:

# install required python3 packages
# you may need to use `pip` instead of `pip3`
pip3 install setuptools Cython numpy

# download and install amplimap
git clone --depth=1 https://github.com/koelling/amplimap.git
cd amplimap
# you may need to use `python` instead of `python3`
python3 setup.py install

You can also download our requirements.txt file, which lists all Python packages used by amplimap together with a known working version of each.

To finish setting up amplimap, you will probably want to add the paths to the reference genome files you will be using (e.g. bwa index and reference genome FASTA) to the Default configuration. See 4. Set up your reference genome and indices and 5. Update amplimap configuration for more details.

Requirements

Please note that, other than the Linux environment and the reference genome files, all requirements will be installed automatically when you install amplimap through conda.

  • Linux environment (should also work on macOS and the Windows Subsystem for Linux on Windows 10)
  • Python 3.5 or 3.6 with setuptools, Cython and numpy
    • Further Python dependencies are listed in requirements.txt but can also be installed automatically by setup.py.
  • Required software:
    • At least one read aligner: BWA (tested with v0.7.12), Bowtie2 (tested with v2.2.5), STAR (tested with v2.5.1b)
    • bedtools (tested with v2.27.1)
    • samtools (tested with v1.5)
  • Additional software for germline variant calling (optional):
    • At least one variant caller: Platypus 0.8.1+, GATK 4+, Octopus
    • Annovar (tested with v2015-06-17)
    • bcftools (tested with v1.5)
  • Additional software for low-frequency variant calling (optional):
    • Mutect2 (from GATK 4, tested with v4.0)
  • Additional software for capture probe processing (optional):
    • Picard Tools 2+ (tested with v2.3.0)
  • Reference genome FASTA file, with indices
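
For a pip-based install, a quick way to see which of the tools listed above are missing is to look each one up on PATH. This is a sketch; the function name missing_tools is hypothetical, and it checks only that an executable with the usual name exists, not its version.

```shell
# Print the name of every given command that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Core tools from the list above; only one of the aligners
# (bwa, bowtie2, STAR) is actually required.
missing_tools samtools bedtools bwa bowtie2 STAR
```

The optional tools (for example bcftools or picard) can be passed as arguments in the same way.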