Installation guide¶
We recommend that you use Miniconda with bioconda to install amplimap and its requirements (such as Python 3.6 and read aligners). If you have Docker, you can use our Docker image instead; see Installing amplimap through Docker.
If your machine already has all of the required software installed you can also install amplimap through pip. For more details, see Installing amplimap through pip.
Installing amplimap through Conda¶
1. Install Miniconda 3¶
Download and install Miniconda with Python 3 from the conda website.
2. Install amplimap environment¶
Download amplimap’s environment.yml file and use it to install amplimap and its requirements into a new conda environment:
conda env create --file environment.yml
If you want to run germline variant calling and annotation, you also need to download and install Annovar manually. Make sure you also download the relevant indices for the reference genome you want to use, and add the directory containing the Annovar scripts to your PATH environment variable.
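As a minimal sketch of the PATH step, assume Annovar was unpacked into the hypothetical directory `$HOME/tools/annovar` (this location is an assumption; substitute your own). The following adds that directory to `PATH` for the current session and checks whether Annovar's `table_annovar.pl` script can be found:

```shell
# Hypothetical install location (an assumption); change it to where you unpacked Annovar.
ANNOVAR_DIR="${ANNOVAR_DIR:-$HOME/tools/annovar}"

# Prepend the directory to PATH for the current shell session only.
PATH="$ANNOVAR_DIR:$PATH"
export PATH

# Report whether the Annovar wrapper script is now visible on PATH.
if command -v table_annovar.pl >/dev/null 2>&1; then
    echo "Annovar script found: $(command -v table_annovar.pl)"
else
    echo "table_annovar.pl not found; check ANNOVAR_DIR" >&2
fi
```

To keep the setting across sessions, the same `PATH` line can be added to your shell startup file (for example `~/.bashrc`).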
3. Activate amplimap environment¶
Load the amplimap environment by running this command:
conda activate amplimap
You only need to run this command once per session, e.g. when you open a new terminal window.
Your command line prompt should now start with (amplimap).
Run amplimap --version to confirm that the correct version of amplimap has been installed and activated.
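If the prompt does not show the environment name, a small check like the following may help. It is only a sketch: it relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets, and the helper name `conda_env_is_active` is made up for this example.

```shell
# Succeeds if the conda environment named in $1 is the active one.
# CONDA_DEFAULT_ENV is set by `conda activate`.
conda_env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if conda_env_is_active amplimap; then
    echo "amplimap environment is active"
else
    echo "amplimap environment is not active; run: conda activate amplimap" >&2
fi
```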
4. Set up your reference genome and indices¶
Download the DNA (FASTA) file for the reference genome that you want to use, for example from the Ensembl FTP or iGenomes.
When in doubt, we recommend using the primary_assembly file from Ensembl, for example Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz.
Once you have downloaded this file you need to prepare it for use in amplimap:
# decompress the file (if it ends in .gz)
gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
# index FASTA file
samtools faidx Homo_sapiens.GRCh38.dna.primary_assembly.fa
# create dictionary for GATK
picard CreateSequenceDictionary R=Homo_sapiens.GRCh38.dna.primary_assembly.fa
# build bwa index, if you want to use bwa:
bwa index Homo_sapiens.GRCh38.dna.primary_assembly.fa
# build bowtie2 index, if you want to use bowtie2:
bowtie2-build Homo_sapiens.GRCh38.dna.primary_assembly.fa Homo_sapiens.GRCh38.dna.primary_assembly.fa
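After running the commands above, it can be useful to confirm that the index files were actually written. The sketch below covers only the files produced by `samtools faidx` (`.fai`) and `bwa index` (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). The helper name `missing_index_files` is hypothetical, and the check assumes the indices sit next to the FASTA file and use its name as the prefix.

```shell
# Print every expected index file that is missing for the FASTA file given as $1.
# Suffixes: .fai (samtools faidx) and .amb/.ann/.bwt/.pac/.sa (bwa index).
missing_index_files() {
    for suffix in .fai .amb .ann .bwt .pac .sa; do
        [ -e "$1$suffix" ] || echo "$1$suffix"
    done
}

# No output means all of the checked index files are present.
missing_index_files Homo_sapiens.GRCh38.dna.primary_assembly.fa
```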
5. Update amplimap configuration¶
Finally, we recommend that you add the paths of the reference genome files to your config_default.yaml.
This way, you don’t need to specify these paths in every single directory-specific config.yaml.
To find out where this file is located run:
amplimap --basedir
Open the file config_default.yaml at this location and look for the settings under paths: corresponding to the indices you created.
Replace these with the full paths to your files. If you haven’t generated one of the files leave the corresponding setting empty. For example, if you generated indices for bwa and bowtie2 (but not STAR or Annovar) and always used the same FASTA filename as the prefix:
paths:
  hg38:
    bwa: "/home/user/amplimap/Homo_sapiens.GRCh38.dna.primary_assembly.fa"
    bowtie2: "/home/user/amplimap/Homo_sapiens.GRCh38.dna.primary_assembly.fa"
    fasta: "/home/user/amplimap/Homo_sapiens.GRCh38.dna.primary_assembly.fa"
If you are working with a different reference genome, change hg38: to the appropriate abbreviation (e.g. mm10:) and also update the line genome_name: "hg38" below.
If you are using Annovar, make sure you also provide the path to its indices directory under paths: and adjust the protocols/operations under annotate: annovar: protocols: to match the indices you have downloaded.
Save the file and confirm that the settings are being read correctly by looking at the output of amplimap --print-config.
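A mistyped path in the configuration is easy to miss. The following sketch scans a config file for double-quoted absolute paths and reports whether each one exists, either as a file or as a prefix of index files. The function name `check_config_paths` is hypothetical, and the scan is a plain-text match rather than a real YAML parser, so treat its output as a hint only.

```shell
# For each double-quoted absolute path in the config file given as $1,
# print "ok <path>" if the file exists (or files with that prefix exist),
# otherwise "MISSING <path>". This is a plain-text scan, not a YAML parser.
check_config_paths() {
    grep -o '"/[^"]*"' "$1" | tr -d '"' | while IFS= read -r cfg_path; do
        if [ -e "$cfg_path" ] || ls "$cfg_path".* >/dev/null 2>&1; then
            echo "ok $cfg_path"
        else
            echo "MISSING $cfg_path"
        fi
    done
}

# Hypothetical usage: pass the location of your own config_default.yaml.
# check_config_paths /path/to/config_default.yaml
```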
6. Run amplimap!¶
Now you are ready to run amplimap! Prepare a working directory (see Input and the working directory), change into it using cd and then run amplimap to get started.
If you get a message about the command not being found please make sure you activated the conda environment as described above.
Installing amplimap through Docker¶
We also have a Docker image available.
To use it, install Docker and then prefix your amplimap commands with docker run koelling/amplimap, forwarding directories from your host into the Docker container using Docker’s -v parameter.
For example, here are some commands you could use to prepare indices for an E. coli reference genome FASTA located under ~/references/ecoli.fasta and then run amplimap on some example data located in ~/data/example_wd:
# download the docker image (only need to run this once)
docker pull koelling/amplimap
# check version
docker run koelling/amplimap amplimap --version
# build indices for ~/references/ecoli.fasta
docker run -v ~/references:/references koelling/amplimap samtools faidx /references/ecoli.fasta
docker run -v ~/references:/references koelling/amplimap picard CreateSequenceDictionary R=/references/ecoli.fasta
docker run -v ~/references:/references koelling/amplimap bwa index /references/ecoli.fasta
# run amplimap with working directory ~/data/example_wd
docker run -v ~/references:/references -v ~/data:/data koelling/amplimap amplimap --working-directory=/data/example_wd coverages pileups variants
Note that in this example you would have to provide the paths to your reference genome in the ~/data/example_wd/config.yaml file:
paths:
  ecoli:
    bwa: "/references/ecoli.fasta"
    fasta: "/references/ecoli.fasta"
general:
  genome_name: "ecoli"
You can avoid having to specify these paths every time by running a shell inside the Docker container and adding your reference genome to your config_default.yaml, as described in 4. Set up your reference genome and indices and 5. Update amplimap configuration.
To get a bash shell inside the Docker container:
docker run -t -i koelling/amplimap /bin/bash
To annotate variant calls, also install Annovar inside the Docker container and add the path to the Annovar indices to your config.
Make sure you also add the directory containing the Annovar Perl scripts to your PATH so that amplimap can find them.
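Because every Docker command in this section repeats the same `-v` options, a small shell function can save typing. This is a sketch only: the function name `amplimap_docker` and the `DOCKER` override variable are made up for this example, and the mounts assume the `~/references` and `~/data` directories used in the example commands of this section.

```shell
# Run a command in the amplimap container with ~/references and ~/data mounted.
# DOCKER can be overridden (e.g. with echo) to preview the command without running it.
amplimap_docker() {
    "${DOCKER:-docker}" run \
        -v "$HOME/references:/references" \
        -v "$HOME/data:/data" \
        koelling/amplimap "$@"
}

# Preview the full command line for a version check (nothing is executed by Docker).
( DOCKER=echo; amplimap_docker amplimap --version )
```

With the function defined, `amplimap_docker bwa index /references/ecoli.fasta` would run the same command as the longer form shown earlier in this section.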
Installing amplimap through pip¶
If you already have all of the required external software available (see Requirements), you can install amplimap directly through pip. Please note that this requires Python 3.5 or 3.6: it does not currently work with Python 3.7 due to problems with the pysam package, nor with any Python version lower than 3.5.
If you do not have the dependencies and the right version of Python available please see Installing amplimap through Conda.
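Before running pip, you may want to check which Python version your `python3` command points to. The sketch below encodes the version constraint stated above (3.5 or 3.6); the helper name `python_version_supported` is hypothetical.

```shell
# Succeeds if the "major.minor" version string in $1 is supported by the pip install.
python_version_supported() {
    case "$1" in
        3.5|3.6) return 0 ;;
        *) return 1 ;;
    esac
}

# Query the interpreter; you may need to use `python` instead of `python3`.
version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null || true)

if python_version_supported "$version"; then
    echo "Python $version is supported"
else
    echo "Python '${version:-not found}' is not supported; consider the conda installation" >&2
fi
```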
# you may need to use `pip` instead of `pip3`
pip3 install amplimap
If this does not work, you can try to install it manually:
# install required python3 packages
# you may need to use `pip` instead of `pip3`
pip3 install setuptools Cython numpy
# download and install amplimap
# you may need to use `python` instead of `python3`
git clone --depth=1 https://github.com/koelling/amplimap.git
cd amplimap
python3 setup.py install
You can also download our requirements.txt file, which contains a full list of all Python packages used by amplimap, each with a known working version.
To finish setting up amplimap you probably want to add the paths to the reference genome files you will be using (e.g. bwa index and reference genome fasta) to the Default configuration. See 4. Set up your reference genome and indices for more details.
Requirements¶
Please note that, other than the Linux environment and the reference genome files, all requirements will be installed automatically when you install amplimap through conda.
- Linux environment (should also work on macOS and the Windows Subsystem for Linux on Windows 10)
- Python 3.5 or 3.6 with setuptools, Cython and numpy
  - Further Python dependencies are listed in requirements.txt but can also be installed automatically by setup.py.
- Required software:
  - At least one read aligner: BWA (tested with v0.7.12), Bowtie2 (tested with v2.2.5), STAR (tested with v2.5.1b)
  - bedtools (tested with v2.27.1)
  - samtools (tested with v1.5)
- Additional software for germline variant calling (optional):
  - At least one variant caller: Platypus 0.8.1+, GATK 4+, Octopus
  - Annovar (tested with v2015-06-17)
  - bcftools (tested with v1.5)
- Additional software for low-frequency variant calling (optional):
  - Mutect2 (from GATK 4, tested with v4.0)
- Additional software for capture probe processing (optional):
  - Picard Tools 2+ (tested with v2.3.0)
- Reference genome FASTA file, with indices
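To see which of the tools listed above are already available on your PATH, a loop like the following can be used. It is a sketch: the function name `report_tools` is hypothetical, and the command names assume each tool is installed under its usual executable name. Only one of the three aligners needs to be present.

```shell
# Print "found <tool>" or "missing <tool>" for every command name given.
report_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found $tool"
        else
            echo "missing $tool"
        fi
    done
}

# Core tools plus the aligners; only one aligner is actually required.
report_tools samtools bedtools bwa bowtie2 STAR
```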