========================================= Installation guide ========================================= We recommend that you use `Miniconda `_ with `bioconda `_ to install amplimap and its requirements (such as Python 3.6, read aligners, etc). If you have `Docker `_ you can also use our Dockerfile instead: :ref:`installation-docker`. If your machine already has all of the required software installed you can also install amplimap through pip. For more details, see :ref:`installation-pip`. .. _installation-miniconda: Installing amplimap through Conda ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1. Install Miniconda 3 ----------------------- Download and install `Miniconda with Python 3 `_ from the conda website. 2. Install amplimap environment -------------------------------- Download amplimap's :download:`environment.yml file <../environment.yml>` and use it to install amplimap and its requirements into a new conda environment: :: conda env create --file environment.yml .. conda create --name amplimap 'python>=3.4' pip setuptools numpy cython bwa bowtie2 star bedtools samtools bcftools gatk4 picard .. conda activate amplimap .. #conda env export > environment.yml If you want to run germline variant calling and annotation you also need to `download and install Annovar `_ manually. Make sure you also download the relevant indices for the reference genome you want to use and add the directory containing the Annovar scripts to your ``PATH`` environment variable. 3. Activate amplimap environment ------------------------------------------------ Load the amplimap environnent by running this command: :: conda activate amplimap You only need to run this command once per session, e.g. when you open a new terminal window. Your command line prompt should now start with ``(amplimap)``. Run ``amplimap --version`` to confirm that the correct version of amplimap has been installed and activated. .. _installation-setup: 4. Set up your reference genome and indices ------------------------------------------- Download the DNA (FASTA) file for the reference genome that you want to use, for example from the `Ensembl FTP `_ or `iGenomes `_. When in doubt we recommend using the primary_assembly file from Ensembl, for example ``Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz``. Once you have downloaded this file you need to prepare it for use in amplimap: :: # decompress the file (if it ends in .gz) gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz # index FASTA file samtools faidx Homo_sapiens.GRCh38.dna.primary_assembly.fa # create dictionary for GATK picard CreateSequenceDictionary R=Homo_sapiens.GRCh38.dna.primary_assembly.fa # build bwa index, if you want to use bwa: bwa index Homo_sapiens.GRCh38.dna.primary_assembly.fa # build bowtie2 index, if you want to use bowtie2: bowtie2-build Homo_sapiens.GRCh38.dna.primary_assembly.fa Homo_sapiens.GRCh38.dna.primary_assembly.fa 5. Update amplimap configuration ------------------------------------------ Finally, we recommend that you add the paths of the reference genome files to your ``config_default.yaml``. This way, you don't need to specify these paths in every single directory-specific ``config.yaml``. To find out where this file is located run: :: amplimap --basedir Open the file ``config_default.yaml`` at this location and look for the settings under ``paths:`` corresponding to the indices you created. Replace these with the full paths to your files. If you haven't generated one of the files leave the corresponding setting empty. For example, if you generated indices for bwa and bowtie2 (but not STAR or Annovar) and always used the same FASTA filename as the prefix: :: paths: hg38: bwa: "/home/user/amplimap/Homo_sapiens.GRCh38.dna.primary_assembly.fa" bowtie2: "/home/user/amplimap/Homo_sapiens.GRCh38.dna.primary_assembly.fa" fasta: "/home/user/amplimap/Homo_sapiens.GRCh38.dna.primary_assembly.fa" If you are working with a different reference genome change ``hg38:`` to the appropriate abbreviation (e.g. ``mm10:``) and also update the line ``genome_name: "hg38"`` below. If you are using Annovar make sure you also provide the path to its indices directory under ``paths:`` and adjust the protocols/operations under ``annotate: annovar: protocols:`` to match the indices you have downloaded. Save the file and confirm that the settings are being read correctly by looking at the output of ``amplimap --print-config``. 6. Run amplimap! ------------------- Now you are ready to run amplimap! Prepare a working directory (see :ref:`usage`), change into it using ``cd`` and then run ``amplimap`` to get started. If you get a message about the command not being found please make sure you activated the conda environment as described above. .. _installation-docker: Installing amplimap through Docker ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We also have a `Docker image `_ available. To use this, install `Docker `_ and then prefix your amplimap commands with ``docker run koelling/amplimap``, forwarding directories from your host into the docker container using Docker's ``-v`` parameter. For example, here are some commands you could use to prepare indices for an *E. coli* :download:`reference genome FASTA <../sample_data/ecoli.fasta>` located under ``~/references/ecoli.fasta`` and then run amplimap on some :download:`example data <../sample_data/example_wd.tar>` located in ``~/data/example_wd``: :: # download the docker image (only need to run this once) docker pull koelling/amplimap # check version docker run koelling/amplimap amplimap --version # build indices for ~/references/ecoli.fasta docker run -v ~/references:/references koelling/amplimap samtools faidx /references/ecoli.fasta docker run -v ~/references:/references koelling/amplimap picard CreateSequenceDictionary R=/references/ecoli.fasta docker run -v ~/references:/references koelling/amplimap bwa index /references/ecoli.fasta # run amplimap with working directory ~/data/example_wd docker run -v ~/references:/references -v ~/data:/data koelling/amplimap amplimap --working-directory=/data/example_wd coverages pileups variants Note that in this example you would have to provide the paths to your reference genome in the ``~/data/example_wd/config.yaml`` file: :: paths: ecoli: bwa: "/references/ucsc.ecoli.fasta" fasta: "/references/ucsc.ecoli.fasta" general: genome_name: "ecoli" You can avoid having to specify these paths every time by running a shell inside the Docker container and adding your reference genome to your ``config_default.yaml`` as described here: :ref:`installation-setup`. To get a bash shell inside the Docker container: :: docker run -t -i koelling/amplimap /bin/bash To annotate variant calls, also install Annovar inside the Docker container and add the path to the Annovar indices to your config. Make sure you also add the directory containing the Annovar Perl scripts to your ``PATH`` so that amplimap can find them. .. _installation-pip: Installing amplimap through pip ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If you already have all of the required external software available (see :ref:`installation-requirements`) you can install amplimap directly through pip. Please note that this **requires Python 3.5 or 3.6** and does not currently work with Python 3.7 due to problems with the pysam package. It also does not work with any Python version lower than 3.5. If you do not have the dependencies and the right version of Python available please see :ref:`installation-miniconda`. :: # you may need to use `pip` instead of `pip3` pip3 install amplimap If this does not work, you can try to install it manually: :: # install required python3 packages # you may need to use `pip` instead of `pip3` pip3 install setuptools Cython numpy # download and install amplimap # you may need to use `python` instead of `python3` git clone --depth=1 https://github.com/koelling/amplimap.git cd amplimap python3 setup.py install You can also :download:`download our requirements.txt file <../requirements.txt>`, which contains a full list of all Python packages used by amplimap, and a known working version. To finish setting up amplimap you probably want to add the paths to the reference genome files you will be using (e.g. bwa index and reference genome fasta) to the :ref:`default-config`. See :ref:`installation-setup` for more details. .. _installation-requirements: Requirements ~~~~~~~~~~~~~~~ Please note that, other than the Linux environment and the reference genome files, all requirements **will be installed automatically** when you install amplimap through conda. - Linux environment (should also work on MacOS, Windows 10 Linux Subsystem) - Python 3.5 or 3.6 with setuptools, Cython and numpy - Further Python dependencies are listed in ``requirements.txt`` but can also be installed automatically by ``setup.py``. - Required software: - At least one read aligner: BWA (tested with v0.7.12), Bowtie2 (tested with v2.2.5), STAR (tested with v2.5.1b) - bedtools (tested with v2.27.1) - samtools (tested with v1.5) - Additional software for germline variant calling (optional): - At least one variant caller: Platypus 0.8.1+, GATK 4+ - Annovar (tested with v2015-06-17) - bcftools (tested with v1.5) - Additional software for low-frequency variant calling (optional): - Mutect2 (from GATK 4, tested with v4.0) - Additional software for capture probe processing (optional): - Picard Tools 2+ (tested with v2.3.0) - Reference genome FASTA file, with indices