Code documentation

Common

amplimap.common.find_umi_groups(umi_counts: dict, id_offset=0) → dict

Calculate dict with UMI group ID (values) for each raw UMI (keys).

Parameters:
  • umi_counts (dict) – Counts for each raw UMI sequence, keys must be bytes (not unicode strings).
  • id_offset (int) – Optional offset to add to all UMI IDs.

amplimap.common.make_extended_read_name(original_name: str, probe: str, umi: str) → str

Generate read name that contains probe name and UMI.

amplimap.common.parse_extended_read_name(extended_read_name: str) → (str, str, str)

Obtain original read name, probe name and UMI from extended read name.
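
The two functions above are inverses of each other: one packs the probe name and UMI into the read name, the other recovers them. The following is a minimal sketch of such a round trip, not amplimap's actual implementation. The `:::` separator, the `pr_` and `umi_` prefixes, and the function names `make_name_sketch` and `parse_name_sketch` are all assumptions made for illustration.

```python
SEP = ":::"  # assumed separator; the real format is not documented here


def make_name_sketch(original_name: str, probe: str, umi: str) -> str:
    """Pack probe name and UMI into a single read name (hypothetical format)."""
    return SEP.join([original_name, "pr_" + probe, "umi_" + umi])


def parse_name_sketch(extended_read_name: str):
    """Recover (original_name, probe, umi) from a name built by make_name_sketch."""
    original_name, probe_field, umi_field = extended_read_name.split(SEP)
    if not probe_field.startswith("pr_") or not umi_field.startswith("umi_"):
        raise ValueError("not an extended read name: %r" % extended_read_name)
    # Strip the "pr_" and "umi_" prefixes again.
    return original_name, probe_field[len("pr_"):], umi_field[len("umi_"):]


extended = make_name_sketch("read42", "BRCA1_ex2", "ACGTAC")
original, probe, umi = parse_name_sketch(extended)
```

A tagged format like this lets downstream steps recover the probe and UMI from an alignment record alone, provided the separator never occurs in the original read name.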

File reader

exception amplimap.reader.AmplimapReaderException(e: Exception, filename: str, should_have_header: bool = None)

Will be raised if reading one of the standard input files has failed, with the aim of providing a more useful error message in the user-visible output.

amplimap.reader.get_code_versions(path: str = '.') → dict

Get the file modification times of common code files for versioning.

amplimap.reader.get_file_hashes(path: str = '.') → dict

Get SHA256 hashes for common input files, so we can make sure these didn’t change between runs.
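
As an illustration of the idea, SHA256 hashes for a set of files can be computed with the standard library alone. This is a sketch rather than the code used by amplimap. The helper names `sha256_of_file` and `hash_input_files` are hypothetical, and the list of file names to hash is passed in explicitly because the documentation above does not say which files count as "common input files".

```python
import hashlib
import os


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_input_files(directory: str, filenames: list) -> dict:
    """Map each existing file name to its SHA256 digest; skip missing files."""
    hashes = {}
    for name in filenames:
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):
            hashes[name] = sha256_of_file(full_path)
    return hashes
```

Comparing the resulting dict with the one stored from a previous run shows which input files have changed.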

amplimap.reader.merge_probes_by_id(rows: pandas.core.frame.DataFrame) → pandas.core.series.Series

Merge multiple probes.csv rows with the same probe ID together, to handle cases where MIPGEN generated multiple versions because of a SNP.

amplimap.reader.process_probe_design(design: pandas.core.frame.DataFrame, reference_type: str = 'genome') → pandas.core.frame.DataFrame

Read amplimap probes.csv file and return pandas dataframe.

amplimap.reader.read_and_convert_heatseq_probes(path: str) → pandas.core.frame.DataFrame

UNTESTED: Read probes file from Roche heatseq in CSV format and generate an amplimap probes.csv from it.

amplimap.reader.read_and_convert_mipgen_probes(path: str) → pandas.core.frame.DataFrame

Read probes file from MIPGEN in CSV format and generate an amplimap probes.csv from it.

amplimap.reader.read_new_probe_design(path: str, reference_type: str = 'genome') → pandas.core.frame.DataFrame

Read amplimap probes.csv file and return pandas dataframe.

amplimap.reader.read_sample_info(path: str) → pandas.core.frame.DataFrame

Read amplimap sample_info.csv file and return pandas dataframe.

amplimap.reader.read_snps_txt(path: str, reference_type: str = 'genome') → pandas.core.frame.DataFrame

Read amplimap snps.txt file and return pandas dataframe.

amplimap.reader.read_targets(path: str, check_overlaps: bool = False, reference_type: str = 'genome', file_type: str = 'bed') → pandas.core.frame.DataFrame

Read amplimap targets.csv or targets.bed file and return pandas dataframe.

amplimap.reader.write_targets_bed(path: str, targets: pandas.core.frame.DataFrame)

Write targets dataframe to bed file.

Read parser

Alignment stats

Coverage

amplimap.coverage.aggregate(input, output)

Read coverage summary files and create aggregate files.

Parameters:
  • input – dict containing ‘csvs’, the list of CSV files to aggregate, and optionally ‘sample_info’, a table with additional sample annotation
  • output – dict containing paths for output files: merged, min_coverage, cov_per_bp, fraction_zero_coverage

amplimap.coverage.fraction_10x_coverage(coverage)

Calculate fraction of bases with coverage 10 or more.

amplimap.coverage.fraction_30x_coverage(coverage)

Calculate fraction of bases with coverage 30 or more.

amplimap.coverage.fraction_zero_coverage(coverage)

Calculate fraction of bases with coverage 0.
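
The three fraction statistics above share one pattern: count the positions that meet a condition and divide by the total number of positions. Here is a minimal sketch, assuming `coverage` is an iterable of per-base depth values. The helpers `fraction_at_least` and `fraction_zero` are illustrative names, not part of amplimap, and the real functions may expect a pandas object instead.

```python
def fraction_at_least(coverage, threshold: int) -> float:
    """Fraction of positions whose depth is at least `threshold`."""
    values = list(coverage)
    if not values:
        return float("nan")  # no positions, so the fraction is undefined
    return sum(1 for depth in values if depth >= threshold) / len(values)


def fraction_zero(coverage) -> float:
    """Fraction of positions with no coverage at all."""
    values = list(coverage)
    if not values:
        return float("nan")
    return sum(1 for depth in values if depth == 0) / len(values)


depths = [0, 5, 10, 30, 40]
frac_10x = fraction_at_least(depths, 10)   # 3 of 5 positions
frac_30x = fraction_at_least(depths, 30)   # 2 of 5 positions
frac_zero = fraction_zero(depths)          # 1 of 5 positions
```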

amplimap.coverage.process_file(input: str, output: str)

Read raw bedtools coverage file, calculate summary statistics and output them as CSV file.

Parameters:
  • input – path to a bedtools coverage file
  • output – path to the summary CSV file

Variants

amplimap.variants.calculate_del_score(merged: pandas.core.frame.DataFrame)

Add a column DeleteriousScore to the dataframe, containing a count of how many tools have assigned this variant a deleterious score.

Score ranges from 0-6, corresponding to the tools SIFT, Polyphen2, LRT, MutationTaster, GERP++ and phyloP100way_vertebrate.

Additionally, any stopgain, frameshift or splicing variants are always set to 6.
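
The scoring rule described above can be sketched for a single variant as follows. This is an illustration under assumptions, not the library's code. It takes a precomputed dict of per-tool booleans, because the thresholds each tool uses to call a variant deleterious are not given here, and the names `deleterious_score_sketch`, `flags` and `variant_type` are hypothetical.

```python
TOOLS = ["SIFT", "Polyphen2", "LRT", "MutationTaster", "GERP++",
         "phyloP100way_vertebrate"]

# Variant types that always receive the maximum score, per the text above.
ALWAYS_MAX = {"stopgain", "frameshift", "splicing"}


def deleterious_score_sketch(flags: dict, variant_type: str) -> int:
    """Count tools flagging the variant as deleterious (0 to 6)."""
    if variant_type in ALWAYS_MAX:
        return len(TOOLS)
    # Tools missing from `flags` count as not deleterious.
    return sum(1 for tool in TOOLS if flags.get(tool, False))


missense_score = deleterious_score_sketch(
    {"SIFT": True, "Polyphen2": True, "LRT": False}, "nonsynonymous")
stopgain_score = deleterious_score_sketch({}, "stopgain")
```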

amplimap.variants.find_closest_exon(row: pandas.core.series.Series, gexs: dict) → int

Get distance of variant to the closest exon of the gene it has been annotated with.

amplimap.variants.load_gene_exons(file: str, genes: list) → dict

Load list of exon chr, strand, start and end locations for each gene in genes.

amplimap.variants.make_summary(input: list, output: list, config: dict, exon_table_path: str = None)

Load merged Annovar CSV file (plus targets and sample info), process them and output a new CSV file.

amplimap.variants.make_summary_condensed(input, output)

Make condensed summary table that only contains a subset of columns.

amplimap.variants.make_summary_dataframe(merged: pandas.core.frame.DataFrame, targets: pandas.core.frame.DataFrame = None, sample_info: pandas.core.frame.DataFrame = None, genome_name: str = None, include_gbrowse_links: bool = False, include_exon_distance: bool = False, include_score: bool = False, exon_table_path: str = None) → pandas.core.frame.DataFrame

Process merged Annovar dataframe (with optional targets and sample_info data frames) into a large summary table.

amplimap.variants.make_summary_excel(input, output)

UNTESTED: Make Excel table from the merged table.

amplimap.variants.merge_variants(input, output)

Merge individual Annovar CSV files together.

Pileup

exception amplimap.pileup.PileupGroupFilterException(filter_column)

Raised when UMI group fails a pileup filter, with filter_column being the column to count this in.

exception amplimap.pileup.PileupRowFilterException(filter_column, skip_read_pair=True)

Raised when row fails a pileup filter, with filter_column being the column to count this in.

amplimap.pileup.get_al_mate_starts(al: pysam.libcalignedsegment.AlignedSegment)

Get set of mate starts for AlignedSegment, always giving read1 first and read2 second.

amplimap.pileup.get_group_consensus(group_calls, min_consensus_count=1, min_consensus_fraction=0.51, ignore_groups=False, debug=False)

Calculate consensus call, count and phred for UMI group.
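
To show how `min_consensus_count` and `min_consensus_fraction` might interact, here is a simplified majority-vote sketch. It assumes the calls of one UMI group arrive as a plain list of strings, and it omits the phred calculation and the `ignore_groups` and `debug` options entirely. The function name `consensus_sketch` is hypothetical, and the real function may resolve ties or apply its thresholds differently.

```python
from collections import Counter


def consensus_sketch(calls, min_consensus_count=1, min_consensus_fraction=0.51):
    """Return (consensus_call, count), with call None if no consensus is reached."""
    if not calls:
        return None, 0
    call, count = Counter(calls).most_common(1)[0]
    # The top call must be supported by enough reads, both in absolute
    # number and as a fraction of all reads in the group.
    if count < min_consensus_count or count / len(calls) < min_consensus_fraction:
        return None, count
    return call, count


clear_majority = consensus_sketch(["A", "A", "C"])   # 2 of 3 reads agree
no_majority = consensus_sketch(["A", "C"])           # 50% is below 0.51
too_few = consensus_sketch(["A", "A", "C"], min_consensus_count=3)
```

With the default fraction of 0.51, a group split exactly in half yields no consensus.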

amplimap.pileup.get_pileup_row(chrom, pos_0, raw_coverage=0, target_id=None, target_type=None, ref=None, validate_probe_targets=False)

Get ordered dict of columns for pileup table.

amplimap.pileup.process_pileup_base(ref, probes_dict, targets_dict, snps_dict, region_index, pileup_base, min_consensus_count, min_consensus_fraction, min_mapq, min_baseq, ignore_groups, group_with_mate_positions, validate_probe_targets, filter_softclipped, no_probe_data, read_metadata, filtered_pair_qnames, umi_to_group, debug)

Process a single basepair of pileup, generating a pileup table row.

amplimap.pileup.process_pileup_read(pr, probes_dict, reference_name, reference_pos_0, read_metadata, min_mapq, ignore_groups, group_with_mate_positions, validate_probe_targets, filter_softclipped, no_probe_data, debug)

Process single pileup read.

amplimap.pileup.process_pileup_row(row, seen_probes, call_groups, snps_dict, ignore_groups, min_consensus_count, min_consensus_fraction, min_baseq, ref, debug=False)

Generate a pileup table row from processed read data and calculate some additional stats and annotation.

amplimap.pileup.record_read_in_group(read_calls, my_call, my_phred, my_umi, read_name)

Add call data from read to read_calls dictionary of name -> call.

Others

exception amplimap.naive_mapper.AmplimapNoAlignment

amplimap.naive_mapper.align_and_find_cigar(read, ref, debug=False, reverse=False) → tuple

Align read and reference sequence using pairwise global alignment and return start offset and CIGAR ops.

amplimap.naive_mapper.create_bam(sample, files_in1, files_in2, ref_fasta, probes_dict, output, debug=False)

Create a BAM file with reads placed at their expected locations, adjusted through pairwise alignment to the target sequences.

This will give reasonable results as long as probes capture the exact target sequences, but will generate alignments with many mismatches if there are any discrepancies.

amplimap.naive_mapper.find_cigar_for_alignment(read_len, alignment, debug) → tuple

Generate tuple of start offset and CIGAR operations for given alignment.
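
The general technique of deriving a start offset and CIGAR operations from a gapped global alignment can be sketched as below. This is not amplimap's implementation. It assumes the alignment is given as two equal-length strings with `-` marking gaps, and it reports operations as letters (`M`, `I`, `D`), whereas the real function may use another alignment representation and numeric operation codes. The name `cigar_from_gapped` is hypothetical.

```python
from itertools import groupby


def cigar_from_gapped(aligned_read: str, aligned_ref: str):
    """Return (start_offset, [(op, length), ...]) for a gapped alignment."""
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")

    # Leading read gaps are reference bases before the read starts.
    start = 0
    while start < len(aligned_read) and aligned_read[start] == "-":
        start += 1
    # Trailing read gaps are reference bases after the read ends.
    end = len(aligned_read)
    while end > start and aligned_read[end - 1] == "-":
        end -= 1

    ops = []
    for read_base, ref_base in zip(aligned_read[start:end], aligned_ref[start:end]):
        if read_base == "-":
            ops.append("D")   # base present in reference only
        elif ref_base == "-":
            ops.append("I")   # base present in read only
        else:
            ops.append("M")   # aligned column (match or mismatch)

    # Collapse runs of identical operations into (op, length) pairs.
    cigar = [(op, len(list(run))) for op, run in groupby(ops)]
    return start, cigar


offset, cigar = cigar_from_gapped("--ACGT-A--", "GGAC-TCAGG")
```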

amplimap.run.check_config_keys(default_config, my_config, path=[])

Recursively check that config keys provided in my_config also exist in default_config (ignoring ‘paths’ and ‘clusters’).
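
A recursive key check of this kind might look like the sketch below. It is one interpretation rather than the actual code. It collects unknown keys into a list instead of raising, it reads "ignoring" as skipping the `paths` and `clusters` sections entirely, and the name `check_keys_sketch` and the example config keys are made up for illustration.

```python
IGNORED_SECTIONS = {"paths", "clusters"}


def check_keys_sketch(default_config: dict, my_config: dict, path=()) -> list:
    """Return dotted paths of keys in my_config that default_config lacks."""
    unknown = []
    for key, value in my_config.items():
        if key in IGNORED_SECTIONS:
            continue  # user-defined entries are allowed in these sections
        key_path = path + (key,)
        if key not in default_config:
            unknown.append(".".join(str(part) for part in key_path))
        elif isinstance(value, dict) and isinstance(default_config[key], dict):
            # Descend into nested sections, extending the path.
            unknown.extend(check_keys_sketch(default_config[key], value, key_path))
    return unknown


defaults = {"section": {"option": 1}, "paths": {}}
mine = {"section": {"option": 2, "typo_option": 3},
        "paths": {"custom": "/some/dir"},
        "extra": True}
unknown_keys = check_keys_sketch(defaults, mine)
```

The sketch uses an immutable tuple as the default for `path`, which avoids the usual pitfall of a mutable default argument being shared between calls.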

amplimap.run.compare_config_dicts(my_config, used_config, path=[])

Recursively search for differences in values between two dicts.
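
As a sketch of the recursive comparison, under the assumption that differences are reported as dotted key paths (the return format of the real function is not documented here), one possible version is shown below. The name `diff_dicts_sketch` and the example keys are hypothetical.

```python
def diff_dicts_sketch(first: dict, second: dict, path=()) -> list:
    """Return dotted paths of keys whose values differ between two dicts."""
    differences = []
    for key in sorted(set(first) | set(second), key=str):
        key_path = path + (key,)
        label = ".".join(str(part) for part in key_path)
        if key not in first or key not in second:
            differences.append(label)  # key is missing on one side
        elif isinstance(first[key], dict) and isinstance(second[key], dict):
            # Both sides are nested dicts: compare them recursively.
            differences.extend(diff_dicts_sketch(first[key], second[key], key_path))
        elif first[key] != second[key]:
            differences.append(label)
    return differences


config_now = {"align": {"aligner": "bwa", "threads": 4}, "name": "run1"}
config_used = {"align": {"aligner": "bowtie2", "threads": 4}, "name": "run1",
               "extra": 1}
changed = diff_dicts_sketch(config_now, config_used)
```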

amplimap.run.main(argv=None)

Run amplimap.