Code documentation

Common

amplimap.common.find_umi_groups(umi_counts: dict, id_offset=0) → dict

Calculate a dict that maps each raw UMI (key) to its UMI group ID (value).

Parameters:
    umi_counts (dict) – Counts for each raw UMI sequence. Keys must be bytes, not unicode strings.
    id_offset (int) – Optional offset to add to all UMI group IDs.
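To illustrate what a UMI group is, here is a minimal sketch of one common grouping rule, directional adjacency in the style of UMI-tools: a UMI joins the group of a neighbour that differs by one base if that neighbour's count is at least twice its own count minus one. Whether amplimap applies exactly this rule is an assumption here, and `find_umi_groups_sketch` and `hamming` are hypothetical helpers, not part of amplimap.

```python
from collections import deque


def hamming(a: bytes, b: bytes) -> int:
    """Number of mismatching positions (a length difference counts as mismatches)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def find_umi_groups_sketch(umi_counts: dict, id_offset: int = 0) -> dict:
    """Map each raw UMI (bytes) to a group ID using directional adjacency.

    Hypothetical re-implementation for illustration only.
    """
    # Visit UMIs from most to least abundant so each group is seeded by its
    # most common sequence; ties are broken by sequence for determinism.
    umis = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    group_of = {}
    next_id = id_offset
    for seed in umis:
        if seed in group_of:
            continue
        group_of[seed] = next_id
        queue = deque([seed])
        # Breadth-first search along directed edges: node -> other is allowed
        # when they differ by one base and other is much rarer than node.
        while queue:
            node = queue.popleft()
            for other in umis:
                if other in group_of:
                    continue
                if (hamming(node, other) == 1
                        and umi_counts[node] >= 2 * umi_counts[other] - 1):
                    group_of[other] = next_id
                    queue.append(other)
        next_id += 1
    return group_of


counts = {b"AAAA": 10, b"AAAT": 2, b"CCCC": 5}
print(find_umi_groups_sketch(counts))  # {b'AAAA': 0, b'AAAT': 0, b'CCCC': 1}
```

Here AAAT is absorbed into the AAAA group because it is one mismatch away and much rarer, which is the usual way of treating sequencing errors within a UMI.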

amplimap.common.make_extended_read_name(original_name: str, probe: str, umi: str) → str

Generate a read name that contains the probe name and UMI.

amplimap.common.parse_extended_read_name(extended_read_name: str) → (str, str, str)

Obtain the original read name, probe name and UMI from an extended read name.
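Building and parsing extended read names are inverse operations. A minimal sketch of such a round trip, assuming a hypothetical `|` separator (amplimap's actual name format is not documented here and may differ); both function names are hypothetical:

```python
SEPARATOR = "|"  # hypothetical separator; amplimap's real format may differ


def make_extended_read_name_sketch(original_name: str, probe: str, umi: str) -> str:
    """Embed probe name and UMI in a read name (illustrative only)."""
    for part in (original_name, probe, umi):
        # A separator inside any part would make parsing ambiguous.
        if SEPARATOR in part:
            raise ValueError(f"{part!r} must not contain {SEPARATOR!r}")
    return SEPARATOR.join((original_name, probe, umi))


def parse_extended_read_name_sketch(extended_read_name: str) -> tuple:
    """Split an extended read name back into (original_name, probe, umi)."""
    parts = extended_read_name.split(SEPARATOR)
    if len(parts) != 3:
        raise ValueError(f"not an extended read name: {extended_read_name!r}")
    return parts[0], parts[1], parts[2]


name = make_extended_read_name_sketch("M00123:5:1101:15589:1331", "probe_01", "ACGTAC")
print(name)  # M00123:5:1101:15589:1331|probe_01|ACGTAC
print(parse_extended_read_name_sketch(name))
```

Rejecting the separator up front is what guarantees that parsing always recovers the original three parts.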

File reader

exception amplimap.reader.AmplimapReaderException(e: Exception, filename: str, should_have_header: bool = None)

Raised when reading one of the standard input files fails, to provide a more useful error message in the user-visible output.

amplimap.reader.get_code_versions(path: str = '.') → dict

Get the file modification times of common code files for versioning.
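Modification times are a cheap proxy for code versions. A minimal sketch, assuming that "common code files" means files matching a glob pattern such as `*.py` (which files amplimap actually inspects is not stated here); `get_code_versions_sketch` is a hypothetical name:

```python
import glob
import os


def get_code_versions_sketch(path: str = ".", pattern: str = "*.py") -> dict:
    """Map each matching file name to its modification time (seconds since the epoch).

    The choice of files (a glob pattern) is an assumption for illustration.
    """
    versions = {}
    for file_path in sorted(glob.glob(os.path.join(path, pattern))):
        versions[os.path.basename(file_path)] = os.path.getmtime(file_path)
    return versions


print(get_code_versions_sketch("."))
```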

amplimap.reader.get_file_hashes(path: str = '.') → dict

Get SHA256 hashes for common input files, so that we can make sure they did not change between runs.
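A minimal sketch of hashing input files with `hashlib`, reading each file in chunks so large inputs do not need to fit in memory. The list of file names is taken from this page, but which files amplimap actually hashes is an assumption; both function names are hypothetical.

```python
import hashlib
import os

# Hypothetical list of input files; the real set used by amplimap may differ.
INPUT_FILES = ("probes.csv", "sample_info.csv", "snps.txt", "targets.csv", "targets.bed")


def sha256_of_file(file_path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def get_file_hashes_sketch(path: str = ".") -> dict:
    """Hash whichever of the input files exist in `path`."""
    hashes = {}
    for name in INPUT_FILES:
        file_path = os.path.join(path, name)
        if os.path.isfile(file_path):
            hashes[name] = sha256_of_file(file_path)
    return hashes
```

To detect changes, the dicts from two runs can be compared key by key; any file whose digest differs (or that appears in only one run) has changed.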

amplimap.reader.merge_probes_by_id(rows: pandas.core.frame.DataFrame) → pandas.core.series.Series

Merge multiple probes.csv rows that have the same probe ID, to handle cases where MIPGEN generated multiple versions of a probe because of a SNP.

amplimap.reader.process_probe_design(design: pandas.core.frame.DataFrame, reference_type: str = 'genome') → pandas.core.frame.DataFrame

Process an amplimap probe design dataframe, as read from a probes.csv file, and return a pandas dataframe.

amplimap.reader.read_and_convert_heatseq_probes(path: str) → pandas.core.frame.DataFrame

UNTESTED: Read a probes file from Roche heatseq in CSV format and generate an amplimap probes.csv table from it.

amplimap.reader.read_and_convert_mipgen_probes(path: str) → pandas.core.frame.DataFrame

Read a probes file from MIPGEN in CSV format and generate an amplimap probes.csv table from it.

amplimap.reader.read_new_probe_design(path: str, reference_type: str = 'genome') → pandas.core.frame.DataFrame

Read an amplimap probes.csv file and return a pandas dataframe.

amplimap.reader.read_sample_info(path: str) → pandas.core.frame.DataFrame

Read an amplimap sample_info.csv file and return a pandas dataframe.

amplimap.reader.read_snps_txt(path: str, reference_type: str = 'genome') → pandas.core.frame.DataFrame

Read an amplimap snps.txt file and return a pandas dataframe.

amplimap.reader.read_targets(path: str, check_overlaps: bool = False, reference_type: str = 'genome', file_type: str = 'bed') → pandas.core.frame.DataFrame

Read an amplimap targets.csv or targets.bed file and return a pandas dataframe.
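The `check_overlaps` parameter suggests that overlapping targets can be detected on reading. A minimal sketch of such a check, on plain `(chrom, start, end)` tuples instead of a dataframe and assuming half-open BED-style coordinates; what amplimap does when it finds an overlap is not stated here, and `find_overlapping_targets` is a hypothetical name.

```python
def find_overlapping_targets(targets: list) -> list:
    """Return pairs of targets that overlap on the same chromosome.

    targets: list of (chrom, start, end) tuples with half-open [start, end)
    coordinates, as in BED files.
    """
    overlaps = []
    ordered = sorted(targets)
    for i, (chrom, start, end) in enumerate(ordered):
        # Later targets on the same chromosome start at or after `start`,
        # so we can stop as soon as one starts at or beyond our end.
        for other in ordered[i + 1:]:
            other_chrom, other_start, _ = other
            if other_chrom != chrom or other_start >= end:
                break
            overlaps.append(((chrom, start, end), other))
    return overlaps


targets = [("chr1", 100, 200), ("chr1", 150, 250), ("chr1", 250, 300), ("chr2", 100, 200)]
print(find_overlapping_targets(targets))  # [(('chr1', 100, 200), ('chr1', 150, 250))]
```

Sorting first keeps the scan close to linear for typical target lists, and with half-open coordinates the adjacent intervals 150-250 and 250-300 correctly do not count as overlapping.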

amplimap.reader.write_targets_bed(path: str, targets: pandas.core.frame.DataFrame)

Write a targets dataframe to a BED file.
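BED files are tab-separated, with 0-based start and exclusive end coordinates. A minimal sketch of writing four-column BED lines (chrom, start, end, name) from plain tuples instead of a dataframe; it assumes the coordinates are already 0-based and half-open, so a table with 1-based starts would need 1 subtracted first. `write_targets_bed_sketch` is a hypothetical name.

```python
def write_targets_bed_sketch(path: str, targets: list) -> None:
    """Write (chrom, start, end, name) tuples as a 4-column BED file.

    Assumes start/end are already 0-based, half-open BED coordinates.
    """
    with open(path, "w") as handle:
        for chrom, start, end, name in targets:
            # Reject negative or empty intervals rather than writing invalid BED.
            if not 0 <= start < end:
                raise ValueError(f"invalid interval for {name}: {start}-{end}")
            handle.write(f"{chrom}\t{start}\t{end}\t{name}\n")
```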

Read parser

Alignment stats

Coverage

amplimap.coverage.aggregate(input, output)

Read coverage summary files and create aggregate files.

Parameters:
    input – dict containing ‘csvs’, the list of CSV files to aggregate, and optionally ‘sample_info’, a table with additional sample annotation.
    output – dict containing paths for the output files: merged, min_coverage, cov_per_bp, fraction_zero_coverage.

amplimap.coverage.fraction_10x_coverage(coverage)

Calculate the fraction of bases with a coverage of 10 or more.

amplimap.coverage.fraction_30x_coverage(coverage)

Calculate the fraction of bases with a coverage of 30 or more.

amplimap.coverage.fraction_zero_coverage(coverage)

Calculate the fraction of bases with zero coverage.
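All three coverage fractions are the share of positions that meet a depth condition. A minimal sketch, assuming `coverage` is a plain sequence of per-base depths (amplimap likely passes a pandas Series, which is not assumed here); the helper names are hypothetical.

```python
def fraction_at_least(coverage: list, threshold: int) -> float:
    """Fraction of bases whose coverage is >= threshold."""
    if not coverage:
        raise ValueError("coverage must not be empty")
    return sum(1 for depth in coverage if depth >= threshold) / len(coverage)


def fraction_zero(coverage: list) -> float:
    """Fraction of bases with no coverage at all."""
    if not coverage:
        raise ValueError("coverage must not be empty")
    return sum(1 for depth in coverage if depth == 0) / len(coverage)


depths = [0, 5, 12, 30, 45]
print(fraction_at_least(depths, 10))  # 0.6
print(fraction_at_least(depths, 30))  # 0.4
print(fraction_zero(depths))          # 0.2
```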

amplimap.coverage.process_file(input: str, output: str)

Read a raw bedtools coverage file, calculate summary statistics and write them to a CSV file.

Parameters:
    input – path to a bedtools coverage file.
    output – path to the summary CSV file.

Variants

amplimap.variants.calculate_del_score(merged: pandas.core.frame.DataFrame)

Add a column DeleteriousScore to the dataframe, containing a count of how many tools have assigned this variant a deleterious score.

The score ranges from 0 to 6, corresponding to the tools SIFT, Polyphen2, LRT, MutationTaster, GERP++ and phyloP100way_vertebrate.

Additionally, stopgain, frameshift and splicing variants always receive a score of 6.
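The scoring rule above can be sketched for a single variant. This assumes each tool's verdict is already available as a boolean (the per-tool thresholds amplimap applies to the Annovar columns are not stated here) and that the variant type is a string such as Annovar's "frameshift deletion" or "exonic;splicing"; `deleterious_score` is a hypothetical name.

```python
import re

TOOLS = ("SIFT", "Polyphen2", "LRT", "MutationTaster", "GERP++", "phyloP100way_vertebrate")
ALWAYS_MAX = frozenset({"stopgain", "frameshift", "splicing"})


def deleterious_score(predictions: dict, variant_type: str) -> int:
    """Count tools that call a variant deleterious (0 to 6).

    predictions maps tool name -> True if that tool called the variant
    deleterious; missing tools count as not deleterious.
    """
    # Match whole words so that e.g. "nonframeshift" is not mistaken for "frameshift".
    words = set(re.split(r"[\s;]+", variant_type.lower()))
    if words & ALWAYS_MAX:
        return len(TOOLS)
    return sum(1 for tool in TOOLS if predictions.get(tool, False))


print(deleterious_score({"SIFT": True, "LRT": True}, "nonsynonymous SNV"))  # 2
print(deleterious_score({}, "frameshift deletion"))  # 6
```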

amplimap.variants.find_closest_exon(row: pandas.core.series.Series, gexs: dict) → int

Get the distance of a variant to the closest exon of the gene it has been annotated with.
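The distance calculation itself can be sketched for one position and one gene's exons. This assumes inclusive `(start, end)` exon coordinates on the same chromosome and an unsigned distance of 0 inside an exon; whether amplimap's version is signed or strand-aware is not stated here, and the gene lookup via `gexs` is omitted. `distance_to_closest_exon` is a hypothetical name.

```python
def distance_to_closest_exon(pos: int, exons: list) -> int:
    """Distance in bp from pos to the nearest exon; 0 if pos lies inside one.

    exons: list of (start, end) tuples with inclusive coordinates on the
    same chromosome as pos.
    """
    if not exons:
        raise ValueError("gene has no exons")
    distances = []
    for start, end in exons:
        if pos < start:
            distances.append(start - pos)  # upstream of this exon
        elif pos > end:
            distances.append(pos - end)  # downstream of this exon
        else:
            distances.append(0)  # inside the exon
    return min(distances)


print(distance_to_closest_exon(250, [(100, 200), (300, 400)]))  # 50
```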

amplimap.variants.load_gene_exons(file: str, genes: list) → dict

Load a list of exon chromosome, strand, start and end locations for each gene in genes.

amplimap.variants.make_summary(input: list, output: list, config: dict, exon_table_path: str = None)

Load the merged Annovar CSV file (plus targets and sample info), process the data and write a new CSV file.

amplimap.variants.make_summary_condensed(input, output)

Make a condensed summary table that contains only a subset of columns.

amplimap.variants.make_summary_dataframe(merged: pandas.core.frame.DataFrame, targets: pandas.core.frame.DataFrame = None, sample_info: pandas.core.frame.DataFrame = None, genome_name: str = None, include_gbrowse_links: bool = False, include_exon_distance: bool = False, include_score: bool = False, exon_table_path: str = None) → pandas.core.frame.DataFrame

Process a merged Annovar dataframe (with optional targets and sample_info dataframes) into a large summary table.

amplimap.variants.make_summary_excel(input, output)

UNTESTED: Make an Excel table from the merged table.

amplimap.variants.merge_variants(input, output)

Merge individual Annovar CSV files together.

Pileup

exception amplimap.pileup.PileupGroupFilterException(filter_column)

Raised when a UMI group fails a pileup filter; filter_column is the column in which this failure is counted.

exception amplimap.pileup.PileupRowFilterException(filter_column, skip_read_pair=True)

Raised when a row fails a pileup filter; filter_column is the column in which this failure is counted.

amplimap.pileup.get_al_mate_starts(al: pysam.libcalignedsegment.AlignedSegment)

Get the start positions of both mates for an AlignedSegment, always giving read1 first and read2 second.
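Ordering the two mate positions by read number rather than by which record is being processed means both mates of a pair yield the same key. A minimal sketch using the pysam `AlignedSegment` attributes `is_read1`, `reference_start` and `next_reference_start`; it uses only the leftmost 0-based positions, which is an assumption (amplimap may use strand-aware positions), and `get_mate_starts_sketch` and `FakeSegment` are hypothetical.

```python
from collections import namedtuple


def get_mate_starts_sketch(al) -> tuple:
    """Return (read1_start, read2_start) for a paired pysam AlignedSegment.

    Uses only the leftmost 0-based positions stored in the record:
    reference_start for this read and next_reference_start for its mate.
    """
    own, mate = al.reference_start, al.next_reference_start
    return (own, mate) if al.is_read1 else (mate, own)


# Stand-in for pysam.AlignedSegment so the sketch runs without a BAM file.
FakeSegment = namedtuple("FakeSegment", "is_read1 reference_start next_reference_start")

read1 = FakeSegment(True, 320, 500)
read2 = FakeSegment(False, 500, 320)
print(get_mate_starts_sketch(read1), get_mate_starts_sketch(read2))  # (320, 500) (320, 500)
```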

amplimap.pileup.get_group_consensus(group_calls, min_consensus_count=1, min_consensus_fraction=0.51, ignore_groups=False, debug=False)

Calculate the consensus call, count and phred score for a UMI group.
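The two thresholds in the signature suggest a majority-vote consensus. A minimal sketch, assuming `group_calls` is a flat list of base calls (one per read in the group) and that a consensus is accepted only if the top call reaches both `min_consensus_count` and `min_consensus_fraction`; the phred calculation is omitted, and `group_consensus_sketch` is a hypothetical name.

```python
from collections import Counter


def group_consensus_sketch(group_calls: list, min_consensus_count: int = 1,
                           min_consensus_fraction: float = 0.51):
    """Return (call, count) for the majority call of a UMI group, or None.

    group_calls: one base call per read in the group.
    The phred score reported by amplimap is not computed here.
    """
    if not group_calls:
        return None
    call, count = Counter(group_calls).most_common(1)[0]
    # Both the absolute count and the fraction of the group must be high enough.
    if count < min_consensus_count or count / len(group_calls) < min_consensus_fraction:
        return None
    return call, count


print(group_consensus_sketch(["A", "A", "A", "G"]))  # ('A', 3)
print(group_consensus_sketch(["A", "A", "G", "G"]))  # None
```

With the default fraction of 0.51, an even split such as two A and two G calls never yields a consensus, so tie-breaking never matters for these defaults.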

amplimap.pileup.get_pileup_row(chrom, pos_0, raw_coverage=0, target_id=None, target_type=None, ref=None, validate_probe_targets=False)

Get an ordered dict of columns for a pileup table row.

amplimap.pileup.process_pileup_base(ref, probes_dict, targets_dict, snps_dict, region_index, pileup_base, min_consensus_count, min_consensus_fraction, min_mapq, min_baseq, ignore_groups, group_with_mate_positions, validate_probe_targets, filter_softclipped, no_probe_data, read_metadata, filtered_pair_qnames, umi_to_group, debug)

Process a single base pair of the pileup, generating a pileup table row.

amplimap.pileup.process_pileup_read(pr, probes_dict, reference_name, reference_pos_0, read_metadata, min_mapq, ignore_groups, group_with_mate_positions, validate_probe_targets, filter_softclipped, no_probe_data, debug)

Process a single pileup read.

amplimap.pileup.process_pileup_row(row, seen_probes, call_groups, snps_dict, ignore_groups, min_consensus_count, min_consensus_fraction, min_baseq, ref, debug=False)

Generate a pileup table row from processed read data and calculate some additional statistics and annotations.

amplimap.pileup.record_read_in_group(read_calls, my_call, my_phred, my_umi, read_name)

Add call data from a read to the read_calls dictionary, which maps read names to calls.

Others

exception amplimap.naive_mapper.AmplimapNoAlignment

amplimap.naive_mapper.align_and_find_cigar(read, ref, debug=False, reverse=False) → tuple

Align read and reference sequences using pairwise global alignment and return the start offset and CIGAR operations.

amplimap.naive_mapper.create_bam(sample, files_in1, files_in2, ref_fasta, probes_dict, output, debug=False)

Create a BAM file with reads placed at their expected locations, adjusted by pairwise alignment to the target sequences.

This gives reasonable results as long as the probes capture exactly the target sequences, but generates alignments with many mismatches if there are any discrepancies.

amplimap.naive_mapper.find_cigar_for_alignment(read_len, alignment, debug) → tuple

Generate a tuple of the start offset and CIGAR operations for the given alignment.
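Turning a pairwise alignment into a start offset and CIGAR operations amounts to walking the alignment columns. A minimal sketch, assuming the alignment is given as two equal-length gapped strings ('-' marks a gap) and using the numeric operation codes from the SAM specification (0 = M, 1 = I, 2 = D), the same codes pysam uses in `cigartuples`. Leading or trailing read insertions are not converted into soft clips, and `cigar_from_gapped_alignment` is a hypothetical name.

```python
CIGAR_MATCH, CIGAR_INS, CIGAR_DEL = 0, 1, 2  # SAM spec codes for M, I, D


def cigar_from_gapped_alignment(read_aln: str, ref_aln: str) -> tuple:
    """Return (start_offset, [(op, length), ...]) for a gapped global alignment.

    start_offset counts the reference bases before the first read base.
    """
    if len(read_aln) != len(ref_aln):
        raise ValueError("aligned strings must have equal length")
    # Leading/trailing gaps in the read are reference bases the read does not cover.
    start = 0
    while start < len(read_aln) and read_aln[start] == "-":
        start += 1
    end = len(read_aln)
    while end > start and read_aln[end - 1] == "-":
        end -= 1
    ops = []
    for read_base, ref_base in zip(read_aln[start:end], ref_aln[start:end]):
        if read_base == "-" and ref_base == "-":
            continue  # gap in both: not part of the alignment
        if read_base == "-":
            op = CIGAR_DEL  # base present in reference only
        elif ref_base == "-":
            op = CIGAR_INS  # base present in read only
        else:
            op = CIGAR_MATCH  # match or mismatch
        # Merge runs of the same operation into one (op, length) entry.
        if ops and ops[-1][0] == op:
            ops[-1] = (op, ops[-1][1] + 1)
        else:
            ops.append((op, 1))
    return start, ops


print(cigar_from_gapped_alignment("--ACGT-A", "TTACCTGA"))  # (2, [(0, 4), (2, 1), (0, 1)])
```

In CIGAR string form the example is 4M1D1M, starting two bases into the reference; note that the G/C mismatch is still an M operation.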

amplimap.run.check_config_keys(default_config, my_config, path=[])

Recursively check that config keys provided in my_config also exist in default_config (ignoring ‘paths’ and ‘clusters’).
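The recursive key check can be sketched as a walk over nested dicts that records the dotted path of every unknown key. This sketch returns the unknown paths instead of raising, and it assumes ‘paths’ and ‘clusters’ are skipped only at the top level; both choices, the function name `find_unknown_config_keys` and the example keys are hypothetical.

```python
IGNORED_TOP_LEVEL = {"paths", "clusters"}


def find_unknown_config_keys(default_config: dict, my_config: dict, path: tuple = ()) -> list:
    """Return dotted key paths present in my_config but missing from default_config."""
    unknown = []
    for key, value in my_config.items():
        if not path and key in IGNORED_TOP_LEVEL:
            continue  # user-specific sections are not checked
        key_path = path + (key,)
        if key not in default_config:
            unknown.append(".".join(map(str, key_path)))
        elif isinstance(value, dict) and isinstance(default_config[key], dict):
            # Descend into nested sections that exist in both configs.
            unknown.extend(find_unknown_config_keys(default_config[key], value, key_path))
    return unknown


default = {"pileup": {"min_mapq": 20, "min_baseq": 30}}
mine = {"pileup": {"min_mapq": 30, "min_basq": 20}, "paths": {"hg38": {}}, "extra": 1}
print(find_unknown_config_keys(default, mine))  # ['pileup.min_basq', 'extra']
```

Using a tuple as the default `path` avoids the shared-mutable-default pitfall of `path=[]`.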

amplimap.run.compare_config_dicts(my_config, used_config, path=[])

Recursively search for differences in values between two dicts.
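A recursive comparison can report each differing leaf value together with its dotted key path. A minimal sketch, assuming nested dicts and treating a key missing on one side as `None` (so a missing key and an explicit `None` are not distinguished); `find_config_differences` and the example keys are hypothetical.

```python
def find_config_differences(my_config: dict, used_config: dict, path: tuple = ()) -> list:
    """Return (dotted_key_path, my_value, used_value) for every differing leaf value."""
    differences = []
    # Visit keys from both dicts in a stable order.
    for key in sorted(set(my_config) | set(used_config), key=str):
        key_path = path + (key,)
        mine = my_config.get(key)
        used = used_config.get(key)
        if isinstance(mine, dict) and isinstance(used, dict):
            differences.extend(find_config_differences(mine, used, key_path))
        elif mine != used:
            differences.append((".".join(map(str, key_path)), mine, used))
    return differences


a = {"pileup": {"min_mapq": 20, "min_baseq": 30}, "general": {"use_umis": True}}
b = {"pileup": {"min_mapq": 30, "min_baseq": 30}, "general": {"use_umis": True}}
print(find_config_differences(a, b))  # [('pileup.min_mapq', 20, 30)]
```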

amplimap.run.main(argv=None)

Run amplimap.