iss.error_models package

Submodules

iss.error_models.basic module

class iss.error_models.basic.BasicErrorModel(fragment_length=None, fragment_sd=None, store_mutations=False)[source]

Bases: ErrorModel

Basic Error Model class

Basic error model. The phred scores are drawn from a normal distribution. Only substitution errors occur. The substitution rate is assumed to be equal for all nucleotides.

gen_phred_scores(mean_quality, orientation)[source]

Generate a normal distribution, transform to phred scores

Generate a list of phred scores according to a normal distribution centered around the ErrorModel quality

Parameters:
  • mean_quality (int) – mean phred score

  • orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’

Returns:

list of phred scores following a normal distribution

Return type:

list
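The idea of drawing from a normal distribution and then transforming to phred scores can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `sketch_gen_phred_scores`, the `read_length` and `sd` parameters, the choice to sample in probability space, and the clamping bounds are all assumptions.

```python
import math
import random


def phred_to_prob(q):
    """Probability that a base call with phred score q is correct."""
    return 1 - 10 ** (-q / 10)


def prob_to_phred(p):
    """Phred score for a probability p of a correct base call."""
    return int(round(-10 * math.log10(1 - p)))


def sketch_gen_phred_scores(mean_quality, read_length, sd=0.01, seed=None):
    """Hypothetical sketch: one phred score per read position."""
    rng = random.Random(seed)
    center = phred_to_prob(mean_quality)
    scores = []
    for _ in range(read_length):
        p = rng.gauss(center, sd)
        # Clamp so that log10(1 - p) stays defined (assumed bounds).
        p = min(max(p, 0.0), 0.9999)
        scores.append(prob_to_phred(p))
    return scores


example_scores = sketch_gen_phred_scores(mean_quality=30, read_length=10, seed=1)
```

With the assumed clamp, every score falls between 0 and 40, and a standard deviation of zero returns `mean_quality` at every position.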

random_insert_size()[source]

Fake random function returning the default insert size of the basic error model

Returns:

insert size

Return type:

int

iss.error_models.cdf module

iss.error_models.kde module

class iss.error_models.kde.KDErrorModel(npz_path, fragment_length=None, fragment_sd=None, store_mutations=False)[source]

Bases: ErrorModel

KDErrorModel class.

Error model based on an .npz file derived from read alignments. The npz file must contain:

  • the length of the reads

  • the mean insert size

  • the size of mean sequence quality bins (for R1 and R2)

  • a cumulative distribution function of quality scores for each position (for R1 and R2)

  • the substitution rates for each nucleotide at each position (for R1 and R2)

  • the insertion and deletion rates for each position (for R1 and R2)
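To make the layout above concrete, here is a small sketch that writes and reads back a toy `.npz` archive with NumPy. The key names (`read_length`, `insert_size`, `quality_cdf_forward`, and so on) and the array shapes are hypothetical, chosen only for illustration; a real profile may use different names and contain more arrays.

```python
import os
import tempfile

import numpy as np


def write_toy_profile(path, read_length=4, n_quals=41):
    """Write a tiny, made-up profile. All key names are hypothetical."""
    # One uniform quality CDF per position: shape (read_length, n_quals).
    cdf = np.tile(np.linspace(1 / n_quals, 1.0, n_quals), (read_length, 1))
    np.savez_compressed(
        path,
        read_length=read_length,
        insert_size=300,
        quality_cdf_forward=cdf,
        quality_cdf_reverse=cdf,
        ins_forward=np.zeros(read_length),
        del_forward=np.zeros(read_length),
    )


def read_toy_profile(path):
    """Load the archive and return its contents as a plain dict."""
    with np.load(path) as archive:
        return {key: archive[key] for key in archive.files}


with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_profile.npz")
    write_toy_profile(toy_path)
    toy_profile = read_toy_profile(toy_path)
```

Listing `archive.files` is a quick way to check which arrays a given profile actually provides before building a model from it.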

gen_phred_scores(cdfs, orientation)[source]

Generate a list of phred scores based on cdfs and mean bins

For each position, draw a phred score from the cdf and append to the phred score list

Parameters:
  • cdfs (ndarray) – array containing the cdfs

  • orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’

Returns:

a list of phred scores

Return type:

list
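Drawing one score per position from a cdf is an instance of inverse transform sampling. The sketch below assumes each cdf is a plain list where `cdf[q]` is the probability of a score less than or equal to `q`; it ignores the mean-quality bins and the read orientation, and `sketch_draw_from_cdfs` is a hypothetical name.

```python
import bisect
import random


def sketch_draw_from_cdfs(cdfs, seed=None):
    """Hypothetical sketch: draw one phred score per position."""
    rng = random.Random(seed)
    scores = []
    for cdf in cdfs:
        u = rng.random()  # uniform draw in [0, 1)
        # Smallest score whose cumulative probability exceeds u.
        q = bisect.bisect_right(cdf, u)
        # Guard against a cdf that ends slightly below 1.0.
        scores.append(min(q, len(cdf) - 1))
    return scores


# Position 0 always yields score 2, position 1 always yields score 0.
toy_cdfs = [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
toy_scores = sketch_draw_from_cdfs(toy_cdfs, seed=0)
```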

random_insert_size()[source]

Draw a random insert size from the insert size cdf

Parameters:

i_size_cdf – cumulative distribution function of the insert size

Returns:

an insert size

Return type:

int
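As an illustration of where an insert size cdf might come from and how a value can be drawn from it, the sketch below builds an empirical cdf from a histogram of observed insert sizes. Both function names and the histogram input format are assumptions, not part of the documented API.

```python
import bisect
import itertools
import random


def build_insert_size_cdf(counts):
    """counts[i] is the number of observed fragments with insert size i."""
    total = sum(counts)
    return [c / total for c in itertools.accumulate(counts)]


def sketch_random_insert_size(i_size_cdf, rng=random):
    """Hypothetical sketch: draw an insert size by inverting the cdf."""
    u = rng.random()
    size = bisect.bisect_right(i_size_cdf, u)
    return min(size, len(i_size_cdf) - 1)


# All observed fragments have insert size 3, so every draw returns 3.
toy_cdf = build_insert_size_cdf([0, 0, 0, 5])
toy_size = sketch_random_insert_size(toy_cdf)
```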

Module contents

class iss.error_models.ErrorModel[source]

Bases: object

Main ErrorModel Class

This class is used to create inheriting classes and contains all the functions that are shared by all ErrorModel classes

adjust_seq_length(mut_seq, orientation, full_sequence, bounds)[source]

Truncate or extend reads to make them fit the read length

When insertions or deletions are introduced into the reads, their length changes. This function takes a (mutable) read and a reference sequence, and extends or truncates the read if it has had an insertion or a deletion

Parameters:
  • mut_seq (MutableSeq) – a mutable sequence

  • orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’

  • full_sequence (Seq) – the reference sequence from which mut_seq comes

  • bounds (tuple) – the position of the read in the full_sequence

Returns:

a sequence fitting the ErrorModel

Return type:

Seq
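The truncate-or-extend logic can be sketched with plain strings instead of Biopython objects. This is an assumption-laden illustration: `sketch_adjust_seq_length`, the explicit `read_length` argument, and the interpretation of `bounds` as a `(start, end)` pair on the forward strand are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def sketch_adjust_seq_length(read, orientation, reference, bounds, read_length):
    """Hypothetical sketch: make a read exactly read_length bases long."""
    start, end = bounds
    if len(read) >= read_length:
        # An insertion made the read too long: truncate it.
        return read[:read_length]
    # A deletion made the read too short: borrow bases from the reference.
    missing = read_length - len(read)
    if orientation == "forward":
        filler = reference[end:end + missing]
    else:
        # A reverse read continues upstream on the forward strand.
        filler = reverse_complement(reference[max(0, start - missing):start])
    return read + filler


toy_reference = "ACGTACGTAC"
fixed_forward = sketch_adjust_seq_length("ACG", "forward", toy_reference, (0, 4), 4)
fixed_reverse = sketch_adjust_seq_length("ACG", "reverse", toy_reference, (4, 8), 4)
```

Near the ends of the reference there may not be enough bases to borrow, in which case this sketch returns a read shorter than `read_length`.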

introduce_error_scores(record, orientation)[source]

Add phred scores to a SeqRecord according to the error_model

Parameters:
  • record (SeqRecord) – a read record

  • orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’

Returns:

a read record with error scores

Return type:

SeqRecord

introduce_indels(record, orientation, full_seq, bounds)[source]

Introduce insertions or deletions in a sequence

Introduce insertion and deletion errors according to the probabilities present in the indel choices list

Parameters:
  • record (SeqRecord) – a sequence record

  • orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’

  • full_seq (Seq) – the reference sequence from which the record comes

  • bounds (tuple) – the position of the read in full_seq

Returns:

a sequence record with indel errors

Return type:

SeqRecord
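A simplified view of indel introduction, assuming one insertion rate and one deletion rate per read position and uniformly chosen inserted bases, is sketched below. The function name and the rate inputs are hypothetical; the documented method works on a SeqRecord and on the model's own indel choices.

```python
import random


def sketch_introduce_indels(seq, ins_rates, del_rates, seed=None):
    """Hypothetical sketch: apply per-position insertions and deletions."""
    rng = random.Random(seed)
    out = []
    for base, p_ins, p_del in zip(seq, ins_rates, del_rates):
        if rng.random() < p_del:
            continue  # deletion: drop this base
        out.append(base)
        if rng.random() < p_ins:
            # Insertion: add a random base after the current one.
            out.append(rng.choice("ACGT"))
    return "".join(out)


unchanged = sketch_introduce_indels("ACGT", [0.0] * 4, [0.0] * 4)
all_deleted = sketch_introduce_indels("ACGT", [0.0] * 4, [1.0] * 4)
all_inserted = sketch_introduce_indels("ACGT", [1.0] * 4, [0.0] * 4, seed=0)
```

Because the result can be longer or shorter than the input, a length adjustment step such as `adjust_seq_length` is needed afterwards.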

load_npz(npz_path, model)[source]

Load the error profile .npz file

Parameters:
  • npz_path (string) – path to the npz file

  • model (string) – type of model. Can be ‘cdf’ or ‘kde’. ‘cdf’ is deprecated and no longer available

Returns:

numpy object containing the variables necessary for error model construction

Return type:

ndarray

property logger

mut_sequence(record, orientation)[source]

Introduce substitution errors to a sequence

If a random probability is higher than the probability of the basecall being correct, introduce a substitution error

Parameters:
  • record (SeqRecord) – a read record with error scores

  • orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’

Returns:

the read record with substitution errors

Return type:

SeqRecord
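The substitution rule can be illustrated with plain strings. The sketch assumes the standard phred relation (probability of a correct call is 1 - 10^(-Q/10)) and, as in the basic model, an equal substitution rate between nucleotides; `sketch_mut_sequence` is a hypothetical name, and a model with per-nucleotide substitution rates would choose the replacement base differently.

```python
import random


def sketch_mut_sequence(seq, phred_scores, seed=None):
    """Hypothetical sketch: introduce substitutions based on phred scores."""
    rng = random.Random(seed)
    out = []
    for base, q in zip(seq, phred_scores):
        p_correct = 1 - 10 ** (-q / 10)
        if rng.random() > p_correct:
            # Substitute with one of the three other nucleotides, uniformly.
            base = rng.choice([n for n in "ACGT" if n != base])
        out.append(base)
    return "".join(out)


# Very high scores leave the sequence untouched.
high_quality = sketch_mut_sequence("ACGTACGT", [1000] * 8, seed=0)
# A score of 0 means the call is never trusted, so every base changes.
low_quality = sketch_mut_sequence("ACGTACGT", [0] * 8, seed=0)
```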