iss.error_models package

Submodules

iss.error_models.basic module

class iss.error_models.basic.BasicErrorModel

Bases: iss.error_models.ErrorModel

Basic Error Model class

Basic error model. The phred scores are based on a normal distribution. Only substitution errors occur. The substitution rate is assumed to be equal for all nucleotides.

gen_phred_scores(mean_quality, orientation)

Generate a normal distribution and transform it into phred scores

Generate a list of phred scores following a normal distribution centered on the ErrorModel quality

Parameters:
  • mean_quality (int) – mean phred score
  • orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’
Returns: list of phred scores following a normal distribution
Return type: list
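
A minimal sketch of the idea, assuming NumPy: one score per base is drawn from a normal distribution centred on mean_quality, then rounded and clipped to valid phred values. The read_length, sd and seed arguments and the 0 to 40 range are assumptions for illustration, not the documented signature or the package's own code.

```python
import numpy as np


def gen_phred_scores_sketch(mean_quality, read_length, sd=2.0, seed=None):
    """Hypothetical sketch, not the InSilicoSeq implementation."""
    rng = np.random.default_rng(seed)
    # One draw per base from N(mean_quality, sd)
    draws = rng.normal(loc=mean_quality, scale=sd, size=read_length)
    # Phred scores are non-negative integers; 40 is an assumed ceiling
    scores = np.clip(np.rint(draws), 0, 40).astype(int)
    return scores.tolist()


scores = gen_phred_scores_sketch(mean_quality=30, read_length=125, seed=1)
print(len(scores), min(scores), max(scores))
```
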

random_insert_size()

Fake random function returning the default insert size of the basic error model

Returns: insert size
Return type: int

iss.error_models.cdf module

iss.error_models.kde module

class iss.error_models.kde.KDErrorModel(npz_path)

Bases: iss.error_models.ErrorModel

KDErrorModel class.

Error model based on an .npz file derived from read alignments. The npz file must contain:

  • the length of the reads
  • the mean insert size
  • the size of mean sequence quality bins (for R1 and R2)
  • a cumulative distribution function of quality scores for each position (for R1 and R2)
  • the substitution probabilities for each nucleotide at each position (for R1 and R2)
  • the insertion and deletion rates for each position (for R1 and R2)
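
The contents listed above can be pictured with NumPy's own .npz round trip. This is a sketch only: the key names, shapes and values below are hypothetical, and an in-memory buffer stands in for a real error model file.

```python
import io

import numpy as np

read_length = 4
buffer = io.BytesIO()  # stands in for an .npz file on disk

# Hypothetical key names and shapes; only the forward (R1) arrays are shown,
# R2 would have a matching "_reverse" set.
np.savez_compressed(
    buffer,
    read_length=read_length,
    mean_insert_size=300,
    quality_bins_forward=np.array([10, 20, 30, 40]),
    # one cdf over phred scores 0..40 per position, each row ending at 1.0
    quality_cdf_forward=np.tile(np.linspace(0.0, 1.0, 41), (read_length, 1)),
    # per-position weights for substituting A, C, G, T
    subst_forward=np.full((read_length, 4), 0.25),
    ins_forward=np.full(read_length, 0.001),
    del_forward=np.full(read_length, 0.001),
)
buffer.seek(0)

with np.load(buffer) as npz:
    keys = sorted(npz.files)
    loaded_length = int(npz["read_length"])
    cdf_shape = npz["quality_cdf_forward"].shape

print(keys, loaded_length, cdf_shape)
```
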

gen_phred_scores(cdfs, orientation)

Generate a list of phred scores based on cdfs and mean bins

For each position, draw a phred score from the cdf and append it to the phred score list

Parameters:
  • cdfs (ndarray) – array containing the cdfs
  • orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’
Returns: a list of phred scores
Return type: list
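
Drawing a score from a cdf is inverse-transform sampling. A minimal NumPy sketch, assuming one cdf row per position over phred scores 0 to 40; the real layout of cdfs in KDErrorModel, including how the mean-quality bins and the orientation select a set of rows, is not modelled here.

```python
import numpy as np


def draw_phred_scores(cdfs, seed=None):
    """Inverse-transform sampling of one phred score per position.

    Assumes cdfs has shape (read_length, n_scores), with cdfs[i, q] equal to
    P(score <= q) at position i and every row ending at 1.0.
    """
    rng = np.random.default_rng(seed)
    draws = rng.random(len(cdfs))  # one uniform value in [0, 1) per position
    # searchsorted returns the first score whose cumulative probability
    # is >= the draw, i.e. the inverse of the cdf
    return [int(np.searchsorted(row, u)) for row, u in zip(cdfs, draws)]


# Toy cdfs: every position has a 50/50 chance of score 20 or 30
cdfs = np.zeros((6, 41))
cdfs[:, 20:30] = 0.5
cdfs[:, 30:] = 1.0
print(draw_phred_scores(cdfs, seed=0))
```
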

random_insert_size()

Draw a random insert size from the insert size cdf

The method takes no arguments: the insert size cumulative distribution function (i_size_cdf) is read from the model rather than passed in.
Returns: an insert size
Return type: int

Module contents

class iss.error_models.ErrorModel

Bases: object

Main ErrorModel Class

This base class is meant to be subclassed and contains the methods shared by all ErrorModel classes.

adjust_seq_length(mut_seq, orientation, full_sequence, bounds)

Truncate or extend reads to make them fit the read length

When insertions or deletions are introduced into the reads, their length changes. This function takes a (mutable) read and a reference sequence, and extends or truncates the read if it has had an insertion or a deletion.

Parameters:
  • mut_seq (MutableSeq) – a mutable sequence
  • orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’
  • full_sequence (Seq) – the reference sequence from which mut_seq comes
  • bounds (tuple) – the position of the read in the full_sequence
Returns: a sequence fitting the ErrorModel read length
Return type: Seq
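
A minimal sketch of the forward-read case, using plain strings instead of Biopython objects. It assumes bounds is a (start, end) pair with an exclusive end and that the model knows its read_length; both are assumptions, and reverse reads (which would need the reverse complement) are left out.

```python
def adjust_forward_read(mut_seq, read_length, full_sequence, bounds):
    """Hypothetical forward-read version of the truncate/extend step."""
    if len(mut_seq) >= read_length:
        # An insertion made the read too long: drop the extra 3' bases
        return mut_seq[:read_length]
    missing = read_length - len(mut_seq)
    end = bounds[1]
    # A deletion made the read too short: append the reference bases that
    # follow the read (fewer if the reference ends first)
    return mut_seq + full_sequence[end:end + missing]


reference = "ACGTACGTAC"
print(adjust_forward_read("ACTA", 5, reference, (0, 5)))    # deletion -> ACTAC
print(adjust_forward_read("ACGGTA", 5, reference, (0, 5)))  # insertion -> ACGGT
```
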

introduce_error_scores(record, orientation)

Add phred scores to a SeqRecord according to the error_model

Parameters:
  • record (SeqRecord) – a read record
  • orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’
Returns: a read record with error scores
Return type: SeqRecord
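
In Biopython, per-base qualities are stored in record.letter_annotations["phred_quality"], one entry per base, and are written to FASTQ as ASCII characters with an offset of 33. The plain-Python sketch below mirrors those two steps; the dict-based record and both helper names are hypothetical stand-ins for a SeqRecord, not part of this package.

```python
def attach_scores(seq, scores):
    """Return a dict-based stand-in for a SeqRecord carrying phred scores."""
    if len(scores) != len(seq):
        raise ValueError("need exactly one phred score per base")
    return {"seq": seq, "letter_annotations": {"phred_quality": list(scores)}}


def to_fastq_quality(scores, offset=33):
    """Encode phred scores as a FASTQ quality string (Sanger offset 33)."""
    return "".join(chr(q + offset) for q in scores)


record = attach_scores("ACG", [40, 30, 0])
print(to_fastq_quality(record["letter_annotations"]["phred_quality"]))  # I?!
```
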

introduce_indels(record, orientation, full_seq, bounds)

Introduce insertions or deletions into a sequence

Introduce insertion and deletion errors according to the probabilities present in the indel choices list

Parameters:
  • record (SeqRecord) – a sequence record
  • orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’
  • full_seq (Seq) – the reference sequence from which record comes
  • bounds (tuple) – the position of the read in full_seq
Returns: a sequence, possibly containing indels
Return type: Seq
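
A minimal sketch of a per-position indel pass on a plain string. It assumes ins_rates and del_rates are per-position probabilities and that inserted bases are uniform over A, C, G and T; the real indel choices lists may be structured differently, and re-fitting the read length is left to adjust_seq_length.

```python
import random


def introduce_indels_sketch(seq, ins_rates, del_rates, rng=None):
    """Hypothetical per-position insertion/deletion pass over seq."""
    if rng is None:
        rng = random.Random()
    out = []
    for i, base in enumerate(seq):
        if rng.random() < del_rates[i]:
            continue  # deletion: this base is dropped
        out.append(base)
        if rng.random() < ins_rates[i]:
            out.append(rng.choice("ACGT"))  # insertion after this base
    return "".join(out)


read = "ACGTACGTAC"
mutated = introduce_indels_sketch(read, [0.05] * 10, [0.05] * 10, random.Random(42))
print(read, mutated)
```
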

load_npz(npz_path, model)

Load the error profile .npz file

Parameters:
  • npz_path (string) – path to the npz file
  • model (string) – type of model. Can be ‘cdf’ or ‘kde’; ‘cdf’ is deprecated and no longer available
Returns: numpy object containing the variables necessary for error model construction
Return type: ndarray

logger

mut_sequence(record, orientation)

Introduce substitution errors into a sequence

If a random probability is higher than the probability of the basecall being correct, introduce a substitution error.

Parameters:
  • record (SeqRecord) – a read record with error scores
  • orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’
Returns: a sequence
Return type: Seq
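
For a phred score Q, the probability that a basecall is correct is 1 - 10^(-Q/10). A minimal sketch of the comparison described above, on plain strings; drawing the replacement base uniformly from the other three nucleotides follows the BasicErrorModel description, whereas position-specific substitution rates (as in KDErrorModel) are not modelled here.

```python
import random


def mutate_sketch(seq, phred_scores, rng=None):
    """Substitute bases where a uniform draw exceeds P(correct basecall)."""
    if rng is None:
        rng = random.Random()
    out = []
    for base, q in zip(seq, phred_scores):
        p_correct = 1 - 10 ** (-q / 10)
        if base in "ACGT" and rng.random() > p_correct:
            # substitution: any of the three other nucleotides, equally likely
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


print(mutate_sketch("ACGTACGTAC", [30] * 10, random.Random(7)))
```
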