iss.error_models package

Submodules

iss.error_models.basic module

class iss.error_models.basic.BasicErrorModel(fragment_length=None, fragment_sd=None, store_mutations=False)

Bases: ErrorModel

Basic Error Model class

Basic error model. The phred scores are drawn from a normal distribution. Only substitution errors occur. The substitution rate is assumed to be equal for all nucleotides.

gen_phred_scores(mean_quality, orientation)

Generate a normal distribution, transform to phred scores

Generate a list of phred scores according to a normal distribution centered around the ErrorModel quality

Parameters:

mean_quality (int) – mean phred score
orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’

Returns:

list of phred scores following a normal distribution

Return type:

list
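The idea of sampling from a normal distribution and transforming the result to phred scores can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper names, the standard deviation of 0.01 and the clipping bounds are all assumptions made for the example.

```python
import numpy as np


def phred_to_prob(q):
    """Probability that a base call with phred score q is correct."""
    return 1.0 - 10.0 ** (-q / 10.0)


def prob_to_phred(p):
    """Phred score for a probability p of a correct base call."""
    return int(round(-10.0 * np.log10(1.0 - p)))


def sketch_normal_phred(mean_quality, read_length, sd=0.01, seed=None):
    """Hypothetical helper: draw one phred score per read position.

    Samples are drawn in probability space around the probability that
    corresponds to mean_quality, then converted back to phred scores.
    """
    rng = np.random.default_rng(seed)
    probs = rng.normal(phred_to_prob(mean_quality), sd, read_length)
    # Assumed bounds: keep 1 - p strictly positive so log10 is defined.
    probs = np.clip(probs, 0.0, 0.9999)
    return [prob_to_phred(p) for p in probs]


scores = sketch_normal_phred(mean_quality=30, read_length=125, seed=0)
```

Sampling in probability space rather than directly in phred space is one possible reading of "generate a normal distribution, transform to phred scores"; sampling integer scores directly would be an equally valid sketch.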

random_insert_size()

Fake random function returning the default insert size of the basic error model

Returns:

insert size

Return type:

int

iss.error_models.cdf module

iss.error_models.kde module

class iss.error_models.kde.KDErrorModel(npz_path, fragment_length=None, fragment_sd=None, store_mutations=False)

Bases: ErrorModel

KDErrorModel class.

Error model based on an .npz file derived from read alignments. The .npz file must contain:

the length of the reads
the mean insert size
the size of mean sequence quality bins (for R1 and R2)
a cumulative distribution function of quality scores for each position (for R1 and R2)
the substitution rates for each nucleotide at each position (for R1 and R2)
the insertion and deletion rates for each position (for R1 and R2)
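An .npz file is a NumPy archive of named arrays, so its contents can be listed with numpy.load. The sketch below builds a small in-memory archive and inspects it. The key names (read_length, insert_size, quality_cdf_forward) are hypothetical and chosen only for this illustration; the names used in a real error profile may differ.

```python
import io

import numpy as np

# Hypothetical key names, for illustration only. Use `.files` on a real
# profile to see which arrays it actually stores.
read_length = 4
buf = io.BytesIO()
np.savez(
    buf,
    read_length=np.array(read_length),
    insert_size=np.array(300),
    # One toy CDF over phred scores 0..40 per read position.
    quality_cdf_forward=np.tile(np.linspace(0.0, 1.0, 41), (read_length, 1)),
)
buf.seek(0)

profile = np.load(buf)
for name in profile.files:
    # Print each stored array's name and shape.
    print(name, profile[name].shape)
```

For a profile on disk, the same inspection would start from np.load with the path to the file instead of an in-memory buffer.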

gen_phred_scores(cdfs, orientation)

Generate a list of phred scores based on cdfs and mean bins

For each position, draw a phred score from the cdf and append to the phred score list

Parameters:

cdfs (ndarray) – array containing the cdfs
orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’

Returns:

a list of phred scores

Return type:

list
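Drawing a score from a cdf at each position is an instance of inverse transform sampling. The sketch below assumes a hypothetical layout in which cdfs is a 2D array with one row per read position and one column per phred score; the real array layout, and the way the mean quality bins are used, may differ.

```python
import numpy as np


def sketch_phred_from_cdfs(cdfs, seed=None):
    """Hypothetical helper: draw one phred score per position.

    Assumes cdfs[i, q] is the cumulative probability of observing a
    phred score <= q at read position i (non-decreasing, ending at 1).
    """
    rng = np.random.default_rng(seed)
    phred = []
    for cdf in cdfs:
        u = rng.random()  # uniform draw in [0, 1)
        # Index of the first score whose cumulative probability exceeds u.
        q = int(np.searchsorted(cdf, u, side="right"))
        # Guard against a final CDF value slightly below 1 from rounding.
        phred.append(min(q, len(cdf) - 1))
    return phred


# Toy CDF: every position has a uniform distribution over scores 0..40.
toy_cdfs = np.tile(np.arange(1, 42) / 41.0, (10, 1))
scores = sketch_phred_from_cdfs(toy_cdfs, seed=1)
```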

random_insert_size()

Draw a random insert size from the insert size cdf

Parameters:

i_size_cdf – cumulative distribution function of the insert size

Returns:

an insert size

Return type:

int

Module contents

class iss.error_models.ErrorModel

Bases: object

Main ErrorModel Class

This class is used to create inheriting classes and contains all the functions that are shared by all ErrorModel classes

adjust_seq_length(mut_seq, orientation, full_sequence, bounds)

Truncate or extend reads to make them fit the read length

When insertions or deletions are introduced into a read, its length changes. This function takes a (mutable) read and a reference sequence, and extends or truncates the read if it has had an insertion or a deletion

Parameters:

mut_seq (MutableSeq) – a mutable sequence
orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’
full_sequence (Seq) – the reference sequence from which mut_seq comes
bounds (tuple) – the position of the read in the full_sequence

Returns:

a sequence fitting the ErrorModel

Return type:

Seq
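The truncate-or-extend step can be illustrated with plain strings. This sketch covers only the forward orientation and assumes that bounds is a (start, end) pair of 0-based, end-exclusive positions in the reference; it is not the package's implementation, which operates on Biopython sequence objects and also handles reverse reads.

```python
def sketch_adjust_length(read, read_length, full_sequence, bounds):
    """Hypothetical helper: make a forward read exactly read_length long.

    bounds is assumed to be (start, end), 0-based and end-exclusive.
    """
    start, end = bounds
    if len(read) >= read_length:
        # An insertion made the read too long: drop the extra bases.
        return read[:read_length]
    # A deletion made the read too short: pad with the reference bases
    # that follow the read. If the reference runs out, the read stays short.
    missing = read_length - len(read)
    return read + full_sequence[end:end + missing]


reference = "ACGTACGTACGT"
after_deletion = sketch_adjust_length("ACTAC", 6, reference, (0, 6))
after_insertion = sketch_adjust_length("ACGGTAC", 6, reference, (0, 6))
```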

introduce_error_scores(record, orientation)

Add phred scores to a SeqRecord according to the error_model

Parameters:

record (SeqRecord) – a read record
orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’

Returns:

a read record with error scores

Return type:

SeqRecord

introduce_indels(record, orientation, full_seq, bounds)

Introduce insertions or deletions in a sequence

Introduce insertion and deletion errors according to the probabilities present in the indel choices list

Parameters:

record (SeqRecord) – a sequence record
orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’
full_seq (Seq) – the reference sequence from which the read comes
bounds (tuple) – the position of the read in full_seq

Returns:

a sequence record with indel errors

Return type:

SeqRecord
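A simplified view of indel introduction, using plain strings and per-position rates, is sketched below. The function name and the two rate lists are assumptions made for the example. The indel choices list mentioned above may be structured differently, for instance with separate probabilities per nucleotide.

```python
import random


def sketch_indels(seq, insertion_rates, deletion_rates, seed=None):
    """Hypothetical helper: apply per-position insertions and deletions.

    insertion_rates[i] and deletion_rates[i] are assumed to be the
    probabilities of an insertion after, or a deletion of, base i.
    """
    rng = random.Random(seed)
    out = []
    for base, p_ins, p_del in zip(seq, insertion_rates, deletion_rates):
        if rng.random() < p_del:
            continue  # deletion: skip this base
        out.append(base)
        if rng.random() < p_ins:
            # insertion: add a random nucleotide after this base
            out.append(rng.choice("ACGT"))
    return "".join(out)


read = "ACGTACGT"
mutated = sketch_indels(read, [0.01] * len(read), [0.01] * len(read), seed=3)
```

The result may be shorter or longer than the input, which is why a length adjustment step is needed afterwards.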

load_npz(npz_path, model)

Load the error profile .npz file

Parameters:

npz_path (string) – path to the npz file
model (string) – type of model. Could be ‘cdf’ or ‘kde’. ‘cdf’ has been deprecated and is no longer available

Returns:

numpy object containing the variables necessary for error model construction

Return type:

ndarray

property logger

mut_sequence(record, orientation)

Introduce substitution errors to a sequence

If a random probability is higher than the probability of the basecall being correct, introduce a substitution error

Parameters:

record (SeqRecord) – a read record with error scores
orientation (string) – orientation of the read. Can be ‘forward’ or ‘reverse’

Returns:

the read record with substitution errors

Return type:

SeqRecord
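The rule above can be sketched with plain strings. A phred score q corresponds to a probability of 1 - 10^(-q/10) that the base call is correct. The sketch assumes the replacement nucleotide is chosen uniformly among the three other bases, as in the basic model; a model built from an error profile would presumably use its own substitution rates instead.

```python
import random


def sketch_substitute(seq, phred_scores, seed=None):
    """Hypothetical helper: introduce substitutions based on phred scores.

    Assumption: the replacement base is drawn uniformly from the three
    other nucleotides.
    """
    rng = random.Random(seed)
    out = []
    for base, q in zip(seq, phred_scores):
        p_correct = 1.0 - 10.0 ** (-q / 10.0)
        if rng.random() > p_correct:
            # The draw exceeds the probability of a correct call:
            # replace the base with a different nucleotide.
            base = rng.choice([n for n in "ACGT" if n != base])
        out.append(base)
    return "".join(out)


read = "ACGTACGTAC"
mutated = sketch_substitute(read, [20] * len(read), seed=7)
```

With a phred score of 20, each base has a 1% chance of being substituted, so most short reads come back unchanged.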
