Error models¶

Pre-built models¶

Available models¶

Model name	Read length
MiSeq*	300 bp
HiSeq	125 bp
NextSeq	300 bp
NovaSeq	150 bp

Qualities¶

The prebuilt models were built with the following commands:

megahit -1 reads_R1.fastq -2 reads_R2.fastq -o asm
bowtie2-build asm/final.contigs.fa miseq_asm/final.contigs
bowtie2 -x asm/final.contigs -1 reads_R1.fastq \
    -2 reads_R2.fastq | samtools view -bS | samtools sort -o mapping.bam
samtools index mapping.bam
iss model -b mapping.bam -o name_of_model

The MiSeq model was built from an ocean metagenomics project in Kenya (sample accession number ERR1912174).

The HiSeq and NovaSeq models were built from human run obtained via basespace.

Creating an Error Model¶

If you do not wish to use the pre-computed error models provided with InSilicoSeq, it is possible to create your own.

InSilicoSeq creates error models from .bam files. The input bam file should be a set of reads aligned against a reference genome or metagenome.

Given you have two read files, reads_R1.fastq.gz and`reads_R2.fastq.gz`, and a reference metagenome ref.fasta:

Align you reads against the reference¶

bowtie2-build ref.fasta ref
bowtie2 -x ref -1 reads_R1.fastq.gz \
    -2 reads_R2.fastq.gz | samtools view -bS | samtools sort -o ref.bam
samtools index ref.bam

Build the model¶

iss model -b ref.bam -o my_model

which will create a my_model.npz file containing your newly built model

Full list of options¶

–bam¶

aligned reads from which the model will be inferred (Required)

–model¶

Error model to build. If not specified, using kernel density estimation (default: kde). Can be ‘kde’ or ‘cdf’

–output¶

Output file path and prefix (Required)

–quiet¶

Disable info logging

–debug¶

Enable debug logging