
Table 1 Counting number of reads not enough

From: Optimal reference sequence selection for genome assembly using minimum description length principle

| S.No. | Reference sequence | Genome length (bases) | Number of reads found |
|---|---|---|---|
| 1 | Fibrobacter succinogenes subsp. succinogenes S85 (NC_013410.1) | 3,842,635 | 157 |
| 2 | Human Chromosome 21 (AC_000044.1) | 32,992,206 | 158 |

  1. The table shows that choosing the reference sequence with the highest number of reads found is not a sufficient criterion. Judging only by the "Data given the model" (here, the "Number of reads found"), one would choose Human Chromosome 21. However, Chromosome 21 is almost 9 times larger than S85 (32,992,206 vs. 3,842,635 bases) while yielding essentially the same number of reads (158 vs. 157), so S85 is actually the model of choice. Furthermore, S85 is a bacterial genome, whereas Chromosome 21 comes from a eukaryotic genome. PAb1 is also a bacterium; therefore, S85 is clearly the model of choice.
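The size argument in the note can be made concrete with a few lines of arithmetic. The sketch below compares the naive "most reads" rule with a simple size-normalized rule (reads found per million reference bases) using only the values from Table 1. This density is an illustrative stand-in chosen here for clarity; it is an assumption, not the paper's MDL score, and the function names are hypothetical.

```python
# Genome lengths and read counts copied from Table 1.
S85 = "Fibrobacter succinogenes subsp. succinogenes S85 (NC_013410.1)"
CHR21 = "Human Chromosome 21 (AC_000044.1)"

TABLE_1 = {
    S85: {"bases": 3_842_635, "reads": 157},
    CHR21: {"bases": 32_992_206, "reads": 158},
}


def reads_per_megabase(row: dict) -> float:
    """Reads found per million reference bases (illustrative density, NOT the paper's MDL score)."""
    return row["reads"] / row["bases"] * 1_000_000


def pick_by_raw_count(table: dict) -> str:
    """Naive rule criticised in the note: take the reference with the most reads found."""
    return max(table, key=lambda name: table[name]["reads"])


def pick_by_density(table: dict) -> str:
    """Size-aware rule (hypothetical): take the reference with the most reads per base."""
    return max(table, key=lambda name: reads_per_megabase(table[name]))


# How much larger Chromosome 21 is than S85.
size_ratio = TABLE_1[CHR21]["bases"] / TABLE_1[S85]["bases"]

print(f"Chromosome 21 / S85 length ratio: {size_ratio:.2f}")
for name, row in TABLE_1.items():
    print(f"{name}: {reads_per_megabase(row):.2f} reads per Mb")
print("Raw count picks:", pick_by_raw_count(TABLE_1))
print("Density picks:  ", pick_by_density(TABLE_1))
```

The raw count favours Chromosome 21 by a single read, while the density is roughly 40.9 reads per Mb for S85 against about 4.8 for Chromosome 21, which matches the note's conclusion. A full MDL criterion would also account for the cost of describing the model itself, which this sketch does not attempt.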