S.No. | Ref. Seq. (%) | No. of unaligned reads | Code-length (KB) | Length of new Seq. |
---|---|---|---|---|
1 | 75 | 172 | 25.91 | 1755 |
2 | 85 | 148 | 25.10 | 1989 |
3 | 95 | 123 | 24.20 | 2223 |
4 | 100 | 109 | 23.62 | 2341 |
|
5 | 105 | 108 | 24.22 | 2458 |
6 | 115 | 107 | 25.50 | 2692 |
7 | 125 | 106 | 26.78 | 2926 |
- The set of 390 reads was derived from ‘Influenza A virus (A/Puerto Rico/8/34 (H1N1)) segment 1, complete sequence’ using the ART read simulator for NGS, with read length 30, standard deviation 10, and mean fragment length 100 [79]. The reference sequences were derived from the same H1N1 virus. Ref. Seq. 75%, used in S.No. 1, has a length that is 75% of the actual genome. Similarly, Ref. Seq. 125% consists of a quarter of the actual genome concatenated with the complete H1N1 genome, making its total length 125% of the original. All other reference sequences were derived in the same way. The code-length is calculated using Equation (3). The results show that the proposed MDL scheme chooses the correct reference sequence, Ref. Seq. 100% (S.No. 4, which has the smallest code-length, 23.62 KB), even when all the contending sequences are closely related to one another in genome content and length.
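The construction of the candidate references and the final selection step can be sketched as follows. This is only an illustration under stated assumptions: the helper `make_reference` is hypothetical, the truncated references are assumed to be prefixes of the genome, the extra portion of the longer references is assumed to be a prefix appended at the end (the text does not say which part is concatenated or where), and a random string stands in for the real 2341-base segment. The code-lengths are copied from the table rather than recomputed, since Equation (3) is not reproduced here.

```python
import random


def make_reference(genome: str, percent: int) -> str:
    """Build a candidate reference whose length is `percent`% of `genome`.

    Assumption: shorter references are prefixes of the genome, and longer
    ones are the full genome followed by a prefix of itself.
    """
    target = len(genome) * percent // 100  # integer (floor) length
    if target <= len(genome):
        return genome[:target]
    return genome + genome[: target - len(genome)]


# Code-lengths in KB, copied from the table (keyed by Ref. Seq. percentage).
code_length_kb = {
    75: 25.91,
    85: 25.10,
    95: 24.20,
    100: 23.62,
    105: 24.22,
    115: 25.50,
    125: 26.78,
}

# Placeholder genome of the same length as the segment used in the table.
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(2341))

# Length of each candidate reference sequence.
lengths = {p: len(make_reference(genome, p)) for p in code_length_kb}

# MDL selection: pick the candidate with the smallest code-length.
best = min(code_length_kb, key=code_length_kb.get)

print(lengths)
print("selected reference:", best, "%")
```

With floor rounding, the computed lengths agree with the "Length of new Seq." column, and the minimum code-length falls on the 100% reference.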