S.No. | Ref. Seq. (%) | No. of unaligned reads | Code-length (KB) | Execution time (s) | Length of new Seq. |
---|---|---|---|---|---|
1 | 1 | 696 | 128.60 | 0.046 | 14 |
2 | 2 | 696 | 128.73 | 0.031 | 47 |
3 | 5 | 693 | 128.575 | 0.046 | 113 |
4 | 10 | 684 | 127.576 | 0.046 | 229 |
5 | 25 | 668 | 126.615 | 0.093 | 565 |
6 | 50 | 650 | 126.615 | 0.109 | 650 |
7 | 100 | 3 | 14.276 | 0.078 | 2342 |
8 | 150 | 2 | 21.164 | 0.062 | 2341 |
9 | 200 | 2 | 27.808 | 0.124 | 2341 |
10 | 300 | 2 | 41.525 | 0.140 | 2341 |
- The read set contained 3817 reads, all of which were derived from ‘Influenza A virus (A/Puerto Rico/8/34 (H1N1)) segment 1, complete sequence’. From these 3817 reads, the method extracted 696 unique reads, which were then used in the proposed MDL scheme. All the reference sequences were derived from the same Influenza A (H1N1) virus. Ref. Seq. 1%, used in S.No. 1, has a length that is 1% of the actual genome; similarly, Ref. Seq. 25% has a length that is a quarter of the actual genome. All other reference sequences were derived in the same way; e.g., Ref. Seq. 200% consists of two copies of the H1N1 sequence concatenated together, making it twice the length of the original. The code-length is calculated using Equation (3). The results show that the proposed MDL scheme chooses the reference sequence with the smallest code-length as determined by Equation (3). It favors neither a shorter reference sequence that leaves more reads unaligned nor a longer reference sequence that leaves fewer reads unaligned. The experiment also supports the correctness of the selection: the scheme chooses Ref. Seq. 100% (S.No. 7), which has the smallest code-length and is the sequence from which all the reads were derived.
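
The setup described in the note can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the helper names (`scaled_reference`, `unique_reads`, `best_reference`) are hypothetical, and taking a prefix of the genome for references shorter than 100% is an assumption, since the note does not say which part of the genome was kept. Equation (3) is not reproduced here, so the code-lengths are copied from the table rather than computed.

```python
def scaled_reference(genome: str, percent: int) -> str:
    """Return a reference whose length is `percent`% of `genome`.

    Assumption: references below 100% are a prefix of the genome;
    references above 100% repeat the genome end to end.
    """
    if not genome or percent <= 0:
        raise ValueError("genome must be non-empty and percent positive")
    target_len = len(genome) * percent // 100
    copies = -(-target_len // len(genome))  # ceiling division
    return (genome * copies)[:target_len]


def unique_reads(reads):
    """Drop duplicate reads while keeping first-seen order."""
    return list(dict.fromkeys(reads))


# Code-lengths (KB) copied from the table, keyed by Ref. Seq. (%).
CODE_LENGTH_KB = {
    1: 128.60, 2: 128.73, 5: 128.575, 10: 127.576, 25: 126.615,
    50: 126.615, 100: 14.276, 150: 21.164, 200: 27.808, 300: 41.525,
}


def best_reference(code_lengths):
    """Return the reference percentage with the smallest code-length."""
    return min(code_lengths, key=code_lengths.get)
```

With the tabulated values, `best_reference(CODE_LENGTH_KB)` returns `100`, matching the row for S.No. 7.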