
Table 8 The experiment applies the proposed MDL scheme to the same set of reads but different sets of reference sequences

From: Optimal reference sequence selection for genome assembly using minimum description length principle

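| S.No. | Ref. Seq. (%) | No. of unaligned reads | Code-length (KB) | Execution time (s) | Length of new Seq. |
|-------|---------------|------------------------|------------------|--------------------|--------------------|
| 1     | 1             | 696                    | 128.60           | 0.046              | 14                 |
| 2     | 2             | 696                    | 128.73           | 0.031              | 47                 |
| 3     | 5             | 693                    | 128.575          | 0.046              | 113                |
| 4     | 10            | 684                    | 127.576          | 0.046              | 229                |
| 5     | 25            | 668                    | 126.615          | 0.093              | 565                |
| 6     | 50            | 650                    | 126.615          | 0.109              | 650                |
| 7     | 100           | 3                      | 14.276           | 0.078              | 2342               |
| 8     | 150           | 2                      | 21.164           | 0.062              | 2341               |
| 9     | 200           | 2                      | 27.808           | 0.124              | 2341               |
| 10    | 300           | 2                      | 41.525           | 0.140              | 2341               |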

  1. The set of reads contained 3817 reads, all derived from ‘Influenza A virus (A/Puerto Rico/8/34 (H1N1)) segment 1, complete sequence’. From these 3817 reads the method extracted 696 unique reads, which were then used in the proposed MDL scheme. All the reference sequences were derived from the same Influenza A (H1N1) virus. Ref. Seq. 1%, used in S.No. 1, has a length equal to 1% of the actual genome; similarly, Ref. Seq. 25% is a quarter of the length of the actual genome. All other reference sequences were derived in the same way; e.g., Ref. Seq. 200% consists of two copies of the H1N1 sequence concatenated together, making it twice the length of the original. The code-length is calculated using Equation (3). The results show that the proposed MDL scheme chooses the reference sequence with the smallest code-length as determined by Equation (3). It favors neither a shorter reference sequence that leaves more reads unaligned nor a longer reference sequence that leaves fewer reads unaligned. The scheme selects Ref. Seq. 100% (S.No. 7, underlined in the original table), which has the smallest code-length and is the sequence from which all the reads were derived, so the experiment also confirms the correctness of the chosen reference sequence.
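The selection rule described in the footnote can be illustrated with a minimal Python sketch. It does not implement Equation (3); the code-lengths are simply copied from Table 8. The names `TABLE_8`, `derive_reference` and `pick_optimal` are hypothetical, and the way `derive_reference` builds percentages that are not whole multiples (for example 150%) by repeating and truncating the genome is an assumption, since the footnote only spells out the 200% case.

```python
# Rows copied from Table 8: (Ref. Seq. %, unaligned reads, code-length in KB).
TABLE_8 = [
    (1, 696, 128.60),
    (2, 696, 128.73),
    (5, 693, 128.575),
    (10, 684, 127.576),
    (25, 668, 126.615),
    (50, 650, 126.615),
    (100, 3, 14.276),
    (150, 2, 21.164),
    (200, 2, 27.808),
    (300, 2, 41.525),
]


def derive_reference(genome: str, percent: int) -> str:
    """Build a reference whose length is `percent`% of the genome.

    Assumption: shorter references are prefixes of the genome and longer
    ones are the genome repeated end to end, then cut to the target length.
    """
    if not genome:
        raise ValueError("genome must be non-empty")
    target = len(genome) * percent // 100
    repeats = target // len(genome) + 1  # enough copies to cover the target
    return (genome * repeats)[:target]


def pick_optimal(rows):
    """Return the row with the smallest code-length (the MDL choice)."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    best = pick_optimal(TABLE_8)
    fewest_unaligned = min(TABLE_8, key=lambda row: row[1])
    print("MDL choice:", best)
    print("Fewest unaligned reads:", fewest_unaligned)
```

On the Table 8 values, minimizing the code-length selects the 100% reference, whereas minimizing the number of unaligned reads alone would select a longer reference; this is the trade-off the footnote describes.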