Table 7 Simulations with Influenza virus A, B, and C

From: Optimal reference sequence selection for genome assembly using minimum description length principle

| S.No. | Ref. Seq. (Influenza virus) | No. of inversions (rectified / total) | No. of deletions | Code-length using proposed scheme (Kb) |
|---|---|---|---|---|
| 1 | A, H1N1 (NC_002023.1) | 0 / 4 | 1 | 254.109 |
| 2 | A, H5N1 (NC_007357.1) | 0 / 4 | 1 | 254.109 |
| 3 | A, H2N2 (NC_007378.1) | 0 / 4 | 1 | 254.109 |
| 4 | A, H3N2 (NC_007373.1) | 0 / 4 | 1 | 254.109 |
| 5 | A, H9N2 (NC_004910.1) | 0 / 4 | 1 | 254.109 |
| 6 | B (NC_002204.1) | 4 / 4 | 1 | 68.62 |
| 7 | C (NC_006307.1) | 0 / 4 | 1 | 254.027 |

  1. One of the sequences from Influenza virus {A, B, C} was randomly selected and modified to include {SNPs = 7, inversions = 4, deletions = 1, insertions = 3}. Because Influenza virus A has five different strains, while Influenza viruses B and C have one each, the MDL process was used to compare the seven sequences and determine the best reference sequence. Ref. Seq. 6, Influenza virus B, had the smallest code-length (68.62 Kb) and is therefore the model of choice. The experiment also shows that, given the optimal reference sequence (here Influenza virus B), the MDL process rectifies all inversions (4/4), whereas given a non-optimal reference sequence it rectifies none (0/4). The proposed algorithm thus chooses the optimal reference sequence and, given that reference, corrects most if not all of the inversions.
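The selection step described in the note, choosing the candidate with the smallest code-length, can be sketched in a few lines of Python. This is only an illustration of the final comparison: the code-lengths are copied from the table, while the dictionary layout and the `select_reference` helper are assumptions introduced here, not part of the authors' implementation. Computing the code-lengths themselves is not shown.

```python
# Code-lengths (Kb) copied from Table 7. The dictionary layout is an
# illustrative assumption, not the authors' data structure.
code_lengths_kb = {
    "A, H1N1 (NC_002023.1)": 254.109,
    "A, H5N1 (NC_007357.1)": 254.109,
    "A, H2N2 (NC_007378.1)": 254.109,
    "A, H3N2 (NC_007373.1)": 254.109,
    "A, H9N2 (NC_004910.1)": 254.109,
    "B (NC_002204.1)": 68.62,
    "C (NC_006307.1)": 254.027,
}


def select_reference(code_lengths):
    """Return the (name, code_length) pair with the smallest code-length.

    Hypothetical helper: under the MDL principle the preferred model is
    the one giving the shortest description, so we simply take the minimum.
    """
    return min(code_lengths.items(), key=lambda item: item[1])


best_name, best_length = select_reference(code_lengths_kb)
print(f"Selected reference: {best_name} ({best_length} Kb)")
```

With the values from the table, the minimum falls on Influenza virus B, matching the choice reported in the note.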