From: Optimal reference sequence selection for genome assembly using minimum description length principle
| S.No. | Ref. Seq. | Model given by the Data | Data given the hypothesis: reads that do not align to the reference sequence | Data given the hypothesis: code-length (Bits) | Regret | Proposed scheme | Code-length (Bits) |
|---|---|---|---|---|---|---|---|
| 1 | ATATCGGGGCTATA | 1111011110-1-1-1-1 | CCAA | 12 | 0 | ATATCGGGGCATAT>1111 0 1111 0 -1-1-1-1>CCAA | 102 |
| 2 | ATGGGCCCTTATTGC | 000000000000000 | ATAT>GGGG>CCAA | 42 | 30 | ATGGGCCCTTATTGC>000000000000000>ATAT>GGGG>CCAA | 138 |
| 3 | GGGGCCCCGGGG | 1111-1-1-1-11111 | ATAT>CCAA | 27 | 15 | GGGGCCCCGGGG>1111-1-1-1-11111>ATAT>CCAA | 105 |
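Every number in the table can be reproduced with simple arithmetic. This requires one assumption: each symbol of the encoded string costs the same fixed number of bits. The symbols are the four nucleotides, the three per-position values (-1, 0, 1) and the ">" separator. Eight symbols then cost 3 bits each. This cost model is inferred from the table's numbers and is not stated in this excerpt, so the paper's actual encoder may differ. The helper names (`vector_symbols`, `scheme_string`, `data_bits`, `total_bits`) are hypothetical, and the sketch below is only a check of the arithmetic.

```python
import math
import re

# Assumption (inferred from the table, not stated in it): a fixed-length code over
# the nucleotides, the three vector values (-1, 0, 1) and the ">" separator.
ALPHABET = ["A", "C", "G", "T", "-1", "0", "1", ">"]
BITS_PER_SYMBOL = math.ceil(math.log2(len(ALPHABET)))  # 8 symbols -> 3 bits

# (reference sequence, per-position vector, reads that do not align), copied from the table
ROWS = [
    ("ATATCGGGGCTATA", "1111011110-1-1-1-1", ["CCAA"]),
    ("ATGGGCCCTTATTGC", "000000000000000", ["ATAT", "GGGG", "CCAA"]),
    ("GGGGCCCCGGGG", "1111-1-1-1-11111", ["ATAT", "CCAA"]),
]


def vector_symbols(vec: str) -> list:
    """Split a string such as '1111-1-1' into the symbols 1, 1, 1, 1, -1, -1."""
    return re.findall(r"-1|0|1", vec)


def scheme_string(ref: str, vec: str, unaligned: list) -> str:
    """Hypothetical helper: join reference, vector and unaligned reads with '>'."""
    return ">".join([ref, vec] + unaligned)


def data_bits(unaligned: list) -> int:
    """Code length of the unaligned reads, written as 'READ>READ>...'."""
    return BITS_PER_SYMBOL * len(">".join(unaligned))


def total_bits(ref: str, vec: str, unaligned: list) -> int:
    """Code length of the whole string 'ref>vector>read1>read2...'."""
    n_symbols = len(ref) + 1 + len(vector_symbols(vec))
    if unaligned:
        n_symbols += 1 + len(">".join(unaligned))
    return BITS_PER_SYMBOL * n_symbols


data = [data_bits(u) for _, _, u in ROWS]
totals = [total_bits(*row) for row in ROWS]
# Regret taken as the excess over the best (smallest) unaligned-read code length.
regret = [d - min(data) for d in data]
best = min(range(len(ROWS)), key=lambda i: totals[i])

print(data)    # [12, 42, 27]
print(regret)  # [0, 30, 15]
print(totals)  # [102, 138, 105]
print("best reference:", ROWS[best][0])
```

Under this assumption, the Regret column equals the unaligned-read code length minus the smallest such value among the three candidates. The first reference has zero regret and also the shortest total encoding (102 bits), so it would be the one selected.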