TRII score distributions using as a reference set for the weight matrix. (a) The annAUGs of the set of 1,649 gold-set cDNAs with 5′UTR ≥ 200 (red) have a similar TRII score distribution to the set of 8,071 predicted mRNAs in Release 5.9 with 5′UTR ≥ 200 (green). Both of these are similar to the distribution for 0-upAUG cDNAs (; blue), validating as a control test distribution. (b) The set (blue) and the subset of 300 gold-set 0-upAUG cDNAs (red) have similar score distributions. However, the set of 1,675 nongold-set cDNAs with ≥1 upAUG (green) has a higher fraction of low-scoring cDNAs than the 1,349 gold-set cDNAs with ≥1 upAUG (purple) (, chi-square goodness of fit). Given that nongold cDNAs represent mRNAs not in the predicted transcriptome, this suggests that the algorithms used to predict the Drosophila transcriptome were conservative and failed to predict significant numbers of experimentally observed transcripts, including mRNAs with upAUGs and low-scoring annAUGs. (c) The conclusion in (b) is supported by analysis of subsets of nongold cDNAs (≥1 upAUG) that were aligned with genomic DNA using splice site-scanning algorithms [3, 4], either allowing single-nucleotide polymorphisms (992 cDNAs; red) or not (204 cDNAs; green). The distributions for both subsets and for the full set (green curve in (b)) are similar. Note that all cDNAs in both subsets have a stop codon upstream of, and in-frame with, the annAUG. Moreover, premature termination by reverse transcriptase may apply to only a small fraction of these cDNAs: for 13 of the 204 cDNAs (green curve), the 5′ end of the cDNA matches an internal segment of a Release 5.9 predicted transcript, and the cDNA sequence lies downstream of the predicted transcript's start codon.
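The chi-square goodness-of-fit comparison mentioned in (b) can be illustrated with a minimal sketch. The caption does not state how the scores were binned or which set supplied the expected proportions, so the code below assumes that the gold-set distribution serves as the reference and that scores fall into four bins. The function name `chi_square_gof` and all per-bin counts are hypothetical; only the set totals (1,349 and 1,675) come from the caption.

```python
def chi_square_gof(observed, reference):
    """Chi-square goodness-of-fit statistic for observed bin counts,
    with expected counts taken from the bin proportions of a reference set.

    Returns (statistic, degrees_of_freedom).
    """
    if len(observed) != len(reference):
        raise ValueError("observed and reference must have the same number of bins")
    n_obs = sum(observed)
    n_ref = sum(reference)
    stat = 0.0
    for o, r in zip(observed, reference):
        # Expected count: reference proportion scaled to the observed total.
        expected = n_obs * r / n_ref
        if expected <= 0:
            raise ValueError("every reference bin must be non-empty")
        stat += (o - expected) ** 2 / expected
    return stat, len(observed) - 1


# HYPOTHETICAL bin counts, ordered from low to high score.
# Only the totals match the caption; the per-bin values are invented
# for illustration and are not the paper's data.
gold = [150, 400, 500, 299]      # 1,349 gold-set cDNAs with >=1 upAUG
nongold = [400, 500, 475, 300]   # 1,675 nongold-set cDNAs with >=1 upAUG

stat, df = chi_square_gof(nongold, gold)
print(f"chi-square = {stat:.1f}, df = {df}")
```

A large statistic relative to the chi-square distribution with the given degrees of freedom would indicate that the nongold score distribution departs from the gold-set proportions; in this sketch the excess would sit in the lowest-score bin by construction.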