- Research
- Open Access
Map-invariant spectral analysis for the identification of DNA periodicities
- Ahmad Rushdi^{1},
- Jamal Tuqan^{2} and
- Thomas Strohmer^{3}
https://doi.org/10.1186/1687-4153-2012-16
© Rushdi et al.; licensee Springer. 2012
- Received: 31 May 2012
- Accepted: 6 September 2012
- Published: 15 October 2012
Abstract
Many signal processing based methods for finding hidden periodicities in DNA sequences have primarily focused on assigning numerical values to the symbolic DNA sequence and then applying spectral analysis tools such as the short-time discrete Fourier transform (ST-DFT) to locate these repeats. The key results of this approach, however, have been obtained with one very specific symbolic to numerical map, namely the so-called Voss representation. An important research problem is therefore to quantify the sensitivity of these results to the choice of the symbolic to numerical map. In this article, we present a novel algebraic approach to the periodicity detection problem that provides a natural framework for studying the role of the symbolic to numerical map in finding these repeats. More specifically, we derive a new matrix-based expression of the DNA spectrum that comprises most of the widely used mappings in the literature as special cases, shows that the DNA spectrum is in fact invariant under all these mappings, and yields a necessary and sufficient condition for the invariance of the DNA spectrum to the symbolic to numerical map. Furthermore, the new algebraic framework decomposes the periodicity detection problem into several fundamental building blocks that are independent of each other. Sophisticated digital filters and/or alternative fast data transforms such as the discrete cosine and sine transforms can therefore always be incorporated in the periodicity detection scheme, regardless of the choice of the symbolic to numerical map. Although the newly proposed framework is matrix based, these periodicities can be identified at a low computational cost.
Keywords
- Discrete Cosine Transform
- Discrete Fourier Transform
- Affine Transformation
- Finite Impulse Response Filter
- Rectangular Window
1 Introduction
This model provides closed form expressions for the DNA spectrum that generalize and unify some of the existing results in the literature. One of these expressions in particular shows that identifying the period-3 component in the DNA spectrum, a signal processing problem, is equivalent to detecting the nucleotide distribution disparity in the codon structure of a DNA sequence, a genomic problem. This disparity in the nucleotide distribution within the codon structure is termed the codon bias. Using this model, the DNA spectrum is completely characterized by a set of digital sequences, termed the filtered polyphase sequences. By processing these sequences, signal processing techniques can potentially help in understanding and detecting biological structures of this nature. From a computational cost perspective, computing the DNA spectrum with this model does not require any complex valued operations [26]. This finding is rather surprising given the complex multipliers present in the proposed DSP model, as illustrated in Figure 3. It is shown that the direct computation of the DNA spectrum using (2) requires roughly twice as many arithmetic operations as the DSP model approach.
It is important, however, to keep in mind that the above conclusions and results were obtained using the Voss symbolic to numerical transformation. A fundamental research issue is therefore to determine the sensitivity of the signal processing based method to the choice of the symbolic to numerical map. The core questions are: how dependent are the above results on the Voss representation? Are these results invariant with respect to the other popular maps in the literature? Can we derive necessary and/or sufficient conditions for the invariance of the DNA spectrum to the symbolic to numerical transformation? Is there a general mathematical framework that can help us generate new symbolic to numerical maps for which the DNA spectrum remains essentially the same? These are the types of questions we address and answer in this article. One approach to these questions was presented in [27], which introduced a framework, based on signal correlation, for analyzing the equivalence of the mappings used to represent symbolic data numerically, along with strong and weak equivalence properties. In [28], we attempted to answer the same question starting from the aforementioned DSP model for a limited set of mappings. Our main goal in this study is to decouple the symbolic to numerical mapping process from the DNA spectrum computation process, answering a set of other relevant questions along the way.
More specifically, we derive a new matrix-based expression of the DNA spectrum that
1. comprises most of the existing mappings in the literature as special cases,
2. shows that the DNA spectrum is in fact invariant under all these mappings, and
3. yields a necessary and sufficient condition for the invariance of the DNA spectrum to the symbolic to numerical mapping used to compute it.
Furthermore, the new algebraic framework presented here decomposes the frequency identification problem into several fundamental components that are totally independent of each other. It follows that sophisticated digital filters and/or alternative transformations to the DFT such as the discrete cosine, sine, and Hartley transforms can always be easily incorporated in the harmonics detection scheme irrespective of the choice of the symbolic to numerical map. Finally, although the newly proposed framework is matrix based, we show that similar to the DSP model approach, the computation of the DNA spectrum using this new framework is very efficient.
The article is organized as follows. In Section 2, we derive a new matrix based framework to efficiently compute the ST-DFT-based spectrum. New expressions for the ST-DFT ${X}_{l}(\mathit{Rn},\frac{M}{R})$ and its magnitude squared $|{X}_{l}(\mathit{Rn},\frac{M}{R}){|}^{2}$ are obtained; they show that these quantities are completely parameterized by a few predefined matrices. The numerical values of these matrices depend only on the choice of filtering (e.g., a rectangular window versus a non-rectangular one versus general FIR filters) and on the choice of data transform (e.g., the DFT versus the DCT versus the DST).
Using these results, in Section 3, a new expression of the DNA power spectrum is derived that is also completely defined by these matrices. The elegance of this matrix based approach is that it allows general symbolic to numerical maps to be incorporated into the newly derived DNA spectrum expression, provided these maps can be expressed as affine transformations of the Voss representation. This assumption is motivated by the fact that all the popular maps available in the literature satisfy the affine condition. Furthermore, the maps are now completely characterized by the affine transformation (the two matrices A and b) and can therefore be changed without affecting the remaining matrices in the DNA spectrum expression. In summary, the newly derived DNA spectrum expression is stated as a function of a number of matrices. Each of these matrices captures an essential component of the process (filtering, data transform, symbolic to numerical map), and the elements of each matrix can be changed without affecting the other matrices.
Using the above results, we show in Section 4 that the Voss-based DNA spectrum is essentially invariant under some of the most popular maps in the literature. A necessary and sufficient condition for the invariance of the DNA spectrum under any map is also derived.
Summary of the article notations
Symbol | Description |
---|---|
$\mathbb{F}$ | {A,C,G,T}, the field of DNA nucleotides |
$\mathbb{V}$ | {0,1}, the field of Voss binary elements |
$\mathbb{D}$ | A general field of complex valued elements |
$\mathbb{F}\mapsto \mathbb{D}$ | Field mapping operation from set $\mathbb{F}$ to set $\mathbb{D}$, resulting in γ sequences x_{ l }(n), where l = 1,…,γ. For example, when $\mathbb{D}=\mathbb{V}$, $\mathbb{F}\mapsto \mathbb{D}$ results in γ = 4 binary sequences, namely x_{ A }(n), x_{ C }(n), x_{ G }(n), and x_{ T }(n) |
x_{ l }(n) | A discrete time sequence of length N whose elements belong to the mapped field $\mathbb{D}$ |
$\mathbf{x}_{l}(n)$ | The n^{ th } window of length M, extracted from x_{ l }(n), l = 1,…,γ |
${\widehat{\mathbf{x}}}_{l}(n)$ | The interleaved version of $\mathbf{x}_{l}(n)$ with an interleaving factor R, l = 1,…,γ |
${X}_{l}(\mathit{Rn},\frac{M}{R})$ | The ST-DFT of x_{ l }(n), generated using a sliding window of length M and a window shift of length R |
ϒ_{ v }(n) | [X_{ A }(n) X_{ C }(n) X_{ G }(n) X_{ T }(n)]^{ T }, the array of the four $\mathbb{V}$-based ST-DFTs |
ϒ_{ d }(n) | [X_{1}(n) X_{2}(n) … X_{ γ }(n)]^{ T }, the array of the γ $\mathbb{D}$-based ST-DFTs |
X_{ lr }(n) | The r^{ th } filtered polyphase component of X_{ l }(n), where r = 0,1,…,R − 1 and l = 1,…,γ |
S_{ v }(n) | The DNA spectrum computed by adding the magnitude squared of the ST-DFT of the four $\mathbb{V}$-based sequences |
S_{ d }(n) | The DNA spectrum computed by adding the magnitude squared of the ST-DFT of the γ $\mathbb{D}$-based sequences |
Γ_{ l }(n) | [X_{l 0}(n) X_{l 1}(n) … X_{l,R−1}(n)]^{ T }, the array of the R filtered polyphase components X_{ lr }(n), r = 0,1,…,R − 1 and l = 1,…,γ |
I_{ γ } | An identity matrix of size γ × γ |
C | An array of length R whose elements are equally spaced on the unit circle |
h | An array of length M/R whose elements are all equal to one |
D | C^{⋆}C^{ T }, an R × R matrix |
H | I_{ R } ⊗ h^{ T }, an R × M matrix, i.e., an R × R block matrix of $1\times \frac{M}{R}$ blocks |
W | H^{ H }D H, an R × R block matrix of $\frac{M}{R}\times \frac{M}{R}$ blocks |
A,b | The affine transformation matrices of size γ × 4 and γ × 1, respectively, that map the four $\mathbb{V}$-based sequences into the γ $\mathbb{D}$-based sequences |
B | A^{ H }A, a 4 × 4 matrix |
$\stackrel{~}{\mathbf{C}}$ | A complex valued array of R elements |
$\stackrel{~}{\mathbf{h}}$ | A complex valued array of M/R elements |
$\stackrel{~}{\mathbf{D}}$ | ${\stackrel{~}{\mathbf{C}}}^{\star}{\stackrel{~}{\mathbf{C}}}^{T}$, an R × R matrix |
$\stackrel{~}{\mathbf{H}}$ | ${\mathbf{I}}_{R}\otimes {\stackrel{~}{\mathbf{h}}}^{T}$, an R × R block matrix of $1\times \frac{M}{R}$ blocks |
$\stackrel{~}{\mathbf{W}}$ | ${\stackrel{~}{\mathbf{H}}}^{H}\stackrel{~}{\mathbf{D}}\stackrel{~}{\mathbf{H}}$, an R × R block matrix of $\frac{M}{R}\times \frac{M}{R}$ blocks |
2 A new algebraic framework for computing the ST-DFT
Notation of matrix operations
Symbol | Operation |
---|---|
{·}^{∗} | Matrix complex conjugate |
{·}^{ T } | Matrix transpose |
{·}^{ H } | Matrix Hermitian (conjugate) transpose |
{⊗} | Kronecker product of two matrices |
vec{·} | Vector formed by stacking the columns of a matrix |
2.1 Matrix formulation of the ST-DFT
which represents M/R repetitions of the elements in C. Similar to C, the sum of elements in C^{ T }H is equal to 0.
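A minimal pure-Python sketch of this matrix form, for illustration only. It assumes the window starts at index 0, orders the interleaved array $\widehat{\mathbf{x}}$ by polyphase component (all samples with index r mod R, for r = 0, …, R − 1), uses the rectangular h, and takes C_r = e^{−j2πr/R}; the helper names `kron` and `interleave` are hypothetical. It checks that C^T H x̂ reproduces the ST-DFT value at k = M/R and that the entries of C^T H sum to zero.

```python
import cmath

def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def interleave(window, R):
    """Polyphase ordering: samples whose index is r (mod R) are grouped, r = 0..R-1."""
    L = len(window) // R
    return [window[r + R * i] for r in range(R) for i in range(L)]

M, R = 12, 3
L = M // R
C = [cmath.exp(-2j * cmath.pi * r / R) for r in range(R)]  # R points on the unit circle
h = [1.0] * L                                               # rectangular h of length M/R
I_R = [[1.0 if i == j else 0.0 for j in range(R)] for i in range(R)]
H = kron(I_R, [h])                                          # R x M, h^T on the block diagonal

window = [1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0]               # arbitrary binary (Voss-type) window
x_hat = interleave(window, R)

# Gamma = H x_hat: the R filtered polyphase components of this window
Gamma = [sum(H[r][m] * x_hat[m] for m in range(M)) for r in range(R)]
X_matrix = sum(C[r] * Gamma[r] for r in range(R))           # C^T H x_hat

# Direct ST-DFT value at k = M/R, where exp(-j 2 pi k m / M) = exp(-j 2 pi m / R)
X_direct = sum(window[m] * cmath.exp(-2j * cmath.pi * m / R) for m in range(M))

CtH = [sum(C[r] * H[r][m] for r in range(R)) for m in range(M)]  # each C_r repeated M/R times
print(Gamma, abs(X_matrix - X_direct) < 1e-9, abs(sum(CtH)) < 1e-9)
```

Grouping the samples by phase is what lets a single array C carry the whole DFT kernel: within a polyphase component every sample sees the same twiddle factor.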
2.2 A matrix based expression for the magnitude squared of the ST-DFT
Matrix W can be represented as a Kronecker product of D and an $\frac{M}{R}\times \frac{M}{R}$ all-one matrix. Note that any row or column of W is a rotated version of C^{ T }H; therefore, the sum of the elements of any row or column of W is equal to 0.
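The Kronecker structure of W and its zero row sums can be checked numerically in the same hypothetical setting (M = 12, R = 3, rectangular h, C_r = e^{−j2πr/R}, polyphase-ordered x̂). The sketch builds W = H^H D H explicitly, compares it with D ⊗ (all-ones), and confirms that the quadratic form x̂^H W x̂ returns the magnitude squared of the ST-DFT value for a binary window. Helper names are illustrative.

```python
import cmath

def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def interleave(window, R):
    L = len(window) // R
    return [window[r + R * i] for r in range(R) for i in range(L)]

M, R = 12, 3
L = M // R
C = [cmath.exp(-2j * cmath.pi * r / R) for r in range(R)]
D = [[C[r].conjugate() * C[s] for s in range(R)] for r in range(R)]  # D = C* C^T
I_R = [[1.0 if i == j else 0.0 for j in range(R)] for i in range(R)]
H = kron(I_R, [[1.0] * L])                                          # H = I_R ⊗ h^T with h = ones

W = matmul(matmul(transpose(H), D), H)           # W = H^H D H (H is real, so H^H = H^T)
W_kron = kron(D, [[1.0] * L for _ in range(L)])  # D ⊗ (M/R x M/R all-ones matrix)
max_diff = max(abs(W[i][j] - W_kron[i][j]) for i in range(M) for j in range(M))
max_row_sum = max(abs(sum(row)) for row in W)

window = [1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0]
x_hat = interleave(window, R)
power_quad = sum(x_hat[i] * W[i][j] * x_hat[j] for i in range(M) for j in range(M)).real
power_direct = abs(sum(window[m] * cmath.exp(-2j * cmath.pi * m / R) for m in range(M))) ** 2
print(max_diff, max_row_sum < 1e-9, round(power_quad, 9), round(power_direct, 9))
```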
3 The new DNA spectrum expression
For simplicity, we denote $S(\mathit{Rn},k){|}_{k=\frac{M}{R}}$ as S(n) in the following sections. Several mappings have been introduced in the literature using real or complex numerical values, typically with γ = 1 to 4 sequences to keep the computational complexity reasonable. In this section, we use the results of Section 2 to derive general expressions for the ST-DFT at k = M/R and the corresponding spectrum for any symbolic to numerical mapping.
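For reference, a direct (non-matrix) sketch of S(n) for the Voss map. It assumes the n-th window starts at sample R·n and spans M samples; `dna_spectrum` and the other names are illustrative, not the authors' code. At k = M/R the ST-DFT kernel e^{−j2πkm/M} reduces to e^{−j2πm/R}.

```python
import cmath

def voss_indicators(seq):
    """Voss map: one binary indicator sequence per nucleotide, ordered A, C, G, T."""
    return [[1.0 if s == base else 0.0 for s in seq] for base in "ACGT"]

def st_dft_at_M_over_R(x, start, M, R):
    """X(start, k) at k = M/R: the kernel exp(-j 2 pi k m / M) becomes exp(-j 2 pi m / R)."""
    return sum(x[start + m] * cmath.exp(-2j * cmath.pi * m / R) for m in range(M))

def dna_spectrum(seq, M, R):
    """S(n) = sum over the Voss sequences of |X_l(Rn, M/R)|^2, window n starting at R*n."""
    xs = voss_indicators(seq)
    n_windows = (len(seq) - M) // R + 1
    return [sum(abs(st_dft_at_M_over_R(x, R * n, M, R)) ** 2 for x in xs)
            for n in range(n_windows)]

S_periodic = dna_spectrum("ATG" * 8, M=12, R=3)  # perfect period-3 repeat
S_flat = dna_spectrum("A" * 24, M=12, R=3)       # homopolymer: no period-3 component
print([round(v, 9) for v in S_periodic], [round(v, 9) for v in S_flat])
```

A perfect period-3 sequence gives the same nonzero value (48 here) in every window, whereas the homopolymer gives zero because its only indicator sequence is constant over each window.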
3.1 The Voss-based DNA spectrum
In (26), I_{4} and W are constant matrices ∀n. Hence computing the spectrum S_{ v }(n) for different windows of a DNA sequence requires only the evaluation of the Voss interleaved array ${\widehat{\mathbf{x}}}_{v}(n)$.
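Equation (26) itself is not reproduced in this text. Consistent with the notation table, and with the remark that B = I_4 recovers it from (35), the sketch below assumes it has the form S_v(n) = x̂_v^T(n)(I_4 ⊗ W)x̂_v(n), with x̂_v(n) stacking the four interleaved Voss windows. W is built once and reused for every window, and the result is compared with the direct definition. All function names are hypothetical.

```python
import cmath

def voss_indicators(seq):
    """Voss map: one binary indicator sequence per nucleotide, ordered A, C, G, T."""
    return [[1.0 if s == base else 0.0 for s in seq] for base in "ACGT"]

def interleave(window, R):
    """Group the samples of a window by polyphase component r = 0..R-1."""
    L = len(window) // R
    return [window[r + R * i] for r in range(R) for i in range(L)]

def build_W(M, R):
    """W = D ⊗ (all-ones M/R x M/R) with D = C* C^T and C_r = exp(-j 2 pi r / R)."""
    L = M // R
    C = [cmath.exp(-2j * cmath.pi * r / R) for r in range(R)]
    return [[C[i // L].conjugate() * C[j // L] for j in range(M)] for i in range(M)]

def quad(x, W):
    """x^T W x for a real vector x; the result is real because W is Hermitian."""
    return sum(x[i] * W[i][j] * x[j] for i in range(len(x)) for j in range(len(x))).real

def voss_spectrum_matrix(seq, M, R):
    W = build_W(M, R)  # built once; it does not depend on the window index n
    xs = voss_indicators(seq)
    spectrum = []
    for n in range((len(seq) - M) // R + 1):
        # I_4 ⊗ W is block diagonal, so the quadratic form splits into one term per nucleotide
        spectrum.append(sum(quad(interleave(x[R * n:R * n + M], R), W) for x in xs))
    return spectrum

def voss_spectrum_direct(seq, M, R):
    xs = voss_indicators(seq)
    return [sum(abs(sum(x[R * n + m] * cmath.exp(-2j * cmath.pi * m / R) for m in range(M))) ** 2
                for x in xs)
            for n in range((len(seq) - M) // R + 1)]

seq = "ATGGCGTACCTAGGATCCAATGCA"  # arbitrary 24-nt test sequence
S_matrix = voss_spectrum_matrix(seq, 12, 3)
S_direct = voss_spectrum_direct(seq, 12, 3)
print([round(v, 9) for v in S_matrix])
print([round(v, 9) for v in S_direct])
```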
3.2 Computing the DNA spectrum under general symbolic to numerical maps
where B ≐ A^{ H }A. Equation (35) indicates that when a symbolic to numerical mapping $\mathbb{F}\mapsto \mathbb{D}$ is used, the DNA power spectrum S_{ d }(n) is completely defined by the Voss-based interleaved array ${\widehat{\mathbf{x}}}_{v}(n)$ and the constant matrices W and B, where B is a function of the transformation matrix A ($\mathbb{V}\mapsto \mathbb{D}$). Note that if A = I_{4}, then B = I_{4} and (35) reduces to (26), the Voss-based spectrum.
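The sketch below illustrates (35) under an assumed form: with B = A^H A, the spectrum of an affine map equals Σ_{l,m} B_{lm} X_l^* X_m, which is the quadratic form x̂_v^H(B ⊗ W)x̂_v because x̂_l^T W x̂_m = X_l^* X_m for real windows. The 2 × 4 matrix A and the offset b are arbitrary hypothetical values. The check also shows that b drops out, since the M samples of the ST-DFT kernel at k = M/R sum to zero when R divides M.

```python
import cmath

def voss_indicators(seq):
    """Voss map: one binary indicator sequence per nucleotide, ordered A, C, G, T."""
    return [[1.0 if s == base else 0.0 for s in seq] for base in "ACGT"]

def bin_value(x, start, M, R):
    """ST-DFT value at k = M/R of the length-M window of x that starts at `start`."""
    return sum(x[start + m] * cmath.exp(-2j * cmath.pi * m / R) for m in range(M))

def spectrum_direct(seq, A, b, M, R, n):
    """Map every sample with x_d(t) = A x_v(t) + b, then add |X_k|^2 over the γ outputs."""
    xv = voss_indicators(seq)
    total = 0.0
    for k in range(len(A)):
        xd = [sum(A[k][l] * xv[l][t] for l in range(4)) + b[k] for t in range(len(seq))]
        total += abs(bin_value(xd, R * n, M, R)) ** 2
    return total

def spectrum_from_B(seq, A, M, R, n):
    """Quadratic form in the Voss ST-DFT values with B = A^H A; the offset b never appears."""
    B = [[sum(A[k][l].conjugate() * A[k][m] for k in range(len(A))) for m in range(4)]
         for l in range(4)]
    X = [bin_value(x, R * n, M, R) for x in voss_indicators(seq)]
    return sum(B[l][m] * X[l].conjugate() * X[m] for l in range(4) for m in range(4)).real

# Hypothetical two-sequence (γ = 2) complex affine map, chosen only for illustration
A = [[1, 1j, -1, -1j],
     [2, 0, 1, 0.5]]
b = [3, -1 + 2j]
seq = "ATGGCGTACCTAGGATCCAATGCA"
M, R = 12, 3
for n in range(5):
    print(n, round(spectrum_direct(seq, A, b, M, R, n), 6), round(spectrum_from_B(seq, A, M, R, n), 6))
```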
4 Invariance of the DNA spectrum under popular mappings
The results of Section 3 can be applied to several mappings that are widely used in the literature. Specifically, by defining the corresponding transformation matrices A and B ($\mathbb{V}\mapsto \mathbb{D}$), we obtain closed form expressions for S_{ d }(n). Furthermore, for a number of mappings, we show that the $\mathbb{D}$-mapped spectrum S_{ d }(n) is in fact a scaled version of the Voss-based spectrum S_{ v }(n).
4.1 Four-to-four (γ= 4) representations
Now, we extend this result to specific transformations in which the scale factors a, c, g, and t take given numerical values.
§ Tetrahedral mapping.
Since B = α I_{4}(α = 2), the tetrahedral-based DNA spectrum is a scaled version of the Voss-based spectrum.
§ Quaternion mapping.
§ Higher order mappings.
An alternative quaternion transformation is given by A = diag(1 + i + j + k,1 + i − j − k,1 − i − j + k,1 − i + j − k), which results in B = 4I_{ 4 } and consequently S_{ d }(n) = 4S_{ v }(n). In general, for a (hyper)complex representation system with η dimensions whose coefficients all have unit amplitude, B = η I_{ 4 } and hence S_{ d }(n) = η S_{ v }(n).
4.2 Four-to-three (γ= 3) mappings
In order to reduce the DNA spectrum computational cost, several mappings have been proposed with smaller numbers of sequences.
§ Z-curve mapping.
This ratio is consistent with the result we first derived in [24] for R = 3, but is now shown to hold for any value of R. We are now ready to state an important result.
where A_{γ×4} and b_{γ×1} = [b_{1} b_{2} … b_{ γ }]^{ T } are constant, possibly complex valued, arrays. Define the 4 × 4 matrix B = A^{ H }A. The DNA spectrum is invariant under this map, i.e., S_{ d }(n) = α S_{ v }(n), if the transformation matrix B can be written as $\mathbf{B}=\alpha {\mathbf{I}}_{4}+{\sum}_{i}{\mathbf{B}}_{i}$, where each B_{ i } has constant rows and/or constant columns.
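The proof follows by observing that if B_{ i } has constant rows and/or constant columns, then ${S}_{d}(n){|}_{{\mathbf{B}}_{i}}=0$: the four Voss sequences sum to one at every position, so such a B_{ i } couples the data to W only through the all-ones vector, which W annihilates because every row and column of W sums to zero. We remind the reader at this point that the vector b_{γ×1} has no bearing on the invariance of the DNA spectrum.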
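As a numerical check of this condition, the sketch below uses a per-sample Z-curve map (an assumed form: each output is a signed sum of Voss indicators, whereas the original Z-curve accumulates such counts). Its B equals 4I_4 minus the all-ones matrix, whose rows and columns are constant, so the condition predicts S_d(n) = 4S_v(n). The test sequence and names are arbitrary.

```python
import cmath

def voss_indicators(seq):
    """Voss map: one binary indicator sequence per nucleotide, ordered A, C, G, T."""
    return [[1.0 if s == base else 0.0 for s in seq] for base in "ACGT"]

def bin_value(x, start, M, R):
    """ST-DFT value at k = M/R of the length-M window of x that starts at `start`."""
    return sum(x[start + m] * cmath.exp(-2j * cmath.pi * m / R) for m in range(M))

A_z = [[1, -1, 1, -1],   # purine vs pyrimidine: (A + G) - (C + T)
       [1, 1, -1, -1],   # amino vs keto: (A + C) - (G + T)
       [1, -1, -1, 1]]   # weak vs strong hydrogen bonding: (A + T) - (C + G)

# B = A^T A (A is real); expected to be 4 I_4 minus the all-ones matrix
B = [[sum(A_z[k][l] * A_z[k][m] for k in range(3)) for m in range(4)] for l in range(4)]

M, R = 12, 3
seq = "ATGGCGTACCTAGGATCCAATGCA"
xv = voss_indicators(seq)
xz = [[sum(A_z[k][l] * xv[l][t] for l in range(4)) for t in range(len(seq))] for k in range(3)]
n_windows = (len(seq) - M) // R + 1
S_v = [sum(abs(bin_value(x, R * n, M, R)) ** 2 for x in xv) for n in range(n_windows)]
S_d = [sum(abs(bin_value(x, R * n, M, R)) ** 2 for x in xz) for n in range(n_windows)]
print(B)
print([round(d - 4 * v, 9) for d, v in zip(S_d, S_v)])
```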
§ Simplex mapping.
This ratio is consistent with the result in [31], which was limited to the direct DFT; it is now shown to hold for the ST-DFT at k = M/R with any value of R.
4.3 Four-to-two (γ= 2) mappings
which is not a scaled version of S_{ v }(n), since B in this case cannot be written as $\alpha {\mathbf{I}}_{4}+{\sum}_{i}{\mathbf{B}}_{i}$, where each B_{ i } has constant rows and/or constant columns.
4.4 Four-to-one (γ= 1) mappings
Similar to the previous case, S_{ d }(n) is not a scaled version of S_{ v }(n).
Experimental verification
5 Alternative measures of DNA periodicities
Alternative DNA periodicity measures using fast data transforms [33–35], wavelets, and finite impulse response (FIR) digital filters [25, 36] were recently proposed to improve the detection of these periodicities. However, each method was derived separately, using a seemingly different approach. In this section, we show that our proposed framework can systematically generate all these results by simply changing a number of matrices. It therefore provides a unified framework for generating alternative measures of DNA periodicities. For example, we can re-express the matrices D and W in terms of general digital filters and use these filters to modify (35) in order to generate new spectrum formulas. Furthermore, using symmetry based decompositions of D and W, we simplify (35) into a formula with low computational complexity.
5.1 Modified periodicity measures
By construction, W is completely defined by the real array h and the generally complex array C. Note that h and C can be viewed as the impulse responses of two FIR filters with z-transforms H(z) and C(z).
5.1.1 Updating the real filter h
FIR window specifications: relative peak side lobe A_{1}/A_{0} (in dB), approximate main lobe width Δω, equivalent Kaiser window coefficient β, and transition width Δω_{ β }.
FIR Window | A_{1}/A_{0} (dB) | Δω | β | Δω_{ β } |
---|---|---|---|---|
Rectangular | −13 | 4π/(M/R + 1) | 0 | 1.81πR/M |
Bartlett | −25 | 8πR/M | 1.33 | 2.37πR/M |
Hanning | −31 | 8πR/M | 3.86 | 5.01πR/M |
Hamming | −41 | 8πR/M | 4.86 | 6.27πR/M |
Blackman | −57 | 12πR/M | 7.04 | 9.19πR/M |
Moreover, all the mathematical relations derived in Section 3 between the $\mathbb{D}$-based spectrum and the Voss-based one remain valid when h is replaced by $\stackrel{~}{\mathbf{h}}$.
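A sketch of this statement: the rectangular h is replaced by a length-M/R Hamming window h̃ (an illustrative choice), C is kept as the DFT array, and a per-sample Z-curve map (assumed form) is used as the D-mapping. Because C still sums to zero, the scaling S_d(n) = 4S_v(n) should persist. Names and the test sequence are hypothetical.

```python
import cmath
import math

def voss_indicators(seq):
    """Voss map: one binary indicator sequence per nucleotide, ordered A, C, G, T."""
    return [[1.0 if s == base else 0.0 for s in seq] for base in "ACGT"]

def filtered_bin(x, start, h, C):
    """sum_r C_r * sum_i h_i * x(start + r + R*i): filtered polyphase form with weights h."""
    R = len(C)
    return sum(C[r] * sum(h[i] * x[start + r + R * i] for i in range(len(h))) for r in range(R))

M, R = 12, 3
L = M // R
C = [cmath.exp(-2j * cmath.pi * r / R) for r in range(R)]                         # DFT array, sums to 0
h_tilde = [0.54 - 0.46 * math.cos(2 * math.pi * i / (L - 1)) for i in range(L)]  # Hamming, length M/R

# Per-sample Z-curve map (assumed form), columns ordered A, C, G, T
A_z = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]

seq = "ATGGCGTACCTAGGATCCAATGCA"
xv = voss_indicators(seq)
xz = [[sum(A_z[k][l] * xv[l][t] for l in range(4)) for t in range(len(seq))] for k in range(3)]
n_windows = (len(seq) - M) // R + 1
S_v = [sum(abs(filtered_bin(x, R * n, h_tilde, C)) ** 2 for x in xv) for n in range(n_windows)]
S_d = [sum(abs(filtered_bin(x, R * n, h_tilde, C)) ** 2 for x in xz) for n in range(n_windows)]
print([round(d - 4 * v, 9) for d, v in zip(S_d, S_v)])
```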
Experimental verification
5.1.2 Updating the complex filter C
Note that, in this case, the elements of the array $\stackrel{~}{\mathbf{C}}$ no longer necessarily add to zero. Consequently, the sum of the elements in any row or column of $\stackrel{~}{\mathbf{D}}={\stackrel{~}{\mathbf{C}}}^{\star}{\stackrel{~}{\mathbf{C}}}^{T}$ or $\stackrel{~}{\mathbf{W}}={\mathbf{H}}^{H}\stackrel{~}{\mathbf{D}}\mathbf{H}$ is not necessarily zero. We also note that, unlike the case of $\stackrel{~}{\mathbf{h}}$, using $\stackrel{~}{\mathbf{C}}$ instead of C keeps the spectrum formulas in (58) correct but does not preserve the mathematical relations between the different $\mathbb{D}$-mapped spectra and the Voss-based spectrum.
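To see how the relations break, the sketch keeps the rectangular h and replaces C with C̃_r = sin((2r + 1)π/(2R)), taken here for r = 0, …, R − 1 (an assumption about the index range), whose elements sum to 2 for R = 3. For a per-sample Z-curve map (assumed form, B = 4I_4 − 11^T) and Voss sequences that sum to one at every position, the all-ones part of B no longer vanishes: it contributes −(M/R)^2 |Σ_r C̃_r|^2, so S_d(n) = 4S_v(n) − 64 for M = 12 and R = 3 instead of a pure scaling. Names are hypothetical.

```python
import math

def voss_indicators(seq):
    """Voss map: one binary indicator sequence per nucleotide, ordered A, C, G, T."""
    return [[1.0 if s == base else 0.0 for s in seq] for base in "ACGT"]

def filtered_bin(x, start, h, C):
    """sum_r C_r * sum_i h_i * x(start + r + R*i): filtered polyphase form with weights h."""
    R = len(C)
    return sum(C[r] * sum(h[i] * x[start + r + R * i] for i in range(len(h))) for r in range(R))

M, R = 12, 3
L = M // R
h = [1.0] * L                                                            # rectangular h unchanged
C_tilde = [math.sin((2 * r + 1) * math.pi / (2 * R)) for r in range(R)]  # about [0.5, 1.0, 0.5]

# Per-sample Z-curve map (assumed form), columns ordered A, C, G, T
A_z = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]

seq = "ATGGCGTACCTAGGATCCAATGCA"
xv = voss_indicators(seq)
xz = [[sum(A_z[k][l] * xv[l][t] for l in range(4)) for t in range(len(seq))] for k in range(3)]
n_windows = (len(seq) - M) // R + 1
S_v = [sum(abs(filtered_bin(x, R * n, h, C_tilde)) ** 2 for x in xv) for n in range(n_windows)]
S_d = [sum(abs(filtered_bin(x, R * n, h, C_tilde)) ** 2 for x in xz) for n in range(n_windows)]

offset = (L * sum(C_tilde)) ** 2   # (M/R)^2 |sum_r C~_r|^2, which is 64 here
print(round(offset, 9), [round(d - (4 * v - offset), 9) for d, v in zip(S_d, S_v)])
```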
5.1.3 Joint optimization of $\stackrel{~}{\mathbf{h}}$ and $\stackrel{~}{\mathbf{C}}$
It should be clear at this point that better DNA harmonics detection performance could potentially be achieved through a joint “optimization” of $\stackrel{~}{\mathbf{h}}$ and $\stackrel{~}{\mathbf{C}}$. For example, a learning paradigm with a least-mean-square (LMS) criterion could be used to find the optimal pair $\stackrel{~}{\mathbf{h}}$ and $\stackrel{~}{\mathbf{C}}$. Alternatively, a biologically motivated criterion might yield a substantial boost in performance, but it is not clear which criterion to use. This interesting but challenging research topic is, however, outside the scope of this article and is not pursued further here.
Example
Parameter settings in Figure 15 for computing the short-time Fourier, cosine, sine, and Hartley transforms
Transform | α | a | b | θ _{ r } |
---|---|---|---|---|
ST-DFT | 1 | 1 | 0 | −2πr/R |
ST-DCT | −1 | 1/2 | −1/2 | (2r + 1)π/(2R) |
ST-DST | −1 | 1/2j | −1/2j | (2r + 1)π/(2R) |
ST-DHT | 1 | $\frac{1}{2}(1-j)$ | $-\frac{1}{2}(1-j)$ | 2πr/R |
Modified arrays $\stackrel{\mathbf{~}}{\mathbf{h}}$ and $\stackrel{\mathbf{~}}{\mathbf{C}}$ used to compute the short-time Fourier-, cosine-, sine-, and Hartley-based DNA spectrum of (60)
Transform | $\stackrel{~}{\mathbf{h}}$ | $\stackrel{~}{\mathbf{C}}$ |
---|---|---|
ST-DFT | $\stackrel{~}{\mathbf{h}}=\mathbf{h}=\{{(1)}^{i},i=1,2,\dots M/R\}$ | $\stackrel{~}{\mathbf{C}}=\mathbf{C}=\{{e}^{-j2\pi r/R},r=1,2,\dots R\}$ |
ST-DCT | $\stackrel{~}{\mathbf{h}}=\{{(-1)}^{i},i=1,2,\dots M/R\}$ | $\stackrel{~}{\mathbf{C}}=\{\cos((2r+1)\pi /(2R)),r=1,2,\dots R\}$ |
ST-DST | $\stackrel{~}{\mathbf{h}}=\{{(-1)}^{i},i=1,2,\dots M/R\}$ | $\stackrel{~}{\mathbf{C}}=\{\sin((2r+1)\pi /(2R)),r=1,2,\dots R\}$ |
ST-DHT | $\stackrel{~}{\mathbf{h}}=\mathbf{h}=\{{(1)}^{i},i=1,2,\dots M/R\}$ | $\stackrel{~}{\mathbf{C}}=\{\cos(2\pi r/R)+\sin(2\pi r/R),r=1,2,\dots R\}$ |
Note that, as in the Fourier case, the elements of $\stackrel{~}{\mathbf{C}}$ sum to zero for the cosine and Hartley transforms. Therefore, in these two cases, the relations between the different $\mathbb{D}$-based DNA spectra and the $\mathbb{V}$-based DNA spectrum are the same as those given in Section 3.
5.2 A real approach for the spectrum computation
which provides a completely real approach for computing the $\mathbb{D}$-mapped spectrum S_{ d }(n). Note that all results and relations between the different spectra in Section 3 still hold when W_{ s } replaces W as in (65).
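The identity behind a real-only evaluation can be sketched as follows, assuming W_s denotes the real part of W (which is also the symmetric part of the Hermitian matrix W): for a real array x̂, x̂^T W x̂ = x̂^T Re(W) x̂, and every entry of W in block (r, s) equals e^{j2π(r−s)/R}, so only cosines are needed. The setting (M = 12, R = 3, rectangular h, polyphase-ordered x̂) and the names are illustrative.

```python
import cmath
import math

def interleave(window, R):
    """Group the samples of a window by polyphase component r = 0..R-1."""
    L = len(window) // R
    return [window[r + R * i] for r in range(R) for i in range(L)]

M, R = 12, 3
L = M // R
# Assumed W_s = Re(W): entry ((r, i), (s, j)) of W is exp(j 2 pi (r - s) / R)
W_s = [[math.cos(2 * math.pi * (i // L - j // L) / R) for j in range(M)] for i in range(M)]

window = [1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0]
x_hat = interleave(window, R)
power_real = sum(x_hat[i] * W_s[i][j] * x_hat[j] for i in range(M) for j in range(M))  # real arithmetic only
power_complex = abs(sum(window[m] * cmath.exp(-2j * cmath.pi * m / R) for m in range(M))) ** 2
print(round(power_real, 9), round(power_complex, 9))
```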
Computational complexity comparison
Real multiplications and additions needed to evaluate (63) and (16)
$\widehat{\mathbf{x}}(n)$, W | Real multiplications | Real additions |
---|---|---|
real, real | M(M + 1) | M^{2} − 1 |
real, complex | 2M(M + 1) | 2(M^{2} − 1) |
complex, real | 2M(M + 1) | 2(M^{2} − 1) |
complex, complex | 4M(M + 1) | 2(2M^{2} + M − 1) |
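The first row of the table can be checked by counting operations in one straightforward evaluation order of x̂^T W x̂ for real x̂ and W: computing y = W x̂ takes M multiplications and M − 1 additions per row, and the final inner product x̂^T y adds M multiplications and M − 1 additions, for M(M + 1) multiplications and M^2 − 1 additions in total. The counting function is a hypothetical illustration, not the authors' code.

```python
def quad_form_counted(x, W):
    """Evaluate x^T W x for real x and W, counting real multiplications and additions."""
    M = len(x)
    mults = adds = 0
    y = []
    for i in range(M):                 # y = W x, one row at a time
        acc = W[i][0] * x[0]
        mults += 1
        for j in range(1, M):
            acc += W[i][j] * x[j]
            mults += 1
            adds += 1
        y.append(acc)
    total = x[0] * y[0]                # x^T y
    mults += 1
    for i in range(1, M):
        total += x[i] * y[i]
        mults += 1
        adds += 1
    return total, mults, adds

M = 12
identity = [[1.0 if i == j else 0.0 for j in range(M)] for i in range(M)]
value, mults, adds = quad_form_counted([1.0] * M, identity)
print(value, mults, adds, M * (M + 1), M ** 2 - 1)
```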
Example
where q = (r + 1) mod 3. The matrix-based DNA spectrum formula in (67) is consistent with the result derived using a different approach in [37].
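Equation (67) is not reproduced in this text. For R = 3 and the rectangular h, expanding |Σ_r C_r X_lr(n)|^2 with C_r = e^{−j2πr/3} and real X_lr(n) gives Σ_r [X_lr^2(n) − X_lr(n) X_lq(n)] with q = (r + 1) mod 3, since cos(±2π/3) = −1/2 and each unordered pair of phases appears exactly once. The sketch assumes this is the form of (67) and compares it with the complex ST-DFT computation; for Voss sequences the X_lr(n) are nucleotide counts per codon position, so the real formula uses only integer arithmetic. Function names are hypothetical.

```python
import cmath

def polyphase_sums(x, start, M, R=3):
    """X_lr(n) with rectangular h: sum of x(start + r + R*i) over the M/R samples of phase r."""
    return [sum(x[start + r + R * i] for i in range(M // R)) for r in range(R)]

def spectrum_real(seq, M):
    """Real-only R = 3 spectrum: sum over l and r of X_lr^2 - X_lr * X_lq, q = (r + 1) mod 3."""
    out = []
    for n in range((len(seq) - M) // 3 + 1):
        total = 0
        for base in "ACGT":
            x = [1 if s == base else 0 for s in seq]
            X = polyphase_sums(x, 3 * n, M)
            total += sum(X[r] ** 2 - X[r] * X[(r + 1) % 3] for r in range(3))
        out.append(total)
    return out

def spectrum_complex(seq, M):
    """Reference: sum over the four Voss sequences of |ST-DFT at k = M/3|^2."""
    out = []
    for n in range((len(seq) - M) // 3 + 1):
        total = 0.0
        for base in "ACGT":
            X = sum(cmath.exp(-2j * cmath.pi * m / 3) for m in range(M) if seq[3 * n + m] == base)
            total += abs(X) ** 2
        out.append(total)
    return out

seq = "ATGGCGTACCTAGGATCCAATGCA"
print(spectrum_real(seq, 12))
print([round(v, 9) for v in spectrum_complex(seq, 12)])
```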
6 Concluding remarks
In this article, we have introduced a matrix based framework for locating hidden DNA periodicities using spectral analysis techniques that are invariant to the choice of the symbolic to numerical map. The primary advantage of the presented approach over previous studies is the decomposition of the spectrum expression into key matrices whose values can be set independently of each other. Each matrix represents one of the essential components involved in computing the spectrum, such as the symbolic to numerical map, the data transform, and the filtering scheme. The framework is derived under the assumption that the symbolic to numerical map can be obtained from the Voss representation by an affine transformation. This assumption is, however, mild, given that most (if not all) of the maps proposed in the literature satisfy it. Using the new framework, we have shown that the DNA spectrum expression is invariant under these maps. We have also derived a necessary and sufficient condition for the invariance of the DNA spectrum in terms of the affine transformation matrix A (the vector b in the affine transformation does not affect the DNA spectrum).
This condition can serve as the basis for generating novel symbolic to numerical maps that preserve the DNA spectrum expression. In the later sections of the article, we have shown the potential of using different filtering schemes, e.g., windows other than the rectangular one, as well as alternative fast data transforms, e.g., the DCT, the DST, and the Hartley transform. A number of simulation results that verify the findings of this article and a brief quantitative analysis of the computational complexity of the new approach were given in the same sections. Future research could consider optimizing the different building blocks, namely the symbolic to numerical map, the data transform, and the filtering scheme. This, in turn, requires a deep understanding of the biological significance of different DNA periodicities in order to set up a meaningful objective function and appropriate constraints. Ultimately, the framework proposed here can be incorporated into a more sophisticated system to study the complex structure of genomic sequences and understand the functionality of their various components. Finally, this efficient framework can be extended to the analysis of other symbolic sequences over small alphabets, whether biological (such as protein sequences) or non-biological.
Declarations
Acknowledgements
TS acknowledges partial support from the NSF via grants DMS 0811169 and DMS-1042939.
References
- Benson G: Tandem repeat finder: a program to analyze DNA sequences. Nucleic Acids Res 1999, 27(2):573-580. 10.1093/nar/27.2.573
- Butler J: Forensic DNA Typing: Biology and Technology behind STR Markers. Academic Press, MA, Burlington; 2003.
- Cummings CA, Relman DA: Microbial forensics: cross-examining pathogens. Science 2002, 296: 1976-1979. 10.1126/science.1073125
- Ramachandran P, Lu W, Antoniou A: Filter-based methodology for the location of hot spots in proteins and exons in DNA. IEEE Trans. Biomed. Eng 2012, 59(6):1598-1609.
- Strachan T, Read AP: Human Molecular Genetics. John Wiley and Sons, New York; 1999. http://www.ncbi.nlm.nih.gov/books/NBK7580/
- Smit AF: The origin of interspersed repeats in the human genome. Curr. Opin. Genet. Dev 1996, 6: 743-748. 10.1016/S0959-437X(96)80030-X
- Rubinsztein DC, Hayden MR: Analysis of TRIPLET REPEAT Disorders. Bios Scientific Pub, Oxford, England; 1999.
- Gupta R, Mittal A, Gupta S: An efficient algorithm to detect palindromes in DNA sequences using periodicity transform. Signal Process 2001, 18(4):8-20. 10.1109/79.939833
- Chechetkin VR, Turygin AY: Size-dependence of three-periodicity and long-range correlations in DNA sequences. Phys. Lett A 1995, 199: 75-80. 10.1016/0375-9601(95)00047-7
- Chechetkin VR, Turygin AY: Search of hidden periodicities in DNA sequences. J. Theor. Biol 1995, 175: 477-494. 10.1006/jtbi.1995.0155
- Silverman BD, Linsker R: A measure of DNA periodicity. J. Theor. Biol 1986, 118(3):295-300. 10.1016/S0022-5193(86)80060-1
- Holste D, Grosse I, Beirer S, Schieg P, Herzel H: Repeats and correlations in human DNA sequences. Phys. Rev. E 2003, 67(06913).
- Anastassiou D: Genomic signal processing. IEEE Signal Process. Mag. 2001, 18(4):8-20. 10.1109/79.939833
- Chechetkin VR, Lobzin VV: Anticodons, frameshifts, and hidden periodicities in tRNA sequences. J. Biomol. Struct. Dyn 2006, 24(2):189-202. 10.1080/07391102.2006.10507112
- Anastassiou D: Frequency domain analysis of biomolecular sequences. Bioinformatics 2000, 16(12):1073-1082. 10.1093/bioinformatics/16.12.1073
- Tiwari S, Ramachandran S, Bhattacharya A, Bhattacharya S, Ramaswamy R: Prediction of probable genes by Fourier analysis of genomic sequences. Comput. Appl. Biosci 1997, 13(3):263-270.
- Akhtar M, Ambikairajah E: Time and frequency domain methods for gene and exon prediction in Eukaryotes. In Proceedings of ICASSP. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2007, Honolulu, Hawaii, USA; 2007:573-576.
- Voss RF: Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. Phys. Rev. Lett 1992, 68(25):3805-3808. 10.1103/PhysRevLett.68.3805
- Fickett JW: The gene identification problem: an overview for developers. Comput. Chem 1996, 20: 103-118. 10.1016/S0097-8485(96)80012-X
- Bouaynaya N, Schonfeld D: Non-stationary analysis of coding and non-coding regions in nucleotide sequences. IEEE J. Sel. Top. Signal Process 2008, 2(3):357-364.
- Vaidyanathan PP, Yoon BJ: Gene and exon prediction using all pass-based filters. In Proceedings of the Workshop on Genomic Signal Processing and Statistics (GENSIPS). Raleigh, North Carolina, USA; 2003:1-4.
- Vaidyanathan PP, Yoon B: Digital filter for gene prediction applications. In Proc. Asilomar conference. Asilomar Conference on Signals, Systems and Computers (ACSSC), Pacific Grove, CA, USA; 2003:306-310.
- Akhtar M, Epps J, Ambikairajah E: On DNA numerical representations for period-3 based exon prediction. In Proceedings of the workshop on Genomic Signal Processing and Statistics. International Workshop on Genomic Signal Processing and Statistics (GENSIPS) 2007, Tuusula, Finland; 2007:1-4.
- Rushdi A, Tuqan J: Gene identification using the Z-curve representation. In Proceedings of the 31st IEEE ICASSP conference. International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2006, Toulouse, France; 2006:1024-1027.
- Tuqan J, Rushdi A: A DSP approach for finding the codon bias in DNA sequences. IEEE J. Sel. Top. Signal Process 2008, 2(3):343-356.
- Rushdi A, Tuqan J: An efficient algorithm for DNA discrete Fourier analysis. In Proceedings of the 3rd IEEE Cairo International Biomedical Engineering Conference (CIBEC). Cairo International Biomedical Engineering Conference, Cairo, Egypt; 2006:1-4.
- Wang L, Schonfeld D: Mapping equivalence for symbolic sequences: Theory and applications. IEEE Trans. Signal Process 2009, 57(12):4895-4905.
- Rushdi A, Tuqan J: The role of the Symbolic-to-Numerical Mapping in the detection of DNA Periodicities. In Proceedings of the workshop on Genomic Signal Processing and Statistics. International Workshop on Genomic Signal Processing and Statistics (GENSIPS) 2008, Phoenix, AZ, USA; 2008:1-4.
- Cristea PD: Conversion of nucleotides sequences into genomic signals. J. Cellul. Mol. Med 2002, 6(2):279-303. 10.1111/j.1582-4934.2002.tb00196.x
- Brodzik AK, Peters O: Symbol-balanced quaternionic periodicity transform for latent pattern detection in DNA sequences. In Proceedings of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing. International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2005, Philadelphia, PA, USA; 2005:373-376.
- Coward E: Equivalence of two Fourier methods for biological sequences. J. Math. Biol 1997, 36: 64-70. 10.1007/s002850050090
- Burset M, Guigo R: Evaluation of gene structure prediction programs. Genomics 1996, 34: 353-357. 10.1006/geno.1996.0298
- Rushdi A, Tuqan J: Trigonometric Transforms for Finding Repeats in DNA sequences. In Proceedings of the workshop on Genomic Signal Processing and Statistics. International Workshop on Genomic Signal Processing and Statistics (GENSIPS) 2008, Phoenix, AZ, USA; 2008:1-4.
- Berger JA, Mitra SK, Astola J: Power spectrum analysis for DNA sequences. In Proc. of the Int. Sym. on Signal Processing and its App. Conference: International Symposium on Signal Processing and Its Applications (ISSPA) 2003, Paris, France; 2003:29-32.
- Kotlar D, Lavner Y: Gene prediction by spectral rotation measure: a new method for identifying protein-coding regions. Genome Res 2003, 13(8):1930-1937.
- Rushdi A, Tuqan J: The filtered spectral rotation measure. In Proceedings of the 40th IEEE Asilomar Conference on Signals, Systems, and Computers. Asilomar Conference on Signals, Systems and Computers (ACSSC) 2006, CA, USA; 2006:1875-1879.
- Datta S, Asif A: A fast DFT based gene prediction algorithm for identification of protein coding regions. In Proc. of the ICASSP. International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2005, Philadelphia, PA, USA; 2005:113-116.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.