In practice, the distributional parameters are generally unknown, and one would like to estimate the CoD from sample data. In this section, we derive two Bayesian estimators for the CoD in (4). One approach is analogous to that followed in [17] to define the Bayesian MMSE prediction error estimator, whereas the other makes use of the optimal Bayesian predictor (OBP), a straightforward generalization of the optimal Bayesian classifier (OBC) introduced in [20].
We will assume that an i.i.d. sample \(\mathbf{S}_{n} = \{(X_{1},Y_{1}),\ldots,(X_{n},Y_{n})\}\) from the distribution of (X,Y) is available. Given \(\mathbf{S}_{n}\), define \(U_{i}\) as the number of sample points such that \(X=\mathbf{x}^{i}\) and Y=0, and \(V_{i}\) as the number of sample points such that \(X=\mathbf{x}^{i}\) and Y=1, for \(i = 1, \dots, 2^{d}\). Note that \(N_{0} = \sum_{i=1}^{2^{d}} U_{i}\) and \(N_{1} = \sum_{i=1}^{2^{d}} V_{i}\) are the (random) sample sizes corresponding to Y=0 and Y=1, respectively.
Let \({\mathbf p} = (p_{1}, \ldots, p_{2^{d}})\), \({\mathbf q} = (q_{1},\ldots,q_{2^{d}})\), and \(\boldsymbol{\theta}=(c,{\mathbf p},{\mathbf q})\), where \(0 \leq c, p_{i}, q_{i} \leq 1\) and \(\sum_{i=1}^{2^{d}} p_{i} = \sum_{i=1}^{2^{d}} q_{i} = 1\). As shown in the previous section, the distribution of (X,Y) is completely specified by the parameter vector \(\boldsymbol{\theta}\). The Bayesian approach treats \(\boldsymbol{\theta}\) as a random variable, the prior distribution of which can incorporate a priori knowledge about the problem. We will assume that c, \({\mathbf p}\), and \({\mathbf q}\) are independent, i.e., \(f(\boldsymbol{\theta})=f(c)f({\mathbf p})f({\mathbf q})\). It is shown in [17] that this implies that the posterior distribution of \(\boldsymbol{\theta}\) also factors: \(f(\boldsymbol{\theta}\mid \mathbf{S}_{n})=f(c\mid \mathbf{S}_{n})f({\mathbf p}\mid \mathbf{S}_{n})f({\mathbf q}\mid \mathbf{S}_{n})\).
In this paper, we will employ the standard choice of priors for discrete distributions, namely, the Beta and Dirichlet distributions (cf. Appendices A and B):
$$ \begin{aligned} c &\sim \text{Beta}(\alpha,\beta),\\ {\mathbf p} &\sim \text{Dirichlet}(\alpha_{1}, \dots, \alpha_{2^{d}}),\\ {\mathbf q} &\sim \text{Dirichlet}(\beta_{1}, \dots, \beta_{2^{d}}), \end{aligned} $$
(5)
where the hyperparameters \(\alpha\), \(\beta\), \(\alpha_{i}\), \(\beta_{i}\), \(i=1,\ldots,2^{d}\), are positive numbers. These distributions have bounded supports; the Beta distribution is defined over the interval [0,1], while the Dirichlet distribution is defined over the simplex of \(2^{d}\) nonnegative numbers that add up to one. The shapes of the distributions are controlled by the concentration parameters \(\Delta_{c} = \alpha+\beta\), \(\Delta_{p} = \sum_{j=1}^{2^{d}} \alpha_{j}\), and \(\Delta_{q} = \sum_{j=1}^{2^{d}} \beta_{j}\), and the base measures \(c_{0} = \alpha/\Delta_{c}\), \({\mathbf p}_{0} = (\alpha_{1}/\Delta_{p}, \dots, \alpha_{2^{d}}/\Delta_{p})\), and \({\mathbf q}_{0} = (\beta_{1}/\Delta_{q}, \dots, \beta_{2^{d}}/\Delta_{q})\). Please refer to Appendices A and B for definitions and important facts about the Beta and Dirichlet distributions, which will be needed in the sequel.
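To make the parameterization concrete, the sketch below builds the hyperparameters in (5) from a concentration parameter and a base measure. The variable names and numerical values are our own illustrative choices (with d = 2); the values are integers and satisfy \(\alpha > \Delta_{p}-1\) and \(\beta > \Delta_{q}-1\), which the exact MMSE expressions derived below will require.

```python
import numpy as np

d = 2                                     # number of binary predictors (illustrative)
k = 2 ** d                                # number of states x^i

# Base measures (each sums to one) and concentration parameters; values are illustrative.
c0, delta_c = 0.5, 10.0
p0 = np.full(k, 1.0 / k)
q0 = np.full(k, 1.0 / k)
delta_p = delta_q = float(k)

# Hyperparameters of the priors in (5).
alpha, beta = delta_c * c0, delta_c * (1.0 - c0)   # c ~ Beta(alpha, beta)
alpha_vec = delta_p * p0                           # p ~ Dirichlet(alpha_1, ..., alpha_{2^d})
beta_vec = delta_q * q0                            # q ~ Dirichlet(beta_1, ..., beta_{2^d})
```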
A very important property for our purposes is that the Beta and Dirichlet distributions are conjugate priors for the discrete (multinomial) likelihood in this model, i.e., the posteriors have the same form as the corresponding priors. Given the sample data \(\mathbf{S}_{n}\), the posterior distributions are [17, 18]:
$$ \begin{aligned} c \mid \mathbf{S}_{n} &\sim \text{Beta}(n_{0}+\alpha,n_{1}+\beta), \\ {\mathbf p} \mid \mathbf{S}_{n} &\sim \text{Dirichlet}(u_{1}+\alpha_{1}, \dots, u_{2^{d}}+\alpha_{2^{d}}),\\ {\mathbf q} \mid \mathbf{S}_{n}&\sim \text{Dirichlet}(v_{1}+\beta_{1}, \dots, v_{2^{d}}+\beta_{2^{d}}), \\ \end{aligned} $$
(6)
where \(n_{0}\) and \(n_{1}\) are the observed sample sizes corresponding to Y=0 and Y=1, respectively, while \(u_{i}\) and \(v_{i}\) are the observed values of the random variables \(U_{i}\) and \(V_{i}\), respectively.
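Because of conjugacy, the posterior hyperparameters in (6) are obtained by simply adding the observed counts to the prior hyperparameters. A minimal sketch (the function name is ours), assuming the counts \(u_{i}\), \(v_{i}\) have already been tabulated from \(\mathbf{S}_{n}\):

```python
import numpy as np

def posterior_hyperparameters(u, v, alpha, beta, alpha_vec, beta_vec):
    """Conjugate posterior updates of (6) for the Beta-Dirichlet model.

    u, v        : observed counts u_i, v_i for each state x^i (length 2**d)
    alpha, beta : Beta prior hyperparameters for c
    alpha_vec   : Dirichlet prior hyperparameters for p
    beta_vec    : Dirichlet prior hyperparameters for q
    """
    u, v = np.asarray(u, float), np.asarray(v, float)
    n0, n1 = u.sum(), v.sum()                  # observed class-conditional sample sizes
    c_post = (n0 + alpha, n1 + beta)           # c | S_n ~ Beta(n0 + alpha, n1 + beta)
    p_post = u + np.asarray(alpha_vec, float)  # p | S_n ~ Dirichlet(u_i + alpha_i)
    q_post = v + np.asarray(beta_vec, float)   # q | S_n ~ Dirichlet(v_i + beta_i)
    return c_post, p_post, q_post
```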
Minimum mean-square error CoD estimator
Given a CoD estimator \(\widehat {\text {CoD}}\), consider the mean-square error
$$ \text{MSE} \,=\, E_{\boldsymbol{\theta}, \mathbf{S}_{n}}\left[| \widehat{\text{CoD}} - \text{CoD}|^{2}\right]. $$
(7)
The minimum MSE solution, as is well known, is given by the expectation of the CoD with respect to the posterior distribution of the parameters [23]. This defines the Bayesian MMSE CoD estimator:
$$ \widehat{\text{CoD}}_{\text{MMSE}}\,=\, E[\!\text{CoD}\mid \mathbf{S}_{n}] \,=\, E_{\boldsymbol{\theta}\mid \mathbf{S}_{n}}[\!\text{CoD}], $$
(8)
where the CoD is given by (4).
It is well known that the MMSE estimator \(\widehat{\text{CoD}}_{\text{MMSE}}\) not only achieves the smallest root mean-square (RMS) error over the distribution of \((\boldsymbol{\theta},\mathbf{S}_{n})\), but is also an unbiased estimator (however, for a specific model with fixed \(\boldsymbol{\theta}\), \(\widehat{\text{CoD}}_{\text{MMSE}}\) might not be unbiased or have the smallest RMS error).
In order to derive an expression for the Bayesian MMSE CoD estimator, first note that (4) can be rewritten as
$${} {\fontsize{8.6pt}{9.6pt}{\begin{aligned} \text{CoD} &\,=\, 1-\sum_{i=1}^{2^{d}} \left(p_{i} \,I_{p_{i} < \frac{1-c}{c} q_{i}}I_{c<1/2} + \frac{c}{1\,-\,c}\,p_{i} \,I_{p_{i} < \frac{1-c}{c} q_{i}}\,I_{c\geq 1/2}\right. \\ &\quad\quad\quad\quad\quad \left. + \frac{1-c}{c}\,q_{i} \,I_{q_{i} \leq \frac{c}{1-c}p_{i}}I_{c<1/2}+ q_{i} \,I_{q_{i} \leq \frac{c}{1-c}p_{i}}I_{c\geq 1/2}\right)\!. \end{aligned}}} $$
(9)
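As a sanity check on (9), the snippet below (helper names are ours) numerically compares it with the direct form \(\text{CoD} = 1 - \varepsilon/\varepsilon_{0}\), where \(\varepsilon = \sum_{i}\min\{c\,p_{i},(1-c)\,q_{i}\}\) and \(\varepsilon_{0} = \min\{c,1-c\}\), under the convention implied by the derivation that c, \(p_{i}\), and \(q_{i}\) denote P(Y=0), \(P(X=\mathbf{x}^{i}\mid Y=0)\), and \(P(X=\mathbf{x}^{i}\mid Y=1)\), respectively:

```python
import numpy as np

rng = np.random.default_rng(0)

def cod_direct(c, p, q):
    """CoD = 1 - eps/eps0 with eps = sum_i min{c p_i, (1-c) q_i}, eps0 = min{c, 1-c}."""
    eps = np.minimum(c * p, (1.0 - c) * q).sum()
    return 1.0 - eps / min(c, 1.0 - c)

def cod_eq9(c, p, q):
    """CoD as written in (9), splitting on the two cases c < 1/2 and c >= 1/2."""
    le = (1.0 - c) * q <= c * p            # indicator of q_i <= (c/(1-c)) p_i
    lt = ~le                               # indicator of p_i < ((1-c)/c) q_i
    if c < 0.5:
        s = (p * lt).sum() + ((1.0 - c) / c * q * le).sum()
    else:
        s = (c / (1.0 - c) * p * lt).sum() + (q * le).sum()
    return 1.0 - s

for _ in range(1000):                      # random models with d = 3 predictors
    c = rng.uniform(0.05, 0.95)
    p, q = rng.dirichlet(np.ones(8)), rng.dirichlet(np.ones(8))
    assert np.isclose(cod_direct(c, p, q), cod_eq9(c, p, q))
```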
Applying (8) to (9) and using the previously mentioned fact that the posterior distribution factors allows one to write the Bayesian MMSE CoD estimator as
$${\kern20pt} {\fontsize{9.2pt}{9.6pt}{\begin{aligned} & \widehat{\text{CoD}}_{\text{MMSE}} \,=\, E_{\boldsymbol{\theta}\mid \mathbf{S}_{n}}\left[\text{CoD}\right] \,=\, E_{c\mid\mathbf{S}_{n}}\left[ E_{{\mathbf p}\mid\mathbf{S}_{n}} \left[E_{{\mathbf q}\mid\mathbf{S}_{n}}\left[\text{CoD}\right]\right] \right] \\[-3pt] & \quad =\, 1-\sum_{i=1}^{2^{d}} \left(E_{c\mid\mathbf{S}_{n}} \left[E_{{\mathbf q}\mid\mathbf{S}_{n}}\left[E_{{\mathbf p}\mid\mathbf{S}_{n}}\left[ p_{i} \,I_{p_{i} < \frac{1-c}{c} q_{i}}\right] I_{c<1/2}\right]\right]\right.\\[-3pt] & \quad\quad\quad\quad\quad\:\left. + E_{c\mid\mathbf{S}_{n}} \left[\frac{c}{1\,-\,c}E_{{\mathbf q}\mid\mathbf{S}_{n}} \left[E_{{\mathbf p}\mid\mathbf{S}_{n}} \left[p_{i} \,I_{p_{i} < \frac{1-c}{c} q_{i}}\right] I_{c\geq 1/2}\right]\right]\right. \\[-2pt] & \quad\quad\quad\quad\quad\:\left.+ E_{c\mid\mathbf{S}_{n}} \left[\frac{1\,-\,c}{c}E_{{\mathbf p}\mid\mathbf{S}_{n}} \left[E_{{\mathbf q}\mid\mathbf{S}_{n}} \left[q_{i} \,I_{q_{i} \leq \frac{c}{1-c} p_{i}}\right] I_{c< 1/2}\right]\right]\right. \\[-2pt] & \quad\quad\quad\quad\quad\:\left. +\, E_{c\mid\mathbf{S}_{n}} \left[E_{{\mathbf p}\mid\mathbf{S}_{n}}\left[E_{{\mathbf q}\mid\mathbf{S}_{n}}\left[ q_{i} \,I_{q_{i} \leq \frac{c}{1\,-\,c} p_{i}}\right] I_{c\geq 1/2}\right]\right]\right), \end{aligned}}} $$
(10)
Using (6) and the fact that the marginal distributions of a Dirichlet are Beta (cf. Appendix B), we have that \(c \mid \mathbf{S}_{n} \sim \text{Beta}(\alpha^{s},\beta^{s})\), \(p_{i} \mid \mathbf{S}_{n} \sim \text{Beta}({{\alpha^{s}_{i}}},{{\overline{\alpha}^{\,s}_{i}}})\), and \(q_{i} \mid \mathbf{S}_{n} \sim \text{Beta}({{\beta^{s}_{i}}},{{\overline{\beta}^{\,s}_{i}}})\), where \(\alpha^{s} = n_{0}+\alpha\), \(\beta^{s} = n_{1}+\beta\), \({{\alpha^{s}_{i}}} = u_{i}+\alpha_{i}\), \({{\overline{\alpha}^{\,s}_{i}}} = n_{0}-u_{i} + \Delta_{p} - \alpha_{i}\), \({{\beta^{s}_{i}}} = v_{i}+\beta_{i}\), and \({{\overline{\beta}^{\,s}_{i}}} = n_{1}-v_{i} + \Delta_{q} - \beta_{i}\), for \(i=1,\ldots,2^{d}\). Using the results in Appendix A and assuming that the hyperparameters are integers (if they are not, a simple adjustment to the derivation below can be made; see Appendix A), it follows that
$${\kern20pt} {\fontsize{9.2pt}{9.6pt}{\begin{aligned} & E_{c\mid\mathbf{S}_{n}} \left[E_{{\mathbf q}\mid\mathbf{S}_{n}}\left[E_{{\mathbf p}\mid\mathbf{S}_{n}} \left[p_{i} \,I_{p_{i} < \frac{1-c}{c} q_{i}}\right] I_{c<1/2}\right]\right] \,=\, E_{c\mid\mathbf{S}_{n}} \left[E_{{\mathbf q}\mid\mathbf{S}_{n}}\left[E_{{\mathbf p}\mid\mathbf{S}_{n}}\left[p_{i} \,I_{p_{i} < \frac{1-c}{c} q_{i}}\right] I_{q_{i} < \frac{c}{1-c}}\right] I_{c<1/2}\right] \,+\,\\[-2pt] & E_{c\mid\mathbf{S}_{n}} \left[E_{{\mathbf q}\mid\mathbf{S}_{n}}\left[E_{{\mathbf p}\mid\mathbf{S}_{n}}\left[p_{i} \right] \right] I_{c<1/2}\right] \,-\, E_{c\mid\mathbf{S}_{n}} \left[E_{{\mathbf q}\mid\mathbf{S}_{n}}\left[E_{{\mathbf p}\mid\mathbf{S}_{n}}\left[p_{i} \right] I_{q_{i} < \frac{c}{1-c}}\right] I_{c<1/2}\right]\\[-2pt] & =\frac{1}{\mathrm{B}({{\alpha^{s}_{i}}},{{\overline{\alpha}^{\,s}_{i}}})} \times \left\{E_{c\mid\mathbf{S}_{n}} \left[E_{{\mathbf q}\mid\mathbf{S}_{n}}\left[\sum_{j=0}^{{{\overline{\alpha}^{\,s}_{i}}}-1}r_{j}({{\alpha^{s}_{i}}}+1,{{\overline{\alpha}^{\,s}_{i}}}) \left(\frac{1-c}{c}q_{i}\right)^{{{\alpha^{s}_{i}}}+j+1} I_{q_{i} < \frac{c}{1-c}} \right]I_{c<1/2}\right]\,+\, \right.\\[-2pt] & \left.E_{c\mid\mathbf{S}_{n}} \left[\mathrm{B}\left({{\alpha^{s}_{i}}}+1,{{\overline{\alpha}^{\,s}_{i}}}\right) I_{c<1/2} \right] \,-\, E_{c\mid\mathbf{S}_{n}} \left[E_{{\mathbf q}\mid\mathbf{S}_{n}} \left[\mathrm{B}\left({{\alpha^{s}_{i}}}+1,{{\overline{\alpha}^{\,s}_{i}}}\right) I_{q_{i} < \frac{c}{1-c}}\right] I_{c<1/2} \right] \vphantom{\left\{E_{c\mid\mathbf{S}_{n}} \left[E_{{\mathbf q}\mid\mathbf{S}_{n}}\left[\sum_{j=0}^{{{\overline{\alpha}^{\,s}_{i}}}-1}r_{j}({{\alpha^{s}_{i}}}+1,{{\overline{\alpha}^{\,s}_{i}}}) \left(\frac{1-c}{c}q_{i}\right)^{{{\alpha^{s}_{i}}}+j+1} I_{q_{i} < \frac{c}{1-c}} \right]I_{c<1/2}\right]\,+\, \right.}\right\} =\frac{1}{\mathrm{B}\left({{\alpha^{s}_{i}}},{{\overline{\alpha}^{\,s}_{i}}}\right)\mathrm{B}\left({{\beta^{s}_{i}}},{{\overline{\beta}^{\,s}_{i}}}\right)} \times\\ & \left\{ \sum_{j=0}^{{{\overline{\alpha}^{\,s}_{i}}}-1}\:\: \sum_{k=0}^{{{\overline{\beta}^{\,s}_{i}}}-1} r_{j}\left({{\alpha^{s}_{i}}}+1,{{\overline{\alpha}^{\,s}_{i}}}\right)\,r_{k}\left({{\alpha^{s}_{i}}}+{{\beta^{s}_{i}}}+j+1,{{\overline{\beta}^{\,s}_{i}}}\right) E_{c\mid\mathbf{S}_{n}} \left[\left(\frac{c}{1-c}\right)^{{{\beta^{s}_{i}}}+k} I_{c<1/2}\right] + \right. \\[-2pt] & \left. \mathrm{B}\left({{\alpha^{s}_{i}}}+1,{{\overline{\alpha}^{\,s}_{i}}}\right)\mathrm{B}\left({{\beta^{s}_{i}}},{{\overline{\beta}^{\,s}_{i}}}\right)E_{c\mid\mathbf{S}_{n}} \left[ I_{c<1/2}\right] \,-\, \mathrm{B}\left({{\alpha^{s}_{i}}}+1,{{\overline{\alpha}^{\,s}_{i}}}\right) \sum_{j=0}^{{{\overline{\beta}^{\,s}_{i}}}-1} r_{j}\left({{\beta^{s}_{i}}},{{\overline{\beta}^{\,s}_{i}}}\right) E_{c\mid\mathbf{S}_{n}} \left[\left(\frac{c}{1-c}\right)^{{{\beta^{s}_{i}}}+j} I_{c<1/2}\right]\right\}\\[-2pt] &=\frac{1}{2^{\alpha^{s}}\, \mathrm{B}({\alpha^{s}},{\beta^{s}})\mathrm{B}\left({{\alpha^{s}_{i}}},{{\overline{\alpha}^{\,s}_{i}}}\right) \mathrm{B}\left({{\beta^{s}_{i}}},{{\overline{\beta}^{\,s}_{i}}}\right)}\,\times\,\\[1ex] & \times \left\{ \sum_{j=0}^{{{\overline{\alpha}^{\,s}_{i}}}-1}\:\: \sum_{k=0}^{{{\overline{\beta}^{\,s}_{i}}}-1} \:\:\sum_{l=0}^{{\beta^{s}} - \left({{\beta^{s}_{i}}}+k+1\right)} \!\! 
\left[ r_{j}\left({{\alpha^{s}_{i}}}+1,{{\overline{\alpha}^{\,s}_{i}}}\right)\,r_{k}\left({{\alpha^{s}_{i}}}+{{\beta^{s}_{i}}}+j+1,{{\overline{\beta}^{\,s}_{i}}}\right) r_{l}\left({\alpha^{s}}+{{\beta^{s}_{i}}}+k,{\beta^{s}} - \left({{\beta^{s}_{i}}}+k\right)\right) \right. \right. \\ & \left. \quad \quad \times\, \frac{1}{2^{{{\beta^{s}_{i}}}+k+l}}\right] +\, \mathrm{B}\left({{\alpha^{s}_{i}}}+1,{{\overline{\alpha}^{\,s}_{i}}}\right) \mathrm{B}\left({{\beta^{s}_{i}}},{{\overline{\beta}^{\,s}_{i}}}\right) \, \sum_{j=0}^{{\beta^{s}}-1} \, r_{j}\left({\alpha^{s}},{\beta^{s}}\right) \frac{1}{2^{j}} \\ & \quad\quad -\, \left. \mathrm{B}\left({{\alpha^{s}_{i}}}+1,{{\overline{\alpha}^{\,s}_{i}}}\right) \, \sum_{j=0}^{{{\overline{\beta}^{\,s}_{i}}}-1}\:\:\sum_{k=0}^{{\beta^{s}}-\left({{\beta^{s}_{i}}}+j+1\right)} \, r_{j}\left({{\beta^{s}_{i}}},{{\overline{\beta}^{\,s}_{i}}}\right) \,r_{k}\left({\alpha^{s}}+{{\beta^{s}_{i}}}+j,{\beta^{s}}-\left({{\beta^{s}_{i}}}+j\right)\right) \frac{1}{2^{{{\beta^{s}_{i}}}+j+k}} \right\}, \end{aligned}}} $$
(11)
Likewise, we have
$${\kern20pt} \begin{aligned} & E_{c\mid\mathbf{S}_{n}} \left[\frac{c}{1\,-\,c}E_{{\mathbf q}\mid\mathbf{S}_{n}} \left[E_{{\mathbf p}\mid\mathbf{S}_{n}} \left[p_{i} \,I_{p_{i} < \frac{1-c}{c} q_{i}}\right] I_{c\geq 1/2} \right] \right] \,=\, \frac{1}{2^{\beta^{s}}\,\mathrm{B}({\alpha^{s}},{\beta^{s}})\mathrm{B}\left({{\alpha^{s}_{i}}}, {{\overline{\alpha}^{\,s}_{i}}}\right) \mathrm{B}\left({{\beta^{s}_{i}}},{{\overline{\beta}^{\,s}_{i}}}\right)}\\[1ex] & \times \left\{ \sum_{j=0}^{{{\overline{\alpha}^{\,s}_{i}}}-1} \:\:\sum_{k=0}^{{\alpha^{s}} - ({{\alpha^{s}_{i}}}+j+1)} \!\! \left[r_{j}\left({{\alpha^{s}_{i}}}+1,{{\overline{\alpha}^{\,s}_{i}}}\right)\,r_{k}\left({\beta^{s}}+ {{\alpha^{s}_{i}}}+j,{\alpha^{s}}-\left({{\alpha^{s}_{i}}}+j\right)\right) \, \mathrm{B}\left({{\alpha^{s}_{i}}}+{{\beta^{s}_{i}}}+j+1,{{\overline{\beta}^{\,s}_{i}}}\right) \right. \right. \\ & \left. \left. \quad \quad \times\, \frac{1}{2^{{{\alpha^{s}_{i}}}+j+k}}\right]\vphantom{\left\{ \sum_{j=0}^{{{\overline{\alpha}^{\,s}_{i}}}-1} \:\:\sum_{k=0}^{{\alpha^{s}} - ({{\alpha^{s}_{i}}}+j+1)} \!\! \left[r_{j}\left({{\alpha^{s}_{i}}}+1,{{\overline{\alpha}^{\,s}_{i}}}\right)\,r_{k}\left({\beta^{s}}+ {{\alpha^{s}_{i}}}+j,{\alpha^{s}}-\left({{\alpha^{s}_{i}}}+j\right)\right) \, \mathrm{B}\left({{\alpha^{s}_{i}}}+{{\beta^{s}_{i}}}+j+1,{{\overline{\beta}^{\,s}_{i}}}\right) \right. \right.} \right\}, \end{aligned} $$
(12)
$$ \begin{aligned} & E_{c\mid\mathbf{S}_{n}} \left[\frac{1\,-\,c}{c}E_{{\mathbf p}\mid\mathbf{S}_{n}} \left[E_{{\mathbf q} \mid\mathbf{S}_{n}} \left[q_{i} \,I_{q_{i} \leq \frac{c}{1-c} p_{i}}\right] I_{c< 1/2} \right] \right]\\& \,=\, \frac{1}{2^{\alpha^{s}}\,\mathrm{B}({\alpha^{s}},{\beta^{s}})\mathrm{B}\left({{\alpha^{s}_{i}}},{{\overline{\alpha}^{\,s}_{i}}}\right) \mathrm{B}\left({{\beta^{s}_{i}}},{{\overline{\beta}^{\,s}_{i}}}\right)}\\[1ex] & \times \left\{ \sum_{j=0}^{{{\overline{\beta}^{\,s}_{i}}}-1} \:\:\sum_{k=0}^{{\beta^{s}} - ({{\beta^{s}_{i}}}+j+1)} \!\!\left[r_{j}\left({{\beta^{s}_{i}}}+1,{{\overline{\beta}^{\,s}_{i}}}\right)\,r_{k}({\alpha^{s}}+ {{\beta^{s}_{i}}}+j,\right.\right.\\&\quad\; \left.\left.{\beta^{s}}-({{\beta^{s}_{i}}}+j)) \, \mathrm{B}\left({{\alpha^{s}_{i}}}+ {{\beta^{s}_{i}}}+j+1,{{\overline{\alpha}^{\,s}_{i}}}\right) \right. \right. \\ & \left. \left. \quad \quad\;\; \times\, \frac{1}{2^{{{\beta^{s}_{i}}}+j+k}}\right] \vphantom{\left\{ \sum_{j=0}^{{{\overline{\beta}^{\,s}_{i}}}-1} \:\:\sum_{k=0}^{{\beta^{s}} - ({{\beta^{s}_{i}}}+j+1)} \!\!\left[r_{j}\left({{\beta^{s}_{i}}}+1,{{\overline{\beta}^{\,s}_{i}}}\right)\,r_{k}({\alpha^{s}}+ {{\beta^{s}_{i}}}+j,\right.\right.}\right\}, \end{aligned} $$
(13)
and
$${} {\fontsize{9.4pt}{9.6pt}{\begin{aligned} & E_{c\mid\mathbf{S}_{n}} \left[E_{{\mathbf p}\mid\mathbf{S}_{n}}\left[E_{{\mathbf q}\mid\mathbf{S}_{n}}\left[ q_{i} \,I_{q_{i} \leq \frac{c}{1\,-\,c} p_{i}}\right] I_{c\geq 1/2}\right] \right]\\&\,=\, \frac{1}{2^{\beta^{s}}\, \mathrm{B}({\alpha^{s}},{\beta^{s}})\mathrm{B}\left({{\alpha^{s}_{i}}},{{\overline{\alpha}^{\,s}_{i}}}\right) \mathrm{B}\left({{\beta^{s}_{i}}},{{\overline{\beta}^{\,s}_{i}}}\right)}\\[1ex] & \times \left\{ \sum_{j=0}^{{{\overline{\beta}^{\,s}_{i}}}-1}\:\: \sum_{k=0}^{{{\overline{\alpha}^{\,s}_{i}}}-1} \:\:\sum_{l=0}^{{\alpha^{s}} -({{\alpha^{s}_{i}}}+k+1)} \!\! \left[r_{j}({{\beta^{s}_{i}}}+1,{{\overline{\beta}^{\,s}_{i}}})\,r_{k}({{\alpha^{s}_{i}}}+{{\beta^{s}_{i}}}+j \right.\right.\\&\qquad \left.\left.+ \ 1,{{\overline{\alpha}^{\,s}_{i}}}) r_{l}({\beta^{s}}+{{\alpha^{s}_{i}}}+k,{\alpha^{s}} - ({{\alpha^{s}_{i}}}+k)) \right. \right. \\ & \left. \quad \quad \times\, \frac{1}{2^{{{\alpha^{s}_{i}}}+k+l}}\right] +\, \mathrm{B}({{\beta^{s}_{i}}}+1,{{\overline{\beta}^{\,s}_{i}}})\mathrm{B}({{\alpha^{s}_{i}}},{{\overline{\alpha}^{\,s}_{i}}}) \, \sum_{j=0}^{{\alpha^{s}}-1} \, r_{j}({\beta^{s}},{\alpha^{s}}) \frac{1}{2^{j}} \\ & \quad\quad -\, \left. \mathrm{B}({{\beta^{s}_{i}}}+1,{{\overline{\beta}^{\,s}_{i}}}) \, \sum_{j=0}^{{{\overline{\alpha}^{\,s}_{i}}}-1}\:\:\sum_{k=0}^{{\alpha^{s}}-({{\alpha^{s}_{i}}}+j+1)} \, r_{j}({{\alpha^{s}_{i}}},{{\overline{\alpha}^{\,s}_{i}}})\,r_{k}({\beta^{s}}+{{\alpha^{s}_{i}}}\right.\\& \qquad\left.+ \ j,{\alpha^{s}}-({{\alpha^{s}_{i}}}+j)) \frac{1}{2^{{{\alpha^{s}_{i}}}+j+k}}\vphantom{\left\{ \sum_{j=0}^{{{\overline{\beta}^{\,s}_{i}}}-1}\:\: \sum_{k=0}^{{{\overline{\alpha}^{\,s}_{i}}}-1} \:\:\sum_{l=0}^{{\alpha^{s}} -({{\alpha^{s}_{i}}}+k+1)} \!\! \left[r_{j}({{\beta^{s}_{i}}}+1,{{\overline{\beta}^{\,s}_{i}}})\,r_{k}({{\alpha^{s}_{i}}}+{{\beta^{s}_{i}}}+j \right.\right.} \right\}, \end{aligned}}} $$
(14)
where the Beta function \(\mathrm{B}(a,b)\) and the coefficients \(r_{i}(a,b)\) are defined in Appendix A.
Substituting (11)–(14) into (10) produces an exact expression for computing the MMSE CoD estimator in terms of the sample counts and model hyperparameters. Notice that for the previous expressions to make sense, one must have \(\alpha > \Delta_{p} - 1\) and \(\beta > \Delta_{q} - 1\). In particular, if uniform priors are chosen for \({\mathbf p}\) or \({\mathbf q}\), then the prior for c cannot be uniform (cf. Appendix A).
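Equations (11)–(14) give the estimator in closed form, but their cost grows quickly with n and \(2^{d}\). A simple alternative, noted again toward the end of this section, is to approximate the posterior expectation in (8) by Monte Carlo sampling from the posteriors in (6). The sketch below (the function name is ours) does exactly that, evaluating the CoD directly as \(1-\varepsilon/\varepsilon_{0}\) for each posterior draw:

```python
import numpy as np

def cod_mmse_monte_carlo(u, v, alpha, beta, alpha_vec, beta_vec,
                         n_draws=100_000, rng=None):
    """Monte Carlo approximation of the Bayesian MMSE CoD estimator (8).

    Draws (c, p, q) from the posteriors in (6) and averages the CoD over the
    draws, avoiding the exact but costly expressions (11)-(14).
    """
    rng = np.random.default_rng() if rng is None else rng
    u, v = np.asarray(u, float), np.asarray(v, float)
    n0, n1 = u.sum(), v.sum()
    c = rng.beta(n0 + alpha, n1 + beta, size=n_draws)                  # c | S_n
    p = rng.dirichlet(u + np.asarray(alpha_vec, float), size=n_draws)  # p | S_n
    q = rng.dirichlet(v + np.asarray(beta_vec, float), size=n_draws)   # q | S_n
    eps = np.minimum(c[:, None] * p, (1.0 - c)[:, None] * q).sum(axis=1)
    eps0 = np.minimum(c, 1.0 - c)
    return float(np.mean(1.0 - eps / eps0))
```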
Optimal Bayesian predictor CoD estimator
In this section, we derive a second Bayesian CoD estimator based on the optimal Bayesian predictor (OBP), a simple extension of the “optimal Bayesian classifier” (OBC) proposed in [20] to the Boolean prediction problem. Formally, let \(\varepsilon_{\boldsymbol{\theta}}[\psi]\) denote the error of a predictor \(\psi\) under parameter vector \(\boldsymbol{\theta}\). The OBP \(\psi_{\text{OBP}}\) minimizes the average error over the family of (posterior) distributions indexed by the parameter:
$$ \psi_{\text{OBP}} \,=\, \arg\min_{\psi\in \Upsilon} \:E_{\boldsymbol{\theta}\mid\mathbf{S}_{n}}[\!\varepsilon_{\boldsymbol{\theta}}[\!\psi]]. $$
(15)
Using the results of [20] for the OBC, one can verify that the OBP for the Beta-Dirichlet model considered here is given by
$$ \psi_{\text{OBP}}(\mathbf{x}^{i}) \:=\: \left\{ \begin{array}{ll} 1, & {\textrm{if }\:\:\frac{n_{0}+\alpha}{n+\alpha+\beta} \:\frac{U_{i} + \alpha_{i}}{n_{0}+\Delta_{p}} \:<\: \frac{n_{1}+\beta}{n+\alpha+\beta} \:\frac{V_{i} + \beta_{i}}{n_{1}+\Delta_{q}}}, \\ 0, & \mathrm{otherwise,} \end{array} \right. $$
(16)
for \(i=1,\ldots,2^{d}\), with optimal prediction error
$${} {\fontsize{9.2pt}{9.6pt}{\begin{aligned} \hat{\varepsilon}_{\text{OBP}} \,=\, E_{\boldsymbol{\theta}\mid\mathbf{S}_{n}}[\!\varepsilon_{\boldsymbol{\theta}}[\!\psi_{\text{OBP}}]] \,&=\, \sum_{i=1}^{2^{d}} \min \left\{ \frac{n_{0}+\alpha}{n+\alpha+\beta}\: \frac{U_{i} + \alpha_{i}}{n_{0}+\Delta_{p}},\right.\\&\qquad\qquad\;\;\; \left. \frac{n_{1}+\beta}{n+\alpha+\beta} \: \frac{V_{i} + \beta_{i}}{n_{1}+\Delta_{q}}\right\}. \end{aligned}}} $$
(17)
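Both (16) and (17) are simple functions of the counts and hyperparameters; a minimal sketch (the function name is ours):

```python
import numpy as np

def obp_and_error(u, v, alpha, beta, alpha_vec, beta_vec):
    """OBP psi_OBP(x^i) from (16) and its expected prediction error from (17).

    u[i], v[i] are the observed counts U_i, V_i; hyperparameters as in (5).
    """
    u, v = np.asarray(u, float), np.asarray(v, float)
    a_vec, b_vec = np.asarray(alpha_vec, float), np.asarray(beta_vec, float)
    n0, n1 = u.sum(), v.sum()
    n = n0 + n1
    dp, dq = a_vec.sum(), b_vec.sum()                                  # concentration parameters
    w0 = (n0 + alpha) / (n + alpha + beta) * (u + a_vec) / (n0 + dp)   # class-0 posterior mass
    w1 = (n1 + beta) / (n + alpha + beta) * (v + b_vec) / (n1 + dq)    # class-1 posterior mass
    psi = (w0 < w1).astype(int)        # predict 1 where the class-0 mass is smaller, per (16)
    eps_obp = np.minimum(w0, w1).sum() # optimal prediction error, per (17)
    return psi, eps_obp
```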
On the other hand, the average errors of the constant predictors ψ≡0 and ψ≡1 are
$$ \begin{aligned} E_{c\mid\mathbf{S}_{n}}[\!P(Y=1)]&\,=\, 1-E_{c\mid\mathbf{S}_{n}}[\!c] \,=\, \frac{n_{1}+\beta}{n+\alpha+\beta}, \\ E_{c\mid\mathbf{S}_{n}}[\!P(Y=0)]&\,=\, E_{c\mid\mathbf{S}_{n}}[\!c] \,=\, \frac{n_{0}+\alpha}{n+\alpha+\beta}, \end{aligned} $$
(18)
respectively, so that the OBP error in the absence of observation of the predictors is
$$ \hat{\varepsilon}_{0,\text{OBP}} \,=\, \min \left\{ \frac{n_{0}+\alpha}{n+\alpha+\beta},\: \frac{n_{1}+\beta}{n+\alpha+\beta} \right\}. $$
(19)
We can then combine (17) and (19) to obtain the optimal Bayesian predictor (OBP) CoD estimator
$${} \begin{aligned} \widehat{\text{CoD}}_{\text{OBP}} & \,=\, 1 - \frac{\hat{\varepsilon}_{\text{OBP}}}{\hat{\varepsilon}_{0,\text{OBP}}} \\[-2ex] & \,=\, 1- \frac{1}{\min\{n_{0}+\alpha,n_{1}+\beta\}} \, \sum_{i=1}^{2^{d}} \min \\&\quad\;\;\left\{ \frac{n_{0}+\alpha}{n_{0}+\Delta_{p}} (U_{i} + \alpha_{i}),\: \frac{n_{1}+\beta}{n_{1}+\Delta_{q}} (V_{i} + \beta_{i})\right\}. \end{aligned} $$
(20)
It is easy to show that \(0 \leq \hat {\varepsilon }_{\text {OBP}} \leq \hat {\varepsilon }_{0,\text {OBP}}\), and thus \(0 \leq \widehat {\text {CoD}}_{\text {OBP}} \leq 1\).
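For completeness, (20) can be computed in a few lines. The following sketch (the function name is ours) also illustrates its use on a small hypothetical sample with d = 2 and uniform Dirichlet priors; the counts shown are purely illustrative.

```python
import numpy as np

def cod_obp(u, v, alpha, beta, alpha_vec, beta_vec):
    """OBP CoD estimator of (20)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    a_vec, b_vec = np.asarray(alpha_vec, float), np.asarray(beta_vec, float)
    n0, n1 = u.sum(), v.sum()
    dp, dq = a_vec.sum(), b_vec.sum()
    terms = np.minimum((n0 + alpha) / (n0 + dp) * (u + a_vec),
                       (n1 + beta) / (n1 + dq) * (v + b_vec))
    return 1.0 - terms.sum() / min(n0 + alpha, n1 + beta)

# Hypothetical counts for the 2**2 = 4 states x^i (values are illustrative only).
u = [6, 1, 0, 1]          # counts with Y = 0
v = [0, 2, 5, 2]          # counts with Y = 1
print(cod_obp(u, v, alpha=1.0, beta=1.0,
              alpha_vec=np.ones(4), beta_vec=np.ones(4)))
```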
Execution time for computation of the OBP CoD estimator grows as \(O(2^{d})\). By comparison, the complexity of exact computation of the Bayesian MMSE CoD estimator introduced in the previous subsection is \(O(n^{3} \times 2^{d})\). Neither n nor d tends to be very large in genomics applications, due to small sample sizes and the fact that the average number of predictor genes d per target gene must be small for a stable system, as remarked by S. Kauffman in [2]. However, if n and d become large, one could devise Monte Carlo approximation methods to compute both CoD estimators (along the lines of the sketch given earlier).
Therefore, the OBP CoD estimator, though suboptimal, is much more computationally efficient than the MMSE CoD estimator, especially at large sample sizes. In addition, we will see in the next section that the OBP CoD estimator can be even more accurate than the MMSE CoD estimator, in a frequentist sense, under a fixed value of the parameters.