A visual analytics approach for models of heterogeneous cell populations

Hasenauer, Jan; Heinrich, Julian; Doszczak, Malgorzata; Scheurich, Peter; Weiskopf, Daniel; Allgöwer, Frank

doi:10.1186/1687-4153-2012-4

Research
Open access
Published: 31 May 2012

EURASIP Journal on Bioinformatics and Systems Biology volume 2012, Article number: 4 (2012)

Abstract

In recent years, cell population models have become increasingly common. In contrast to classic single cell models, population models allow for the study of cell-to-cell variability, a crucial phenomenon in most populations of primary cells, cancer cells, and stem cells. Unfortunately, tools for in-depth analysis of population models are still missing, a problem that originates from the complexity of these models. Particularly important are methods to determine the source of heterogeneity (e.g., genetic or epigenetic differences) and to select potential (bio-)markers. We propose an analysis based on visual analytics to tackle this problem. Our approach combines parallel-coordinates plots, used for a visual assessment of the high-dimensional dependencies, and nonlinear support vector machines, for the quantification of effects. The method can be employed to study qualitative and quantitative differences among cells. To illustrate the different components, we perform a case study using the proapoptotic signal transduction pathway involved in cellular apoptosis.

1 Introduction

Cell populations are heterogeneous in terms of, e.g., cell age, cell cycle state, and protein abundance [1, 2]. This heterogeneity is ubiquitous, even in clonal populations, and influences cell fate decisions [2, 3], such as cell death/proliferation [4–7]. Thus, to ultimately understand and control the behavior of populations, the key sources of cell-to-cell variability have to be unraveled. Unfortunately, this is challenging due to experimental constraints. Most experimental systems and measurement devices only allow for the simultaneous assessment of a few cellular properties on a single cell basis. This prohibits the purely experimental analysis of processes which depend on many different cellular properties. Spencer et al. [5] have shown that the experimental limitations can be overcome partially using mathematical models.

To mathematically describe heterogeneous populations, agent-based models are used most frequently. Each agent provides a mechanistic description of the signal transduction within an individual cell and thus of its behavior. In such a framework, variability can be modeled by either stochastic [8–10] or deterministic [4, 5, 11] differences among individual cells. The source of the former is the stochasticity of biochemical reactions, while the latter may arise from genetic and epigenetic differences, environmental heterogeneity, or slow dynamic processes (such as the cell cycle).

We focus on the deterministic differences among cells, also called extrinsic factors [12], in populations of non-interacting cells. Those differences are commonly modeled by differences in parameter values and initial conditions [5, 13]. Several methods exist to infer the distribution of parameters and initial conditions from experimental data [13–15] and to obtain quantitative, mechanistic models for cell populations. Unfortunately, the resulting agent-based models are in general highly complex. This complexity prevents the analysis of these models using common tools for dynamical systems [16], such as sensitivity and bifurcation analysis. To the best of our knowledge, no structured analysis approach is available for models of heterogeneous cell populations. To study population models and to facilitate a model-driven analysis of the heterogeneity, highly flexible methods are required which do not rely on an analytical treatment.

In this work, we propose two methods to fill this gap and to facilitate the analysis of population models. These methods, parallel-coordinates plots [17] and support vector (SV) machines [18–20], are tools widely used for the analysis of high-dimensional datasets. We outline how these tools can also be used to analyze complex models of heterogeneous cell populations, particularly addressing the question: "Which parameters cause the heterogeneity of the population's response?" Thereby, we extend our previous work [21] and consider qualitative heterogeneity among cells, in the context of cell fate decisions, as well as quantitative heterogeneity, such as the delay of a decision process.

We show that parallel-coordinates plots provide an easy tool to obtain a qualitative understanding of the system, whereas SV machines allow for assessing the performance of marker combinations quantitatively. Good markers are thereby defined as single cell parameters that facilitate a good prediction of the cell fate decision or the quantitative property under consideration of the individual cell. Furthermore, we show how the combination of parallel-coordinates plots and SV machines enables an in-depth analysis of complex models using exploration techniques.

The article is structured as follows: In the section "Methods", the considered system class and problem are described in mathematical terms, the general idea is discussed, and the application of parallel-coordinates plots and SV machines is outlined. In the section "Results", we provide an exemplary application of our method to a model of the caspase cascade. The article is summarized in the section "Conclusion".

2 Methods

2.1 Models for heterogeneous cell populations and decision processes

2.1.1 Mechanistic population model

In this article, population dynamics are described using an ensemble [5, 13] of cells (agents). This yields the agent-based population model:

Σ_{pop} = \{Σ (θ^{(i)}) \mid i \in \{1, \dots, N\}, θ^{(i)} \sim Θ (θ)\},

in which the superscript (i) specifies individual cells within the population, N ∈ ℕ denotes the size of the cell ensemble and Σ(θ⁽ⁱ⁾) is the model of the i-th cell. The single cell model Σ(θ⁽ⁱ⁾) may belong to the class of Markov jump processes [15], stochastic differential equations [14], or ordinary differential equations [13]. Since in this study we are mainly interested in signal transduction and decision making, we consider ordinary differential equation models. Each individual cell of Σ_pop is described by

Σ (θ^{(i)}) : ẋ^{(i)} = f (x^{(i)}, θ^{(i)}), x^{(i)} (0) = x_{0} (θ^{(i)}),

with state vector $x^{(i)} (t) \in ℝ_{+}^{n}$ and parameter vector $θ^{(i)} \in ℝ_{+}^{q}$. The vector field $f : ℝ_{+}^{n} \times ℝ_{+}^{q} \to ℝ^{n}$ describing the cell dynamics is locally Lipschitz and the mapping $x_{0} : ℝ_{+}^{q} \to ℝ_{+}^{n}$ is continuously differentiable. The parameters θ⁽ⁱ⁾ may be kinetic constants, such as synthesis, degradation, or reaction rates.

Heterogeneity among cells of the ensemble is modeled by differences in parameter values θ⁽ⁱ⁾ and initial conditions x₀(θ⁽ⁱ⁾) among individual cells. The density of the parameters θ⁽ⁱ⁾ is given by a probability density function $Θ : ℝ_{+}^{q} \to ℝ_{+}$. Thus, the probability of observing θ⁽ⁱ⁾ ∈ Ω is

Prob (θ^{(i)} \in Ω) = \int_{Ω} Θ (θ) d θ .

This modeling framework is highly flexible and has proven very useful, especially if fast signal transduction processes, such as cellular apoptosis, are investigated. For a more detailed introduction, we refer to the work of Spencer et al. [5] and Hasenauer et al. [14]. The properties of such populations of single cells have been studied by Spencer et al. [5], while Hasenauer et al. [14] have derived a partial differential equation model for the resulting population dynamics.
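The ensemble construction can be illustrated with a minimal sketch. It assumes a hypothetical one-state toy model, ẋ = k_syn − k_deg·x with x(0) = 0, in place of a real signaling pathway, a log-normal parameter density with arbitrarily chosen values, and a simple explicit Euler scheme; the function names `simulate_cell` and `sample_ensemble` are illustrative and not taken from the article.

```python
import random

def simulate_cell(theta, t_end=10.0, dt=0.01):
    """Integrate the toy single cell model x' = k_syn - k_deg * x, x(0) = 0,
    with the explicit Euler method (illustrative assumption, not the caspase model)."""
    k_syn, k_deg = theta
    x = 0.0
    times, states = [0.0], [x]
    n_steps = int(round(t_end / dt))
    for step in range(1, n_steps + 1):
        x += dt * (k_syn - k_deg * x)
        times.append(step * dt)
        states.append(x)
    return times, states

def sample_ensemble(n_cells, seed=0):
    """Draw theta^(i) from a log-normal density and simulate every cell.
    The distribution parameters below are hypothetical placeholders."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_cells):
        theta = (rng.lognormvariate(0.0, 0.4),   # k_syn
                 rng.lognormvariate(-1.0, 0.4))  # k_deg
        ensemble.append((theta, simulate_cell(theta)))
    return ensemble

if __name__ == "__main__":
    for theta, (times, states) in sample_ensemble(3):
        print(theta, states[-1])
```

Each entry of the returned list pairs a parameter vector with its trajectory, which is the raw material for the sample-based analysis; in practice a dedicated ODE solver would replace the Euler loop.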

2.1.2 Qualitative and quantitative properties of the single cell response

Given the mathematical models introduced above, we study qualitative and quantitative properties of the single cell responses. Qualitative properties are defined as the outcome of a discrete decision process, e.g., whether the state of a bistable system converges to one or the other stable steady state, or whether a certain concentration threshold is reached. In contrast, quantitative properties allow the assessment of small differences among cells, such as the time point when a particular threshold is exceeded.

To define single cell properties given the single cell trajectory x⁽ⁱ⁾(·), the functionals F_φ : ℓ¹ → ℝ and F_δ : ℓ¹ → {-1, +1} are introduced. The functional F_φ is used to evaluate the quantitative property φ⁽ⁱ⁾ = F_φ (x⁽ⁱ⁾(·)) ∈ ℝ, while F_δ determines the qualitative property δ⁽ⁱ⁾ = F_δ (x⁽ⁱ⁾(·)) ∈ {-1, +1}.

To exemplify the functionals, we consider a process in which threshold exceeding and its timing are of interest. Such processes are important, for example, in apoptotic signaling [5] and cell cycle progression [22, 23], and allow for two outcomes. Either the concentration of a molecule $x_{j}^{(i)}$ within the i-th cell exceeds the threshold x_{j, th}, δ⁽ⁱ⁾ = +1, or it does not, δ⁽ⁱ⁾ = -1. This yields the decision functional

F_{δ} (x^{(i)} (\cdot)) := \begin{cases} + 1 & \text{if } \max_{t} x_{j}^{(i)} (t) \geq x_{j, th}, \\ - 1 & \text{otherwise} . \end{cases}

(1)

For the subgroup of cells exceeding the threshold, the time of threshold exceeding is defined by the second functional

F_{φ} (x^{(i)} (\cdot)) := \arg\min_{t} \{x_{j}^{(i)} (t) \geq x_{j, th}\},

(2)

and may be employed to achieve a quantitative understanding.
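For a trajectory that is only available on a time grid, the two functionals (1) and (2) can be sketched as follows. The sketch assumes that `states` holds the sampled values of the single observed component x_j and that the first grid point at or above the threshold is an acceptable approximation of the exceeding time; the function names are illustrative.

```python
def decision_functional(states, threshold):
    """Discrete analogue of (1): +1 if the sampled trajectory reaches the threshold, else -1."""
    return 1 if max(states) >= threshold else -1

def time_functional(times, states, threshold):
    """Discrete analogue of (2): first sampled time at which the threshold is reached.
    Returns None for cells that never reach it (the functional is undefined there)."""
    for t, x in zip(times, states):
        if x >= threshold:
            return t
    return None

if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 3.0]
    states = [0.0, 2.0, 5.0, 4.0]
    print(decision_functional(states, 4.0), time_functional(times, states, 4.0))
```

A finer time grid or interpolation between grid points would tighten the estimate of the exceeding time.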

Note that the response x⁽ⁱ⁾(·) of a cell depends solely on the cell's parameters θ⁽ⁱ⁾, as the single cell model is deterministic. Therefore, the quantitative and qualitative properties of a single cell can be viewed as functions of the parameters, φ⁽ⁱ⁾ = φ(θ⁽ⁱ⁾) and δ⁽ⁱ⁾ = δ(θ⁽ⁱ⁾). Differences in the parameters, as they arise between different cells, may hence influence δ⁽ⁱ⁾ and φ⁽ⁱ⁾, which determine the cell fate decision and the quantitative properties of the cells.

2.1.3 Response markers

To understand the heterogeneity within the population response Σ_pop, it is necessary to assess the dependency of δ⁽ⁱ⁾ and φ⁽ⁱ⁾ on the individual parameters θ_j. In particular, the question arises which subset θ_m of parameters,

θ_{m} := {[θ_{m_{1}}, \dots, θ_{m_{r}}]}^{T}, \text{ with } m \subseteq \{1, \dots, q\},

is responsible for which aspect of the population heterogeneity. Mathematically, m is an index set and, e.g., for m = {2, 4} only θ_m = [θ₂, θ₄]^T is considered. The question of the relative importance of different parameters directly relates to the common problem of biomarker selection for stem cells and tumor cells, which is experimentally challenging.

If there exists a subset θ_m of the parameters θ which allows for the reliable prediction of the response, not all sources of heterogeneity have to be assessed but only those associated with θ_m. This allows model development to focus on these sources and reduces the experimental effort.

2.2 Analysis of population models using data analysis tools

In this contribution, we illustrate the application of parallel-coordinates plots and support vector machines for the study of parameter dependencies and the selection of markers m. Parallel-coordinates plots and SV machines are well known, but almost exclusively applied to study high-dimensional sets of measurement data. To exploit the methods for the analysis of simulation models, the cell ensemble is first simulated for N ≫ 1. This yields many pairs of parameters and trajectories,

(θ^{(i)}, x^{(i)} (\cdot)), i = 1, \dots, N,

which are then used to obtain samples of quantitative,

S_{φ} = \{(θ^{(1)}, φ^{(1)}), \dots, (θ^{(N)}, φ^{(N)})\}, with φ^{(i)} = F_{φ} (x^{(i)} (\cdot)),

and qualitative

S_{δ} = \{(θ^{(1)}, δ^{(1)}), \dots, (θ^{(N)}, δ^{(N)})\}, with δ^{(i)} = F_{δ} (x^{(i)} (\cdot)),

cell properties of interest. These samples contain information about the dependency of φ and δ on the parameters θ, which is analyzed in the following. To study the high-dimensional mappings δ = δ(θ) and φ = φ(θ), parallel-coordinates plots will be employed. For the quantitative assessment of particular marker combinations, SV machines will be applied. By combining both approaches, it is possible to quickly gain an overview of important interrelations and to quantify them.

2.2.1 Combining parallel-coordinates plots and SV machines to a visual analytics system

The proposed simulation data-based analysis approach circumvents an analytical analysis of the system equations, which would be time-consuming and could only be carried out by experts. However, the simulation data-based approach creates the need for analyzing the large, high-dimensional datasets $S_{δ}$ and $S_{φ}$.

The analysis of such datasets often relies on a reduction of complexity while preserving the important information. Visualization can help in such a situation to determine the important parameters and to avoid information loss. In this work, parallel-coordinates plots are used to gain insight into the high-dimensional dependencies and to find interesting dimensions. In this particular setting, interesting dimensions are those that clearly separate a given set of classes and thus are good candidates for the selection of potential markers m. In a second step, the potential markers m are used to train an SV machine. These SV machines allow for a quantitative evaluation of the marker quality. While SV machines are also helpful on their own, checking all possible combinations of markers would result in a combinatorial explosion. By combining SV machines and parallel-coordinates plots, the number of necessary SV machine evaluations can be decreased substantially, which considerably reduces the computational effort. The overall workflow of the analysis is illustrated in Figure 1.
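The combinatorial growth mentioned above can be made concrete with a short sketch. It enumerates candidate index sets m for a parameter vector of dimension q and counts how many SV machines an exhaustive search would have to train; the helper names are illustrative assumptions.

```python
from itertools import combinations
from math import comb

def marker_subsets(q, r):
    """All index sets m with exactly r markers out of the parameters 1..q."""
    return list(combinations(range(1, q + 1), r))

def count_all_subsets(q):
    """Number of non-empty marker combinations, i.e. 2**q - 1."""
    return sum(comb(q, r) for r in range(1, q + 1))

if __name__ == "__main__":
    print(marker_subsets(5, 2))      # the 10 marker pairs for q = 5
    print(count_all_subsets(5))      # 31 candidate combinations
    print(count_all_subsets(20))     # already more than a million
```

Restricting the SV machine evaluations to the few combinations suggested by the visual inspection avoids training one predictor per entry of this list.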

Besides an improved understanding of the model, results obtained during the analysis can be used to adapt the population model or to select additional experiments. This proposed framework, integrating interactive visualization with automated methods while allowing for a feedback to the actual system/model, thus incorporates important aspects of visual analytics [24].

2.3 Parallel-coordinates for the analysis of high-dimensional data

Parallel-coordinates [17] are a popular visualization technique for high-dimensional data. A parallel-coordinates plot is constructed by placing axes in parallel, as illustrated in Figure 2. A single pair of adjacent axes represents a 2-D projection of the data, where a point of the corresponding Cartesian coordinates is mapped to a line in parallel-coordinates, and vice versa. Due to this point-line duality, the same patterns emerge in a parallel-coordinates plot as in the dual Cartesian coordinates. However, adding more axes not only allows a set of pairwise relations to be visualized, but also supports the viewer in tracing lines over all dimensions. As a result, multi-dimensional outliers and clusters can be visualized together with 2-D relations and the distribution of values for single dimensions.
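The construction of the polylines can be sketched in a few lines. The sketch assumes that every axis is scaled independently to [0, 1] by min-max normalization, which is a common convention but is not specified in the text; the rendering itself is left to a plotting library, and the function names are illustrative.

```python
def normalize_columns(data):
    """Scale every dimension (column) of the dataset to [0, 1] by min-max normalization."""
    columns = list(zip(*data))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.5   # constant axis: place at the middle
         for v, lo, hi in zip(row, lows, highs)]
        for row in data
    ]

def polylines(data):
    """Map each data point to its polyline: one vertex (axis index, height) per dimension."""
    return [list(enumerate(row)) for row in normalize_columns(data)]

if __name__ == "__main__":
    sample = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
    for line in polylines(sample):
        print(line)
```

Consecutive vertices of one polyline are connected by straight segments, so each pair of adjacent axes shows the 2-D projection discussed above.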

As each multi-dimensional data point is represented by a polyline intersecting the axes at the respective values, parallel-coordinates suffer greatly from overplotting if many lines have to be drawn. In the resulting clutter of lines, interesting patterns might be hidden from the user. Exploiting the point-line duality, similar clutter-reducing approaches as for Cartesian coordinates can be used, where a popular technique is to estimate the density of points (lines) and to render points (lines) transparently with blending enabled. Other approaches compute a continuous density [25] or estimate the overall density using density estimation techniques [26, 27]. In this work, both alpha and additive blending are used to visualize the parameter distribution in the different classes (δ⁽ⁱ⁾ = +1 and δ⁽ⁱ⁾ = -1), enabling a qualitative analysis of their multi-dimensional shape. An example of this alpha blending is shown in the section "Results".

For the analysis of a continuous variable, colormaps can be applied to the axis representing the dependent variable φ⁽ⁱ⁾. Then, every polyline is rendered using a color according to φ⁽ⁱ⁾, such that its value can be visually determined over the whole plot. The overall distribution of colors can then be used in conjunction with the shape of lines to analyze the dependency of the dependent variable on the independent variables. Again, overplotting can become an issue for large datasets, such that a separation into a few classes and a separate visualization of those might be more informative (see example in section "Results").

2.4 SV machines for the quantification of marker performance

Given a basic understanding of the importance of the parameters and a potential marker combination θ_m, a quantitative assessment of the predictive power of θ_m is desirable. To achieve this, the samples $S_{δ}$ and $S_{φ}$ are analyzed employing nonlinear SV classification and nonlinear SV regression, respectively. SV classification allows for the study of decision processes, while SV regression enables the analysis of quantitative system properties.

The performance of SV machines, which might be interpreted as data-based predictors, provides a measure for the quality of the marker combination θ_m. If an SV machine using only θ_m provides good predictions for a decision process which depends on θ, then this means that θ_m carries the most important information. This will be discussed in more detail in the following.

2.4.1 SV classification

The goal of the SV classification is to predict the discrete property δ⁽ⁱ⁾ given $θ_{m}^{(i)}$. Thus, the nonlinear mapping δ = δ(θ) is approximated by the lower-dimensional nonlinear mapping $\hat{δ} = \hat{δ} (θ_{m})$. To calculate the SV classifier, a two-step procedure is applied, as illustrated in Figure 3. First, a mapping $Φ : ℝ^{r} \to ℝ^{r^{*}}$ (also called kernel) is constructed that transforms the input space into a feature space of higher dimension (r* > r). Second, a linear separation of the data is performed in feature space [20]. To this end, the optimization problem

\begin{gathered} \underset{w, b, ξ}{\text{minimize}} \quad \frac{1}{2} w^{T} w + C \sum_{i = 1}^{N} ξ_{i} \\ \text{subject to} \quad δ^{(i)} (w^{T} Φ (θ_{m}^{(i)}) + b) \geq 1 - ξ_{i}, \quad i = 1, \dots, N, \\ ξ_{i} \geq 0, \quad i = 1, \dots, N, \end{gathered}

(3)

is solved, in which w and b denote the normal vector of the separating hyperplane and its offset, respectively. The objective function combines a misclassification penalty, $\sum_{i = 1}^{N} ξ_{i}$, and a margin maximization, $\frac{1}{2} w^{T} w$. The weighting of the different terms can be influenced via C. The constraints require that all data points $(Φ (θ_{m}^{(i)}), δ^{(i)})$ are correctly classified up to a certain error margin ξ_i.
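To make the roles of the two terms in (3) tangible, the following sketch evaluates the objective for a fixed hyperplane (w, b). It assumes that the feature vectors Φ(θ_m⁽ⁱ⁾) are already available as plain lists and uses the fact that, for fixed w and b, the smallest slack satisfying both constraints is ξ_i = max(0, 1 − δ⁽ⁱ⁾(wᵀΦ + b)). It only evaluates the objective and is not a solver; in practice a toolbox would perform the optimization.

```python
def soft_margin_objective(w, b, features, labels, C):
    """Objective value of (3) and the minimal feasible slacks for a fixed hyperplane (w, b).

    features: list of feature vectors Phi(theta_m^(i)); labels: values in {-1, +1}.
    """
    slacks = []
    for phi, delta in zip(features, labels):
        margin = delta * (sum(wk * pk for wk, pk in zip(w, phi)) + b)
        slacks.append(max(0.0, 1.0 - margin))      # zero if the point lies outside the margin
    objective = 0.5 * sum(wk * wk for wk in w) + C * sum(slacks)
    return objective, slacks

if __name__ == "__main__":
    features = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
    labels = [1, 1, -1]
    print(soft_margin_objective([1.0, 0.0], 0.0, features, labels, C=10.0))
```

A larger C makes every unit of slack more expensive relative to the margin term, which pushes the solution towards fewer margin violations at the price of a narrower margin.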

Given the solution of (3), a predictor (SV classifier) for the decision process δ = δ(θ) is

\hat{δ} (θ_{m}) = sign (w^{T} Φ (θ_{m}) + b) .

(4)

Assuming that the training set $S_{δ}$ is large, the predictive power of this predictor will be high — meaning that $\hat{δ} (θ_{m}^{(i)}) = δ (θ^{(i)})$ for most θ⁽ⁱ⁾~ Θ(θ) — if and only if the selected markers θ_m are informative. This allows the quantitative assessment of the informativeness of the markers θ_m using the SV classifier.

To this end, a second sample $S_{δ}^{'}$ is computed which was not used to train the SV classifier, so that overfitting does not bias the assessment. For this sample, the predictor ${\hat{δ}}^{(i)} = \hat{δ} (θ_{m}^{(i)})$ is evaluated. These results are used to calculate the rates of true positive classifications TP $(δ^{(i)} = 1 \land {\hat{δ}}^{(i)} = 1)$ and false positive classifications FP $(δ^{(i)} = -1 \land {\hat{δ}}^{(i)} = 1)$ achieved by the SV classifier. TP and FP provide information about the predictability of the outcome for θ⁽ⁱ⁾ using solely $θ_{m}^{(i)}$. Thus, the marker quality can be assessed via TP and FP. If a low-dimensional m exists that provides TP ≈ 1 and FP ≈ 0, the parameters θ_m dominate the decision process and are good markers. For a quantification of this effect, the classification performance can be analyzed in receiver-operating characteristic (ROC) space [28].
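The evaluation on the held-out sample can be sketched as follows. The sketch assumes labels in {-1, +1}, as produced by F_δ, and interprets TP and FP as rates normalized by the number of actual positives and actual negatives, respectively, which is the convention of ROC analysis; the function name is illustrative.

```python
def tp_fp_rates(true_labels, predicted_labels):
    """True positive rate and false positive rate for labels in {-1, +1}.

    Assumes that both classes occur in true_labels; otherwise a division by zero results.
    """
    tp = fp = positives = negatives = 0
    for delta, delta_hat in zip(true_labels, predicted_labels):
        if delta == 1:
            positives += 1
            tp += delta_hat == 1       # correctly predicted positive
        else:
            negatives += 1
            fp += delta_hat == 1       # negative wrongly predicted as positive
    return tp / positives, fp / negatives

if __name__ == "__main__":
    print(tp_fp_rates([1, 1, 1, -1, -1], [1, 1, -1, 1, -1]))
```

Each marker combination then corresponds to one point (FP, TP) in ROC space, with the upper left corner representing a perfect marker.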

2.4.2 SV regression

Similar to the assessment of the predictive power of marker combinations for qualitative decisions, quantitative properties may also be analyzed. To this end, we employ SV regression, which allows us to compute a data-based predictor

\hat{φ} (θ_{m}) = w^{T} Φ (θ_{m}) + b,

(5)

for the quantitative property φ = φ (θ). To compute the nonlinear predictor, a kernel $Φ : ℝ^{r} \to ℝ^{r^{*}}$ [29] is chosen and an optimization criterion selected. In this work, we use an ε-insensitive loss function [30], meaning that residuals $φ^{(i)} - \hat{φ} (θ_{m}^{(i)})$ with an absolute value below ε are not penalized while larger residuals are penalized linearly. This loss function is frequently used in the literature (see, e.g., [20, 30]) and results for the sample $S_{φ}$ in the optimization problem:

\begin{gathered} \underset{w, b, ξ, ξ^{*}}{\text{minimize}} \quad \frac{1}{2} w^{T} w + C \sum_{i = 1}^{N} (ξ_{i} + ξ_{i}^{*}) \\ \text{subject to} \quad φ^{(i)} - w^{T} Φ (θ_{m}^{(i)}) - b \leq ε + ξ_{i}, \quad i = 1, \dots, N, \\ - φ^{(i)} + w^{T} Φ (θ_{m}^{(i)}) + b \leq ε + ξ_{i}^{*}, \quad i = 1, \dots, N, \\ ξ_{i}, ξ_{i}^{*} \geq 0, \quad i = 1, \dots, N . \end{gathered}

(6)

Aside from the penalization of the prediction error, $\sum_{i = 1}^{N} (ξ_{i} + ξ_{i}^{*})$, flatness and a unique solution are ensured using $\frac{1}{2} w^{T} w$. The trade-off between those two is determined by the constant C > 0.
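The ε-insensitive loss described above is simple to state in code. The following sketch is purely illustrative: it evaluates the loss for given residuals and is not part of any SV machine toolbox; the function names are assumptions.

```python
def epsilon_insensitive_loss(residual, epsilon):
    """Zero inside the tube |residual| <= epsilon, linear growth outside of it."""
    return max(0.0, abs(residual) - epsilon)

def total_loss(true_values, predictions, epsilon):
    """Sum of the epsilon-insensitive losses over a sample."""
    return sum(epsilon_insensitive_loss(phi - phi_hat, epsilon)
               for phi, phi_hat in zip(true_values, predictions))

if __name__ == "__main__":
    for r in (-0.3, -0.05, 0.0, 0.05, 0.3):
        print(r, epsilon_insensitive_loss(r, 0.1))
```

Residuals inside the tube contribute nothing, so the corresponding training points do not influence the penalty term.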

The optimal solution of (6) for w and b provides the optimal predictor (5) with respect to the loss function and kernel. This predictor is applied to a second sample $S_{φ}^{'}$ to compute ${\hat{φ}}^{(i)} = \hat{φ} (θ_{m}^{(i)})$, a prediction for φ⁽ⁱ⁾. Employing φ⁽ⁱ⁾ and ${\hat{φ}}^{(i)}$, the marker combination m might be evaluated based on the relative prediction errors, $e_{m}^{(i)} = |\frac{φ^{(i)} - \hat{φ} (θ_{m}^{(i)})}{φ^{(i)}}|$. Using $e_{m}^{(i)}$, the predictive power of different marker combinations can be assessed and compared using, e.g., the mean error $\frac{1}{N} \sum_{i = 1}^{N} e_{m}^{(i)}$. If the mean prediction error achieved by a marker combination is small, the parameters $θ_{m}^{(i)}$ carry most of the information about φ⁽ⁱ⁾, and hence are suitable markers. In some situations, the information about the mean prediction error may be complemented by detailed information about the error statistics, $\{e_{m}^{(i)}\}_{i = 1}^{N}$. These statistics may be visualized using, for instance, box plots or histograms, and provide additional insight, e.g., into the structure of the error (short- vs. long-tailed distributions) and the potential causes.
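The error statistics can be computed directly from the held-out sample. The sketch below assumes that all true values φ⁽ⁱ⁾ are non-zero, which holds for positive quantities such as times; the function name is illustrative.

```python
from statistics import mean, median

def relative_errors(true_values, predictions):
    """Relative prediction errors e_m^(i) = |(phi - phi_hat) / phi| for every cell."""
    return [abs((phi - phi_hat) / phi)
            for phi, phi_hat in zip(true_values, predictions)]

if __name__ == "__main__":
    errors = relative_errors([2.0, 4.0, 10.0], [1.0, 5.0, 10.0])
    print(errors, mean(errors), median(errors))
```

Reporting the median alongside the mean is useful when the error distribution is long-tailed, since a few poorly predicted cells can dominate the mean.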

Note that the performance and predictive power of SV machines strongly depend on the available training set. For the analysis performed, we ensured that the training sets are large enough and that a further increase in their size does not result in a significant improvement of the predictors. In most applications of SV classification and SV regression to measurement data this is impossible, as the measurement devices limit the amount of available data. However, in this work we study the problem of model analysis, in which the size of the dataset can be increased arbitrarily by repeated simulation of the model. Besides the size of the dataset, the parameters of the SV classification and SV regression are tuned to allow for a fair comparison between the marker combinations. With this and the existence of sophisticated SV machine toolboxes (e.g., LIBSVM [31]), the observed difference between marker combinations can be assumed to be due to the predictive power of the markers.

Summing up, SV machines allow for the derivation of predictors for qualitative and quantitative properties. These predictors can be used to assess the information content of a subset m of the parameters about the respective properties, thereby facilitating a quantitative evaluation of the predictive power of θ_m. For further details about SV machines we refer to [18–20, 30, 31] and references therein.

3 Results

3.1 Model for heterogeneous cancer cell population

To illustrate the proposed visual analytics framework, a model of proapoptotic signaling is analyzed. Proapoptotic signaling is involved in the process of apoptosis [32–34], also called programmed cell death. Apoptosis is an important physiological process to remove infected, malfunctioning, or no longer needed cells from a multicellular organism. The apoptotic signaling pathways converge at the caspase cascade [32], where initiator caspases (e.g., caspase 8) and effector caspases (e.g., caspase 3) are activated. If the activity of effector caspases exceeds a certain threshold, apoptosis is induced.

A variety of single cell and cell population models have been proposed to describe cellular apoptosis (see, e.g., [4–6, 34–40] and references therein). In this study, we consider the mathematical model of the signal transduction which is depicted in Figure 4. This single cell model [35] is among the most cited ones. For details about the model, we refer to the original publication [35]. As the process of apoptosis induction is known to be heterogeneous, we extend the single cell model [35] by accounting for cell-to-cell variability. This is achieved by introducing differences in parameter values and initial conditions:

From flow cytometric experiments, it is known that the amount of caspase 8 (C8), caspase 3 (C3), caspases 8- and 10-associated RING protein (CARP), and inhibitor of apoptosis protein (IAP) differs among individual cells. These differences are modeled by differences in the synthesis rates (k_-8, k_-9, k_-10, and k_-12) among individual cells. The distribution of k_-8, k_-9, k_-10, and k_-12 within the population is modeled as a log-normal distribution, with mean as published by Eissing et al. [35] and a coefficient of variation of 0.4 (own unpublished data). The initial conditions of C8, C3, CARP, and IAP are set to their steady state values.

Similar to the original publication [35], the activation of the caspase cascade is modeled by a non-zero initial condition of active caspase 8, C8a(0). In the population, C8a(0) is log-normally distributed with a median of 4,000 molecules per cell and a coefficient of variation of 0.4. The variation of C8a(0) accounts for variability upstream of the caspase cascade.

The binding affinities and kinetic rates are the same for all cells. For the numerical values, we refer to the article of Eissing et al. [35].
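Log-normal distributions specified by a mean or median and a coefficient of variation have to be converted into the parameters μ and σ of the underlying normal distribution before sampling. The sketch below uses the standard identities CV² = exp(σ²) − 1, median = exp(μ), and mean = exp(μ + σ²/2); the function names are illustrative, and the mean value in the usage example is a placeholder rather than a rate from Eissing et al. [35].

```python
import math

def lognormal_sigma(cv):
    """Standard deviation of the underlying normal for a given coefficient of variation."""
    return math.sqrt(math.log(1.0 + cv * cv))

def mu_from_median(median):
    """Location parameter if the median of the log-normal distribution is prescribed."""
    return math.log(median)

def mu_from_mean(mean, cv):
    """Location parameter if the mean of the log-normal distribution is prescribed."""
    return math.log(mean) - 0.5 * lognormal_sigma(cv) ** 2

if __name__ == "__main__":
    sigma = lognormal_sigma(0.4)
    print(sigma)                       # shared by all parameters with CV = 0.4
    print(mu_from_median(4000.0))      # C8a(0): median of 4,000 molecules per cell
    print(mu_from_mean(1.0, 0.4))      # placeholder mean, not a published rate
```

The resulting pair (μ, σ) can be passed directly to a sampler such as `random.lognormvariate(mu, sigma)` from the Python standard library.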

Given this model of the heterogeneous cell population, we analyzed how (i) the decision whether or not a cell undergoes apoptosis during the first 12 hours and (ii) the time of cell death T_d are influenced by the cell's parameters θ = [C8a(0), k_-8, k_-9, k_-10, k_-12]^T. This yields two variables of interest: δ (= +1 ⇒ cell survived; = -1 ⇒ cell died), providing the outcome of the decision process; and φ (= T_d), providing the time of apoptosis commitment. As an indicator for apoptosis, the amount of active caspase 3 (C3a) is used. If more than 5,000 copies of C3a are present in a cell, this cell is assumed to undergo apoptosis within 10 minutes, defining the time of cell death T_d. The functionals associated with the considered δ and φ are similar to (1) and (2), respectively. In the remainder, we search for a lower-dimensional subset of the parameters θ which provides good markers for cell death and survival as well as for the time of cell death.

3.2 Parallel-coordinates plot establishes importance of C3 and IAP concentration for cell fate decision

To study the life-death decision, a sample $S_{δ}$ with 100,000 members is visualized in parallel-coordinates (Figure 5). As only two classes (dead and alive) are considered, alpha blending can be used to visualize the density of each class as well as the density at the overlapping regions, where the transparent red color, representing dead cells, and the transparent blue color, representing living cells, are blended with α = 0.03. Using this coloring, high-density regions appear more saturated for the individual classes and darker at their overlap.
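Why such a small α is appropriate for 100,000 polylines can be seen from a short calculation. The sketch assumes the standard "over" compositing rule with a constant α and lines of a single color, under which n overlapping lines accumulate an opacity of 1 − (1 − α)ⁿ; the rendering pipeline actually used for Figure 5 may differ in detail.

```python
def accumulated_opacity(alpha, n_lines):
    """Opacity reached after drawing n_lines overlapping lines with constant alpha
    using 'over' compositing: 1 - (1 - alpha)**n_lines."""
    return 1.0 - (1.0 - alpha) ** n_lines

if __name__ == "__main__":
    for n in (1, 10, 25, 50, 100):
        print(n, round(accumulated_opacity(0.03, n), 3))
```

With α = 0.03 a pixel only approaches full saturation once on the order of a hundred lines overlap, so sparse regions stay faint while dense regions stand out.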

From Figure 5, it is apparent that the second and fourth parameters (θ_m = [k_-8, k_-10]^T) provide a reasonable separation between the classes (red = dead, blue = alive). Most of the surviving cells have high values of k_-8 and low values of k_-10, corresponding to high IAP expression and low C3 expression, respectively. Although the other parameters also influence the process, their influence seems to be minor.

3.3 SV classification proves that C3 and IAP expression are the best markers for the cell fate decision

Given the results of the visual analysis, we consider θ_m = k_-8, θ_m = k_-10, as well as θ_m = [k_-8, k_-10]^T and compute the classification quality using SV machines (for details see "Methods"). As can be seen in Figure 6A, the predictive power of the individual parameters is limited (θ_m = k_-8: TP = 0.73, FP = 0.38; θ_m = k_-10: TP = 0.74, FP = 0.29), while both markers together yield a reasonable classification performance (TP = 0.77, FP = 0.13). The corresponding ROC curve is depicted in Figure 6C and the visualization of TP and FP is provided in Figure 6D. For comparison, the alternative combinations of two markers are evaluated in terms of the area under the ROC curve (Table 1) and the TP/FP (Figure 6C).

Table 1 Area under the ROC curve for different marker combinations

The markers θ_m = k_-8 and θ_m = k_-10 outperform all other single markers and marker pairs. In addition, the marker vector θ_m = [k_-8, k_-10]^T outperforms all other combinations in terms of the area under the ROC curve. Some other combinations result in more than 50% false positive classifications (see Figure 6B). Of course, extending the marker vector, e.g., by adding k_-12, results in further improvement.

3.4 Parallel-coordinates plots show a complex dependency of the time of death on the parameters

After the analysis of the decision process, we study the dependency of the time of cell death T_d on the parameters. The time of cell death T_d is a quantitative property and can take any positive value; therefore, an alternative visualization has to be used. One approach would be to use a different color for each line in parallel-coordinates, depending on $T_{d}^{(i)}$. Unfortunately, this approach suffers from heavy overplotting, which is why the data was split into three classes and separate plots were created for each class.

Figure 7A-C visualize the parameter distribution in different percentile intervals for T_d. A comparison of Figure 7A, visualizing the cells that die early (0 to 10th percentile), and Figure 7C, depicting the cells that die late (90 to 100th percentile), reveals offsets in all parameter dimensions. The differences are particularly prominent for C8a(0), k_-10, and k_-12, showing that the abundance of C3 also plays an important role in determining whether cells die early or late. Unfortunately, a closer look at Figure 7 also reveals that the parameter distributions associated with cells that undergo apoptosis at early, intermediate, and late time points strongly overlap in parallel-coordinates. This indicates that T_d may depend on all parameters. Therefore, a reliable prediction of T_d using only a few parameters might be infeasible.
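The split into percentile classes can be sketched as follows. The early and late classes follow the percentile intervals named above, while the choice of the 10th to 90th percentile for the intermediate class and the nearest-rank percentile definition are assumptions made for this illustration; the function name is hypothetical.

```python
import math

def percentile_classes(death_times, lower=10.0, upper=90.0):
    """Assign cell indices to early / intermediate / late classes by percentile of T_d.

    Uses the nearest-rank percentile definition (an assumption of this sketch).
    """
    ordered = sorted(death_times)
    n = len(ordered)

    def cut(p):
        rank = max(1, math.ceil(p * n / 100.0))
        return ordered[rank - 1]

    low_cut, high_cut = cut(lower), cut(upper)
    classes = {"early": [], "intermediate": [], "late": []}
    for index, t_d in enumerate(death_times):
        if t_d <= low_cut:
            classes["early"].append(index)
        elif t_d > high_cut:
            classes["late"].append(index)
        else:
            classes["intermediate"].append(index)
    return classes

if __name__ == "__main__":
    groups = percentile_classes([float(t) for t in range(1, 101)])
    print({name: len(members) for name, members in groups.items()})
```

The index lists can then be used to select the parameter vectors of each class for a separate parallel-coordinates plot.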

3.5 SV regression reveals ubiquitous importance of IAP and C3 expression levels

To quantify the predictive power of different marker combinations with respect to T_d, we employ the SV regression based approach introduced in "Methods". As a performance measure, we use the relative prediction error $|\frac{T_{d}^{(i)} - {\hat{T}}_{d}^{(i)}}{T_{d}^{(i)}}|$, where ${\hat{T}}_{d}^{(i)}$ is the prediction of the SV machine. Details on the implementation may be found in "Methods".

At first, we study the potential combinations of two markers proposed by the parallel-coordinates plots: k_-10 and k_-12; C8a(0) and k_-10; and C8a(0) and k_-12. Out of those, the best performance with a median prediction error of 40% is achieved by C8a(0) and k_-12, which also outperforms all other combinations of two markers. Interestingly, all marker combinations achieve a median prediction error between 40 and 50%, as shown in Figure 8. This illustrates two things: On the one hand, markers allowing for a distinction between early and late dying cells do not necessarily enable a good prediction of the death time T_d, because the cells dying in an intermediate interval dominate the statistic. On the other hand, this quantification shows that even the best combination of two markers provides only very limited predictive power. Thus, unlike the decision, which predominantly depends on C3 and IAP expression, the time of cell death is highly sensitive to changes in all parameters.

4 Conclusion

4.1 Visual analytics enable an in-depth analysis of complex population models

In this article, a novel explorative approach has been presented to determine markers for decision processes in heterogeneous populations. It has been shown that methods used for data analysis can also be employed to gain insight into complex models, where common analytical methods seem to reach their limits. In particular, the potential of parallel-coordinates plots and support vector machines has been illustrated. While the former allow for the study of large, high-dimensional datasets and the selection of potential markers, the latter can provide a quantitative assessment of their predictive power. Using both methods, the source of qualitative and quantitative cell-to-cell variability may be unraveled.

This article provides a case study evaluating the potential of combining visualization and automated methods for the assessment of complex system models. The considered system class is only one example and the proposed framework can be generalized easily to other systems and questions.

4.2 Analysis of heterogeneous cell population allows for novel insight

We have illustrated the proposed visual analytics approach by analyzing a cell population model for proapoptotic signaling, which plays an essential role in programmed cell death. We have studied the cell fate decision as well as the time of cell death. These properties were analyzed before (see, e.g., [5]) in a purely qualitative way and without the tools proposed in this work.

Our study shows that parallel-coordinates plots are a suitable tool to determine potential markers. The predictive power of these markers can then be quantified using SV machines. In this study, the markers we found agree well with those found in the literature. In particular, the important role of IAP (also called XIAP) for cell death commitment is outlined in several publications [39, 41]. While C3 abundance is known to be important [39], our analysis suggests that the amount of available C3 could be even more important than expected.

In addition, our analysis indicates that, under normal conditions, the time of cell death strongly depends on all parameters, which has been hypothesized earlier [5]. Only under altered conditions, e.g., a strongly increased initial amount of C8a(0), do some parameters become more important than others (results not shown). This is again in agreement with the results of Spencer et al. [5]. Furthermore, the finding that the importance of parameters varies with the experimental setup provides hints for possible future experiments. Thus, the visual analytics approach we propose also provides helpful feedback for model validation and development.

4.3 Outlook

In this work, we have proposed a method to determine decision markers for given models. However, all models possess uncertainties, rendering an uncertainty-aware analysis crucial. Therefore, a workflow including model development, parameter estimation, uncertainty analysis, and marker prediction has to be established. This requires improved modeling and parameter estimation tools, as well as methods to evaluate the uncertainty of the marker prediction arising from model uncertainties.

Given such a workflow, our analysis tools might be used not only to analyze models but also to guide the search for biomarkers. This is possible because our method allows for the assessment of the importance of any parameter that differs among cells of the population. Among others, the importance of common biomarkers, e.g., expression levels and transcription factor/protein abundance, may be determined based on a model of the population. This is much the same as target selection using sensitivity analysis of single cell models based on ordinary differential equations (see, e.g., [42]). However, marker selection requires population models, as differences between cells have to be considered, and is therefore more challenging.

Methods

Software

The model of the heterogeneous cell population was implemented in MATLAB using the SBtoolbox2 [43]. For the SV classification and the SV regression, the LIBSVM toolbox for MATLAB was employed [31]. The visualization software for parallel-coordinates was implemented in C++ using the Qt library (version 4.8.0) and OpenGL.

Numerics

For the SV classification and SV regression, we employed radial basis function kernels with γ = 0.25. The SV regression parameter that defines the interval of insensitivity was set to 0.01. All remaining parameters were set to their default values (see the LIBSVM manual). To improve the performance of the SV machines, we applied a log-transformation to the parameters θ.
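To make these settings concrete, the following minimal sketch evaluates the radial basis function kernel, k(x, y) = exp(-γ‖x - y‖²), on log-transformed parameter vectors and the ε-insensitive loss used in SV regression. It is an illustration only, not the authors' MATLAB code: the function names are hypothetical, and the natural logarithm is an assumption, since the base of the log-transformation is not stated. In LIBSVM's option syntax, the reported values would presumably correspond to `-t 2 -g 0.25` and, for ε-SV regression, `-p 0.01`; the exact call is not given in the text.

```python
import math

GAMMA = 0.25    # kernel parameter reported in the text
EPSILON = 0.01  # width of the interval of insensitivity reported in the text


def log_transform(theta):
    """Log-transform a vector of strictly positive parameters.

    The natural logarithm is an assumption of this sketch.
    """
    return [math.log(v) for v in theta]


def rbf_kernel(x, y, gamma=GAMMA):
    """Radial basis function kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def eps_insensitive_loss(target, prediction, epsilon=EPSILON):
    """Loss that ignores deviations smaller than epsilon."""
    return max(0.0, abs(target - prediction) - epsilon)


# Two hypothetical parameter vectors (arbitrary units).
theta_a = log_transform([1.0, math.exp(2.0)])
theta_b = log_transform([1.0, 1.0])
similarity = rbf_kernel(theta_a, theta_b)
small_loss = eps_insensitive_loss(1.0, 1.005)
large_loss = eps_insensitive_loss(1.0, 1.5)
```

Working on the logarithmic scale means that the kernel compares ratios rather than absolute differences of parameters, which is a plausible reason for the improved performance when parameters span several orders of magnitude.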

References

1. Avery S: Microbial cell individuality and the underlying sources of heterogeneity. Nat Rev Microbiol 2006, 4: 577-587. 10.1038/nrmicro1460
2. Snijder B, Pelkmans L: Origins of regulated cell-to-cell variability. Nat Rev Mol Cell Biol 2011, 12(2):119-25. 10.1038/nrm3044
3. Eldar A, Elowitz M: Functional roles for noise in genetic circuits. Nature 2010, 467(9):1-7. 10.1038/nj7319-1
4. Albeck J, Burke J, Spencer S, Lauffenburger D, Sorger P: Modeling a snap-action, variable-delay switch controlling extrinsic cell death. PLoS Biol 2008, 6(12):2831-2852.
5. Spencer S, Gaudet S, Albeck J, Burke J, Sorger P: Non-genetic origins of cell-to-cell variability in TRAIL-induced apoptosis. Nature 2009, 459(7245):428-433. 10.1038/nature08012
6. Niepel M, Spencer S, Sorger P: Non-genetic cell-to-cell variability and the consequences for pharmacology. Cur Opin Biotechnol 2009, 13(5-6):556-561.
7. Singh D, Ku CJ, Wichaidit C, Steininger R, Wu L, Altschuler S: Patterns of basal signaling heterogeneity can distinguish cellular populations with different drug sensitivities. Mol Syst Biol 2010, 6(369):1-10.
8. Paulsson J: Models of stochastic gene expression. Phys Life Rev 2005, 2(2):157-175. 10.1016/j.plrev.2005.03.003
9. Glauche I, Moore K, Thielecke L, Horn K, Loeffler M, Roeder I: Stem cell proliferation and quiescence — two sides of the same coin. PLoS Comput Biol 2009, 5(7):e1000447. [http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000447] 10.1371/journal.pcbi.1000447
10. Huh D, Paulsson J: Non-genetic heterogeneity from stochastic partitioning at cell division. Nat Gen 2011, 43(2):95-102. 10.1038/ng.729
11. Glauche I, Thielecke L, Roeder I: Cellular aging leads to functional heterogeneity of hematopoietic stem cells: a modeling perspective. Aging Cell 2011, 10: 457-465. 10.1111/j.1474-9726.2011.00692.x
12. Swain P, Elowitz M, Siggia E: Intrinsic and extrinsic contributions to stochasticity in gene expression. Proc Natl Acad Sci USA 2002, 99(20):12795-12800. 10.1073/pnas.162041399
13. Hasenauer J, Waldherr S, Doszczak M, Radde N, Scheurich P, Allgöwer F: Identification of models of heterogeneous cell populations from population snapshot data. BMC Bioinf 2011, 12: 125. 10.1186/1471-2105-12-125
14. Hasenauer J, Waldherr S, Doszczak M, Radde N, Scheurich P, Allgöwer F: Analysis of heterogeneous cell populations: a density-based modeling and identification framework. J Process Control 2011, 21(10):1417-1425. 10.1016/j.jprocont.2011.06.020
15. Koeppl H, Zechner C, Ganguly A, Pelet S, Peter M: Accounting for extrinsic variability in the estimation of stochastic rate constants. Int J Robust Nonlinear Control 2012, 22(10):1-21.
16. Guckenheimer J, Holmes P: Nonlinear Oscillations Dynamical Systems and Bifurcations of Vector Fields. In Appl Math Sci. Volume 42. Springer-Verlag, New York; 1983.
17. Inselberg A, Dimsdale B: Parallel coordinates: A tool for visualizing multi-dimensional geometry. In Proc of IEEE Visualization. Edited by: Kaufman A. Los Alamitos, California, IEEE Computer Society Press; 1990:361-378.
18. Vapnik V: The Nature of Statistical Learning Theory. Springer, New York; 1995.
19. Cristianini N, Shawe-Taylor J: An Introduction to Support Vector Machines. Cambridge University Press, Cambridge; 2000.
20. Ivanciuc O: Applications of Support Vector Machines in Chemistry. In Reviews in Computational Chemistry. Volume 23. Edited by: Lipkowitz KB, Cindari TR. Wiley-VCH, Weinheim; 2007.
21. Hasenauer J, Heinrich J, Doszczak M, Scheurich P, Weiskopf D, Allgöwer F: Visualization methods and support vector machines as tools for determining markers in models of heterogeneous populations: Proapoptotic signaling as a case study. In Proc of Workshop Comp Syst Biol. Edited by: Koeppl H, Aćimović J, Kesselin J, Mäki-Marttunen T. Zürich, Switzerland; 2011:61-64. (TICSP series # 57)
22. Novak B, Pataki Z, Ciliberto A, Tyson J: Mathematical model of the cell division cycle of fission yeast. Chaos 2001, 11: 277-286. 10.1063/1.1345725
23. Pan J, Chen RH: Spindle checkpoint regulates Cdc20p stability in Saccharomyces cerevisiae. Genes Dev 2004, 18: 1439-1451. 10.1101/gad.1184204
24. Thomas J, Cook K: A visual analytics agenda. IEEE Comput Graph Appl 2006, 26: 10-13.
25. Heinrich J, Weiskopf D: Continuous parallel coordinates. IEEE Trans Vis Comput Graph 2009, 15(6):1531-1538.
26. Feng D, Kwock L, Lee Y, Taylor R: Matching visual saliency to confidence in plots of uncertain data. IEEE Trans Vis Comput Graph 2010, 16(6):980-989.
27. Heinrich J, Bachthaler S, Weiskopf D: Progressive splatting of continuous scatterplots and parallel coordinates. Comput Graph Forum 2011, 30(3):653-662. 10.1111/j.1467-8659.2011.01914.x
28. Zweig M, Campbell G: Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 1993, 39(8):561-577.
29. Schölkopf B, Sung K, Burges C, Girosi F, Niyogi P, Poggio T, Vapnik V: Comparing support vector machines with Gaussian kernels to radial basis function classifiers. IEEE Trans Signal Process 1997, 45: 2758-2765. 10.1109/78.650102
30. Smola A, Schölkopf B: A tutorial on support vector regression. Stat Comp 2004, 14(3):199-222.
31. Chang CC, Lin CJ: LIBSVM: A library for support vector machines. IEEE/ACM Trans Intell Syst Tech 2011, 2(3):1-27.
32. Wajant H, Pfizenmaier K, Scheurich P: Tumor necrosis factor signaling. Cell Death Diff 2003, 10: 45-65. 10.1038/sj.cdd.4401189
33. Gewirtz D, Holt S, Grant S (Eds): Cancer Drug Discovery and Development In Apoptosis, Senescence, and Cancer. 2nd edition. Humana Press, Totowa; 2007.
34. Spencer S, Sorger P: Measuring and modeling apoptosis in single cells. Cell 2011, 144(6):926-939. 10.1016/j.cell.2011.03.002
35. Eissing T, Conzelmann H, Gilles E, Allgöwer F, Bullinger E, Scheurich P: Bistability analyses of a caspase activation model for receptor-induced apoptosis. J Biol Chem 2004, 279(35):36892-36897. 10.1074/jbc.M404893200
36. Albeck J, Burke J, Aldridge B, Zhang M, Lauffenburger D, Sorger P: Quantitative analysis of pathways controlling extrinsic apoptosis in single cells. Mol Cell 2008, 30: 11-25. 10.1016/j.molcel.2008.02.012
37. Eissing T, Chaves M, Allgöwer F: Live and let die — a systems biology view on cell death. Comput Chem Eng 2009, 33(3):583-589. 10.1016/j.compchemeng.2008.10.014
38. Schlatter R, Schmich K, Vizcarra I, Scheurich P, Sauter T, Borner C, Ederer M, Merfort I, Sawodny O: ON/OFF and beyond — a boolean model of apoptosis. PLoS Comput Biol 2009, 5(12):1-13.
39. Rehm M, Huber H, Dussmann H, Prehn J: Systems analysis of effector caspase activation and its control by X-linked inhibitor of apoptosis protein. EMBO J 2006, 25(18):4338-4349. 10.1038/sj.emboj.7601295
40. Würstle M, Laussmann M, Rehm M: The caspase-8 dimerisation/dissociation balance is a highly potent regulator of caspase-8, -3, -6 signalling. J Biol Chem 2010, 285(43):33209-33218. 10.1074/jbc.M110.113860
41. Jost P, Grabow S, Gray D, McKenzie M, Nachbur U, Huang D, Bouillet P, Thomas H, Borner C, Silke J, Strasser A, Kaufmann T: XIAP discriminates between type I and type II FAS-induced apoptosis. Nature 2009, 460(7258):1035-1039. 10.1038/nature08229
42. Schöberl B, Pace E, Fitzgerald J, Harms B, Xu L, Nie L, Linggi B, Kalra A, Paragas V, Bukhalid R, Grantcharova V, Kohli N, West K, Leszczyniecka M, Feldhaus M, Kudla A, Nielsen U: Therapeutically targeting ErbB3: a key node in ligand-induced activation of the ErbB receptor-PI3K axis. Sci Signal 2009, 2(77):ra31. 10.1126/scisignal.2000352
43. Schmidt H, Jirstrand M: Systems biology toolbox for MATLAB: a computational platform for research in systems biology. Bioinf 2006, 22(4):514-515. 10.1093/bioinformatics/bti799

Acknowledgements

The authors acknowledge financial support from the German Research Foundation within the Cluster of Excellence in Simulation Technology (EXC 310/1) at the University of Stuttgart, from the German Federal Ministry of Education and Research (BMBF) within the FORSYS-Partner program (grant nr. 0315-280A and D), and from Center Systems Biology at the University of Stuttgart.

Author information

Authors and Affiliations

Institute for Systems Theory and Automatic Control, University of Stuttgart, Pfaffenwaldring 9, 70569, Stuttgart, Germany
Jan Hasenauer & Frank Allgöwer
Visualization Research Center, University of Stuttgart, Allmandring 19, 70569, Stuttgart, Germany
Julian Heinrich & Daniel Weiskopf
Institute of Cell Biology and Immunology, University of Stuttgart, Allmandring 31, 70569, Stuttgart, Germany
Malgorzata Doszczak & Peter Scheurich

Corresponding author

Correspondence to Jan Hasenauer.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Hasenauer, J., Heinrich, J., Doszczak, M. et al. A visual analytics approach for models of heterogeneous cell populations. J Bioinform Sys Biology 2012, 4 (2012). https://doi.org/10.1186/1687-4153-2012-4

Received: 08 January 2012
Accepted: 31 May 2012
Published: 31 May 2012
DOI: https://doi.org/10.1186/1687-4153-2012-4
