COARSE BUT EFFICIENT IDENTIFICATION OF METABOLIC PATHWAY SYSTEMS

IWATA M.1, SHIRAISHI F.2, VOIT E.O.3*
1Section of Bio-process design, Department of Bioscience and Biotechnology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, 6-10-1, Hakozaki, Higashi-Ku, Fukuoka 820-8581, Japan.
2Section of Bio-process design, Department of Bioscience and Biotechnology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, 6-10-1, Hakozaki, Higashi-Ku, Fukuoka 820-8581, Japan.
3The Wallace H. Coulter, Department of Biomedical Engineering at Georgia Institute of Technology and Emory University, 313 Ferst Drive, Suite 4103, Atlanta, GA 30332-0535, U.S.A.
* Corresponding Author : eberhard.voit@bme.gatech.edu

Received : 05-08-2013     Accepted : 03-09-2013     Published : 10-09-2013
Volume : 4     Issue : 1       Pages : 57 - 72
Int J Syst Biol 4.1 (2013):57-72

Conflict of Interest : None declared

Cite - MLA : IWATA M., et al "COARSE BUT EFFICIENT IDENTIFICATION OF METABOLIC PATHWAY SYSTEMS." International Journal of Systems Biology 4.1 (2013):57-72.

Cite - APA : IWATA M., SHIRAISHI F., VOIT E.O. (2013). COARSE BUT EFFICIENT IDENTIFICATION OF METABOLIC PATHWAY SYSTEMS. International Journal of Systems Biology, 4 (1), 57-72.

Cite - Chicago : IWATA M., SHIRAISHI F., and VOIT E.O. "COARSE BUT EFFICIENT IDENTIFICATION OF METABOLIC PATHWAY SYSTEMS." International Journal of Systems Biology 4, no. 1 (2013):57-72.

Copyright : © 2013, IWATA M., et al, Published by Bioinfo Publications. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Abstract

The rapid development of high-throughput experiments in molecular biology over the past decade has facilitated the generation of metabolic datasets of unprecedented quantity and quality. These datasets contain enormous information that must be extracted with computational methods. A premier goal of such an extraction is the construction of fully parameterized kinetic models of metabolic pathway systems, which may subsequently be utilized for deepening our understanding of metabolism and for the design of manipulation and optimization strategies toward the production of valuable organics. Numerous methods have been proposed for the conversion of metabolic data into such models, but the nonlinear nature of regulated pathway systems continues to be a challenge for the estimation of reliable model parameters. To alleviate the situation, the present work proposes a coarse yet efficient technique for the estimation of parameter values that is based on linear regression and therefore naturally scales to large systems. The proposed method permits the complete construction of a coarse model, the identification of a starting point for fine-tuned nonlinear estimation, or the characterization of a few fluxes that can subsequently be used for performing Dynamic Flux Estimation (DFE), a method that yields insights into the true nature of the processes that govern metabolic pathway systems. The proposed linear inference method is demonstrated with a regulated branched pathway and with a metabolic system from the literature that describes the biosynthesis of amino acids from aspartate. Both analyses indicate that the method yields satisfactory results.

Keywords

Biochemical Systems Theory (BST), Dynamic Flux Estimation (DFE), Metabolic pathway analysis, Parameter estimation, S-system, Time course data.

Introduction

The performance of high-throughput analytical instruments has been enhanced so much in recent years that it is becoming feasible to measure comprehensive time course data of metabolite concentrations, in some cases even in living cells [1-5] . The resulting data are very valuable, as they portray the collective functionality of cellular metabolism in unprecedented detail. At the same time, the data in themselves only provide limited insight and mandate the development of corresponding theoretical advances and computer-aided techniques that extract as much information from the metabolic time series as possible. The reward for such developments is substantial: if valid mathematical models can be constructed from experimental time series data, it may become possible to explore with computer simulations of metabolic pathway systems what is actually happening in cells, for instance, in response to some environmental demand or stress [6-12] .
The true format of optimal mathematical representations of metabolism is unknown. This fact poses a great challenge, because about any parameter estimation or simulation technique assumes some functional form within which the parameters are optimally adjusted or the dynamics of a system is computed. Dynamic Flux Estimation (DFE) [13] is an attempt to characterize processes within biological systems with minimal prior knowledge. While intriguing in principle, DFE requires a complete set of time series data and, more limiting, systems that contain as many independent fluxes as system variables. This situation is seldom given, thus requiring the augmentation of DFE with additional a priori knowledge or assumptions regarding a subset of fluxes.
Power-law equations within the modeling framework of Biochemical Systems Theory (BST) [14-20] have proven to be useful defaults for the construction of models for metabolic reaction networks. Three reasons for their utility are that BST models are based on rigorous approximation theory, that they can easily be constructed directly from the topology of a metabolic map, and that each parameter in these models has a uniquely defined meaning and interpretation, which facilitates the moving back and forth between biochemistry and mathematics. Whereas it is easy to set up a BST model for a pathway system in a symbolic format, the identification of optimal parameter values continues to be a significant challenge. As a consequence, numerous direct and indirect optimization methods for BST models have been proposed in recent years; examples include [21-27] . None of them, however, is effective in all practical cases.
One partial strategy for system identification that has turned out to be quite effective is the estimation of slopes from time series data. The advantage of this step is that it converts the estimation of differential equations into an estimation task exclusively involving systems of algebraic equations. As an added benefit, this strategy permits the decoupling of differential equation models into several smaller estimation tasks that may be executed in parallel or sequentially [28-30] .
In spite of numerous advances in recent years, the estimation of kinetic parameters, even for moderately sized pathway systems, is still difficult because the task is nonlinear, measured time course data always contain noise, and the most appropriate functions to represent the data are simply not known. To alleviate this problem, we propose here a coarse yet efficient estimation of parameter values that is based on linear regression and may be considered a simplification of an earlier method [21,31] . The method permits either the full construction of an initial model, under the assumption that power-law functions are reasonable approximations, or the characterization of a few fluxes that can subsequently be used to execute DFE. The latter case minimizes the assumption that the power-law form is adequate. In the following we first introduce the concepts and details of the coarse estimation technique and then evaluate this technique against data created with an artificial small system and by analyzing a metabolic pathway described in the literature. In both cases, the method, with its different variations, yields satisfactory fits.

Methods

S-system Differential Equations

BST models come in two main varieties. S-system models consist of differential equations that express the dynamics of dependent valuables (metabolite concentrations) in the following format:

(1)

Here, the dependent variables Xi (i = 1, ..., n) represent metabolite pools or concentrations, and m additional variables may be included as factors or modulators that remain unaffected by the action of the system; t is time. All influxes into each metabolite pool are collectively formulated as a single power-law term, and the analogous is done for all effluxes. The parameters αi and βi (i = 1, ..., n) are aggregate rate constants of the influx and efflux terms, respectively, while gij and hij (i = 1, ..., n; j = 1, …, n + m) are kinetic orders that reflect the strength and directionality of the effect a variable has on a given influx or efflux.
In the related Generalized Mass Action (GMA) system, each influx and each efflux is represented with its own power-law term, so that each equation may contain several positive and several negative terms. The format with analogous parameter names is therefore

(2)

where Ti is the number of terms in the ith equation. GMA and S-systems have been discussed extensively in the literature. A recent comprehensive review is [20] .

Slope-estimation and Decoupling

It is useful to plot each set of the measured metabolite concentrations Xi(tk) at the time points tk (k = 1, 2, ..., K) and to estimate the slopes of the trend lines at these time points. Each slope Si(tk) (k = 1, 2, ..., K) corresponds directly to the derivative dXi/dt at time tk, so that the two may be substituted for parameter estimation purposes. As a result, each differential equation may be represented by a set of K algebraic equations. For instance, in the case of an S-system model, the ith equation can be substituted point-wise with the following algebraic equations:











(3)

[28-30] . Since experimental data usually contain noise, it is beneficial to smooth the time series data. Numerous methods are available for this purpose [32-37] . Once the data are smoothed, values of Xi(tk) and Si(tk) (k = 1, 2, ..., K) may be obtained from the time series not only at the time points corresponding to actual measurements, but at essentially arbitrary time points, so that more instances are available to construct the set of algebraic equations. Subsequently, these equations are used to estimate the kinetic parameters αi, gij, βi and hij. It is noted that this technique automatically decouples the differential equations, and that the parameter estimation maybe performed for one differential equation at a time or in parallel. In the following, we will merge independent variables, which are typically constant, with the corresponding rate constants and therefore let the products in the equations run to n rather than n+m. This decision does not affect the generality of our methods and results.

Parameter Estimation Through Approximation and Linear Regression

In an S-system model, the efflux from a metabolite pool at each time point tk is given by

(4)

where Xjk = Xj(tk). Once the slopes Sik = Si(tk) are estimated, the corresponding equation in (3) may equivalently be written as

(5)

Suppose that the efflux in [Eq-4] only depends on the metabolite Xi itself, which is a relatively frequent occurrence. Then, [Eq-4] simplifies to [Eq-6] :

(6)

The value of the kinetic order hii typically falls within a small range between 0 and 1 [18,38,39] , while βi is unconstrained, except that it must be non-negative. It is known that there is a certain degree of compensation between hii and βi, so that two power-law functions, for instance, one with hii = 0.4 and one with hii = 0.5, can be made very similar by an adjustment of βi, as long as the range for Xi is relatively small. Thus, a relatively coarse grid of hii and βi covers most essential power-law behaviors, as long as very high accuracy is not required and the variable Xi does not vary too much.
For any combination of assumed grid values for hii and βi and for observed concentration values Xik, the terms V-ik and Sik in [Eq-5] and [Eq-6] are real numbers. Thus, substituting these numbers in [Eq-5] and taking logarithms of both sides permits the application of linear regression with the residual function

(7)

The result of the regression consists of optimized parameter values αi and gij, for each set of grid values for hii and βi.
If the degradation term contains variables other than Xi itself, [Eq-4] is used as opposed to [Eq-6] , and the grid needs to be extended by one or more dimensions for the linear regression. If the procedure is used as a preliminary step for DFE, only a few fluxes have to be estimated and one may choose efflux terms with only one or two dependent variables. We will demonstrate this strategy later.

Estimation Strategies for Pathway Systems

The parameter determination may follow alternative procedures. It may be accomplished backwards, forwards, with or without constraints, or as a preliminary step for DFE. Each variant is presented separately. It is always assumed that the concentration trend lines had been smoothed and that slopes had been obtained. In the following, the methods are generally demonstrated for one equation at a time.

Backward Estimation

Considering that the degradation term, but not necessarily the production term, for a metabolite pool almost always contains the variable itself, it makes sense to estimate pathways in a backward direction. Specifically, the following steps are used for each variable Xi.
Determine a suitable grid for βi and hii, such as 0.1, 0.5, 1, 5, 10, 50, 100, 500 and 1000 for βi and 0.2, 0.4, 0.6, 0.8 and 1 for hii.
For each combination of grid points, perform linear regression according to [Eq-7] . The result consists of parameter sets for αi and gij.
For each set of parameter values, compute the sum of squared errors

(8)

between the actual data and the corresponding estimates and select the smallest SSE for the set of best parameter estimates.
If the degradation term contains variables other than Xi itself, the grid in Step 1 is extended by one or more dimensions.

Forward Estimation

Instead of using a grid for βi and hii, one may use a grid for αi and gij. The forward estimation strategy is preferable when information is available about influxes to the system. For instance, it may be possible to measure substrate uptake. The result of the forward estimation consists of parameter sets for βi and hij for every combination of grid values for αi and gij. Typically, the set with the smallest SSE is selected.

Constrained Estimation

So far, the parameters of each equation were computed as if all equations were independent of each other. In reality, systems are much more constrained. For instance, the efflux out of X2 in [Fig-1] is identical to the influx into X3. Thus, one possible strategy of constrained estimation is to choose grid points β3 and h33, estimate the corresponding α and g parameters for the influx of X3, select the best set and use this set as estimates for the parameters in the efflux of X2. The influx is then estimated as before. The same strategy is used for X4. The collective efflux of X1 is identical to the sum of the influxes to X2 and X4. Thus, the estimates should satisfy the equation

(9)

Due to the mathematical features of power-law functions, this equation can, in general, only be satisfied at some chosen operating point or for trivial situations, where g21 = g41 or where one of the fluxes on the right is zero. At a chosen operating point, the parameters for the efflux of X1 are related to those of the two influxes by the equation

(10)

where all variables are evaluated at the operating point.

Estimation of Subsets of Fluxes for DFE

DFE consists of two phases. The first phase represents metabolic pathways as flux systems of the type

(11)

where the σ quantities are stoichiometric coefficients, which are typically known [40-42] . Given slope estimates at K time points, as before, the equations in [Eq-11] constitute one set of n linear equations in the fluxes vji and vij at each of the K time points. If the number of non-zero fluxes equals n, each set of equations can be inverted and the result is a numerical value of each flux at each time point. In a second phase of DFE, one attempts to find mathematical representations for these numerical flux estimates.
In most realistic cases, the number of fluxes exceeds the number of dependent variables and the system in [Eq-11] cannot be inverted, unless independent information about some of the fluxes is obtained. Some strategies for obtaining such information have been proposed [43,44] , but they are not always sufficient. Here, the proposed coarse estimation, in forward or backward direction, may be used to obtain such fluxes. Thus, if the system contains eight dependent variables and 12 fluxes, four fluxes typically need to be estimated. The choice of these four is not entirely arbitrary, but this step usually permits several degrees of freedom [45] . As the method proposed here assumes validity of the power-law representation, it is a priori beneficial to select variables for estimation that vary only modestly within the given data set, as this limited range of variation increases the likelihood that the power-law representation is adequate.

Metabolic Reaction Networks for Illustration of the Method

Generic Branched Pathway Model of Michaelis-Menten Type

The first pathway for illustration is the metabolic reaction network shown in [Fig-1] . In previous work, this system was used to demonstrate estimation techniques for BST systems [24,26,27,30] . Here, we model the same pathway with Michaelis-Menten-type rate laws. The advantage of this artificial system is that we have complete knowledge of all of its features. Thus, we can create “data” with this model, subsequently pretend not to know the model parameters or even the mathematical structure of the equations, and ultimately gauge how well our inference method works and where it fails.
The specific model, with typical parameters, is

(12)

(13)

(14)

(15)

The steady state of the system is approximately = 1.050, = 0.420, = 13.96 and = 1.050. As initial values we choose, more or less arbitrarily, X10 = 1.4, X20 = 2.7, X30 = 1.2, X40 = 0.4.

Biosynthesis of Aspartate-derived Amino Acids

The metabolic reaction network for this example is shown in [Fig-2] . A model of the pathway was originally presented with traditional kinetic rate functions for seven dependent variables, namely X1 = [ AspP ], X2 = [ ASA ], X3 = [ Lys ], X4 = [ Hser ], X5 = [ PHser ], X6 = [ Thr ] and X7 = [ Ile ] [46] . Here, we additionally consider X8 = [ Threonyl-tRNA ]. The flux equations are

(16)

(17)

(18)

(19)

(20)

(21)

(22)

(23)

The kinetic formulations of these fluxes are taken directly from [46] ; they are reproduced in the Additional Results. With these settings, the steady-state values for the first seven variables are approximately = 0.3440, = 0.9645, = 69.38, = 0.9368, = 45.27, = 302.4, and = 59.25. X8 continues to accumulate in this model and has no steady state. As initial values, we start the system either 20% above the steady state or with values that correspond to 20% of the steady state.

Computational Environment

All calculations were performed with Microsoft Visual Studio 2008 on Windows 7, with a computer with Intel® Core™ i7 CPU and mounted 4.00 GB memory.

Results

Performance Evaluations

The performance of the proposed method is first evaluated with the generic branched pathway model consisting of Michaelis-Menten processes [Fig-1] , [Eq-12] to [Eq-15] .

Approximate S-system Parameters at an Operating Point

In order to assess how well the power-law approximation might perform, we first approximate the original equations with an S-system. As the operating point, we choose the values X1, op = 1.7, X2, op = 1.5, X3, op = 5.0 and X4, op = 1.3, which are more or less in the center of the range of variation for each variable. The parameter values of the approximate S-system are shown in [Table-1] . Although its functions are different, the S-system approximates the original Michaelis-Menten model overall quite well [Fig-S1] .

Unconstrained Backward Estimation

Using backward estimation for various grid point combinations of β and h parameters, we computed linear regressions and SSE values, as well as calculation times. The results for the best β - h combinations are shown in [Table-2] a and [Fig-3] . Although a relatively coarse grid was used, the computed dynamics fits the artificial data rather well. Other β - h combinations lead to much inferior fits [Fig-S2] .
We also created data from the same system and superimposed them with 10% uniformly distributed random noise. While the concentration values thus deviate from smooth trend lines, we assume that appropriate smoothing of these noisy data would lead to correct slopes Sik. [Fig-4] shows results from 300 estimations of αi, gij and βi in pseudo-three-dimension plots and the most illustrative projections. Red circles mark the values shown in [Table-2] a. The results are not connected and do not form clouds, as one encounters them frequently, due to the discrete choices of grid points. As has been observed in other contexts [47] , the correlations between α and β parameters for the different results are more or less linear, while correlations between rate constants and kinetic orders are nonlinear and more or less logarithmic.

Constrained Backward Estimation

While simulations with the parameters estimated so far fit the given data quite well, an adequate model should satisfy the constraints that derive from the topology of the metabolic reaction network. For instance, the production of X3 in [Fig-1] constitutes the same process as the degradation of X2, which mandates

(24)

The branched pathway imposes an additional constraint which, outside trivial special cases, can only be satisfied at an operating point:

(25)

(cf. [Eq-9] and [Eq-10] ). To satisfy these constraints during the backward estimation, kinetic parameters for X3 and X4 are determined first, and then the estimation moves upstream the metabolic reaction network. [Table-2b] indicates that the resulting minimum SSE values are slightly higher than before. Nonetheless, simulations with the best parameter values yield good results [Fig-5] .

Forward Estimation

As described in the Methods section, the estimation may also proceed in forward direction. Without constraints, optimal kinetic parameters, SSEs and calculation times are shown in [Table-2] c. All SSE values indicate satisfactory data fits, which are confirmed in [Fig-6] . As before, we also estimated the parameters from noisy data. The results are shown in [Fig-S3] .
Because the pathway is branched, the forward computation of constrained parameter values allows some flexibility, because either α2 and g21 or α4 and g41 may be estimated first from the branch point constraint in [Eq-25] . If α2 and g21 are estimated first, then α4 and g41 are calculated using [Eq-26] and [Eq-27] , respectively:

(26)

(27)

In turn, if α4 and g41 are estimated first, then α2 and g21 are determined from the corresponding constraints. Once influx parameters are determined, forward estimation for X2, X3 and X4 is performed and efflux parameters with the lowest SSE values are retained [Table-2] d and [Table-2] e. While the branch point offers some flexibility, not all choices are feasible, because in some cases the constraints (26) and (27) can no longer be satisfied for all values of the dependent variables. As an example, some choices of a4 and g41 lead to infeasibility for the efflux of X3. In other cases, the best-fitting kinetic order might be negative, which is inconsistent with the structure of the pathway.

Biosynthesis of Aspartate-derived Amino Acids

To evaluate the scalability of the proposed method, we applied the method to a larger metabolic reaction network model from the literature. Curien et al. [46] constructed a dynamic model for the biosynthesis of aspartate-derived amino acids from kinetic measurements in vitro. We pretend here that simulation results with their model are measured experimental time series data, and it is our goal to infer an adequate mathematical model from these “data.” In particular, the analysis allows us to compare the inferred fluxes with those used by the authors to construct their model from the bottom up. In contrast to the original article, we merge isozyme catalyzed reactions into single processes.

Analysis I: Initial Values are Set as 120% of the Steady State Values

The first set of “data” corresponds to a simulation with the published model, which starts with initial values that are set as 120% of the steady-state values. Such a deviation is relatively small, which has advantages and disadvantages. On the positive side, the power-law approximation is possibly good enough to capture the dynamics of the system. On the negative side, the data are not very informative. As with the earlier example, we first estimate parameters for unconstrained equations, using the backward strategy. [Table-3] a shows the resulting kinetic parameters and corresponding SSE values. [Fig-S4] compares trend lines calculated by integration of the eight S-system equations with those from the original equations. Some of the results are fairly good, but X6 is mis-estimated, which through feedbacks affects all other variables after a while. [Fig-S5] shows corresponding estimations for data with noise. The corresponding forward estimation without constraints also yields good results [Table-3] c and [Fig-S6] .
More relevant is an estimation that accounts for constraints in the pathway system. The results of a constrained backward inference are shown in [Table-3] b and [Fig-7] ; a corresponding figure with a magnified initial range is presented as [Fig-S7] . Even though the power-law terms are quite different from the functions used to construct the original model, the results are surprisingly good, especially in the initial time range.

Analysis II: Initial Values are Set as 20% of Steady State Values

In contrast to the previous analysis, the deviation from the steady state is now considerable. The estimation method itself is not affected by this strong deviation, but it is more likely that the power-law approximation might exceed the range of valid representation. Nonetheless, the results of an unconstrained forward estimation are actually surprisingly good, except for variable X8, which is not subject to degradation and only accumulates material and with it all inference errors [Table-4] and [Fig-8] . In an unconstrained backward estimation, X8 is modeled very well, but the other variables show inferior fits, thus indicating that the power-law formulation leaves its range of valid representation [Table-S1] , [Fig-S8] and [Fig-S9] .

Analysis III: Estimation of a Minimum Set of Fluxes to Complement DFE

The pathway contains eight dependent variables and ten fluxes. Thus, we need to estimate two independent fluxes. Furthermore, characterizability analysis of the pathway system reveals that even perfect, complete time series data do not allow a full characterization of flux profiles, because the same time series data could be associated with different flux profiles. In particular, it is possible that pairs of fluxes, such as Vi(tk) and V-i(tk), permit a shift by an unknown constant c, which disappears in the differential equation, because Vi(tk) + c – (V-i(tk) + c) = Vi(tk) – V-i(tk). One should note that this possible shift is not a result of the slope-estimation-decoupling strategy or our coarse inference method. It constitutes a genuine estimation challenge for underdetermined flux systems, because the “true” differential equations generate results that are exactly equivalent to those from other differential equations with shifted fluxes.
As a first example, consider “data” obtained with initial values at 120% of the steady state. We use two fluxes, estimated from the earlier constrained backward estimation and use DFE to infer the remaining fluxes. For Case 1, we use the fluxes vASADH and vHSDH and for Case 2, we use vDHDPS and vHSK [Fig-9] .
The main results are shown in [Fig-10] . The green and blue lines are inferred with DFE for Cases 1 and 2, while the red lines correspond to the fluxes obtained from the coarse, constrained backward estimation and the black lines correspond to the fluxes in Curien’s original model. Three generic features emerge. First, the inferred DFE fluxes by and large have the correct shape; i.e., they are very similar to the fluxes of Curien’s model, with occasional exceptions at the very beginning or end of the plots. Second, most of the fluxes exhibit some shift, indicating a degree of freedom and non-unique-ness that is predicted by characterizability analysis [45] . These shifts are not entirely arbitrary though: for instance, the shift in vAK is the same as the shift in vASADH, as both fluxes are associated with X1. Third, the original fluxes and the backward-inferred fluxes have slightly different shapes, even though the metabolite trends [Fig-7] are overall well modeled. These differences reflect the fact that relatively complex kinetic rate functions were approximated with rather simple power-law functions. The situation is similar for “data” that are initialized at 20% of the steady-state values [Fig-S10] .

Discussion and Conclusions

The estimation of parameter values for nonlinear models of biological systems is a challenging task. Even harder is the identification of functional forms for the processes governing biological systems. We have shown here that both tasks can be aided by a relatively coarse, but very simple, estimation strategy that exclusively uses linear regression and is therefore easily scalable. The strategy may be used in different ways. First, one may obtain a complete power-law model of a pathway by assuming a relatively coarse grid of parameter values characterizing some fluxes in the system. The result is a fully parameterized model that may be used for further analyses. The model parameters thus inferred may also be used as start values for launching a more refined parameter estimation based directly on the nonlinear differential equations. Assuming that the coarsely inferred parameter values are acceptable, this nonlinear estimation would start in-or close to-a basin of attraction of the true solution, thereby circumventing many issues with nonlinear optimization methods. The coarsely parameterized model may also be used to augment DFE. In this case, only a few fluxes are retained from the coarse estimation and all other fluxes are computed with methods of DFE. If the retained fluxes correspond to variables that do not vary very much within the given datasets, one might assume that the power-law representation is relatively accurate. As a consequence, this strategy makes the fewest assumptions and yields results in an almost unbiased fashion. A remaining issue is that the topology of the pathway system may allow one or more degrees of freedom within the flux profile, which cannot be resolved with DFE method, unless additional, independent information is available.

Additional Results

Branched Pathway

To evaluate the performance of the proposed method, a generic, branched pathway model consisting of Michaelis-Menten processes [Fig-1] ; [Eq-12] to [Eq-15] was analyzed. The original equations were approximated with an S-system model at an operating point chosen within the range of the variables, as described in the Text. [Table-1] shows the parameter values of the approximate S-system. The model approximates the original reasonably well [Fig-S1] .
For the unconstrained backward estimation strategy, we chose various grid point combinations of β and h parameters. The dynamic results for the best β - h combination are shown in [Table-2] a and [Fig-3] ; results for other combinations are shown in [Fig-S2] . The dynamic representations in these cases are much inferior.
Parameter values were also estimated for the unconstrained forward estimation strategy using data with 10% uniformly distributed random noise. [Fig-S3] shows the distributions of inferred kinetic parameter values.

Biosynthesis of Aspartate-Derived Amino Acids

To evaluate the scalability of the proposed method, the method was applied to a larger metabolic reaction network model from the literature [Fig-2] ; [Eq-16] to [Eq-23] ; [46] , as described in the Text. The kinetic representations, as proposed by Curien et al. are shown below; note that our analysis combines isozyme-catalyzed reactions.







































It should be noted that the following flux equations are used in [Eq-21] and [Eq-22] . The values are also used in [Fig-10] .



Analysis I: Initial Values are Set as 120% of the Steady State Values

The “data” for this inference study were generated with initial values set as 120% of the steady-state values. The inferred kinetic parameter values and corresponding SSEs for the unconstrained backward estimation strategy are shown in [Table-3] a. [Fig-S4] compares trend lines calculated by integration of the eight S-system equations with those from the original equations. [Fig-S5] and [Fig-S6] show corresponding estimations for data with noise and the results of an unconstrained forward estimation, respectively. [Fig-S7] exhibits a magnified initial range for the plots shown in [Fig-7] of the Text.

Analysis II: Initial Values are Set as 20% of the Steady State Values

The “data” for this simulation were generated with initial values set as 20% of the steady-state values. The inferred kinetic parameter values and corresponding SSEs of the unconstrained backward estimation strategy are shown in [Table-S1] . [Fig-S8] compares trend lines calculated by integration of the eight S-system equations with those from the original equations. [Fig-S9] shows corresponding estimations for data with noise.

Analysis III: Estimation of a Minimum Set of Fluxes to Complement DFE

The pathway contains eight dependent variables and ten fluxes. Thus, we need to estimate two independent fluxes, which we select as described in the Text [Fig-9] .
The case where “data” were obtained with initial values at 120% of the steady state is shown in the Text. [Fig-S10] shows the corresponding results for “data” that are initialized at 20% of the steady state values in comparison to results from the S-system model with kinetic parameters inferred through constrained backward estimation. One notes that vASADH, which was used for DFE in Analysis I, cannot be calculated using this result. [Fig-S10] , therefore, does not show the corresponding green plot lines. Similarly, red plot lines cannot be computed for vAK and vASADH. By contrast, vDHDPS and vHSK, which are now available, are shown as blue lines.
[Table-S2] presents numerical results. The parameter values do not necessarily satisfy all constraints, because some parameter values cannot be obtained, as the corresponding fluxes are obtained with DFE.

Acknowledgements

This work was supported in part by a research fellow grant from the Japan Society for the Promotion of Science (JSPS; to M.I.), a JSPS KAKENHI Grant #25119719 (PI: F.S.), grants from the U.S. National Science Foundation (Projects MCB-0946595 and ARRA-0928196 (PI: EOV)) and a grant from the BioEnergy Science Center (BESC; PI: Paul Gilna), a U.S. Department of Energy Research Center supported by the Office of Biological and Environmental Research in the DOE Office of Science. The funding agencies are not responsible for the content of this article.

References

[1] Hirai M.Y., Yano M., Goodenowe D.B., Kanaya S., Kimura T., Awazuhara M., Arita M., Fujiwara T. and Saito K. (2004) Proceedings of the National Academy of Sciences, 101(27), 10205-10210.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[2] Hirai M.Y., Klein M., Fujikawa Y., Yano M., Goodenowe D.B., Yamazaki Y., Kanaya S., Nakamura Y., Kitayama M., Suzuki H., Sakurai N., Shibata D., Tokuhisa J., Reichelt M., Gershenzon J., Papenbrock J. and Saito K. (2005) Journal of Biological Chemistry, 280(27), 22590-22595.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[3] Sawada Y., Akiyama K., Sakata A., Kuwahara A., Otsuki H., Sakurai T., Saito K. and Hirai M.Y. (2009) Plant and Cell Physiology, 50(1), 37-47.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[4] Neves A.R., Ramos A., Nunes M.C., Kleerebezem M., Hugenholtz J., de Vos W.M., Almeida J. and Santos H. (1999) Biotechnology and Bioengineering, 64(2), 200-212.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[5] Neves R., Pool W.A., Kok J., Kuipers O.P. and Santos H. (2005) FEMS Reviews Microbiology, 29, 531-554.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[6] Fonseca L.L., Sanchez C., Santos H. and Voit E.O. (2011) Molecullar BioSystems, 7(3), 731-741.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[7] Fonseca L.L., Chen P.W. and Voit E.O. (2012) Metabolites, 2, 221-241.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[8] Sriyudthsak K. and Shiraishi F. (2010) Journal of Biotechnology, 149(3), 191-200.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[9] Sriyudthsak K. and Shiraishi F. (2010) Mathematical Biosciences, 228(1), 1-9.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[10] Sriyudthsak K. and Shiraishi F. (2010) Industrial & Engineering Chemistry Research, 49(20), 9738-9742.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[11] Sriyudthsak K. and Shiraishi F. (2010) Industrial & Engineering Chemistry Research, 49(5), 2122-2129.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[12] Chen P.W., Fonseca L.L., Hannun Y.A. and Voit E.O. (2013) PLOS Computational Biology, 9(5).  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[13] Goel G., Chou I.C. and Voit E.O. (2008) Bioinformatics, 24(21), 2505-2511.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[14] Savageau M.A. (1969) Journal of Theoretical Biology, 25, 365-369.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[15] Savageau M.A. (1969) Journal of Theoretical Biology, 25, 370-379.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[16] Savageau M.A. (1970) Journal of Theoretical Biology, 26, 215-226.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[17] Savageau M.A. (1976) Biochemical Systems Analysis: A Study of Function and Design in Molecullar Biology, Addison-Wesley Pub. Co.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[18] Voit E.O. (2000) Computational Analysis of Biochemical Systems: A Practical Guide for Biochemists and Molecular Biologists, Cambridge University Press.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[19] Torres N.V. and Voit E.O. (2002) Pathway Analysis and Optimization in Metabolic Engineering, Cambridge University Press.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[20] Voit E.O. (2013) International Scholarly Research Network - Biomathematics, 1-53.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[21] Chou I.C., Martens H. and Voit E.O. (2006) Theoretical Biology and Medical Modelling, 3, 1-11.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[22] Kutalik Z., Tucker W. and Moulton V. (2007) IET Systems Biology, 1(3), 174-180.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[23] Liu P.K. and Wang F.S. (2008) Bioinformatics, 24(8), 1085-1092.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[24] Chou I.C. and Voit E.O. (2009) Mathematical Biosciences, 219(2), 57-83.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[25] Srinath S. and Gunawan R. (2010) Journal of Biotechnology, 149(3), 132-140.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[26] Jia G.S., Gregory N. and Gunawan R. (2011) Bioinformatics, 27(14), 1964-1970.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[27] Sriyudthsak K., Shiraishi F. and Hirai M.Y. (2013) PLOS ONE, 8(1).  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[28] Varah J.M. (1982) SIAM Journal on Scientific and Statistical Computing, 3(1), 28-46.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[29] Voit E.O. and Savageau M.A. (1982) Journal of Fermentation Technology, 60(3), 233-241.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[30] Voit E.O. and Almeida J. (2004) Bioinformatics, 20(11), 1670-1681.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[31] Chou I.C., Martens H. and Voit E.O. (2007) Statistics and Operations Research Transactions, 31(1), 55-74.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[32] Kimura S., Hatakeyama M. and Konagaya A. (2005) Chem-Bio Informatics Journal, 4, 1-14.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[33] Vilela M., Borges C.C.H., Vinga S., Vasconcelos A.T.R., Santos H., Voit E.O. and Almeida J.S. (2007) BMC Bioinformatics, 8.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[34] Kimura S., Ide K., Kashihara A., Kano M., Hatakeyama M., Masui R., Nakagawa N., Yokoyama S., Kuramitsu S. and Konagaya A. (2004) Bioinformatics, 21(7), 1154-1163.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[35] Almeida J.S. and Voit E.O. (2003) Genome Informatics, 14, 114-123.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[36] Whittaker E.T. (1923) Proceedings of the Edinburgh Mathematical Association, 78, 81-89.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[37] Eilers P.H. (2003) Analytical Chemistry, 75(14), 3631-3636.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[38] Shiraishi F. and Savageau M. (1992) Journal of Biological Chemistry, 267(32), 22912-22918.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[39] Curto R., Sorribas A. and Cascante M. (1995) Mathematical Biosciences, 130(1), 25-50.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[40] Heinrich R. and Schuster S. (1996) The Regulation of Cellular Systems, Chapman & Hall.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[41] Gavalas G.R. (1968) Nonlinear Differential Equations of Chemically Reacting Systems, Springer-Verlag.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[42] Palsson B.O. (2006) Systems Biology: Properties of Reconstructed Networks, Cambridge University Press.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[43] Voit E.O., Goel G., Chou I.C. and Fonseca L.L. (2009) IET Systems Biology, 3(6), 513-522.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[44] Chou I.C. and Voit E.O. (2012) BMC Systems Biology, 6.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[45] Voit E.O. (2013) Mathematical Biosciences, DOI: 10.1016/j.mbs.2013.01.008.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[46] Curien G., Bastien O., Robert-Genthon M., Cornish-Bowden A., Cardenas M.L. and Dumas R. (2009) Molecular Systems Biology, 5.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[47] Sorribas A., March J. and Voit E.O. (2000) Statistics in Medicine, 19(5), 697-713.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

Images
Fig. 1- Reaction diagram of a generic branched pathway.
Fig. 2- Metabolic reaction network of the biosynthesis of aspartate-derived amino acids. Abbreviations are: Asp: L-Aspartate, AspP: L-Aspartate-4-phosphate, ASA: L-Aspartate-semialdehyde, Lys: L-Lysine, Hser: Homoserine, PHser: O-Phospho-L-homoserine, AdoMet: S-Adenosylmethionine, Thr: L-Threonine, Ile: L-Isoleucine, Val: L-Valine. Lysyl: tRNA and Isoleucyl-tRNA are shown here as end products, although they are not explicitly included in the model.
Fig. 3- Comparison between results of the inferred unconstrained S-system model (lines) and those computed with the Michaelis-Menten model of the generic branched pathway in [Fig-1] (symbols). Time and concentrations are given in unspecified units.
Fig. 4- Distributions of kinetic parameter values, inferred for the Michaelis-Menten type branched pathway model with the unconstrained backward estimation strategy with 300 datasets with ±10% noise. Red circles mark the kinetic parameter values shown in [Table-2a]. Left panel: Pseudo-three-dimensional representations of αi-giji; center panel: αi-gij projections; right panel: αii projections.
Fig. 5- Comparison between results of the inferred S-system model (lines) and results computed with the Michaelis-Menten model of the generic branched pathway in [Fig-1] (symbols), using fully constrained backward inference. Time and concentrations are given in unspecified units.
Fig. 6- Comparison between the inferred S-system model (lines) and results computed with the Michaelis-Menten model of the generic branched pathway in [Fig-1] (symbols), using unconstrained forward inference. Time and concentrations are given in unspecified units.
Fig. 7- Comparisons between “data” (symbols and connecting black lines) obtained from the model of aspartate-derived amino acid biosynthesis [46] and results the S-system model with kinetic parameters inferred through constrained backward estimation (red lines). Analysis I: Initial values set at 120% of the steady state. See [Fig-S7] for a magnified representation of the initial time range.
Fig. 8- Comparisons between “data” (symbols and connecting black lines) obtained from the model of aspartate-derived amino acid biosynthesis [46] and results the S-system model with kinetic parameters inferred through unconstrained forward estimation (red lines). Analysis II: Initial values set at 20% of the steady state.
Fig. 9- Two pairs of fluxes were selected to complement DFE for Case Studies 1 and 2. Their values were taken from the constrained backward inference with the model of aspartate-derived amino acid biosynthesis [Fig-2]; [46] and used to perform DFE. Abbreviations are: vAK–Aspartate kinase, vASADH–Aspartate semi-aldehyde dehydrogenase, vDHDPS–Dihydrodipicolinate synthase, vLKR–Lysine ketoglutarate reductase, vHSDH–Homoserine dehydrogenase, vHSK–Homoserine kinase, vTS1–Threonine synthase, vTD–Threonine deaminase, v(Thr)tRNAsth–Threonyl-tRNA synthetase, v(Ile)tRNAsth–Isoleucyl-tRNA synthetase.
Fig. 10- Results of DFE, augmented with pairs of fluxes as shown in [Fig-9]. The fluxes obtained with DFE (green and blue; sometimes overlapping) by and large have the shapes of the fluxes used by Curien et al. [46] to construct the model of aspartate-derived amino acid biosynthesis, but they are shifted, thus indicating a remaining degree of freedom. The red flux graphs show the corresponding backward-inferred, constrained power-law representations (cf. [Fig-7]).
Fig. S1- Comparison between results of the approximate S-system model (red lines) and those computed with the Michaelis-Menten model of the generic branched pathway in [Fig-1] (black lines). Time and concentrations are given in unspecified units.
Fig. S2- Comparison between results of the various unconstrained S-system models (lines) and those computed with the Michaelis-Menten model of the generic branched pathway in [Fig-1] (symbols). Time and concentrations are given in unspecified units.
Fig. S3- Distributions of kinetic parameter values, inferred for the Michaelis-Menten-type branched pathway model with the unconstrained forward estimation strategy, using 300 datasets with 10% uniformly distributed noise. Red circles indicate the kinetic parameter values shown in [Table-2c]. Left panel: Pseudo-three-dimensional representations of βi-hii-αi; center panel: βi-hii projections; right panel: βii projections.
Fig. S4- Comparisons between “data” (symbols connected by black lines) obtained from the model of aspartate-derived amino acid biosynthesis [46] and results from the S-system model (red lines) with kinetic parameters inferred through unconstrained backward estimation. Analysis I: Initial values set at 120% of the steady state.
Fig. S5- Distributions of kinetic parameter values, inferred for the aspartate-derived amino acid biosynthesis model with the unconstrained backward estimation strategy, using 300 datasets with 10% uniformly distributed noise. Red circles indicate the kinetic parameter values shown in [Table-3a]. Analysis I: Initial values set at 120% of the steady state. Left panel: Pseudo-three-dimensional representations of αi-giji; center panel: αi-gij projections; right panel: αii projections.
Fig. S6- Comparisons between “data” (symbols connected by black lines) obtained from the model of aspartate-derived amino acid biosynthesis [46] and results from the S-system model (red lines) with kinetic parameters inferred through unconstrained forward estimation. Analysis I: Initial values set at 120% of the steady state.
Fig. S7- Comparisons between “data” (symbols and connecting black lines) obtained from the model of aspartate-derived amino acid biosynthesis [46] and results the S-system model with kinetic parameters inferred through constrained backward estimation (red lines). In comparison with [Fig-7] of the Text, only the initial time range of 20 seconds is shown. Analysis I: Initial values set at 120% of the steady state.
Fig. S8- Comparisons between “data” (symbols connected by black lines) obtained from the model of aspartate-derived amino acid biosynthesis [46] and results from the S-system model (red lines) with kinetic parameters inferred through unconstrained backward estimation. Analysis II: Initial values set at 20% of the steady state.
Fig. S9- Distributions of kinetic parameter values, inferred for aspartate-derived amino acid biosynthesis model with the unconstrained backward estimation strategy with 300 datasets with ± 10% noise. Red circle indicates the kinetic parameter values shown in [Table-S1]. Analysis II: Initial values set at 20% of the steady state. Left panel: Pseudo-three-dimensional representations of αi-giji; center panel: αi-gij projections; right panel: αii projections.
Fig. S10- Results of DFE, augmented with pairs of fluxes as shown in [Fig-9] of the Text. The fluxes obtained with DFE (blue) by and large have the shapes of the fluxes used by Curien et al. [46] to construct the model of aspartate-derived amino acid biosynthesis (black). However, they are shifted, thus indicating a remaining degree of freedom. The red flux graphs show the corresponding backward-inferred, constrained power-law representations (cf. [Fig-S7]). The small arrows indicate that the corresponding values are shown on the right y-axis.
Table 1- Parameter values of an S-system model approximating the Michaelis-Menten model of a generic branched pathway [Fig-1], [Eq-12] to [Eq-15] at the operating point (X1, op = 1.7, X2, op = 1.5, X3, op = 5.0, X4, op = 1.3).
Table 2- Parameter values, residual errors (SSE values) and computing times (CT) associated with different inference strategies using artificial data characterizing the branched pathway model [Fig-1], [Eq-12] to [Eq-15]
Table 3- Parameter values and residual errors (SSE values) associated with different inference strategies for the model of aspartate-derived amino acid biosynthesis [Fig-2]. In Analysis I, the initial values were set at 120% of the steady state.
Table 4- Parameter values and residual errors (SSE values) associated with different inference strategies for the model of aspartate-derived amino acid biosynthesis [Fig-2]. In Analysis II, the initial values set at 20% of the steady state.
Table S1: Parameter values and residual errors (SSE values) associated with different inference strategies for the model of aspartate-derived amino acid biosynthesis [Fig-2]. In Analysis II, the initial values set at 20% of the steady state.
Table S2- Parameter values and residual errors (SSE values) associated with different inference strategies for the model of aspartate-derived amino acid biosynthesis [Fig-2]. In Analysis II, the initial values set at 20% of the steady state.