Over the last decade, the wide availability of high-throughput biological data has made it possible to produce new knowledge via a systems biology approach [1–3]. The inference of biochemical networks (i.e. the mathematical mapping of the molecular interactions in the cell) is therefore a question of key importance in the field, and many methods have been developed to solve the network-inference (sometimes called reverse-engineering) problems arising in, e.g., gene expression [5–13], signal transduction [14–17] and metabolic networks [18–25].
In this context, it is particularly worth mentioning the DREAM initiative (Dialogue for Reverse Engineering Assessments and Methods), which targeted the problems of cellular network inference and quantitative model building in systems biology. DREAM tries to address two fundamental questions: (i) how can we assess how well we are describing the networks of interacting molecules that underlie biological systems, and (ii) how can we know how well we are predicting the outcome of previously unseen experiments from our models? Interestingly, one of the main conclusions of the DREAM3 event was that the vast majority of the teams' predictions were statistically equivalent to random guesses. Moreover, even for particular problem instances such as gene regulation network inference, there was no one-size-fits-all algorithm.
The use of a performance profiling framework with the DREAM3 benchmark problems revealed that current inference methods are affected by different types of systematic prediction errors. The authors of that study conclude that reliable network inference from gene expression data remains an unsolved problem. Further, they highlight two major difficulties in the case of gene-network reverse engineering: limited data (which may leave the inference problem underdetermined), and the difficulty of distinguishing direct from indirect regulation. Prill et al. further explored the issue of intrinsic impediments to network inference, designating the identifiability of certain network edges and systematic false positives as the main barriers. In this paper, we consider the widely used reaction kinetic formalism, where dynamic models of biological networks are described by a set of ordinary differential equations (see, e.g., [28–30] and the related literature). In particular, we consider the central question of the identifiability of such a network as understood in the systems and control area [31, 32].
Identifiability analysis studies whether there is a theoretical chance of uniquely determining the parameters of a mathematical model, assuming perfect noise-free measurements and error-free modeling [33–35]. One of the early approaches to identifiability testing of nonlinear models is based on the Taylor-series expansion of the system output, using the fact that the Taylor coefficients are unique. A similar but more general method uses the generating-series or Volterra-series coefficients of the system, which is the nonlinear generalization of the Laplace-transform method used for linear systems. A similarity transformation approach has also been proposed that gives necessary and sufficient conditions for local and global identifiability by checking nonlinear controllability and observability conditions. The appearance of differential algebra methods in systems and control theory [39, 40] opened the possibility for new types of identifiability tests that have gained significant popularity [41–43]. Further theoretical developments in the field include the identifiability conditions of rational function state space models, the possible effect of initial conditions on identifiability, and the application of Lie algebras. While identifiability is a property of a given parameterized model, a related notion called distinguishability addresses the question of whether two or more parameterized models (with the same or with different structure) can produce the same output for any allowed input [46–48]. The literature on the identifiability and distinguishability of biological and chemical system models is relatively extensive: compartmental systems (which form a special subclass of general mass-action networks) are studied in [38, 49, 50]. General nonlinear chemical reaction networks (CRNs) are treated in [51, 52], and it has been shown that for thermodynamically meaningful models, nonlinearity reduces the chance of indistinguishability compared to the linear case.
Geometric conditions for the indistinguishability of CRNs have been given in the literature, and a related comment has been published. Computer algebra tools can be successfully used for the symbolic computations needed for identifiability and distinguishability testing of complex models [57–60].
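To make the Taylor-series idea concrete, consider a deliberately simple, hypothetical model (not taken from the works cited above): a single species that decays through two parallel first-order routes with rate constants $k_1$ and $k_2$, with a known nonzero initial value $x_0$ and the state itself as the measured output. The symbols $k_1$, $k_2$, $x_0$ and $\delta$ are assumptions introduced only for this sketch.

```latex
\begin{aligned}
\dot{x}(t) &= -(k_1 + k_2)\,x(t), \qquad x(0) = x_0 \neq 0, \qquad y(t) = x(t),\\
y^{(n)}(0) &= \bigl(-(k_1 + k_2)\bigr)^{n}\, x_0, \qquad n = 0, 1, 2, \dots
\end{aligned}
```

Every Taylor coefficient of the output depends on the parameters only through the sum $k_1 + k_2$. The sum can therefore be determined uniquely from $y$, whereas $k_1$ and $k_2$ individually cannot: the parameter pairs $(k_1, k_2)$ and $(k_1 + \delta, k_2 - \delta)$ produce exactly the same output for any admissible $\delta$.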
The importance of identifiability has also been recognized in systems biology [14, 61–64]. However, despite a number of works illustrating ways to test the structural and practical identifiability of models [65–67], a significant portion of modeling studies in systems biology continue to ignore this key property.
It has long been known that chemical reaction networks with different structures and/or parametrizations may produce the same dynamical model describing the time evolution of species concentrations [28, 55]. A related problem, namely the non-unique structure of Petri nets associated with reaction network dynamics, has also been studied. Additionally, the value of prior information in biological network inference was clearly shown in [69, 70] by applying Bayesian network models. However, a constructive optimization-based approach for the study of dynamically equivalent (or similar) reaction networks is a recent development [71–74], which we further extend in this paper.
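The structural non-uniqueness described above can be illustrated with a minimal sketch, assuming mass-action kinetics. Under that assumption each reaction with source complex y, product complex y' and rate constant k contributes k(y' - y) to the coefficient of the monomial x^y, so two networks generate the same differential equations exactly when these per-monomial coefficient vectors agree. The function names and the two toy networks below are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def rhs_coefficients(reactions):
    """Return {source complex: net coefficient vector of its monomial}.

    Each reaction is a tuple (source, product, rate), where source and
    product are tuples of stoichiometric coefficients (one per species).
    Mass-action kinetics is assumed.
    """
    totals = {}
    for source, product, rate in reactions:
        rate = Fraction(rate)  # exact arithmetic avoids rounding issues
        vec = totals.setdefault(source, [Fraction(0)] * len(source))
        for i, (s, p) in enumerate(zip(source, product)):
            vec[i] += rate * (p - s)
    # Monomials whose net coefficient vector is zero do not appear in the ODEs.
    return {src: tuple(v) for src, v in totals.items() if any(v)}


def dynamically_equivalent(net_a, net_b):
    """True if both networks give identical mass-action ODEs."""
    return rhs_coefficients(net_a) == rhs_coefficients(net_b)


# One species X; complexes 0, X and 2X written as stoichiometric tuples.
ZERO, X, TWO_X = (0,), (1,), (2,)

# Network A: X -> 2X with rate 1, giving dx/dt = x.
network_a = [(X, TWO_X, 1)]
# Network B: X -> 0 with rate 1 and X -> 2X with rate 2, giving dx/dt = -x + 2x = x.
network_b = [(X, ZERO, 1), (X, TWO_X, 2)]

same_dynamics = dynamically_equivalent(network_a, network_b)
```

Both toy networks share the complex set {0, X, 2X} and yield dx/dt = x, yet network B contains a reaction (X -> 0) that is absent from network A, so the reaction graph cannot be recovered from the dynamics alone.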
As a novelty, we present in this paper a definition of the so-called core reactions, i.e. those reactions that are present in any dynamically equivalent reaction network if the set of complexes is given a priori, together with a computational method for finding them. Moreover, a computationally improved method is introduced for the computation of dense realizations of CRNs, together with a modified algorithm to check the uniqueness of a constrained reaction network structure. Structural non-uniqueness and the use of the proposed computational methods are illustrated with the help of biological models known from the literature.
The paper is structured as follows. The 'Methods' section introduces the notions of chemical reaction networks, structural identifiability and distinguishability of dynamical models. It also contains the procedures to obtain the core reactions of a network and its sparse and dense realizations, which rely on standard methods of linear programming (LP) and mixed integer linear programming (MILP) [75–78]. The analysis of four biological system models can be found in the 'Results and discussion' section, followed by the conclusions.