Hybrid method to solve HP model on 3D lattice and to probe protein stability upon amino acid mutations

Guo, Yuzhen; Tao, Fengying; Wu, Zikai; Wang, Yong

doi:10.1186/s12918-017-0459-4

Volume 11 Supplement 4

Selected papers from the 10th International Conference on Systems Biology (ISB 2016)

Research
Open access
Published: 21 September 2017

Yuzhen Guo¹,
Fengying Tao¹,
Zikai Wu^4,5 &
…
Yong Wang^2,3

BMC Systems Biology volume 11, Article number: 93 (2017)

Abstract

Background

Predicting protein structure from amino acid sequence is a prominent problem in computational biology. Long-range (non-local) interactions are known to be the main source of complexity in protein folding and dynamics, and they play the dominant role in forming the compact architecture. Simple but exact models, such as the HP model, capture the crux of this difficult problem and have important implications for understanding the mapping between protein sequence and structure.

Results

In this paper, we formulate the biological problem as an optimization model to study the hydrophobic-hydrophilic (HP) model on the 3D square lattice. This is a combinatorial optimization problem known to be NP-hard. Particle swarm optimization is used as the heuristic framework to solve it. To avoid premature convergence, we incorporate a Tabu search strategy. In addition, a pulling strategy based on the characteristics of native protein structures is designed to accelerate convergence. Together, these form a novel hybrid method combining particle swarm optimization, the Tabu strategy, and the pulling strategy that folds amino acid sequences on the 3D square lattice efficiently. Promising results are reported on several examples in comparison with existing methods. This allows us to use the tool to study protein stability upon amino acid mutation on the 3D lattice. In particular, we evaluate the effects of single and double amino acid mutations with the 3D HP lattice model and derive some useful insights.

Conclusion

We propose a novel hybrid method that combines several heuristic strategies to study the HP model on the 3D lattice. The results indicate that our hybrid method can predict protein structure more accurately and efficiently. Furthermore, it serves as a useful tool to probe protein stability on the 3D lattice and provides some biological insights.

Background

Proteins are the material basis of biological activity. The function of a protein is determined by its structure, which, according to Anfinsen’s experiments, is believed to be determined by its amino acid sequence. Research on protein structure prediction (also called the protein folding problem) is therefore fundamental to understanding the principles that map sequence to structure and function.

To capture the essence of protein structure prediction, Dill and his collaborators introduced the HP lattice model in 1995 to simplify real-world complexity [1]. The HP lattice model is an abstracted scaffold that converts the protein structure prediction problem into an optimization problem on a lattice. The aim is to find the optimal structure with the lowest energy. Computationally, solving this problem is NP-hard, which has led many researchers to propose heuristic algorithms for it. In recent years, many methods have been proposed for the 2D HP protein folding problem, e.g., PSO (Particle Swarm Optimization) [2], ACO (Ant Colony Optimization) [3], ABC (Artificial Bee Colony) [4] and SOM (Self-Organizing Mapping) [5].

One issue with the 2D lattice model is that constraining the amino acid sequence to a plane is an oversimplification. A step forward is to fold the sequence on a 3D lattice, which gives a closer approximation to the native structure. So far, several algorithms have been applied to the 3D HP protein structure prediction problem, such as UEGO (Universal Evolutionary Global Optimization) [6], GA (Genetic Algorithms) [7], TS (Tabu Search) [8], and EA (Evolutionary Algorithm) [9]. Each method has its own advantage in exploiting some special structure of the problem. In this paper, we propose a hybrid method to solve the 3D HP protein structure prediction problem more efficiently.

PSO was introduced by Kennedy and Eberhart [10]. It is a swarm intelligence optimization algorithm that imitates the foraging behavior of birds and fish. As a simple meta-heuristic, it has been used to solve optimization problems with nonlinear, non-differentiable, and multi-modal objective functions. The algorithm was originally designed for continuous optimization problems. Here, we start from the basic PSO framework and first extend the algorithm to combinatorial optimization, in which we formally formulate the HP model on the 3D lattice. In addition, we improve PSO as follows: a) we redefine velocity for the discrete model; b) we employ a modified Tabu search strategy to avoid premature convergence; c) we design a pulling strategy to speed up convergence.

We show that our hybrid algorithm can efficiently predict the structures of amino acid sequences of different lengths. With this tool, we simulate the effects of single and double amino acid mutations, respectively, and obtain some biological insights.

The remainder of this paper is organized as follows. First, a mathematical model is established for the 3D HP problem. Second, we explain the PSO algorithm and propose the modified Tabu search method and the pulling strategy. Third, the performance of our algorithm is validated. Fourth, the amino acid mutation results are presented and analyzed. Finally, conclusions are given.

Methods

Combinatorial optimization formulation for 3D HP lattice model

In the HP model, every amino acid sequence is abstracted as a string over the alphabet {H, P}, where H denotes a hydrophobic amino acid and P a hydrophilic one. A protein conformation is a self-avoiding path on a lattice. The main driving forces in the formation of the tertiary structure are assumed to be the interactions between hydrophobic amino acids that are adjacent on the lattice but not adjacent in the sequence, denoted as H-H interactions. The free energy of a protein conformation X is expressed through the number of H-H interactions. Based on Anfinsen’s assumption [11], the minimum free energy configuration tends to form a core that is shielded from the surrounding solvent by hydrophilic amino acids. Thus, the more H-H interactions, the lower the free energy. We take the free energy to be the negative of the number of H-H interactions. The HP lattice model has been widely used for the protein structure prediction problem on both 2D and 3D lattices. In this paper, we focus on the 3D HP square lattice model.

Protein conformations are commonly denoted in either relative coordinates or space coordinates. For a sequence S with L amino acids, X in relative coordinates is a string of length L−1 over the symbols {r(ight), l(eft), f(orward), d(own), u(p)}; these five symbols reflect the relative location of contiguous amino acids on the lattice. In space coordinates, X records the 3D coordinates of the L amino acids, namely X=(X(1),X(2),⋯,X(L)), where X(l)∈N³ (l=1,2,⋯,L) is the coordinate of the l-th amino acid. In this paper, we choose space coordinates. For example, Fig. 1 shows a conformation with 7 H-H interactions on the 3D square lattice, denoted as X=((2,3,2),(3,3,2),(3,4,2),(3,4,3),(3,3,3),(2,3,3),(2,2,3),(3,2,3),(3,2,2),(3,1,2),(2,1,2),(2,2,2)).

Based on this abstraction and the minimum energy principle, we establish the following optimization model (OM) for the protein structure prediction problem on the 3D square lattice:

$$\min\quad E(X) \tag{1}$$

$$\text{s.t.}\quad \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K}x_{i,j,k}(l)=1, \quad l=1,2,\cdots,L \tag{2}$$

$$0\leq\sum_{l=1}^{L}x_{i,j,k}(l)\leq 1, \quad i=1,\cdots,I;\ j=1,\cdots,J;\ k=1,\cdots,K \tag{3}$$

$$\sum_{d=1}^{3}|X(l+1)_{d}-X(l)_{d}|\cdot\|X(l+1)-X(l)\|=1, \quad l=1,2,\cdots,L-1 \tag{4}$$

Here,

$$E(X) = -M(X) \tag{5}$$

$$M(X) = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{l=1}^{L}x_{i,j,k}(l)f(l)\sum_{r=1}^{L}f(r)\left[x_{i,j,k+1}(r)+x_{i,j+1,k}(r)+x_{i+1,j,k}(r)\right]-h \tag{6}$$

$$h = \sum_{l=1}^{L-1}f(l)f(l+1) \tag{7}$$

$$x_{i,j,k}(l) = \begin{cases} 1 & \text{if } X(l)=(i,j,k)\\ 0 & \text{otherwise} \end{cases} \tag{8}$$

$$f(l) = \begin{cases} 1 & \text{if the $l^{th}$ amino acid is H}\\ 0 & \text{if the $l^{th}$ amino acid is P} \end{cases} \tag{9}$$

where E(X) is the free energy of protein conformation X, X(l)_d is the d-th component of X(l), M(X) is the number of H-H interactions in conformation X, h is the number of adjacent hydrophobic pairs in the amino acid sequence, and ∥·∥ is the Hamming distance. Constraints (2), (3) and (4) require that every amino acid occupies exactly one lattice point, that each lattice point is used at most once, and that amino acids adjacent in the chain occupy adjacent points on the lattice. Subtracting h in Eq. (6) removes the H-H pairs that are adjacent in the sequence: they are always adjacent on the lattice and therefore do not count as H-H interactions. Equation (8) indicates whether the l-th amino acid occupies point (i,j,k). In Eq. (9), f(l) maps the l-th amino acid of the sequence to 1 if it is H and to 0 if it is P.
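The objective and constraints can be illustrated with a short Python sketch. It is not the authors' implementation: the function names `is_feasible` and `energy` are hypothetical, and instead of evaluating Eq. (6) over all lattice points it counts H-H contacts directly over pairs of residues that are not neighbours in the sequence, which gives the same M(X). The conformation is the one listed for Fig. 1; the HP string `SEQ` is an assumed example chosen for illustration, because the sequence of Fig. 1 is not given in the text.

```python
from itertools import combinations

def manhattan(p, q):
    """Sum of absolute coordinate differences between two lattice points."""
    return sum(abs(a - b) for a, b in zip(p, q))

def is_feasible(conf):
    """Check constraints (3) and (4) for a conformation in space coordinates.

    Constraint (2) holds by construction: each residue has one coordinate.
    """
    # Constraint (3): no lattice point is occupied twice (self-avoidance).
    if len(set(conf)) != len(conf):
        return False
    # Constraint (4): chain neighbours sit on adjacent lattice points.
    return all(manhattan(p, q) == 1 for p, q in zip(conf, conf[1:]))

def energy(seq, conf):
    """E(X) = -M(X): minus the number of H-H lattice contacts between
    residues that are not adjacent in the sequence."""
    contacts = 0
    for a, b in combinations(range(len(seq)), 2):
        if b - a > 1 and seq[a] == "H" and seq[b] == "H":
            if manhattan(conf[a], conf[b]) == 1:
                contacts += 1
    return -contacts

# Conformation quoted in the text for Fig. 1 (12 residues).
X = [(2, 3, 2), (3, 3, 2), (3, 4, 2), (3, 4, 3), (3, 3, 3), (2, 3, 3),
     (2, 2, 3), (3, 2, 3), (3, 2, 2), (3, 1, 2), (2, 1, 2), (2, 2, 2)]
SEQ = "HHPPHHHHHPPH"  # assumed HP string, not taken from the paper

print(is_feasible(X), energy(SEQ, X))
```

Counting over residue pairs costs O(L²) per evaluation, which is adequate for the short chains considered here; a dictionary from lattice point to residue index would reduce this to O(L) if needed.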

Solving the simplified HP model is NP-complete even on the two-dimensional lattice, so we turn to heuristic algorithms. Particle swarm optimization, a stochastic algorithm, serves as a powerful approximation method.

Hybrid algorithm

The basic PSO algorithm

Particle swarm optimization (PSO) is a heuristic framework that optimizes an objective function by iteratively improving candidate solutions. The idea is to maintain a population of candidate particles and move them around the search space according to simple mathematical formulae over each particle’s position and velocity. Each particle’s movement is influenced by its own best known position and is also guided toward the best known position in the search space, which is updated as better positions are found by other particles. The swarm is thereby expected to move toward the best solution. The advantage of PSO is that it makes few assumptions about the problem and can search very large spaces of candidate solutions.

In the basic PSO algorithm (see Table 1), m particles search for the optimal position simultaneously with dynamic velocities. The velocity of a particle is affected by its velocity at the previous iteration, its own cognition, and the social cognition of the swarm. In particular, each particle remembers not only its own flight experience but also the trajectories of all particles. In an n-dimensional search space, the position and velocity of the i-th particle are represented as X_i∈Rⁿ and V_i∈Rⁿ, respectively. They are updated by the following two equations:

$$V^{t+1}_{i}=\omega V^{t}_{i}+c_{1}r_{1}\left(P^{t}_{ib}-X^{t}_{i}\right)+c_{2}r_{2}\left(P^{t}_{gb}-X^{t}_{i}\right) \tag{10}$$
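As a rough illustration of the continuous update, the Python sketch below applies Eq. (10) to every particle and then moves each particle by its new velocity. Several points are assumptions rather than the authors' settings: the position update X ← X + V is the rule commonly used in basic PSO, the parameter values (ω = 0.7, c₁ = c₂ = 1.5), the swarm size, and the search box [-5, 5] are arbitrary, r₁ and r₂ are drawn once per particle, and the names `pso_step` and `minimize` are hypothetical. This is the continuous baseline only, not the discrete hybrid method for the HP model.

```python
import random

def pso_step(X, V, P_best, g_best, omega=0.7, c1=1.5, c2=1.5, rng=random):
    """One synchronous update of all particles.

    Velocity follows Eq. (10); the position rule X + V is the standard
    basic-PSO update and is an assumption here.
    """
    new_X, new_V = [], []
    for x, v, p in zip(X, V, P_best):
        r1, r2 = rng.random(), rng.random()  # one pair of random factors per particle
        v_next = [omega * vd + c1 * r1 * (pd - xd) + c2 * r2 * (gd - xd)
                  for xd, vd, pd, gd in zip(x, v, p, g_best)]
        new_V.append(v_next)
        new_X.append([xd + vd for xd, vd in zip(x, v_next)])
    return new_X, new_V

def minimize(f, dim=2, m=20, iters=100, seed=0):
    """Minimise f with m particles; returns the best position found."""
    rng = random.Random(seed)
    X = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(m)]
    V = [[0.0] * dim for _ in range(m)]
    P_best = [x[:] for x in X]            # personal best positions
    g_best = min(P_best, key=f)[:]        # global best position
    for _ in range(iters):
        X, V = pso_step(X, V, P_best, g_best, rng=rng)
        for i, x in enumerate(X):
            if f(x) < f(P_best[i]):       # keep a position only if it improves
                P_best[i] = x[:]
        g_best = min(P_best, key=f)[:]
    return g_best

def sphere(x):
    """Simple test objective with its minimum at the origin."""
    return sum(xd * xd for xd in x)

best = minimize(sphere)
print(best, sphere(best))
```

Because personal bests are replaced only on improvement, the value of the global best never increases from one iteration to the next.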

Table 1 The process of basic PSO algorithm
