- Research
- Open Access
Hybrid method to solve HP model on 3D lattice and to probe protein stability upon amino acid mutations
- Yuzhen Guo†^{1} (corresponding author),
- Fengying Tao†^{1},
- Zikai Wu^{4, 5} and
- Yong Wang^{2, 3}
https://doi.org/10.1186/s12918-017-0459-4
© The Author(s) 2017
- Published: 21 September 2017
Abstract
Background
Predicting protein structure from amino acid sequence is a prominent problem in computational biology. Long-range (non-local) interactions are known to be the main source of complexity in protein folding and dynamics, and they play the dominant role in forming the compact architecture. Simple but exact models, such as the HP model, capture the core of this difficult problem and have important implications for understanding the mapping between protein sequence and structure.
Results
In this paper, we formulate the biological problem as an optimization model to study the hydrophobic-hydrophilic (HP) model on the 3D square lattice. This is a combinatorial optimization problem that is known to be NP-hard. Particle swarm optimization is used as the heuristic framework to solve it. To avoid premature convergence, we incorporate a Tabu search strategy. In addition, a pulling strategy, based on the characteristics of native protein structures, is designed to accelerate convergence. Together, the novel hybrid method combining particle swarm optimization, the Tabu strategy, and the pulling strategy folds amino acid sequences on the 3D square lattice efficiently. Promising results are reported on several examples in comparison with existing methods. This allows us to use the tool to study protein stability upon amino acid mutation on the 3D lattice. In particular, we evaluate the effects of single and double amino acid mutations with the 3D HP lattice model and derive some useful insights.
Conclusion
We propose a novel hybrid method that combines several heuristic strategies to study the HP model on the 3D lattice. The results indicate that our hybrid method can predict protein structure more accurately and efficiently. Furthermore, it serves as a useful tool to probe protein stability on the 3D lattice and provides some biological insights.
Keywords
- Protein structure prediction
- HP model
- 3D lattice
- Particle swarm optimization
- Protein stability
Background
Proteins are the material basis of biological activity. The function of a protein is determined by its structure, which, according to Anfinsen’s experiments, is believed to be decided by the amino acid sequence. Research on protein structure prediction (also called the protein folding problem) is therefore essential for exploring the principles that map sequence to structure and function.
To capture the backbone of protein structure prediction, Dill and his collaborators introduced the HP lattice model to simplify real-world complexity in 1995 [1]. The HP lattice model is an abstracted scaffold that converts the protein structure prediction problem into an optimization problem on a lattice. The aim is to find the optimal structure with the lowest energy. Computationally, solving this problem is NP-hard. For this reason, many researchers have studied the problem and proposed heuristic algorithms. In recent years, many methods have been proposed for the 2D HP protein folding problem, e.g., PSO (Particle Swarm Optimization) [2], ACO (Ant Colony Optimization) [3], ABC (Artificial Bee Colony) [4] and SOM (Self-Organizing Mapping) [5].
One issue with the 2D lattice model is that constraining the amino acid sequence to a 2D plane is too strong a simplification. One step forward is to fold the sequence on a 3D lattice, which gives a better approximation of the native structure. So far, several algorithms have been applied to the 3D HP protein structure prediction problem, such as UEGO (Universal Evolutionary Global Optimization) [6], GA (Genetic Algorithms) [7], TS (Tabu Search) [8] and EA (Evolutionary Algorithm) [9]. Each method has its own advantage in capturing some special structure of the problem. In this paper, we propose a hybrid method that improves the efficiency of solving the 3D HP protein structure prediction problem.
PSO was introduced by Kennedy and Eberhart [10]. It is a swarm intelligence optimization algorithm that imitates the foraging behavior of birds and fish. As a simple meta-heuristic, it has been used to solve optimization problems with nonlinear, non-differentiable, and multi-modal objective functions. Originally, the algorithm was designed for continuous optimization problems. Here, we start from the basic PSO framework and first extend the algorithm to combinatorial optimization, in which we formally formulate the HP model on the 3D lattice. In addition, we improve PSO as follows: a) we redefine velocity for the discrete model; b) we employ a modified Tabu search strategy to avoid premature convergence; c) we design a pulling strategy to speed up convergence.
We show that our hybrid algorithm can efficiently predict the structures of amino acid sequences of different lengths. With this tool, we simulate the effects of single and double amino acid mutations, and some biological insights are obtained.
The remainder of this paper is organized as follows. First, a mathematical model is established for the 3D HP problem. Second, we explain the PSO algorithm and propose the modified Tabu search method and the pulling strategy. Third, the performance of our algorithm is validated. Fourth, the amino acid mutation results are obtained and analyzed. Finally, conclusions are presented.
Methods
Combinatorial optimization formulation for 3D HP lattice model
In the HP model, every amino acid sequence is abstracted as a string over the alphabet H (hydrophobic amino acid) and P (hydrophilic amino acid). The protein conformation is a self-avoiding path on a lattice. It is assumed that the main driving forces in the formation of the tertiary structure are the interactions among hydrophobic amino acids that are adjacent on the lattice but not adjacent in the sequence, denoted as H-H interactions. The free energy of a protein conformation (X) is expressed by the number of H-H interactions. Based on Anfinsen’s assumption [11], the configuration with minimal free energy tends to form a hydrophobic core in the spatial structure that is shielded from the surrounding solvent by hydrophilic amino acids. So the more H-H interactions, the lower the free energy. We assume that the free energy equals the negative of the number of H-H interactions. The HP lattice model has been widely used for the protein structure prediction problem on 2D and 3D lattices. In this paper, we focus on the 3D HP square lattice model.
Here, \(E(X)\) is the free energy of protein conformation \(X\), \(X(l)_{d}\) is the \(d\)th component of \(X(l)\), \(M(X)\) is the number of H-H interactions in conformation \(X\), \(r\) is the number of adjacent hydrophobic pairs in the amino acid sequence and \(\|\cdot\|\) is the Hamming distance. Equations (2), (3) and (4) require that every amino acid occupies exactly one lattice point, that each lattice point is used at most once, and that amino acids adjacent in the chain occupy adjacent points on the lattice. Equation (8) indicates whether the \(l\)th amino acid occupies point \((i,j,k)\). In Eq. (9), \(f(l)\) translates the \(l\)th residue of the amino acid sequence into 1 if it is H and 0 if it is P.
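The energy definition and the three feasibility constraints described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `hp_energy` and the coordinate-list representation are assumptions made for illustration, and lattice adjacency is taken here to mean that two points differ by one unit step along exactly one axis.

```python
from itertools import combinations


def lattice_adjacent(a, b):
    """True if two lattice points differ by a unit step along one axis."""
    return sum(abs(x - y) for x, y in zip(a, b)) == 1


def hp_energy(sequence, coords):
    """Return minus the number of H-H contacts of a 3D lattice conformation.

    sequence: string over {'H', 'P'}
    coords:   list of (i, j, k) integer lattice points, one per residue
    """
    if len(sequence) != len(coords):
        raise ValueError("every residue needs exactly one lattice point")
    if len(set(coords)) != len(coords):
        raise ValueError("a lattice point is used more than once")
    for a, b in zip(coords, coords[1:]):
        if not lattice_adjacent(a, b):
            raise ValueError("chain neighbours must be lattice neighbours")

    contacts = 0
    for p, q in combinations(range(len(sequence)), 2):
        if q - p == 1:
            continue  # residues adjacent in the sequence are not counted
        if sequence[p] == "H" and sequence[q] == "H":
            if lattice_adjacent(coords[p], coords[q]):
                contacts += 1
    return -contacts


# Four residues folded around a unit square: residues 1 and 4 touch.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy("HPPH", square))
```

Counting contacts directly between non-consecutive residues avoids having to subtract the sequence-adjacent hydrophobic pairs afterwards.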
Solving the simplified HP model is NP-complete even on the two-dimensional lattice, so we have to resort to heuristic algorithms. Particle swarm optimization, a stochastic algorithm, serves as a powerful approximation method.
Hybrid algorithm
The basic PSO algorithm
Particle swarm optimization (PSO) is a heuristic framework that optimizes an objective function by iteratively improving candidate solutions. The idea is to maintain a population of candidate particles and move them around the search space according to simple mathematical formulae over each particle’s position and velocity. Each particle’s movement is influenced by its own best known position and is also guided toward the best known position in the search space, which is updated as better positions are found by other particles. The swarm is thus expected to move toward the best solution. The advantage of PSO is that it makes no assumptions about the problem and can search very large spaces of candidate solutions.
The process of basic PSO algorithm
Step 1 | To initialize \(\{X^{0}_{i}|i=1,2\cdots m\}\) and \(\{V^{0}_{i}|i=1,2\cdots m\}\); |
Step 2 | To calculate \(E(X^{t}_{i})\), find \(P^{t}_{ib}\) and \(P^{t}_{gb}\) ; |
Step 3 | To update \(X^{t}_{i}\) and \(V^{t}_{i}\); |
Step 4 | To output P _{ gb }. |
Here \(P^{t}_{ib}\) and \(P^{t}_{gb}\) are the best position of the \(i\)th particle and the best position of all particles in the \(t\)th iteration, respectively. The inertia weight (\(\omega\)), self confidence (\(c_{1}\)) and swarm confidence (\(c_{2}\)) are input parameters, and \(r_{1}, r_{2}\) are two independently generated random numbers uniformly distributed in the range [0,1].
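For orientation, the four steps can be sketched in Python for a continuous objective. The update rule used below is the commonly cited textbook form, V ← ωV + c1·r1·(P_ib − X) + c2·r2·(P_gb − X) followed by X ← X + V, and is assumed rather than taken from this paper. The function name `basic_pso`, the default parameter values and the search bounds are likewise illustrative assumptions.

```python
import random


def basic_pso(energy, dim, m=20, iters=200, omega=0.7, c1=1.5, c2=1.5,
              lo=-5.0, hi=5.0, seed=0):
    """Minimise `energy` over R^dim with a plain particle swarm."""
    rng = random.Random(seed)
    # Step 1: initialise positions and velocities.
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(m)]
    V = [[0.0] * dim for _ in range(m)]
    # Step 2: evaluate and record personal and global bests.
    P_ib = [x[:] for x in X]
    E_ib = [energy(x) for x in X]
    g = min(range(m), key=E_ib.__getitem__)
    P_gb, E_gb = P_ib[g][:], E_ib[g]
    for _ in range(iters):
        for i in range(m):
            r1, r2 = rng.random(), rng.random()
            # Step 3: update velocity, then position.
            for d in range(dim):
                V[i][d] = (omega * V[i][d]
                           + c1 * r1 * (P_ib[i][d] - X[i][d])
                           + c2 * r2 * (P_gb[d] - X[i][d]))
                X[i][d] += V[i][d]
            e = energy(X[i])
            if e < E_ib[i]:
                P_ib[i], E_ib[i] = X[i][:], e
                if e < E_gb:
                    P_gb, E_gb = X[i][:], e
    # Step 4: output the best position found.
    return P_gb, E_gb


best, value = basic_pso(lambda x: sum(v * v for v in x), dim=2)
print(best, value)
```

The sketch works on real-valued vectors; the discrete lattice version requires the redefined velocity and the adjustment strategy discussed in the following sections.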
The modified PSO algorithm
Definitions
Clearly, X+V is a new position. Nevertheless, the new position may not satisfy the constraints in the OM model. An adjustment strategy is needed to ensure that the new position is valid.
Modified Tabu search strategy
Premature convergence is one of the major difficulties in solving the OM model with the PSO algorithm. To further improve the modified PSO, we adopted the idea of Tabu search, which was proposed by Glover [12]. The method is briefly described as follows.
Tabu search is a meta-heuristic method that maintains only one solution during the iterative search. Given an initial solution X, the idea is to calculate and compare its neighboring solutions N(X). The best neighbor is chosen as the candidate solution X _{ c }. If X _{ c } satisfies the aspiration rule, it replaces the current solution X and is added to the tabu list T _{ list }; otherwise, the current solution X is replaced by the best non-tabu neighbor X ^{′} (E(X ^{′})=min{E(Y)|Y∈N(X),Y∉T _{ list }}) and X ^{′} is added to T _{ list }. Generally, T _{ list } is a first-in first-out (FIFO) memory of limited length. As a result, particles do not revisit recently found solutions for a while, and at the same time good solutions do not stay tabu forever.
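The generic procedure described above can be sketched as follows. This is a schematic illustration rather than the modified strategy used in the paper: the function name `tabu_search`, the `neighbours` callback and the specific aspiration rule (accept a tabu candidate only if it beats the best solution found so far) are assumptions.

```python
from collections import deque


def tabu_search(x0, energy, neighbours, max_iter=100, tabu_len=10):
    """Single-solution tabu search with a FIFO tabu list of limited length."""
    current = best = x0
    tabu = deque([x0], maxlen=tabu_len)  # oldest entries drop out first
    for _ in range(max_iter):
        candidates = sorted(neighbours(current), key=energy)
        if not candidates:
            break
        if energy(candidates[0]) < energy(best):
            chosen = candidates[0]  # aspiration: accept even if tabu
        else:
            # Otherwise take the best neighbour that is not tabu.
            chosen = next((c for c in candidates if c not in tabu), None)
        if chosen is None:
            break  # every neighbour is tabu
        current = chosen
        tabu.append(current)
        if energy(current) < energy(best):
            best = current
    return best


# Toy landscape on positions 0..7 with local minima at 1 and 3.
landscape = [5, 3, 4, 2, 6, 1, 0, 4]
step = lambda x: [y for y in (x - 1, x + 1) if 0 <= y < len(landscape)]
print(tabu_search(0, landscape.__getitem__, step))
```

In the toy run, a plain descent from position 0 would stop at the local minimum at position 1; forbidding recently visited positions pushes the search uphill and past it.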
The neighbourhood of a solution and the aspiration rule are the key components of Tabu search. In our 3D HP problem, a feasible solution is a 3D self-avoiding path, and it is not easy to derive neighboring solutions from a given solution. Following Eqs. (10) and (11), we obtained similar solutions by varying r _{1},r _{2} within the same iteration for the same particle in PSO, and these solutions constituted a neighbourhood. When the candidate solution was better than the current solution, it was accepted regardless of whether it was tabu.
Pulling strategy
The convergence rate of the modified PSO with the Tabu search strategy is not fast enough, and the conformations it produces may be too loose. The following strategy was designed to improve the algorithm.
Hybrid method
A novel hybrid method, denoted TPPSO^{1}, was obtained by combining the modified PSO with the modified Tabu search strategy. A second hybrid method, TPPSO^{2}, combines TPPSO^{1} with the pulling strategy. Both methods employ the Tabu search strategy and were applied to the protein structure prediction problem. In TPPSO^{1} and TPPSO^{2}, once P _{ ib } and P _{ gb } were found, s alternative particles were produced for each particle by Eqs. (10) and (11).
The algorithm outline of TPPSO ^{2}
Step 1 | To initialize \(\{X^{0}_{i}|i=1,2\cdots m\},\{V^{0}_{i}|i=1,2\cdots m\}\) and T _{ list }=Ø; |
Step 2 | To calculate \(E(X^{t}_{i})\), find \(P^{t}_{ib}\) and \(P^{t}_{gb}\); |
Step 3 | To update \(\{V^{t}_{ij}|j=1,2\cdots s\}\) and \(\{X^{t}_{ij}|j=1,2\cdots s\}\); |
Step 4 | To adjust and pull \(\{X^{t}_{ij}|j=1,2\cdots s\}\) ; |
Step 5 | To calculate \(E(X^{t}_{ic})=min\{E(X^{t}_{ij})|j=1,2\cdots s\}\); |
Step 6 | If \(E(X^{t}_{ic})\leq E(X^{t}_{i})\) then \(X^{t}_{i}=X^{t}_{ic}\); |
Step 7 | To calculate \(E(P^{t}_{gbc})=min\{E(X^{t}_{i})|i=1,2\cdots m\}\); |
Step 8 | If \(E(P^{t}_{gbc})< E(X^{t}_{gb})\) then \(X^{t}_{gb}=P^{t}_{gbc},T_{list}=\varnothing\); |
Step 9 | If \(E(P^{t}_{gbc})= E(X^{t}_{gb})\) and \(P^{t}_{gbc}\notin T_{list}\) then \(T_{list}=T_{list}+X^{t}_{gb}, X^{t}_{gb}=P^{t}_{gbc}\); |
Step 10 | To output P _{ gb }. |
Results
Numerical simulations
To test the feasibility of the hybrid algorithms (TPPSO^{1} and TPPSO^{2}) and to explore their properties, we ran them on two groups of amino acid sequences.
Simulation of sequences with 27 amino acids
Sequences with 27 amino acids used in our study
Sequence ID | Amino acids sequence |
---|---|
A _{1} | PHPHPH _{3} P _{2} HPHP _{11} H _{2} P |
A _{2} | PH _{2} P _{10} H _{2} P _{2} H _{2} P _{2} HP _{2} HPH |
A _{3} | H _{4} P _{5} HP _{4} H _{3} P _{9} H |
A _{4} | H _{3} P _{2} H _{4} P _{3} HPHP _{2} H _{2} P _{2} HP _{3} H _{2} |
A _{5} | H _{4} P _{4} HPH _{2} P _{3} H _{2} P _{10} |
A _{6} | HP _{6} HPH _{3} P _{2} H _{2} P _{3} HP _{4} HPH |
A _{7} | HP _{2} HPH _{2} P _{3} HP _{5} HPH _{2} PHPHPH _{2} |
A _{8} | HP _{11} HPHP _{8} HPH _{2} |
A _{9} | P _{7} H _{3} P _{3} HPH _{2} P _{3} HP _{2} HP _{3} |
A _{10} | P _{5} H _{2} PHPHPHPHP _{2} H _{2} PH _{2} PHP _{3} |
A _{11} | HP _{4} H _{4} P _{2} HPHPH _{3} PHP _{2} H _{2} P _{2} H |
Time is the iteration counter and Maxtime is the maximum number of iterations, which is 3000 in our implementation. For each particle, we chose c _{1}=c _{2}=1, r _{11}=rand(0.9,1), r _{12}=rand(0.82,0.92), r _{13}=rand(0.74,0.84), r _{21}=rand(0.9,1), r _{22}=rand(0.85,0.95), r _{23}=rand(0.8,0.9) to produce three similar but not identical alternative particles. In this test, T _{ list } contained only ten particles.
Comparing four algorithms in eleven sequences with 27 amino acids
Sequence ID | EN | hELP | TPPSO^{1} | TPPSO^{2} |
---|---|---|---|---|
A _{1} | -9 | -9(18009) | -9(1983) | -9(177) |
A _{2} | -10 | -10(9447) | -10(1304) | -10(439) |
A _{3} | -8 | -8(1420) | -8(1249) | -8(44) |
A _{4} | -15 | -15(2125) | -15(795) | -15(19) |
A _{5} | -8 | -8(2877) | -8(104) | -8(61) |
A _{6} | -11 | -12(2610) | -11(940) | -12(812) |
A _{7} | -13 | -13(3967) | -12(721) | -13(805) |
A _{8} | -4 | -4(1070) | -4(6) | -4(3) |
A _{9} | -7 | -7(363) | -7(389) | -7(14) |
A _{10} | -11 | -11(416) | -11(2784) | -11(83) |
A _{11} | -14 | -16(285) | -14(957) | -16(2672) |
IBE number of TPPSO ^{2}
Sequence ID | A _{8} | A _{9} | A _{3} | A _{5} | A _{1} | A _{2} | A _{10} | A _{6} | A _{7} | A _{4} | A _{11} |
---|---|---|---|---|---|---|---|---|---|---|---|
H-H ^{a} | 4 | 7 | 8 | 8 | 9 | 10 | 11 | 12 | 13 | 15 | 16 |
IBE number ^{b} | 3 | 14 | 44 | 61 | 177 | 439 | 83 | 812 | 805 | 19 | 2672 |
where x is the number of H-H pairs, and y is the IBE number.
Test sequences
Sequence ID | Amino acids sequence | H-H | IBE number | Relative error |
---|---|---|---|---|
Test 1 | H _{4} P _{5} HP _{5} H _{3} P _{8} H | 8 | 51 (52.3829) | 0.0271 |
Test 2 | H _{4} P _{5} HP _{5} H _{3} P _{4} HP _{3} H | 9 | 167 (177.8417) | 0.0649 |
Test 3 | (HP _{2} HP)_{5} HP | 14 | 956 (921.1219) | 0.0365 |
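The relative error column can be reproduced with a short calculation. The sketch assumes that the number in parentheses is the value predicted by the fitted relation between x and y, and that the error is measured relative to the observed IBE number; both readings are inferred from the table values rather than stated explicitly.

```python
def relative_error(observed, fitted):
    """Absolute deviation of the fitted value, relative to the observation."""
    return abs(observed - fitted) / observed


# (observed IBE number, value in parentheses) for the three test sequences
rows = {
    "Test 1": (51, 52.3829),
    "Test 2": (167, 177.8417),
    "Test 3": (956, 921.1219),
}
for name, (observed, fitted) in rows.items():
    print(name, round(relative_error(observed, fitted), 4))
```

Under these assumptions the three results agree with the tabulated values 0.0271, 0.0649 and 0.0365.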
Simulation of sequences with different length
Sequences with different lengths
Sequence ID | Amino acids sequence | Length | H-H | IBE number |
---|---|---|---|---|
B | H _{4} P _{2} H _{7} P _{3} H | 17 | 9 | 2 |
C | HPHP _{2} H _{2} PHP _{2} HPH _{2} P _{2} HPH | 20 | 11 | 11 |
D | P _{2} HP _{2} H _{2} P _{4} H _{2} P _{4} H _{2} P _{4} H _{2} | 25 | 9 | 139 |
E | P _{3} H _{2} P _{2} H _{2} P _{5} H _{7} P _{2} H _{2} P _{4} H(HP _{2})_{2} | 36 | 17 | 432 |
F | P _{2} H(P _{2} H _{2})_{2} P _{5} H _{10} P _{6}(H _{2} P _{2})_{2} HP _{2} H _{5} | 48 | 29 | 976 |
These results show that: a) TPPSO^{2} is able to fold sequences of different lengths, and the characteristic features of protein structure are evident in the conformations obtained; b) the pulling strategy improved the performance; c) the Tabu search strategy effectively avoided premature convergence; d) for TPPSO^{2}, the longer the sequence, the larger the IBE number.
Probing protein stability upon amino acid mutation
Protein stability determines whether a protein will be in its native folded conformation or in a denatured state. The folded, biologically active conformation of a protein is believed to be more stable than the unfolded, inactive conformations [15]. Making proteins more stable is therefore important in medicine and basic research. Amino acid mutations are widely used in protein design and analysis techniques to increase or decrease stability. These mutations are carried out experimentally using site-directed mutagenesis and similar techniques. This is time-consuming and often requires computational prediction methods to select the best possible combinations [16–19]. With the efficient hybrid method at hand, we aim to probe protein stability on the 3D lattice. In particular, we simulate how single-site and double amino acid mutations affect protein stability, i.e., we predict the changes in protein stability upon amino acid mutation with TPPSO^{2}.
Single amino acid mutation
The hybrid method TPPSO^{2} has been tested on the protein structure prediction problem. We now focus on single amino acid mutation and ask whether, and at which positions, a mutation affects the stability of the protein structure. The experiment is designed as follows. We first calculate the optimal number of H-H interactions of the original sequence with TPPSO^{2}. Then we choose one amino acid to mutate, i.e., we change it from H to P or from P to H, and calculate the optimal number of H-H interactions of the mutated sequence with TPPSO^{2}. Finally, the deviation in H-H interactions between the mutated sequence and the original sequence is recorded.
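The experimental loop can be sketched in Python as follows. The helper names (`expand`, `single_mutants`, `single_mutation_deviations`) and the plain-text notation such as `H4P2H7P3H` are hypothetical conveniences, and `fold` is a placeholder for any routine, such as TPPSO^{2}, that returns the optimal number of H-H interactions of a sequence. The toy scorer at the end only exercises the loop and has no biological meaning.

```python
import re


def expand(compact):
    """Expand compact notation, e.g. 'H4P2H7P3H' or '(HP2HP)5HP'."""
    group = re.compile(r"\(([HP\d]+)\)(\d+)")
    while True:
        # Repeat innermost parenthesised groups until none are left.
        compact, n = group.subn(lambda m: m.group(1) * int(m.group(2)), compact)
        if n == 0:
            break
    return re.sub(r"([HP])(\d+)", lambda m: m.group(1) * int(m.group(2)), compact)


def flip(residue):
    """Mutate H into P and P into H."""
    return "P" if residue == "H" else "H"


def single_mutants(seq):
    """Yield (1-based position, mutated sequence) for every residue."""
    for pos in range(len(seq)):
        yield pos + 1, seq[:pos] + flip(seq[pos]) + seq[pos + 1:]


def single_mutation_deviations(seq, fold):
    """Deviation in H-H interactions (mutant minus original) per position."""
    base = fold(seq)
    return [(pos, fold(mutant) - base) for pos, mutant in single_mutants(seq)]


seq_b = expand("H4P2H7P3H")      # sequence B, 17 residues
toy_fold = lambda s: s.count("H")  # stand-in scorer, not a folding algorithm
print(seq_b, len(seq_b))
print(single_mutation_deviations("HPH", toy_fold))
```

Replacing `toy_fold` with a real solver is the only change needed to run the experiment on a sequence.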
Sequences with different length
To probe stability under amino acid mutation, we chose four sequences of different lengths from the sections above: sequences B, C, D and A _{8}.
The single amino acid mutation results for sequence B
D-value | -3 | -2 | -1 | 0 | 1 | 2 | 3 |
---|---|---|---|---|---|---|---|
Q-value | 0 | 1 | 5 | 7 | 2 | 2 | 0 |
R-value | 0% | 6% | 29% | 41% | 12% | 12% | 0% |
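The R-value row appears to be the Q-value row expressed as a percentage of all single-site mutants of the sequence. Under that assumption, which is inferred from the tables rather than stated in the text, the percentages for sequence B can be recomputed as follows (the function name `r_values` is hypothetical).

```python
def r_values(q_values):
    """Share of mutants per D-value, as whole-number percentages."""
    total = sum(q_values)  # equals the sequence length for single mutations
    return [round(100 * q / total) for q in q_values]


# Q-value row for sequence B, D-value running from -3 to 3
q_b = [0, 1, 5, 7, 2, 2, 0]
print(sum(q_b), r_values(q_b))
```

The Q-values sum to 17, the length of sequence B, and the five mutants with D-value -1 correspond to 29% of all mutants.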
The single amino acid mutation results for sequence C
D-value | -3 | -2 | -1 | 0 | 1 | 2 | 3 |
---|---|---|---|---|---|---|---|
Q-value | 0 | 5 | 5 | 4 | 6 | 0 | 0 |
R-value | 0% | 25% | 25% | 20% | 30% | 0% | 0% |
The single amino acid mutation results for sequence D
D-value | -3 | -2 | -1 | 0 | 1 | 2 | 3 |
---|---|---|---|---|---|---|---|
Q-value | 0 | 7 | 1 | 6 | 11 | 0 | 0 |
R-value | 0% | 28% | 4% | 24% | 44% | 0% | 0% |
The single amino acid mutation results for sequence A _{8}
D-value | -3 | -2 | -1 | 0 | 1 | 2 | 3 |
---|---|---|---|---|---|---|---|
Q-value | 0 | 5 | 3 | 7 | 11 | 1 | 0 |
R-value | 0% | 19% | 11% | 26% | 41% | 3% | 0% |
Summary of the single mutation results
Sequence ID | H^{0} | P^{0} | H^{1} | P^{1} | H^{2} | P^{2} |
---|---|---|---|---|---|---|
B | 12 | 5 | 12 | 5 | 1 | 2 |
C | 10 | 10 | 10 | 10 | 5 | 0 |
D | 9 | 9 | 16 | 16 | 7 | 0 |
A _{8} | 6 | 21 | 6 | 21 | 5 | 1 |
Sequences with the same length
We selected five sequences from Table 3 to test what kind of protein structures are more stable upon single amino acid mutation by TPPSO^{2}. We changed every amino acid of these sequences, then recalculated and recorded the H-H interactions of every mutated sequence.
Single amino acid mutation of sequences with 27 amino acids
Sequence ID | H-H | mass number | UC number | H | P | P →H | H →P |
---|---|---|---|---|---|---|---|
A _{8} | 4 | 9 | 4 | 6 | 21 | 19 | 4 |
A _{3} | 8 | 7 | 2 | 9 | 18 | 17 | 8 |
A _{5} | 8 | 8 | 7 | 9 | 18 | 12 | 8 |
A _{10} | 11 | 17 | 7 | 11 | 16 | 9 | 11 |
A _{4} | 15 | 13 | 10 | 14 | 13 | 3 | 14 |
These results illustrate that: a) the more hydrophobic amino acids a sequence has, the more H-H interactions it forms; b) a sequence with more H-H interactions tends to be more stable when a single amino acid is mutated; c) mutating a hydrophobic amino acid tends to alter the protein structure substantially.
From these observations, we conclude that a sequence with more hydrophobic amino acids is less susceptible to single amino acid mutation.
Double neighbouring amino acids mutation
Amino acids do not work alone; multiple amino acids coordinate to maintain stability and perform function. Our in silico simulation allows us to go beyond single amino acid mutation and explore the combinatorial effect of amino acid mutations. In this section, we explore the effect of double neighbouring amino acid mutations (two adjacent amino acids are mutated) on protein folding. Double neighbouring amino acid mutations are classified as HH → PP, PP → HH, HP → PH and PH → HP.
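The four classes can be enumerated directly from the sequence. The sketch below uses a hypothetical helper, `neighbouring_double_mutations`, that flips every pair of adjacent residues and counts how many mutants fall into each class; folding the mutants is left to a separate solver.

```python
from collections import Counter


def neighbouring_double_mutations(seq):
    """Return class counts and (1-based start, mutant) for adjacent pairs."""
    counts = Counter()
    mutants = []
    for pos in range(len(seq) - 1):
        pair = seq[pos:pos + 2]
        # Flip both residues of the pair: H becomes P and P becomes H.
        swapped = "".join("P" if c == "H" else "H" for c in pair)
        counts[pair + "->" + swapped] += 1
        mutants.append((pos + 1, seq[:pos] + swapped + seq[pos + 2:]))
    return counts, mutants


seq_b = "HHHHPPHHHHHHHPPPH"  # sequence B written out in full
counts, mutants = neighbouring_double_mutations(seq_b)
print(dict(counts))
```

For sequence B this gives 9, 3, 2 and 2 mutants for the four classes, 16 in total, one for each pair of adjacent positions in the 17-residue chain.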
Double amino acids mutation results for sequence B
D-value | HH → PP (9) ^{a} | PP → HH (3) ^{a} | HP → PH (2) ^{a} | PH → HP (2) ^{a} |
---|---|---|---|---|
-2 | 4(H,M,T) ^{b} | 0 | 0 | 0 |
-1 | 5 | 0 | 1 | 0 |
0 | 0 | 0 | 0 | 0 |
+1 | 0 | 1 | 1 | 1 |
+2 | 0 | 2(T) | 0 | 1(T) |
Double amino acids mutation results for sequence C
D-value | HH → PP (2) | PP → HH (3) | HP → PH (7) | PH → HP (7) |
---|---|---|---|---|
-3 | 1(H) | 0 | 0 | 0 |
-2 | 1(T) | 0 | 0 | 0 |
-1 | 0 | 0 | 3 | 3 |
0 | 0 | 0 | 4 | 4 |
+1 | 0 | 2 | 0 | 0 |
+2 | 0 | 1(H) | 0 | 0 |
Double amino acids mutation results for sequence D
D-value | HH → PP (4) | PP → HH (11) | HP → PH (4) | PH → HP (5) |
---|---|---|---|---|
-2 | 4(H,M,T) | 0 | 0 | 0 |
-1 | 0 | 0 | 0 | 3 |
0 | 0 | 0 | 2 | 1 |
+1 | 0 | 5 | 1 | 1 |
+2 | 0 | 6(M,T) | 1(T) | 0 |
Tables 14, 15 and 16 record the variation in H-H interactions and the positions of the pivotal double amino acids. From these tables, we conclude that: a) HH → PP and PP → HH mutations always changed the number of H-H interactions, whereas PH → HP and HP → PH mutations changed it only in some cases; b) HH → PP always decreased and PP → HH always increased the number of H-H interactions; c) the effect of double adjacent mutations of type HP → PH or PH → HP was limited; d) pivotal double adjacent mutations tend to be located at the head or tail of the sequence.
Double arbitrary amino acids mutation
We continued to explore the combinatorial effect of amino acid mutations. In this section, we examine the effect on protein folding of double amino acid mutations at arbitrary distance. The amino acid mutations are classified as HH → PP, PP → HH, HP → PH and PH → HP. We simulated sequence C in Table 7, which has 20 amino acids: 10 hydrophobic and 10 hydrophilic. We folded the conformations of all mutated sequences.
Double arbitrary amino acids mutation results for sequence C
Combination | O-num | V-num | V-rate |
---|---|---|---|
H+H | 45 | 45(↓) | 100% |
P+P | 45 | 43(↑) | 96% |
H+P | 50 | 29(↑ ↓) | 58% |
P+H | 50 | 18(↑ ↓) | 38% |
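The O-num column can be reproduced by counting position pairs according to the residue types at the earlier and later position. The sketch assumes the simulated chain is the 20-residue sequence HPHP2H2PHP2HPH2P2HPH from the table of sequences with different lengths, whose composition (10 H and 10 P) matches the description; the function name `pair_classes` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_classes(seq):
    """Count position pairs (i < j) by the residue types at i and j."""
    return Counter(seq[i] + "+" + seq[j]
                   for i, j in combinations(range(len(seq)), 2))


seq = "HPHPPHHPHPPHPHHPPHPH"  # 20 residues, 10 H and 10 P
print(dict(pair_classes(seq)))
```

The counts come out as 45 pairs each for H+H and P+P and 50 each for H+P and P+H, 190 in total, which is every possible pair of positions in a 20-residue chain.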
Combination D-value and pivotal amino acids results for sequence C
Combination D-value and pivotal amino acids | ||
---|---|---|
H+H | -4 | H _{1} H _{3},H _{18} H _{20} |
P+P | +2 | P _{4} P _{5},P _{4} P _{13},P _{5} P _{8},P _{8} P _{17},P _{10} P _{13},P _{11} P _{16},P _{13} P _{16},P _{16} P _{19} |
H+P | -2 | H _{3} P _{10},H _{3} P _{16},H _{3} P _{19},H _{6} P _{19} |
P+H | -3 | P _{10} H _{18} |
From the above observations, we conclude that: a) protein stability is more sensitive to double arbitrary amino acid mutations; b) mutating two amino acids with the same property (both hydrophobic or both hydrophilic) destabilizes the structure more than mutating two amino acids with different properties; c) most of the sensitive combinations are at the head or tail of the sequence.
Discussion
As many research results indicate, the HP model is very useful for modelling protein properties, even though it is simple and has many disadvantages. It captures the main difficulty of the real-world problem. The HP model has been applied to investigate ligand binding to proteins [20]. The distinct influences of function, folding, and structure on the evolution of HP model proteins have been studied by exhaustive enumeration of conformation and sequence space on a two-dimensional lattice, which cost four weeks of computation [21]. These studies all show that our effort to fold the HP chain on the 3D lattice with a hybrid method is necessary and important.
We also propose to use the HP model to probe protein stability, and it serves as a very efficient tool here. Reducing the 20 amino acids to the two types H and P dramatically reduces the number of possible mutation patterns. In particular, double mutations can be studied by considering only four combinations. The insights from the HP model can serve as novel hypotheses to guide experiments. We also need to point out that the protein stability results and conclusions depend heavily on the optimal solution of the 3D HP model. We have demonstrated the results on some small-scale problems; generalizing the study will require further improvement of the hybrid algorithm.
In our study, the computational experiments show that the new hybrid algorithm is efficient for short sequences. As the input grows, more sub-optimal solutions appear and the minimal-energy configurations become more difficult to find. Large-scale HP models remain a real challenge, because the conformation space grows rapidly as the chain length increases. One possible approach is to introduce a divide-and-conquer strategy. We can also consider combining the method with other algorithms or starting from a good initial point chosen on biological grounds. Devising such an algorithm for large proteins will be our future work.
Conclusion
In this paper, we studied the protein structure prediction problem on the 3D square lattice. We summarize the findings of this work as follows. First, we formulated the protein structure prediction problem on the 3D lattice as a combinatorial optimization problem; second, we enhanced the basic PSO algorithm to deal with discrete optimization problems; third, we proposed a novel hybrid method (TPPSO^{2}) and demonstrated its feasibility by simulation; fourth, we derived some interesting insights into protein stability from single and double amino acid mutation perturbations.
Declarations
Acknowledgements
The open access fee was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB13000000). This work is also supported by National Natural Fund under grant number 11601288, 11422108, 61621003, and 61304178.
Availability of data and materials
Not applicable
About this supplement
This article has been published as part of BMC Systems Biology Volume 11 Supplement 4, 2017: Selected papers from the 10th International Conference on Systems Biology (ISB 2016). The full contents of the supplement are available online at https://bmcsystbiol.biomedcentral.com/articles/supplements/volume-11-supplement-4.
Authors’ contributions
YG developed and implemented the methods. YW and FT participated in the development of the methods. YW conceived the protein stability experiment. All authors drafted, read, and approved the final manuscript.
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
- Dill KA, Bromberg S, Yue K, et al. Principles of protein folding: a perspective from simple exact models. Protein Sci. 1995; 4(4):561–602.
- Guo YZ, Wu Z, Wang Y, et al. Extended particle swarm optimization method for folding protein on triangular lattice. IET Sys Bio. 2016; 10(1):30–33.
- Nardelli M, Tedesco L, Bechini A. Cross-lattice Behavior of General ACO Folding for Proteins in the HP Model. Proc of the 28th Annual ACM Symp on Appl Comput. 2013; 18(22):1320–1327.
- Zhang Y, Wu L. Artificial Bee Colony for Two Dimensional Protein Folding. Adv Electr Eng Syst. 2012; 1(1):19–23.
- Zhang XS, Wang Y, Zhan ZW, Wu LY, Chen LN. Exploring protein’s optimal HP configurations by self-organizing mapping. J Bioinf Comput Biol. 2005; 3(02):385–400.
- García-Martínez JM, Garzón EM, Cecilia JM, et al. An efficient approach for solving the HP protein folding problem based on UEGO. J Math Chem. 2015; 53(3):794–806.
- Lin CJ, Su SC. Protein 3D HP model folding simulation using a hybrid of genetic algorithm and particle swarm optimization. Int J Fuzzy Syst. 2011; 13:140–147.
- Benítez CMV, Lopes HS. Protein structure prediction with the 3D-HP side-chain model using a master-slave parallel genetic algorithm. J Braz Comput Soc. 2010; 16:69–78.
- Tsay JJ, Su SC. An effective evolutionary algorithm for protein folding on 3D FCC HP model by lattice rotation and generalized move sets. Proteome Sci. 2013; 11(1):1.
- Eberhart RC, Kennedy J. A new optimizer using particle swarm theory. Proc of the sixth Int Symp on micro Mach Hum Sci. 1995; 1:39–43.
- Anfinsen CB. Principles that govern the folding of protein chains. Sci. 1973; 181(4096):223–230.
- Glover F. Tabu search-part I. J Comput. 1989; 1(3):190–206.
- Guo YZ, Feng EM. The simulation of the three-dimensional lattice hydrophobic-polar protein folding. J Chem Phys. 2006; 125(23):234703.
- Liu J, Li G, Yu J, et al. Heuristic energy landscape paving for protein folding problem in the three-dimensional HP lattice model. Comput Biol Chem. 2012; 28:17–26.
- Pascal L, Stefan G, Abdullah K, Valentina C, Paul JB, Christian M, Mering C, Paola P. Cell-wide analysis of protein thermal unfolding reveals determinants of thermostability. Science. 2017; 355(6327):812.
- Parthiban V, Michael MG, Schomburg D. CUPSAT: prediction of protein stability upon point mutation. Nucleic Acids Res. 2006; 34(suppl 2):W239-W242.
- Cheng J, Randall A, Baldi P. Prediction of Protein Stability Changes for Single Site Mutations Using Support Vector Machines. Proteins: Struct Funct Bioinf. 2006; 62:1125–32.
- Shortle D, Stites WE, Meeker AK. Contributions of the large hydrophobic amino acids to the stability of staphylococcal nuclease. Biochem. 1990; 29:8033–41.
- Perl D, Mueller U, Heinemann U, Schmid FX. Two exposed amino acid residues confer thermostability on a cold shock protein. Nat Struct Bio. 2000; 7(5):380–3.
- Miller DW, Dill KA. Ligand binding to proteins: The binding landscape model. Prot Sci. 1997; 6(10):2166–79.
- Blackburne BP, Hirst JD. Evolution of functional model proteins. J Chem Phys. 2001; 115(4):1935–42.