- Open Access
Large-scale prediction of protein ubiquitination sites using a multimodal deep architecture
© The Author(s). 2018
- Published: 22 November 2018
Ubiquitination, also called “lysine ubiquitination”, occurs when a ubiquitin is attached to lysine (K) residues in target proteins. As one of the most important post-translational modifications (PTMs), it plays a significant role not only in protein degradation but also in other cellular functions. Thus, systematic analysis of the ubiquitination proteome is an appealing and challenging research topic. The existing methods for identifying protein ubiquitination sites can be divided into two kinds: mass spectrometry and computational methods. Mass spectrometry-based experimental methods can discover ubiquitination sites from eukaryotes, but are time-consuming and expensive. Therefore, it is a priority to develop computational approaches that can effectively and accurately identify protein ubiquitination sites.
The existing computational methods usually require feature engineering, which may lead to redundant and biased representations. Deep learning, in contrast, is able to extract underlying characteristics from large-scale training data via multiple-layer networks and non-linear mapping operations. In this paper, we propose a deep architecture with multiple modalities to identify ubiquitination sites. First, guided by prior and biological knowledge, we encoded protein sequence fragments around candidate ubiquitination sites into three modalities, namely raw protein sequence fragments, physico-chemical properties and sequence profiles, and designed different deep network layers to extract hidden representations from each of them. Then, the deep representations generated for the three modalities were merged to build the final model. We evaluated our algorithm on PLMD, the largest available database of protein ubiquitination sites, and achieved 66.4% specificity, 66.7% sensitivity, 66.43% accuracy, and an MCC value of 0.221. A number of comparative experiments also indicated that our multimodal deep architecture outperformed several popular protein ubiquitination site prediction tools.
The results of the comparative experiments validated the effectiveness of our deep network and showed that our method outperformed several popular protein ubiquitination site prediction tools. The source code of the proposed method is available at https://github.com/jiagenlee/deepUbiquitylation.
- Protein ubiquitination site
- Multiple modalities
- Deep learning
- Convolution neural network
- Deep neural network
Ubiquitin, discovered by Goldstein et al. in 1975, is a small protein consisting of 76 amino acids. Under the action of E1 activation, E2 conjugation and E3 ligation enzymes, ubiquitin may conjugate to a substrate protein on a certain lysine residue [3, 4]. Ubiquitination is one of the most important reversible protein post-translational modifications (PTMs) and plays significant roles in protein degradation and other cellular functions [5, 6]. The ubiquitination system is also associated with immune response, cellular transformation and inflammatory response.
Owing to the importance and complexity of ubiquitination, recognizing potential ubiquitination sites contributes to a deeper understanding of protein regulation and molecular mechanisms. Traditional experimental techniques such as ChIP-chip analysis and mass spectrometry are time-consuming, costly and laborious, so computational approaches that can effectively and accurately identify protein ubiquitination sites are urgently needed.
Some computational methods have been developed for the identification of protein ubiquitination sites. Huang et al. developed a predictor called UbiSite, which fused multiple features such as amino acid composition (AAC), positional weighted matrix (PWM), position-specific scoring matrix (PSSM), solvent-accessible surface area (SASA) and MDDLogo-identified substrate motifs into a two-layer Support Vector Machine (SVM) model to predict protein ubiquitination sites. Nguyen et al. also applied SVM to build their prediction model, using three features: amino acid composition, evolutionary information and amino acid pair composition. The motif discovery tool MDDLogo was also used in their predictor. Qiu et al. established the tool iUbiq-Lys, which adopted sequence evolutionary information and a gray system model to identify protein ubiquitination sites. Chen et al. constructed UbiProber, which combines sequence information, physico-chemical properties and amino acid composition with SVM; they trained a general model for a eukaryotic proteome and species-specific models for three species-specific proteomes. ESA-UbiSite, proposed by Wang et al., also introduced physico-chemical properties into SVM, but applied an evolutionary screening algorithm (ESA) to select an effective negative dataset from the whole dataset.
These existing machine learning approaches perform well on small-scale data. Nevertheless, large-scale protein ubiquitination site prediction still poses several challenges: (1) Weakness of artificially designed features. All existing methods used feature engineering in the feature extraction stage, which relies on expert knowledge and usually leads to incomplete and biased feature vectors [13, 14]. (2) Heterogeneity among features. Almost all existing prediction tools fuse multiple features to improve accuracy, but neglect the intrinsic heterogeneity among them. (3) Unbalanced distributions between positive and negative samples. In the whole proteome, only a small fraction of lysine residues can be attached to ubiquitin, which makes protein ubiquitination site prediction an extremely unbalanced problem. Existing methods do not perform well in identifying potential protein ubiquitination sites under such unbalanced circumstances. Deep learning, a popular machine learning technique for large-scale data, is considered promising for solving these problems. It provides multiple-layer networks and non-linear mapping operations to extract deep characteristics and reveal their internal associations, especially on large-scale data. A deep-learning framework detects potentially complex patterns from raw input signals and generates homogeneous deep representations for classification tasks. A variety of deep learning networks have been applied successfully to genomic and proteomic analyses [16–18]. However, deep learning has yet to be applied to the prediction of protein ubiquitination sites.
In this paper, we established a multimodal deep architecture using three kinds of protein modalities: raw protein sequence fragments, selected physico-chemical properties of amino acids, and the corresponding position-specific scoring matrix (PSSM). Within the architecture, we built multiple convolution layers to detect information from the raw protein sequence representations, combined the physico-chemical properties of amino acids through stacked fully connected layers, and used further convolution layers to explore the evolutionary profile around potential ubiquitination sites. The three sub-nets were trained separately so that the modalities were transformed into more compatible representations, which were then combined to predict unseen protein ubiquitination sites. As far as we know, this is the first published work that applies a deep architecture to protein ubiquitination site prediction.
Large scale dataset collection
To implement large-scale prediction of ubiquitination sites, we collected data from version 3.0 of the Protein Lysine Modification Database (PLMD), which consists of 25,103 proteins with 121,742 ubiquitination sites. PLMD is a specialized dataset containing 20 types of protein lysine modifications, and extends the CPLA 1.0 and CPLM 2.0 datasets. To the best of our knowledge, this is probably the largest available protein ubiquitination database, and it has not been used in any other study of protein ubiquitination site prediction. To avoid overestimation caused by homologous sequences, we used the CD-HIT tool to filter similar protein sequences at a 40% similarity threshold across all data, and finally extracted 60,879 annotated protein ubiquitination sites from 17,406 proteins. These protein sequences were then divided by random partition into a training dataset and a testing dataset for constructing the prediction model. In total, there are 12,100 protein sequences with 54,586 ubiquitylated sites in the training dataset and 1345 proteins with 6293 ubiquitylated sites in the independent testing dataset.
Details of training dataset, validation dataset and independent testing dataset (columns: number of sequences, number of positive data, number of negative data; one entry: random partitioning in each training iteration)
Encoding of protein fragments
One hot vector: every sample of m amino acids was encoded as an m × k two-dimensional (2D) matrix, in which each residue is represented by a k-dimensional zero vector with a single one at the index of the corresponding amino acid. We assigned 0.05 to the positions whose left or right neighboring amino acids fall outside the protein sequence and therefore cannot fill the window. Each protein fragment was thus mapped into a sparse, exclusive coding that preserves its relative position information.
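The one hot encoding described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes k = 20 standard amino acids in an arbitrary alphabetical order, an odd window size m centred on the candidate lysine, and that a padded position is filled with 0.05 across its whole row. The names `AMINO_ACIDS`, `extract_window` and `one_hot_encode` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: k = 20 standard residues
PAD_VALUE = 0.05  # value the text assigns to out-of-range positions


def extract_window(sequence, site, m):
    """Return the m residues centred on 0-based index `site`.

    Positions that fall before the start or after the end of the
    sequence are returned as None so they can be padded later.
    """
    if m % 2 == 0:
        raise ValueError("window size m must be odd")
    half = m // 2
    return [
        sequence[i] if 0 <= i < len(sequence) else None
        for i in range(site - half, site + half + 1)
    ]


def one_hot_encode(window):
    """Encode a window as an m x k matrix (list of m rows of length k)."""
    k = len(AMINO_ACIDS)
    matrix = []
    for residue in window:
        if residue is None:
            # Out-of-range position: fill the whole row with 0.05.
            matrix.append([PAD_VALUE] * k)
            continue
        row = [0.0] * k
        index = AMINO_ACIDS.find(residue)
        if index >= 0:  # assumption: non-standard residues stay all-zero
            row[index] = 1.0
        matrix.append(row)
    return matrix


# Toy example: lysine at index 1 of a 4-residue sequence, window size 5.
fragment = extract_window("MKAL", site=1, m=5)
encoded = one_hot_encode(fragment)
```

In this toy call the first row of `encoded` is a padded row, because the window extends one position before the start of the sequence, and the third row is the one hot vector for the central lysine.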
Physico-chemical properties: Prior studies [15, 20] demonstrated strong correlations between the physico-chemical properties of amino acids and ubiquitination sites. Many studies have introduced physico-chemical properties into the prediction of diverse protein post-translational modification sites, such as acetylation, phosphorylation and sulfation. The physico-chemical properties of each amino acid can be found in the AAindex database. It records 544 physico-chemical properties, which would lead to an excessive number of model parameters in a deep architecture. To reduce redundant information and control model complexity, we selected only the top thirteen physico-chemical properties that have been validated in the literature as strongly related to ubiquitination, and an m × 13 2D matrix was then formed as another encoding modality for each sample. The details of the selected physico-chemical properties are shown in Table 2.
PSSM Profile: In this paper, we also employed the PSSM to represent the evolutionary profile of the protein sequence. We used the non-redundant Swiss-Prot database as the search source and generated the raw PSSMs of all protein sequences with the Basic Local Alignment Search Tool (BLAST) using the parameters “-j 3 -h 0.001”. In a raw PSSM, a 20-dimensional vector approximately describes the preference for each of the 20 types of amino acids at each position of the protein sequence. To focus on the potential ubiquitination sites, we extracted the PSSM fragment corresponding to the window size m from the PSSM of the whole protein sequence, which records the position-specific evolutionary profile of the protein fragment. Hence, we obtained an m × 20 2D matrix as the PSSM encoding for each protein fragment.
The selected physico-chemical properties
Atom-based hydrophobic moment
Entropy of formation
Flexibility parameter for two rigid neighbors
Average accessible surface area
Apparent partition energies calculated from Janin index
Percentage of exposed residues
Transfer free energy
Average gain in surrounding hydrophobicity
Normalized flexibility parameters, average
Average non-bonded energy per atom
Multimodal deep architecture construction
To precisely detect implicit sequence-type features, we used 3 hidden layers of one-dimensional Convolution Neural Network (1D CNN) to process the one hot vector. Because the one hot vector is inherently sparse, a main function of the CNN is to transform it into a set of feature maps that capture sequential information. When this hierarchical convolution process ended, all newly generated feature maps were fed into three fully connected dense layers, which produce lower-dimensional feature representations. We found this structure to be effective for detecting sequential feature representations.
For physico-chemical properties, a Deep Neural Network (DNN) with three dense layers was introduced to generate their deep representations. Physico-chemical properties reflect the characteristics of proteins from various perspectives, so a fully connected DNN structure that interconnects all these factors was used to capture their joint effects and useful combinations.
For the PSSM input modality, we mainly applied a 1D CNN with 3 hidden layers to detect potentially informative evolutionary descriptions among the amino acids of the protein fragment. Unlike in the one hot vector sub-net, the transposed PSSM matrix was also fed into another three-layer 1D CNN to obtain deep evolutionary characterization across different sequence positions. The feature maps from the two 1D CNNs were then merged to produce complete PSSM representations through three subsequent fully connected dense layers.
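The two PSSM views described above (the m × 20 fragment and its transposed 20 × m form) can be prepared as in the sketch below. It is an illustration under stated assumptions, not the authors' code: the whole-protein PSSM is taken as a list of rows with 20 scores each, the window size m is odd, and positions beyond the sequence ends are padded with zeros, a choice the text does not specify. The function names are hypothetical.

```python
def pssm_window(pssm, site, m, pad_value=0.0):
    """Cut an m x 20 fragment centred on 0-based row `site`.

    `pssm` is a list of L rows, each holding 20 scores. Rows outside
    the protein are filled with `pad_value` (assumption: zero padding).
    """
    if m % 2 == 0:
        raise ValueError("window size m must be odd")
    half = m // 2
    width = len(pssm[0])
    return [
        list(pssm[i]) if 0 <= i < len(pssm) else [pad_value] * width
        for i in range(site - half, site + half + 1)
    ]


def transpose(matrix):
    """Swap rows and columns, turning an m x 20 matrix into 20 x m."""
    return [list(column) for column in zip(*matrix)]


# Toy PSSM for a 4-residue protein: row i holds the score i twenty times.
toy_pssm = [[float(i)] * 20 for i in range(4)]
position_view = pssm_window(toy_pssm, site=0, m=3)  # m x 20
residue_view = transpose(position_view)             # 20 x m
```

The first view lets a convolution slide along sequence positions, while the transposed view lets it slide along the 20 amino-acid columns; each would feed its own 1D CNN branch before the two sets of feature maps are merged.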
The hyper-parameters of the proposed deep architecture (rows include: one hot vector, physico-chemical properties)
To keep training on balanced data, we introduced the following training strategy. Considering the large number of model parameters in the three subnets, each subnet was first trained on its own to optimize its weights, and these trained weights were then reloaded as the initialization of the whole multi-modal deep architecture. In the subsequent training of the whole network, all weights, including those of the last merged layer, were fine-tuned until training converged. We implemented the training procedure of both the whole deep architecture and the subnets following a bootstrapping strategy. Let pos and neg denote the numbers of positive and negative samples, respectively. Because the number of positive samples is relatively small, pos negative samples were randomly chosen in each bootstrapping iteration to build a balanced training dataset together with all positive samples [28, 29]. All negative samples were therefore divided into N = ⌊neg/pos⌋ bins, and our deep-learning network was trained N times. An early stopping rule was used to control the number of epochs: training stopped automatically once the monitored metric had stopped improving for a preset number of epochs (50 in this study).
We built this deep architecture using Keras 1.1.0 with Theano 0.9, and ran it on a GTX1080Ti graphics processing unit (GPU). Thanks to GPU computation and the absence of feature engineering, predicting the ubiquitination sites of a protein took a few minutes on average, whereas training the model on 12,100 protein sequences took about 2 h. However, the training process only needs to be carried out once.
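The bootstrapping scheme above, with N = ⌊neg/pos⌋ balanced training sets, can be sketched as follows. This is a hedged illustration: the function name `balanced_bootstrap_sets` and the fixed random seed are assumptions, and the sketch simply drops the leftover negatives that do not fill a complete bin, which the text does not discuss.

```python
import random


def balanced_bootstrap_sets(positives, negatives, seed=0):
    """Yield N = floor(neg / pos) balanced (positives, negative_bin) pairs.

    Negatives are shuffled once and cut into disjoint bins of size pos,
    so every bin is paired with the full set of positive samples.
    """
    pos, neg = len(positives), len(negatives)
    if pos == 0:
        raise ValueError("at least one positive sample is required")
    n_bins = neg // pos
    shuffled = list(negatives)
    random.Random(seed).shuffle(shuffled)
    for b in range(n_bins):
        negative_bin = shuffled[b * pos:(b + 1) * pos]
        yield list(positives), negative_bin


# Toy example: 3 positives and 10 negatives give N = 3 balanced sets.
toy_pos = ["p0", "p1", "p2"]
toy_neg = ["n%d" % i for i in range(10)]
training_sets = list(balanced_bootstrap_sets(toy_pos, toy_neg))
```

Each yielded pair would be used for one training pass of the network, so the model sees every positive sample N times but each negative sample at most once.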
Performance of the multimodal deep architecture
Fig. 2 shows that when the window size reached 49, the three modalities achieved accuracies comparable to those of the other candidate window sizes. This finding differs from some existing studies [8, 11], which implies that our deep architecture needs longer sequence fragments to capture potential long-distance information.
Benefiting from the data-driven way of combining modalities, the whole multi-modal network achieved better performance than any uni-modal subnet. The AUC (area under the ROC curve) and mean precision (area under the precision-recall curve) of the multi-modal deep network reached 0.73 and 0.24, as shown in Fig. 3. Because the three subnets were pre-trained, good weights for the one hot vector, physico-chemical property and PSSM profile subnets were found in advance of their combination. The subsequent supervised fine-tuning then yielded suitable weights for the whole multi-modal deep architecture. Figure 3 also indicates that the one hot vector outperformed the other two input modalities, which suggests that a deep learning architecture can detect effective features hidden in raw protein sequences.
Comparisons with other classifiers
Comparative results with SVM classifier and Random Forest (classifiers compared include our deep architecture; input modalities include one hot vector)
Table 4 indicates that our deep architecture was superior to the other models. The SVM and random forest models using a single modality generally obtained high specificity and low sensitivity. We conclude that these traditional machine learning approaches are incapable of generating discriminative features from raw inputs. This is why existing tools do not take raw sequence fragments and properties as input, but first transform these modalities into meaningful feature vectors, e.g. amino acid composition, for modeling. In contrast, our deep architecture is able to detect useful information from raw sequence fragments without feature engineering. The same held in the multi-modal experiments across the different models, which suggests that our deep architecture carries out multi-modal fusion in a more conducive way. The overall estimator, the Matthews correlation coefficient (MCC), was much lower for the traditional machine learning models than for our architecture, which indicates from another perspective that our bootstrapping training strategy may strengthen the generalization of our architecture on an unbalanced training dataset.
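For reference, the sensitivity, specificity, accuracy and MCC values discussed above follow the standard confusion-matrix definitions, which can be computed as in this sketch. The function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
import math


def _safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (assumption)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Compute standard binary classification metrics from raw counts."""
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": _safe_div(tp, tp + fn),  # true positive rate
        "specificity": _safe_div(tn, tn + fp),  # true negative rate
        "accuracy": _safe_div(tp + tn, tp + tn + fp + fn),
        "mcc": _safe_div(tp * tn - fp * fn, mcc_denominator),
    }


# Toy counts: a perfect classifier and a coin-flip classifier.
perfect = binary_metrics(tp=10, tn=10, fp=0, fn=0)
chance = binary_metrics(tp=50, tn=50, fp=50, fn=50)
```

Because MCC uses all four cells of the confusion matrix, it stays informative on unbalanced data, where a model with high specificity and low sensitivity can still report a high accuracy.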
Comparisons with other protein ubiquitylation site prediction tools
Comparison of independent testing performance with other ubiquitination site prediction tools (rows include: our deep architecture)
Figure 5 shows that our model had evident overall advantages in terms of the ROC and precision-recall curves, which indicates the reliability of the deep architecture on large-scale protein ubiquitination site data. It is worth noting that at very low recall, UbiSite achieved the highest precision among the three methods, probably because UbiSite introduces more prior knowledge from positive training samples into its classification model. It divides the positive training samples into 12 subgroups according to the clustering of significant substrate motifs produced by the MDDLogo tool. It then trains 12 sub-models, each using one subgroup of positive training samples and the same number of negative samples, to implement a boosting classification. Such classification models emphasize the feature patterns of positive samples and tend to detect potential homologous protein fragments that are highly similar to the positive training samples. Consequently, UbiSite achieved better precision than our deep architecture only when the recall was less than 3.89%.
The performance of our deep architecture on different datasets
Even though our deep learning architecture improved the performance of protein ubiquitination site prediction on large-scale data, there is still room for improvement. In the future, we would like to continue studying optimization strategies for guiding the selection of deep learning hyper-parameters, and to cooperate with biologists to make the model more biologically interpretable and reliable.
In this paper, a multimodal deep architecture was proposed to predict protein ubiquitination sites at large scale. Three different modalities, namely one hot vector, physico-chemical properties and PSSM, were employed to build the prediction model. Comparative results on PLMD, the largest available protein ubiquitination site database, validated the effectiveness of our method. The t-SNE visualization shows that our deep architecture can generate powerful discriminative features to distinguish ubiquitination sites from non-ubiquitination sites in protein sequences. The success of our method is mainly due to the data-driven feature detection in deep learning, the multimodal fusion of deep representations, and the bootstrapping algorithm. Our source code is freely available at https://github.com/jiagenlee/deepUbiquitylation.
The authors would like to thank Dong Xu and Duolin Wang for their pioneering work and helpful suggestions concerning the work.
This research is partially supported by National Natural Science Foundation of China (61403077, 61802057), China Postdoctoral Science Foundation (2017 M621192), the open project program of Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University under Grant No.93K172018K02, and the Jilin Scientific and Technological Development Program of China (20150101057JC, 20170520058JH, 20180520022JH). Publication costs were funded by the Jilin Scientific and Technological Development Program of China (20170520058JH).
Availability of data and materials
Model design and parameterization are fully detailed in the main text and supplementary information.
About this supplement
This article has been published as part of BMC Systems Biology Volume 12 Supplement 6, 2018: Selected articles from the IEEE BIBM International Conference on Bioinformatics & Biomedicine (BIBM) 2017: systems biology. The full contents of the supplement are available online at https://bmcsystbiol.biomedcentral.com/articles/supplements/volume-12-supplement-6.
FH conceived and supervised the project. RW, JGL and LLB were responsible for the design, computational analyses, and the implementation of the codes. DX revised the manuscript. XWZ drafted the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Goldstein G, Scheid M, Hammerling U, Schlesinger DH, Niall HD, Boyse EA. Isolation of a polypeptide that has lymphocyte-differentiating properties and is probably represented universally in living cells. Proc Natl Acad Sci U S A. 1975;72(1):11–5.
- Wilkinson KD. The discovery of ubiquitin-dependent proteolysis. Proc Natl Acad Sci U S A. 2005;102(43):15280–2.
- Welchman RL, Gordon C, Mayer RJ. Ubiquitin and ubiquitin-like proteins as multifunctional signals. Nat Rev Mol Cell Biol. 2005;6(8):599–609.
- Pickart CM, Eddins MJ. Ubiquitin: structures, functions, mechanisms. Biochim Biophys Acta. 2004;1695(1–3):55–72.
- Peng J, Schwartz D, Elias JE, Thoreen CC, Cheng D, Marsischky G, Roelofs J, Finley D, Gygi SP. A proteomics approach to understanding protein ubiquitination. Nat Biotechnol. 2003;21(8):921–6.
- Hicke L, Schubert HL, Hill CP. Ubiquitin-binding domains. Nat Rev Mol Cell Biol. 2005;6(8):610–21.
- Schwartz AL, Ciechanover A. The ubiquitin-proteasome pathway and pathogenesis of human diseases. Annu Rev Med. 1999;50:57–74.
- Huang CH, Su MG, Kao HJ, Jhong JH, Weng SL, Lee TY. UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines. BMC Syst Biol. 2016;10(Suppl 1):6.
- Nguyen VN, Huang KY, Huang CH, Lai KR, Lee TY. A new scheme to characterize and identify protein ubiquitination sites. IEEE/ACM Trans Comput Biol Bioinform. 2017;14(2):393–403.
- Qiu WR, Xiao X, Lin WZ, Chou KC. iUbiq-Lys: prediction of lysine ubiquitination sites in proteins by extracting sequence evolution information via a gray system model. J Biomol Struct Dyn. 2015;33(8):1731–42.
- Chen X, Qiu JD, Shi SP, Suo SB, Huang SY, Liang RP. Incorporating key position and amino acid residue features to identify general and species-specific ubiquitin conjugation sites. Bioinformatics. 2013;29(13):1614–22.
- Wang JR, Huang WL, Tsai MJ, Hsu KT, Huang HL, Ho SY. ESA-UbiSite: accurate prediction of human ubiquitination sites by identifying a set of effective negatives. Bioinformatics. 2017;33(5):661–8.
- Yuan Y, Xun G, Jia K, Zhang A. A multi-view deep learning method for epileptic seizure detection using short-time Fourier transform. ACM; 2017.
- Yuan Y, Xun G, Jia K, Zhang A. A novel wavelet-based model for EEG epileptic seizure detection using multi-context learning. In: Hu XH, Shyu CR, Bromberg Y, Gao J, Gong Y, Korkin D, Yoo I, Zheng JH, editors. 2017 IEEE International Conference on Bioinformatics and Biomedicine; 2017. p. 694–9.
- Tung CW, Ho SY. Computational identification of ubiquitylation sites from protein sequences. BMC Bioinformatics. 2008;9:310.
- Xiong HY, Alipanahi B, Lee LJ, Bretschneider H, Merico D, Yuen RKC, Hua Y, Gueroussov S, Najafabadi HS, Hughes TR, et al. The human splicing code reveals new insights into the genetic determinants of disease. Science. 2015;347(6218).
- Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods. 2015;12(10):931–4.
- Alipanahi B, Delong A, Weirauch MT, Frey BJ. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol. 2015;33(8):831.
- Huang Y, Niu B, Gao Y, Fu L, Li W. CD-HIT suite: a web server for clustering and comparing biological sequences. Bioinformatics. 2010;26(5):680–2.
- Radivojac P, Vacic V, Haynes C, Cocklin RR, Mohan A, Heyen JW, Goebl MG, Iakoucheva LM. Identification, analysis, and prediction of protein ubiquitination sites. Proteins. 2010;78(2):365–80.
- Kawashima S, Ogata H, Kanehisa M. AAindex: amino acid index database. Nucleic Acids Res. 1999;27(1):368.
- Liu H, Sun J, Zhang H. Post-processing of associative classification rules using closed sets. Expert Syst Appl. 2009;36(3):6659–67.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402.
- Yu Z, Sun T, Sun H, Yang F. Research on combinational forecast models for the traffic flow. Math Probl Eng. 2015.
- Yao M, Qi M, Yi Y, Shi Y, Kong J. An improved information hiding method based on sparse representation. Math Probl Eng. 2015.
- Wang J, Zhang B, Qi M, Kong J. Linear discriminant projection embedding based on patches alignment. Image Vis Comput. 2010;28(12):1624–36.
- Yi J-H, Wang J, Wang G-G. Improved probabilistic neural networks with self-adaptive strategies for transformer fault diagnosis problem. Advances in Mechanical Engineering. 2016;8(1).
- Wang D, Zeng S, Xu C, Qiu W, Liang Y, Joshi T, Xu D. MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction. Bioinformatics. 2017;33(24):3909–16.
- Pan X, Shen HB. RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach. BMC Bioinformatics. 2017;18(1):136.
- Yao Y, Rosasco L, Caponnetto A. On early stopping in gradient descent learning. Constr Approx. 2007;26(2):289–315.
- Tung C-W. Prediction of pupylation sites using the composition of k-spaced amino acid pairs. J Theor Biol. 2013;336:11–7.
- van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9:2579–605.
- Lee TY, Lin ZQ, Hsieh SJ, Bretana NA, Lu CT. Exploiting maximal dependence decomposition to identify conserved motifs from a group of aligned signal sequences. Bioinformatics. 2011;27(13):1780–7.
- Liu Z, Wang Y, Gao T, Pan Z, Cheng H, Yang Q, Cheng Z, Guo A, Ren J, Xue Y. CPLM: a database of protein lysine modifications. Nucleic Acids Res. 2014;42(Database issue):D531–6.
- Chen Z, Zhou Y, Song J, Zhang Z. hCKSAAP_UbSite: improved prediction of human ubiquitination sites by exploiting amino acid pattern and properties. Biochim Biophys Acta. 2013;1834(8):1461–7.