Integrating multiple networks for protein function prediction

Background High throughput techniques produce multiple functional association networks. Integrating these networks can enhance the accuracy of protein function prediction. Many algorithms have been introduced to generate a composite network, which is obtained as a weighted sum of individual networks. The weight assigned to an individual network reflects its benefit towards the protein functional annotation inference. A classifier is then trained on the composite network for predicting protein functions. However, since these techniques model the optimization of the composite network and the prediction tasks as separate objectives, the resulting composite network is not necessarily optimal for the follow-up protein function prediction. Results We address this issue by modeling the optimization of the composite network and the prediction problems within a unified objective function. In particular, we use a kernel target alignment technique and the loss function of a network based classifier to jointly adjust the weights assigned to the individual networks. We show that the proposed method, called MNet, can achieve a performance that is superior (with respect to different evaluation criteria) to related techniques using the multiple networks of four example species (yeast, human, mouse, and fly) annotated with thousands (or hundreds) of GO terms. Conclusion MNet can effectively integrate multiple networks for protein function prediction and is robust to the input parameters. Supplementary data is available at https://sites.google.com/site/guoxian85/home/mnet. The Matlab code of MNet is available upon request.


Evaluation Metrics
Here, we provide the definitions of the five evaluation metrics: MacroF1, MicroF1, Fmax, function-wise Area Under the Curve (fAUC), and protein-wise AUC (pAUC). These evaluation metrics are widely applied to evaluate the performance of multi-label learning algorithms and of protein function prediction [1,4,5].
Let $p_c$ and $r_c$ be the precision and recall of the c-th label, computed as
$$p_c = \frac{TP_c}{TP_c + FP_c}, \qquad r_c = \frac{TP_c}{TP_c + FN_c}$$
where $TP_c$, $FP_c$, and $FN_c$ are the numbers of true positives, false positives, and false negatives of the c-th function label.
MacroF1 is the average of the harmonic means of precision and recall over the different labels:
$$MacroF1 = \frac{1}{C}\sum_{c=1}^{C}\frac{2\,p_c\,r_c}{p_c + r_c}$$
where C is the number of labels. MacroF1 gives equal weight to each label, so it is more strongly affected by the performance on labels with fewer member proteins.
MicroF1 calculates the F1 measure on the predictions of all labels pooled together:
$$MicroF1 = \frac{2\sum_{c=1}^{C} TP_c}{2\sum_{c=1}^{C} TP_c + \sum_{c=1}^{C} FP_c + \sum_{c=1}^{C} FN_c}$$
MicroF1 does not give equal weight to each label: labels with more member proteins have a larger impact than labels with fewer member proteins, so MicroF1 is more strongly affected by the labels with more member proteins.
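As a concrete illustration, the two measures above can be sketched in plain Python. The function name and the list-of-rows input format are our own choices for the sketch, not part of the paper:

```python
def macro_micro_f1(y_true, y_pred):
    """MacroF1 and MicroF1 for multi-label predictions.

    y_true, y_pred: lists of rows; row i holds 0/1 indicators over the
    C function labels for the i-th protein.
    """
    C = len(y_true[0])
    tp, fp, fn = [0] * C, [0] * C, [0] * C
    for t_row, p_row in zip(y_true, y_pred):
        for c in range(C):
            if p_row[c] == 1 and t_row[c] == 1:
                tp[c] += 1
            elif p_row[c] == 1:
                fp[c] += 1
            elif t_row[c] == 1:
                fn[c] += 1
    # MacroF1: average the per-label F1 scores (equal weight per label).
    f1s = []
    for c in range(C):
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    macro_f1 = sum(f1s) / C
    # MicroF1: pool the counts over all labels first, then take
    # 2*TP / (2*TP + FP + FN), which equals 2PR/(P+R) on pooled counts.
    TP, FP, FN = sum(tp), sum(fp), sum(fn)
    micro_f1 = 2 * TP / (2 * TP + FP + FN) if TP else 0.0
    return macro_f1, micro_f1
```

Note how a label with a single member protein pulls MacroF1 down as much as a large label would, while it barely moves the pooled counts behind MicroF1.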
Fmax is a protein-centric evaluation metric used in CAFA [4]. It is the F-measure maximized over prediction thresholds $t \in [0, 1]$:
$$Fmax = \max_t \frac{2\,p(t)\,r(t)}{p(t) + r(t)}$$
where $p(t) = \frac{1}{m(t)}\sum_{i=1}^{m(t)} p_i(t)$ is the precision at threshold t, $p_i(t)$ is the precision on the i-th protein, $m(t)$ is the number of proteins on which at least one prediction was made above threshold t, and $r(t) = \frac{1}{u}\sum_{i=1}^{u} r_i(t)$ is the recall across the u proteins at threshold t.
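A minimal sketch of this computation, sweeping a fixed grid of thresholds (the grid, the function name, and the input format are our own assumptions):

```python
def fmax(scores, y_true, thresholds=None):
    """Protein-centric Fmax in the CAFA style.

    scores: rows of predicted likelihoods in [0, 1];
    y_true: matching 0/1 indicator rows. At threshold t, precision is
    averaged only over the m(t) proteins with at least one prediction
    at or above t, while recall is averaged over all u proteins.
    """
    if thresholds is None:
        thresholds = [i / 100.0 for i in range(101)]
    u = len(scores)
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for s_row, t_row in zip(scores, y_true):
            pred = [1 if s >= t else 0 for s in s_row]
            npred = sum(pred)
            ntrue = sum(t_row)
            ncorrect = sum(p and g for p, g in zip(pred, t_row))
            if npred:  # protein counts toward m(t) only if it has predictions
                precisions.append(ncorrect / npred)
            recalls.append(ncorrect / ntrue if ntrue else 0.0)
        if not precisions:
            continue  # m(t) = 0 at this threshold
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / u
        if p + r:
            best = max(best, 2 * p * r / (p + r))
    return best
```

Because the maximum is taken over thresholds, Fmax rewards a classifier that has some operating point with a good precision/recall trade-off, even if that point is not known in advance.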
fAUC first computes the AUC score for each label and then averages these AUC scores, giving equal weight to each label. Each AUC score is calculated as the area under the ROC curve, which is created by plotting the fraction of true positives out of the total actual positives against the fraction of false positives out of the total actual negatives. It measures the overall quality of the ranking induced by the classifier, instead of the quality of a single threshold in that ranking. pAUC first ranks all the labels for each test protein in descending order of the predicted likelihoods; it then varies the number of predicted labels from 1 to the total number of labels and computes the receiver operating characteristic curve by calculating the true positive rate and false positive rate for each number of predicted labels. It finally computes the area under the curve of all labels to evaluate the prediction [6].
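The two averaged-AUC views can be sketched with the rank-based (Mann-Whitney) form of AUC, which equals the area under the ROC curve; all names below are our own:

```python
def auc(pos_scores, neg_scores):
    """Rank-based AUC: probability that a positive outranks a negative
    (ties count 0.5), which equals the area under the ROC curve."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def f_auc(scores, y_true):
    """fAUC: one AUC per label over all proteins, averaged with
    equal weight per label (labels with no positives or no negatives
    are skipped, since their AUC is undefined)."""
    C = len(y_true[0])
    aucs = []
    for c in range(C):
        pos = [row[c] for row, g in zip(scores, y_true) if g[c] == 1]
        neg = [row[c] for row, g in zip(scores, y_true) if g[c] == 0]
        if pos and neg:
            aucs.append(auc(pos, neg))
    return sum(aucs) / len(aucs)

def p_auc(scores, y_true):
    """pAUC: one AUC per protein over the ranking of its labels,
    averaged over proteins."""
    aucs = []
    for row, g in zip(scores, y_true):
        pos = [s for s, t in zip(row, g) if t == 1]
        neg = [s for s, t in zip(row, g) if t == 0]
        if pos and neg:
            aucs.append(auc(pos, neg))
    return sum(aucs) / len(aucs)
```

The contrast with Fmax is that both AUC variants score the full ranking rather than any single thresholded prediction.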
MacroF1 and MicroF1 require the predicted likelihood score vector $f_i$ to be converted into a binary indicator vector. Similar to [1], we take the functions corresponding to the k largest values of $f_i$ as the functions of the i-th protein, where k is set to the average number of functions per protein (rounded up to the next integer).
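This top-k binarization can be sketched as follows; we read "round to the next integer" as rounding up, and the function name and input format are our own assumptions:

```python
import math

def binarize_top_k(scores, y_true):
    """Turn likelihood rows into 0/1 indicator rows by keeping the
    k largest scores per protein, with k the average number of
    annotated functions per protein rounded up."""
    n = len(scores)
    avg = sum(sum(row) for row in y_true) / n
    k = math.ceil(avg)
    out = []
    for row in scores:
        # indices of the k largest scores in this row
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        ind = [0] * len(row)
        for c in top:
            ind[c] = 1
        out.append(ind)
    return out
```

Fmax, fAUC, and pAUC operate on the raw score vectors directly, so only the two F1 metrics need this step.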

Protein Function Prediction
In the main text, we reported the protein function prediction results on the Yeast dataset. The experimental results on the Human, Mouse, and Fly datasets are provided in Fig. 1 and Fig. 4. The results in Fig. 1 and Fig. 4 lead to the same conclusions as in the main text. Table 2 gives the results of MNet using Ỹ (weighting labels) and Y (without weighting the labels).

Networks Relevance Estimation
Additional results of network relevance estimation on the Human dataset annotated with BP, CC, and MF functions are reported in Fig. 4-Fig. 7. These results also demonstrate that MNet can assign large weights to high-quality individual networks, whereas the other two compared methods (SW and ProMK) cannot always do so.

Parameter Sensitivity Analysis
In the main text, we reported the results of MNet, ProMK, OMG, and LIG under different parameter values on the Yeast dataset annotated with BP functions. Additional results on the Yeast dataset annotated with CC and MF functions, and on the Human dataset annotated with BP functions, are given in Fig. 8-Fig. 10. These results also support the conclusion in the main text that MNet performs well across a wide range of parameter values and is less affected by the parameter selection problem than the other compared algorithms.