Computational systems biology in the big data era

A report of the 6th IEEE International Conference on Systems Biology (IEEE ISB2012), 18-20 August, Xi'an, China.


Meeting report
A three-day international conference on Computational and Systems Biology was held in Xi'an, China, on August 18-20. More than 200 researchers, including engineers, physicians, mathematicians, and biologists from mainland China, the United States, Hong Kong, Taiwan, Japan, and Korea, enjoyed both academic exchange and the cultural sights of Xi'an. Unlike previous conferences, ISB2012 added a highlight track in which invited authors presented the research progress reported in their recently published papers. ISB2012 also established a best paper award to support young researchers.
The Proceedings of the 6th International Conference on Computational Systems Biology (IEEE ISB2012) have been published by IEEE and are available online (http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6306342). The fifty-seven papers in this volume cover a wide range of topics in computational systems biology. Moreover, the reviewers from the Program Committee of IEEE ISB2011 selected 15 papers to be recommended for a special issue of BMC Systems Biology after significant extension of their original versions in the Proceedings. Each submission was peer reviewed by three independent reviewers, who evaluated the quality, originality, soundness, and significance of its contributions, as well as how significantly it improved on the corresponding IEEE ISB2012 proceedings paper. Here we focus on some of the highlights of the meeting by categorizing and briefly introducing these selected papers.

Deep Sequencing data analysis and integration
We are currently generating massive data sets. Sequencing data in particular are growing at an astronomical rate, and new algorithms for data analysis and integration are urgently needed. In this issue, Vladimir Trifonov et al. noted that next-generation sequencing technologies have become a major tool for identifying differences between samples. They developed a method, Statistical Algorithm for Variant Frequency Identification (SAVI), to estimate the frequency of alleles in a set of samples from RNA sequencing experiments. Hao Zhang et al. aimed to predict miRNA targets from large-scale data; they developed machine-learning features and conducted comprehensive training to predict interactions between H1N1 genome segments and host miRNAs. Junhua Zhang et al. analyzed The Cancer Genome Atlas (TCGA) glioblastoma multiforme (GBM) and ovarian carcinoma data and proposed a novel method to identify Mutated Core Modules in Cancer without any prior information other than cancer genomic data from patients with tumors. Yangfan Hu et al. performed a meta-analysis of published glioma gene expression profiles and showed, using an integrated dataset of expression microarray, microRNA and ChIP-seq profiles, that the significant signatures of different data sets are more similar at the pathway level than at the gene level. Jingde Bu et al. proposed a two-step heuristic splice alignment tool for RNA-Seq data that can provide both non-conservative and conservative splice junction information. Zhiyuan Yang et al. focused on the recent genome sequencing project of the naked mole rat (NMR, Heterocephalus glaber) and carried out a genome-wide comparative analysis of NMR and rat genes. This study provided insights into the possible anti-cancer mechanisms of NMR and suggested new cancer-related candidate genes.
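
To make the idea of estimating an allele frequency from sequencing reads concrete, the following minimal sketch computes a Bayesian posterior for the variant allele frequency at a single site from read counts. It is not the SAVI algorithm; the Beta prior, the function name, and the parameter names are assumptions chosen purely for illustration.

```python
def allele_frequency_posterior(alt_reads, total_reads, prior_alpha=1.0, prior_beta=1.0):
    """Posterior mean and variance of a variant allele frequency.

    Illustrative assumption: reads supporting the variant follow a binomial
    model with a Beta(prior_alpha, prior_beta) prior on the frequency.
    This is a textbook Beta-binomial update, not the SAVI method itself.
    """
    if total_reads < 0 or not 0 <= alt_reads <= total_reads:
        raise ValueError("require 0 <= alt_reads <= total_reads")

    # Conjugate update: add observed counts to the prior pseudo-counts.
    alpha = prior_alpha + alt_reads
    beta = prior_beta + (total_reads - alt_reads)

    mean = alpha / (alpha + beta)
    variance = (alpha * beta) / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, variance


# Hypothetical site: 30 of 100 reads carry the variant allele.
mean, variance = allele_frequency_posterior(30, 100)
```

With a uniform prior the posterior mean stays close to the raw read fraction at well-covered sites and is pulled toward 0.5 when coverage is low, which is one simple way to express uncertainty at sparsely sequenced positions.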

Network systems biology
Networks offer a way to manage large quantities of biological data by modelling the interactions among biological molecules. The biomolecular network concept, well known as "network biology" and "network medicine", aims to understand cellular behavior at the systems level in terms of the spatiotemporal interactions among cellular components [6].

Software development
Developing software is the typical way for computational biologists to tackle big data challenges. In this issue, Qiang Huang et al. observed that many biomolecular networks are built from large-scale experimental data produced by rapidly developing high-throughput techniques, as well as from the literature and other sources. They developed the novel network querying methods CNetQ and CNetA, which are implemented in a new R package, Corbi (http://doc.aporc.org/wiki/Corbi), and are freely accessible. Computational experiments on simulated and real data show that their methods achieve the best accuracy. Huayong Xu et al. released cGRNB (combinatorial Gene Regulatory Networks Builder), a web server for building combinatorial gene regulatory networks through integrated engineering of seed-matching sequence information and gene expression datasets. cGRNB provides two major network-building modules, one for MPGE (miRNA-perturbed gene expression) datasets and another for parallel miRNA/mRNA expression datasets. Yue Deng et al. proposed ppiPre, an open-source framework for PPI analysis and prediction that combines heterogeneous features, including semantic similarities based on GO, co-pathway similarity based on KEGG, and similarities based on PPI network topology. ppiPre is implemented in the R language and is freely available on CRAN (http://cran.r-project.org/web/packages/ppiPre/).
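
As an illustration of what a similarity based on PPI network topology can look like, the sketch below scores a protein pair by the Jaccard index of their interaction partners. This is one common topological measure, written here in Python for illustration; the source does not state which topological measures ppiPre uses, and the function names and toy network are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def jaccard_similarity(adjacency, u, v):
    """Jaccard index of the neighbor sets of u and v.

    Returns 0.0 when neither protein has any known interaction partner.
    """
    neighbors_u = adjacency.get(u, set())
    neighbors_v = adjacency.get(v, set())
    union = neighbors_u | neighbors_v
    if not union:
        return 0.0
    return len(neighbors_u & neighbors_v) / len(union)


# Hypothetical toy network: A and B share partners C and D; B also binds E.
toy_edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("B", "E")]
toy_network = build_adjacency(toy_edges)
score = jaccard_similarity(toy_network, "A", "B")  # 2 shared / 3 total partners
```

A score of this kind could serve as one feature alongside the GO-based and KEGG-based similarities when training a classifier to predict interactions.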