Module-detection approaches for the integration of multilevel omics data highlight the comprehensive response of Aspergillus fumigatus to caspofungin

Background
Omics data provide deep insights into overall biological processes of organisms. However, integration of data from different molecular levels, such as transcriptomics and proteomics, remains challenging. Analyzing lists of differentially abundant molecules from diverse molecular levels often results in a small overlap, mainly due to different regulatory mechanisms, temporal scales, and/or inherent properties of measurement methods. Module-detecting algorithms, which identify sets of closely related proteins from protein-protein interaction networks (PPINs), are promising approaches for better data integration.

Results
Here, we made use of transcriptome, proteome and secretome data from the human pathogenic fungus Aspergillus fumigatus challenged with the antifungal drug caspofungin. Caspofungin targets the fungal cell wall, which leads to a compensatory stress response. We analyzed the omics data using two different approaches: First, we applied a simple, classical approach by comparing lists of differentially expressed genes (DEGs), differentially synthesized proteins (DSyPs) and differentially secreted proteins (DSePs); second, we used a recently published module-detecting approach, ModuleDiscoverer, to identify regulatory modules from PPINs in conjunction with the experimental data. Our results demonstrate that regulatory modules show a notably higher overlap between the different molecular levels and time points than the classical approach. The additional structural information provided by regulatory modules allows for topological analyses. As a result, we detected a significant association of omics data with distinct biological processes such as regulation of kinase activity, transport mechanisms or amino acid metabolism. We also found a previously unreported increased production of the secondary metabolite fumagillin by A. fumigatus upon exposure to caspofungin. Furthermore, a topology-based analysis of potential key factors contributing to drug-caused side effects identified the highly conserved protein polyubiquitin as a central regulator. Interestingly, polyubiquitin UbiD belonged to none of the groups of DEGs, DSyPs or DSePs but most likely strongly influenced their levels.

Conclusion
Module-detecting approaches support the effective integration of multilevel omics data and provide a deep insight into complex biological relationships connecting these levels. They facilitate the identification of potential key players in the organism’s stress response which cannot be detected by commonly used approaches comparing lists of differentially abundant molecules.

Electronic supplementary material
The online version of this article (10.1186/s12918-018-0620-8) contains supplementary material, which is available to authorized users.


Single-seed and multi-seed ModuleDiscoverer approach
ModuleDiscoverer (MD) provides two different techniques for identifying cliques within the protein-protein interaction network (PPIN): The single-seed approach and the multi-seed approach.
The decision to identify cliques with either the single- or the multi-seed approach has to be made in the first step of the MD algorithm, in which the community structure of the PPIN is approximated. The term 'single-seed' means that the algorithm starts from only one randomly selected seed node to identify minimal cliques of size three, which are then extended to maximal cliques that form the basis of the final regulatory module. As reported in Vlaic et al. [1], the single-seed approach favors the enumeration of large maximal cliques in dense regions of highly overlapping cliques. Hence, some proteins that are only part of small cliques could be missed. To address this issue, the multi-seed approach uses two or more seed nodes to identify cliques. Because multiple seeds compete for nodes during the enumeration of cliques, large maximal cliques are broken down. On the one hand, this increases the probability of identifying proteins which are only part of small cliques. On the other hand, the resulting regulatory module contains a higher number of proteins which are not associated with DEGs. Vlaic et al. showed that the multi-seed approach produces results very similar to those obtained with the single-seed approach. In the end, it can be regarded as a comprehensive extension of the single-seed approach because it additionally considers proteins that occur only in small cliques.
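The idea of growing a maximal clique from a single randomly selected seed node can be illustrated with a minimal sketch. This is not the ModuleDiscoverer implementation: the function name `grow_clique`, the toy network and the greedy extension strategy are assumptions made purely for illustration.

```python
import random

def grow_clique(adj, seed_node, rng):
    """Greedily extend a clique from seed_node until it is maximal.

    Hypothetical helper for illustration; not part of ModuleDiscoverer.
    """
    clique = {seed_node}
    # Candidates are the nodes adjacent to every current clique member.
    candidates = set(adj[seed_node])
    while candidates:
        node = rng.choice(sorted(candidates))
        clique.add(node)
        # Keep only candidates that are also neighbors of the new member.
        candidates &= adj[node]
    return clique

# Toy undirected network (made up): a dense 4-clique {A, B, C, D}
# and a small triangle {D, E, F} that shares node D with it.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("D", "E"), ("D", "F"), ("E", "F")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

rng = random.Random(0)
seed_node = rng.choice(sorted(adj))       # one randomly selected seed node
clique = grow_clique(adj, seed_node, rng)
print(seed_node, sorted(clique))
```

In this toy network, five of the six possible seed nodes can only end up in one particular clique, and four of them lead to the large one. This loosely mirrors why a single seed tends to enumerate large cliques in dense regions, and why nodes that belong only to small cliques are reached less often.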
Here, we focused on the single-seed approach for two reasons: First, this approach is comparable with other well-established algorithms based on the maximal clique enumeration problem (e.g., Barrenäs et al. [2] or Gustafsson et al. [3]). Second, Vlaic et al. showed that the modules identified by the multi-seed approach can essentially be considered an extension of the single-seed modules.
Nevertheless, to estimate the comprehensiveness of the single-seed-generated regulatory modules, we performed further analyses and applied the multi-seed approach to the experimental data. The estimation of the required number of seed nodes was based on the application of MD to rat data performed by Vlaic et al. Since the high-confidence (score > 0.7) PPIN of Aspergillus fumigatus (4 121 proteins) contains roughly a third of the nodes of the Rattus norvegicus network (15 436 proteins) used in the study of Vlaic et al., we decided to use 10 seed nodes (roughly a third).
Interestingly, Vlaic et al. also showed that varying the number of seed nodes around the chosen value does not significantly impact the overall structure of the resulting regulatory module.
When comparing the regulatory modules obtained with the single- and the multi-seed approach, we observed that the multi-seed-generated modules comprised 100 % of all single-seed-generated modules and also contained additional module components (Supplementary Table 1).
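A containment check of this kind can be expressed with plain set operations. The following is a minimal sketch: the function name `containment` and the protein identifiers are placeholders, not data from this study.

```python
def containment(single_seed, multi_seed):
    """Return the fraction of single-seed components that also occur in the
    multi-seed module, plus the components found only by the multi-seed run."""
    shared = single_seed & multi_seed
    fraction = len(shared) / len(single_seed)
    extra = multi_seed - single_seed
    return fraction, extra

# Placeholder identifiers (made up for illustration).
single_seed_module = {"P1", "P2", "P3"}
multi_seed_module = {"P1", "P2", "P3", "P4", "P5"}

fraction, extra = containment(single_seed_module, multi_seed_module)
print(f"{fraction:.0%} of single-seed components recovered; extra: {sorted(extra)}")
```

A fraction of 1.0 corresponds to the situation described above, in which the multi-seed module fully contains the single-seed module and adds further components.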
For the multi-seed overall regulatory module (ORM), we computed the generalized topological overlap measure (as done for the single-seed ORM) to compare the significantly enriched biological processes of the multi-seed-based ORM with those of the single-seed-based ORM. We found that multi-seed-based ORM clusters are significantly associated with biological processes that are also enriched for single-seed-based clusters. Such processes are, for instance, activation of kinase activity, actin-filament-based processes, response to oxidative stress, carbohydrate metabolic processes, amino acid metabolic processes, transport mechanisms, and secondary and lipid metabolic processes. The complete lists of significantly enriched biological processes can be found in Additional File 4. By analyzing key factors in the fungal response, we detected β-(1,3)-D-glucan synthase, the main target of caspofungin, within the ORM. In addition, we identified the polyubiquitin UbiD among the top five ORM nodes ranked by both node degree and betweenness centrality. Only a slight difference between the single- and the multi-seed approach is observable for the UbiD node degree (single-seed: 111, multi-seed: 117) and betweenness centrality (single-seed: 0.396, multi-seed: 0.359). Filtering for transcription factors led to the same results for both ORMs, including the CBF/NF-Y family transcription factor.
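Ranking nodes by degree and betweenness centrality can be sketched as follows. The example uses Brandes' algorithm for an unweighted, undirected graph with the usual normalization by (n-1)(n-2); whether the values reported above were computed with exactly this normalization is an assumption, and the toy network and node names are invented for illustration.

```python
from collections import deque

def betweenness(adj):
    """Normalized betweenness centrality (Brandes' algorithm) for an
    unweighted, undirected graph given as {node: set_of_neighbors}."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in order of decreasing distance.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Every unordered pair is visited twice, so divide by (n-1)(n-2).
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: b * scale for v, b in bc.items()}

# Toy module (made up): one hub linked to four nodes, plus one extra edge.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

degree = {v: len(nbrs) for v, nbrs in adj.items()}
bc = betweenness(adj)
top_by_degree = sorted(adj, key=lambda v: (-degree[v], v))[:2]
top_by_betweenness = sorted(adj, key=lambda v: (-bc[v], v))[:2]
# Candidate key nodes are those ranked highly by both measures.
key_nodes = set(top_by_degree) & set(top_by_betweenness)
print(degree, bc, key_nodes)
```

Intersecting the two rankings is one simple way to shortlist nodes that are both highly connected and located on many shortest paths, which is the property attributed to UbiD above.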
In conclusion, the multi-seed MD approach allows for effectively integrating multilevel omics data. Its regulatory modules contain those obtained with the single-seed approach and comprise even higher numbers of module components. The ORM generated by the multi-seed approach confirms the already observed key players and significantly associated processes. Altogether, the multi-seed MD can be considered an extension of the single-seed MD.

Methods
All analyses performed for the KPM application are based on methods described in the 'Methods' part of the main manuscript. KPM was applied as described in section 'Application of module-detecting approaches'. Analyses regarding the overlap of molecular levels and time points are described in section 'Comparison of the simple approach and a module-detecting approach'. Details on the GO-term enrichment analysis for the KPM-based ORM are given in 'Enrichment analysis (functional annotation of biological processes)'.

Results and Discussion
The following analyses are based on the KPM regulatory modules as presented in Table 5. Based on a GO-term enrichment analysis regarding biological processes, we found that the KPM-generated ORM is significantly associated with biological processes which are also enriched for the MD-based ORM. Such processes are, for instance, (1→3)-alpha-glucan biosynthetic process, carbohydrate catabolic process, alpha-amino acid catabolic process, lipid and secondary metabolic processes or oxidation-reduction processes. In Supplementary Figure 3