Network reconstruction of the mouse secretory pathway applied on CHO cell transcriptome data
© The Author(s). 2017
Received: 20 April 2016
Accepted: 27 February 2017
Published: 15 March 2017
Protein secretion is one of the most important processes in eukaryotes. It is based on a highly complex machinery involving numerous proteins in several cellular compartments. The elucidation of the cell biology of the secretory machinery is of great importance, as it drives protein expression for biopharmaceutical industry, a 140 billion USD global market. However, the complexity of secretory process is difficult to describe using a simple reductionist approach, and therefore a promising avenue is to employ the tools of systems biology.
On the basis of manual curation of the literature on the yeast, human, and mouse secretory pathway, we have compiled a comprehensive catalogue of characterized proteins with functional annotation and their interconnectivity. Thus we have established the most elaborate reconstruction (RECON) of the functional secretion pathway network to date, counting 801 different components in mouse. By employing our mouse RECON to the CHO-K1 genome in a comparative genomic approach, we could reconstruct the protein secretory pathway of CHO cells counting 764 CHO components. This RECON furthermore facilitated the development of three alternative methods to study protein secretion through graphical visualizations of omics data. We have demonstrated the use of these methods to identify potential new and known targets for engineering improved growth and IgG production, as well as the general observation that CHO cells seem to have less strict transcriptional regulation of protein secretion than healthy mouse cells.
The RECON of the secretory pathway represents a strong tool for interpretation of data related to protein secretion as illustrated with transcriptomic data of Chinese Hamster Ovary (CHO) cells, the main platform for mammalian protein production.
KeywordsChinese hamster ovary cells Pathway reconstruction RNA-Seq Secretion pathway Protein secretion
Protein secretion is one of the most important processes in eukaryotes, allowing diverse events from enzyme secretion in saprobes to hormonal signalling in multicellular organisms, and facilitates production of recombinant proteins in most eukaryotic production hosts. Protein secretion is a complex process, which involves a large number of proteins and a series of steps spanning several cellular compartments. The secretory pathway has two main functions: 1) performing proper folding and post-translational modifications (PTMs) of proteins e.g. glycosylation and sulfation, and 2) sorting proteins to their final cellular or extracellular destination. The diverse processes along the secretory pathway are handled by so-called secretory components . The actual protein traffic is regulated by the organised action of numerous structural and regulatory proteins. Additionally, a number of regulatory proteins are dedicated to secure the proper response of the protein secretion pathway to environmental changes, nutrient availability, stress conditions, as well as differentiation signals . In humans, malfunctions in secretory components can result in Huntington’s, Alzheimer’s, or Parkinson’s disease, and protein specific misfolding can lead to cystic fibrosis and antitrypsin deficiency [42, 48].
Such a highly complex process is difficult to describe using a reductionist approach, and therefore a promising avenue is to employ the tools of systems biology. A particularly useful tool is a network reconstruction – a compilation of a list of the known components in a specific area of cell biology and the interaction of said components. Such network reconstructions (RECONs) have helped to analyse complex cellular pathways and networks related to metabolism, transcriptional regulation, protein-protein interactions (PPI), and genetic interactions among others . As RECONs allow the analysis of gene- or protein-level data in their biological context, they become tools for hypothesis-driven biological discovery .
To our knowledge, so far none has built dedicated RECONs of the protein secretion pathway with a focus on the secretory components and regulators. Models of the metabolic elements of the secretory pathway have recently been presented for fungi [13, 30]. However, there are few well-defined biochemical reactions in protein secretion, and metabolic models fail to capture all of the regulatory processes and protein interactions. Furthermore, these models have limited applicability in mammalian production systems due to the phylogenetic distance between fungi and mammals. Another approach has been to examine the systems properties of protein secretion through generating a map of the PPIs in the human secretory systems [5, 10]. Such maps provide valuable information about protein organisation and potential protein interactions, but are still static pictures of the interconnectivity. A major weakness of PPI-based networks is that the presence of an interaction between proteins does not necessarily indicate a biologically functional relationship under all conditions .
Here, we are interested in applying RECONs to the mammalian secretory pathway and related cell processes due to the importance to biopharmaceutical manufacturing. In 2013, the global market of biopharmaceuticals reached 140 billion USD of which the majority of proteins requiring post-translational modifications are produced in mammalian cells .
Among mammalian expression systems, CHO cell-based systems are most commonly used for therapeutic protein production in the biopharmaceutical industry due to the robustness of the cell, their ability to produce glycosylation patterns similar to humans, and that they are well adapted to industrial production in suspension without serum . However, the quality of genome-level data in the CHO system is still at its infancy compared to more developed model organisms such as mouse or humans. The first genome of the CHO cell line was only published in 2011 [16, 46] followed by publications of the draft Chinese hamster genome and several other CHO genomes in 2013 [7, 22, 29]. Therefore, in order to provide a RECON of high quality for understanding protein secretion in CHO cells, one will have to utilize the information from other model organisms, where the annotation is more developed.
In this study, we provide a holistic view of protein secretion which allows the interpretation of genome-scale data from mammalian cell lines, in particular mouse and CHO cells. For this use, we have generated a RECON of the secretory machinery that can integrate data with transcriptomics, proteomics, and genomics. Through manual curation of literature on human and mouse secretory pathways, supplemented by characterizations in yeast, we provide a comprehensive catalogue of characterized secretory components, including with functional annotation and the interconnectivity of the components, thus establishing – to our knowledge – the largest RECON of the functional secretion pathway to date. This serves both as a knowledge repository and as a tool for interpretation of complex genome-scale data from mammalian cells. In this study, we have applied the RECON to transcriptome data from both mouse and Chinese hamster ovary (CHO) cell lines.
Cell culture and media
Chinese hamster ovary cell lines and culture conditions
Control no IgG
0% NEAA supplement
Secretion stress (NaBu 5 mM)
A recombinant suspension CHO DG44 cell line stabile expressing a human IgG (DG44IgG), kindly provided by Symphogen A/S, was grown in PowerCHO medium (Lonza, Thermo Ficher Scientific) supplemented with 5 mM L-Glutamine (Gibco, Life Technologies), 0.1 mM MEM Non-Essential Amino Acid Solution ((Lonza, Thermo Ficher Scientific), and 0.4% Anti-Clumping Agent (Gibco, Life Technologies). A sub clone of the DG44IgG cell line was adapted to growth without MEM Non-Essential Amino Acid Solution (DG44 IgG-0NEAA) (see Table 1).
All cell lines were expanded in Erlenmeyer cell culture flasks (Corning, Sigma-Aldrich) and grown at 80 rpm in a humidified incubator at 37 °C with 5% CO2. Cell viability was measured with NucleoCounter NC-100 cell counter (Chemometec, Allerød, Denmark) according to manufacturers protocol.
Measurements of metabolites and productivity
Glutamine and glutamate were determined by YSI 2700 Select Biochemistry Analyzer (YSI Life Sciences, USA) calibrated with standard solution from YSI: L-glutamine 2737 and L-glutamate 2756. Glucose and lactate were determined by YSI 2300 Select Biochemistry Analyzer (YSI Life Sciences) calibrated with standard solution from YSI: D-glucose 2356 and L-lactate 1530. The IgG concentration was quantified by Biolayer Interferometry on ForteBIO Octet QK instrument (ForteBIO, USA) using the Protein A biosensor kit according to manufacturer’s protocol.
RNA purification and next-generation sequencing
Batch cultures were conducted in 250 ml Erlenmeyer cell culture flasks (Corning). The cells were seeded at 3.8 × 105 cells mL−1 in 80 ml. The cultures were maintained at 37 °C and a constant agitation speed of 80 rpm. 2 ml were sampled twice a day to monitor the cultures viability and productivity.
In order to analyse the transcriptome, we wanted RNA samples obtained from cells in exponential growth phase as well as in stationary phase. When seeded at 3.8 × 105 cells mL−1, CHO-K1 entered the exponential phase within 20 h of cultivation and had not reached stationary phase after 50 h. The CHO DG44 cell lines also entered the exponential phase after 20 h of cultivation and had not entered stationary phase 70 h after seeding.
Mouse RNA-Seq samples downloaded from the Encode Project
Total RNA was isolated using phenol–chloroform extraction from Trizol lysed CHO cell pellets. In brief, 2 × 106 CHO suspension cells were washed in ice-cold PBS and lysed in 400 μl TRI reagent (Sigma–Aldrich) and stored at −80 °C. Total RNA was extracted using chloroform and purification was performed by RNeasy mini kit (Qiagen, USA). Concentration and purity were analysed through absorption at 230, 260, and 280 nm using a NanoDrop spectrophotometer (Thermo Scientific) and Qubit 2.0 (Invitrogen, MA, USA). RNA integrity was assessed using RNA 2100 Bioanalyzer (Agilent Technologies, Germany).
Multiplexed cDNA library generation using the TruSeq RNA Sample Preparation Kit v2 (Illumina, Inc., San Diego, CA) and next-generation sequencing were performed by AROS Applied Biotechnology (Aarhus, Denmark) using eight samples per lane in an Illumina Hiseq 2000 system for paired-end sequencing (SRA accession: SRP073484).
Processing next-generation sequencing data
The FASTQC tool version 0.11.3 (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc) was used to evaluate the quality of fastq files before and after treatment. Quality trimming and adapter clipping were performed using Prinseq-lite version 0.20.3 , trimming trailing bases below quality 20, cutting adaptamer (first 14 bp), and discarding clipped reads shorter than 40 bp. Reads whose mates were discarded due to quality trimming and length constraints were processed as single end reads. The trimmed reads were mapped to the CHO-K1 genome (assembly and annotation) released in 2012 (NCBI Accession: GCF_000223135.1) using TopHat2 version 2.0.9 (using Bowtie 2.2.0) with default settings [25, 26]. Read counts for each transcript were obtained with HTSeq count version 0.5.4p3 using the intersection none-empty mode .
The subsystems of the secretory pathway
# Components mouse
# Components CHO cells
RNA sequencing data analysis
The read counts were normalised using EdgeR (version 3.6.8)  in R . Genes with detected counts per million (CPM) in at least two samples were kept. The normalised read counts were utilised for clustering the major sub-networks gene expression patterns. Hierarchical cluster analysis was performed in R using the package pvclust (version 1.2–2)  with average linkage method and the number of bootstrap set to 1000. Main clusters were identified for α = 0.95 and standard errors for approximately unbiased (AU) p-values. All genes of the transcriptome dataset were correlated to identify expression profile clusters by calculating Spearman and Spearman squared correlation coefficients. Following identification of the expression levels for all genes in the CHO genome, the Spearman correlation coefficient was calculated for each gene to the productivity of IgG and growth rate μ using R. Genes were considered to correlate significantly with productivity with Spearman’s correlation > 0.81 or < −0.81 (constituting two standard deviations from the mean of all measured correlations).
Differential gene expression analysis
Differential expression analyses were conducted for the CHO RNA-Seq data of Table 1. To take known sources of variation into account, the differential analyses were performed using the GLM likelihood ratio test in EdgeR for the experiments with multiple factors. A p-value of 0.05 and a false discovery rate (FDR) < 0.05 as well as ± log2.0 fold changes, were used as the default thresholds to identify the differentially expressed genes.
Gene ontology enrichment analysis
A BLASTp search of the CHO proteome from Genbank (downloaded March, 2013) based on the Protein Genbank IDs extracted from the CHO K1 genome annotation file (NCBI Accession: GCF_000223135.1] was performed against the mouse, human and rat proteome from UniProt and The Ensembl BioMart (downloaded March, 2013) to find the closest homologous proteins (lowest E-value) in these species. Identifiers, including RefSeq Protein Accession, ENSEMBL gene ID, and UniProt accession for each protein were subsequently obtained using the Gene ID conversion Tool from the DAVID database  (from November, 2013). Gene ontology (GO) enrichment analysis was performed by use of the online server of Gene Ontology Consortium  and PANTHER classification system  using the mouse UniProt accession numbers and Mus musculus as background.
Mouse functional secretory network
A list of components was drafted based on pathway data from mouse retrieved from the Kyoto Encyclopedia of Genes and Genome database . Additional information from UniProt  and Reactome  of functional annotation and described interaction was included. The draft was refined and expanded by manually curation based on a literature survey of the secretion machinery related genes in yeast, human, and mouse. The genes were categorised in sub-pathways manually according to closest relation found in literature.
CHO cell line specific secretory network
A local BLASTp of the complete mouse secretion network was performed against the CHO-K1 genome (downloaded from Genbank as assembly GCF_000223135.1 with RefSeq annotation, March 2013). To find the closest homologous of CHO; lowest E-value and identity level >90% was considered a CHO homolog.
Graphic representation of the secretory network
The secretion network was made compatible for visualisation in Cytoscape version 3.2.1 . Colours of nodes were set based on ± 2.0 fold change. Thickness of lines encircling nodes were increased by p-value when < 0.05. The significance of the networks is calculated using Fisher’s exact test, and the p-value is the executed negative logarithmic transformation.
Reconstruction of the mammalian secretory network for mouse proteins
Our first goal was to establish a RECON of the secretory pathway based on the highest possible quality of annotation data. Initially, a draft RECON of the secretory machinery pathways in mouse was generated based on data retrieved from the Kyoto Encyclopedia of Genes and Genome database . Additional information from UniProt  and Reactome  was included to expand the network beyond the functions covered in KEGG. Furthermore, the draft RECON was curated by adding and refining biological functions found in an extensive literature review of secretion pathway proteins in yeast, human, and mouse. In order to achieve as holistic a view of protein secretion as possible, we also included 75 genes that in literature have been tentatively associated with the secretory machinery. As a result, the generated secretory RECON comprises 801 components, all supported by literature (Additional file 1: Table S1).
Two hundred eighty-seven of the 801 components represent the core components of the protein secretory machinery that are directly involved in the translocation, folding, post-translational modifications and transport of the proteins (Additional file 1: Table S2). The post-translational modifications comprising N- and O- glycosylation systems occurring in the Golgi compartment are seen as independent systems and are therefore not included in this RECON. The reconstructed network thus condenses our current knowledge of the protein secretory machinery excluding the Golgi compartment.
Ontology of the RECON: components, subsystems, and functions of the secretory machinery
Overview of culture condition at RNA sampling
CHO DG44 IgG
CHO DG44 IgG
CHO DG44 IgG
Exponential phase, 0% NEAA
CHO DG44 IgG
Stationary phase, 0% NEAA
CHO DG44 IgG
Secretory stress by NaBu
CHO DG44 IgG
Secretory stress by NaBu
To provide an overview of the 801 components, we first categorized them by the different subsystems. Within each subsystem, components and complexes of components were grouped according to their function described in literature, termed functional groups. A component can be assigned to one or several functional groups if literature reports different functions. A functional network is thus the network of reported interactions within a functional group.
The network was then further expanded by including the following: 1) Branches into the subsystems of autophagy, apoptosis, and ER stress. These branches serve to identify if expression or activity is shifted into these subsystems, which are not as such a part of the protein secretion pathway. Therefore, these branches appear incomplete in term of components. 2) All reports of links between the components, be it DNA-DNA, protein-DNA, or protein-protein interactions (Fig. 1c.)
Conversion of the secretory network to a Cytoscape representation for data analysis and visualisation
The complete network of the RECON was made compatible with Cytoscape  allowing the integration of omics data for analysis and visualisation. Components with previously described interconnectivity, functional annotation and/or protein complexes were included. As this leaves out components with no described interactions, the Cytoscape representation includes 655 connected components of the secretory RECON. The architecture of the network was expanded to include 42 nodes, which mark protein complexes, as well as the 103 functional groups. Additional nodes were included if isoproteins had previously been reported. The network is provided as a Cytoscape Input File (Additional file 2: Cytoscape input file). Supplying the RECON as a network facilitates extraction of sub-networks for further analysis and the addition of new components and interactions. Furthermore, it serves to ease data interpretation, in general to focus on the part of an ‘omics-dataset involved in protein secretion and in particular to identify co-regulated genes of the same protein complexes and/or from the same function.
Test of the functional secretory machinery network for data interpretation
We wanted to test that the reconstruction can be used for data analysis and interpretation. As a first step, we wanted to examine how the the defined subsystems of the secretory RECON represented data, and assess the inference of these systems. In order to achieve this, and furthermore demonstrate the use of the network in relation to transcriptome data analysis, we used an RNA-Seq dataset obtained from the ENCODE project: an assortment of 22 samples of mouse tissue from seven different embryonic tissues of mouse, covering several stages of embryo development [19, 33].
Our results show here that the clustering of the data (the biological co-regulation, as shown by the dendrogram in Fig. 2) is in very good accordance with the functional categories in the subsystems and protein complexes defined in the secretory RECON (As seen by the colors in Fig. 2).
Exploring the potential of clustering according to functions
The RNA-Seq data clustering of the mouse tissue samples allow us to identify new genes potentially associated with the secretory system, based on similar expression profiles suggesting co-regulation . In particular, if a transcription factor is self-regulated, one would expect it to cluster together with the genes it is regulating.
Reconstruction of CHO cells secretory machinery network
The next step was thus to employ the mouse-based secretory RECON to reconstruct the protein secretory pathway of CHO cells using a comparative genomic approach. Through homolog protein search, 726 CHO-K1 genes were mapped to the mouse secretory components with identity over 80% (at the protein level). For an additional 38 ORFs homologs, the identity was only > 60%, although being the best hit, with a significant e-value and bit scores above 50 . These proteins were also added to the CHO cell network. Of the identified homologs, 39 were noted partial in the description and two of those components were found to also have partial annotation (SRP54 and CREP). 39 components were not identified by BLAST or annotated as pseudo genes and thus not included.
As a result, the CHO-K1 secretory RECON comprises 764 components (see Additional file 1: Table S1). 270 core components of the protein secretory machinery were identified and the distribution within the major subsystems are listed in Table 3. The graphical representation of the CHO-K1 secretory RECON was created using Cytoscape as described for the mouse network (Fig. 6a).
Examination of protein secretion in CHO cells
To illustrate the application of the secretory network for analysis of omics data within protein secretion in a cell factory, the secretory RECON was applied to a RNA-Seq dataset from the biopharmaceutical workhorse CHO cells:
A RNA-Seq data set was generated from CHO cells using the following conditions: two growth conditions (exponential growth and stationary phase), two cell lines (CHO-K1 and CHO DG44), with and without expression of IgG antibodies, with and without sodium butyrate (NaBu) treatment, and absence and presence of NEAA in the growth medium (See Table 2). These diverse conditions provide a range of transcript expression levels for genes that are relevant for optimisation of the secretory pathway for heterologous gene expression, with NaBu in particular added to induce secretory stress [12, 39]. The RNA-Seq dataset is experimentally designed to minimise noise from differences between batches and biological variation. Each sample represent a combination of conditions, and the full set secures biological replicates for each condition.
For quality control of the biological replicates, the RNA-seq data was investigated by multi-dimensional scaling (Additional file 3: Figure S1). As expected, the differences between the two cell lines CHO-K1 and CHO DG44 are separated in the first dimension, while the second dimension separates the normal non-treated cells from the sodium butyrate treated cells. The paired nature of the samples, exponential and stationary phases, was confirmed, with the exception of samples with and without NEAA, which seemed to have no effect.
Differential expression analyses
Summary of differential gene expression analysis (see Additional file 1: Tables S5–S8)
Total # analysed
# FDR < 0.05
# |log2 FC| ≥ 2
Secretion stress NaBu
0% NEAA sup.
We determined the transcriptional effect of heterologous IgG production in CHO cells by comparing CHO-K1 not producing heterologous proteins (Table 5, 1.1-1.2) with CHO DG44 producing recombinant IgG at industrial levels (Table 5, 2.1.1–2.2.2), using the cultivation phases as blocking. Of the 25,029 examined genes, 16,446 were above the cutoff for expression, and 6540 genes were differentially expressed (false discovery rate (FDR) < 0.05). We identified 1953 genes with |log2 Ratio| ≥ 2, where 1542 were up-regulated and 411 were down-regulated. In a similar fashion, the exponential growth phase was compared to stationary phase with the same set of samples (Table 5, 1.1–1.2, 2.1.1–2.2.2), but using cell lines as blocking. Similar strategies were used to examine the effect of NaBu and NEAA medium supplements (Table 5). In any of the four conditions, the number of genes not expressed was just above 8000. A comparison revealed that these 8000 genes are largely the same in all conditions.
An alternative to the use of our secretory pathway RECON, is the use of the functional annotations from the gene ontology (GO). We thus applied a GO enrichment analysis for comparison to our method: For the CHO-K1 genome, only a limited number of genes have assigned GO-terms. Consequently, we performed a BLASTP search to retrieve mouse UniProt accession numbers that matched the CHO-K1 genome. GO-terms were assigned to the mouse identifiers through the online server of Gene Ontology Consortium . Of the 6540 genes found to be significantly differentially expressed in the IgG production comparison, 1447 genes were mapped to GO-terms using a BLAST comparison of mouse and CHO-K1. A GO enrichment analysis was performed using a cutoff of p-value < 0.05 to identify significantly overrepresented GO-terms for each of the main GO categories, biological processes (BP), cellular compartments (CC), and metabolic function (MF) as well as a GO-slim for BP (Additional file 1: Table S9). In summary, the majority of the overrepresented GO terms for BP are regulation or positive regulation of signal transduction (GO:0009966; GO:0009967), signalling (GO:0023056, GO:0023051), response to stimulus (GO:0048584; GO:0048583; GO:0050896), and metabolic, biological or cellular processes (GO:0044710; GO:0048518; GO:0008150). In comparison, the GO-slim enrichment for BP revealed terms that could be associated with the effect of protein production: vesicle-mediated transport (GO:0016192), protein transport (GO:0015031), and intracellular protein transport (GO:0006886). GO enrichment thus gives a broad overview of the cellular processes engaged, but not at the level of detail in the secretory RECON (Fig. 2).
Comparative cluster analysis of gene expression levels within the secretory machinery for CHO cells to mouse
Following the changed regulation, we examined the expression patterns of the potential regulators found in mouse expression data (described above). For the gene Mecp2 no sequence homolog could be found in the CHO-K1 genome. The expression of Rbbp7 was plotted against all CHO samples (Additional file 3: Figure S3), but no correlation seems to be present in this data, further supporting the observation of decreased regulation.
Gene expression level correlated with protein and growth within the secretory network
We furthermore developed a method to use the genes of the RECON to analyse gene-phenotype correlations for protein production and growth in the secretory network. Extracting all expression values for the 764 CHO genes in the secretory RECON and comparing these with growth rates and IgG titers, we analysed Spearman and Pearson correlations to find monotonic and linear relationships, respectively. Correlation coefficients are available in Additional file 1: Table S10.
Application of the functional network for interpretation of protein secretion in CHO cells
Motivated by the complexity of the secretory pathway, we have developed a network reconstruction of the secretory machinery using a systems biology approach based on manual curation. In this study, we have provided a catalogue of 801 proteins from the mouse with functional annotation and their interconnectivity. The functional annotation of the components and their grouping in subsystems were based on literature. Furthermore, we provide an implementation of this network that integrates with multi-omics data for visualization of genome-scale data. Prior work in this area includes a reconstruction of the yeast secretory machinery presented by Feizi et al. , which was based on well-defined stoichiometry reactions as a part of a genome-scale model reconstruction of metabolism. This study reports a network of 163 components in yeast. For the more complex organism Aspergillus oryzae, Lui et al.  presented a reconstruction of the secretory pathway using the yeast network as a base. They reported a list of 369 genes (putative end experimentally verified), including biosynthesis of GPI and dolichol. The network presented in this study covers mammalian protein secretion, excluding the N- and O-glycosylation, therefore the GPI biosynthesis and dolichol pathways are not included. The components of the cell wall, which are naturally not part of the mammalian secretory network, are also not included. The network of our study thus includes more biological processes linked to the secretion pathway than any previous study. Furthermore, we include aspects of the processes of stress in connection with heterologous protein production, specifically components of the subsystems: autophagy, apoptosis, and ER stress. The subsystems: translocation, protein folding, protein transport, UPR, and ERAD comprise a total of 512 components. Moreover, the network can be expanded and improved in the future when new components or connections are identified.
We further examined – as a test of the network – whether RNA-Seq data clustered based on biological data reflects the subsets, functional groups, and complexes of the network. As could be expected for normal, healthy cells, the components of the subsystems of ERAD, protein folding, and translocation as well as the proteasome are grouped into major clusters (Fig. 2a). One cluster contained all components associated with functions related to protein folding and translocation, while the other cluster held mainly components linked to ERAD.
The complexity of the secretory pathway was also exemplified in the close biological association between protein folding and the machinery involved in identification of terminally misfolded proteins. Furthermore, the components of the proteasome were found in two tight clusters (Fig. 2b, approximately unbiased (AU) = 86). The expression pattern of the proteasome units has similarities to the PF expression patterns of Fig. 2c, as might be expected, as both are a part of the normal growth-related functions of the cell. The bottom part of Fig. 2a holds mainly ERAD-associated components (AU > 73), however, one sub-cluster (Fig. 2d) shows a significantly different expression pattern (AU > 95). However, since all samples are from healthy growing tissues, activity of ERAD is not expected to occur, thus explaining that stress-related ERAD-associated components may not be induced in these samples.
In summary, clustering of the transcriptome data was used to assess the functional secretory network, and confirms that the literature-based sorting of the proteins into the subsystems and functional groups of ERAD, protein folding, and translocation seems meaningful. Despite the high complexity of the secretory pathway, we see that our functional categories are representative of the un-supervised clusters formed from analysis of RNA-Seq data. This also demonstrates that such analysis can provide meaningful data on the biological system by querying the network.
The functional secretory network based on the well-characterised organism mouse, as well as human and yeast, provided the foundation for constructing of a CHO cell secretory network. Despite the fact that the CHO-K1 genome is still at the draft stage, 764 homolog components were identified. For the absent 39 components, the cause is most likely either missing annotation  or gaps in the genome. However, wrong annotations of identified components are also likely to be present, but in a limited number since the BLAST was performed by manual curation of significant hits. As only <5% of the identified mouse network is missing, the CHO network is still a comprehensive representation of CHO protein secretion.
In order to apply the secretory RECON for studying protein secretion and to identify novel engineering targets, transcriptomic data was applied from healthy mouse tissues as reference and generated for CHO cell lines.
The transcriptome data of CHO showed that for the effect of IgG production, the high number of differentially expressed genes (6540) confirms that heterologous protein production affects the overall gene expression and general cellular processes (Table 5). This was confirmed by GO enrichment analysis, which identified biological processes terms within various types of regulation. However, using differential gene analysis alone, it was difficult to approach more specific traits within protein secretion for the IgG production.
When examining the differences due to the change of cultivation phases, few genes (333) were significantly changed more than |logFC| > 2. This is perhaps to be expected, as the experimental design removes differences between the cell lines and eliminates all growth-related genes which do not change.
Our addition of NaBu which causes hyperacetylation , and leads to increased transcription as well as increased recombinant protein expression [8, 14, 45], gave rise to the highest number of differentially expressed genes (8121) with a false discovery rate (FDR) < 0.05. Of these, the majority are upregulated, as to be expected with the NaBu effect of transcriptional activation. The difference between the cell lines adapted and grown in medium with or without supplementation of NEAA respectively was very little, which was confirmed by the few differentially expressed genes (27 with FDR < 0.05 and 5 having |logFC| > 2). The identified differentially expressed genes were not connected to amino acid metabolism in literature. Based on this, and the low number, we believe that they might be false positives.
We examined the transcription levels of the components from the major subsystems protein folding, translocation, proteasome, and ERAD, and observed tighter clustering in mouse than in CHO cells. This is interesting, as the mouse samples are from different tissues, while the CHO samples are the same cell type. We thus see it as a sign of less tight regulation in the cancerous CHO cells than in the healthy mouse cells. As an example, the eight subunits of the OST protein complex (addition of N-glycans on proteins in the ER lumen), cluster tightly in mouse (Fig. 2a), but have a very diverse expression profile in CHO. One exception from the apparent difference in regulation is chaperones, which are observed to have similar expression profiles in mouse CHO. Another interesting observation is that five components of the translocation complex Sec61 cluster together with protein folding components in mouse (as would be expected), while in CHO cells they clustered with components of ERAD. This could indicate that the retro-translocation function of this protein complex might be more active in CHO cells. We therefore speculate that CHO cells in general are de-regulated, at least compared to healthy mouse cells, but in traits where there has been a deliberate selection for functionality – e.g. within folding of (heterologous) proteins – the regulation has been retained.
In this study, we further presented three alternative methods to study protein secretion using omics data illustrated with transcriptomic data: Method 1) functional clustering of the secretory network for identification of regulators, Method 2) correlation of the specific IgG production and maximum growth rate to the expression levels within the secretory network, and Method 3) graphical representation of the secretory network as a method for studying protein secretion from a holistic view, and with the possibility of focusing on specific subsystems or protein complexes.
Method 1 enables identification of regulators of selected functions. Here, we identified possible regulators of protein folding, as such potential targets for cell engineering. The identified histone-binding protein (Rbbp7) (Fig. 3a) could serve as a potential target since in literature it is described as a co-repressor [15, 47]. The Methyl-CpG-binding protein (Mecp2) was identified as anti-correlating to protein folding (Fig. 3b) and could be of particular interest as a target, since it has been associated with regulating expression of a wide range of genes and that it can function as both an activator and repressor of transcription . However, these two potential discoveries proved difficult to transfer to CHO, partially due to genome quality and partially due to apparent deregulation: The identified repressor/activator Mecp2 in mouse was not annotated in the CHO-K1 draft genome, and Rbbp7 was observed to have a significantly different expression profile in CHO cells (see Additional file 3: Figure S2).
For method 2 – correlating expression levels to IgG productivity and growth, we used the CHO transcriptome data for protein production. In Fig. 5 it is noticed that several of the known targets for optimised protein production in CHO cells (XBP1, ATF4, BIP, ERP72/PDI4, CNX [27, 32] have high negative correlation to growth, but interestingly with little correlation to IgG production. Possibly, many of these are a part of a stress response under normal regulation, and therefore correlate with low growth rates. In the other end of the growth axis is the gene p53 which is highly correlated with growth, but not correlated to IgG production, which is expected as it is a well-described target for improved cell viability . Similar improvements are reported for the genes BAX and BAK1 , here they correlate only to some degree with both growth and protein production (negative correlation). In contrast, the genes P4Hb/PDI1/ERP59 and GADD34  also previously described as positive targets for protein production, are located as negatively correlated to IgG production and with no correlation to growth. Within genes that correlate highly with IgG production, we see the targets known for cell survival, e.g. BCL-XL, possibly suggesting that our cells are stressed by the protein production. Other proteins previously described as positive targets for protein production are ERP57/PDIA3  and Hsp90b1/GRP94 , but here we see that they are correlated with growth and not significantly with protein production. Finally, the different caspases are scattered across the plot and are not correlated with either growth or IgG production. This however is easily explained, as caspases are regulated by phosphorylation, which cannot be seen at the transcriptional level.
It is interesting that we see discrepancies between our calculated correlations, and approximately half of the previously reported targets. Possibly this supports that reported improvements are often cell line or in particular protein specific. However, several are identified in accordance with literature. Possibly more interesting, is how we see that known targets placed at the rim of Fig. 5, suggesting that genes placed here are interesting targets in general. In particular, novel engineering targets within the secretory pathway might be found in close proximity to the known successful targets.
Method 3 was the use of a graphical representation of the secretory RECON for studying the specific subsystems and protein complexes of protein secretion that could not be observed by simple differential gene expression analysis (too many genes) or GO enrichment analysis (too broad terms) (Fig. 6).
Illustrated by the example of the OST protein complex, all subunits of the complex, which we could identify in the CHO-K1 genome, are up-regulated (p = 0.2 Fisher’s exact test) in comparison with the complete network. In the same way, the majority of the expressed subunits of the proteasome protein complex were found to be down-regulated, which in comparison with the complete network is found highly significant with a p-value < 0.05. Importantly, none of the components has any change above 2 fold, meaning that they would not be found in a regular differential expression analysis. In contrast, the functional group of ER glycosylation components is not found to be significantly up or down-regulated. Furthermore, the subunits of the two transport complexes between the ER and the Golgi compartment, COPI and COPII, were found to have diverse expression patterns. Within a protein complex, it is expected that all the subunits have relatively similar expression patterns as observed for the mouse gene expression data (see Fig. 2). Once again, this suggests a lower or less strict level of regulation in the CHO cancer line cells.
In this study, we have generated a comprehensive catalogue of characterized proteins of the secretory pathway with functional annotation and their interconnectivity and functions, and thus – to our knowledge – established to date the most elaborate RECON of the secretion pathway. The secretory network was mapped for both the well-characterised mouse (801 components) and the relatively uncharacterised CHO cell line (764 components). The RECON serves as a frame for meaningful interpretation of omics data. In particular, we present three different methods to study protein secretion through omics data: 1) Using clustering of the transcription levels of the RECON elements to identify new potential regulators. 2) Correlation of transcriptome to IgG production and growth. 3) a graphical presentation for analysing transcriptome data in relation to protein complexes or functional groups. All three are highly useful tools as demonstrated through specific findings and the general observation in several methods that CHO cells seem to have less strict transcriptional regulation than the healthy mouse cells.
The secretory pathway RECON therefore represents a strong tool in optimization of protein production and growth of CHO cell lines, the main platform for mammalian protein production.
American Type Culture Collection
Chinese Hamster Ovary
Counts per million
False discovery rate
Kyoto Encyclopedia of Genes and Genome
Non-essential amino acids
Open reading frame
Unfolded protein response
United States of America
United States Dollars
Yellow spring instruments
H.F.K. and L.E.P. thanks the Novo Nordisk Foundation for financial support. The foundation had no role in the design of the study and collection, analysis, and interpretation of data and did not contribute to the writing of the manuscript.
Availability of data and materials
All data and materials are available as supplementary materials, through cited publications, and through download from NCBI Sequence Read Archive (SRA accession: SRP073484).
AML carried out the experiments, participated in the design of the study, analysed data, developed algorithms, and drafted the manuscript. CSK participated in the design of the study, and analysed data. JB and LEP analysed data. HFK participated in the design of the study, analysed data and helped to draft the manuscript. CK participated in the design of the study, and helped to draft the manuscript. MRA participated in the design of the study, analysed data, and helped to draft the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
A local ethics committee ruled that no formal ethics approval was required in this particular case.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Anders S, Pyl PT, Huber W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–9.View ArticlePubMedGoogle Scholar
- Arden N, Betenbaugh MJ. Life and death in mammalian cell culture: strategies for apoptosis inhibition. Trends Biotechnol. 2004;22:174–80.View ArticlePubMedGoogle Scholar
- Arden N, Majors BS, Ahn S, Oyler G, Betenbaugh MJ. Inhibiting the apoptosis pathway using MDM2 in mammalian cell cultures. Biotechnol Bioeng. 2007;97:601–14.View ArticlePubMedGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25:25–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Behrends C, Sowa ME, Gygi SP, Harper JW. Network organization of the human autophagy system. Nature. 2010;466:68–76.View ArticlePubMedPubMed CentralGoogle Scholar
- Bordbar A, Nagarajan H, Lewis NE, Latif H, Ebrahim A, Federowicz S, Schellenberger J, Palsson BO. Minimal metabolic pathway structure is consistent with associated biomolecular interactions. Mol Syst Biol. 2014;10:737.View ArticlePubMedPubMed CentralGoogle Scholar
- Brinkrolf K, Rupp O, Laux H, Kollin F, Ernst W, Linke B, Kofler R, Romand S, Hesse F, Budach WE, Galosy S, Müller D, Noll T, Wienberg J, Jostock T, Leonard M, Grillari J, Tauch A, Goesmann A, Helk B, et al. Chinese hamster genome sequenced from sorted chromosomes. Nat Biotechnol. 2013;31:694–5.View ArticlePubMedGoogle Scholar
- Candido EPM, Reeves R, Davie JR. Sodium butyrate inhibits histone deacetylation in cultured cells. Cell. 1978;14:105–13.View ArticlePubMedGoogle Scholar
- Chahrour M, Jung SY, Shaw C, Zhou X, Wong STC, Qin J, Zoghbi HY. MeCP2, a key contributor to neurological disease, activates and represses transcription. Science. 2008;320:1224–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Christianson JC, Olzmann JA, Shaler TA, Sowa ME, Bennett EJ, Richter CM, Tyler RE, Greenblatt EJ, Wade Harper J, Kopito RR. Defining human ERAD networks through an integrative mapping strategy. Nat Cell Biol. 2012;14:93–105.View ArticleGoogle Scholar
- Consortium TU. UniProt: a hub for protein information. Nucleic Acids Res. 2015;43:D204–12.View ArticleGoogle Scholar
- Dorner AJ, Wasley LC, Kaufman RJ. Increased synthesis of secreted proteins induces expression of glucose-regulated proteins in butyrate-treated Chinese hamster ovary cells. J Biol Chem. 1989;264:20602–7.PubMedGoogle Scholar
- Feizi A, Österlund T, Petranovic D, Bordel S, Nielsen J. Genome-scale modeling of the protein secretory machinery in yeast. PLoS ONE. 2013;8:e63284.View ArticlePubMedPubMed CentralGoogle Scholar
- Fomina-Yadlin D, Mujacic M, Maggiora K, Quesnell G, Saleem R, McGrew JT. Transcriptome analysis of a CHO cell line expressing a recombinant therapeutic protein treated with inducers of protein expression. J Biotechnol. 2015;212:106–15.View ArticlePubMedGoogle Scholar
- Giri R, Yeh H-H, Wu C-H, Liu H-S. SUMO-1 overexpression increases RbAp46 protein stability and suppresses cell growth. Anticancer Res. 2008;28:3749–56.PubMedGoogle Scholar
- Hammond S, Swanberg JC, Kaplarevic M, Lee KH. Genomic sequencing and analysis of a Chinese hamster ovary cell line using Illumina sequencing technology. BMC Genomics. 2011;12:67.View ArticlePubMedPubMed CentralGoogle Scholar
- Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2008;4:44–57.View ArticleGoogle Scholar
- Ihaka R, Gentleman R. R: a language for data analysis and graphics. J Comput Graph Stat. 1996;5:299.Google Scholar
- Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R, Salit M, Gingeras TR, Oliver B. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 2011;21:1543–51.View ArticlePubMedPubMed CentralGoogle Scholar
- Joshi-Tope G, Gillespie M, Vastrik I, D’Eustachio P, Schmidt E, de Bono B, Jassal B, Gopinath GR, Wu GR, Matthews L, Lewis S, Birney E, Stein L. Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. 2005;33:D428–32.View ArticlePubMedGoogle Scholar
- Kaas CS, Bolt G, Hansen JJ, Andersen MR, Kristensen C. Deep sequencing reveals different compositions of mRNA transcribed from the F8 gene in a panel of FVIII-producing CHO cell lines. Biotechnol J. 2015;10:1081–9.View ArticlePubMedGoogle Scholar
- Kaas CS, Kristensen C, Betenbaugh MJ, Andersen MR. Sequencing the CHO DXB11 genome reveals regional variations in genomic stability and haploidy. BMC Genomics. 2015;16:160.View ArticlePubMedPubMed CentralGoogle Scholar
- Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44:D457–62.View ArticlePubMedGoogle Scholar
- Kaufman RJ. Stress signaling from the lumen of the endoplasmic reticulum: coordination of gene transcriptional and translational controls. Genes Dev. 1999;13:1211–33.View ArticlePubMedGoogle Scholar
- Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14:R36.View ArticlePubMedPubMed CentralGoogle Scholar
- Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.View ArticlePubMedPubMed CentralGoogle Scholar
- Le Fourn V, Siffroi-Fernandez S, Ferrand M, Franc J-L. Competition between calnexin and BiP in the endoplasmic reticulum can lead to the folding or degradation of human thyroperoxidase†. Biochemistry (Mosc). 2006;45:7380–8.View ArticleGoogle Scholar
- Le H, Chen C, Goudar CT. An evaluation of public genomic references for mapping RNA-Seq data from Chinese hamster ovary cells. Biotechnol Bioeng. 2015;112:2412–6.View ArticlePubMedGoogle Scholar
- Lewis NE, Liu X, Li Y, Nagarajan H, Yerganian G, O’Brien E, Bordbar A, Roth AM, Rosenbloom J, Bian C, Xie M, Chen W, Li N, Baycin-Hizal D, Latif H, Forster J, Betenbaugh MJ, Famili I, Xu X, Wang J, et al. Genomic landscapes of Chinese hamster ovary cell lines as revealed by the Cricetulus griseus draft genome. Nat Biotechnol. 2013;31:759–65.View ArticlePubMedGoogle Scholar
- Liu L, Feizi A, Österlund T, Hjort C, Nielsen J. Genome-scale analysis of the high-efficient protein secretion system of Aspergillus oryzae. BMC Syst Biol. 2014;8:73.View ArticlePubMedPubMed CentralGoogle Scholar
- Mi H, Muruganujan A, Casagrande JT, Thomas PD. Large-scale gene function analysis with the PANTHER classification system. Nat Protoc. 2013;8:1551–66.View ArticlePubMedGoogle Scholar
- Nishimiya D. Proteins improving recombinant antibody production in mammalian cells. Appl Microbiol Biotechnol. 2014;98:1031–42.View ArticlePubMedGoogle Scholar
- Parkhomchuk D, Borodina T, Amstislavskiy V, Banaru M, Hallen L, Krobitsch S, Lehrach H, Soldatov A. Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res. 2009;37:e123.View ArticlePubMedPubMed CentralGoogle Scholar
- Patil C, Walter P. Intracellular signaling from the endoplasmic reticulum to the nucleus: the unfolded protein response in yeast and mammals. Curr Opin Cell Biol. 2001;13:349–55.View ArticlePubMedGoogle Scholar
- Pearson WR. An introduction to sequence similarity (‘homology’) searching. Curr Protoc Bioinformatics. 2013;Chapter 3:Unit3.1. Ed. Board Andreas Baxevanis Al.PubMedGoogle Scholar
- Robinson MD, McCarthy DJ, Smyth GK. edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–40.View ArticlePubMedGoogle Scholar
- Schmieder R, Edwards R. Quality control and preprocessing of metagenomic datasets. Bioinformatics. 2011;27:863–4.View ArticlePubMedPubMed CentralGoogle Scholar
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–504.View ArticlePubMedPubMed CentralGoogle Scholar
- Sung YH, Song YJ, Lim SW, Chung JY, Lee GM. Effect of sodium butyrate on the production, heterogeneity and biological activity of human thrombopoietin by recombinant Chinese hamster ovary cells. J Biotechnol. 2004;112:323–35.View ArticlePubMedGoogle Scholar
- Suzuki R, Shimodaira H. Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics. 2006;22:1540–2.View ArticlePubMedGoogle Scholar
- Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. Systematic determination of genetic network architecture. Nat Genet. 1999;22:281–5.View ArticlePubMedGoogle Scholar
- Vembar SS, Brodsky JL. One step at a time: endoplasmic reticulum-associated degradation. Nat Rev Mol Cell Biol. 2008;9:944–57.View ArticlePubMedPubMed CentralGoogle Scholar
- Walsh G. Biopharmaceutical benchmarks 2014. Nat Biotechnol. 2014;32:992–1000.View ArticlePubMedGoogle Scholar
- Wu G, Feng X, Stein L. A human functional protein interaction network and its application to cancer data analysis. Genome Biol. 2010;11:R53.View ArticlePubMedPubMed CentralGoogle Scholar
- Wulhfard S, Baldi L, Hacker DL, Wurm F. Valproic acid enhances recombinant mRNA and protein levels in transiently transfected Chinese hamster ovary cells. J Biotechnol. 2010;148:128–32.View ArticlePubMedGoogle Scholar
- Xu X, Nagarajan H, Lewis NE, Pan S, Cai Z, Liu X, Chen W, Xie M, Wang W, Hammond S, Andersen MR, Neff N, Passarelli B, Koh W, Fan HC, Wang J, Gui Y, Lee KH, Betenbaugh MJ, Quake SR, et al. The genomic sequence of the Chinese hamster ovary (CHO)-K1 cell line. Nat Biotechnol. 2011;29:735–41.View ArticlePubMedPubMed CentralGoogle Scholar
- Yang J, Kiefer SM, Rauchman M. Characterization of the gene encoding mouse retinoblastoma binding protein-7, a component of chromatin-remodeling complexes. Genomics. 2002;80:407–15.View ArticlePubMedGoogle Scholar
- Yoshida H. ER stress and diseases. FEBS J. 2007;274:630–58.View ArticlePubMedGoogle Scholar