

Comprehensive proteomic analysis of bovine spermatozoa of varying fertility rates and identification of biomarkers associated with fertility




Male infertility is a major problem for mammalian reproduction. However, molecular details including the underlying mechanisms of male fertility are still not known. A thorough understanding of these mechanisms is essential for obtaining consistently high reproductive efficiency and for reducing cost and time loss for breeders.


Using high and low fertility bull spermatozoa, here we employed differential detergent fractionation multidimensional protein identification technology (DDF-MudPIT) and identified 125 putative biomarkers of fertility. We next used quantitative Systems Biology modeling and canonical protein interaction pathways and networks to show that high fertility spermatozoa differ from low fertility spermatozoa in four main ways. Compared to sperm from low fertility bulls, sperm from high fertility bulls have higher expression of proteins involved in energy metabolism, cell communication, spermatogenesis, and cell motility. Our data also suggest the hypothesis that DNA integrity may be compromised in low fertility sperm, because cell cycle: G2/M DNA damage checkpoint regulation was the most significant signaling pathway identified in low fertility spermatozoa.


This is the first comprehensive description of the bovine spermatozoa proteome. Comparative proteomic analysis of high fertility and low fertility bulls, in the context of protein interaction networks, identified putative molecular markers associated with the high fertility phenotype.


Male infertility is a major problem for mammalian reproduction. The nature of sub-fertility due to the male is as complex as that of the female [1]. Male factor infertility accounts for approximately 40% of infertility cases in humans. For this reason it is very important to investigate the factors that affect male fertility. Here we used bovine spermatozoa to model human male fertility because cattle provide several advantages as a model for male factor infertility. These include good breeding records, fertility data records and progeny records. In cattle breeding, artificial insemination (AI), a common breeding technique, uses semen from genetically superior sires to inseminate cows. In the United States approximately 70% of cows are bred by AI but only ~50% of these matings result in a successful full term pregnancy [2]. The underlying molecular events and mechanisms that determine the fertilizing potential of a semen sample are not well defined. A thorough understanding of these mechanisms is essential for obtaining consistently high reproductive efficiency and for reducing cost and time loss for breeders.

Fertility traits of semen can be categorized as compensable or uncompensable [1, 37]. Defects in compensable traits (motility and morphology) can be overcome by increasing the number of spermatozoa per insemination [1]. Defects in uncompensable traits affect the function of spermatozoa during the later stages of fertilization and in embryonic development [1, 8] and as such cannot be compensated. Uncompensable traits include nuclear vacuoles [9], morphological deficiencies that do not suppress movement [4], and defective chromatin structure [10]. Low fertility in bulls has an uncompensable component that includes reduced cleavage rate and delayed pronuclear formation following in vitro fertilization [1, 11]. Currently available fertility assays assess defects that affect the functional competence of spermatozoa (i.e. capacitation, acrosome reaction, sperm-oocyte interaction) [8, 12]; however, these assays cannot definitively predict fertility. At present, neither the molecular nature of sperm fertility defects nor biomarkers for accurate fertility prediction are known [13].

Spermatozoa are transcriptionally inactive so the only comprehensive method to understand the molecular functions in spermatozoa is via proteomics [13]. Published proteomic studies with bull spermatozoa described the sub-proteome of the sperm and the functions of proteins from its surrounding cells. Accessory gland (AG) proteins were shown to modulate important sperm functions after ejaculation and in the female reproductive tract, such as capacitation, acrosome reaction, sperm-oocyte interaction, and sperm protection [14]. It is known that fertility associated antigen (FAA), a heparin binding protein from seminal vesicles and prostate glands, binds to the spermatozoa membrane and modulates heparin-sperm interactions that are indicative of fertility [15]. Two seminal plasma proteins, prostaglandin-D-synthetase and osteopontin, were more abundant in the semen of high fertility bulls when compared to low fertility bulls [16, 17].

Here we describe a comprehensive proteomic analysis of bull sperm using differential detergent fractionation (DDF) two-dimensional liquid chromatography followed by electrospray ionization tandem mass spectrometry (DDF 2-LC ESI MS2; [18]). We compared protein expression profiles of sperm from high and low fertility bulls to characterize the differences in fertility at the protein level. Our results show that expression of 2051 and 2281 proteins was specific to high and low fertility bull spermatozoa, respectively and 1518 proteins were common to both. Differential expression of 125 proteins was significant between high and low fertility bull spermatozoa and these proteins are potential biomarkers for bovine male fertility. Biological systems utilize highly complex, interrelated metabolic and signaling pathways to function. Therefore, to identify signaling pathways involved in fertility, we carried out systems modeling of our proteomic datasets using Gene Ontology (GO) and Ingenuity Pathway Analysis (IPA). We identified differences in the signaling pathways between high and low fertility bull spermatozoa and found that EGF and PDGF signaling pathways were specific to high fertility.


Proteome profiles of spermatozoa from high and low fertility bulls

We identified 3569 and 3799 proteins in high and low fertility group spermatozoa respectively (see additional file 1). Among these 1518 (20.4%) were common to both groups and 2051 and 2281 proteins were unique to high and low fertility groups respectively (Figure 1). Only those proteins identified by at least three peptides were included in the analysis for differential expression and we identified 125 proteins as differentially-expressed between the high and low fertility spermatozoa. Compared to low fertility bull spermatozoa, expression of 74 proteins increased and there was a decrease in the expression of 51 proteins in high fertility spermatozoa (Table 1). Only a small proportion of proteins identified in this study have been previously described (15.1% of the high fertility group specific and 14.3% of the low fertility group specific proteins (Figure 1)). The majority of the identified proteins are 'predicted' (i.e. predicted based on sequence similarity to known proteins in other species and are frequently found in NRPD database for species that have had their genomes sequenced [19]). We contributed to the annotation of the newly sequenced bovine genome by experimentally confirming the in vivo expression of 4,313 electronically predicted proteins (see additional file 1). We also identified 10.6% and 9.8% 'hypothetical' (i.e. proteins predicted from nucleic acid sequences and that have not been shown to exist by experimental protein chemical evidence [20]) proteins specific to high fertility and low fertility spermatozoa respectively.
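The group-specific counts quoted above follow from the group totals by simple set arithmetic. The short sketch below reproduces them from the reported numbers; the variable names are illustrative and are not taken from the study's own analysis scripts.

```python
# Counts reported in the text.
high_total = 3569  # proteins identified in high fertility spermatozoa
low_total = 3799   # proteins identified in low fertility spermatozoa
common = 1518      # proteins identified in both groups

# Group-specific proteins are each group's total minus the shared set.
high_only = high_total - common
low_only = low_total - common

# Number of distinct proteins across both groups (each counted once).
distinct = high_only + low_only + common

print(high_only, low_only, distinct)
```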

Figure 1

Comparison of proteins identified in high fertility and low fertility spermatozoa. Distribution of predicted, known and hypothetical proteins is shown. a known proteins, b predicted proteins, c hypothetical proteins.

Table 1 Differentially expressed proteins.

Predicted and hypothetical proteins do not have any functional annotation associated with them and they represent ~80% of differentially expressed proteins between high and low fertility spermatozoa (Table 1). This poses a problem for meaningful biological modeling of our data without carrying out some functional annotation first. Therefore, we annotated all differentially expressed proteins in our data sets using AgBase GO resources.

Membrane and nuclear proteins

Membrane and nuclear proteins are fundamental for inter- and intracellular signaling and are thus fundamental for modeling cell-cell interactions. Sperm-oocyte fusion is a key element of fertilization. This process is facilitated by sperm surface proteins and leads to specific binding of the sperm surface-active component with the egg zona pellucida and, ultimately, sperm-egg fusion [21]. To identify proteins from the sperm membrane and the nucleus which function in cell fusion, we focused on membrane and nuclear proteins identified in our datasets. Based on the GO associations of known proteins, 40.6% (395) are membrane proteins. We also identified 112 nuclear proteins based on GO associations. Biological process annotation revealed that the majority of membrane proteins are involved in transport (33%), cell communication (18%) and metabolism (17%).

We GO annotated all differentially expressed proteins and applied the generic GO Slim [22] to identify 7 functional super-categories represented in differentially expressed proteins in high fertility spermatozoa. Most GO Slim categories, including processes such as metabolism, cell communication and cell motility showed overall up regulation of protein expression in the high fertility group while transport proteins showed an overall down regulation in the high fertility group (Figure 2).

Figure 2

Overall effects in GO Slims of differentially expressed proteins of high and low fertility spermatozoa. Biological process GO annotations of all significantly altered proteins between high and low fertility spermatozoa were used to generate GO Slims. For each GO Slim, the difference in the numbers of proteins with increased expression and the number of proteins with decreased expression (relative to low fertility spermatozoa) was calculated to estimate the net regulatory effect.
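The net regulatory effect described in this caption is a per-category difference of counts. A minimal sketch of that calculation is given below; the function name and the toy annotations are hypothetical and only illustrate the arithmetic, they are not data from Table 1.

```python
from collections import Counter


def net_effect(annotations):
    """Return {GO Slim term: (# proteins up) - (# proteins down)}.

    `annotations` is an iterable of (go_slim_term, direction) pairs, where
    direction is "up" or "down" relative to low fertility spermatozoa.
    """
    net = Counter()
    for term, direction in annotations:
        if direction == "up":
            net[term] += 1
        elif direction == "down":
            net[term] -= 1
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    return dict(net)


# Toy input (hypothetical): one pair per protein-to-GO-Slim annotation.
toy = [
    ("metabolism", "up"), ("metabolism", "up"), ("metabolism", "down"),
    ("transport", "down"), ("transport", "down"), ("transport", "up"),
    ("cell motility", "up"),
]
toy_net = net_effect(toy)
print(toy_net)
```

A positive value indicates net up regulation of that category in the high fertility group and a negative value indicates net down regulation.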

High fertility and low fertility sperm proteomes: molecular network and pathway analysis

Protein identification from biological samples on a global scale is important. However, there is a need to move beyond this level of analysis: instead of simply enumerating a list of proteins, the analysis needs to include their interactions as parts of complexes, pathways and biological networks. To achieve this level of analysis with our high fertility and low fertility spermatozoa proteomic datasets we used Ingenuity Pathway Analysis (IPA). At IPA thresholds for significance, 71 and 73 networks and 68 and 73 functions/diseases were significantly represented in the proteomes of high fertility and low fertility spermatozoa, respectively. The top 10 functions/diseases (ranked by significance) and the associated signaling pathways are shown in Table 2 and Table 3 for the proteomes of the high and low fertility groups, respectively. Analysis of the top 10 functions revealed that functions like cellular movement and cell to cell signaling and interaction were identified only in the high fertility sperm proteome (Table 2), whereas functions like cell death and reproductive system disease were identified only in the low fertility sperm proteome (Table 3).

Table 2 Top ten functions/diseases and their respective top ten signaling pathways in high fertility group spermatozoa.
Table 3 Top ten functions/diseases and their respective top ten signaling pathways in low fertility group spermatozoa.

Compared to the low fertility sperm proteome (9), the high fertility sperm proteome (20) had a 2-fold enrichment in signaling pathways. However, the number of significant metabolic pathways represented was comparable between the low (8) and high (9) fertility spermatozoa. Epidermal growth factor (EGF) signaling was the most prominent signaling pathway specific to high fertility sperm (Figure 3). EGF signaling is known to promote proliferation, survival, and differentiation of a wide variety of mammalian cells [23]. In addition to the EGF signaling pathway, platelet derived growth factor (PDGF) signaling, peroxisome proliferator-activated receptor (PPAR) signaling, interleukin (IL)-4 signaling, NF-κB signaling, chemokine signaling, and insulin-like growth factor (IGF)-1 signaling were identified only in high fertility spermatozoa. In the low fertility group, cell cycle: G2/M DNA damage checkpoint regulation was the most significant pathway, followed by integrin signaling.

Figure 3

EGF signaling pathway generated by the Ingenuity Pathway Analysis (IPA) software. EGF and PDGF signaling pathways were the top two pathways in the top 10 functions/diseases associated with the high fertility spermatozoa (Table 2). Each node represents a protein; proteins in shaded nodes were found in the high fertility spermatozoa dataset (see additional file 1) while proteins in clear nodes were not found in the high fertility spermatozoa dataset.

Proteins with significantly altered expression: molecular network and pathway analysis

Systems analysis of global proteomes revealed that some signaling pathways are differentially represented between the high and low fertility group spermatozoa. To further analyze these differentially represented pathways, we carried out IPA analysis with just the 125 differentially expressed proteins. In high fertility spermatozoa, expression of 74 proteins was increased when compared to low fertility spermatozoa. IPA analysis identified three significant networks with scores of 22, 19, and 13 respectively. Proteins identified in the top three networks are participants in EGF signaling, PDGF signaling, oxidative phosphorylation, and pyruvate metabolism pathways. Expression of two proteins involved in oxidative phosphorylation, ATP synthase, H+ transporting, mitochondrial F1 complex (ATP5B) and cytochrome c oxidase subunit III (COX3), and of casein kinase II, involved in EGF signaling and PDGF signaling, was higher in high fertility spermatozoa than in low fertility spermatozoa (Table 1). IPA also identified pyruvate metabolism as the most significant pathway among the up regulated proteins of high fertility spermatozoa. In the low fertility sperm proteome, expression of 51 proteins increased when compared to high fertility spermatozoa. IPA analysis identified two significant networks among the highly expressed proteins of low fertility sperm. Proteins identified in these two networks are participants in integrin signaling and estrogen receptor signaling.


Male fertility can be described as the success of spermatozoa in fertilizing oocytes and of the resulting zygotes in continuing through embryonic and fetal development until birth [11]. In this study we used bovine spermatozoa to study fertility as they can serve as a model for understanding human male infertility and reproductive diseases. Studying bovine male fertility on its own merit also has agro-economic implications for the cattle industry worldwide.

A spermatozoon must reach the site of fertilization and be capacitated for successful fertilization to occur. A subsequent step is the acrosome reaction, characterized by fusion of the spermatozoon's outer acrosomal membrane with the overlying plasma membrane [8]. The molecular mechanisms and signal transduction pathways mediating the processes of capacitation and acrosome reaction have been partially defined [8]. Proteomic analysis of the bull sperm cytosolic fraction showed enrichment for tyrosine kinases, which are essential for phosphorylation of specific sperm proteins during capacitation [24]. The abundance of a variety of proteins from cells surrounding the sperm has been proposed to indicate male fertility [2, 14, 15]. Most previous studies used 2-dimensional electrophoresis (2-DE) for isolation and identification of sperm proteins [13, 25–28]. To our knowledge this is the first comprehensive non-electrophoretic proteomic study of the bull sperm proteome. The aim of our study was to identify proteins that were differentially expressed between high and low fertility bull spermatozoa and the interrelated metabolic and signaling pathways that have a role in fertility.

We identified 125 proteins as differentially expressed between the high and low fertility sperm even though 1518 proteins were common to both groups and about 2000 were unique to each. The reason for this apparent discrepancy is that we took a conservative approach to the statistical analysis: only proteins identified by at least three peptides were included in the analysis for differential expression, and the statistical method used in ProtQuant is very conservative. ProtQuant specifically addresses the issue of "missing" mass spectra that occurs in all 2-D LC MS2-based expression proteomics methods. No other published method (either non-isotopic or isotopic) addresses this issue. Missing mass spectra are due to the inherent limitations of the mass spectrometers, the probabilistic nature of sampling and the cutoffs used to determine "true" assignments of peptides to mass spectra [29]. ProtQuant is a highly conservative method based on the sum of Xcorr, which itself increases the specificity of spectral counting and reduces the type I errors of differential expression. Regardless, proteins were analyzed from each of the three areas represented in Figure 1 and differentially expressed proteins occurred in all three (i.e. proteins unique to the high and low fertility sperm as well as those common to both).

From proteome profiles of specific cells or tissues, one acquires large datasets that are inherently complex. We therefore considered it beneficial to model our bovine sperm proteome data sets using GO and IPA. From GO associations of differentially expressed proteins we found that there was a comparative up regulation of three biological processes in high fertility spermatozoa: metabolism, cell communication and cell motility (Figure 2).

Up regulation of metabolism is consistent with the fact that capacitation is coupled to a specific type of metabolism, that is, glycolysis or oxidative respiration [30]. Pyruvate metabolism and glycolysis were the most significant metabolic pathways represented in the high fertility sperm proteome by IPA. In glycolysis, expression of pyruvate kinase (PKM2) was higher in high fertility spermatozoa. PKM2 catalyzes the production of pyruvate and ATP from phosphoenolpyruvate. Pyruvate formed in this process serves as an energy source for cells [31]. Impaired or lower pyruvate metabolism could limit the cell's ability to produce energy and this could be one of the reasons for reduced fertility in the low fertility group.

Expression of COX3 and ATP5B, both involved in oxidative respiration, was higher in high fertility spermatozoa than in low fertility spermatozoa. COX3 is a member of the large transmembrane protein complex found in the mitochondrion and is the last protein in the electron transport chain. Coupling of electron transport to oxidative respiration maintains the high mitochondrial transmembrane potential required for mitochondrial ATP production [32]. ATP5B catalyzes the production of ATP from ADP in the presence of a proton gradient across the mitochondrial membrane and this ATP is utilized for sperm motility and capacitation [33].

Communication between sperm and oocyte is critical for successful fertilization. We found that there was up regulation of cell communication in the high fertility sperm proteome when compared to the low fertility sperm proteome (Figure 2). Several signaling pathways are necessary to bring about cell to cell communication. EGF signaling and PDGF signaling were the top two significant signaling pathways identified in high fertility spermatozoa. EGF and PDGF signaling pathways stimulate tyrosine phosphorylation of various MAP kinases and their upstream activators MEK1, MEK2 and MEKK [34, 35]. EGF signaling has an important role in sperm capacitation as it stimulates tyrosine phosphorylation of many proteins [36]. In addition, EGF signaling also activates phospholipase C (PLC) [36] (Figure 3). PLC is important for the acrosome reaction (AR), fertilization and embryo development. PLC catalyzes the production of inositol 1,4,5-trisphosphate (IP3) from phosphatidylinositol 4,5-bisphosphate. IP3 generated by PLC activates the extracellular calcium influx required for the AR via binding to the IP3 receptor (IP3R) gated calcium channel located on the acrosome membrane [37]. Mutations in mouse PLCB1 reduced the AR rate, fertilization rate and embryo development [38]. EGF signaling was specific to high fertility bull sperm. Defects in EGF signaling in low fertility spermatozoa may prevent capacitation.

Expression of casein kinase 2 (CKII) prime polypeptide in EGF signaling was higher in high fertility spermatozoa compared to low fertility spermatozoa (Table 1). CKII is preferentially expressed in late stages of spermatogenesis and is involved in sperm chromatin decondensation after sperm-oocyte fusion [39, 40]. CKII deficient mice are infertile with oligospermia and globozoospermia [40]. EGF signaling also induces actin polymerization during bovine sperm capacitation [41]. Actin polymerization is essential for incorporation of sperm into the egg cytoplasm [42] and for sperm nuclei decondensation [43].

Comparing the proteome profiles of bull sperm of high and low fertility showed some molecular features associated with low fertility. Cell cycle: G2/M DNA damage checkpoint regulation was the most significant signaling pathway, followed by integrin signaling, in low fertility bull sperm (Table 3). The G2/M DNA damage checkpoint could help in maintaining the integrity of the genome during different stages of development. Progression through different phases of the cell cycle requires the sequential activation of various cyclin dependent kinases, and these kinases in turn are regulated by integrin signaling. Integrin signals are necessary for cells to traverse the cell division cycle [44]. These two pathways may be a compensatory response to the reproductive system disease function, which was identified only in low fertility sperm (Table 3).

In addition to differences in signaling and metabolic pathways between high and low fertility spermatozoa, we identified differences in protein expression that had implications in sperm motility. Expression of A-kinase anchor protein-4 (AKAP4) was significantly higher in high fertility spermatozoa (Table 1). AKAP4 is a major fibrous sheath protein of the principal piece of the sperm flagellum. AKAP4 recruits Protein kinase A to the fibrous sheath and facilitates local phosphorylation to regulate flagellar function in humans [45]. It also serves as a scaffolding protein for signaling proteins and proteins involved in metabolism. Higher expression of AKAP4 in the high fertility group sperm could result in higher motility.


In summary, this is the first comprehensive description of the bovine spermatozoa proteome. Comparative proteomic analysis of high fertility and low fertility bulls, in the context of protein interaction networks, identified putative molecular markers associated with the high fertility phenotype. We observed marked differences in signaling and metabolic pathways between high fertility and low fertility spermatozoa that have implications for sperm capacitation, the acrosome reaction and sperm-oocyte communication.


Selection of high and low fertility bulls

Frozen semen samples and bull fertility data (see additional file 2) from six mature and progeny tested Holstein bulls with satisfactory semen quality were provided by Alta Genetics (Watertown, WI).

Sample and Data Sources

The fertility data were established by a progeny testing program named Alta Advantage®, which is the industry's most reliable source of fertility information. It consisted of insemination records collected from 180 well managed partner dairy farms located in different geographical regions across the United States. This breeding program provided the advantages of DNA verification of the paternity of the offspring, and diagnosed pregnancies by veterinary palpation, instead of just relying on non-return rates 60–90 days after breeding.

Bull Fertility Prediction

To predict the fertility of the bulls from the given source, a subset of data was generated consisting of 962,135 insemination records from 934 bulls, with an average of 1,030 breedings per bull (range 300 to 15,194). The environmental and herd management factors that influence the fertility performance of sires were adjusted using threshold models similar to those previously published by Zwald et al. [46, 47]. Parameter estimation and fertility prediction were performed using Probit.F90 software developed by Y. M. Chang [48].

Therefore, to define fertility, instead of relying only on the number of pregnant cows (verified using palpation by a veterinarian or ultrasound examination) divided by the total number of cows examined for pregnancy, we considered the outcome of each breeding event and adjusted for environmental factors such as the effects of herd-year-month, parity, cow, days in milk, and sire proven status (young, proven, colored) in order to rank the bulls based on their breeding values for fertility. Further, the fertility of each bull was calculated and expressed as the percent deviation of its conception rate from the average conception rate of all bulls having at least 300 breedings in the data set.
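The final step of this definition, expressing each bull's fertility as a deviation from the average of all sufficiently recorded bulls, can be sketched as follows. This is only an illustration under stated assumptions: it uses raw conception rates rather than the threshold-model-adjusted values used in the study, it treats "percent deviation" as a difference in percentage points, and the function name and toy records are hypothetical.

```python
def fertility_deviation(records, min_breedings=300):
    """Deviation of each bull's conception rate from the group mean.

    `records` maps bull id -> (conceptions, breedings). Bulls with fewer
    than `min_breedings` breedings are excluded. Returns a dict of
    bull id -> deviation in percentage points (assumed interpretation).
    """
    rates = {
        bull: 100.0 * conceptions / breedings
        for bull, (conceptions, breedings) in records.items()
        if breedings >= min_breedings
    }
    if not rates:
        raise ValueError("no bull meets the minimum number of breedings")
    mean_rate = sum(rates.values()) / len(rates)
    return {bull: rate - mean_rate for bull, rate in rates.items()}


# Toy records (hypothetical): bull D has too few breedings and is dropped.
toy_records = {
    "A": (450, 1000),
    "B": (350, 1000),
    "C": (400, 1000),
    "D": (10, 20),
}
toy_deviation = fertility_deviation(toy_records)
print(toy_deviation)
```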

Selection of high and low fertility bulls

For this study, we used an arbitrary threshold for classifying high and low fertility bulls: the bulls with the highest and lowest fertility deviations from the average, among those with the highest reliability (>1,000 breedings/bull), were selected. The difference in the average fertility indexes between the high and low fertility groups was 5.46%, obtained from bulls having adequate records for higher reliability. Three bulls that scored 5.3% above the average were considered high fertility, and three bulls that scored 10.76% below the average were defined as low fertility (see additional file 2). Two separate pools of sperm cells (3 × 10^8) were constituted by mixing equal amounts of sperm cells from either the three low or the three high fertility bulls. The experiment was replicated three times.

Isolation of pure sperm cells

Spermatozoa were collected from high and low fertility bulls and frozen in 0.25 ml straws. For each bull, the total spermatozoa collected were purified by Percoll gradient centrifugation: 90% Percoll solution in water was prepared with DL-Lactate (19 μM), CaCl2 (2 μM), NaHCO3 (25 mM), MgCl2 (400 μM), KCl (3 μM), NaH2PO4 (310 μM), NaCl (2 mM) and Hepes (10 mM). The 90% Percoll solution was diluted to 45% with sperm diluent medium (1 mM pyruvate, 10 mM Hepes, 0.021 mM DL-Lactate in Tyrode's salt solution, pH 7.4). A density gradient of Percoll was prepared in an Eppendorf tube (0.1 ml of the 90% fraction under 1 ml of the 45% fraction). Spermatozoa were thawed at 35°C for 1 min and layered on top of the Percoll gradient. The spermatozoa were pelleted by centrifugation (956 g; 15 min) followed by two washes in phosphate-buffered solution (PBS) (956 g; 5 min). The total sperm count was obtained using an Improved Neubauer Hemacytometer and 10^8 sperm cells were aliquoted and stored at -80°C.

Protein extraction by DDF

DDF sequentially extracts proteins from different cellular compartments using a series of detergents, and this off-line pre-fractionation step in sample preparation increases the proteome coverage. Another advantage of DDF is that proteins can be assigned to different cellular locations based on the DDF fractions in which they are identified. Proteins were isolated using DDF as previously described [18]. Cytosolic proteins were extracted by six sequential incubations in a buffer containing digitonin (10 min each); next, a fraction containing predominantly membrane proteins was isolated by incubating the cells in 10% Triton X-100 buffer for 30 min and then removing the soluble protein. Nuclear DDF buffer containing deoxycholate (DOC) was then added to the remaining insoluble material, which was subjected to freeze-thawing to disrupt the nucleus. Nuclear proteins were collected from the resulting soluble fraction, and the sample was then aspirated through an 18 g needle and treated with a mixture of DNase I (50 U; Invitrogen, Carlsbad, CA) and RNase A (50 mg; Sigma-Aldrich, St Louis, MO) at 37°C for 1 h to digest nucleic acids. Any remaining pellet, containing the least soluble proteins, was treated with a buffer containing 5% SDS.


Proteomic analysis was carried out with triplicate samples of spermatozoa from the high fertility group and the low fertility group as described [19]. Proteins were precipitated with 25% trichloroacetic acid to remove salts and detergents. Protein pellets were resuspended in 0.1 M ammonium bicarbonate with 5% HPLC grade acetonitrile (ACN), reduced (5 mM, 65°C, 5 min), alkylated (iodoacetamide, 10 mM, 30°C, 30 min) and then trypsin digested until there was no visible pellet (sequencing grade modified trypsin, Promega; 1:50 w/w, 37°C, 16 h). Peptides were desalted using a peptide macrotrap (Michrom BioResources, Inc., Auburn, CA) and eluted using a 0.1% trifluoroacetic acid, 95% ACN solution. Desalted peptides were dried in a vacuum centrifuge and resuspended in 20 μL of 0.1% formic acid and 5% ACN. LC analysis was accomplished by strong cation exchange (SCX) followed by reverse phase liquid chromatography (RP-LC) coupled directly in line with an ESI ion trap mass spectrometer (LCQ Deca XP Plus; ThermoElectron Corporation; San Jose, CA). Samples were loaded into an LC gradient ion exchange system (Thermo Separations P4000 quaternary gradient pump coupled with a 0.32 × 100 mm BioBasic strong cation exchange column). A flow rate of 3 μL/min was used for both SCX and RP columns.

A salt gradient was applied in steps of 0, 10, 15, 20, 25, 30, 35, 40, 45, 50, 57, 64, 90, and 700 mM ammonium acetate in 5% ACN, 0.1% formic acid, and the resultant peptides were loaded directly into the sample loop of a 0.18 × 100 mm BioBasic C18 reverse phase liquid chromatography column of a Proteome X workstation (ThermoElectron). The reverse phase gradient used 0.1% formic acid in ACN. For the 0, 10, 15, 25, 30, 45, 64, 90, and 700 mM salt gradient steps, the ACN concentration was increased in a linear gradient from 5% to 30% in 20 min and then from 30% to 95% in 7 min, followed by 5% for 10 min. For the 20, 35, 40, 50 and 57 mM salt gradient steps, the ACN concentration was increased in a linear gradient from 5% to 40% in 65 min, followed by 95% for 15 min and 5% for 20 min.
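The two reverse phase programs and their assignment to salt steps can be written down as a small data structure, which makes it easy to check that every salt step is covered exactly once and to estimate total gradient time. The sketch below uses only the values quoted in the text; the names are illustrative, and the total ignores sample loading and any time between steps, which the text does not specify.

```python
# Salt steps (mM ammonium acetate) as listed in the text.
SALT_STEPS_MM = [0, 10, 15, 20, 25, 30, 35, 40, 45, 50, 57, 64, 90, 700]

# Assignment of salt steps to the two reverse phase programs.
SHORT_STEPS = {0, 10, 15, 25, 30, 45, 64, 90, 700}
LONG_STEPS = {20, 35, 40, 50, 57}

# Each program is a list of (segment description, duration in minutes).
SHORT_PROGRAM = [("5% to 30% ACN", 20), ("30% to 95% ACN", 7), ("5% ACN", 10)]
LONG_PROGRAM = [("5% to 40% ACN", 65), ("95% ACN", 15), ("5% ACN", 20)]


def program_minutes(program):
    """Total duration of one reverse phase program in minutes."""
    return sum(minutes for _, minutes in program)


def total_gradient_minutes():
    """Summed reverse phase gradient time over all salt steps."""
    return (len(SHORT_STEPS) * program_minutes(SHORT_PROGRAM)
            + len(LONG_STEPS) * program_minutes(LONG_PROGRAM))


# Every salt step should belong to exactly one program.
covers_all_steps = (SHORT_STEPS | LONG_STEPS) == set(SALT_STEPS_MM)
no_overlap = SHORT_STEPS.isdisjoint(LONG_STEPS)

print(covers_all_steps, no_overlap, total_gradient_minutes())
```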

The mass spectrometer was configured to optimize the duty cycle length with the quality of data acquired by alternating between a single full MS scan followed by three tandem MS scans on the three most intense precursor masses (as determined by Xcalibur software in real time) from the full scan. The collision energy was normalized to 35%. Dynamic mass exclusion windows were set at 2 min, and all of the spectra were measured with an overall mass/charge (m/z) ratio range of 300–1700.

All searches were done using TurboSEQUEST™ (Bioworks Browser 3.2; ThermoElectron). Mass spectra and tandem mass spectra were searched against an in silico trypsin-digested database of bovine RefSeq proteins downloaded from the National Center for Biotechnology Information [NCBI; 12/26/2006; 24,853 entries]. Trypsin digestion, including mass changes due to cysteine carbamidomethylation (C, 57.02 Da) and methionine mono- and di-oxidation (15.99 Da and 32 Da), was included in the search criteria. The peptide (MS precursor ion) mass tolerance was set to 1.5 Da and the fragment ion (MS2) mass tolerance was set to 1.0 Da. The Rsp value was required to be less than 5.
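To illustrate what an in silico trypsin digestion and a precursor mass tolerance check involve, a minimal sketch follows. It assumes the commonly used cleavage rule (cut C-terminal to K or R unless the next residue is P) with no missed cleavages; the exact enzyme settings used in the TurboSEQUEST search are not given in the text, and the function names and toy sequence are hypothetical.

```python
PRECURSOR_TOLERANCE_DA = 1.5  # peptide mass tolerance quoted in the text


def tryptic_digest(sequence):
    """Split a protein sequence into fully tryptic peptides.

    Assumed rule: cleave after K or R, but not when the next residue is P.
    Missed cleavages are not generated in this sketch.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def within_tolerance(observed_mass, theoretical_mass,
                     tolerance=PRECURSOR_TOLERANCE_DA):
    """True if the observed mass lies within +/- tolerance of the theoretical mass."""
    return abs(observed_mass - theoretical_mass) <= tolerance


# Toy sequence (hypothetical): R followed by P is not cleaved.
toy_peptides = tryptic_digest("AKRPGKCDR")
print(toy_peptides)
```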

As a primary filter we first limited our Sequest search output to include only peptides ≥ 6 amino acids long, with ΔCn ≥ 0.08 and Sequest cross correlation (Xcorr) scores of 1.5, 2.0 and 2.5 for +1, +2, and +3 charge states, respectively. We next used a decoy database search strategy [49] (using the same primary filter as for the real database search) to calculate P values for peptide identifications, as this allows us to assign the probability of a false identification based on the real data from the experiment itself [49–52]. Since the accuracy of peptide identification depends on the charge state, we calculated P values for +1, +2, and +3 charge states separately. The probability that a peptide identification from the original database is really a random match (P value) is estimated based on the probability that a match against the decoy database will achieve the same Xcorr [51, 53]. Protein probabilities were calculated exactly as described [54, 55] using only peptides with P < 0.05, and only those proteins were used for further modeling. All protein identifications and their associated MS data have been submitted to the PRoteomics IDEntifications database (PRIDE; [56]) and the PRIDE accession numbers are 1883–1888.
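The filtering and decoy-based P value steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of [51, 53]: the Xcorr values are treated as minimum thresholds, the P value is estimated as the fraction of same-charge decoy matches scoring at least as high, and the `PSM` record and function names are hypothetical.

```python
from typing import NamedTuple


class PSM(NamedTuple):
    """One peptide-spectrum match (hypothetical record layout)."""
    peptide: str
    charge: int
    xcorr: float
    delta_cn: float


MIN_LENGTH = 6                          # from the text
MIN_DELTA_CN = 0.08                     # from the text
MIN_XCORR = {1: 1.5, 2: 2.0, 3: 2.5}    # per charge state, assumed to be minima


def passes_primary_filter(psm):
    """Apply the length, delta-Cn and charge-specific Xcorr filter."""
    threshold = MIN_XCORR.get(psm.charge)
    if threshold is None:
        return False
    return (
        len(psm.peptide) >= MIN_LENGTH
        and psm.delta_cn >= MIN_DELTA_CN
        and psm.xcorr >= threshold
    )


def decoy_p_value(xcorr, decoy_xcorrs):
    """Fraction of decoy matches scoring at least as high as xcorr."""
    if not decoy_xcorrs:
        raise ValueError("no decoy matches available for this charge state")
    return sum(1 for score in decoy_xcorrs if score >= xcorr) / len(decoy_xcorrs)


def significant_psms(target, decoy, alpha=0.05):
    """Keep filtered target PSMs whose charge-specific decoy P value is < alpha."""
    decoy_by_charge = {}
    for psm in decoy:
        if passes_primary_filter(psm):
            decoy_by_charge.setdefault(psm.charge, []).append(psm.xcorr)
    kept = []
    for psm in target:
        scores = decoy_by_charge.get(psm.charge)
        if passes_primary_filter(psm) and scores:
            if decoy_p_value(psm.xcorr, scores) < alpha:
                kept.append(psm)
    return kept
```

Grouping the decoy scores by charge state mirrors the statement that P values were calculated separately for +1, +2, and +3 ions.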

Differential protein expression

Label-free quantification approaches are designed to quantify relative protein abundances directly from high-throughput proteomic analyses without labeling techniques. Here, we used ProtQuant [29], a Java-based tool for label-free quantification that uses a spectral counting method with increased specificity (and thus fewer false positives, i.e. type I errors). This increased specificity is achieved by incorporating the quantitative aspects of the Sequest cross correlation (Xcorr) into the spectral counting method. ProtQuant also computes the statistical significance of differential expression between control and treatment for each protein using one-way ANOVA (α ≤ 0.05). This method requires at least 3 peptides for each protein, combined across the control and treatment, before a p-value can be calculated.
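The idea of an Xcorr-weighted spectral count followed by a one-way ANOVA can be sketched as below. This is not the ProtQuant algorithm of [29]: summing Xcorr per protein per replicate, the function names, and the use of a tabulated critical value instead of an exact p-value are assumptions made to keep the example dependency-free. The critical value 7.71 is the standard F(0.05; 1, 4) value, which applies only to two groups of three replicates.

```python
from collections import defaultdict
from statistics import mean

MIN_PEPTIDES = 3        # minimum peptides per protein (from the text)
F_CRITICAL_1_4 = 7.71   # F critical value at alpha = 0.05 for df = (1, 4)


def sum_xcorr_by_protein(psms):
    """Sum Xcorr per protein for one replicate; psms is (protein_id, xcorr) pairs."""
    totals = defaultdict(float)
    for protein_id, xcorr in psms:
        totals[protein_id] += xcorr
    return dict(totals)


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    values = [x for group in groups for x in group]
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def is_differential(high, low, n_peptides):
    """Return True/False for triplicate groups, or None if too few peptides."""
    if n_peptides < MIN_PEPTIDES:
        return None
    if len(high) != 3 or len(low) != 3:
        raise ValueError("critical value assumes three replicates per group")
    return one_way_anova_f([high, low]) > F_CRITICAL_1_4
```

With a statistics library available, the comparison against a tabulated critical value would normally be replaced by an exact p-value from the F distribution.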

Gene Ontology Annotation

We used Gene Ontology (GO) resources and tools available at AgBase [57] to identify the molecular functions and biological processes represented in differentially expressed proteins in our datasets. We used the GORetriever tool to obtain all existing GO annotations available for known proteins in our datasets. We first GO-annotated differentially expressed proteins in our datasets using existing annotations from probable orthologs with ≥90% sequence identity using the UniRef 90 database. Proteins without annotation at UniRef 90, but with 70–90% sequence identity to presumptive orthologs with GO annotation, were GO-annotated using the GOanna tool [22]. Biological process annotations for these proteins were grouped into more generalized categories using the GOSlim viewer [22].
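The tiered annotation strategy above amounts to a simple decision rule, sketched below. The function name and return labels are hypothetical, and the sketch does not call any AgBase tool; it only shows which route a protein would take given whether it already has GO annotation and the sequence identity to its best annotated ortholog.

```python
from typing import Optional


def annotation_route(has_existing_go: bool, best_identity_pct: Optional[float]) -> str:
    """Choose the GO annotation route for one protein.

    has_existing_go   -- True if GO annotation already exists for the protein
    best_identity_pct -- percent identity to the best GO-annotated ortholog,
                         or None if no candidate ortholog was found
    """
    if has_existing_go:
        return "GORetriever"      # existing annotation is retrieved directly
    if best_identity_pct is None:
        return "unannotated"
    if best_identity_pct >= 90.0:
        return "UniRef90"         # transfer from ortholog with >= 90% identity
    if best_identity_pct >= 70.0:
        return "GOanna"           # transfer from ortholog with 70-90% identity
    return "unannotated"
```

Proteins falling below the 70% threshold are left unannotated in this sketch, which is an assumption; the text does not say how such proteins were handled.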

Modeling using Ingenuity pathway analysis

To gain insights into the biological pathways and networks that are significantly represented in our proteomic datasets we used Ingenuity Pathways Analysis (IPA; Ingenuity Systems, California). Currently IPA accepts gene/protein accession numbers from human, mouse, and rat only. Therefore, to use IPA, we mapped bovine proteins from our datasets to their corresponding human orthologs by identifying reciprocal-best-BLAST hits and uploaded these accession numbers into IPA. IPA selects "focus genes" to be used for generating biological networks. Focus genes are based on proteins from our datasets that are mapped to corresponding gene objects in the Ingenuity Pathways Knowledgebase (IPKB) and are known to interact with other genes based on published, peer reviewed content in the IPKB. Based on these interactions IPA builds networks with a size of no more than 35 genes or proteins. A P-value for each network and canonical pathway is calculated according to the fit of the user's set of significant genes/proteins. IPA computes a score for each network from the P-value; the score indicates the likelihood of the focus genes in a network being found together due to chance. We selected networks scoring ≥ 2, which have > 99% confidence of not being generated by chance [58, 59].
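Two steps in this paragraph lend themselves to a short sketch: ortholog mapping by reciprocal best hits, and the relationship between a network score and its P-value. The code below is illustrative only. It assumes the best BLAST hit in each direction has already been extracted into dictionaries, and it assumes the score is the negative base-10 logarithm of the P-value, which is consistent with a score of 2 corresponding to 99% confidence but is not stated in the text. All names are hypothetical.

```python
import math


def reciprocal_best_hits(bovine_to_human, human_to_bovine):
    """Return bovine -> human pairs that are each other's best hit.

    Both arguments map a query accession to its single best-scoring subject.
    """
    return {
        bovine: human
        for bovine, human in bovine_to_human.items()
        if human_to_bovine.get(human) == bovine
    }


def network_score(p_value):
    """Score assumed to be -log10(P)."""
    return -math.log10(p_value)


def confidence_not_chance(score):
    """Confidence implied by a score under the -log10(P) assumption."""
    return 1.0 - 10.0 ** (-score)
```

Under this assumption a network with P = 0.01 has a score of 2, which is the selection threshold used here.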

Biological functions are assigned to each network by using annotations from the scientific literature stored in the IPKB. Fisher's exact test is used to calculate the P-value determining the probability of each biological function/disease or pathway being assigned by chance. We used P ≤ 0.05 to select highly significant biological functions and pathways represented in our proteomic datasets [58].
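For readers unfamiliar with the test, a right-tailed Fisher's exact test for over-representation can be sketched as below. This is a generic illustration and not the IPA implementation: the choice of a right-tailed test, the definition of the background set, and the function name are assumptions.

```python
from math import comb


def fisher_right_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    k -- dataset proteins annotated to the function or pathway
    n -- proteins in the dataset
    K -- background proteins annotated to the function or pathway
    N -- proteins in the background
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k or more annotated proteins.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total
```

As a small check, drawing 3 proteins from a background of 10 in which 4 carry the annotation gives `fisher_right_tail(2, 3, 4, 10)` equal to 40/120, or one third.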


References

  1. Saacke RG, Dalton JC, Nadir S, Nebel RL, Bame JH: Relationship of seminal traits and insemination time to fertilization rate and embryo quality. Animal reproduction science. 2000, 60–61: 663-677. 10.1016/S0378-4320(00)00137-8
  2. Killian GJ: High-Fertility Proteins Enhance Reproduction Rates in Dairy Cattle. 1999
  3. Watson PF: The causes of reduced fertility with cryopreserved semen. Animal reproduction science. 2000, 60–61: 481-492. 10.1016/S0378-4320(00)00099-3
  4. DeJarnette JM, Saacke RG, Bame J, Vogler CJ: Accessory sperm: their importance to fertility and embryo quality, and attempts to alter their numbers in artificially inseminated cattle. Journal of animal science. 1992, 70 (2): 484-491.
  5. Dejarnette JM: The effect of semen quality on reproductive efficiency. The Veterinary clinics of North America. 2005, 21 (2): 409-418. 10.1016/j.cvfa.2005.02.011
  6. Love CC: The sperm chromatin structure assay: a review of clinical applications. Animal reproduction science. 2005, 89 (1–4): 39-45. 10.1016/j.anireprosci.2005.06.019
  7. Evenson DP: Loss of livestock breeding efficiency due to uncompensable sperm nuclear defects. Reproduction, fertility, and development. 1999, 11 (1): 1-15. 10.1071/RD98023
  8. Braundmeier AG, Miller DJ: The search is on: finding accurate molecular markers of male fertility. Journal of dairy science. 2001, 84 (9): 1915-1925.
  9. Saacke RG, DeJarnette JM, Bame JH, Karabinus DS, Whitman SS: Can spermatozoa with abnormal heads gain access to the ovum in artificially inseminated super- and single-ovulating cattle?. Theriogenology. 1998, 50 (1): 117-128. 10.1016/S0093-691X(98)00119-8
  10. Ballachey BE, Evenson DP, Saacke RG: The sperm chromatin structure assay. Relationship with alternate tests of semen quality and heterospermic performance of bulls. Journal of andrology. 1988, 9 (2): 109-115.
  11. Eid LN, Lorton SP, Parrish JJ: Paternal influence on S-phase in the first cell cycle of the bovine embryo. Biology of reproduction. 1994, 51 (6): 1232-1237. 10.1095/biolreprod51.6.1232
  12. Aitken RJ: Sperm function tests and fertility. International journal of andrology. 2006, 29 (1): 69-75; discussion 105–108. 10.1111/j.1365-2605.2005.00630.x
  13. Pixton KL, Deeks ED, Flesch FM, Moseley FL, Bjorndahl L, Ashton PR, Barratt CL, Brewis IA: Sperm proteome mapping of a patient who experienced failed fertilization at IVF reveals altered expression of at least 20 proteins compared with fertile donors: case report. Human reproduction (Oxford, England). 2004, 19 (6): 1438-1447. 10.1093/humrep/deh224
  14. Moura AA, Koc H, Chapman DA, Killian GJ: Identification of proteins in the accessory sex gland fluid associated with fertility indexes of dairy bulls: a proteomic approach. Journal of andrology. 2006, 27 (2): 201-211. 10.2164/jandrol.05089
  15. McCauley TC, Zhang H, Bellin ME, Ax RL: Purification and characterization of fertility-associated antigen (FAA) in bovine seminal fluid. Molecular reproduction and development. 1999, 54 (2): 145-153. 10.1002/(SICI)1098-2795(199910)54:2<145::AID-MRD6>3.0.CO;2-6
  16. Gerena RL, Irikura D, Urade Y, Eguchi N, Chapman DA, Killian GJ: Identification of a fertility-associated protein in bull seminal plasma as lipocalin-type prostaglandin D synthase. Biology of reproduction. 1998, 58 (3): 826-833. 10.1095/biolreprod58.3.826
  17. Henault MA, Killian GJ: Effect of homologous and heterologous seminal plasma on the fertilizing ability of ejaculated bull spermatozoa assessed by penetration of zona-free bovine oocytes. Journal of reproduction and fertility. 1996, 108 (2): 199-204.
  18. McCarthy FM, Burgess SC, van den Berg BH, Koter MD, Pharr GT: Differential detergent fractionation for non-electrophoretic eukaryote cell proteomics. Journal of proteome research. 2005, 4 (2): 316-324. 10.1021/pr049842d
  19. McCarthy FM, Cooksey AM, Wang N, Bridges SM, Pharr GT, Burgess SC: Modeling a whole organ using proteomics: the avian bursa of Fabricius. Proteomics. 2006, 6 (9): 2759-2771. 10.1002/pmic.200500648
  20. Lubec G, Afjehi-Sadat L, Yang JW, John JP: Searching for hypothetical proteins: theory and practice based upon original data and literature. Progress in neurobiology. 2005, 77 (1–2): 90-127. 10.1016/j.pneurobio.2005.10.001
  21. Bhattacharyya AK, Kanjilal S: Assessment of sperm functional competence and sperm-egg interaction. Molecular and cellular biochemistry. 2003, 253 (1–2): 255-261. 10.1023/A:1026024202288
  22. McCarthy FM, Wang N, Magee GB, Nanduri B, Lawrence ML, Camon EB, Barrell DG, Hill DP, Dolan ME, Williams WP: AgBase: a functional genomics resource for agriculture. BMC genomics. 2006, 7: 229. 10.1186/1471-2164-7-229
  23. Dikic I: Mechanisms controlling EGF receptor endocytosis and degradation. Biochemical Society transactions. 2003, 31 (Pt 6): 1178-1181.
  24. Lalancette C, Faure RL, Leclerc P: Identification of the proteins present in the bull sperm cytosolic fraction enriched in tyrosine kinase activity: a proteomic approach. Proteomics. 2006, 6 (16): 4523-4540. 10.1002/pmic.200500578
  25. Holmes-Davis R, Tanaka CK, Vensel WH, Hurkman WJ, McCormick S: Proteome mapping of mature pollen of Arabidopsis thaliana. Proteomics. 2005, 5 (18): 4864-4884. 10.1002/pmic.200402011
  26. Hozumi A, Satouh Y, Ishibe D, Kaizu M, Konno A, Ushimaru Y, Toda T, Inaba K: Local database and the search program for proteomic analysis of sperm proteins in the ascidian Ciona intestinalis. Biochemical and biophysical research communications. 2004, 319 (4): 1241-1246. 10.1016/j.bbrc.2004.05.118
  27. Martinez-Heredia J, Estanyol JM, Ballesca JL, Oliva R: Proteomic identification of human sperm proteins. Proteomics. 2006, 6 (15): 4356-4369. 10.1002/pmic.200600094
  28. Tan Y, Fan L, Luo K, Zhu W, Lu G: [Establishment of the two-dimensional gel electrophoretic protein map of the human sperm head]. Zhonghua nan ke xue = National journal of andrology. 2004, 10 (12): 886-889.
  29. Bridges SM, Bryce GB, Wang N, Williams WP, Burgess SC, Nanduri B: ProtQuant: a tool for the label-free quantification of mudPIT proteomics data. BMC Bioinformatics. 2007, 8 (Suppl 7): S24. 10.1186/1471-2105-8-S7-S24
  30. Ferrandi B, Lange Consiglio A, Chiara F, Uber E, Marchini M, Baglioni A, Carnevali A, Cremonesi F, Porcelli F: Cytochemical study on human spermatozoa metabolism during in vitro capacitation. Andrologia. 1987, 19 (Spec No): 278-283.
  31. Pithukpakorn M: Disorders of pyruvate metabolism and the tricarboxylic acid cycle. Molecular genetics and metabolism. 2005, 85 (4): 243-246. 10.1016/j.ymgme.2005.06.006
  32. Guthrie HD, Welch GR: Determination of intracellular reactive oxygen species and high mitochondrial membrane potential in Percoll-treated viable boar sperm using fluorescence-activated flow cytometry. Journal of animal science. 2006, 84 (8): 2089-2100. 10.2527/jas.2005-766
  33. Peterson RN, Freund M: ATP synthesis and oxidative metabolism in human spermatozoa. Biology of reproduction. 1970, 3 (1): 47-54.
  34. Hunter T: When is a lipid kinase not a lipid kinase? When it is a protein kinase. Cell. 1995, 83 (1): 1-4. 10.1016/0092-8674(95)90225-2
  35. Payne DM, Rossomando AJ, Martino P, Erickson AK, Her JH, Shabanowitz J, Hunt DF, Weber MJ, Sturgill TW: Identification of the regulatory phosphorylation sites in pp42/mitogen-activated protein kinase (MAP kinase). The EMBO journal. 1991, 10 (4): 885-892.
  36. Breitbart H, Naor Z: Protein kinases in mammalian sperm capacitation and the acrosome reaction. Reviews of reproduction. 1999, 4 (3): 151-159. 10.1530/ror.0.0040151
  37. Walensky LD, Snyder SH: Inositol 1,4,5-trisphosphate receptors selectively localized to the acrosomes of mammalian sperm. The Journal of cell biology. 1995, 130 (4): 857-869. 10.1083/jcb.130.4.857
  38. Choi D, Lee E, Hwang S, Jun K, Kim D, Yoon BK, Shin HS, Lee JH: The biological significance of phospholipase C beta 1 gene mutation in mouse sperm in the acrosome reaction, fertilization, and embryo development. Journal of assisted reproduction and genetics. 2001, 18 (5): 305-310. 10.1023/A:1016622519228
  39. Mudgal P, Anand SR: Casein kinase II activity of buffalo sperm chromatin. Molecular reproduction and development. 1998, 50 (2): 178-184. 10.1002/(SICI)1098-2795(199806)50:2<178::AID-MRD8>3.0.CO;2-H
  40. Xu X, Toselli PA, Russell LD, Seldin DC: Globozoospermia in mice lacking the casein kinase II alpha' catalytic subunit. Nature genetics. 1999, 23 (1): 118-121. 10.1038/12729
  41. Breitbart H, Cohen G, Rubinstein S: Role of actin cytoskeleton in mammalian sperm capacitation and the acrosome reaction. Reproduction (Cambridge, England). 2005, 129 (3): 263-268.
  42. Sanchez-Gutierrez M, Contreras RG, Mujica A: Cytochalasin-D retards sperm incorporation deep into the egg cytoplasm but not membrane fusion with the egg plasma membrane. Molecular reproduction and development. 2002, 63 (4): 518-528. 10.1002/mrd.10203
  43. Kumakiri J, Oda S, Kinoshita K, Miyazaki S: Involvement of Rho family G protein in the cell signaling for sperm incorporation during fertilization of mouse eggs: inhibition by Clostridium difficile toxin B. Developmental biology. 2003, 260 (2): 522-535. 10.1016/S0012-1606(03)00273-2
  44. Giancotti FG, Ruoslahti E: Integrin signaling. Science (New York, NY). 1999, 285 (5430): 1028-1032.
  45. Huang Z, Somanath PR, Chakrabarti R, Eddy EM, Vijayaraghavan S: Changes in intracellular distribution and activity of protein phosphatase PP1gamma2 and its regulating proteins in spermatozoa lacking AKAP4. Biology of reproduction. 2005, 72 (2): 384-392. 10.1095/biolreprod.104.034140
  46. Zwald NR, Weigel KA, Chang YM, Welper RD, Clay JS: Genetic selection for health traits using producer-recorded data. II. Genetic correlations, disease probabilities, and relationships with existing traits. Journal of dairy science. 2004, 87 (12): 4295-4302.
  47. Zwald NR, Weigel KA, Chang YM, Welper RD, Clay JS: Genetic selection for health traits using producer-recorded data. I. Incidence rates, heritability estimates, and sire breeding values. Journal of dairy science. 2004, 87 (12): 4287-4294.
  48. Chang YM, Gianola D, Heringstad B, Klemetsdal G: Effects of trait definition on genetic parameter estimates and sire evaluation for clinical mastitis with threshold models. Animal science. 2004, 79: 355-364.
  49. Elias JE, Gygi SP: Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nature methods. 2007, 4 (3): 207-214. 10.1038/nmeth1019
  50. Elias JE, Gibbons FD, King OD, Roth FP, Gygi SP: Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nature biotechnology. 2004, 22 (2): 214-219. 10.1038/nbt930
  51. Park GW, Kwon KH, Kim JY, Lee JH, Yun SH, Kim SI, Park YM, Cho SY, Paik YK, Yoo JS: Human plasma proteome analysis by reversed sequence database search and molecular weight correlation based on a bacterial proteome analysis. Proteomics. 2006, 6 (4): 1121-1132. 10.1002/pmic.200500318
  52. Peng J, Elias JE, Thoreen CC, Licklider LJ, Gygi SP: Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. Journal of proteome research. 2003, 2 (1): 43-50. 10.1021/pr025556v
  53. Qian WJ, Liu T, Monroe ME, Strittmatter EF, Jacobs JM, Kangas LJ, Petritis K, Camp DG, Smith RD: Probability-based evaluation of peptide and protein identifications from tandem mass spectrometry and SEQUEST analysis: the human proteome. Journal of proteome research. 2005, 4 (1): 53-62. 10.1021/pr0498638
  54. Nesvizhskii AI, Keller A, Kolker E, Aebersold R: A statistical model for identifying proteins by tandem mass spectrometry. Analytical chemistry. 2003, 75 (17): 4646-4658. 10.1021/ac0341261
  55. MacCoss MJ, Wu CC, Yates JR: Probability-based validation of protein identifications using a modified SEQUEST algorithm. Analytical chemistry. 2002, 74 (21): 5593-5599. 10.1021/ac025826t
  56. Martens L, Hermjakob H, Jones P, Adamski M, Taylor C, States D, Gevaert K, Vandekerckhove J, Apweiler R: PRIDE: the proteomics identifications database. Proteomics. 2005, 5 (13): 3537-3545. 10.1002/pmic.200401303
  57. McCarthy FM, Bridges SM, Wang N, Magee GB, Williams WP, Luthe DS, Burgess SC: AgBase: a unified resource for functional analysis in agriculture. Nucleic acids research. 2007, 35 (Database issue): D599-603.
  58. Gerling IC, Singh S, Lenchik NI, Marshall DR, Wu J: New data analysis and mining approaches identify unique proteome and transcriptome markers of susceptibility to autoimmune diabetes. Mol Cell Proteomics. 2006, 5 (2): 293-305.
  59. Huang Y, Yan J, Lubet R, Kensler TW, Sutter TR: Identification of novel transcriptional networks in response to treatment with the anticarcinogen 3H-1,2-dithiole-3-thione. Physiological genomics. 2006, 24 (2): 144-153. 10.1152/physiolgenomics.00258.2005
  60. Fazal MA, Palmer VR, Dovichi NJ: Analysis of differential detergent fractions of an AtT-20 cellular homogenate using one- and two-dimensional capillary electrophoresis. Journal of chromatography. 2006, 1130 (2): 182-189. 10.1016/j.chroma.2006.05.053


This study was the result of collaboration between the Laboratories of Drs. Burgess and Memili.

This study was funded by the Life Sciences and Biology Institute & Mississippi Agricultural and Forestry Experiment Station, Mississippi State University (Manuscript number: J-11138), and by Alta Genetics, Inc. We would like to acknowledge Dr. Susan M Bridges and Bryce Magee for helping us with the biomarker discovery tool and Dr. Tibor Pechan for MS/MS analysis.

Author information

Correspondence to Erdogan Memili.

Additional information

Authors' contributions

DP performed the proteomics sample preparation and data generation, analyzed and interpreted the proteomic data, carried out the systems biology modeling and analysis, and wrote the draft of the manuscript. BN developed the biomarker discovery computational tools, participated in the design of this study and helped to interpret the systems biology modeling. AK and JF did sample collection and pre-proteomic sample preparation. EM facilitated sample collection, contributed to the design of the study, provided expert knowledge and interpretation in reproductive biology and helped to draft the manuscript. SCB conceived of the study, participated in its design and coordination, helped analyze and interpret the statistical analysis of the proteomics data and helped to draft the manuscript. All authors read and approved the final manuscript.

Divyaswetha Peddinti, Bindu Nanduri and Erdogan Memili contributed equally to this work.

Electronic supplementary material

Additional file 1: Proteins identified by DDF-MudPIT and their distribution in high or low fertility group spermatozoa. Column A shows the GI numbers of the identified proteins and Column B indicates the corresponding protein names (assigned by NCBI). Column C shows the protein distribution in high or low fertility group spermatozoa or common to both (HF: High fertility group spermatozoa; LF: Low fertility group spermatozoa; C: common to both). For each protein we provide the number of peptides, the Sequest cross correlation score (∑Xcorr) and DDF fraction information (DDF1, 2, 3 and 4). DDF sequentially extracts proteins from different subcellular locations. DDF1, 2, 3 and 4 correspond to cytosolic, membrane, nuclear and cytoskeletal fractions, respectively [18, 60]. We identified a few proteins in more than one DDF fraction. This may be because membrane proteins are identified in all DDF fractions, with an increasing number of transmembrane domains in each successive DDF fraction. Many of the proteins that function in the nucleus at some stage may be present in the cytoplasm and can thus be found in all the fractions [18]. (XLS 1020 KB)

Additional file 2: Fertility data of bulls whose sperm samples were used for this study. For each bull we provide the bull number, number of services, percent difference from average breeding rate and standard deviation. Sperm samples from three high fertility (HF) bulls were pooled as the HF group, and sperm samples from three low fertility (LF) bulls were pooled as the LF group. (XLS 18 KB)

About this article

Cite this article

Peddinti, D., Nanduri, B., Kaya, A. et al. Comprehensive proteomic analysis of bovine spermatozoa of varying fertility rates and identification of biomarkers associated with fertility. BMC Syst Biol 2, 19 (2008) doi:10.1186/1752-0509-2-19


  • Gene Ontology
  • Ingenuity Pathway Analysis
  • Fertility Group
  • Epidermal Growth Factor Signaling
  • Label Free Quantification