A systematic analysis of FDA-approved anticancer drugs

Background The discovery of novel anticancer drugs is critical for the pharmaceutical research and development, and patient treatment. Repurposing existing drugs that may have unanticipated effects as potential candidates is one way to meet this important goal. Systematic investigation of efficient anticancer drugs could provide valuable insights into trends in the discovery of anticancer drugs, which may contribute to the systematic discovery of new anticancer drugs. Results In this study, we collected and analyzed 150 anticancer drugs approved by the US Food and Drug Administration (FDA). Based on drug mechanism of action, these agents are divided into two groups: 61 cytotoxic-based drugs and 89 target-based drugs. We found that in the recent years, the proportion of targeted agents tended to be increasing, and the targeted drugs tended to be delivered as signal drugs. For 89 target-based drugs, we collected 102 effect-mediating drug targets in the human genome and found that most targets located on the plasma membrane and most of them belonged to the enzyme, especially tyrosine kinase. From above 150 drugs, we built a drug-cancer network, which contained 183 nodes (150 drugs and 33 cancer types) and 248 drug-cancer associations. The network indicated that the cytotoxic drugs tended to be used to treat more cancer types than targeted drugs. From 89 targeted drugs, we built a cancer-drug-target network, which contained 214 nodes (23 cancer types, 89 drugs, and 102 targets) and 313 edges (118 drug-cancer associations and 195 drug-target associations). Starting from the network, we discovered 133 novel drug-cancer associations among 52 drugs and 16 cancer types by applying the common target-based approach. Most novel drug-cancer associations (116, 87%) are supported by at least one clinical trial study. Conclusions In this study, we provided a comprehensive data source, including anticancer drugs and their targets and performed a detailed analysis in term of historical tendency and networks. Its application to identify novel drug-cancer associations demonstrated that the data collected in this study is promising to serve as a fundamental for anticancer drug repurposing and development.


Background
In the last 50 years, numerous remarkable achievements have been made in the fight against cancer, starting from understanding cancer mechanisms to patient treatment. However, cancer remains as one of the leading causes of death in the world, which places a heavy burden on health services and society. Cancer involves abnormal cell growth with the potential to invade or spread to other parts of the body and encompasses more than 100 distinct diseases with diverse risk factors and epidemiology. Over the past five decades, scientific discoveries and technological advances, including modern molecular biology methods, high-throughput screening, structurebased drug design, combinatorial and parallel chemistry, and the sequencing of the human genomes have improved the drug discovery. However, the increasing cost of new drug development and decreasing number of truly efficient medicines approved by the US Food and Drug Administration (FDA) present unprecedented challenges for the pharmaceutical industry and patient healthcare, including the oncology [1,2]. As the increasing availability of FDA-approved drugs and quantitative biological data from the human genome project, multiple strategies have been proposed to shorten the drug development process and significantly lower costs, including drug repurposing [3,4] and network pharmacology [5,6].
With advances in anticancer drug discovery and development in the last several decades, more than 100 anticancer drugs have been discovered and approved by the FDA [7,8]. These drugs can be broadly classified into two basic categories: cytotoxic and targeted agents based on their mechanisms of action [9][10][11]. The cytotoxic agents can kill rapidly dividing cells by targeting components of the mitotic and/or DNA replication pathways. The targeted agents block the growth and spread of cancer through interacting with molecular targets that are involved in the pathways relevant to cancer growth, progression, and spread [12]. Those successful agents and their related data may provide valuable clues for further identification of novel drug targets, the discovery of novel anticancer drug combinations, drug repurposing, and computational pharmacology. Several reviews have provided the historical summary of these drugs, which revealed the trends of increasing proportion of targeted agents, particularly monoclonal antibodies [7,8].
Recently network pharmacology has successfully applied in multiple fields such as target identification, prediction of side effects, and investigation of general patterns of drug actions [5,13,14]. Therefore, besides of updating the FDA-approved anticancer drugs, analysis of drug-disease/target networks will significantly increase our understanding of the molecular mechanisms underlying drug actions and provide valuable clues for drug discovery.
Thus, in this study, we first comprehensively collected the FDA-approved anticancer drugs by the end of 2014 and curated their related data, such as initial approval years, action mechanisms, indications, delivery methods, and targets from multiple data sources. According to their action mechanisms, we classified them into two groups: cytotoxic and targeted drugs. Then, we analyzed these data to reveal the different trends between the two groups. Besides, we analyzed the drug targets by investigating their subcellular locations, functional classifications, and genetic mutations. Finally, we generated anticancer drug-disease and drug-target networks to capture the common anticancer drugs across different types of cancer and to reveal how strongly the anticancer drugs and targets interact or drug-target networks. The network-assisted investigation provides us with novel insights into the relationships among anticancer drugs and disease or drugs and targets, which may provide valuable information for further understanding anticancer drugs and the development of more efficient treatments.

Collection of FDA-approved anticancer drugs and their relation information
We have collected anticancer drugs approved by FDA since 1949 to the end of 2014 from multiple data sources. We started the collection of the anticancer drugs from anticancer drug-focused websites, including National Cancer Institute (NCI) drug information [15], MediLexicon cancer drug list [16], and NavigatingCancer [17]. Then, we employed the tool MedEx-UIMA, a new natural language processing system, to retrieve the generic names for these drugs [18]. Using the generic names, we searched Drug@FDA [19] and downloaded their FDA labels. For those that cannot be found in the drugs@FDA, we obtained their labels from Dailymed [20] or DrugBank [21]. From the drug label, we manually retrieved the initial approval year, drug action mechanism, drug target, delivery method, and indication for each drug. We further checked the multiple sources such as the MyCancerGenome [22], DrugBank, and the several publications [4,23] to obtain the drug targets. For drug category, we manually checked the ChemoCare [24] to assign the drugs as cytotoxic or targeted agents. In our curated drug list, we did not include the medicines to treat drug side effects, cancer pain, other conditions, or cancer prevention.

Classes of drug targets and cancer
For these targeted agents, we collected their targets from FDA drug labels, DrugBank, and MyCancerGenome. We then manually curated the primary effect-mediating targets for each drug. We further retrieved the gene annotation from Ingenuity Pathway Analysis (IPA) [25]    to obtain their subcellular location and family classes.
For the indication, we first collected the detail information from FDA drug labels and then manually classified them into higher-level class for the purpose of data analysis. For example, drug idelalisib can be used to treat relapsed chronic lymphocytic leukemia (CLL), relapsed follicular B-cell non-Hodgkin lymphoma (FL), relapsed small lymphocytic lymphoma (SLL) from FDA labels. In our data analysis, we recorded the drug's therapeutic classes as leukemia and lymphoma.

Cancer genes and somatic mutations of the cancer genome
The cancer gene set contains 594 genes from the Cancer Gene Census, which have been implicated in tumorigenesis by experimental evidence in the literature (July 14, 2016) [26]. We obtained 50 oncogenes (OCGs) and 50 tumor suppressor genes (TSGs) with high confidence from Davioli et al. [27]. The somatic mutations were obtained from Supplementary Table 2 in one previous work [28]. The table contains  . The mutations include missense, silent, nonsense, splice site, readthrough, frameshift indels (insertions/deletions) and inframe indels [28].

Network analysis
We built two networks based on our curated data, drugcancer and drug-cancer-target networks. In the drugcancer network, there are two types of nodes representing drug or cancer types and edges suggesting drug as the approved treatment for the cancer. In the drug-cancer-target network, there are three types of nodes representing cancer types, drug or drug target and edges indicating cancer-drug associations or drug-target interactions. The network degree is used to assess the toplogical feature of each cancer type and drug, i.e., the number of edges of each node in the network.

Common target-based approach
We used common target-based approach to discover novel drug-cancer associations [29]. It is one of the "guiltby-association" strategies based on the knowledge that whether the drugs shared common targets or not. If two drugs A and B have a common target, drug A is in current use for treating cancer type C and drug B is used for cancer type D, it is highly likely to be effective for drug Acancer type D and drug B-cancer type C associations.

FDA-approved anticancer drugs
From 1949 to 2014, a total of 150 medicines has been approved with an indication for at least one type of cancer (Table 1). Notably, in this study, we did not include the drugs used to treat side effects of cancer treatment, cancer pain, and other conditions. Based on the mechanism of action (MOA), we grouped them into two groups: 61 cytotoxic drugs and 89 targeted drugs.
Most of the cytotoxic drugs are alkylating agents, antimicrotubule agents, topoisomerase inhibitors while most of the targeted drugs belong to signal transduction inhibitors, gene expression modulators, apoptosis induces, hormone therapies, and monoclonal antibodies. Figure 1 shows that the number of approved drugs in cancer treatment had a gradual increase. In the later years (1991-2014), the number of approved anticancer (116 drugs) extremely increased compared to that of the previous five decades (1941-1990, 34 drugs (17) was similar to that of cytotoxic drugs (21). However, since the 2000s, the number of targeted drugs (65) was significantly higher than that of the cytotoxic drugs (13), which was about five times. Among 89 targeted drugs, 18 are antibodies, of which two (rituximab and trastuzumab) were approved in 1990, eight in the 2000s (gemtuzumab ozogamicin, alemtuzumab, ibritumomab tiuxetan, tositumomab and iodine I 131 tositumomab, bevacizumab, cetuximab, panitumumab, and ofatumumab) and seven from 2010 to 2014 (denosumab, brentuximab vedotin, ipilimumab, pertuzumab, ado-trastuzumab emtansine, obinutuzumab, and pembrolizumab). The trend was consistent with previous observations [7], which indicated that the advanced molecular understanding of cancer during the period had contributed substantially to the development of the anticancer drug, especially targeted drugs [30].
According to the drug delivery method administered to the patient, one drug can be categorized as a cancer single (individual) drug or a cancer combination drug. A combination drug is a drug that makes up a cancer drug combination that several individual drugs are administered to the patient. Though the targeted agents have become the primary focus of the therapeutic cancer research, investigation of their combined use with other targeted drugs or with cytotoxic drugs has become promising for the development of the effective cancer treatment [31,32]. Among the 150 drugs, 96 drugs could be given to patients one at a time, 22 could be given in combination with other cancer drugs to patients, and 32 drugs could be delivered to patients as the combination drugs or single drugs (Fig. 2). The targeted drugs tended to be delivered as signal drugs (Pearson's correlation: r = 0.92, P < 2.2 × 10 −26 ) while cytotoxic drug tended to be delivered as combination drugs (r = 0.43, P = 0.002) or by both methods (r = 0.44, P = 0.001).

Subcellular location and function of drug targets
In our curated data set, among the 150 anticancer FDAapproved drugs, 89 were targeted drugs that could be used to treat 23 types of cancer and acted on 102 protein targets (Tables 1, 2). To comprehensively understand the target functions and their genetic roles in cancer, we performed a survey from the perspectives of subcellular location, functional classification, and genetic mutations. These insights might be valuable for further understanding of molecular mechanisms of cancer and the advanced development of cancer therapy [30,33,34].
We retrieved the target's subcellular information and function classification from IPA and manually reviewed for each target ( Table 2). The result shows that most of the drug targets (45, 44%) located in the plasma membrane, 27 (26%) in the cytoplasm, 23 (23%) in the cell nucleus, and only seven (7%) in the extracellular space (Fig. 3a). Among the 45 targets in the plasma membrane, 21 were tyrosine kinases, 12 were transmembrane receptors, five were antigens, and five were G-protein coupled Fig. 4 Mutation pattern of drug target genes belonging to cancer genes. The TargetCancer represented the common genes between anticancer drug targets and cancer genes. The TargetOnly represented the genes only belonging to genes encoding drug targets with mutation data. The CancerOnly represented the genes only belonging to cancer genes with mutation data. The Other represented genes with mutation data excluding the genes from above three gene sets. a Comparison of average mutation frequency of four gene sets. b Percentage of genes with at least 2% mutation frequency in the Pan-Cancer. c The function classification, mutation frequency in individual cancer type and Pan-Cancer, and numbers of drugs of 32 TargetCancer genes. We highlighted the mutation frequency higher than 5% of samples in "TargetCancer" genes with red color receptors. Among the 27 targets in the cytoplasm, 23 were enzymes and four were others. Among the 23 targets in the nucleus, 13 were enzymes and 10 were receptors. The observation indicates that, to date, the most successful anticancer drugs target the plasma membrane proteins.

Genetic pattern of targeted anticancer targets
To check if these targets are the cancer candidate genes, we compared them with the cancer gene set which contains 594 genes from the Cancer Gene Census [26]. Among 102 target genes, 32 genes are cancer genes. Compared to all the protein-coding genes in the human (20,729), the anticancer drug targets were significantly enriched with cancer genes (Hypergeometric test, Pvalue = 3.57 × 10 −25 ). Among the 32 cancer genes, 16 were oncogenes while none were tumor suppressor genes according to the the high confidence TSGs and OCGs from Davioli et al. [27].
To further explore the mutation pattern of the anticancer drug targets, we utilized the somatic mutations in 3268 patients across 12 types of cancer from TCGA Pan-Cancer [28]. Among 102 drug targets, 32 were cancer genes. Thus we compared the mutation frequency of four gene sets: 32 genes belonging to drug targets and cancer genes (TargetCancer genes), 70 genes only belonging to genes encoding drug targets (TargetOnly genes), 537 cancer genes only belonging to cancer genes and with mutation data (CancerOnly genes), and 20,308 genes with mutation data excluding the genes from above three gene sets (Other genes). To compare the distribution of mutation frequency of the tumor samples among the four gene sets, we performed the Kolmogorov-Smirnor (K-S) tests. Figure 4a shows the comparison of mutation percentage of all samples in each gene set. The TargetCancer genes had the highest average mutation frequency (2.41%), which was significantly higher than that of TargetOnly (1.19%, K-S test: P = 4.79 × 10 −5 ), Can-cerOnly (1.85%, P = 0.0005), and Other genes (0.97%, P = 1.31 × 10 −9 ). The CancerOnly genes had the second Fig. 5 Drug-cancer network. The red ellipse represents the cancer; the green rectangle represents the cytotoxic drug; the green diamond represents the targeted drug. The cancer abbreviations included in the Table 3 highest average mutation frequency (1.85%), which was significantly higher than that of of TargetOnly (P = 0.0275) and other genes (P < 2.2× 10 −17 ). The Targe-tOnly genes had the third highest average mutation frequency, which was significantly higher than that of other genes (P = 0.0134).
Notably, among the 32 TargetCancer genes, 18 genes (56%) had at least 2% mutation frequency across the Pan-Cancer collection (Fig. 4b). Compared to that of TargetOnly genes (39%), CancerOnly (29%), or Other gene sets (10%), the percentage was significantly higher (Chi-squared test P-values: 0.0002, 0.002, 2.48 × 10 −16 , respectively). Figure 4c shows the percentage of samples with mutations of the 32 TargetCancer genes, their function classification, and number of targeting drugs. Indeed, for the 32 Target Cancer genes, there was a significant correlation between the percentages of samples with mutations and numbers of targeted drugs (Pearson's correlation: r = 0.40, P = 0.0230). Among the 32 genes, the most frequently mutated gene in the Pan-Cancer cohort was EGFR (6.2%). Its mutations significantly occur in the brain cancer GBM (27.1%), lung cancer (13.5%), COAD/READ (5.8%), HNSC (6.2%). Among the seven drugs targeting the gene, three (afatinib, erlotinib, and gefitinib) were used to treat lung cancer, two (cetuximab and panitumumab) were used to treat colorectal cancer, and one (cetuximab) was used to treat head and neck cancer.

Drug-cancer network
To explore the associations between the drugs and cancer types, we generated a drug-cancer network, which comprised 183 nodes (150 drugs and 33 cancer types) and 248 drug-cancer associations (Fig. 5) based on the FDA-approved drug-cancer associations in our curated data.
In the drug-cancer network, the degree (number of cancer types) of the 150 drugs ranged from one to eleven, and the average degree was 1.65. The degree distribution of these drugs was strongly right-skewed, indicating that most drugs had a low degree and only a small portion of the nodes had a high degree. The degree of the cytotoxic drugs was 2.13, which was significantly higher than that of the targeted drugs (1.33, K-S test: P = 0.0378). Most of them (105, 70%) could be used to treat only one cancer type. Among the 105 drugs, 35 belonged to the cytotoxic drugs while 70 belonged to the targeted drugs. Among the rest 45 drugs, 24 (16%) could be used to treat two cancer types and 21 drugs (14%) could be used to treat at least three cancer types. Among the 21 drugs, 15 were cytotoxic drugs while six were targeted drugs. Most of the 21 drugs (16, 76%) were approved by FDA before 2000. The most commonly used drug was doxorubicin that could be used to treat 11 cancer types, including leukemia, breast cancer, stomach cancer, lymphoma, ovarian cancer, lung cancer, sarcoma, thyroid cancer, bladder cancer, kidney cancer, and brain cancer. Doxorubicin is a cytotoxic anthracycline antibiotic isolated from cultures of Streptomyces peucetius var. caesius, which binds to nucleic acids, presumably by specific intercalation of the planar anthracycline nucleus with the DNA double helix [35]. The result indicated that the cytotoxic drugs tended to be used to treat more cancer types than targeted drugs.
In the drug-cancer network, the degree (number of drugs) of the 33 cancer types ranged from one to 40 and the average degree was 7.52. The degree distribution of the cancer types was not obviously right-skewed. Among the 33 cancer types, 11 had one drug, 12 had at least two drugs and less than 10 drugs, and ten had at least ten drugs (Table 3). They were leukemia (number of drugs: 40), lymphoma (28), breast cancer (27), lung cancer (17), prostate cancer (15), ovarian cancer (12), Fig. 6 Network of targeted drugs, targets, and cancer types. The red rectangle represents the cancer; the green rectangle represents the targeted drug, the blue rectangle represents the drug target. The cancer abbreviations included in the Table 3 (10), kidney cancer (10), and stomach cancer (10). Among the 40 drugs used to treat leukemia, 24 belonged to cytotoxic drugs while 16 drugs were the targeted drugs. Similarly, the numbers of cytotoxic drugs and targeted drugs were similar to each other for lymphoma, breast cancer, and lung cancer. However, for prostate cancer, melanoma, and kidney cancer, the numbers of targeted drugs were significantly higher than those of cytotoxic drugs.

Network of targeted drugs, targets, and cancer
Besides the drug-cancer network, we generated a specific network for targeted drugs, their targets, and their indications. The network contained 214 nodes (89 drugs, 102 targets, and 23 cancer types) and 313 edges (118 drugcancer associations and 195 drug-target associations) (Fig. 6) based on the FDA-approved targeted drug-cancer associations and targeted drug-target associations in our curated data. In the network, drugs had two types of neighbors: drug target and drug indication (cancer type). The target degree (number of targets) of the 89 drugs ranged from one to 18, and the average degree was 2.19. The cancer degree (number of cancer types) of the 89 drugs ranged from one to four and the average degree was 1.33. Among the 89 drugs, 22 had more than two targets. The drug regorafenib had 18 targets, which was approved by FDA to treat gastrointestinal stromal tumors and metastatic colorectal cancer. Among the 89 drugs, 19 drugs could be used to treat more than one cancer types. Four drugs bevacizumab, everolimus, hydroxyurea, and recombinant interferon Alfa-2b could be used to treat four types of cancer. The degree (number of drugs) of targets ranged from one to seven and the average degree was 1.91. The EGFR (epidermal growth factor receptor) and KDR (kinase insert domain receptor) were the most popular targets and both could be targeted by seven drugs, separately. The EGFR-related seven drugs could be used to treat six cancer types, while KDR-related drugs could be used to treat seven types of cancer. There were three common cancer types: colorectal cancer, thyroid cancer, pancreatic cancer. The degree (number of drugs) of cancer types ranged from one to 16 and the average degree was 5.13. As we discussed before, leukemia had 16 targeted drugs can be used to treat. The common target-based approach, namely, the drugs that shared common targets could be used to treat the same disease, is one of the "guilt-by-association" strategies to identify the novel drug-disease associations [29]. During the analysis, we noticed that, among the 89 drugs, 70 drugs had at least one common target. Applying the common target-based approach, we discovered 133 novel drug-cancer associations among 52 drugs and 16 cancer types. To evaluate the novel drug-cancer associations, we utilized the clinical trial studies to see if the drug had been investigated in the corresponding cancer type. After searching using the 52 drugs and their predicted cancer types against ClinivalTrials.gov, we found that most of the drug-cancer associations (116) have been investigated in at least one clinical trial (Table 4) while the 17 had not been investigated in clinical trials. The later part of novel drug-cancer associations might provide valuable clues for drug repurposing. The most well-studied association was the thalidomide-lymphoma, which had 174 clinical trial studies, including 15 Phase III clinical trial studies and one Phase IV clinical trial study. The drug thalidomide was approved to treat multiple myeloma. Recently its combination with other drugs entered to treat the peripheral T-cell lymphoma in the Phase 4 study (ClinicalTrials.gov Identifier: NCT01664975).

Conclusion
FDA-approved anticancer medicines play important roles in the successful cancer treatment and novel anticancer drug development. In this study, we comprehensively collected 150 FDA-approved anticancer drugs from 1949 to 2014. According to their action mechanisms, we groups them into two sets: cytotoxic and targeted agency. Then we performed a comprehensive analysis from the perspective of drugs, drug indications, drug targets, and their relationships. For drugs, we summarized their historical characteristics and delivery methods. For targets, we surveyed their cellular location, functional classification, genetic patterns. We further applied network methodology to investigate their relationships. In this study, we provided a comprehensive data source, including anticancer drugs and their targets and performed a detailed analysis in term of historical tendency and networks. Its application to discover novel drug-cancer associations demonstrated that the data collected in this study is promising to serve as a fundamental for anticancer drug repurposing and development.