Table 1 Data statistics for the construction of the training dataset and the independent testing dataset

From: UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines

| Data set | Data resource | Number of ubiquitylated proteins | Number of ubiquitylated lysines | Number of non-ubiquitylated lysines |
| --- | --- | --- | --- | --- |
| Training set | hCKSAAP_UbiSite | 2500 | 6118 | 6118 |
| | dbPTM 3.0 | 6259 | 23,949 | 228,441 |
| | mUbiSiDa | 35,494 | 110,695 | 1,217,977 |
| | Combined non-redundant data | 37,647 | 128,026 | 1,317,734 |
| | Non-homologous data (sequence identity 30 %) | 4828 | 5438 | 12,663 |
| Independent testing set | CPLM 2.0 | 32,429 | 139,950 | 1,109,432 |
| | Non-homologous data (sequence identity 30 %) | 2894 | 3732 | 10,664 |
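
One quantity that can be read directly off the counts in Table 1 is the ratio of non-ubiquitylated to ubiquitylated lysines in each data resource. The short Python sketch below does only that arithmetic. The dictionary keys and the function name are illustrative labels chosen here, not identifiers from the paper, and the sketch makes no claim about how the authors handled class balance.

```python
# Counts copied from Table 1:
# (number of ubiquitylated lysines, number of non-ubiquitylated lysines).
# The keys are illustrative labels, not identifiers used by the paper.
TABLE_1 = {
    "train/hCKSAAP_UbiSite": (6118, 6118),
    "train/dbPTM 3.0": (23949, 228441),
    "train/mUbiSiDa": (110695, 1217977),
    "train/combined non-redundant": (128026, 1317734),
    "train/non-homologous": (5438, 12663),
    "test/CPLM 2.0": (139950, 1109432),
    "test/non-homologous": (3732, 10664),
}


def negative_to_positive_ratio(positives: int, negatives: int) -> float:
    """Return the number of non-ubiquitylated lysines per ubiquitylated lysine."""
    if positives <= 0:
        raise ValueError("positives must be greater than zero")
    return negatives / positives


# One ratio per row of the table.
ratios = {
    name: negative_to_positive_ratio(pos, neg)
    for name, (pos, neg) in TABLE_1.items()
}

for name, ratio in ratios.items():
    print(f"{name:30s} {ratio:6.2f} : 1")
```

On these counts, the non-homologous training and testing sets come out at roughly 2.3 and 2.9 non-ubiquitylated lysines per ubiquitylated lysine, compared with about 10 in the combined non-redundant training data.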