Table 1 Data statistics for the construction of the training dataset and the independent testing dataset

From: UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines

| Data set | Data resource | Number of ubiquitylated proteins | Number of ubiquitylated lysines | Number of non-ubiquitylated lysines |
|---|---|---|---|---|
| Training set | hCKSAAP_UbiSite | 2,500 | 6,118 | 6,118 |
| Training set | dbPTM 3.0 | 6,259 | 23,949 | 228,441 |
| Training set | mUbiSiDa | 35,494 | 110,695 | 1,217,977 |
| Training set | Combined non-redundant data | 37,647 | 128,026 | 1,317,734 |
| Training set | Non-homologous data (sequence identity ≦ 30 %) | 4,828 | 5,438 | 12,663 |
| Independent testing set | CPLM 2.0 | 32,429 | 139,950 | 1,109,432 |
| Independent testing set | Non-homologous data (sequence identity ≦ 30 %) | 2,894 | 3,732 | 10,664 |
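
The counts in Table 1 show that non-ubiquitylated lysines outnumber ubiquitylated ones in most rows. The following is a minimal sketch of how that class imbalance could be quantified from the table. The names `TABLE_1`, `negative_to_positive_ratio`, and `summarize` are hypothetical and are not taken from the UbiSite paper. Only the counts themselves come from the table.

```python
# Counts copied from Table 1.
# Key: (data set, data resource)
# Value: (ubiquitylated lysines, non-ubiquitylated lysines)
TABLE_1 = {
    ("Training set", "hCKSAAP_UbiSite"): (6118, 6118),
    ("Training set", "dbPTM 3.0"): (23949, 228441),
    ("Training set", "mUbiSiDa"): (110695, 1217977),
    ("Training set", "Combined non-redundant data"): (128026, 1317734),
    ("Training set", "Non-homologous data"): (5438, 12663),
    ("Independent testing set", "CPLM 2.0"): (139950, 1109432),
    ("Independent testing set", "Non-homologous data"): (3732, 10664),
}


def negative_to_positive_ratio(positives: int, negatives: int) -> float:
    """Return the number of negative sites per positive site."""
    if positives <= 0:
        raise ValueError("positives must be greater than zero")
    return negatives / positives


def summarize(table: dict) -> dict:
    """Map each table row to its negative-to-positive ratio, rounded to 2 places."""
    return {
        key: round(negative_to_positive_ratio(pos, neg), 2)
        for key, (pos, neg) in table.items()
    }


if __name__ == "__main__":
    for (dataset, resource), ratio in summarize(TABLE_1).items():
        print(f"{dataset} / {resource}: {ratio:.2f} negatives per positive")
```

A ratio of 1.0 marks a balanced row. Larger values indicate how many negative sites a classifier would see for each positive site if the row were used without resampling.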