Table 3 Top five complete-linkage clusters with weighted Jaccard coefficient >= 0.7.

From: KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics

(a) clustered by KCF-S descriptor

| Cluster | #M | Max MW | Ave MW | Min MW | SD |
|---|---|---|---|---|---|
| #1 acyl-CoA molecules | 144 | 993.8 (C01894) | 883.8 (C04348) | 767.5 (C00010) | 3.317 |
| #2 enoyl-CoA molecules | 79 | 1124 (C16388) | 1026 (C16163) | 891.7 (C05276) | 6.789 |
| #3 metals and inorganic ions | 48 | 244.0 (C19159) | 97.75 (C00150) | 1.00 (C00080) | 10.11 |
| #4 acyl-CoA molecules with aromatic substituted groups | 48 | 1023 (C14118) | 929.6 (C00323) | 861.6 (C00845) | 6.107 |
| #5 disaccharides | 35 | 342.2 (C00897) | 339.3 (C04698) | 326.2 (C19758) | 1.153 |

(b) clustered by PubChem fingerprint

| Cluster | Molecules | Max MW | Ave MW | Min MW | SD |
|---|---|---|---|---|---|
| #1 from furanocoumarins to glycosylated flavonoids | 382 | 918.8 (C12636) | 372.7 (C09956) | 186.1 (C09060) | 5.993 |
| #2 from biotinyl-5'-AMP to CoA-disulfide | 237 | 1533 (C02015) | 959.5 (C16339) | 573.5 (C05921) | 7.893 |
| #3 from flavonoids to pyrones (chromones), aggregated phenols | 159 | 668.7 (C10669) | 325.1 (C09752) | 166.1 (C10712) | 6.879 |
| #4 from xanthenes to tannins, glycosylated and acylated flavonoids | 156 | 2108 (C16302) | 757.2 (C12646) | 346.2 (C09967) | 27.82 |
| #5 steroids | 135 | 514.2 (C15359) | 335.8 (C14621) | 270.3 (C14261) | 3.703 |

(c) clustered by MACCS fingerprint

| Cluster | Molecules | Max MW | Ave MW | Min MW | SD |
|---|---|---|---|---|---|
| #1 from pyrimidine 5'-deoxynucleotide to CoA-disulfide | 432 | 1533 (C02015) | 823.4 (C00100) | 277.1 (C08249) | 12.13 |
| #2 from 3',5'-cyclic CMP to polypeptidyl UDP-glucose | 195 | 1221 (C04894) | 564.8 (C00842) | 305.1 (C00941) | 13.41 |
| #3 from xanthenes to highly glycosylated and aromatic acylated flavonoids | 167 | 2108 (C16302) | 642.3 (C16290) | 244.1 (C10082) | 23.76 |
| #4 from xanthenes to C-glycosylated flavonoids | 159 | 610.5 (C10102) | 337.7 (C10049) | 222.2 (C00799) | 5.895 |
| #5 from pyrones to biflavonoids | 157 | 1120 (C10235) | 502.5 (C16191) | 206.1 (C09012) | 13.34 |

  1. #M (labelled "Molecules" in panels (b) and (c)) is the number of molecules in each cluster. Max MW, Ave MW, and Min MW give the molecule with the maximum, the average, and the minimum molecular weight in the cluster, respectively, each listed with its molecular weight and compound identifier. SD shows the standard deviation of the obtained clusters. The description after each cluster number (#1 to #5) names the group of molecules; "from ... to ..." indicates that the molecular structures in the cluster were too diverse to be described by a single term.
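
To make the caption's clustering criterion concrete, the following is a minimal sketch of complete-linkage clustering with a weighted Jaccard cutoff of 0.7. It assumes the common definition of the weighted Jaccard coefficient (sum of element-wise minima divided by sum of element-wise maxima over descriptor counts); the table itself does not state the formula, so this may differ from the authors' implementation. The molecule names and descriptor keys (`mol_a`, `"C-C"`, and so on) are hypothetical and are not KCF-S, PubChem, or MACCS features.

```python
from itertools import combinations


def weighted_jaccard(a, b):
    """Weighted Jaccard similarity of two {feature: count} dictionaries.

    Assumed definition: sum(min) / sum(max) over all features.
    """
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return num / den if den else 1.0


def complete_linkage(items, threshold=0.7):
    """Naive agglomerative clustering with complete linkage.

    Two clusters may merge only if their LEAST similar cross pair
    still has similarity >= threshold.
    """
    clusters = [[name] for name in items]

    def link(c1, c2):
        # Complete linkage: similarity of the worst-matching pair.
        return min(weighted_jaccard(items[x], items[y]) for x in c1 for y in c2)

    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            s = link(clusters[i], clusters[j])
            if s >= threshold and (best is None or s > best[0]):
                best = (s, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]


# Hypothetical descriptor counts, for illustration only.
descriptors = {
    "mol_a": {"C-C": 4, "C=O": 1},
    "mol_b": {"C-C": 5, "C=O": 1},
    "mol_c": {"C-C": 1, "N-H": 3},
}

clusters = complete_linkage(descriptors, threshold=0.7)
```

Because complete linkage scores a merge by its least similar pair, every pair of molecules inside a resulting cluster has a coefficient of at least 0.7. This naive version recomputes all pairwise links on every pass, so it is only suitable for small illustrative inputs.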