
Table 3 Top five complete-linkage clusters with weighted Jaccard coefficient >= 0.7.

From: KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics

(a) clustered by KCF-S descriptor

| Cluster | Description | #M | Max MW (compound) | Ave MW (compound) | Min MW (compound) | SD |
|---|---|---|---|---|---|---|
| #1 | acyl-CoA molecules | 144 | 993.8 (C01894) | 883.8 (C04348) | 767.5 (C00010) | 3.317 |
| #2 | enoyl-CoA molecules | 79 | 1124 (C16388) | 1026 (C16163) | 891.7 (C05276) | 6.789 |
| #3 | metals and inorganic ions | 48 | 244.0 (C19159) | 97.75 (C00150) | 1.00 (C00080) | 10.11 |
| #4 | acyl-CoA molecules with aromatic substituted groups | 48 | 1023 (C14118) | 929.6 (C00323) | 861.6 (C00845) | 6.107 |
| #5 | disaccharides | 35 | 342.2 (C00897) | 339.3 (C04698) | 326.2 (C19758) | 1.153 |

(b) clustered by PubChem fingerprint

| Cluster | Description | #M | Max MW (compound) | Ave MW (compound) | Min MW (compound) | SD |
|---|---|---|---|---|---|---|
| #1 | from furanocoumarins to glycosylated flavonoids | 382 | 918.8 (C12636) | 372.7 (C09956) | 186.1 (C09060) | 5.993 |
| #2 | from biotinyl-5'-AMP to CoA-disulfide | 237 | 1533 (C02015) | 959.5 (C16339) | 573.5 (C05921) | 7.893 |
| #3 | from flavonoids to pyrones (chromones), aggregated phenols | 159 | 668.7 (C10669) | 325.1 (C09752) | 166.1 (C10712) | 6.879 |
| #4 | from xanthenes to tannins, glycosylated and acylated flavonoids | 156 | 2108 (C16302) | 757.2 (C12646) | 346.2 (C09967) | 27.82 |
| #5 | steroids | 135 | 514.2 (C15359) | 335.8 (C14621) | 270.3 (C14261) | 3.703 |

(c) clustered by MACCS fingerprint

| Cluster | Description | #M | Max MW (compound) | Ave MW (compound) | Min MW (compound) | SD |
|---|---|---|---|---|---|---|
| #1 | from pyrimidine 5'-deoxynucleotide to CoA-disulfide | 432 | 1533 (C02015) | 823.4 (C00100) | 277.1 (C08249) | 12.13 |
| #2 | from 3',5'-cyclic CMP to polypeptidyl UPD-glucose | 195 | 1221 (C04894) | 564.8 (C00842) | 305.1 (C00941) | 13.41 |
| #3 | from xanthenes to highly glycosylated and aromatic acylated flavonoids | 167 | 2108 (C16302) | 642.3 (C16290) | 244.1 (C10082) | 23.76 |
| #4 | from xanthenes to C-glycosylated flavonoids | 159 | 610.5 (C10102) | 337.7 (C10049) | 222.2 (C00799) | 5.895 |
| #5 | from pyrones to biflavonoids | 157 | 1120 (C10235) | 502.5 (C16191) | 206.1 (C09012) | 13.34 |

  1. #M indicates the number of molecules in each cluster. Max MW, Ave MW, and Min MW give the molecule with the maximum, the average, and the minimum molecular weight in the cluster, respectively, together with that molecular weight. SD shows the standard deviation of the obtained clusters. The description after each cluster number (#1 - #5) names the group of molecules; "from ... to ..." indicates that the molecular structures in the cluster were too diverse to be described by a single term.
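
For readers unfamiliar with the terms in the table title, the following is a minimal sketch of complete-linkage clustering under a weighted Jaccard cutoff of 0.7. It is not the authors' implementation. The function names, the toy molecules `A` to `D`, and the substructure keys `s1` to `s5` are hypothetical, and descriptors are assumed to be substructure counts stored as dictionaries.

```python
from itertools import combinations


def weighted_jaccard(a, b):
    """Weighted Jaccard coefficient: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return num / den if den else 1.0


def complete_linkage(items, threshold=0.7):
    """Greedy agglomerative clustering with complete linkage.

    Two clusters are merged only if their LEAST similar cross pair
    still has a coefficient >= threshold.
    """
    clusters = [[name] for name in items]

    def link(c1, c2):
        # Complete linkage: similarity of the least similar pair.
        return min(weighted_jaccard(items[x], items[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest complete-linkage similarity.
        (i, j), best = max(
            (((i, j), link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy descriptors (substructure -> count), not data from the paper.
mols = {
    "A": {"s1": 2, "s2": 1},
    "B": {"s1": 2, "s2": 1, "s3": 1},
    "C": {"s4": 3},
    "D": {"s4": 3, "s5": 1},
}
clusters = complete_linkage(mols, threshold=0.7)
print(clusters)
```

In this toy input, `A` and `B` share a coefficient of 0.75, as do `C` and `D`, while all other pairs score 0, so two clusters of two molecules remain. The naive search over all cluster pairs is only suitable for small inputs.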