A multiple kernel density clustering algorithm for incomplete datasets in bioinformatics

Table 2 Quality comparison of different clustering algorithms on bioinformatics datasets

Algorithm	Metric	PBC-A	PBC-R	MFCCs	DLBCL-B	Wine	WDBC	MPE-A	MPE-R	ESR
MKDCI	F-m	0.351	0.360	0.728	0.749	0.652	0.858	0.470	0.482	0.491
	aRe	0.956	0.953	0.406	0.526	0.704	0.382	0.693	0.689	0.852
	NMI	0.351	0.362	0.692	0.532	0.414	0.495	0.538	0.554	0.446
	AMI	0.070	0.076	0.615	0.496	0.379	0.453	0.429	0.438	0.219
DBSCAN (MinPts=4, ε₁)	F-m	0.660	0.665	0.509	0.510	0.576	0.811	0.448	0.452	0.350
	aRe	0.999	0.998	0.858	0.956	0.772	0.602	0.796	0.794	0.967
	NMI	0.023	0.026	0.221	0.054	0.361	0.395	0.492	0.499	0.060
	AMI	0.005	0.005	0.124	0.039	0.269	0.295	0.347	0.347	0.003
HDBSCAN (MinPts=4)	F-m	0.623	0.627	0.785	0.565	0.620	0.853	0.265	0.271	0.332
	aRe	0.998	0.998	0.260	0.985	0.715	0.386	0.926	0.923	0.989
	NMI	0.029	0.032	0.686	0.174	0.386	0.469	0.518	0.523	0.082
	AMI	0.019	0.020	0.613	0.115	0.353	0.373	0.335	0.337	0.020
DENCLUE2.0 (ε₂,h=std(X)/5)	F-m	0.023	0.025	0.415	0.493	0.372	0.007	0.304	0.308	0.650
	aRe	0.997	0.996	0.983	0.987	0.908	0.998	0.708	0.699	0.685
	NMI	0.344	0.347	0.105	0.184	0.385	0.322	0.472	0.478	0.472
	AMI	0.061	0.064	0.018	0.114	0.122	0.002	0.392	0.396	0.201
PFClust	F-m	0.315	0.320	0.375	0.442	0.373	0.432	0.202	0.207	0.271
	aRe	0.981	0.978	0.887	0.993	0.971	0.988	0.998	0.998	0.872
	NMI	0.002	0.002	0.123	0.043	0.033	0.019	0.024	0.028	0.135
	AMI	0.001	0.001	0.094	0.001	0.001	0.007	0.006	0.007	0.111
Parameters	ε₁	24.657	24.657	0.306	19.819	3.626	20.413	2.221	2.221	1.426
	ε₂	19.591	19.591	0.306	0.413	6.552	1.426	0.432	0.432	1.853

MinPts is the minimum number of data samples required to form a cluster; ε₁ is the maximum distance between two data samples for them to be considered part of the same neighborhood; ε₂ is the convergence threshold for density attractors; and h is the bandwidth parameter of the Gaussian kernel. The reported ε₁ and ε₂ values are those that gave the best F-m score among ten random parameter values between 0.0 and 50.0.
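
The parameter selection described in the table note (ten random values between 0.0 and 50.0, keeping the one with the best F-m) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `clustering_f_measure` uses one common class-weighted definition of the clustering F-measure, which may differ from the F-m used in Table 2; `random_search_eps` is a hypothetical helper; and `gap_cluster` is a toy one-dimensional stand-in for a real algorithm such as DBSCAN or DENCLUE2.0.

```python
import random
from collections import Counter


def clustering_f_measure(true_labels, pred_labels):
    """Class-weighted clustering F-measure (assumed definition, may differ from the paper's F-m)."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(pred_labels)
    overlap = Counter(zip(true_labels, pred_labels))
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = overlap.get((c, k), 0)
            if n_ck == 0:
                continue
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        # Each class contributes its best-matching cluster, weighted by class size.
        total += (n_c / n) * best
    return total


def random_search_eps(cluster_fn, X, true_labels, n_trials=10, low=0.0, high=50.0, seed=0):
    """Try n_trials random parameter values in [low, high]; keep the best by F-measure."""
    rng = random.Random(seed)
    best_eps, best_score = None, -1.0
    for _ in range(n_trials):
        eps = rng.uniform(low, high)
        if eps <= 0.0:  # a distance threshold of zero is not meaningful
            continue
        labels = cluster_fn(X, eps)
        score = clustering_f_measure(true_labels, labels)
        if score > best_score:
            best_eps, best_score = eps, score
    return best_eps, best_score


def gap_cluster(points, eps):
    """Toy 1-D clustering: start a new cluster whenever the gap between sorted neighbors exceeds eps."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    labels = [0] * len(points)
    current = 0
    for prev, cur in zip(order, order[1:]):
        if points[cur] - points[prev] > eps:
            current += 1
        labels[cur] = current
    return labels


if __name__ == "__main__":
    X = [0.0, 1.0, 2.0, 100.0, 101.0, 102.0]
    y = [0, 0, 0, 1, 1, 1]
    eps, score = random_search_eps(gap_cluster, X, y)
    print(f"selected eps={eps:.3f}, F-measure={score:.3f}")
```

To apply the same search to a real algorithm, `cluster_fn` would be replaced by a wrapper that takes the data and a parameter value and returns one cluster label per sample. Because the search selects on a score computed from the ground-truth labels, the resulting parameters represent a favorable setting for each baseline rather than an unsupervised choice.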