An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets

Stanescu, Ana; Caragea, Doina

doi:10.1186/1752-0509-9-S5-S1

BMC Systems Biology

Table 1 Table of Results.


Imbalance degree	LBE	CTEO	STEO	CTEP	STEP	CTEOD	STEOD	CTEPD	STEPD
1-to-5	0.452	0.526◇	0.567*	0.647*	0.479◇	0.692*	0.652*	0.644†	0.612◇
1-to-10	0.434	0.462	0.455†	0.557†	0.343†	0.584*	0.573†	0.584†	0.573†
1-to-20	0.437	0.434	0.440◇	0.522†	0.292◇	0.515◇	0.529†	0.523◇	0.526*
1-to-25	0.437	0.384◇	0.423◇	0.497◇	0.245*	0.507◇	0.465◇	0.510◇	0.507†
1-to-30	0.430	0.336*	0.408◇	0.484◇	0.239*	0.509†	0.470◇	0.503◇	0.514*
1-to-40	0.443	0.404†	0.409	0.492◇	0.222†	0.503◇	0.468	0.504◇	0.497†
1-to-50	0.450	0.372†	0.409◇	0.491	0.236*	0.508◇	0.451	0.504	0.486
1-to-60	0.471	0.388†	0.398	0.472	0.195†	0.496	0.423	0.494◇	0.474
1-to-70	0.450	0.392†	0.411	0.462	0.207†	0.474◇	0.444	0.480◇	0.478
1-to-75	0.454	0.388	0.399◇	0.460◇	0.249†	0.483◇	0.435	0.483	0.471
1-to-80	0.449	0.353†	0.386†	0.436	0.204*	0.457	0.421◇	0.460◇	0.465†
1-to-90	0.453	0.359†	0.410	0.449	0.242	0.470	0.423	0.473†	0.456
1-to-99	0.446	0.376	0.389◇	0.440†	0.226†	0.464	0.414	0.459	0.457

The values are auPRC values for the positive class, averaged over the five organisms, as the class imbalance degree varies from 1-to-5 to 1-to-99; labeled instances make up less than 1% of the training data. LBE is the ensemble-based supervised lower bound. CTEO and STEO are the co-training-based and self-training-based ensembles inspired by the original approach in [11]. CTEP and STEP are the co-training-based and self-training-based ensembles that use the "dynamic balancing" approach introduced in [15], in which only positive instances are used in semi-supervised iterations to augment the originally labeled training data. CTEOD and STEOD add both positive and negative instances but distribute them among all subclassifiers, so that the balance and diversity of each subclassifier's labeled subset are maintained. CTEPD and STEPD use "dynamic balancing" and also distribute instances among all subclassifiers. Bold font denotes the semi-supervised experiments that outperform the lower bound. Starred (*) values denote experiments whose difference from the lower bound was found to be statistically significant by the paired t-test in all five organisms. Values marked with a dagger (†) denote experiments whose difference was found to be statistically significant in four out of five organisms. Values marked with a diamond (◇) denote experiments whose difference was found to be statistically significant in three out of five organisms.
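The two conventions described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names `outperformers` and `significance_marker` are hypothetical, the three rows are copied from Table 1 with their markers stripped, and the strict `>` comparison on the rounded three-decimal averages is an assumption that may not match the original bold formatting in borderline cases.

```python
# Column order follows the Table 1 header (semi-supervised methods only).
METHODS = ["CTEO", "STEO", "CTEP", "STEP", "CTEOD", "STEOD", "CTEPD", "STEPD"]

# Imbalance degree -> (LBE value, values for METHODS), copied from Table 1.
ROWS = {
    "1-to-5": (0.452, [0.526, 0.567, 0.647, 0.479, 0.692, 0.652, 0.644, 0.612]),
    "1-to-50": (0.450, [0.372, 0.409, 0.491, 0.236, 0.508, 0.451, 0.504, 0.486]),
    "1-to-99": (0.446, [0.376, 0.389, 0.440, 0.226, 0.464, 0.414, 0.459, 0.457]),
}


def outperformers(lbe, values):
    """Return the methods whose averaged auPRC exceeds the lower bound.

    Assumption: "outperform" is read as a strict comparison on the
    rounded values shown in the table.
    """
    return [method for method, value in zip(METHODS, values) if value > lbe]


def significance_marker(n_significant_organisms):
    """Map the number of organisms (out of five) in which the paired t-test
    was significant to the marker used in the caption; fewer than three
    organisms gets no marker."""
    return {5: "*", 4: "†", 3: "◇"}.get(n_significant_organisms, "")


if __name__ == "__main__":
    for degree, (lbe, values) in ROWS.items():
        print(degree, outperformers(lbe, values))
```

Under these assumptions, all eight semi-supervised variants exceed the lower bound at 1-to-5, while at 1-to-99 only the three distributed variants CTEOD, CTEPD, and STEPD do.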


ISSN: 1752-0509
