Table 8 Model selection and evaluation metrics (general and per class) of the top 5 models from 36 possible instantiations of the pipeline using the LC data-set

From: Pipeline design to identify key features and classify the chemotherapy response on lung cancer patients using large-scale genetic data

 

| # | FS | Sampling | Classifier | CV F1 (Mean ± Std) | CV Precision (Mean ± Std) | CV Recall (Mean ± Std) | Train F1 | Test F1 | Test Precision | Test Recall | Test F1 (0) | Test Precision (0) | Test Recall (0) | Test F1 (1) | Test Precision (1) | Test Recall (1) | Model Parameters |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RFE-LR | Up-sampling | RF | 0.72 ± 0.054 | 0.686 ± 0.102 | 0.79 ± 0.039 | 1 | 0.722 | 0.778 | 0.729 | 0.871 | 0.964 | 0.794 | 0.2 | 0.125 | 0.5 | n_estimators = 30 |
| 2 | RLR-L1 | SMOTE-sampling | KNN | 0.712 ± 0.087 | 0.68 ± 0.122 | 0.762 ± 0.066 | 0.777 | 0.741 | 0.806 | 0.844 | 0.889 | 1 | 0.8 | 0.222 | 0.125 | 1 | n_neighbors = 5, C = 100 |
| 3 | ANOVA | No sampling | RF | 0.698 ± 0.077 | 0.651 ± 0.12 | 0.776 ± 0.061 | 1 | 0.652 | 0.722 | 0.595 | 0.839 | 0.929 | 0.765 | 0 | 0 | 0 | n_estimators = 30 |
| 4 | RFE-LR | SMOTE-sampling | RF | 0.689 ± 0.077 | 0.648 ± 0.119 | 0.761 ± 0.071 | 1 | 0.681 | 0.778 | 0.605 | 0.875 | 1 | 0.778 | 0 | 0 | 0 | n_estimators = 30 |
| 5 | ANOVA | No sampling | Linear SVM | 0.687 ± 0.113 | 0.687 ± 0.136 | 0.707 ± 0.112 | 1 | 0.811 | 0.833 | 0.823 | 0.9 | 0.964 | 0.844 | 0.5 | 0.375 | 0.75 | C = 0.1 |

  1. Models are ordered by CV F1. FS stands for feature selection and CV for cross-validation. F1 is the model evaluation measure defined as 2 × Precision × Recall / (Precision + Recall). Precision is the proportion of examples classified as positive that are truly positive, and Recall is the proportion of truly positive examples that are classified as positive. Std stands for standard deviation. Train indicates that the metric was computed on the training set and Test that it was computed on the test set. (0) marks an evaluation metric for class 0 and (1) for class 1.
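
As a quick consistency check on the per-class columns, the sketch below recomputes F1 from precision and recall using the standard harmonic-mean form, 2 × Precision × Recall / (Precision + Recall), and compares the result with the reported values. The function name `f1_from` and the dictionary `reported` are illustrative names, not part of the original pipeline. Returning 0 when both precision and recall are 0 is an assumed convention, chosen because the table reports an F1 of 0 in those cases.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall.

    Assumed convention: return 0.0 when precision + recall is 0,
    which avoids a division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class test-set values copied from the table:
# (precision, recall, reported F1)
reported = {
    "model 1, class 0": (0.964, 0.794, 0.871),
    "model 1, class 1": (0.125, 0.5, 0.2),
    "model 3, class 1": (0.0, 0.0, 0.0),
    "model 5, class 0": (0.964, 0.844, 0.9),
    "model 5, class 1": (0.375, 0.75, 0.5),
}

for label, (precision, recall, f1) in reported.items():
    computed = f1_from(precision, recall)
    print(f"{label}: computed {computed:.3f}, reported {f1:.3f}")
```

The recomputed values agree with the reported per-class F1 to three decimals. That agreement holds only when the factor of 2 is included in the formula.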