Table 8 Model selection and evaluation metrics (overall and per class) of the top 5 models out of the 36 possible instantiations of the pipeline, using the LC data-set

From: Pipeline design to identify key features and classify the chemotherapy response on lung cancer patients using large-scale genetic data

| Rank | FS | Sampling | Classifier | CV F1 (mean ± std) | CV Precision (mean ± std) | CV Recall (mean ± std) | Train F1 | Test F1 | Test Precision | Test Recall | Test F1 (0) | Test Precision (0) | Test Recall (0) | Test F1 (1) | Test Precision (1) | Test Recall (1) | Model parameters |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RFE-LR | Up-sampling | RF | 0.72 ± 0.054 | 0.686 ± 0.102 | 0.79 ± 0.039 | 1 | 0.722 | 0.778 | 0.729 | 0.871 | 0.964 | 0.794 | 0.2 | 0.125 | 0.5 | n_estimators = 30 |
| 2 | RLR-L1 | SMOTE-sampling | KNN | 0.712 ± 0.087 | 0.68 ± 0.122 | 0.762 ± 0.066 | 0.777 | 0.741 | 0.806 | 0.844 | 0.889 | 1 | 0.8 | 0.222 | 0.125 | 1 | n_neighbors = 5, C = 100 |
| 3 | ANOVA | No sampling | RF | 0.698 ± 0.077 | 0.651 ± 0.12 | 0.776 ± 0.061 | 1 | 0.652 | 0.722 | 0.595 | 0.839 | 0.929 | 0.765 | 0 | 0 | 0 | n_estimators = 30 |
| 4 | RFE-LR | SMOTE-sampling | RF | 0.689 ± 0.077 | 0.648 ± 0.119 | 0.761 ± 0.071 | 1 | 0.681 | 0.778 | 0.605 | 0.875 | 1 | 0.778 | 0 | 0 | 0 | n_estimators = 30 |
| 5 | ANOVA | No sampling | Linear SVM | 0.687 ± 0.113 | 0.687 ± 0.136 | 0.707 ± 0.112 | 1 | 0.811 | 0.833 | 0.823 | 0.9 | 0.964 | 0.844 | 0.5 | 0.375 | 0.75 | C = 0.1 |
  1. Models are ordered by CV F1. FS stands for feature selection and CV for cross-validation. F1 is the evaluation measure defined as 2 × Precision × Recall / (Precision + Recall). Precision is the proportion of examples classified as positive that are truly positive, and Recall is the proportion of truly positive examples that are classified as positive. Std stands for standard deviation. Train indicates that the evaluation metric was computed on the training set, and Test that it was computed on the test set. (0) and (1) indicate that the evaluation metric refers to class 0 or class 1, respectively.
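The per-class values in Table 8 are consistent with F1 = 2 × Precision × Recall / (Precision + Recall), the harmonic mean of precision and recall. The Python sketch below (not the authors' code) recomputes a few reported F1 values from the tabulated precision and recall, and shows how the three metrics follow from confusion counts for one class. The function names and the confusion counts are hypothetical, and returning 0 when precision and recall are both 0 is an assumed convention that the table note does not state.

```python
# Minimal sketch (not the authors' code): per-class precision, recall and F1.
# Function names are hypothetical; only the formulas follow the table note.


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 = 2 * P * R / (P + R), the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Assumed convention: report 0 when both are 0 (e.g. no true positives).
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 for one class from its confusion counts."""
    # Precision: share of examples predicted as this class that truly belong to it.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of examples truly in this class that were predicted as it.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)


# (precision, recall, reported F1) copied from Table 8.
table_values = {
    "model 1, class 1": (0.125, 0.5, 0.2),
    "model 2, class 1": (0.125, 1.0, 0.222),
    "model 5, class 0": (0.964, 0.844, 0.9),
}
for name, (p, r, reported) in table_values.items():
    print(f"{name}: computed F1 = {f1_from_pr(p, r):.3f}, reported = {reported}")

# Hypothetical counts (not from the paper): TP=1, FP=7, FN=1 give P=0.125, R=0.5.
print(precision_recall_f1(tp=1, fp=7, fn=1))
```

Dropping the factor 2 would give 0.1 for model 1, class 1, half the reported 0.2, which is why the factor matters when reading the table. In practice, scikit-learn's `f1_score` computes the same per-class quantity.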