Table 2 Candidate markers identified from the van de Vijver data set using the proposed method

From: Good practice guidelines for biomarker discovery from array data: a case study for breast cancer prognosis

| Group | Sample size n (good prognosis + poor prognosis) | Nested CV AUROCC (standard error) | Feature list (high expression → poor prognosis) | Feature list (high expression → good prognosis) |
| --- | --- | --- | --- | --- |
| All patients | 146 (68+78) | 0.73 (0.04) | BIRC5, CCNB2, CENPA, TK1, CCNE2, DKFZp762E1312, PRC1, STK15, SLC16A3, BUB1 | CEGP1, SLC11A3, C4A, ZNF145, MATN3, PGR, RAI2, DLX2 |
| ER+ | 107 (57+50) | 0.76 (0.05) | H1F2, COX6C, H2BFB, CCNE2, BLVRB | FST, DIO3, NTN4, DLX2, MATN3, COL3A1 |
| Node+ | 64 (30+34) | 0.80 (0.06) | H1F2, H2BFB, HA2FO, H2AFA, HABFB, DKFZp762E1312, H2BFS | LTF, NTN4, HML2, PER1, DMBT1, ODZ2, WNT5A, SEMA3C |
| Node- | 82 (38+44) | 0.72 (0.06) | PRAME, FADSD6, TK1, TSSC3, CTSL2, BUB1 | CEGP1, ESR1, CYP4B1, SEC14L2, TBX3-iso, ZNF145 |
| ER+/Node+ | 50 (26+24) | 0.83 (0.06) | H1F2, H2BFB, H2AFP, H2AFA, H2BFB, COX6C, MSMB, BLVRB, BCAS1 | LTF, LAMB3, C4A, NTN4, PTPRK, RTN1 |
  1. Many genes discovered in a larger group can also be discovered in its subgroups; for example, BIRC5 can be discovered in most of the subgroups. Such genes are not listed again under a subgroup unless they are more significant there. Conversely, a gene may be listed in a larger group only because it is significant in one of that group's subgroups; for example, H1F2 is listed in the lymph node-positive (Node+) group only because it is significant in the ER+/Node+ subgroup. Nested CV performance is given as the AUROCC with its estimated standard error in parentheses.
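
To make the performance column concrete, the following is a minimal sketch of how an entry such as "0.80 (0.06)" could be produced from per-fold results. It rests on assumptions not stated in the table: that one AUROCC is computed per outer cross-validation fold, and that the standard error is the standard error of the mean across those folds. The function names (`auroc`, `summarize`, `format_cell`) and the fold values in the usage note are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: 1 = poor prognosis (treated as the positive class), 0 = good prognosis.
    scores: a higher score means a stronger prediction of poor prognosis.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    # Fraction of (poor, good) pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(fold_aurocs):
    """Mean AUROC and standard error of the mean across outer folds.

    Assumption: the standard error is the sample standard deviation of the
    per-fold values divided by sqrt(number of folds).
    """
    k = len(fold_aurocs)
    return mean(fold_aurocs), stdev(fold_aurocs) / sqrt(k)


def format_cell(fold_aurocs):
    """Render the summary in the table's 'mean (standard error)' style."""
    m, se = summarize(fold_aurocs)
    return f"{m:.2f} ({se:.2f})"
```

As a usage illustration with made-up fold values, `format_cell([0.7, 0.8, 0.9])` yields a string in the same form as the table entries. In a nested scheme, feature selection and model tuning would be repeated inside each outer training fold so that the held-out fold scored by `auroc` plays no part in choosing the markers.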