
Table 1 Performance of models trained on training sets of different sizes

From: Recognition of bacteria named entity using conditional random fields in Spark

| Training set (number of sentences) | CRF++ on single node: Precision | CRF++ on single node: Recall | CRF++ on single node: F-Measure | Spark version: Precision | Spark version: Recall | Spark version: F-Measure |
|---|---|---|---|---|---|---|
| 1000 | 84.679% | 73.429% | 78.654% | 86.715% | 80.566% | 83.527% |
| 2000 | 85.442% | 76.391% | 80.664% | 88.031% | 80.880% | 84.304% |
| 3000 | 86.287% | 78.232% | 82.062% | 88.623% | 81.463% | 84.892% |
| 4000 | 85.707% | 78.591% | 81.995% | 88.389% | 82.002% | 85.076% |
| 5000 | 86.447% | 78.725% | 82.405% | 88.699% | 81.373% | 84.878% |
| 6000 | 87.831% | 80.341% | 83.919% | 89.492% | 82.944% | 86.094% |
| 7000 | 88.456% | 80.476% | 84.277% | 89.981% | 83.438% | 86.586% |
| 8000 | 87.745% | 80.341% | 83.880% | 90.398% | 83.662% | 86.900% |
| 9000 | 88.345% | 80.969% | 84.496% | 90.847% | 84.201% | 87.398% |
| 10,000 | 88.873% | 81.373% | 84.958% | 90.944% | 83.842% | 87.249% |
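The table does not state how the F-Measure column is derived. The sketch below assumes it is the standard balanced F-measure, the harmonic mean of precision and recall, F = 2PR / (P + R). The function name `f_measure` and the choice of rows are illustrative, not taken from the article.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall).

    Inputs and output are percentages, as in the table.
    Assumption: the table uses the balanced (beta = 1) form.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-Measure) copied from the 1000-sentence row
rows = {
    "CRF++ on single node": (84.679, 73.429, 78.654),
    "Spark version": (86.715, 80.566, 83.527),
}

for system, (p, r, reported) in rows.items():
    computed = f_measure(p, r)
    print(f"{system}: computed {computed:.3f}%, reported {reported:.3f}%")
```

For the two entries checked here, the computed value agrees with the reported F-Measure to three decimal places, which is consistent with the assumed formula.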