Table 1 Performance of models trained on training sets of different sizes

From: Recognition of bacteria named entity using conditional random fields in Spark

| Training set (number of sentences) | CRF++ on single node: Precision | CRF++ on single node: Recall | CRF++ on single node: F-Measure | Spark version: Precision | Spark version: Recall | Spark version: F-Measure |
|---|---|---|---|---|---|---|
| 1000 | 84.679% | 73.429% | 78.654% | 86.715% | 80.566% | 83.527% |
| 2000 | 85.442% | 76.391% | 80.664% | 88.031% | 80.880% | 84.304% |
| 3000 | 86.287% | 78.232% | 82.062% | 88.623% | 81.463% | 84.892% |
| 4000 | 85.707% | 78.591% | 81.995% | 88.389% | 82.002% | 85.076% |
| 5000 | 86.447% | 78.725% | 82.405% | 88.699% | 81.373% | 84.878% |
| 6000 | 87.831% | 80.341% | 83.919% | 89.492% | 82.944% | 86.094% |
| 7000 | 88.456% | 80.476% | 84.277% | 89.981% | 83.438% | 86.586% |
| 8000 | 87.745% | 80.341% | 83.880% | 90.398% | 83.662% | 86.900% |
| 9000 | 88.345% | 80.969% | 84.496% | 90.847% | 84.201% | 87.398% |
| 10000 | 88.873% | 81.373% | 84.958% | 90.944% | 83.842% | 87.249% |
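
The table does not say how the F-Measure column was derived from the other two. The reported values are consistent with the balanced F-measure, the harmonic mean of precision and recall, F = 2PR / (P + R), so that is assumed in the minimal sketch below. The function name `f_measure` is illustrative and does not come from the source. The sketch recomputes the F-Measure for the first row (1000 training sentences) from the reported precision and recall.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall.

    Assumption: the table's F-Measure column uses this balanced (F1) form.
    Inputs and output share the same unit (percent here).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the first row of the table (1000 sentences), in percent.
crf_single_node = f_measure(84.679, 73.429)
spark_version = f_measure(86.715, 80.566)

print(f"CRF++ on single node: {crf_single_node:.3f}%")
print(f"Spark version: {spark_version:.3f}%")
```

For this row the recomputed values round to 78.654% and 83.527%, matching the table. Because the published precision and recall are themselves rounded, a recomputation on other rows may differ from the table in the last decimal place.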