Abstract
Machine learning classifiers have achieved strong performance in biomedical event extraction. For example, the support vector machine (SVM) classifiers of the Turku Event Extraction System achieved the best performance in the BioNLP'09 shared task. Such classifiers typically rely on large feature sets. Despite their robust performance, however, recent research suggests that automatically generated feature sets should be further optimized by reducing their size in order to improve system performance. This paper seeks ways to reduce feature set size by investigating the contribution of four feature sets, constructed from lexical, grammatical, syntactic and semantic information. It reports an experiment on the BioNLP data prepared by the Turku team for biological event extraction. The experiment examines how far the dimension of the feature sets can be reduced while the classifier still achieves similar performance. The importance of each feature set is evaluated with an SVM classifier. Our experiments demonstrate that feature sets constructed from lexical, grammatical and syntactic information can reduce the set size by as much as 86% while maintaining comparable performance, thus substantially alleviating the problem of high feature dimensionality. The experiments also show that a hybrid feature set combining lexical and semantic information achieves the second-highest accuracy. This indicates that an optimal feature set can feasibly be constructed through dimension reduction and feature combination. We conclude that these experiments provide empirical evidence that, in addition to domain knowledge, linguistic information is important for constructing high-performance feature sets for biomedical event extraction.
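The abstract does not specify the feature templates, the SVM implementation or the exact evaluation protocol used with the Turku data. The group-wise comparison it describes can still be illustrated with a minimal sketch, assuming every feature name carries a hypothetical group prefix (`lex:`, `gram:`, `syn:`, `sem:`). A linear SVM is trained on each feature group or combination, and its test accuracy is recorded together with the size reduction 1 - |F_subset| / |F_full|. The SVM is trained with the Pegasos subgradient method purely as a self-contained stand-in; it is not the classifier used in the paper. The data are synthetic, and the names `restrict`, `train_pegasos`, `evaluate_groups` and `toy_data` are illustrative only.

```python
import random
from typing import Dict, Iterable, List, Sequence, Tuple

# Sparse example: feature name -> value, label +1 (event) or -1 (no event).
Example = Tuple[Dict[str, float], int]

# Hypothetical naming scheme (an assumption): each feature is prefixed by its group.
GROUPS = ("lex", "gram", "syn", "sem")


def restrict(x: Dict[str, float], groups: Iterable[str]) -> Dict[str, float]:
    """Keep only the features whose group prefix (text before ':') is selected."""
    keep = set(groups)
    return {f: v for f, v in x.items() if f.split(":", 1)[0] in keep}


def train_pegasos(data: List[Example], lam: float = 0.01,
                  epochs: int = 10, seed: int = 0) -> Dict[str, float]:
    """Linear SVM (hinge loss + L2 penalty, no bias term) trained with Pegasos."""
    rng = random.Random(seed)
    w: Dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            x, y = data[i]
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            for f in w:  # shrink weights: gradient step on the L2 term
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:  # margin violated: hinge-loss subgradient step
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def accuracy(w: Dict[str, float], data: List[Example]) -> float:
    """Fraction of examples whose predicted sign matches the label."""
    hits = 0
    for x, y in data:
        score = sum(w.get(f, 0.0) * v for f, v in x.items())
        hits += (1 if score >= 0 else -1) == y
    return hits / len(data)


def dimension(data: List[Example]) -> int:
    """Number of distinct feature names observed in the data."""
    return len({f for x, _ in data for f in x})


def evaluate_groups(train: List[Example], test: List[Example],
                    group_sets: Sequence[Tuple[str, ...]]) -> Dict[str, Tuple[float, float]]:
    """Map each group combination to (test accuracy, size reduction vs. full set)."""
    full_dim = dimension(train)
    results: Dict[str, Tuple[float, float]] = {}
    for groups in group_sets:
        tr = [(restrict(x, groups), y) for x, y in train]
        te = [(restrict(x, groups), y) for x, y in test]
        w = train_pegasos(tr)
        results["+".join(groups)] = (accuracy(w, te), 1.0 - dimension(tr) / full_dim)
    return results


def toy_data(n: int, seed: int) -> List[Example]:
    """Synthetic data: one informative lexical cue, random noise in other groups."""
    rng = random.Random(seed)
    data: List[Example] = []
    for _ in range(n):
        y = rng.choice((1, -1))
        x = {"lex:word=" + ("induces" if y == 1 else "the"): 1.0}
        for g in ("gram", "syn", "sem"):
            x[f"{g}:noise{rng.randrange(20)}"] = 1.0
        data.append((x, y))
    return data


if __name__ == "__main__":
    train, test = toy_data(200, seed=1), toy_data(100, seed=2)
    subsets = [GROUPS, ("lex",), ("sem",), ("lex", "sem")]
    for name, (acc, red) in evaluate_groups(train, test, subsets).items():
        print(f"{name:16s} accuracy={acc:.2f} reduction={red:.0%}")
```

In this synthetic data only the lexical cue carries information. The `lex` subset therefore reaches perfect test accuracy while using a small fraction of the full feature dimension. These numbers say nothing about the BioNLP results reported in the paper; the sketch only shows the shape of a group-by-group evaluation.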
| Original language | English |
|---|---|
| Pages (from-to) | 1032-1036 |
| Number of pages | 5 |
| Journal | Journal of Food, Agriculture and Environment |
| Volume | 11 |
| Issue number | 1 |
| Publication status | Published - 2013 |
Keywords
- Event extraction
- Feature selection
- Linguistic features
- Semantic information
- Support vector machine
- Syntactic information
- Turku Event Extraction System
Title
Identification of discriminative features for biological event extraction through linguistically informed feature selection