TY - GEN
T1 - How well conditional random fields can be used in novel term recognition
AU - Zhang, Xing
AU - Song, Yan
AU - Fang, Alex Chengyu
PY - 2010
Y1 - 2010
AB - In this paper, we describe the construction of a machine learning framework that exploits syntactic information in the recognition of biomedical terms, and we present the limits of machine learning in generating a candidate list of novel terms. Conditional random fields (CRF) are used as the basis of this framework. We investigate how best to use syntactic information, including parent nodes, syntactic paths and term ratios, within this machine learning framework. The experimental results show that the CRF model can achieve good precision in term recognition when trained with a list of known terms. However, when trained only on a known term list and used to predict novel terms in the test corpus, the CRF model performs poorly at discovering potential novel terms for terminology lexicon editors. This result suggests that more semantic information may be needed to determine whether a word is a novel term during a specific period.
KW - Conditional random fields
KW - Novel term recognition
KW - Term recognition
UR - https://www.scopus.com/pages/publications/84863877710
M3 - Conference contribution
AN - SCOPUS:84863877710
SN - 9784905166009
T3 - PACLIC 24 - Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation
SP - 583
EP - 592
BT - PACLIC 24 - Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation
T2 - 24th Pacific Asia Conference on Language, Information and Computation, PACLIC 24
Y2 - 4 November 2010 through 7 November 2010
ER -