Extracting Time Expressions and Named Entities with Constituent-Based Tagging Schemes

Zhong, Xiaoshi; Cambria, Erik; Hussain, Amir

doi:10.1007/s12559-020-09714-8

Published: 10 May 2020

Cognitive Computation, Volume 12, pages 844–862 (2020)

Abstract

Time expressions and named entities play important roles in data mining, information retrieval, and natural language processing. However, the conventional position-based tagging schemes (e.g., the BIO and BILOU schemes) that previous research used to model time expressions and named entities suffer from inconsistent tag assignment. To overcome this problem, we designed a new type of tagging scheme that models time expressions and named entities based on their constituents. Specifically, to model time expressions, we defined a constituent-based tagging scheme, termed the TOMN scheme, with four tags, namely T, O, M, and N, which indicate the defined constituents of time expressions: time token (T), modifier (M), numeral (N), and the words outside time expressions (O). To model named entities, we defined a constituent-based tagging scheme, termed the UGTO scheme, with four tags, namely U, G, T, and O, which indicate the defined constituents of named entities: uncommon word (U), general modifier (G), trigger word (T), and the words outside named entities (O). Our TOMN and UGTO schemes model time expressions and named entities under conditional random fields with minimal features, chosen according to an in-depth analysis of the characteristics of time expressions and named entities. Experiments on diverse datasets demonstrate that our proposed methods perform on par with or better than representative state-of-the-art methods on both time expression extraction and named entity extraction.
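
As a rough illustration of constituent-based tag assignment, the following minimal Python sketch assigns TOMN tags to a tokenized sentence given gold time-expression spans. The lexicons, the function names `constituent_tag` and `tomn_tags`, and the choice to raise an error for unknown words are assumptions made for this example; they are not taken from the paper, and the sketch covers only tag assignment, not the conditional random field model.

```python
import re

# Hypothetical toy lexicons for illustration only; not the paper's lexicons.
TIME_TOKENS = {"day", "days", "week", "year", "monday", "september"}
MODIFIERS = {"last", "next", "ago", "early", "late"}
NUMBER_WORDS = {"one", "two", "three", "four", "five", "ten"}


def constituent_tag(token):
    """Return T, M, or N for a word that lies inside a time expression."""
    word = token.lower()
    if word in TIME_TOKENS:
        return "T"  # time token
    if word in MODIFIERS:
        return "M"  # modifier
    if word in NUMBER_WORDS or re.fullmatch(r"\d+", word):
        return "N"  # numeral
    # Choice made for this sketch: refuse words the toy lexicons do not cover.
    raise ValueError(f"no constituent type known for {token!r}")


def tomn_tags(tokens, spans):
    """Assign TOMN tags; spans are (start, end) index pairs with end exclusive."""
    tags = ["O"] * len(tokens)  # words outside time expressions
    for start, end in spans:
        for i in range(start, end):
            # The tag depends on the word's type, not on its position in the span.
            tags[i] = constituent_tag(tokens[i])
    return tags


if __name__ == "__main__":
    tokens = "He bought two books two days ago".split()
    print(list(zip(tokens, tomn_tags(tokens, [(4, 7)]))))
    # "September" receives the same tag whether it stands alone or follows a modifier.
    print(tomn_tags(["September"], [(0, 1)]))
    print(tomn_tags(["last", "September"], [(0, 2)]))
```

In this sketch a word such as "September" is tagged T wherever it appears inside a time expression, which is the consistency property that the constituent-based schemes aim for.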

Notes

In a supervised-learning procedure, tag assignment occurs in two stages: (1) feature extraction in the training stage and (2) tag prediction in the testing stage. We focus on the training stage to analyze the impact of tag assignment.
OntoNotes5’s 18 entity types include CARDINAL, DATE, EVENT, FAC, GPE, LANGUAGE, LAW, LOC, MONEY, NORP, ORDINAL, ORG, PERCENT, PERSON, PRODUCT, QUANTITY, TIME, WORK_OF_ART.
Those removed entity types are CARDINAL, DATE, MONEY, ORDINAL, PERCENT, QUANTITY, TIME.
https://github.com/ontonotes/conll-formatted-ontonotes-5.0
The p_whole of proper nouns does not reach 100%, mainly because each individual dataset is concerned with only certain types of named entities and partly because some NNP* tags are POS tagging errors; e.g., “SURPRISE DEFEAT” is tagged as “NNP NNP,” but it should be tagged as “JJ NN.”
The BIO scheme in this paper denotes the standard IOB2 scheme described in [67].
The BILOU scheme is also widely known as the BIOES or IOBES scheme.
https://en.wikipedia.org/wiki/Lists_of_cities_by_country and https://en.wikipedia.org/wiki/Lists_of_people_by_nationality.
Note that uncommon words of this kind are not available in the training phase because they are extracted from the unannotated test set.
Following [82], we did not use the Gigaword dataset in our experiments because its labels are not ground-truth labels but were automatically generated by other taggers.
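
The two position-based schemes named in the notes can be sketched as follows. This is a minimal Python illustration, assuming entity spans are given as (start, end) index pairs with the end exclusive; the function names `bio_tags` and `bilou_tags` are hypothetical, and entity-type suffixes such as "B-TIME" are omitted for brevity.

```python
def bio_tags(length, spans):
    """IOB2-style tags: B on the first word of a span, I on the rest, O elsewhere."""
    tags = ["O"] * length
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def bilou_tags(length, spans):
    """BILOU tags: U for one-word spans; B, I..., L for longer spans; O elsewhere."""
    tags = ["O"] * length
    for start, end in spans:
        if end - start == 1:
            tags[start] = "U"
            continue
        tags[start] = "B"
        for i in range(start + 1, end - 1):
            tags[i] = "I"
        tags[end - 1] = "L"
    return tags


if __name__ == "__main__":
    # The same word receives different tags depending on where it sits in the span.
    examples = [
        ["September"],
        ["last", "September"],
        ["September", "2006"],
    ]
    for tokens in examples:
        span = [(0, len(tokens))]
        print(tokens, bio_tags(len(tokens), span), bilou_tags(len(tokens), span))
```

Under these schemes the word "September" is tagged B, I, L, or U depending on its position, which is one way to read the inconsistent tag assignment described in the abstract.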

References

Alex B, Haddow B, Grover C. Recognising nested named entities in biomedical text. Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing; 2007. p. 65–72.
Alonso O, Strötgen J, Baeza-Yates R, Gertz M. Temporal information retrieval: challenges and opportunities. Proceedings of 1st International Temporal Web Analytics Workshop; 2011. p. 1–8.
Angeli G, Manning CD, Jurafsky D. Parsing time: learning to interpret time expressions. Proceedings of 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2012. p. 446–55.
Angeli G, Uszkoreit J. Language-independent discriminative parsing of temporal expressions. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics; 2013. p. 83–92.
Bethard S. ClearTK-TimeML: a minimalist approach to TempEval 2013. Proceedings of the 7th International Workshop on Semantic Evaluation. Minneapolis: Association for Computational Linguistics; 2013. p. 10–4.
Borthwick A, Sterling J, Agichtein E, Grishman R. NYU: description of the MENE named entity system as used in MUC-7. Proceedings of the 7th Message Understanding Conference; 1998.
Campos R, Dias G, Jorge AM, Jatowt A. Survey of temporal information retrieval and related applications. ACM Comput Surv 2014;47(2):15:1–41.
Chambers N, Wang S, Jurafsky D. Classifying temporal relations between events. Proceedings of the ACL on Interactive Poster and Demonstration Sessions. Ann Arbor: Association for Computational Linguistics; 2007. p. 173–6.
Chang AX, Manning CD. SUTime: a library for recognizing and normalizing time expressions. Proceedings of 8th International Conference on Language Resources and Evaluation; 2012. p. 3735–40.
Chang AX, Manning CD. SUTime: evaluation in TempEval-3. Proceedings of the Second Joint Conference on Lexical and Computational Semantics (SEM); 2013. p. 78–82.
Chinchor NA. MUC-7 named entity task definition. Proceedings of the 7th Message Understanding Conference; 1998.
Chinchor NA. Overview of MUC-7/MET-2. Proceedings of the 7th Message Understanding Conference; 1998.
Collins M, Singer Y. Unsupervised models for named entity classification. Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. College Park: Association for Computational Linguistics; 1999.
Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa PP. Natural language processing (almost) from scratch. J Mach Learn Res 2011;12:2493–537.
Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis: Association for Computational Linguistics; 2019. p. 4171–86.
Do QX, Lu W, Roth D. Joint inference for event timeline construction. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning; 2012. p. 677–87.
Doddington G, Mitchell A, Przybocki M, Ramshaw L, Strassel S, Weischedel R. The automatic content extraction (ACE) program tasks, data, and evaluation. Proceedings of the 2004 Conference on Language Resources and Evaluation; 2004. p. 1–4.
Ferro L, Gerber L, Mani I, Sundheim B, Wilson G. 2005. TIDES 2005 standard for the annotation of temporal expressions. MITRE.
Filannino M, Brown G, Nenadic G. ManTIME: temporal expression identification and normalization in the TempEval-3 challenge. Proceedings of the 7th International Workshop on Semantic Evaluation; 2013. p. 53–7.
Finkel JR, Grenager T, Manning C. Incorporating non-local information into information extraction systems by Gibbs sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics; 2005. p. 363–70.
Finkel JR, Manning C. Nested named entity recognition. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing; 2009. p. 141–50.
Giuliano C. Fine-grained classification of named entities exploiting latent semantic kernels. Proceedings of the Thirteenth Conference on Computational Natural Language Learning. Boulder: Association for Computational Linguistics; 2009. p. 201–9.
Grishman R, Sundheim B. Message understanding conference - 6: a brief history. Proceedings of the 16th International Conference on Computational Linguistics; 1996.
Hacioglu K, Chen Y, Douglas B. Automatic time expression labeling for English and Chinese text. Proceedings of the 6th International Conference on Intelligent Text Processing and Computational Linguistics. Mexico City: Springer; 2005. p. 548–59.
Hochreiter S, Schmidhuber J. Long short-term memory. Neur Comput 1997;9:1735–80.
Huang Z, Xu W, Yu K. 2015. Bidirectional LSTM-CRF models for sequence tagging.
Ji H, Grishman R. Knowledge base population: successful approaches and challenges. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics; 2011. p. 1148–58.
Kazama J, Torisawa K. Exploiting wikipedia as external knowledge for named entity recognition. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Prague: Association for Computational Linguistics; 2007. p. 698–707.
Krallinger M, Leitner F, Rabal O, Vazquez M, Oyarzabal J, Valencia A. Overview of the chemical compound and drug name recognition (CHEMDNER) task. BioCreative Challenge Eval Workshop; 2015. p. 2–33.
Lafferty J, McCallum A, Pereira F. Conditional random fields: probabilistic models for segmenting and labeling sequence data. Proceedings of the 18th International Conference on Machine Learning. Williams College: Morgan Kaufmann Publishers; 2001. p. 281–9.
Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural architecture for named entity recognition. Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics; 2016. p. 260–70.
Lee K, Artzi Y, Dodge J, Zettlemoyer L. Context-dependent semantic parsing for time expressions. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore: Association for Computational Linguistics; 2014. p. 1437–47.
Li J, Cardie C. Timeline generation: tracking individuals on twitter. Proceedings of the 23rd International Conference on World Wide Web; 2014. p. 643–52.
Liang P. 2005. Semi-supervised learning for natural language. Master’s Thesis.
Ling W, Dyer C, Black AW, Trancoso I, Fermandez R, Amir S, Marujo L, Luis T. Finding function in form: compositional character models for open vocabulary word representation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon: Association for Computational Linguistics; 2015. p. 1520–30.
Ling X, Singh S, Weld DS. Design challenges for entity linking. Trans Assoc Comput Linguist 2015;3:315–28.
Ling X, Weld DS. Fine-grained entity recognition. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence. Toronto: AAAI Press; 2012. p. 94–100.
Liu L, Shang J, Ren X, Xu FF, Gui H, Peng J, Han J. Empower sequence labeling with task-aware neural language model. Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans: AAAI Press; 2018. p. 5253–60.
Liu X, Zhang S, Wei F, Zhou M. Recognizing named entities in tweets. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics; 2011. p. 359–67.
Llorens H, Derczynski L, Gaizauskas R, Saquete E. TIMEN: an open temporal expression normalisation resource. Proceedings of the 8th International Conference on Language Resources and Evaluation; 2012. p. 3044–51.
Llorens H, Saquete E, Navarro B. TIPSem (english and spanish): evaluating CRFs and semantic roles in TempEval-2. Proceedings of the 5th International Workshop on Semantic Evaluation; 2010. p. 284–91.
Luo G, Huang X, Lin C-Y, Nie Z. Joint named entity recognition and disambiguation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing; 2015. p. 879–88.
Ma X, Hovy E. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (volume 1: long papers). Berlin: Association for Computational Linguistics; 2016. p. 1064–74.
Ma Y, Cambria E, Gao S. Label embedding for zero-shot fine-grained named entity typing. Proceedings of the 26th International Conference on Computational Linguistics; 2016. p. 171–80.
Mani I, Verhagen M, Wellner B, Lee CM, Pustejovsky J. Machine learning of temporal relations. Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics; 2006. p. 753–60.
Mani I, Wilson G. Robust temporal processing of news. Proceedings of the 38th annual meeting on association for computational linguistics; 2000. p. 69–76.
Maynard D, Tablan V, Ursu C, Cunningham H, Wilks Y. Named entity recognition from diverse text types. Proceedings of 2001 Recent Advances in Natural Language Processing Conference; 2001. p. 257–74.
Mazur P, Dale R. WikiWars: a new corpus for research on temporal expressions. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. MIT Stata Center: Association for Computational Linguistics; 2010. p. 913–22.
McCallum A, Li W. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. Proceedings of the 7th Conference on Computational Natural Language Learning. Edmonton: Association for Computational Linguistics; 2003. p. 188–91.
Miller S, Guinness J, Zamanian A. Name tagging with word clusters and discriminative training. Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics; 2004.
Nadeau D, Sekine S. A survey of named entity recognition and classification. Lingvisticae Investigationes 2007;30(1):3–26.
Nakashole N, Tylenda T, Weikum G. Fine-grained semantic typing of emerging entities. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia: Association for Computational Linguistics; 2013. p. 1488–97.
Owoputi O, O’Connor B, Dyer C, Gimpel K, Schneider N, Smith NA. Improved part-of-speech tagging for online conversational text with word clusters. Proceedings of NAACL-HLT 2013; 2013. p. 380–90.
Parker R, Graff D, Kong J, Chen K, Maeda K. 2011. English gigaword, 5th edn.
Peters ME, Ammar W, Bhagavatula C, Power R. Semi-supervised sequence tagging with bidirectional language models. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics; 2017. p. 1756–65.
Poibeau T, Kosseim L. Proper name extraction from non-journalistic texts. Lang Comput 2001;37:144–57.
Pradhan S, Moschitti A, Xue N, Ng HT, Bjorkelund A, Uryupina O, Zhang Y, Zhong Z. Towards robust linguistic analysis using OntoNotes. Proceedings of the 17th Conference on Computational Natural Language Learning. Sofia: Association for Computational Linguistics; 2013. p. 143–52.
Pradhan SS, Hovy E, Marcus M, Palmer M, Ramshaw L, Weischedel R. Ontonotes: a unified relational semantic representation. Proceedings of the 2007 IEEE International Conference on Semantic Computing; 2007. p. 517–26.
Pustejovsky J, Castano J, Ingria R, Sauri R, Gaizauskas R, Setzer A, Katz G, Radev D. TimeML: robust specification of event and temporal expressions in text. Direct Question Answer 2003;3:28–34.
Pustejovsky J, Hanks P, Sauri R, See A, Gaizauskas R, Setzer A, Sundheim B, Radev D, Day D, Ferro L, Lazo M. The TIMEBANK corpus. Corpus Linguist 2003;2003:647–56.
Pustejovsky J, Lee K, Bunt H, Romary L. ISO-TimeML: an international standard for semantic annotation. Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC’10); 2010. p. 394–7.
Radford A, Narasimhan K, Salimans T, Sutskever I. 2018. Improving language understanding by generative pre-training.
Ratinov L, Roth D. Design challenges and misconceptions in named entity recognition. Proceedings of the Thirteenth Conference on Computational Natural Language Learning. Boulder: Association for Computational Linguistics; 2009. p. 147–55.
Ren X, He W, Qu M, Huang L, Ji H, Han J. AFET: automatic fine-grained entity typing by hierarchical partial-label embedding. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin: Association for Computational Linguistics; 2016. p. 1369–78.
Ritter A, Clark S, Mausam, Etzioni O. Named entity recognition in tweets: an experimental study. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing; 2011. p. 1524–34.
Sang EFTK, Meulder FD. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. Proceedings of the 7th Conference on Natural Language Learning; 2003. p. 142–7.
Sang EFTK, Veenstra J. Representing text chunks. Proceedings of the Ninth Conference on European Chapter of the Association for Computational Linguistics; 1999. p. 173–9.
Santos CND, Guimaraes V. Boosting named entity recognition with neural character embeddings. Proceedings of the 5th Named Entities Workshop. Beijing: Association for Computational Linguistics; 2015. p. 25–33.
Silva JFD, Kozareva Z, Lopes JGP. Cluster analysis and classification of named entities. Proceedings of the 4th International Conference on Language Resources and Evaluation. Lisbon: European Language Resources Association; 2004. p. 321–4.
Steedman M. 1996. Surface structure and interpretation. The MIT Press.
Strötgen J, Gertz M. HeidelTime: high quality rule-based extraction and normalization of temporal expressions. Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval’10). Stroudsburg: Association for Computational Linguistics; 2010. p. 321–4.
Strubell E, Verga P, Belanger D, McCallum A. Fast and accurate entity recognition with iterated dilated convolutions. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen: Association for Computational Linguistics; 2017. p. 2670–80.
Takeuchi K, Collier N. Bio-medical entity extraction using support vector machines. Artif Intell Med 2005;33(2):125–37.
UzZaman N, Allen JF. TRIPS and TRIOS system for TempEval-2: Extracting temporal information from text. Proceedings of the 5th International Workshop on Semantic Evaluation; 2010. p. 276–83.
UzZaman N, Llorens H, Derczynski L, Verhagen M, Allen J, Pustejovsky J. SemEval-2013 task 1: TempEval-3: Evaluating time expressions, events, and temporal relations. Proceedings of the 7th International Workshop on Semantic Evaluation; 2013. p. 1–9.
Verhagen M, Gaizauskas R, Schilder F, Hepple M, Katz G, Pustejovsky J. SemEval-2007 task 15: TempEval temporal relation identification. Proceedings of the 4th International Workshop on Semantic Evaluation; 2007. p. 75–80.
Verhagen M, Mani I, Sauri R, Knippen R, Jang SB, Littman J, Rumshisky A, Phillips J, Pustejovsky J. Automating temporal annotation with TARSQI. Proceedings of the ACL Interactive Poster and Demonstration Sessions. Ann Arbor: Association for Computational Linguistics; 2005. p. 81–4.
Verhagen M, Sauri R, Caselli T, Pustejovsky J. SemEval-2010 task 13: TempEval-2. Proceedings of the 5th International Workshop on Semantic Evaluation; 2010. p. 57–62.
Wang L-J, Li W-C, Chang C-H. Recognizing unregistered names for mandarin word identification. Proceedings of the 14th Conference on Computational Linguistics; 1992. p. 1239–43.
Wong K-F, Xia Y, Li W, Yuan C. An overview of temporal information extraction. Int J Comput Process Oriental Lang 2005;18(2):137–52.
Zhong X, Cambria E. Time expression recognition using a constituent-based tagging scheme. Proceedings of the 2018 World Wide Web Conference. Lyon: Association for Computing Machinery; 2018. p. 983–92.
Zhong X, Sun A, Cambria E. Time expression analysis and recognition using syntactic token types and general heuristic rules. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver: Association for Computational Linguistics; 2017. p. 420–9.

Funding

This research was funded by the AME Programmatic Funding (Project No. A18A2b0046) from the Agency for Science, Technology and Research (A*STAR), Singapore.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore
Xiaoshi Zhong & Erik Cambria
School of Computing, Edinburgh Napier University, Scotland, UK
Amir Hussain

Corresponding author

Correspondence to Xiaoshi Zhong.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This paper is an extension of the following conference paper: Xiaoshi Zhong and Erik Cambria. 2018. Time Expression Recognition Using a Constituent-based Tagging Scheme. In Proceedings of the 2018 World Wide Web Conference, Association for Computing Machinery, Lyon, France, pages 983–992.

About this article

Cite this article

Zhong, X., Cambria, E. & Hussain, A. Extracting Time Expressions and Named Entities with Constituent-Based Tagging Schemes. Cogn Comput 12, 844–862 (2020). https://doi.org/10.1007/s12559-020-09714-8

Received: 21 August 2019
Accepted: 21 January 2020
Published: 10 May 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s12559-020-09714-8
