Improving completeness and consistency of co-reference annotation standard

Wireless Networks

Abstract

As the processing power of mobile terminals increases, wireless network applications such as voice assistants can move more context-sensitive tasks onto the terminals, reducing both the wireless bandwidth required and the cost of cloud data storage. Co-reference annotation, which identifies expressions in context that share the same semantics, is one of the critical techniques for these tasks. However, existing co-reference annotation standards have three problems. First, the annotation is incomplete. Second, the types of annotated mentions are inconsistent. Third, there are currently no metrics for measuring completeness or consistency. To address these issues, this paper proposes a new co-reference annotation standard that annotates more semantics and co-reference relations while using only two types of mentions. The paper also presents a performance evaluation corpus and designs three metrics for evaluating the new standard: the completeness of semantic annotation, the completeness of co-reference annotation, and the consistency of mention types. Experiments show that the new standard outperforms all baseline methods, achieving 0.95 in the completeness of semantic annotation, 0.68 in the completeness of co-reference annotation, and 0.57 in the consistency of mention types.
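To make the terminology concrete, the following is a minimal sketch of how a co-reference annotation can be represented as chains of mentions, together with two simple ratios in the spirit of "completeness" and "consistency". Everything here is an assumption for illustration: the `Mention` class, the type labels, and both ratio functions are hypothetical and are not the metric definitions used in the paper, which the abstract does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    kind: str   # hypothetical mention type label, e.g. "noun_phrase"


def annotated_fraction(num_tokens, chains):
    """Share of tokens covered by at least one mention (illustrative ratio only)."""
    covered = set()
    for chain in chains:
        for mention in chain:
            covered.update(range(mention.start, mention.end))
    return len(covered) / num_tokens if num_tokens else 0.0


def dominant_type_share(chains):
    """Share of mentions carrying the most frequent type label (illustrative ratio only)."""
    kinds = [mention.kind for chain in chains for mention in chain]
    if not kinds:
        return 0.0
    return max(kinds.count(kind) for kind in set(kinds)) / len(kinds)


tokens = "Alice lost her phone , so she borrowed mine".split()

# Each inner list is one co-reference chain: mentions that share the same referent.
chains = [
    # "Alice", "her", "she"
    [Mention(0, 1, "noun_phrase"), Mention(2, 3, "pronoun"), Mention(6, 7, "pronoun")],
    # "her phone" (a chain with a single mention)
    [Mention(2, 4, "noun_phrase")],
]

coverage = annotated_fraction(len(tokens), chains)
type_share = dominant_type_share(chains)
print(f"token coverage: {coverage:.2f}, dominant type share: {type_share:.2f}")
```

A standard that annotates more of the text would raise the first ratio, and one that restricts itself to fewer mention types would raise the second. The paper's own three metrics may be defined quite differently.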

Data availability

All data generated or analyzed during this study are included in this published article and its supplementary information file.

Acknowledgements

The authors would like to thank the editors and the reviewers who made valuable comments that helped us improve this paper.

Funding

No funding was received for conducting this study.

Author information

Corresponding author

Correspondence to Huansheng Ning.

Ethics declarations

Conflict of interest

The authors declare they have no financial interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 303 kb)

Supplementary file2 (PDF 287 kb)

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, Y., Farha, F., Wan, Y. et al. Improving completeness and consistency of co-reference annotation standard. Wireless Netw (2022). https://doi.org/10.1007/s11276-022-03077-8

  • DOI: https://doi.org/10.1007/s11276-022-03077-8
