Recognizing Named Entities in Specific Domain

Tikhomirov, M. M.; Loukachevitch, N. V.; Dobrov, B. V.

doi:10.1134/S199508022008020X

Published: 21 October 2020

Lobachevskii Journal of Mathematics, Volume 41, pages 1591–1602 (2020)

Abstract

The paper presents the results of applying the BERT representation model to the named entity recognition (NER) task for the cybersecurity domain in Russian. We compare several approaches to domain-specific NER that combine BERT fine-tuning on a domain-specific text collection, general labeled data, domain-specific data augmentation, and a domain-specific annotated dataset. We show that a BERT model fine-tuned on a domain text collection and pre-trained on the combination of a general dataset and augmented data achieves the best named entity recognition results. We also study the computational performance of the BERT model in the so-called mixed-precision regime.
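
The abstract does not say how the augmented data are produced. Purely as an illustration, one common way to augment NER training data is mention replacement: each annotated entity in a BIO-tagged sentence is swapped for another name of the same type drawn from a gazetteer, and the tags are regenerated to match the new mention length. The sketch below assumes that scheme; the tag names (`VIRUS`, `ORG`), the gazetteer contents, and the function names are hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, BIO tag); the tag set used here is hypothetical


def find_spans(sentence: List[Token]) -> List[Tuple[int, int, str]]:
    """Return (start, end, type) for every entity span in a BIO-tagged sentence."""
    spans = []
    start, etype = None, None
    for i, (_, tag) in enumerate(sentence):
        # A token continues the open span only if it is I-<same type>.
        inside = tag.startswith("I-") and start is not None and tag[2:] == etype
        if not inside:
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    if start is not None:
        spans.append((start, len(sentence), etype))
    return spans


def augment(sentence: List[Token],
            gazetteer: Dict[str, List[str]],
            rng: random.Random) -> List[Token]:
    """Replace each entity mention with a random same-type gazetteer entry."""
    out: List[Token] = []
    pos = 0
    for start, end, etype in find_spans(sentence):
        out.extend(sentence[pos:start])  # copy context tokens unchanged
        candidates = gazetteer.get(etype)
        if candidates:
            words = rng.choice(candidates).split()
            out.append((words[0], "B-" + etype))
            out.extend((w, "I-" + etype) for w in words[1:])
        else:
            out.extend(sentence[start:end])  # no replacement known: keep mention
        pos = end
    out.extend(sentence[pos:])
    return out


# Hypothetical example data, for illustration only.
sentence = [("The", "O"), ("Petya", "B-VIRUS"), ("virus", "O"),
            ("hit", "O"), ("Acme", "B-ORG"), ("Corp", "I-ORG")]
gazetteer = {"VIRUS": ["WannaCry", "Bad Rabbit"], "ORG": ["Example Bank"]}
augmented = augment(sentence, gazetteer, random.Random(0))
```

Because only the entity mentions change, the surrounding context and the label sequence stay consistent, which is what makes such synthetic sentences usable as extra training data alongside a general labeled dataset.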

Notes

https://github.com/LAIR-RCC/InfSecurityRussianNLP
https://github.com/NVIDIA/apex

Funding

The research was carried out using the equipment of the shared research facilities of HPC computing resources at Lomonosov Moscow State University. The participation of M. Tikhomirov in the reported study was funded by RFBR, project no. 19-37-90119.

Author information

Authors and Affiliations

Moscow State University, 119991, Moscow, Russia
M. M. Tikhomirov, N. V. Loukachevitch & B. V. Dobrov

Corresponding authors

Correspondence to M. M. Tikhomirov, N. V. Loukachevitch or B. V. Dobrov.

Additional information

(Submitted by E. E. Tyrtyshnikov)

About this article

Cite this article

Tikhomirov, M.M., Loukachevitch, N.V. & Dobrov, B.V. Recognizing Named Entities in Specific Domain. Lobachevskii J Math 41, 1591–1602 (2020). https://doi.org/10.1134/S199508022008020X

Received: 30 March 2020
Revised: 12 April 2020
Accepted: 18 April 2020
Published: 21 October 2020
Issue Date: August 2020
DOI: https://doi.org/10.1134/S199508022008020X
