Abstract
This paper describes a solution to the problem of automatically identifying and classifying information events in media texts, based on the model of phraseological conceptual analysis of texts. The proposed solution uses previously developed methods for formalizing the semantic structure of sentences, as well as methods and algorithms for identifying fragments of media texts that describe information events. The developed algorithm implements the rules of C. Fillmore’s case grammar, which are based on procedures for the semantic–syntactic and conceptual analysis of texts.
Notes
The term event (information event) is understood as a mass media description of a socially significant phenomenon, incident, or fact of social activity on a global or regional scale, as well as facts and events of social conglomerations or facts from the personal lives of famous public figures, etc.
The term information line is understood as the main topic of a message, which prompts the target audience to discuss it. As a rule, an information line reflects important facts about the content of an event.
Dictionary of Unified Formalized Representations of Concept Names (UFRCN) [2].
The structure of the sentence in the form of the main words of phrases and their relationships.
REFERENCES
Bogatyrev, M.Yu., Extracting facts from natural language texts using conceptual graph models, Izv. Tul. Gos. Univ., Tekh. Nauki, 2016, no. 7, part 1, pp. 198–207.
Vinogradov, A.N., Vlasova, N.A., Kurshev, E.P., and Podobryaev, A.V., Modern technologies of natural language processing in strategic management problems, in Tekhnologicheskaya perspektiva v ramkakh evraziiskogo prostranstva: Novye rynki i tochki ekonomicheskogo rosta (Technological Perspective within the Eurasian Space: New Markets and Points of Economic Growth), St. Petersburg: Tsentr Nauchno-Inf. Tekhnol. Asterion, 2018.
Ermakov, A.E., Automatic extraction of facts from dossier texts: The experience of establishing anaphoric connections, in Komp’yuternaya lingvistika i intellektual’nye tekhnologii: Trudy mezhdunarodnoi konferentsii “Dialog’2007” (Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue’2007”), Moscow: Nauka, 2007.
Khoroshilov, Al-dr.A., Nikitin, Yu.V., Khoroshilov, Al-ei.A., and Budsko, V.I., Automatic creation of a formalized representation of the semantic content of unstructured text messages of mass media and social networks, Sist. Vys. Dostupnosti, 2014, vol. 10, no. 3.
Helbig, H., Knowledge Representation and the Semantics of Natural Language, Berlin: Springer, 2006.
Kan, A.V., Revina, V.D., Rusnak, V.I., Khoroshilov, Al-dr.A., and Khoroshilov, A.A., Automatic formation of a syntactic language model for machine translation and information retrieval tasks, Nauchno-Tekh. Inf., Ser. 2, 2018, no. 12, pp. 25–41.
Fillmore, C.J., The case for case, 1967 Texas Symposium on Linguistic Universals, Columbus, OH: The Ohio State University, 1968.
Ablov, I.V., Kozichev, V.N., Shirmanov, A.V., Khoroshilov, Al-dr.A., and Khoroshilov, Al-ei.A., The tools of a machine grammar of the Russian language (based on G.G. Belonogov), Autom. Doc. Math. Linguist., 2018, vol. 52, pp. 142–156.
Kalinin, Yu.P., Khoroshilov, Al-dr.A., and Khoroshilov, Al-ei.A., Modern technologies of automated processing of textual information, Sist. Vys. Dostupnosti, 2015, vol. 11, no. 2, pp. 67–79.
Zakharov, V.N., Musabaev, R.R., Krasovitskii, A.M., Kozlovskaya, Ya.D., Khoroshilov, Al-dr.A., and Khoroshilov, Al-ei.A., A method for clustering news media reports based on their conceptual analysis, Inf. Ee Primen., 2019, vol. 29, no. 3, pp. 52–65.
Aivazyan, S.A., Bukhshtaber, V.M., Enyukov, I.S., and Meshalkin, L.D., Prikladnaya statistika: Klassifikatsiya i snizhenie razmernosti (Applied Statistics: Classification and Dimensionality Reduction), Moscow: Finansy Stat., 1989.
Alon, N., Spencer, J.H., and Erdős, P., The Probabilistic Method, Wiley, 1992.
Gol'dberg, I., Neirosetevye metody v obrabotke estestvennogo yazyka (Neural Network Methods in Natural Language Processing), Moscow: DMK, 2019.
Osinga, D., Glubokoe obuchenie. Gotovye resheniya (Deep Learning. Ready-Made Solutions), St. Petersburg: Dialektika, 2019.
Funding
This article was prepared as part of PCF project BR05236839, “Development of information technologies and systems for stimulating the sustainable development of the individual as one of the foundations for the development of digital Kazakhstan.”
Ethics declarations
The authors declare that they have no conflicts of interest.
Additional information
Translated by L. A. Solovyova
About this article
Cite this article
Khoroshilov, Ad.A., Musabaev, R.R., Kozlovskaya, Y.D. et al. Automatic Detection and Classification of Information Events in Media Texts. Autom. Doc. Math. Linguist. 54, 202–214 (2020). https://doi.org/10.3103/S0005105520040032
DOI: https://doi.org/10.3103/S0005105520040032
Keywords:
- identification of information events
- classification of information events
- semantic–syntactic analysis of texts
- conceptual analysis of texts
- names of concepts
- statistical measure of significant names of concepts
- semantic correlation coefficient
- measures of semantic similarity of the contents of texts and classifier headings