Published by De Gruyter Mouton, January 20, 2022

Inferring case paradigms in Koalib with computational classifiers

  • Nicolas Quint and Marc Allassonnière-Tang

Abstract

The object case inflection in Koalib (Niger-Congo) exhibits complex patterns that involve phoneme position, syllable structure, and tonal pattern. Few attempts, whether qualitative or quantitative, have been made to identify the rules governing the object case paradigms in Koalib. In the current study, information on phonemes, tones, and syllables is automatically extracted from a Koalib sample of 2,677 lexemes. The data are then fed to decision-tree-based classifiers to predict the object case paradigms and to extract the interactions between the variables. The results improve on the predictive accuracy of existing studies and identify the case paradigms predicted by linguistic hypotheses. The computational classifiers also find new case paradigms, which are explained from a linguistic perspective. Our work demonstrates that combining theoretical linguistic knowledge with machine learning techniques can serve as a methodological approach for linguistic analyses.
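The workflow summarized in the abstract (extract phonological features from each lexeme, then let a decision tree predict its paradigm) can be illustrated with a minimal sketch. It is not the authors' implementation. The word forms, the paradigm labels "A", "B" and "C", the vowel set, the convention that an acute accent marks a high tone, and all function names are hypothetical placeholders chosen for illustration; none of them are Koalib data or the study's actual feature set.

```python
import unicodedata
from collections import Counter

# Assumed vowel inventory for the toy forms (not a description of Koalib).
VOWELS = set("aeiouɛɔə")

def extract_features(form):
    """Return final segment type, syllable count and tone pattern of a toy form."""
    decomposed = unicodedata.normalize("NFD", form)
    base = [ch for ch in decomposed if not unicodedata.combining(ch)]
    tones = []
    for i, ch in enumerate(decomposed):
        if ch in VOWELS:
            nxt = decomposed[i + 1] if i + 1 < len(decomposed) else ""
            # Assumption: acute accent (U+0301) = high tone, unmarked = low tone.
            tones.append("H" if nxt == "\u0301" else "L")
    return {
        "final": "V" if base[-1] in VOWELS else "C",
        "syllables": len(tones),  # assumption: one syllable per vowel letter
        "tones": "".join(tones),
    }

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Find the 'feature == value' test with the lowest weighted impurity."""
    best = None
    for feat in rows[0]:
        for val in {r[feat] for r in rows}:
            yes = [l for r, l in zip(rows, labels) if r[feat] == val]
            no = [l for r, l in zip(rows, labels) if r[feat] != val]
            if not yes or not no:
                continue
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feat, val)
    return best

def build_tree(rows, labels):
    """Grow a tree recursively; a leaf is the majority paradigm label."""
    split = best_split(rows, labels) if len(set(labels)) > 1 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    _, feat, val = split
    yes = [(r, l) for r, l in zip(rows, labels) if r[feat] == val]
    no = [(r, l) for r, l in zip(rows, labels) if r[feat] != val]
    return (feat, val,
            build_tree(*map(list, zip(*yes))),
            build_tree(*map(list, zip(*no))))

def predict(tree, row):
    """Follow the equality tests down to a leaf label."""
    while isinstance(tree, tuple):
        feat, val, yes, no = tree
        tree = yes if row[feat] == val else no
    return tree

# Invented forms and invented paradigm labels, for illustration only.
TOY = [("káta", "A"), ("tóri", "A"), ("kata", "B"),
       ("mona", "B"), ("tám", "C"), ("kelám", "C")]
rows = [extract_features(form) for form, _ in TOY]
labels = [label for _, label in TOY]
tree = build_tree(rows, labels)
```

Calling `predict(tree, extract_features(form))` on a new form returns the paradigm label of the leaf it reaches. The nested tuples in `tree` can also be read directly as a set of rules over the features, which is the kind of interpretable output that makes tree-based classifiers attractive for this task.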


Corresponding author: Marc Allassonnière-Tang, EA UMR7206, MNHN/CNRS/Université de Paris, Paris, France

Acknowledgements

The authors are thankful for the comments of the editors and the reviewers, which helped to significantly improve the content of the paper. They are also grateful to Siddig Ali Karmal Koko for sharing with them his knowledge of and expertise on his mother tongue, Koalib.

  1. Research funding: The first author is thankful for the support of the following grants: (i) PICS franco-soudanais Les langues du Soudan: à la croisée des aires et types linguistiques [The languages of the Sudan: a typological and areal crossroad]; (ii) PHC-Napata Kin terms and anthroponyms in the Nuba Mountain languages; (iii) Labex EFL, Strand 3, Workpackage RT1 – Language genealogy (Niger-Congo, Austronesian): Reconstruction, internal classification and grammatical description in the world’s two biggest phyla: Niger-Congo and Austronesian (ANR-10-LABX-0083). This last grant contributes to the IdEx Université de Paris – ANR-18-IDEX-0001. The second author is also thankful for the support of grants from the Université de Lyon (ANR-10-LABX-0081, NSCO ED 476), the IDEXLYON Fellowship (2018–2021, 16-IDEX-0005), and the French National Research Agency (ANR-11-IDEX-0007, ANR-20-CE27-0021).


Received: 2021-04-24
Accepted: 2021-12-16
Published Online: 2022-01-20
Published in Print: 2023-05-25

© 2021 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 16.4.2024 from https://www.degruyter.com/document/doi/10.1515/cllt-2021-0028/html