Do all roads lead to Rome?: Modeling register variation with factor analysis and discriminant analysis

Jesse Egbert; Douglas Biber

doi:10.1515/cllt-2016-0016

Published by De Gruyter Mouton August 31, 2018

From the journal Corpus Linguistics and Linguistic Theory

https://doi.org/10.1515/cllt-2016-0016

Abstract

Previous theoretical and empirical research on register variation has argued that linguistic co-occurrence patterns have a highly systematic relationship to register differences because both share the same functional underpinnings. The goal of this study is to test this claim by comparing two statistical techniques that have been used to describe register variation: factor analysis (as used in Multi-Dimensional analysis, MDA) and canonical discriminant analysis (CDA). MDA and CDA have different statistical bases and thus prioritize different analytical considerations: linguistic co-occurrence in the case of MDA and the prediction of register differences in the case of CDA. There is therefore no statistical reason to expect that the two techniques, applied to the same corpus, will produce similar results. We hypothesize that, although MDA and CDA approach register variation from opposite sides, they will nonetheless produce similar results because both types of statistical patterns are motivated by underlying discourse functions. The present paper tests this hypothesis through a case study of variation among web registers, applying MDA and CDA to the same corpus of texts.

Keywords: register variation; factor analysis; discriminant analysis; multi-dimensional analysis; web registers; text classification; linguistic co-occurrence
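
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure or data. It assumes a synthetic corpus (three hypothetical registers, six hypothetical linguistic features driven by one latent discourse function), uses scikit-learn's FactorAnalysis without rotation as a stand-in for the factor-analytic step of MDA, and uses LinearDiscriminantAnalysis as a stand-in for CDA. The function name and all parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def compare_fa_and_cda(n_per_register=200, seed=0):
    """Return |r| between first-factor scores and first-discriminant scores.

    All data are synthetic (an assumption for illustration only).
    """
    rng = np.random.default_rng(seed)

    # Hypothetical registers 0, 1, 2 that differ in one latent function.
    labels = np.repeat(np.arange(3), n_per_register)
    register_means = np.array([-2.0, 0.0, 2.0])
    latent = register_means[labels] + rng.normal(size=labels.size)

    # Hypothetical loadings of six features on the latent function.
    loadings = np.array([0.9, 0.8, 0.7, -0.8, -0.7, 0.1])
    noise = rng.normal(scale=0.5, size=(labels.size, loadings.size))
    features = np.outer(latent, loadings) + noise

    # Standardize each feature before both analyses.
    features = (features - features.mean(axis=0)) / features.std(axis=0)

    # Factor analysis: driven by feature co-occurrence, ignores the labels.
    fa = FactorAnalysis(n_components=1, random_state=seed)
    fa_scores = fa.fit_transform(features)[:, 0]

    # Discriminant analysis: driven by separating the labelled registers.
    lda = LinearDiscriminantAnalysis(n_components=1)
    cda_scores = lda.fit_transform(features, labels)[:, 0]

    # The sign of each dimension is arbitrary, so compare absolute correlation.
    return abs(np.corrcoef(fa_scores, cda_scores)[0, 1])


if __name__ == "__main__":
    print(f"|r| between FA and CDA dimension scores: {compare_fa_and_cda():.3f}")
```

In this toy setup the two dimensions agree only because the same latent variable generates both the feature co-occurrence and the register differences. That agreement is built into the simulation, whereas the study tests whether it holds in real web-register data.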

Funding statement: National Science Foundation, Directorate for Social, Behavioral and Economic Sciences, Division of Behavioral and Cognitive Sciences (Grant/Award Number: 1147581).

Published Online: 2018-08-31

Published in Print: 2018-09-25
