Abstract
Previous theoretical and empirical research on register variation has argued that linguistic co-occurrence patterns have a highly systematic relationship to register differences, because the two share the same functional underpinnings. The goal of this study is to test this claim through a comparison of two statistical techniques that have been used to describe register variation: factor analysis (as used in Multi-Dimensional analysis, MDA) and canonical discriminant analysis (CDA). MDA and CDA have different statistical bases and thus give priority to different analytical considerations: linguistic co-occurrence in the case of MDA and the prediction of register differences in the case of CDA. There is therefore no statistical reason to expect that the two techniques, if applied to the same corpus, will produce similar results. We hypothesize that although MDA and CDA approach register variation from opposite sides, they will nevertheless produce similar results because both types of statistical patterns are motivated by underlying discourse functions. The present paper tests this hypothesis through a case-study analysis of variation among web registers, applying MDA and CDA to analyze register variation in the same corpus of texts.
Funding statement: National Science Foundation, Directorate for Social, Behavioral and Economic Sciences, Division of Behavioral and Cognitive Sciences (Grant/Award Number: 1147581).
© 2018 Walter de Gruyter GmbH, Berlin/Boston