Published by De Gruyter Mouton, May 10, 2019

Effects of task and corpus-derived association scores on the online processing of collocations

  • Kyla McConnell

    and Alice Blumenthal-Dramé


Abstract

In the following self-paced reading study, we assess the cognitive realism of six widely used corpus-derived measures of association strength between words (collocated modifier–noun combinations like vast majority): MI, MI3, Dice coefficient, T-score, Z-score, and log-likelihood. The ability of these collocation metrics to predict reading times is tested against predictors of lexical processing cost that are widely established in the psycholinguistic and usage-based literature, respectively: forward/backward transition probability and bigram frequency. In addition, the experiment includes the treatment variable of task: it is split into two blocks which only differ in the format of interleaved comprehension questions (multiple choice vs. typed free response). Results show that the traditional corpus-linguistic metrics are outperformed by both backward transition probability and bigram frequency. Moreover, the multiple-choice condition elicits faster overall reading times than the typed condition, and the two winning metrics show stronger facilitation on the critical word (i.e. the noun in the bigrams) in the multiple-choice condition. In the typed condition, we find an effect that is weaker and, in the case of bigram frequency, longer lasting, continuing into the first spillover word. We argue that insufficient attention to task effects might have obscured the cognitive correlates of association scores in earlier research.
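To make some of the predictors named above concrete, the following minimal sketch computes forward and backward transition probability for vast majority from the raw frequencies listed for that item in Table A1. It also derives MI3 and the T-score from the listed MI value. The formulas are standard textbook definitions and are an assumption here; the study's own scores were extracted from the BNC and may rest on slightly different counts. All function and class names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BigramCounts:
    """Raw corpus frequencies for a modifier-noun pair."""
    modifier_freq: int  # ModFreq: frequency of the first word
    noun_freq: int      # NounFreq: frequency of the critical word
    bigram_freq: int    # BigramFreq: frequency of the whole pair


def forward_tp(c: BigramCounts) -> float:
    """P(noun | modifier): how strongly the modifier predicts the noun."""
    return c.bigram_freq / c.modifier_freq


def backward_tp(c: BigramCounts) -> float:
    """P(modifier | noun): how strongly the noun predicts the modifier."""
    return c.bigram_freq / c.noun_freq


def mi3_from_mi(mi: float, c: BigramCounts) -> float:
    """Assumed definition: MI3 = log2(O^3 / E) = MI + 2 * log2(O)."""
    return mi + 2 * math.log2(c.bigram_freq)


def t_score_from_mi(mi: float, c: BigramCounts) -> float:
    """Assumed definition: T = (O - E) / sqrt(O), with E recovered from MI = log2(O / E)."""
    expected = c.bigram_freq / 2 ** mi
    return (c.bigram_freq - expected) / math.sqrt(c.bigram_freq)


# "vast majority": frequencies and MI as listed in Table A1
vast_majority = BigramCounts(modifier_freq=4604, noun_freq=9997, bigram_freq=859)
MI_VAST_MAJORITY = 11.0409

print(round(forward_tp(vast_majority), 4))   # 0.1866
print(round(backward_tp(vast_majority), 4))  # 0.0859
print(round(mi3_from_mi(MI_VAST_MAJORITY, vast_majority), 4))
print(round(t_score_from_mi(MI_VAST_MAJORITY, vast_majority), 4))
```

The first two values correspond to the FTP and BTP columns of Table A1 for this item. The last two should land on or very near the listed MI3 and T-score values, which suggests, but does not prove, that those columns follow the definitions assumed here.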

About the authors

Kyla McConnell

Kyla McConnell is a Ph.D. candidate in Linguistics at the English Department of the University of Freiburg (Germany). Her primary research interests center around her ongoing dissertation “Individual Differences and Task Effects in Predictive Coding”. This research focuses on topics such as the extent to which quantitative and corpus-derived variables can reflect the cognition of individual speakers and how speaker- and task-based variables can modulate language processing. In this, she works with various psycho- and neurolinguistic experimental paradigms and statistical methods to align large-scale data with real-time language comprehension.

She previously studied English Language and Linguistics at the University of Freiburg (Germany), and Hispanic Linguistics and German Language and Literature at the University of North Carolina at Chapel Hill (USA).

Alice Blumenthal-Dramé

Dr. Alice Blumenthal-Dramé currently works as an Assistant Professor in English Linguistics at the English Department of the University of Freiburg (Germany). She studied English Philology, Slavic Philology, Computational Linguistics and General Linguistics at the University of Manchester (UK), the Lomonosov University of Moscow (Russian Federation), and the University of Freiburg (Germany), where she received her PhD in 2011.

Her publications exploit behavioral and functional neuroimaging methods to explore the extent to which statistical generalizations across “big data” (notably, large-scale text corpora and databases derived from such corpora) have the potential to offer realistic insights into language users’ cognition. Major motivations behind this research have been: (1) to put to the test the cognitive reality of cognitive linguistic assumptions, and (2) to gain a better understanding of the size and nature of the cognitive building blocks that are utilized in natural language use.

Further research interests include morphological theories, psycholinguistic models, Gestalt psychology, usage-based linguistics, language typology, and statistical methods.

Acknowledgments

We are grateful to Marc Brysbaert and one anonymous reviewer for their thoroughly helpful suggestions. Naturally, we take full responsibility for any errors that may remain in the text.

This research was supported by a Junior Fellowship from the Freiburg Institute for Advanced Studies to the second author.


  Appendix: Materials used in the experiments

Table A1: Full list of experimental stimuli, along with different association scores between the modifier and the critical word (“Word”) as well as raw frequencies for the modifier (ModFreq), the critical word (NounFreq), and the whole bigram (BigramFreq), all extracted from the BNC (cf. Section 2.1).

| Full sentence | Word | MI | MI3 | Z-score | T-score | Log-likelihood | Dice | FTP | BTP | ModFreq | NounFreq | BigramFreq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Katy was surrounded by foreign accents on the train. | Accents | 6.8727 | 16.5887 | 56.7944 | 5.3392 | 219.3252 | 0.0033 | 0.0018 | 0.0164 | 15,943 | 1,764 | 29 |
| There existed a strong argument against the bill. | Argument | 5.5428 | 18.0386 | 57.8537 | 8.5308 | 436.0662 | 0.0056 | 0.004 | 0.0063 | 19,222 | 11,996 | 76 |
| Emma discerned the bad attitude of her client. | Attitude | 3.0282 | 9.947 | 7.881 | 2.9101 | 26.8894 | 0.0009 | 0.0004 | 0.001 | 25,615 | 10,614 | 11 |
| Tanner purchased the elastic band that he needed. | Band | 10.8746 | 21.2144 | 256.2276 | 5.9968 | 476.3258 | 0.0082 | 0.0835 | 0.0042 | 431 | 8,663 | 36 |
| Scott contemplated the humble beginning of the movement. | Beginning | 9.4352 | 19.6938 | 153.2099 | 5.9075 | 389.7183 | 0.0084 | 0.0467 | 0.0046 | 749 | 7,603 | 35 |
| Amber enjoyed a refreshing beverage under the stars. | Beverage | 11.2777 | 13.2777 | 52.8205 | 1.4136 | 27.2982 | 0.0063 | 0.0047 | 0.0089 | 426 | 224 | 2 |
| Ava documented the majestic bird in her journal. | Bird | 6.4239 | 8.4239 | 9.6755 | 1.3977 | 13.8717 | 0.0004 | 0.0071 | 0.0002 | 282 | 9,467 | 2 |
| The moldy bread was thrown away. | Bread | 10.1971 | 14.1971 | 59.8999 | 1.9983 | 48.707 | 0.0022 | 0.0367 | 0.0011 | 109 | 3,672 | 4 |
| Ryan chose a fast car at the dealership. | Car | 6.908 | 19.6581 | 98.4071 | 9.0346 | 633.3937 | 0.0047 | 0.0167 | 0.0024 | 4,975 | 33,942 | 83 |
| Phoebe started a brief chat with the postman. | Chat | 7.0298 | 11.6737 | 22.8095 | 2.219 | 38.8353 | 0.0018 | 0.001 | 0.0053 | 4,947 | 944 | 5 |
| Bentley mentioned the wild child and his mother. | Child | 2.1454 | 9.3154 | 5.3359 | 2.6811 | 17.1356 | 0.0003 | 0.0023 | 0.0002 | 5,308 | 69,271 | 12 |
| This vicious circle seems unbreakable sometimes. | Circle | 11.8743 | 26.0492 | 711.7303 | 11.6588 | 1,993.7023 | 0.0484 | 0.1598 | 0.0276 | 851 | 4,929 | 136 |
| Due to the mitigating circumstances Sarah was released. | Circumstances | 12.1365 | 20.1365 | 259.9508 | 3.9991 | 245.4345 | 0.003 | 0.2667 | 0.0015 | 60 | 10,824 | 16 |
| Today civilian clothes are being washed. | Clothes | 9.5153 | 19.4237 | 147.9851 | 5.5602 | 348.5448 | 0.0083 | 0.0332 | 0.0056 | 1,173 | 6,920 | 31 |
| Kyra always had decaffeinated coffee with her toast. | Coffee | 13.3784 | 18.9931 | 253.4934 | 2.6455 | 121.0784 | 0.0023 | 0.3889 | 0.0011 | 18 | 6,360 | 7 |
| Last year provided favorable conditions for job creation. | Conditions | 7.1523 | 18.0048 | 76.7503 | 6.5113 | 342.2991 | 0.0035 | 0.0295 | 0.0018 | 1,457 | 23,511 | 43 |
| Robert listened to his guilty conscience when making decisions. | Conscience | 9.6139 | 20.2578 | 174.5992 | 6.3165 | 454.7506 | 0.0146 | 0.0098 | 0.0283 | 4,078 | 1,415 | 40 |
| Tarek’s firm conviction persuaded the politicians. | Conviction | 7.431 | 13.7709 | 36.9915 | 2.9826 | 74.885 | 0.0037 | 0.0025 | 0.0033 | 3,615 | 2,745 | 9 |
| Ty commented on the petty crime plaguing the city. | Crime | 9.1959 | 19.3708 | 138.8888 | 5.821 | 367.2017 | 0.0074 | 0.0437 | 0.0039 | 778 | 8,631 | 34 |
| Gregory predicted the grave danger associated with lead. | Danger | 10.0774 | 20.9299 | 212.8371 | 6.5514 | 518.1569 | 0.0108 | 0.0447 | 0.0058 | 961 | 7,424 | 43 |
| Connor was informed about the great deal on designer jeans. | Deal | 9.791 | 33.7384 | 1,885.2475 | 63.3477 | 48,526.0409 | 0.1449 | 0.0635 | 0.3463 | 63,349 | 11,613 | 4,022 |
| Barbara remembered the heated debate at the meeting. | Debate | 10.6063 | 22.6063 | 313.1955 | 7.9949 | 821.0008 | 0.0149 | 0.0989 | 0.0079 | 647 | 8,071 | 64 |
| Tris watched the crushing defeat unfold on TV. | Defeat | 12.1905 | 21.1094 | 313.3181 | 4.6894 | 331.0624 | 0.0129 | 0.1063 | 0.0067 | 207 | 3,275 | 22 |
| Trevor was fascinated by the murky depths of the ocean. | Depths | 9.7334 | 15.3481 | 71.5885 | 2.6426 | 80.6973 | 0.0033 | 0.0298 | 0.0017 | 235 | 4,045 | 7 |
| Something about the rich dessert made David ill. | Dessert | 7.6399 | 12.2838 | 28.2633 | 2.2249 | 43.0689 | 0.0014 | 0.0007 | 0.0114 | 7,655 | 437 | 5 |
| Brenda prioritized a balanced diet and regular exercise. | Diet | 10.979 | 23.5126 | 391.4779 | 8.7706 | 1,025.7188 | 0.0284 | 0.0711 | 0.0169 | 1,083 | 4,543 | 77 |
| Rosa spoke about reckless driving to the kids. | Driving | 13.4627 | 25.8025 | 895.2282 | 8.4845 | 1,213.8078 | 0.0773 | 0.1333 | 0.0536 | 540 | 1,343 | 72 |
| That is a prime example of Renaissance art. | Example | 4.9585 | 19.1755 | 63.1599 | 11.3695 | 683.4083 | 0.0052 | 0.0115 | 0.0032 | 11,954 | 43,028 | 138 |
| A characteristic feature defined Larry’s face. | Feature | 9.2357 | 20.6366 | 175.0611 | 7.1991 | 565.8966 | 0.0075 | 0.031 | 0.0039 | 1,678 | 13,295 | 52 |
| It is well known that itchy feet drive people crazy. | Feet | 10.2877 | 17.4576 | 117.2715 | 3.4613 | 150.1117 | 0.0012 | 0.1579 | 0.0006 | 76 | 20,412 | 12 |
| Ahmad’s son anticipated the epileptic fit before it happened. | Fit | 15.5188 | 24.4377 | 993.1906 | 4.6903 | 441.0314 | 0.0407 | 0.2651 | 0.0212 | 83 | 1,036 | 22 |
| Mohammed took note of the good food at the pub. | Food | 4.4897 | 21.1871 | 81.6389 | 17.2518 | 1,412.1 | 0.0065 | 0.0026 | 0.0157 | 125,701 | 20,722 | 326 |
| Kevin envied the small fortune his brother inherited. | Fortune | 6.3079 | 19.2268 | 81.9699 | 9.2624 | 598.4906 | 0.0039 | 0.0017 | 0.0293 | 50,353 | 3,005 | 88 |
| Lyssa spotted her close friend in the crowd. | Friend | 7.6776 | 25.9973 | 340.2379 | 23.7997 | 4,992.3611 | 0.0281 | 0.0382 | 0.0185 | 14,964 | 30,860 | 572 |
| Luca smelled the rotten fruit on the counter. | Fruit | 8.232 | 14.8759 | 51.9096 | 3.1518 | 94.3375 | 0.0036 | 0.013 | 0.002 | 767 | 4,985 | 10 |
| The kid played on the green grass near the school. | Grass | 7.4811 | 18.3337 | 86.1413 | 6.5207 | 361.1591 | 0.0081 | 0.0044 | 0.0101 | 9,759 | 4,250 | 43 |
| Clarissa observed the stunted growth of the plant. | Growth | 8.731 | 13.3749 | 41.375 | 2.2308 | 50.7874 | 0.0008 | 0.0424 | 0.0004 | 118 | 12,875 | 5 |
| Sage’s tiresome habit quickly became annoying. | Habit | 7.9992 | 9.9992 | 16.8773 | 1.4087 | 18.2124 | 0.001 | 0.0087 | 0.0005 | 230 | 3,838 | 2 |
| Katrina discovered blonde hair in the bathroom. | Hair | 10.904 | 26.6692 | 670.6813 | 15.3543 | 3,160.4447 | 0.0318 | 0.2857 | 0.0167 | 826 | 14,100 | 236 |
| Clara offered a helping hand to the workers. | Hand | 11.1344 | 25.2666 | 546.5517 | 11.5707 | 2,068.7274 | 0.0054 | 1 | 0.0027 | 134 | 50,168 | 134 |
| Bartholomew praised the magnificent house and its owners. | House | 2.3089 | 6.9528 | 3.4752 | 1.7848 | 8.0307 | 0.0002 | 0.0025 | 0.0001 | 1,970 | 57,866 | 5 |
| It was not a bright idea to visit Crystal. | Idea | 5.8521 | 18.6367 | 68.0362 | 9.0065 | 517.92 | 0.0046 | 0.0142 | 0.0026 | 5,905 | 31,856 | 84 |
| Chris was aware of the debilitating illness and its consequences. | Illness | 11.0078 | 17.9267 | 143.586 | 3.315 | 146.6632 | 0.0057 | 0.0671 | 0.003 | 164 | 3,718 | 11 |
| Floyd criticized the direct impact of the pollution. | Impact | 6.3701 | 17.7149 | 63.5276 | 7.0551 | 350.2185 | 0.0061 | 0.005 | 0.0067 | 10,303 | 7,614 | 51 |
| The object’s vital importance cannot be overstated. | Importance | 8.0159 | 21.6047 | 168.0816 | 10.4949 | 1,016.0611 | 0.0152 | 0.0221 | 0.0116 | 5,033 | 9,574 | 111 |
| Henry grabbed the blunt instrument and appraised it. | Instrument | 11.0284 | 22.0122 | 303.0333 | 6.705 | 603.1051 | 0.0154 | 0.0959 | 0.0083 | 469 | 5,450 | 45 |
| Laura knew about the government’s vested interest in the change. | Interest | 11.5152 | 28.1949 | 971.9832 | 17.9939 | 5,085.8702 | 0.0175 | 0.9529 | 0.0087 | 340 | 37,144 | 324 |
| Carla was a born leader her teachers said. | Leader | 7.4864 | 12.1303 | 26.7821 | 2.2236 | 42.0777 | 0.0006 | 0.0188 | 0.0003 | 266 | 16,343 | 5 |
| Courtney acknowledged that communal living had many benefits. | Living | 9.2425 | 17.4175 | 98.3246 | 4.1163 | 184.3508 | 0.0066 | 0.024 | 0.0038 | 707 | 4,509 | 17 |
| Brianna understood that unrequited love could be painful. | Love | 11.9894 | 21.8032 | 343.3493 | 5.4759 | 457.5023 | 0.0043 | 0.5085 | 0.0021 | 59 | 14,231 | 30 |
| Paul had a light lunch before the interview. | Lunch | 7.8723 | 18.3681 | 92.7151 | 6.1381 | 339.7126 | 0.0088 | 0.0055 | 0.0072 | 6,870 | 5,256 | 38 |
| Saul realized that the vast majority had voted incorrectly. | Majority | 11.0409 | 30.5339 | 1,343.8702 | 29.2948 | 11,678.7503 | 0.118 | 0.1866 | 0.0859 | 4,604 | 9,997 | 859 |
| Louise complimented the old man in her neighborhood. | Man | 5.912 | 28.6512 | 392.4456 | 50.585 | 16,686.0853 | 0.0362 | 0.0423 | 0.0277 | 62,612 | 95,595 | 2,646 |
| Nellie pinpointed the ulterior motive of the banker. | Motive | 15.1263 | 26.1735 | 1,268.7073 | 6.7821 | 911.9709 | 0.0449 | 0.6301 | 0.0231 | 73 | 1,990 | 46 |
| Tess ensured the classical music was showcased correctly. | Music | 8.4305 | 22.7096 | 219.1371 | 11.8399 | 1,374.3772 | 0.0161 | 0.0442 | 0.0095 | 3,188 | 14,800 | 141 |
| Achim dismissed the preconceived notion with a sigh. | Notion | 12.007 | 19.6217 | 231.4156 | 3.7407 | 207.5266 | 0.0061 | 0.1556 | 0.0031 | 90 | 4,503 | 14 |
| Christian wrote about the auspicious occasion in his memoir. | Occasion | 9.37 | 14.0138 | 51.6796 | 2.2327 | 55.2327 | 0.0011 | 0.0526 | 0.0006 | 95 | 8,928 | 5 |
| Priya found the commissioned officer sitting around outside. | Officer | 10.045 | 16.6888 | 97.5457 | 3.1593 | 121.0114 | 0.0011 | 0.1408 | 0.0006 | 71 | 17,640 | 10 |
| The senior officials ultimately decided everything. | Officials | 8.743 | 24.663 | 325.2122 | 15.7429 | 2,536.6022 | 0.0308 | 0.0305 | 0.0303 | 8,152 | 8,207 | 249 |
| Vladimir maintained a brisk pace throughout the walk. | Pace | 10.2516 | 18.4265 | 139.6072 | 4.1197 | 208.3316 | 0.009 | 0.0341 | 0.0051 | 499 | 3,330 | 17 |
| Philippa experienced excruciating pain in her legs. | Pain | 11.3482 | 19.5232 | 204.2522 | 4.1215 | 236.8076 | 0.0043 | 0.1828 | 0.0021 | 93 | 8,034 | 17 |
| The business was based on common people and their desires. | People | 2.4278 | 15.9376 | 19.5146 | 8.4609 | 188.1064 | 0.0016 | 0.0057 | 0.0009 | 18,969 | 123,085 | 108 |
| Pablo regularly tested the moral principles of his employees. | Principles | 7.0264 | 19.5972 | 99.4183 | 8.764 | 606.6328 | 0.0084 | 0.0152 | 0.0057 | 5,130 | 13,737 | 78 |
| Clemence was intrigued by the peaceful protest in the capital. | Protest | 9.1073 | 18.9211 | 126.2527 | 5.4673 | 319.6689 | 0.0111 | 0.0187 | 0.0077 | 1,603 | 3,888 | 30 |
| Hank noticed the pouring rain and stayed inside. | Rain | 13.2279 | 24.7376 | 713.0229 | 7.3477 | 914.084 | 0.019 | 0.2241 | 0.0088 | 241 | 6,127 | 54 |
| Chandler lost the avid reader in the library. | Reader | 10.9147 | 19.9618 | 206.0215 | 4.7933 | 305.5509 | 0.0054 | 0.1429 | 0.0026 | 161 | 8,699 | 23 |
| Heather adjusted to the harsh reality after the war. | Reality | 9.5549 | 21.8544 | 229.1578 | 8.4149 | 802.7956 | 0.0165 | 0.0424 | 0.0099 | 1,673 | 7,187 | 71 |
| The group thought that human rights were very important. | Rights | 8.7995 | 29.6394 | 779.2654 | 36.9304 | 14,208.3554 | 0.0668 | 0.076 | 0.0467 | 18,017 | 29,348 | 1,370 |
| Brian examined the tidy room and was satisfied. | Room | 3.8311 | 7.001 | 4.9861 | 1.6103 | 10.3659 | 0.0002 | 0.004 | 0.0001 | 743 | 34,119 | 3 |
| Neveah took in the incredible scenery all around her. | Scenery | 6.9744 | 6.9744 | 5.5177 | 0.992 | 7.6866 | 0.001 | 0.0008 | 0.0013 | 1,195 | 748 | 1 |
| Charlie gave a speech about military service in the eighties. | Service | 5.8261 | 22.1052 | 124.0316 | 16.4969 | 1,732.5064 | 0.0088 | 0.0264 | 0.0052 | 10,691 | 54,457 | 282 |
| Redmond considered it a crying shame to be poor. | Shame | 15.935 | 24.7197 | 1,119.6926 | 4.5825 | 457.9808 | 0.0243 | 0.9545 | 0.0115 | 22 | 1,828 | 21 |
| Ginny made sure that a fair share was allocated today. | Share | 8.0057 | 23.879 | 249.4495 | 15.5916 | 2,243.6862 | 0.0217 | 0.0288 | 0.0153 | 8,495 | 16,013 | 245 |
| Benny figured a quick shower would be nice. | Shower | 6.8916 | 13.2314 | 30.5977 | 2.9747 | 68.1962 | 0.0028 | 0.0014 | 0.0049 | 6,297 | 1,837 | 9 |
| Alaina followed the putrid smell to the kitchen. | Smell | 12.2163 | 17.3862 | 154.854 | 2.449 | 90.3779 | 0.0042 | 0.12 | 0.0021 | 50 | 2,850 | 6 |
| Ahmed checked the fertile soil for invasive insects. | Soil | 10.9615 | 22.1909 | 309.2366 | 6.9965 | 651.3057 | 0.0188 | 0.0822 | 0.0104 | 596 | 4,723 | 49 |
| Gabe is a brave soul for going skydiving. | Soul | 8.1374 | 15.7521 | 60.3216 | 3.7284 | 130.2129 | 0.0054 | 0.0082 | 0.0038 | 1,709 | 3,675 | 14 |
| Taylor reacted to the high speed of the serve. | Speed | 7.7923 | 25.6 | 324.0409 | 21.7873 | 4,258.4001 | 0.0243 | 0.0085 | 0.0617 | 56,487 | 7,764 | 479 |
| Morgan ignored the budding star despite her persistence. | Star | 8.4969 | 13.6668 | 42.5484 | 2.4427 | 58.8976 | 0.0012 | 0.0279 | 0.0006 | 215 | 9,867 | 6 |
| Brooke gossiped about the beautiful stranger on the train. | Stranger | 2.6426 | 2.6426 | 0.8493 | 0.8399 | 1.9841 | 0.0002 | 0.0001 | 0.0005 | 8,377 | 2,179 | 1 |
| Sherry had cosmetic surgery done too often. | Surgery | 12.6951 | 24.096 | 581.4858 | 7.21 | 821.0537 | 0.0341 | 0.1368 | 0.0188 | 380 | 2,772 | 52 |
| Brendon got a glimpse of the losing team before they left. | Team | 7.6102 | 10.7801 | 20.0518 | 1.7232 | 25.7968 | 0.0003 | 0.0159 | 0.0001 | 189 | 22,401 | 3 |
| Dominic rejoiced about the free time he now had. | Time | 2.2962 | 16.7921 | 21.6708 | 9.8187 | 242.5858 | 0.0015 | 0.0077 | 0.0008 | 19,674 | 180,243 | 152 |
| Carl sensed that precious time was running out. | Time | 4.3458 | 15.6907 | 30.3035 | 6.7902 | 211.7938 | 0.0006 | 0.0324 | 0.0003 | 1,575 | 180,243 | 51 |
| Shelly was interested in ancient times and faraway lands. | Times | 3.2468 | 15.5866 | 23.2075 | 7.5914 | 196.1424 | 0.0008 | 0.0148 | 0.0004 | 4,857 | 180,243 | 72 |
| Tom heard the heavy traffic from his window. | Traffic | 7.4875 | 20.4712 | 125.6743 | 9.434 | 757.3755 | 0.0118 | 0.0089 | 0.0139 | 10,118 | 6,467 | 90 |
Hal questioned the narrow victory of his opponent. Victory 6.7387 16.2485 52.2014 5.1475 199.0077 0.005 0.0052 0.0044 5,194 6,152 27
Bella’s loud voice carried the choir. Voice 8.9794 21.8653 207.9339 9.3089 917.0664 0.009 0.0405 0.0047 2,146 18,371 87
Matthew felt the tepid water with his toe. Water 9.5393 18.0352 115.6138 4.353 218.1647 0.0011 0.2346 0.0005 81 36,381 19
Sam recalled the mild winter three years ago. Winter 8.6834 19.1792 123.0403 6.1494 382.8209 0.0088 0.0224 0.0052 1,698 7,370 38
Gertrude passed by a disillusioned youth on the corner. Youth 6.7726 6.7726 5.1325 0.9909 7.413 0.0003 0.0047 0.0002 212 6,202 1
Katy was surrounded by foreign accents on the train. Accents 6.8727 16.5887 56.7944 5.3392 219.3252 0.0033 0.0018 0.0164 15,943 1,764 29
Range 2.1454–15.9350 2.6426–33.7384 0.8493–1,885.2475 0.8399–63.3477 1.9841–48,526.0409 0.0002–0.1449 0.0001–1.0000 0.0001–0.3463 18–125,701 224–180,243 1–4,022
Mean 8.673872 18.72445 243.1973 8.192896 1,591.01 0.01342043 0.09920968 0.01285376 7,364.688 20,040.41 159.3333
Standard deviation 2.97298 5.744573 335.2631 9.480713 5,572.751 0.02256647 0.1919674 0.03758788 17,522.47 35,324.28 518.1823
Note: For greater ease of readability, the last five columns of this table are not log-transformed. FTP: forward transition probability; BTP: backward transition probability; ModFreq: modifier frequency; NounFreq: noun frequency; BigramFreq: bigram frequency.
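The relation between the three raw frequency columns and the two transition probabilities can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the usual definitions (FTP as bigram frequency divided by modifier frequency, BTP as bigram frequency divided by noun frequency), and the function name `transition_probabilities` is hypothetical. Under that assumption, the counts listed for "preconceived notion" (ModFreq 90, NounFreq 4,503, BigramFreq 14) reproduce the rounded FTP and BTP values given in the same row.

```python
def transition_probabilities(mod_freq, noun_freq, bigram_freq):
    """Return (FTP, BTP) for a modifier-noun bigram.

    Assumed definitions (not stated in this part of the table):
      FTP = P(noun | modifier) = bigram_freq / mod_freq
      BTP = P(modifier | noun) = bigram_freq / noun_freq
    """
    ftp = bigram_freq / mod_freq
    btp = bigram_freq / noun_freq
    return ftp, btp


# Raw counts for "preconceived notion", taken from the table row above.
ftp, btp = transition_probabilities(mod_freq=90, noun_freq=4503, bigram_freq=14)
print(f"FTP = {ftp:.4f}, BTP = {btp:.4f}")  # FTP = 0.1556, BTP = 0.0031
```

The same check works for other rows, for example "senior officials" (8,152; 8,207; 249), which gives 0.0305 and 0.0303.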

Figure A1: Scatterplot representing the relation between log-transformed bigram frequency (logBigramFreq) and log-transformed backward transition probability (logBackwardTP) in the 91 collocations used in the present study (Pearson’s correlation: 0.7029, p < 0.0000).
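As a rough illustration of how such a correlation is obtained, the sketch below computes Pearson's r between log-transformed bigram frequency and log-transformed BTP for six rows copied from the table above. It is not the authors' analysis: the helper `pearson_r` is hypothetical, the natural logarithm is an assumption (the base does not affect r, since changing it only rescales both variables), and a six-item subset will not reproduce the coefficient reported for all 91 collocations.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# (bigram frequency, BTP) pairs copied from six table rows:
# notion, officials, rights, stranger, time ("free time"), shame.
rows = [
    (14, 0.0031),
    (249, 0.0303),
    (1370, 0.0467),
    (1, 0.0005),
    (152, 0.0008),
    (21, 0.0115),
]

log_bigram_freq = [math.log(freq) for freq, _ in rows]
log_backward_tp = [math.log(btp) for _, btp in rows]

r = pearson_r(log_bigram_freq, log_backward_tp)
print(f"Pearson's r on this six-row subset: {r:.4f}")
```

Running the same function over all 91 rows of the full table would be the way to check the reported value.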


Supplementary Material

The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/cllt-2018-0030).


Published Online: 2019-05-10
Published in Print: 2022-02-23

© 2019 Walter de Gruyter GmbH, Berlin/Boston
