Abstract
This paper explores the proposed benefits of ΔP (delta P) as a measure of collocation strength. It focuses on contrasting ΔP with other, more commonly used association measures, particularly transitional probability, but also mutual information and Lexical Gravity G. To this end, the strong correlation between ΔP and transitional probability is first illustrated using two example corpora. This is followed by an analysis of hesitation placement in spontaneous spoken English, based on the assumption that hesitations are not placed within strong collocations. Results show that, despite their strong similarity, ΔP is in some contexts more predictive of hesitation placement than transitional probability. Yet neither ΔP nor any of the other association measures emerges as the universally best predictor. On the basis of these results, it is suggested that studies should always rely on several association measures.
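To make the comparison between the measures concrete, the sketch below computes forward and backward transitional probability, forward and backward ΔP, and pointwise mutual information from a standard 2x2 contingency table for a bigram (w1, w2). It is a minimal illustration, not the procedure used in the paper: the class and function names are hypothetical, the counts in the usage example are invented, and Lexical Gravity G is left out because it additionally requires counts of distinct neighbouring word types. The sketch also shows why ΔP and transitional probability correlate so strongly: forward ΔP is forward TP minus P(w2 | not w1), and in a large corpus that second term is usually tiny.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BigramTable:
    """Hypothetical 2x2 contingency table for a bigram (w1, w2).

    a: w1 followed by w2
    b: w1 followed by some other word
    c: some other word followed by w2
    d: neither w1 in the first slot nor w2 in the second slot
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        # Total number of bigram tokens in the corpus.
        return self.a + self.b + self.c + self.d


def tp_forward(t: BigramTable) -> float:
    """Forward transitional probability P(w2 | w1)."""
    return t.a / (t.a + t.b)


def tp_backward(t: BigramTable) -> float:
    """Backward transitional probability P(w1 | w2)."""
    return t.a / (t.a + t.c)


def delta_p_forward(t: BigramTable) -> float:
    """Forward ΔP: P(w2 | w1) - P(w2 | not w1)."""
    return tp_forward(t) - t.c / (t.c + t.d)


def delta_p_backward(t: BigramTable) -> float:
    """Backward ΔP: P(w1 | w2) - P(w1 | not w2)."""
    return tp_backward(t) - t.b / (t.b + t.d)


def pmi(t: BigramTable) -> float:
    """Pointwise mutual information in bits: log2(observed / expected)."""
    expected = (t.a + t.b) * (t.a + t.c) / t.n
    return math.log2(t.a / expected)


if __name__ == "__main__":
    # Invented counts for illustration only.
    table = BigramTable(a=30, b=70, c=20, d=9880)
    print("TP forward:", tp_forward(table))
    print("dP forward:", delta_p_forward(table))
    print("TP backward:", tp_backward(table))
    print("dP backward:", delta_p_backward(table))
    print("PMI (bits):", pmi(table))
```

With these invented counts, forward TP is 0.3 and forward ΔP is about 0.298, so the two measures barely differ. They diverge only when the second word is also frequent after words other than w1, that is, when c is large relative to c + d.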
© 2020 Walter de Gruyter GmbH, Berlin/Boston