Investigating genre distinctions through discourse distance and discourse network

Kun Sun; Rong Wang; Wenxin Xiong

doi:10.1515/cllt-2020-0064

Published by De Gruyter Mouton February 25, 2021

From the journal Corpus Linguistics and Linguistic Theory

Abstract

The notion of genre has been widely explored using quantitative methods from both lexical and syntactic perspectives. However, discourse structure has rarely been used to examine genre. Mostly concerned with the interrelation of discourse units, discourse structure can play a crucial role in genre analysis. Nevertheless, few quantitative studies have explored genre distinctions from a discourse structure perspective. Here, we use two English discourse corpora (RST-DT and GUM) to investigate discourse structure from a novel viewpoint. The RST-DT is divided into four small subcorpora distinguished according to genre, and another corpus (GUM) containing seven genres is used for cross-verification. An RST (rhetorical structure theory) tree is converted into dependency representations by taking information from RST annotations to calculate the discourse distance through a process similar to that used to calculate syntactic dependency distance. Moreover, the data on dependency representations deriving from the two corpora are readily convertible into network data. Afterwards, we examine different genres in the two corpora by combining discourse distance and discourse network. The two methods are mutually complementary in comprehensively revealing the distinctiveness of various genres. Accordingly, we propose an effective quantitative method for assessing genre differences using discourse distance and discourse network. This quantitative study can help us better understand the nature of genre.

Keywords: dependency representations; discourse network; genre differences; linear distance; RST relation
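
The two measurements described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each dependency arc is given as a hypothetical (head, dependent, relation) triple over EDU positions, that discourse distance is the absolute linear distance between head and dependent (by analogy with syntactic dependency distance), and that the network takes EDUs as nodes and arcs as undirected edges. The abstract does not specify any of these details, and the arcs below are invented for illustration.

```python
from collections import Counter

# Hypothetical dependency arcs for one short text (illustrative data only).
# Each triple is (head EDU position, dependent EDU position, RST relation),
# with positions counted from 1 in linear text order.
arcs = [
    (1, 2, "elaboration"),
    (1, 4, "contrast"),
    (4, 3, "attribution"),
    (4, 5, "elaboration"),
]


def mean_discourse_distance(arcs):
    """Average absolute linear distance between head and dependent.

    Assumption: this mirrors the usual mean dependency distance in syntax;
    the paper's exact formula may differ.
    """
    if not arcs:
        raise ValueError("need at least one arc")
    return sum(abs(head - dep) for head, dep, _ in arcs) / len(arcs)


def to_edge_list(arcs):
    """Drop relation labels to obtain plain (head, dependent) edges."""
    return [(head, dep) for head, dep, _ in arcs]


def node_degrees(edges):
    """Count how many edges touch each node, treating edges as undirected."""
    counts = Counter()
    for head, dep in edges:
        counts[head] += 1
        counts[dep] += 1
    return dict(counts)


mdd = mean_discourse_distance(arcs)
edges = to_edge_list(arcs)
degrees = node_degrees(edges)
print(mdd, degrees)
```

The same edge list could be passed to a network library for measures such as community structure; plain Python is used here only to keep the sketch self-contained.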

Corresponding author: Kun Sun, Department of Linguistics, University of Tübingen, Tübingen, Germany, E-mail: kun.sun@uni-tuebingen.de

Funding source: European Research Council

Award Identifier / Grant number: 742545

Funding source: Important Humanities and Social Science Research Project of Zhejiang Higher Education

Award Identifier / Grant number: 2018QN071

Funding source: Beijing Municipal Natural Science Foundation

Award Identifier / Grant number: 16YYB018

Acknowledgments

We would like to thank the three anonymous reviewers (particularly the first reviewer) for their insightful and constructive comments on the paper. We also express our sincere gratitude to the Editor-in-Chief for her great help and generosity in improving this paper. The first author thanks his little son for his cooperation during this difficult time.

Research funding: This work was supported by the ERC (European Research Council) advanced grant (No. 742545). The second and third authors were funded by the Important Humanities and Social Science Research Project of Zhejiang Higher Education (Fund No. 2018QN071) and the Beijing Municipal Natural Science Foundation (Fund No. 16YYB018), respectively.

Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/cllt-2020-0064).

Received: 2020-02-12

Accepted: 2021-02-05

Published Online: 2021-02-25
