Abstract
The correspondence between a speaker's communicative intention, in terms of Information Structure, and the way the speaker conveys communicative aspects through prosody has been a fruitful field of study in Linguistics. However, text-to-speech applications still lack the variability and richness that human speakers display when they communicate. Some past attempts have modeled one aspect of Information Structure, namely thematicity, for intonation generation in text-to-speech technologies. Yet these applications suffer from two limitations: (i) they draw upon a small number of made-up, simple question-answer pairs rather than on real (spoken or written) corpus material; and (ii) they do not explore whether another interpretation of thematicity would better suit a wider range of textual genres beyond dialogs. In this paper, two different interpretations of thematicity are examined in the context of speech technologies: the state-of-the-art binary (and flat) theme-rheme partition, and the hierarchical thematicity defined by Igor Mel’čuk within the Meaning-Text Theory. The outcome of experiments on a corpus of native speakers of US English suggests that the latter interpretation of thematicity has versatile implementation potential for text-to-speech applications of the Information Structure–prosody interface.
Funding source: European Commission
Award Identifier / Grant number: H2020-645012-RIA, H2020-870930-IA
Funding source: Agencia Estatal de Investigación (AEI)
Award Identifier / Grant number: RYC-2015-17239 (AEI/FSE, UE)
Funding source: Ministerio de Ciencia, Innovación y Universidades
Award Identifier / Grant number: RYC-2015-17239 (AEI/FSE, UE)
Funding source: Fondo Social Europeo (FSE)
Award Identifier / Grant number: RYC-2015-17239 (AEI/FSE, UE)
Research funding: This work was partially funded by the European Commission in the context of its H2020 Programme under the contract numbers H2020-645012-RIA (KRISTINA) and H2020-870930-IA (WELCOME). The second author was funded by the Agencia Estatal de Investigación (AEI), Ministerio de Ciencia, Innovación y Universidades and the Fondo Social Europeo (FSE), grant RYC-2015-17239 (AEI/FSE, UE).
References
Ballesteros, Miguel, Bernd Bohnet, Simon Mille & Leo Wanner. 2015. Data-driven sentence generation with non-isomorphic trees. In Proceedings of the 2015 conference of the North American chapter of the association for computational linguistics: Human language technologies (NAACL–HLT). Denver, Colorado: Association for Computational Linguistics. https://doi.org/10.3115/v1/N15-1042.
Baumann, Stefan. 2012. The intonation of givenness: Evidence from German. Tübingen: Max Niemeyer Verlag.
Beckman, Mary E. & Janet Pierrehumbert. 1986. Intonational structure in Japanese and English. Phonology Yearbook 3. 255–310. https://doi.org/10.1017/S095267570000066X.
Black, Alan W. & Paul A. Taylor. 1997. The Festival speech synthesis system: System documentation (Technical Report HCRC/TR-83). Edinburgh, Scotland, UK: Human Communication Research Centre, University of Edinburgh.
Boersma, Paul. 2001. Praat, a system for doing phonetics by computer. Glot International 5. 341–345.
Bohnet, Bernd, Alicia Burga & Leo Wanner. 2013. Towards the annotation of Penn Treebank with information structure. In Proceedings of the sixth international joint conference on natural language processing. Nagoya, Japan: Association for Computational Linguistics.
Bouayad-Agha, Nadjet, Gerard Casamayor, Simon Mille & Leo Wanner. 2012. Perspective-oriented generation of football match summaries: Old tasks, new challenges. ACM Transactions on Speech and Language Processing 9. 1–31. https://doi.org/10.1145/2287710.2287711.
Brown, Gillian. 1983. Prosodic structure and the given/new distinction. In Anne Cutler & D. Robert Ladd (eds.), Prosody: Models and measurements, 67–77. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-69103-4_6.
Büring, Daniel. 2003. On D-trees, beans, and B-accents. Linguistics and Philosophy 26. 511–545. https://doi.org/10.1023/A:1025887707652.
Calhoun, Sasha. 2010. The centrality of metrical structure in signalling information structure: A probabilistic perspective. Language 86(1). 1–42. https://doi.org/10.1353/lan.0.0197.
Campbell, Nick & Parham Mokhtari. 2003. Voice quality: The 4th prosodic dimension. In Proceedings of the 15th international congress of phonetic sciences (ICPhS). Barcelona, Spain: The 15th ICPhS Organizing Committee / Causal Productions Pty Ltd.
Chafe, Wallace L. 1976. Givenness, contrastiveness, definiteness, subjects, topics, and point of view. In Charles N. Li (ed.), Subject and topic, 25–55. New York: Academic Press.
Charniak, Eugene, Don Blaheta, Niyu Ge, Keith Hall, John Hale & Mark Johnson. 2000. BLLIP 1987–89 WSJ Corpus Release 1 (LDC2000T43). Available at: https://www.cis.upenn.edu/~treebank/.
Chomsky, Noam. 1995. The Minimalist program. Cambridge, MA: MIT Press.
Clark, Herbert H. & Susan E. Haviland. 1977. Comprehension and the given-new contract. In Roy O. Freedle (ed.), Discourse production and comprehension (Discourse processes: Advances in research and theory 1), 1–40. Norwood, New Jersey: Ablex Publishing Corporation.
Daneš, František. 1970. One instance of Prague School methodology: Functional analysis of utterance and text. In Paul L. Garvin (ed.), Method and theory in linguistics (Janua Linguarum, Series Maior 40), 132–146. Berlin, Germany: De Gruyter Mouton. https://doi.org/10.1515/9783110872521.132.
Domínguez, Monica, Alicia Burga, Mireia Farrús & Leo Wanner. 2018. Towards expressive prosody generation in TTS for reading aloud applications. In Proceedings of IberSPEECH 2018. Barcelona, Spain: International Speech Communication Association (ISCA). https://doi.org/10.21437/IberSPEECH.2018-9.
Domínguez, Monica, Ivan Latorre, Mireia Farrús, Joan Codina & Leo Wanner. 2016. Praat on the web: An upgrade of Praat for semi-automatic speech annotation. In Proceedings of the 26th international conference on computational linguistics: System demonstrations. Osaka, Japan: The COLING 2016 Organizing Committee.
Domínguez, Monica, Mireia Farrús & Leo Wanner. 2017. A thematicity-based prosody enrichment tool for CTS. In Proceedings of Interspeech: Show and tell demonstrations. Stockholm, Sweden: International Speech Communication Association (ISCA). https://doi.org/10.21437/SpeechProsody.2018-119.
Erteschik-Shir, Nomi. 2007. Information structure: The syntax-discourse interface. Oxford, United Kingdom: Oxford University Press.
Grabe, Esther, Francis Nolan & Kimberley Farrar. 1998. IViE: A comparative transcription system for intonational variation in English. In Proceedings of the international conference on spoken language processing (ICSLP). Sydney, Australia: Australian Speech Science and Technology Association, Incorporated (ASSTA). https://doi.org/10.21437/ICSLP.1998-583.
Haji-Abdolhosseini, Mohammad. 2003. A constraint-based approach to information structure and prosody correspondence. In Proceedings of the 10th international conference on head-driven phrase structure grammar, Michigan State University, East Lansing. Stanford: CSLI Publications. https://doi.org/10.21248/hpsg.2003.9.
Hajičová, Eva. 1987. Focussing: A meeting point of linguistics and artificial intelligence. In Proceedings of the 2nd international conference on artificial intelligence II: Methodology, systems, applications. Varna, Bulgaria: North-Holland.
Hajičová, Eva, Barbara Partee & Petr Sgall. 1998. Topic-focus articulation, tripartite structures, and semantic content (Studies in Linguistics and Philosophy 71). Dordrecht, Netherlands: Springer Netherlands. https://doi.org/10.1007/978-94-015-9012-9.
Hall, Mark, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann & Ian H. Witten. 2009. The WEKA data mining software: An update. SIGKDD Explorations 11(1). 10–18. https://doi.org/10.1145/1656274.1656278.
Halliday, Michael. 1967. Notes on transitivity and theme in English: Parts 1–3. Journal of Linguistics 3. 199–244. https://doi.org/10.1017/S0022226700001882.
Hedberg, Nancy & Juan Sosa. 2008. The prosody of topic and focus in spontaneous English dialogue. In Chungmin Lee, Matthew Gordon & Daniel Büring (eds.), Topic and focus (Studies in Linguistics and Philosophy 82). Dordrecht, Netherlands: Springer. https://doi.org/10.1007/978-1-4020-4796-1_6.
Hirschberg, Julia. 2008. Pragmatics and intonation. In Laurence R. Horn & Gregory Ward (eds.), The handbook of pragmatics, 515–537. Hoboken, New Jersey, USA: John Wiley & Sons, Ltd. https://doi.org/10.1002/9780470756959.ch23.
Hirst, Daniel & Albert Di Cristo (eds.). 1998. Intonation systems: A survey of twenty languages. Cambridge, United Kingdom: Cambridge University Press.
Izzad, Ramli, Seman Noraini, Ardi Norizah & Jamil Nursuriati. 2016. Rule-based storytelling text-to-speech (TTS) synthesis. In 3rd international conference on mechanics and mechatronics research (ICMMR) (MATEC Web of Conferences 77). Chongqing, China: EDP Sciences. https://doi.org/10.1051/matecconf/20167704003.
Kalbertodt, Janina, Beatrice Primus & Petra B. Schumacher. 2015. Punctuation, prosody, and discourse: Afterthought vs. right dislocation. Frontiers in Psychology 6. 1–12. https://doi.org/10.3389/fpsyg.2015.01803.
Krifka, Manfred. 2008. Basic notions of information structure. Acta Linguistica Hungarica 55. 243–276. https://doi.org/10.1556/ALing.55.2008.3-4.2.
Kruijff-Korbayová, Ivana, Stina Ericsson, Kepa J. Rodríguez & Elena Karagjosova. 2003. Producing contextually appropriate intonation in an information-state based dialogue system. In Proceedings of the 10th conference of the European chapter of the association for computational linguistics (EACL). Budapest, Hungary: Association for Computational Linguistics. https://doi.org/10.3115/1067807.1067838.
Kügler, Frank, Bernadett Smolibocki & Manfred Stede. 2012. Evaluation of information structure in speech synthesis: The case of product recommender systems perception. In ITG symposium on speech communication. Braunschweig, Germany: IEEE.
Ladd, D. Robert. 2008. Intonational phonology. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511808814.
Lambrecht, Knud. 1994. Information structure and sentence form: Topic, focus and the mental representations of discourse referents. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511620607.
Levelt, Willem. 1993. Speaking: From intention to articulation. Cambridge, MA: MIT Press. https://doi.org/10.7551/mitpress/6393.001.0001.
Levitan, Rivka, Stefan Beňuš, Ramiro H. Gálvez, Agustin Gravano, Florencia Savoretti, Marian Trnka, Andreas Weise & Julia Hirschberg. 2016. Implementing acoustic-prosodic entrainment in a conversational avatar. In Proceedings of the annual conference of the international speech communication association (Interspeech). San Francisco, USA. https://doi.org/10.21437/Interspeech.2016-985.
López-Mencía, Beatriz, David Díaz-Pardo, Alvaro Hernández-Trapote & Luis A. Hernández-Gómez. 2013. Embodied conversational agents in interactive applications for children with special educational needs. In David Griol Barres, Zoraida Callejas Carrión & Ramon L.-C. Delgado (eds.), Technologies for inclusive education: Beyond traditional integration approaches, 59–88. Hershey, USA: IGI Global. https://doi.org/10.4018/978-1-4666-2530-3.ch004.
Mathesius, Vilem. 1929. Zur Satzperspektive im modernen Englisch. Archiv für das Studium der neueren Sprachen und Literaturen, 202–210. Berlin, Germany: Erich Schmidt Verlag. https://en.google-info.cn/21249545/1/archiv-fur-das-studium-der-neueren-sprachen-und-literaturen.html.
Mel’čuk, Igor A. 2001. Communicative organization in natural language: The semantic-communicative structure of sentences. Amsterdam, Philadelphia: Benjamins. https://doi.org/10.1075/slcs.57.
Meurers, Detmar, Ramon Ziai, Niels Ott & Janina Kopp. 2011. Evaluating answers to reading comprehension questions in context: Results for German and the role of information structure. In Proceedings of the TextInfer 2011 workshop on textual entailment (TIWTE ’11). Stroudsburg, PA, USA: Association for Computational Linguistics.
Ortiz, Amalia, Maria del Puy Carretero, David Oyarzun, Jose J. Yanguas, Cristina Buiza, M. Feli González & Igone Etxeberria. 2007. Elderly users in ambient intelligence: Does an avatar improve the interaction? In Constantine Stephanidis & Michael Pieper (eds.), Universal access in ambient intelligence environments: 9th ERCIM workshop on user interfaces for all, 99–114. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-71025-7_8.
Pérez-Marín, Diana & Ismael Pascual-Nieto. 2013. An exploratory study on how children interact with pedagogic conversational agents. Behaviour & Information Technology 32. 955–964. https://doi.org/10.1080/0144929X.2012.687774.
Riester, Arndt, Lisa Brunetti & Kordula De Kuthy. 2018. Annotation guidelines for questions under discussion and information structure. In Evangelia Adamou, Katharina Haude & Martine Vanhove (eds.), Information structure in lesser-described languages: Studies in prosody and syntax, 403–443. Amsterdam: John Benjamins. https://doi.org/10.1075/slcs.199.14rie.
Rooth, Mats. 1992. A theory of focus interpretation. Natural Language Semantics 1. 75–116. https://doi.org/10.1007/BF02342617.
Schröder, Marc & Jürgen Trouvain. 2003. The German text-to-speech synthesis system MARY: A tool for research, development and teaching. International Journal of Speech Technology 6. 365–377. https://doi.org/10.1023/A:1025708916924.
Schwarzschild, Roger. 1999. GIVENness, AvoidF and other constraints on the placement of accent. Natural Language Semantics 7. 141–177. https://doi.org/10.1023/A:1008370902407.
Selkirk, Elisabeth O. 1984. Phonology and syntax: The relation between sound and structure. Cambridge, Massachusetts: The MIT Press.
Sgall, Petr, Eva Hajičová & Eva Benešová. 1973. Topic, focus and generative semantics. Kronberg im Taunus, Germany: Scriptor.
Silverman, Kim, Mary Beckman, John Pitrelli, Mari Ostendorf, Colin Wightman, Patti Price, Janet Pierrehumbert & Julia Hirschberg. 1992. ToBI: A standard for labeling English prosody. In Proceedings of the 2nd international conference on spoken language processing (ICSLP 92). Banff, Canada: International Speech Communication Association (ISCA). https://doi.org/10.21437/ICSLP.1992-260.
Steedman, Mark. 2000. Information structure and the syntax-phonology interface. Linguistic Inquiry 31. 649–689. https://doi.org/10.1162/002438900554505.
Syrdal, Ann K. & Yeon-Jun Kim. 2008. Dialog speech acts and prosody: Considerations for TTS. In Proceedings of the 4th international conference on speech prosody. Campinas, Brazil: International Speech Communication Association (ISCA).
Vallduví, Enric. 2016. Information structure. In Maria Aloni & Paul Dekker (eds.), The Cambridge handbook of formal semantics (Cambridge Handbooks in Language and Linguistics), 728–755. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139236157.024.
Vanrell, Maria, Ignasi Mascaró, Francesc Torres-Tamarit & Pilar Prieto. 2013. Intonation as an encoder of speaker certainty: Information and confirmation yes-no questions in Catalan. Language and Speech 56. 163–190. https://doi.org/10.1177/0023830912443942.
Von Stechow, Arnim. 1981. Topic, focus and local relevance. In Wolfgang Klein & Willem Levelt (eds.), Crossing the boundaries in linguistics: Studies presented to Manfred Bierwisch, 95–130. Dordrecht, Netherlands: Springer. https://doi.org/10.1007/978-94-009-8453-0_5.
Wanner, Leo, Elisabeth André, Josep Blat, Stamatia Dasiopoulou, Mireia Farrús, Thiago Fraga, Eleni Kamateri, Florian Lingenfelser, Gerard Llorach, Oriol Martínez, Georgios Meditskos, Simon Mille, Wolfgang Minker, Louisa Pragst, Dominik Schiller, Andries Stam, Ludo Stellingwerff, Federico Sukno, Bianca Vieru & Stefanos Vrochidis. 2017. KRISTINA: A knowledge-based virtual conversation agent. In Proceedings of the 15th international conference on practical applications of agents and multi-agent systems (PAAMS). Oporto, Portugal: Springer. https://doi.org/10.1007/978-3-319-59930-4_23.
Wargnier, Pierre, Giovanni Carletti, Yann Laurent-Corniquet, Samuel Benveniste, Pierre Jouvelot & Anne-Sophie Rigaud. 2016. Field evaluation with cognitively-impaired older adults of attention management in the embodied conversational agent LOUISE. In Proceedings of the 4th international conference on serious games and applications for health (SeGAH). Orlando, FL, USA: IEEE. https://doi.org/10.1109/SeGAH.2016.7586282.
Wolff, Susann & Andre Brechmann. 2015. Carrot and stick 2.0: The benefits of natural and motivational prosody in computer-assisted learning. Computers in Human Behavior 43. 76–84. https://doi.org/10.1016/j.chb.2014.10.015.
© 2021 Walter de Gruyter GmbH, Berlin/Boston