Abstract
Alongside the huge volume of research on deep learning models in NLP in recent years, there has been much work on the benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with more than 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the formats and domains of the current resources, highlighting the lacunae that future work could address. We further discuss the current classifications of “skills” that question answering/reading comprehension systems are supposed to acquire and propose a new taxonomy. The supplementary materials survey the current multilingual resources and the monolingual resources for languages other than English, and we discuss the implications of over-focusing on English. The study is aimed both at practitioners looking for pointers to the wealth of existing data and at researchers working on new resources.
- [1] . 2019. X-WikiRE: A large, multilingual resource for relation extraction as machine comprehension. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo’19). 265–274. Google ScholarCross Ref
- [2] . 2019. ComQA: A community-sourced dataset for complex factoid question answering with paraphrase clusters. In Proceedings of the 17th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’19). 307–317. https://aclweb.org/anthology/papers/N/N19/N19-1027/.Google Scholar
- [3] . 2019. VQD: Visual query detection in natural scenes. In Proceedings of the 17th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’19). 1955–1961. Google ScholarCross Ref
- [4] . 2019. TallyQA: Answering complex counting questions. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI’19). 8076–8084. Google ScholarDigital Library
- [5] . 2009. The Hitchhiker’s Guide to the Galaxay. Ballantine Books, New York, NY.
PR6051.D3352 H5 2009 Google Scholar - [6] . 2022. TopiOCQA: Open-domain conversational question answering with topic switching. Transactions of the Association for Computational Linguistics 10 (
April 2022), 468–483. Google ScholarCross Ref - [7] . 2021. CrossVQA: Scalably generating benchmarks for systematically testing VQA generalization. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21). 2148–2166. Google ScholarCross Ref
- [8] . 1998. Dialogue Acts in Verbmobil 2.
Technical Report . Verbmobil.Google Scholar - [9] . 2021. Open-domain question answering goes conversational via question rewriting. In Proceedings of the 19th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’21). 520–534. https://aclanthology.org/2021.naacl-main.44.Google ScholarCross Ref
- [10] . 2015. VQA: Visual question answering. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV’15). 2425–2433. http://ieeexplore.ieee.org/document/7410636/.Google ScholarDigital Library
- [11] . 2020. On the cross-lingual transferability of monolingual representations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL’20). 4623–4637. Google ScholarCross Ref
- [12] . 2021. Challenges in information-seeking QA: Unanswerable questions and paragraph retrieval. In Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP’21). 1492–1504. Google ScholarCross Ref
- [13] . 2018. Multilingual extractive reading comprehension by runtime machine translation. arXiv:1809.03275 [CS] (2018). http://arxiv.org/abs/1809.03275.Google Scholar
- [14] . 2020. XOR QA: Cross-lingual open-retrieval question answering. arXiv:2010.11856 [CS] (2020). http://arxiv.org/abs/2010.11856.Google Scholar
- [15] . 2017. Frames: A corpus for adding memory to goal-oriented dialogue systems. arXiv:1704.00057 [CS] (2017). http://arxiv.org/abs/1704.00057.Google Scholar
- [16] . 2020. Generating fact checking explanations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL’20). 7352–7364. Google ScholarCross Ref
- [17] . 2019. MultiFC: A real-world multi-domain dataset for evidence-based fact checking of claims. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing: System Demonstrations (EMNLP-IJCNLP’19). 4685–4697. Google ScholarCross Ref
- [18] . 2018. Multiple-choice item format. In The TESOL Encyclopedia of English Language Teaching. Wiley, 1–8. Google ScholarCross Ref
- [19] . 2016. MS MARCO: A human generated machine reading comprehension dataset. arXiv:1611.09268 [CS] (2016). http://arxiv.org/abs/1611.09268.Google Scholar
- [20] . 2017. Embracing data abundance: BookTest dataset for reading comprehension. In Proceedings of the 5th International Conference on Learning Representations (ICLR’17). https://openreview.net/pdf?id=H1U4mhVFe.Google Scholar
- [21] . 2016. The first cross-script code-mixed question answering corpus. In Proceedings of the Workshop on Modeling, Learning, and Mining for Cross/Multilinguality (MultiLingMine’16) Co-located with the 2016 European Conference on Information Retrieval (ECIR’16). 1–10. 56–65. http://ceur-ws.org/Vol-1589/MultiLingMine6.pdf.Google Scholar
- [22] . 2019. Analogy and analogical reasoning. In The Stanford Encyclopedia of Philosophy (Spring 2019 ed.), (Ed.). Metaphysics Research Lab, Stanford University, Stanford, CA. https://plato.stanford.edu/archives/spr2019/entries/reasoning-analogy/.Google Scholar
- [23] . 2019. The #BenderRule: On naming the languages we study and why it matters. The Gradient. Retrieved September 16, 2022 from https://thegradient.pub/the-benderrule-on-naming-the-languages-we-study-and-why-it-matters/.Google Scholar
- [24] . 2018. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics 6 (2018), 587–604. Google ScholarCross Ref
- [25] . 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’13). 1533–1544. https://www.aclweb.org/anthology/D13-1160.Google Scholar
- [26] . 2014. Modeling biological processes for reading comprehension. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 1499–1510.Google ScholarCross Ref
- [27] . 2020. STARC: Structured annotations for reading comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL’20). 5726–5735. https://www.aclweb.org/anthology/2020.acl-main.507.Google Scholar
- [28] . 2019. Abductive commonsense reasoning. In Proceedings of the 7th International Conference on Learning Representations (ICLR’19). https://openreview.net/forum?id=Byg1v1HKDB.Google Scholar
- [29] . 2020. Experience grounds language. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20). 8718–8735. Google ScholarCross Ref
- [30] . 2020. PIQA: Reasoning about physical commonsense in natural language. Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI’20). 7432–7439. Google ScholarCross Ref
- [31] . 2020. SubjQA: A dataset for subjectivity and review comprehension. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20). 5480–5494. Google ScholarCross Ref
- [32] . 2008. Inference. Retrieved September 16, 2022 from https://www.oxfordreference.com/view/10.1093/acref/9780199541430.001.0001/acref-9780199541430.Google Scholar
- [33] . 2008. Reasoning. Retrieved September 16, 2022 from https://www.oxfordreference.com/view/10.1093/acref/9780199541430.001.0001/acref-9780199541430.Google Scholar
- [34] . 2020. Multiple Choice Questions: An Introductory Guide. Retrieved September 16, 2022 from https://melbourne-cshe.unimelb.edu.au/__data/assets/pdf_file/0010/3430648/multiple-choice-questions_final.pdf.Google Scholar
- [35] . 2015. Large-scale simple question answering with memory networks. arXiv:1506.02075 [CS] (2015). http://arxiv.org/abs/1506.02075.Google Scholar
- [36] . 2019. What question answering can learn from trivia nerds. arXiv:1910.14464 [CS] (2019). http://arxiv.org/abs/1910.14464.Google Scholar
- [37] . 2018. Human-computer question answering: The case for quizbowl. In Proceedings of the NIPS’17 Competition: Building Intelligent Systems, Sergio Escalera and Markus Weimer (Eds.). Springer Series on Challenges in Machine Learning. Springer International, Cham, Switzerland, 169–180. Google ScholarCross Ref
- [38] . 2020. Language models are few-shot learners. arXiv:2005.14165 [CS] (2020). http://arxiv.org/abs/2005.14165.Google Scholar
- [39] . 2018. MultiWOZ—A large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’18). 5016–5026. Google ScholarCross Ref
- [40] . 2020. A review of public datasets in question answering research. ACM SIGIR Forum 54, 2 (2020), 23. http://www.sigir.org/wp-content/uploads/2020/12/p07.pdf.Google ScholarDigital Library
- [41] . 2020. DoQA-accessing domain-specific FAQs via conversational QA. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL’20). 7302–7314. https://aclanthology.org/2020.acl-main.652/.Google ScholarCross Ref
- [42] . 2022. KQA Pro: A dataset with explicit compositional programs for complex question answering over knowledge base. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL’22). 6101–6119. Google ScholarCross Ref
- [43] . 2019. Automatic Spanish translation of the SQuAD dataset for multilingual question answering. arXiv:1912.05200 [CS] (2019). http://arxiv.org/abs/1912.05200.Google Scholar
- [44] . 2020. The TechQA dataset. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL’20). 1269–1278. https://www.aclweb.org/anthology/2020.acl-main.117.Google ScholarCross Ref
- [45] . 2020. Evaluation of text generation: A survey. arXiv:2006.14799 [CS] (2020). http://arxiv.org/abs/2006.14799.Google Scholar
- [46] . 2018. Code-mixed question answering challenge: Crowd-sourcing data and techniques. In Proceedings of the 3rd Workshop on Computational Approaches to Linguistic Code-Switching. 29–38. Google ScholarCross Ref
- [47] . 1999. Life of Brian.Google Scholar
- [48] . 2017. Counting everyday objects in everyday scenes. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17). 1135–1144. https://openaccess.thecvf.com/content_cvpr_2017/html/Chattopadhyay_Counting_Everyday_Objects_CVPR_2017_paper.html.Google ScholarCross Ref
- [49] . 2021. Evaluating entity disambiguation and the role of popularity in retrieval-based NLP. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL’20). 4472–4485. Google ScholarCross Ref
- [50] . 2019. Evaluating question answering evaluation. In Proceedings of the 2nd Workshop on Machine Reading for Question Answering. 119–124. Google ScholarCross Ref
- [51] . 2020. MOCHA: A dataset for training and evaluating generative reading comprehension metrics. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20). 6521–6532. https://www.aclweb.org/anthology/2020.emnlp-main.528.Google ScholarCross Ref
- [52] . 2017. Reading Wikipedia to answer open-domain questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL’17). 1870–1879. Google ScholarCross Ref
- [53] . 2020. Open-domain question answering. In Proceedings of ACL: Tutorial Abstracts. 34–37. Google ScholarCross Ref
- [54] . 2020. HybridQA: A dataset of multi-hop question answering over tabular and textual data. In Findings of EMNLP’20. 1026–1036. Google ScholarCross Ref
- [55] . 2021. WebSRC: A dataset for web-based structural reading comprehension. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21). 4173–4185. Google ScholarCross Ref
- [56] . 2021. FinQA: A dataset of numerical reasoning over financial data. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21). 3697–3711. Google ScholarCross Ref
- [57] . 2000. Logical models of argument. ACM Computing Surveys 32, 4 (2000), 337–383. .Google ScholarDigital Library
- [58] . 2018. Adversarial TableQA: Attention supervision for question answering on tables. In Proceedings of Machine Learning Research. 391–406. http://proceedings.mlr.press/v95/cho18a/cho18a.pdf.Google Scholar
- [59] . 2018. QuAC: Question answering in context. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’18). 2174–2184. http://aclweb.org/anthology/D18-1241.Google ScholarCross Ref
- [60] . 2021. Decontextualization: Making sentences stand-alone. Transactions of the Association for Computational Linguistics 9 (April 2021), 447–461. Google ScholarCross Ref
- [61] . 2022. Machine reading, fast and slow: When do models “Understand” language? In Proceedings of the 29th International Conference on Computational Linguistics. International Committee on Computational Linguistics, 78–93. https://aclanthology.org/2022.coling-1.8.Google Scholar
- [62] . 2019. Look before you hop: Conversational question answering over knowledge graphs using judicious context expansion. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. ACM, New York, NY, 729–738. Google ScholarDigital Library
- [63] . 2021. Perhaps PTLMs should go to school—A task to assess open book and closed book QA. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21). 6104–6111. Google ScholarCross Ref
- [64] . 2018. Simple and effective multi-paragraph reading comprehension. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL’18). 845–855. Google ScholarCross Ref
- [65] . 2019. BoolQ: Exploring the surprising difficulty of natural yes/no questions. In Proceedings of the 17th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’19). 2924–2936. https://aclweb.org/anthology/papers/N/N19/N19-1300/.Google Scholar
- [66] . 2020. TyDi QA: A benchmark for information-seeking question answering in typologically diverse languages. Transactions of the Association for Computational Linguistics 8 (July 2020), 454–470. Google ScholarCross Ref
- [67] . 2018. Think you have solved question answering? Try ARC, the AI2 reasoning challenge. arXiv:1803.05457 [CS] (2018). http://arxiv.org/abs/1803.05457.Google Scholar
- [68] . 2020. TutorialVQA: Question answering dataset for tutorial videos. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’20). 5450–5455. https://www.aclweb.org/anthology/2020.lrec-1.670.Google Scholar
- [69] . 2020. Event-QA: A dataset for event-centric question answering over knowledge graphs. arXiv:2004.11861 [CS] (2020). http://arxiv.org/abs/2004.11861.Google Scholar
- [70] . 2019. Enabling deep learning for large scale question answering in Italian. Intelligenza Artificiale 13, 1 (Jan. 2019), 49–61. Google ScholarCross Ref
- [71] . 2019. A span-extraction dataset for Chinese machine reading comprehension. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing: System Demonstrations (EMNLP-IJCNLP’19). 5883–5889. Google ScholarCross Ref
- [72] . 2018. Dataset for the first evaluation on chinese machine reading comprehension. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’18). https://www.aclweb.org/anthology/L18-1431.Google Scholar
- [73] . 2016. Consensus attention-based neural networks for chinese reading comprehension. In Proceedings of the International Conference on Computational Linguistics (COLING’16). 1777–1786. https://www.aclweb.org/anthology/C16-1167.Google Scholar
- [74] . 2020. A sentence Cloze dataset for Chinese machine reading comprehension. In Proceedings of the International Conference on Computational Linguistics (COLING’20). 6717–6723. Google ScholarCross Ref
- [75] . 2019. Quoref: A reading comprehension dataset with questions requiring coreferential reasoning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing: System Demonstrations (EMNLP-IJCNLP’19). 5924–5931. Google ScholarCross Ref
- [76] . 2021. A dataset of information-seeking questions and answers anchored in research papers. In Proceedings of the 19th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’21). 4599–4610. Google ScholarCross Ref
- [77] . 2019. Logic and probability. In The Stanford Encyclopedia of Philosophy (Summer 2019 ed.), (Ed.). Metaphysics Research Lab, Stanford University, Stanford, CA. https://plato.stanford.edu/archives/sum2019/entries/logic-probability/.Google Scholar
- [78] . 2020. FQuAD: French question answering dataset. arXiv:2002.06071 [CS] (2020). http://arxiv.org/abs/2002.06071.Google Scholar
- [79] . 2018. Wizard of Wikipedia: Knowledge-powered conversational agents. arXiv:1811.01241 [CS] (2018). http://arxiv.org/abs/1811.01241.Google Scholar
- [80] . 2015. Learning hybrid representations to retrieve semantically equivalent questions. In Proceedings of the Joint Conference of the 53rd Annual Meeting of the Association for Computational Linguistics and the 5th International Joint Conference on Natural Language Processing (ACL-IJCNLP’21). 694–699. Google ScholarCross Ref
- [81] . 2017. Abduction. In The Stanford Encyclopedia of Philosophy (Summer 2017 ed.), (Ed.). Metaphysics Research Lab, Stanford University, Stanford, CA. https://plato.stanford.edu/archives/sum2017/entries/abduction/.Google Scholar
- [82] . 2019. ORB: An open reading benchmark for comprehensive evaluation of machine reading comprehension. In Proceedings of the 2nd Workshop on Machine Reading for Question Answering. 147–153. Google ScholarCross Ref
- [83] . 2019. DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs. In Proceedings of the 17th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’19). 2368–2378. https://aclweb.org/anthology/papers/N/N19/N19-1246/.Google Scholar
- [84] . 2018. Overview of the NLPCC 2017 shared task: Open domain Chinese question answering. In Natural Language Processing and Chinese Computing, , , , , and (Eds.). Springer, Cham, Switzerland, 954–961. Google ScholarCross Ref
- [85] . 2020. To test machine comprehension, start by defining comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL’20). 7839–7859. https://www.aclweb.org/anthology/2020.acl-main.701.Google ScholarCross Ref
- [86] . 2017. SearchQA: A new Q&A dataset augmented with context from a search engine. arXiv:1704.05179 [CS] (2017). http://arxiv.org/abs/1704.05179.Google Scholar
- [87] . 2021. English machine reading comprehension datasets: A survey. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21). 8784–8804. Google ScholarCross Ref
- [88] . 2020. SberQuAD—Russian reading comprehension dataset: Description and analysis. arXiv:1912.09723 [CS] (2020). Google ScholarDigital Library
- [89] . 2019. Can you unpack that? Learning to rewrite questions-in-context. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing: System Demonstrations (EMNLP-IJCNLP’19). 5918–5924. Google ScholarCross Ref
- [90] . 2017. Key-value retrieval networks for task-oriented dialogue. arXiv:1705.05414 [CS] (2017). http://arxiv.org/abs/1705.05414.Google Scholar
- [91] . 2020. What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models. Transactions of the Association for Computational Linguistics 8 (2020), 34–48. arXiv:1907.13528Google ScholarCross Ref
- [92] . 2019. ELI5: Long form question answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL’19). 3558–3567. Google ScholarCross Ref
- [93] . 2018. Identifying well-formed natural language questions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing(EMNLP’18). 798–803. Google ScholarCross Ref
- [94] . 2020. Temporal reasoning via audio question answering. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020), 2283–2294. Google ScholarDigital Library
- [95] . 2020. Read and reason with MuSeRC and RuCoS: Datasets for machine reading comprehension for Russian. In Proceedings of the International Conference on Computational Linguistics (COLING’20). 6481–6497. Google ScholarCross Ref
- [96] . 2020. IIRC: A dataset of incomplete information reading comprehension questions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20). 1137–1147. Google ScholarCross Ref
- [97] . 2020. A dataset and baselines for visual question answering on art. arXiv:2008.12520 [CS] (2020). http://arxiv.org/abs/2008.12520.Google Scholar
- [98] . 2020. Evaluating models’ local decision boundaries via contrast sets. In Findings of EMNLP’20. 1307–1323. Google ScholarCross Ref
- [99] . 2019. Question answering is a format; when is it useful? arXiv:1909.11291 (2019).Google ScholarCross Ref
- [100] . 2021. Competency problems: On finding and removing artifacts in language data. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21). 1801–1813. Google ScholarCross Ref
- [101] . 2020. TANDA: Transfer and adapt pre-trained transformer models for answer sentence selection. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI’20). 7780–7788. Google ScholarCross Ref
- [102] . 2020. Datasheets for datasets. arXiv:1803.09010 [CS] (2020). http://arxiv.org/abs/1803.09010.Google Scholar
- [103] . 2019. Posing fair generalization tasks for natural language inference. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing: System Demonstrations (EMNLP-IJCNLP’19). 4475–4485. Google ScholarCross Ref
- [104] . 2019. Are we modeling the task or the annotator? An investigation of annotator bias in natural language understanding datasets. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing: System Demonstrations (EMNLP-IJCNLP’19). 1161–1166. Google ScholarCross Ref
- [105] . 2020. DaNetQA: A yes/no question answering dataset for the russian language. arXiv:2010.02605 [CS] (2020). http://arxiv.org/abs/2010.02605.Google Scholar
- [106] . 2019. Assessing BERT’s syntactic abilities. arXiv:1901.05287 [CS] (2019). http://arxiv.org/abs/1901.05287.Google Scholar
- [107] . 2021. On the interaction of belief bias and explanations. In Findings of ACL-IJCNLP’21. 2930–2942. https://aclanthology.org/2021.findings-acl.259.Google Scholar
- [108] . 2012. SemEval-2012 Task 7: Choice of plausible alternatives: An evaluation of commonsense causal reasoning. In Proceedings of the 1st Joint Conference on Lexical and Computational Semantics (*SEM’12). 394–398. https://aclweb.org/anthology/papers/S/S12/S12-1052/.Google Scholar
- [109] . 2018. IQA: Visual question answering in interactive environments. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18). 4089–4098.Google ScholarCross Ref
- [110] . 2021. Beyond I.I.D.: Three levels of generalization for question answering on knowledge bases. arXiv:2011.07743 [CS] (2021). Google ScholarDigital Library
- [111] . 2020. MultiReQA: A cross-domain evaluation for retrieval question answering models. arXiv:2005.02507 [CS] (2020). http://arxiv.org/abs/2005.02507.Google Scholar
- [112] . 2017. IJCNLP-2017 Task 5: Multi-choice question answering in examinations. In Proceedings of the IJCNLP’17, Shared Tasks. 34–40. https://www.aclweb.org/anthology/I17-4005.Google Scholar
- [113] . 2021. Disfl-QA: A benchmark dataset for understanding disfluencies in question answering. In Findings of ACL 2021. https://arxiv.org/abs/2106.04016.Google Scholar
- [114] . 2018. MMQA: A multi-domain multi-lingual question-answering framework for English and Hindi. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’18). https://www.aclweb.org/anthology/L18-1440.Google Scholar
- [115] . 2019. AmazonQA: A review-based question answering task. In Proceedings of the 2019 International Joint Conference on Artificial Intelligence (IJCAI’19). 4996–5002. Google ScholarCross Ref
- [116] . 2018. Transliteration better than translation? Answering code-mixed questions over a knowledge base. In Proceedings of the 3rd Workshop on Computational Approaches to Linguistic Code-Switching. 39–50. Google ScholarCross Ref
- [117] . 2018. Annotation artifacts in natural language inference data. In Proceedings of the 16th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’18). 107–112. Google ScholarCross Ref
- [118] . 2021. ESTER: A machine reading comprehension dataset for reasoning about event semantic relations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21). 7543–7559. Google ScholarCross Ref
- [119] . 2019. ANTIQUE: A non-factoid question answering benchmark. arXiv:1905.08957 [CS] (2019). http://arxiv.org/abs/1905.08957.Google Scholar
- [120] . 2017. Toward automated fact-checking: Detecting check-worthy factual claims by ClaimBuster. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’17). ACM, New York, NY, 1803–1812. http://dblp.uni-trier.de/db/conf/kdd/kdd2017.html#HassanALT17.Google ScholarDigital Library
- [121] . 2021. Inductive logic. In The Stanford Encyclopedia of Philosophy (Spring 2022 ed.), (Ed.). Metaphysics Research Lab, Stanford University, Stanford, CA. https://plato.stanford.edu/archives/spr2021/entries/logic-inductive/.Google Scholar
- [122] . 2018. DuReader: A Chinese machine reading comprehension dataset from real-world applications. In Proceedings of the Workshop on Machine Reading for Question Answering. 37–46. Google ScholarCross Ref
- [123] . 2004. Meanings and configurations of questions in English. In Proceedings of the International Conference on Speech Prosody. 309–312. https://www.isca-speech.org/archive/sp2004/papers/sp04_309.pdf.Google Scholar
- [124] . 1990. The ATIS spoken language systems pilot corpus. In Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24–27, 1990. https://www.aclweb.org/anthology/H90-1021.Google ScholarDigital Library
- [125] . 2015. Teaching machines to read and comprehend. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS’15). 1693–1701. http://dl.acm.org/citation.cfm?id=2969239.2969428.Google Scholar
- [126] . 2016. WikiReading: A novel large-scale language understanding task over Wikipedia. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL’16). 1535–1545. Google ScholarCross Ref
- [127] . 2015. The goldilocks principle: Reading children’s books with explicit memory representations. arXiv:1511.02301 [CS] (2015). http://arxiv.org/abs/1511.02301.Google Scholar
- [128] . 2020. Linguistic appropriateness and pedagogic usefulness of reading comprehension questions. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’20). 1753–1762. https://www.aclweb.org/anthology/2020.lrec-1.217.Google Scholar
- [129] . 2019. Cosmos QA: Machine reading comprehension with contextual commonsense reasoning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing: System Demonstrations (EMNLP-IJCNLP’19). 2391–2401. Google ScholarCross Ref
- [130] . 2019. GQA: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’19). 6700–6709. https://openaccess.thecvf.com/content_CVPR_2019/html/Hudson_GQA_A_New_Dataset_for_Real-World_Visual_Reasoning_and_Compositional_CVPR_2019_paper.html.Google ScholarCross Ref
- [131] . 2011. SRI’s Amex Travel Agent Data. Retrieved September 16, 2022 from http://www.ai.sri.com/~communic/amex/amex.html.Google Scholar
- [132] . 2017. TGIF-QA: Toward spatio-temporal reasoning in visual question answering. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17). 2758–2766. https://openaccess.thecvf.com/content_cvpr_2017/html/Jang_TGIF-QA_Toward_Spatio-Temporal_CVPR_2017_paper.html.Google ScholarCross Ref
- [133] . 2019. CAsT 2019: The conversational assistance track overview. In Proceedings of the Text REtrival Conference (TREC’19).Google Scholar
- [134] . 2017. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’17). 2021–2031. Google ScholarCross Ref
- [135] . 2018. TempQuestions: A benchmark for temporal question answering. In Companion of WWW’18. ACM, New York, NY, 1057–1062. Google ScholarDigital Library
- [136] . 2018. TEQUILA: Temporal question answering over knowledge bases. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM’18). ACM, New York, NY, 1807–1810. Google ScholarDigital Library
- [137] . 2019. FreebaseQA: A new factoid QA data set matching trivia-style question-answer pairs with Freebase. In Proceedings of the 17th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’19). 318–323. https://aclweb.org/anthology/papers/N/N19/N19-1028/.Google Scholar
- [138] . 2022. CARETS: A consistency and robustness evaluative test suite for VQA. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL’22). 6392–6405. Google ScholarCross Ref
- [139] . 2019. PubMedQA: A dataset for biomedical research question answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing: System Demonstrations (EMNLP-IJCNLP’19). 2567–2577. Google ScholarCross Ref
- [140] . 2017. Clevr: A diagnostic dataset for compositional language and elementary visual reasoning. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17). 2901–2910.Google ScholarCross Ref
- [141] . 2017. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL’17). 1601–1611. Google ScholarCross Ref
- [142] . 2022. AIT-QA: Question answering dataset over complex tables in the airline industry. In Proceedings of the 20th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’22). 305–314. Google ScholarCross Ref
- [143] . 2019. Learning the difference that makes a difference with counterfactually-augmented data. In Proceedings of the International Conference on Learning Representations(ICLR’19). https://openreview.net/forum?id=Sklgs0NFvr.Google Scholar
- [144] . 2020. Project PIAF: Building a native French question-answering dataset. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’20). 5481–5490. https://www.aclweb.org/anthology/2020.lrec-1.673.Google Scholar
- [145] . 2018. Looking beyond the surface: A challenge set for reading comprehension over multiple sentences. In Proceedings of the 16th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’18). 252–262. Google ScholarCross Ref
- [146] . 2020. UnifiedQA: Crossing format boundaries with a single QA system. arXiv:2005.00700 [CS] (2020). https://arxiv.org/abs/2005.00700.Google Scholar
- [147] . 2017. DeepStory: Video story QA by deep embedded memory networks. In Proceedings of the 2017 International Joint Conference on Artificial Intelligence (IJCAI’17). https://openreview.net/forum?id=ryZczSz_bS.Google ScholarCross Ref
- [148] . 2016. Dialog State Tracking Challenge 5 Handbook v.3.1. Retrieved September 16, 2022 from http://workshop.colips.org/dstc5/.Google Scholar
- [149] . 2018. The NarrativeQA reading comprehension challenge. Transactions of the Association for Computational Linguistics 6 (2018), 317–328. http://aclweb.org/anthology/Q18-1023.Google ScholarCross Ref
- [150] . 2020. SCDE: Sentence Cloze dataset with high quality distractors from examinations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL’20). 5668–5683. Google ScholarCross Ref
- [151] . 2017. Defeasible reasoning. In The Stanford Encyclopedia of Philosophy (Winter 2017 ed.), (Ed.). Metaphysics Research Lab, Stanford University, Stanford, CA. https://plato.stanford.edu/archives/win2017/entries/reasoning-defeasible/.Google Scholar
- [152] . 2020. RuBQ: A Russian dataset for question answering over wikidata. arXiv:2005.10659 [CS] (2020). http://arxiv.org/abs/2005.10659.Google Scholar
- [153] . 2021. Hurdles to progress in long-form question answering. In Proceedings of the 19th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’21). 4940–4957. Google ScholarCross Ref
- [154] . 2014. Learning to automatically solve algebra word problems. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 271–281. Google ScholarCross Ref
- [155] . 2019. Natural questions: A benchmark for question answering research. Transactions of Association for Computational Linguistics 7 (2019), 452–466. https://ai.google/research/pubs/pub47761.Google ScholarCross Ref
- [156] . 2017. RACE: Large-scale reading comprehension dataset from examinations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’17). 785–794. Google ScholarCross Ref
- [157] . 2018. ODSQA: Open-domain spoken question answering dataset. In Proceedings of the 2018 IEEE Spoken Language Technology Workshop (SLT’18). 949–956. Google ScholarCross Ref
- [158] . 2018. Semi-supervised training data generation for multilingual question answering. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’18). https://www.aclweb.org/anthology/L18-1437.Google Scholar
- [159] . 2018. TVQA: Localized, compositional video question answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’18). 1369–1379. https://www.aclweb.org/anthology/papers/D/D18/D18-1167/.Google ScholarCross Ref
- [160] . 2012. The Winograd Schema Challenge. In Proceedings of the 13th International Conference on Principles of Knowledge Representation and Reasoning. 552–561.Google Scholar
- [161] . 2017. Zero-shot relation extraction via reading comprehension. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL’17). 333–342. Google ScholarCross Ref
- [162] . 2020. MLQA: Evaluating cross-lingual extractive question answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL’20). 7315–7330. https://www.aclweb.org/anthology/2020.acl-main.653/.Google ScholarCross Ref
- [163] . 2018. Spoken SQuAD: A study of mitigating the impact of speech recognition errors on listening comprehension. arXiv:1804.00320 [CS] (2018). http://arxiv.org/abs/1804.00320.Google Scholar
- [164] . 2022. MultiSpanQA: A dataset for multi-span question answering. In Proceedings of the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL’22). 1250–1260. Google ScholarCross Ref
- [165] . 2020. Molweni: A challenge multiparty dialogues-based machine reading comprehension dataset with discourse structure. arXiv:2004.05080 [CS] (2020). http://arxiv.org/abs/2004.05080.Google Scholar
- [166] . 2021. MLEC-QA: A Chinese multi-choice biomedical question answering dataset. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21). 8862–8874. Google ScholarCross Ref
- [167] . 2016. Dataset and neural recurrent sequence labeling model for open-domain factoid question answering. arXiv:1607.06275 [CS] (2016). http://arxiv.org/abs/1607.06275.Google Scholar
- [168] . 2019. Entity-relation extraction as multi-turn question answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL’19). 1340–1350. Google ScholarCross Ref
- [169] . 2022. MMCoQA: Conversational question answering over text, tables, and images. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL’22). 4220–4231. Google ScholarCross Ref
- [170] . 2019. A new multi-choice reading comprehension dataset for curriculum learning. In Proceedings of the 11th Asian Conference on Machine Learning. 742–757. http://proceedings.mlr.press/v101/liang19a.html.Google Scholar
- [171] . 2019. KorQuAD1.0: Korean QA dataset for machine reading comprehension. arXiv:1909.07005 [CS] (2019). http://arxiv.org/abs/1909.07005.Google Scholar
- [172] . 2020. Birds have four legs? NumerSense: Probing numerical commonsense knowledge of pre-trained language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20). 6862–6868. Google ScholarCross Ref
- [173] . 2019. Reasoning over paragraph effects in situations. In Proceedings of the 2nd Workshop on Machine Reading for Question Answering. 58–62. Google ScholarCross Ref
- [174] . 2022. TruthfulQA: Measuring how models mimic human falsehoods. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL’22). 3214–3252. Google ScholarCross Ref
- [175] . 2014. Microsoft COCO: Common objects in context. In Computer Vision—ECCV 2014, , , , and (Eds.). Springer, Cham, Switzerland, 740–755. Google ScholarCross Ref
- [176] . 2020. How can we accelerate progress towards human-like linguistic generalization? arXiv:2005.00955 [CS] (2020). https://arxiv.org/pdf/2005.00955.pdf.Google Scholar
- [177] . 2020. LogiQA: A challenge dataset for machine reading comprehension with logical reasoning. In Proceedings of the 2020 International Joint Conference on Artificial Intelligence (IJCAI’20). 3622–3628. Google ScholarCross Ref
- [178] . 2019. XQA: A cross-lingual open-domain question answering dataset. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL’19). 2358–2368. Google ScholarCross Ref
- [179] . 2019. XCMRC: Evaluating cross-lingual machine reading comprehension. In Natural Language Processing and Chinese Computing, , , , , and (Eds.). Springer, Cham, Switzerland, 552–564. Google ScholarDigital Library
- [180] . 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692 [CS] (2019). http://arxiv.org/abs/1907.11692.Google Scholar
- [181] . 2017. World knowledge for reading comprehension: Rare entity prediction with hierarchical LSTMs using external descriptions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’17). 825–834. Google ScholarCross Ref
- [182] . 2020. MKQA: A linguistically diverse benchmark for multilingual open domain question answering. arXiv:2007.15207 [CS] (2020). http://arxiv.org/abs/2007.15207.Google Scholar
- [183] . 2015. The Ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems. arXiv:1506.08909 [CS] (2015). http://arxiv.org/abs/1506.08909.Google Scholar
- [184] . 2018. Challenging reading comprehension on daily conversation: Passage completion on multiparty dialog. In Proceedings of the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL’22). 2039–2048. Google ScholarCross Ref
- [185] . 2017. Multiple-choice tests can support deep learning! Proceedings of the Atlantic Universities’ Teaching Showcase 21 (2017), 61–66. https://ojs.library.dal.ca/auts/article/view/8430.Google Scholar
- [186] . 2017. A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering. arXiv:1611.07810 [CS] (2017). http://arxiv.org/abs/1611.07810.Google Scholar
- [187] . 2018. Developing certification exam questions: More deliberate than you may think. Professional Safety 63, 5 (May 2018), 44–49. https://onepetro.org/PS/article/63/05/44/33528/Developing-Certification-Exam-Questions-More.Google Scholar
- [188] . 2013. Open dataset for development of Polish question answering systems. In Proceedings of the 6th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics. https://www.researchgate.net/profile/Maciej-Piasecki/publication/272685856_Open_dataset_for_development_of_Polish_Question_Answering_systems.Google Scholar
- [189] . 2022. ChartQA: A benchmark for question answering about charts with visual and logical reasoning. In Findings of ACL’22. 2263–2279. Google ScholarCross Ref
- [190] . 2016. Addressing complex and subjective product-related queries with customer reviews. In Proceedings of the 25th International Conference on World Wide Web (WWW’16). 625–635. Google ScholarDigital Library
- [191] . 2018. The natural language decathlon: Multitask learning as question answering. arXiv:1806.08730 [CS, STAT] (2018). http://arxiv.org/abs/1806.08730.Google Scholar
- [192] . 1969. Some philosophical problems from the standpoint of artificial intelligence. In Machine Intelligence 4, and (Eds.). Edinburgh University Press, 463–502.Google Scholar
- [193] . 2019. BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance. arXiv:1911.02969 [CS] (2019). http://arxiv.org/abs/1911.02969.Google Scholar
- [194] . 2019. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL’19). 3428–3448. Google ScholarCross Ref
- [195] . 2009. Toward a comprehensive model of comprehension. In The Psychology of Learning and Motivation.
Psychology of Learning and Motivation Series , Vol. 51. Academic Press, Cambridge, MA, 297–384. Google ScholarCross Ref - [196] . 2020. A diverse corpus for evaluating and developing English math word problem solvers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL’20). 975–984. https://www.aclweb.org/anthology/2020.acl-main.92.Google ScholarCross Ref
- [197] . 2018. Can a suit of armor conduct electricity? A new dataset for open book question answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’18). 2381–2391. http://aclweb.org/anthology/D18-1260.Google ScholarCross Ref
- [198] . 2020. AmbigQA: Answering ambiguous open-domain questions. arXiv:2004.10645 [CS] (2020). http://arxiv.org/abs/2004.10645.Google Scholar
- [199] . 2021. SPARTQA: A textual question answering benchmark for spatial reasoning. In Proceedings of the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL’21). 4582–4598. Google ScholarCross Ref
- [200] . 2020. Towards question format independent numerical reasoning: A set of prerequisite tasks. arXiv preprint arXiv:2005.08516 (2020). https://arxiv.org/abs/2005.08516.Google Scholar
- [201] . 2019. Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT*’19). ACM, New York, NY, 220–229. Google ScholarDigital Library
- [202] . 2016. InScript: Narrative texts annotated with script information. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’16). 3485–3493. https://www.aclweb.org/anthology/L16-1555.Google Scholar
- [203] . 2020. COVID-QA: A question answering dataset for COVID-19. In Proceedings of the 1st Workshop on NLP for COVID-19 at ACL’20. https://www.aclweb.org/anthology/2020.nlpcovid19-acl.18.Google Scholar
- [204] . 2017. LSDSem 2017 shared task: The story Cloze test. In Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential, and Discourse-Level Semantics. 46–51. http://www.aclweb.org/anthology/W17-0900.Google ScholarCross Ref
- [205] . 2019. Neural Arabic question answering. In Proceedings of the 4th Arabic Natural Language Processing Workshop. 108–118. Google ScholarCross Ref
- [206] . 2017. MarioQA: Answering questions by watching gameplay videos. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV’17). http://arxiv.org/abs/1612.01669.Google ScholarCross Ref
- [207] . 2017. SemEval-2017 Task 3: Community question answering. In Proceedings of the 11th International Workshop on Semantic Evaluations (SemEval’17). 27–48. http://www.aclweb.org/anthology/S17-2003.Google ScholarCross Ref
- [208] . 2015. SemEval-2015 Task 3: Answer selection in community question answering. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15). 269–281.Google Scholar
- [209] . 2016. SemEval-2016 Task 3: Community question answering. 525–545.Google Scholar
- [210] . 2020. A Vietnamese dataset for evaluating machine reading comprehension. In Proceedings of the International Conference on Learning Representations (ICLR’20). 2595–2605. Google ScholarCross Ref
- [211] . 2020. TORQUE: A reading comprehension dataset of temporal ordering questions. arXiv:2005.00242 [CS] (2020). http://arxiv.org/abs/2005.00242.Google Scholar
- [212] . 2020. A method for building a commonsense inference dataset based on basic events. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20). 2450–2460. https://www.aclweb.org/anthology/2020.emnlp-main.192.Google ScholarCross Ref
- [213] . 2016. Who did what: A large-scale person-centered Cloze dataset. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’16). 2230–2235. Google ScholarCross Ref
- [214] . 2018. MCScript: A novel dataset for assessing machine comprehension using script knowledge. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’18). https://www.aclweb.org/anthology/L18-1564.Google Scholar
- [215] . 2018. SemEval-2018 Task 11: Machine comprehension using commonsense knowledge. In Proceedings of the 12th International Workshop on Semantic Evaluation. 747–757. Google ScholarCross Ref
- [216] . 2018. emrQA: A large corpus for question answering on electronic medical records. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’18). 2357–2368. http://aclweb.org/anthology/D18-1258.Google ScholarCross Ref
- [217] . 2022. QuALITY: Question answering with long input texts, yes! In Proceedings of the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL’22). 5336–5358. Google ScholarCross Ref
- [218] . 2016. The LAMBADA dataset: Word prediction requiring a broad discourse context. arXiv:1606.06031 [CS] (2016). http://arxiv.org/abs/1606.06031.Google Scholar
- [219] . 2015. Compositional semantic parsing on semi-structured tables. In Proceedings of the Joint Conference of the 53rd Annual Meeting of the Association for Computational Linguistics and the 5th International Joint Conference on Natural Language Processing (ACL-IJCNLP’21). 1470–1480. Google ScholarCross Ref
- [220] . 2020. Generating natural questions from images for multimodal assistants. arXiv:2012.03678 [CS] (2020). http://arxiv.org/abs/2012.03678.Google Scholar
- [221] . 2014. Overview of CLEF question answering track 2014. In Information Access Evaluation. Multilinguality, Multimodality, and Interaction. Springer, Cham, Switzerland, 300–306. Google ScholarCross Ref
- [222] . 2015. Overview of the CLEF question answering track 2015. In Experimental IR Meets Multilinguality, Multimodality, and Interaction.Lecture Notes in Computer Science, Vol. 9283. Springer, 539–544.Google Scholar
- [223] . 2019. Introducing MANtIS: A novel multi-domain information seeking dialogues dataset. arXiv:1912.04639 [CS] (2019). http://arxiv.org/abs/1912.04639.Google Scholar
- [224] . 2019. Multi-domain goal-oriented dialogues (MultiDoGO): Strategies toward curating and annotating large scale dialogue data. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing: System Demonstrations (EMNLP-IJCNLP’19). 4526–4536. Google ScholarCross Ref
- [225] . 2022. xGQA: Cross-lingual visual question answering. In Findings of ACL’22. 2497–2511. Google ScholarCross Ref
- [226] . 2014. The NIPS Experiment. Retrieved September 16, 2022 from http://blog.mrtz.org/2014/12/15/the-nips-experiment.html.Google Scholar
- [227] . 2019. Learning to deceive with attention-based explanations. arXiv:1909.07913 [CS] (2019). http://arxiv.org/abs/1909.07913.Google Scholar
- [228] . 2021. TIMEDIAL: Temporal commonsense reasoning in dialog. arXiv:2106.04571 [CS.CL] (2021).Google Scholar
- [229] . 2019. A survey on neural machine reading comprehension. arXiv:1906.03824 [CS] (2019). http://arxiv.org/abs/1906.03824.Google Scholar
- [230] . 2018. Analyzing and characterizing user intent in information-seeking conversations. In Proceedings of the 41st ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’18). ACM, New York, NY, 989–992. Google ScholarDigital Library
- [231] . 2019. Coached conversational preference elicitation: A case study in understanding movie preferences. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue. https://research.google/pubs/pub48414/.Google Scholar
- [232] . 2015. “Answer ka type kya he?”: Learning to classify questions in code-mixed language. In Proceedings of the 24th International Conference on World Wide Web (WWW’15 Companion). ACM, New York, NY, 853–858. Google ScholarDigital Library
- [233] . 2020. Explaining and improving model behavior with k nearest neighbor representations. arXiv:2010.09030 [CS] (2020). http://arxiv.org/abs/2010.09030.Google Scholar
- [234] . 2018. Know what you don’t know: Unanswerable questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL’18). 784–789. http://aclweb.org/anthology/P18-2124.Google ScholarCross Ref
- [235] . 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’16). 2383–2392.Google ScholarCross Ref
- [236] . 2020. Neural unsupervised domain adaptation in NLP—A survey. In Proceedings of the International Conference on Computational Linguistics (COLING’20). 6838–6855. Google ScholarCross Ref
- [237] . 2018. Event2Mind: Commonsense inference on events, intents, and reactions. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL’18). 463–473. Google ScholarCross Ref
- [238] . 2019. CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics 7 (March 2019), 249–266. Google ScholarCross Ref
- [239] . 2016. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16). ACM, New York, NY, 1135–1144. Google ScholarDigital Library
- [240] . 2020. Beyond accuracy: Behavioral testing of NLP models with checklist. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL’20). 4902–4912. https://www.aclweb.org/anthology/2020.acl-main.442.Google ScholarCross Ref
- [241] . 2013. MCTest: A challenge dataset for the open-domain machine comprehension of text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’13). 193–203.Google Scholar
- [242] . 2021. Evaluation paradigms in question answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21). 9630–9642. Google ScholarCross Ref
- [243] . 2020. Information seeking in the spirit of learning: A dataset for conversational curiosity. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20). 8153–8172. Google ScholarCross Ref
- [244] . 2021. Quizbowl: The case for incremental question answering. arXiv:1904.04792 [CS] (2021). Google ScholarCross Ref
- [245] . 2011. Choice of plausible alternatives: An evaluation of commonsense causal reasoning. In Proceedings of the AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning. 6.Google Scholar
- [246] . 2019. How the Transformers Broke NLP Leaderboards. Retrieved September 16, 2022 from https://hackingsemantics.xyz/2019/leaderboards/.Google Scholar
- [247] . 2021. Changing the world by changing the data. In Proceedings of the Conference of the Association for Computational Linguistics (ACL’21). 2182–2194. https://aclanthology.org/2021.acl-long.170.Google ScholarCross Ref
- [248] . 2020. What can we do to improve peer review in NLP? In Findings of EMNLP’20. 1256–1262. https://www.aclweb.org/anthology/2020.findings-emnlp.112/.Google Scholar
- [249] . 2020. Getting closer to AI complete question answering: A set of prerequisite real tasks. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI’20). 8722–8731. https://aaai.org/ojs/index.php/AAAI/article/view/6398.Google ScholarCross Ref
- [250] . 2020. LAReQA: Language-agnostic answer retrieval from a multilingual pool. arXiv:2004.05484 [CS] (2020). http://arxiv.org/abs/2004.05484.Google Scholar
- [251] . 2021. Multi-domain multilingual question answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21).Google ScholarCross Ref
- [252] . 2020. Thinking like a skeptic: Defeasible inference in natural language. In Findings of EMNLP’20. 4661–4675. Google ScholarCross Ref
- [253] . 2018. Does it care what you asked? Understanding importance of verbs in deep learning QA system. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP. 322–324. http://aclweb.org/anthology/W18-5436.Google ScholarCross Ref
- [254] . 2015. Learning answer-entailing structures for machine comprehension. In Proceedings of the Joint Conference of the 53rd Annual Meeting of the Association for Computational Linguistics and the 5th International Joint Conference on Natural Language Processing (ACL-IJCNLP’21). 239–249. Google ScholarCross Ref
- [255] . 2018. Interpretation of natural language rules in conversational machine reading. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’18). 2087–2097. Google ScholarCross Ref
- [256] . 2019. WinoGrande: An adversarial Winograd Schema Challenge at scale. arXiv:1907.10641 [CS] (2019). http://arxiv.org/abs/1907.10641.Google Scholar
- [257] . 2019. Social IQA: Commonsense reasoning about social interactions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing: System Demonstrations (EMNLP-IJCNLP’19). 4453–4463. Google ScholarCross Ref
- [258] . 2020. Beyond leaderboards: A survey of methods for revealing weaknesses in natural language inference data and models. arXiv:2005.14709 [CS] (2020). http://arxiv.org/abs/2005.14709.
- [259] . 2020. A framework for evaluation of machine reading comprehension gold standards. In Proceedings of the Language Resources and Evaluation Conference. http://arxiv.org/abs/2003.04642.
- [260] . 2015. A survey of available corpora for building data-driven dialogue systems. arXiv:1512.05742 [CS, STAT] (2015). http://arxiv.org/abs/1512.05742.
- [261] . 2019. DRCD: A Chinese machine reading comprehension dataset. arXiv:1806.00920 [CS] (2019). http://arxiv.org/abs/1806.00920.
- [262] . 2015. Automatically solving number word problems by semantic parsing and reasoning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’15). 1132–1142.
- [263] . 2014. Overview of the NTCIR-11 QA-lab task. In Proceedings of the 11th NTCIR Conference. 518–529. http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings11/pdf/NTCIR/OVERVIEW/01-NTCIR11-OV-QALAB-ShibukiH.pdf.
- [264] . 2019. CLUTRR: A diagnostic benchmark for inductive reasoning from text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). 4496–4505.
- [265] . 2020. A survey of code-switched speech and language processing. arXiv:1904.00784 [CS, STAT] (2020). http://arxiv.org/abs/1904.00784.
- [266] . 2021. NLQuAD: A non-factoid long question answering data set. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (EACL’21). 1245–1255. https://aclanthology.org/2021.eacl-main.106.
- [267] . 2016. An analysis of prerequisite skills for reading comprehension. In Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods. 1–5.
- [268] . 2017. Evaluation metrics for machine reading comprehension: Prerequisite skills and readability. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL’17). 806–817.
- [269] . 2020. Assessing the benchmarking capacity of machine reading comprehension datasets. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI’20). http://arxiv.org/abs/1911.09241.
- [270] . 2017. A corpus of natural language for visual reasoning. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL’17). 217–223.
- [271] . 2019. A corpus for reasoning about natural language grounded in photographs. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL’19). 6418–6428.
- [272] . 2022. ConditionalQA: A complex reading comprehension dataset with conditional answers. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL’22). 3627–3637.
- [273] . 2019. DREAM: A challenge data set and models for dialogue-based reading comprehension. Transactions of the Association for Computational Linguistics 7 (April 2019), 217–231.
- [274] . 2020. TableQA: A large-scale Chinese Text-to-SQL dataset for table-aware SQL generation. arXiv:2006.06434 [CS] (2020). http://arxiv.org/abs/2006.06434.
- [275] . 2018. CliCR: A dataset of clinical case reports for machine reading comprehension. In Proceedings of the 16th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’18). 1551–1563.
- [276] . 2019. QuaRel: A dataset and models for answering questions about qualitative relationships. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI’19).
- [277] . 2019. QuaRTz: An open-domain dataset of qualitative relationship questions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). 5941–5946.
- [278] . 2018. The web as a knowledge-base for answering complex questions. In Proceedings of the 16th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’18). 641–651.
- [279] . 2019. CommonsenseQA: A question answering challenge targeting commonsense knowledge. In Proceedings of the 17th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’19). 4149–4158. https://www.aclweb.org/anthology/papers/N/N19/N19-1421/.
- [280] . 2021. MultimodalQA: Complex question answering over text, tables and images. In Proceedings of the 9th International Conference on Learning Representations (ICLR’21). 12. https://openreview.net/pdf/f3dad930cb55abce99a229e35cc131a2db791b66.pdf.
- [281] . 2016. MovieQA: Understanding stories in movies through question-answering. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16).
- [282] . 2021. BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models. In Proceedings of the 35th Conference on Neural Information Processing Systems, Datasets and Benchmarks Track. https://openreview.net/forum?id=wCu6T5xFjeJ.
- [283] . 2017. MISC: A data set of information-seeking conversations. In Proceedings of the 1st International Workshop on Conversational Approaches to Information Retrieval (CAIR’17). https://www.microsoft.com/en-us/research/wp-content/uploads/2017/07/Thomas-etal-CAIR17.pdf.
- [284] . 2019. Shifting the baseline: Single modality performance on visual navigation & QA. In Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL’19). 1977–1983. https://www.aclweb.org/anthology/papers/N/N19/N19-1197/.
- [285] . 2018. FEVER: A large-scale dataset for fact extraction and verification. In Proceedings of the 16th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’18). 809–819.
- [286] . 2018. Informing the design of spoken conversational search: Perspective paper. In Proceedings of the 2018 Conference on Human Information Interaction and Retrieval (CHIIR’18). ACM, New York, NY, 32–41.
- [287] . 2017. NewsQA: A machine comprehension dataset. In Proceedings of the 2nd Workshop on Representation Learning for NLP. 191–200.
- [288] . 2015. An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics 16, 1 (April 2015), 138.
- [289] . 2016. Towards machine comprehension of spoken content: Initial TOEFL listening comprehension test by machine. In Proceedings of the 17th Annual Conference of the International Speech Communication Association (Interspeech’16). 2731–2735.
- [290] . 2017. Annotating derivations: A new evaluation strategy and dataset for algebra word problems. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL’17). 494–504. https://www.aclweb.org/anthology/E17-1047.
- [291] . 2017. TableQA: Question answering on tabular data. arXiv:1705.06504 [CS] (2017). http://arxiv.org/abs/1705.06504.
- [292] . 2019. Best practices for the human evaluation of automatically generated text. In Proceedings of the 12th International Conference on Natural Language Generation. 355–368.
- [293] . 2002. Temporal order relations in language comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition 28, 4 (July 2002), 770–779.
- [294] . 1983. Strategies of Discourse Comprehension. Academic Press, New York, NY.
- [295] . 2019. HEAD-QA: A healthcare dataset for complex reasoning. arXiv:1906.04701 [CS] (2019). http://arxiv.org/abs/1906.04701.
- [296] . 2000. Building a question answering test collection. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’00). ACM, New York, NY, 200–207.
- [297] . 2018. Trick me if you can: Adversarial writing of trivia challenge questions. In Proceedings of the Association for Computational Linguistics Student Research Workshop (ACL-SRW’18). 127–133. http://aclweb.org/anthology/P18-3018.
- [298] . 2019. Universal adversarial triggers for attacking and analyzing NLP. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’19). http://arxiv.org/abs/1908.07125.
- [299] . 2020. ReCO: A large scale Chinese reading comprehension dataset on opinion. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI’20). 8. https://www.aaai.org/Papers/AAAI/2020GB/AAAI-WangB.2547.pdf.
- [300] . 2021. Improving question answering for event-focused questions in temporal collections of news articles. Information Retrieval Journal 24, 1 (Feb. 2021), 29–54.
- [301] . 2022. ArchivalQA: A large-scale benchmark dataset for open domain question answering over historical news collections. arXiv:2109.03438 [CS] (2022).
- [302] . 2020. Text-to-SQL generation for question answering on electronic medical records. In Proceedings of the Web Conference 2020 (WWW’20). ACM, New York, NY, 350–361.
- [303] . 2020. Developing dataset of Japanese slot filling quizzes designed for evaluation of machine reading comprehension. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’20). 6895–6901. https://www.aclweb.org/anthology/2020.lrec-1.852.
- [304] . 2018. Jack the Reader—A machine reading framework. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL’18). 25–30.
- [305] . 2015. Towards AI-complete question answering: A set of prerequisite toy tasks. arXiv:1502.05698 [CS] (2015).
- [306] . 2001. Monty Python and the Holy Grail.
- [307] . 2006. Learning for semantic parsing with statistical machine translation. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference. 439–446. https://www.aclweb.org/anthology/N06-1056.
- [308] . 2022. QAConv: Question answering on informative conversations. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL’22). 5389–5411.
- [309] . 2019. TWEETQA: A social media focused question answering dataset. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL’19). 5020–5031.
- [310] . 2020. MATINF: A jointly labeled large-scale dataset for classification, question answering and summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL’20). 3586–3596. https://www.aclweb.org/anthology/2020.acl-main.330.
- [311] . 2022. Fantastic questions and where to find them: FairytaleQA—An authentic dataset for narrative comprehension. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL’22). 447–460.
- [312] . 2015. WikiQA: A challenge dataset for open-domain question answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’15). 2013–2018. http://aclweb.org/anthology/D15-1237.
- [313] . 2019. FriendsQA: Open-domain question answering on TV show transcripts. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue. 188–197.
- [314] . 2018. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’18). 2369–2380. http://aclweb.org/anthology/D18-1259.
- [315] . 2019. A qualitative comparison of CoQA, SQuAD 2.0, and QuAC. In Proceedings of the 17th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’19). 2318–2323. https://www.aclweb.org/anthology/papers/N/N19/N19-1241/.
- [316] . 2021. On the faithfulness measurements for model interpretations. arXiv:2104.08782 [CS] (2021). http://arxiv.org/abs/2104.08782.
- [317] . 2020. Towards data distillation for end-to-end spoken conversational question answering. arXiv:2010.08923 [CS, EESS] (2020). http://arxiv.org/abs/2010.08923.
- [318] . 2020. ReClor: A reading comprehension dataset requiring logical reasoning. In Proceedings of the 8th International Conference on Learning Representations (ICLR’20). https://openreview.net/forum?id=HJgJtT4tvB.
- [319] . 2018. SWAG: A large-scale adversarial dataset for grounded commonsense inference. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’18). 93–104. http://aclweb.org/anthology/D18-1009.
- [320] . 2019. HellaSwag: Can a machine really finish your sentence? In Proceedings of the Conference of the Association for Computational Linguistics (ACL’19). http://arxiv.org/abs/1905.07830.
- [321] . 2021. SituatedQA: Incorporating extra-linguistic contexts into QA. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21). 7371–7387.
- [322] . 2018. ReCoRD: Bridging the gap between human and machine commonsense reading comprehension. arXiv:1810.12885 [CS] (Oct. 2018). http://arxiv.org/abs/1810.12885.
- [323] . 2020. When do you need billions of words of pretraining data? arXiv:2011.04946 [CS] (Nov. 2020). http://arxiv.org/abs/2011.04946.
- [324] . 2018. One-shot learning for question-answering in gaokao history challenge. In Proceedings of the 27th International Conference on Computational Linguistics (COLING’18). 449–461. https://www.aclweb.org/anthology/C18-1038.
- [325] . 2021. Calibrate before use: Improving few-shot performance of language models. In Proceedings of the 38th International Conference on Machine Learning (ICML’21). http://arxiv.org/abs/2102.09690.
- [326] . 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv:1709.00103 [CS] (2017). http://arxiv.org/abs/1709.00103.
- [327] . 2019. “Going on a vacation” takes longer than “going for a walk”: A study of temporal commonsense understanding. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). 3361–3367.
- [328] . 2021. TAT-QA: A question answering benchmark on a hybrid of tabular and textual content in finance. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL’21). 3277–3287.
- [329] . 2021. Retrieving and reading: A comprehensive survey on open-domain question answering. arXiv:2101.00774 [CS] (2021). http://arxiv.org/abs/2101.00774.
- [330] . 2017. Uncovering the temporal context for video question answering. International Journal of Computer Vision 124, 3 (Sept. 2017), 409–421.
- [331] . 2020. Question answering with long multiple-span answers. In Findings of EMNLP’20. 3840–3849.
- [332] . 2016. Situation models, mental simulations, and abstract concepts in discourse comprehension. Psychonomic Bulletin & Review 23, 4 (Aug. 2016), 1028–1034.