Mining, analyzing, and modeling text written on mobile devices
Natural Language Engineering (IF 2.3) Pub Date: 2019-10-10, DOI: 10.1017/s1351324919000548
K. Vertanen, P.O. Kristensson

We present a method for mining the web for text entered on mobile devices. Using searching, crawling, and parsing techniques, we locate text that can be reliably identified as originating from 300 mobile devices. This includes 341,000 sentences written on iPhones alone. Our data enables a richer understanding of how users type “in the wild” on their mobile devices. We compare text and error characteristics of different device types, such as touchscreen phones, phones with physical keyboards, and tablet computers. Using our mined data, we train language models and evaluate these models on mobile test data. A mixture model trained on our mined data, Twitter, blog, and forum data predicts mobile text better than baseline models. Using phone and smartwatch typing data from 135 users, we demonstrate that our models improve the recognition accuracy and word predictions of a state-of-the-art touchscreen virtual keyboard decoder. Finally, we make our language models and mined dataset available to other researchers.

Updated: 2019-10-10