Improving Text Analysis Using Sentence Conjunctions and Punctuation,Marketing Science


Improving Text Analysis Using Sentence Conjunctions and Punctuation
Marketing Science (IF 4.0), Pub Date: 2020-07-01, DOI: 10.1287/mksc.2019.1214
Joachim Büschken, Greg M. Allenby


User-generated content in the form of customer reviews, blogs, or tweets is an emerging and rich source of data for marketers. Topic models have been successfully applied to such data, demonstrating that empirical text analysis benefits greatly from a latent variable approach that summarizes high-level interactions among words. We propose a new topic model that allows for serial dependency of topics in text. That is, topics may carry over from word to word in a document, violating the bag-of-words assumption in traditional topic models. In our model, topic carry-over is informed by sentence conjunctions and punctuation. Typically, such observed information is eliminated prior to analyzing text data (i.e., “pre-processing”) because words such as “and” and “but” do not differentiate topics. We find that these elements of grammar contain information relevant to topic changes. We examine the performance of our model using multiple data sets and establish boundary conditions for when our model leads to improved inference about customer evaluations. Implications and opportunities for future research are discussed.
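To make the contrast concrete, the sketch below compares conventional pre-processing, which discards conjunctions and punctuation, with a tokenization that keeps them as markers attached to the following word. It is only an illustration of the kind of input such a model could use, not the authors' implementation: the marker sets `CONJUNCTIONS` and `PUNCTUATION`, the function names, and the sample review are all assumptions introduced here, and the abstract does not specify how the model itself conditions on these markers.

```python
import re

# Hypothetical marker sets, chosen for illustration only (not from the paper).
CONJUNCTIONS = {"and", "but", "or", "so", "because"}
PUNCTUATION = {".", ",", ";", "!", "?"}


def tokenize(text):
    """Split text into lowercase words and single punctuation marks."""
    return re.findall(r"[a-z']+|[.,;!?]", text.lower())


def strip_markers(tokens):
    """Conventional pre-processing: drop conjunctions and punctuation."""
    return [t for t in tokens if t not in CONJUNCTIONS and t not in PUNCTUATION]


def attach_markers(tokens):
    """Pair each remaining word with the most recent marker preceding it.

    Returns (word, marker) tuples, where marker is None, a conjunction, or a
    punctuation mark. A model with topic carry-over could condition on this
    marker when deciding whether the topic changes at that word.
    """
    pairs = []
    marker = None
    for t in tokens:
        if t in CONJUNCTIONS or t in PUNCTUATION:
            marker = t  # remember the latest marker seen
        else:
            pairs.append((t, marker))
            marker = None  # a marker applies only to the next word
    return pairs


# Hypothetical sample review.
review = "The room was clean, but the staff was rude."
tokens = tokenize(review)
print(strip_markers(tokens))
print(attach_markers(tokens))
```

In this sample, the word following "but" carries the marker, which is where the review switches from the room to the staff; the stripped version loses that signal.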


Updated: 2020-07-01
