Estimating educational outcomes from students’ short texts on social media
EPJ Data Science ( IF 3.0 ) Pub Date : 2020-09-01 , DOI: 10.1140/epjds/s13688-020-00245-8
Ivan Smirnov

Digital traces have become an essential source of data in the social sciences because they provide new insights into human behavior and allow studies to be conducted on a larger scale. One particular area of interest is the estimation of users' characteristics from their texts on social media. Although it has been established that basic categorical attributes can be effectively predicted from social media posts, the extent to which this applies to more complex continuous characteristics is less well understood. In this research, we used data from a nationally representative panel of students to predict their educational outcomes, as measured by standardized tests, from short texts posted on VK, a popular Russian social networking site. We combined unsupervised learning of word embeddings on a large corpus of VK posts with a simple supervised model trained on individual posts. The resulting model was able to distinguish between posts written by high- and low-performing students with an accuracy of 94%. We then applied the model to reproduce the ranking of 914 high schools from 3 cities and of the 100 largest universities in Russia. We also showed that the same model could predict academic performance from tweets as well as from VK posts. Finally, we explored predictors of high and low academic performance to obtain insights into the factors associated with different educational outcomes.
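
The pipeline outlined in the abstract (word embeddings plus a simple supervised model trained on individual posts) could be sketched roughly as follows. This is an illustrative toy only, not the author's implementation: the tiny `EMBEDDINGS` table, the training posts and scores, the least-squares fit by gradient descent, the labeling of each post with its author's score, and the averaging of post-level predictions per user are all assumptions made for the example.

```python
# Hypothetical toy embeddings. The paper learns its embeddings in an
# unsupervised way on a large VK corpus; these vectors are invented.
EMBEDDINGS = {
    "physics": [1.0, 0.0],
    "book": [0.8, 0.1],
    "party": [0.0, 1.0],
    "lol": [0.1, 0.9],
}


def embed_post(text, embeddings=EMBEDDINGS):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def fit_linear(features, targets, lr=0.1, epochs=2000):
    """Fit a least-squares linear model with plain batch gradient descent."""
    n, n_dim = len(features), len(features[0])
    weights, bias = [0.0] * n_dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_dim, 0.0
        for x, y in zip(features, targets):
            err = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            grad_b += err
            for j in range(n_dim):
                grad_w[j] += err * x[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def predict_post(text, model):
    """Predict a score for one post, or None if it has no known words."""
    weights, bias = model
    vec = embed_post(text)
    if vec is None:
        return None
    return bias + sum(w * xi for w, xi in zip(weights, vec))


def predict_user(posts, model):
    """Assumed aggregation: mean of the post-level predictions."""
    scores = [predict_post(p, model) for p in posts]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else None


# Invented training data: each post carries its author's test score
# (an assumption about how posts are labeled).
TRAIN = [
    ("physics book", 600.0),
    ("book", 580.0),
    ("party lol", 400.0),
    ("lol party party", 410.0),
]
model = fit_linear([embed_post(t) for t, _ in TRAIN], [s for _, s in TRAIN])
```

With this toy data, `predict_user(["physics book"], model)` comes out higher than `predict_user(["party lol"], model)`, which is the kind of ordering one would then use to rank users or schools.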

Updated: 2020-09-01