MapReduce-Based Improved Random Forest Model for Massive Educational Data Processing and Classification
Mobile Networks and Applications (IF 3.8) Pub Date: 2021-01-07, DOI: 10.1007/s11036-020-01699-w
Wei Xu, Vinh Truong Hoang

This paper takes educational data mining as its research theme: it mines existing massive educational data, compares the analysis methods of existing data models, and proposes an improved random forest (RF) reference model. A feature weighting system is introduced to calculate the information gain of each feature, and an evaluation index is used to improve the existing data analysis. Simulation results show that the improved model is more efficient than existing classification models. To resolve the performance bottleneck of a single node in multiple data classification tasks in the era of big data, the paper also proposes a classification and prediction model for graduates’ large-scale employment data, based on a distributed version of the improved RF algorithm. The MapReduce distributed computing framework is used to serialize the trained model to storage and deserialize it on loading, moving it between the local disk and the distributed file system, which extends the improved RF model to distributed large-scale data classification.
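The abstract does not give the weighting formula, so the following is only a minimal sketch of one common way to turn information gain into feature weights. It assumes categorical features, and the function names (`entropy`, `information_gain`, `feature_weights`) and the normalization of gains to sum to 1 are illustrative assumptions rather than the authors' method.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on one categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def feature_weights(columns, labels):
    """Hypothetical weighting: normalize each feature's information gain to sum to 1.

    `columns` maps a feature name to its list of values (one per sample).
    If no feature carries any gain, fall back to uniform weights.
    """
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    total_gain = sum(gains.values())
    if total_gain == 0:
        return {name: 1.0 / len(gains) for name in gains}
    return {name: gain / total_gain for name, gain in gains.items()}
```

Under this assumption, the weights could bias which features each tree samples as split candidates, so that informative features are considered more often than in a uniformly sampled random forest.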




Updated: 2021-01-07