A Gaussian mixture model based virtual sample generation approach for small datasets in industrial processes
Information Sciences, Pub Date: 2021-09-14, DOI: 10.1016/j.ins.2021.09.014
Ling Li 1, 2 , Seshu Kumar Damarla 3 , Yalin Wang 2 , Biao Huang 3

Because labeled samples are scarce and often imbalanced, it is challenging to establish a robust and accurate prediction model with data-driven methods. To address this small-dataset problem, virtual sample generation (VSG) methods generate new virtual samples based on the trend of the original small dataset, thereby improving modeling performance. Effective VSG is desirable but also challenging. Conventional VSG usually assumes that the raw sample set contains only a single operating mode. Because actual processes are often multi-mode, taking multiple modes into account will improve VSG-based modeling performance. To this end, this paper first develops an information expansion function considering sample density and amount (IEDA) to expand the domain range of the attributes. Then, a Gaussian mixture model based virtual sample generation (GMMVSG) method is proposed to generate virtual samples under multiple operating modes. Applications of GMMVSG to the Tennessee Eastman benchmark process and an industrial hydrocracking process show significant improvement in modeling and prediction over conventional VSG methods.
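The abstract does not give the details of GMMVSG or IEDA, so the following is only a minimal sketch of the general idea of drawing virtual samples from a Gaussian mixture fitted to a small multi-mode dataset. It is not the authors' method: it handles a single attribute, uses plain expectation-maximization, and omits the IEDA domain expansion entirely. The function names `fit_gmm_1d` and `sample_gmm_1d` and the toy data are hypothetical.

```python
import math
import random


def fit_gmm_1d(data, k, iters=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    # Initialise the means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    spread = (ordered[-1] - ordered[0]) or 1.0
    variances = [(spread / k) ** 2] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances


def sample_gmm_1d(weights, means, variances, n, seed=1):
    """Draw n virtual samples: pick a mode by weight, then sample its Gaussian."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        out.append(rng.gauss(means[j], math.sqrt(variances[j])))
    return out


# Hypothetical small dataset with two operating modes (near 10 and near 50).
raw = [9.1, 9.8, 10.2, 10.9, 11.3, 48.7, 49.5, 50.1, 50.8, 51.6]
weights, means, variances = fit_gmm_1d(raw, k=2)
virtual = sample_gmm_1d(weights, means, variances, n=200)
```

A single Gaussian fitted to this toy data would place most virtual samples between the two modes, where no real data exist; the mixture keeps them clustered around each mode. For multivariate process data, a library implementation such as scikit-learn's `GaussianMixture` would be the more usual choice.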




Updated: 2021-09-14