Big data analytics in Industry 4.0 ecosystems
Software: Practice and Experience ( IF 2.6 ) Pub Date : 2021-06-11 , DOI: 10.1002/spe.3008
Gagangeet Singh Aujla 1 , Radu Prodan 2 , Danda B. Rawat 3
The emergence of advanced technologies has triggered a sweeping digital transformation of the industrial ecosystem. Cutting-edge technologies (such as the Internet of Things, big data, artificial intelligence, drones, cyber-physical systems, augmented reality, and computer vision) are key enablers of this industrial revolution. Industry 4.0 has reshaped conventional manufacturing and production processes into automated operations and workflows. This industrial transition is fueled by advanced computing (cloud and edge computing), analytics (big data and computational analytics), intelligent (machine and deep learning), and communication (programmable and intelligent networks) infrastructures and technologies. The collection, aggregation, analysis, and processing of big data generated at the industrial periphery (e.g., by manufacturing equipment and maintenance systems) enable real-time decision-making and autonomous operation. However, the volume, variability, and velocity of this data pose a wide array of challenges for resource-limited industrial systems. Moreover, the continuous decision-making workflow in production and manufacturing segments increases the sharing of data across different functions, systems, and organizational boundaries. For this reason, cloud computing and big data technologies (such as Hadoop and MapReduce) can improve the anticipated response and reaction times.
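The MapReduce model mentioned above can be sketched in miniature. The sketch below is purely illustrative (the sensor-log records, the `alarm_mapper`, and the machine identifiers are assumptions, not drawn from any paper in this issue); a real Hadoop deployment distributes the same three phases across a cluster.

```python
from collections import defaultdict

def map_phase(records, mapper):
    # apply the user-defined mapper to each record, emitting (key, value) pairs
    for rec in records:
        yield from mapper(rec)

def shuffle(pairs):
    # group all values by key, as the MapReduce runtime does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    # apply the user-defined reducer to each key's list of values
    return {key: reducer(key, values) for key, values in groups.items()}

# Illustrative use: count alarm events per machine in a hypothetical sensor log.
logs = [("m1", "alarm"), ("m2", "ok"), ("m1", "alarm"), ("m2", "alarm")]
alarm_mapper = lambda rec: [(rec[0], 1)] if rec[1] == "alarm" else []
counts = reduce_phase(shuffle(map_phase(logs, alarm_mapper)),
                      lambda key, values: sum(values))
print(counts)  # {'m1': 2, 'm2': 1}
```

The key design point is that the mapper and reducer are side-effect-free, which is what lets the framework parallelize them freely across nodes.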

Industry 4.0 will lead toward more devices enriched with embedded computing platforms, boosting the capabilities of the overall workflow. However, this also increases the communication and interaction between these devices, which can create various challenges for the underlying network infrastructure. Conventional communication protocols may run into performance bottlenecks, which in turn can increase the threat from different kinds of attacks and security challenges. In summary, the industrial ecosystem relies on two entities: (1) users and infrastructure (the physical world) and (2) cloud-enabled algorithms and autonomous systems (the virtual world), connected through advanced and autonomous communication technologies. The driving force behind the success of these industrial ecosystems is the efficient collection, analysis, and storage of the data generated by smart devices and sensors. In this domain, big data analytics is set to drive predictive manufacturing and to provide timely detection of anomalies and system failures in order to predict product quality. In this way, big data is bound to play a prominent role in driving the industrial ecosystem. Moreover, the concern is not limited to the volume of data; the major concern is the contribution of this data to the design and implementation of efficient industrial processes and policies. The interpretation and understanding of the available data help in designing efficient processes and policies for industrial systems.

The focus of this special issue is to present novel and seminal contributions around the important issues and challenges related to big data management and analytics for Industry 4.0 ecosystems. It provides ground-breaking research from academia and industry that emphasizes novel solutions, applications, tools, software, and algorithms designed to handle industrial big data. A substantial number of submissions were received for the special issue. Each paper was reviewed by at least three reviewers and underwent two rigorous rounds of review. After completion of the peer-review process, we accepted 10 seminal contributions related to big data analytics for Industry 4.0. All the accepted papers either discuss recent solutions related to big data analytics or propose an innovative way of handling big data across diverse infrastructure deployments. These contributions are outlined below.

The first paper, titled “An Efficient Scheme for Secure Feature Location using Data Fusion and Data Mining in IOT Environment” by Balaji et al.,1 proposes a secure feature-location approach based on data fusion and data mining to overcome the challenges of the existing textual and dynamic approaches. The first step of this approach removes repeated test cases, followed by the selection of important attributes. The artificial flora optimization algorithm is used to remove the repeated test cases. The Caesar Cipher-RSA algorithm then encrypts the selected attributes, and a score value is assigned to them. This score value, normalized using the min-max approach, serves as input to the K-means algorithm. The evaluation results show that the proposed approach is superior to the existing variants.
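The normalize-then-cluster step at the end of this pipeline can be sketched as below. This is a minimal stand-in, not the authors' implementation: the score values are invented, the K-means is a naive one-dimensional Lloyd's iteration, and `k=2` is an arbitrary choice.

```python
def min_max(values):
    # scale each value into [0, 1]; guard against a zero range
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]

def kmeans_1d(values, k=2, iters=10):
    # naive Lloyd's iterations on one-dimensional data,
    # seeded with the first k values
    centers = list(values[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

scores = [12.0, 15.0, 80.0, 95.0]   # hypothetical attribute score values
normalized = min_max(scores)        # [0.0, ~0.036, ~0.819, 1.0]
centers = kmeans_1d(normalized, k=2)
```

Min-max scaling matters here because K-means is distance-based: without it, attributes on larger numeric ranges would dominate the cluster assignment.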

The second paper, titled “Data Dimensionality Reduction Techniques for Industry 4.0: Research Results, Challenges, and Future Research Directions” by Chhikara et al.,2 provides a comprehensive survey of dimensionality reduction techniques. The survey discusses various data dimensionality reduction techniques, analyzes them, and provides a thorough comparison based on different factors and parameters. It also examines the applicability of dimensionality reduction techniques in grouped or stand-alone use cases.

The third paper, titled “Deep-Q Learning-based Heterogeneous Earliest Finish Time Scheduling Algorithm for Scientific Workflows in Cloud” by Kaur et al.,3 proposes a workflow-scheduling approach that uses a deep-Q learning mechanism. This mechanism, built on the heterogeneous earliest-finish-time algorithm, amalgamates the deep learning approach with a heuristic approach to task scheduling. The evaluations were performed on a workflow simulator, and the results show the superiority of the proposed approach over existing algorithms in terms of makespan and speed.
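The heuristic half of this combination, heterogeneous earliest finish time (HEFT), orders tasks by their upward rank before assigning them to processors. A minimal sketch of the rank computation follows; the 4-task workflow, its compute costs `w`, and transfer costs `c` are invented for illustration, and the deep-Q component is not reproduced here.

```python
def upward_rank(task, succ, w, c, memo=None):
    # rank_u(t) = w(t) + max over successors s of [ c(t, s) + rank_u(s) ]
    if memo is None:
        memo = {}
    if task not in memo:
        memo[task] = w[task] + max(
            (c[(task, s)] + upward_rank(s, succ, w, c, memo)
             for s in succ.get(task, [])),
            default=0.0)
    return memo[task]

# Hypothetical 4-task workflow: A fans out to B and C, which join at D.
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
w = {"A": 2.0, "B": 3.0, "C": 4.0, "D": 1.0}        # mean compute costs
c = {("A", "B"): 1.0, ("A", "C"): 1.0,
     ("B", "D"): 2.0, ("C", "D"): 1.0}              # mean transfer costs

# HEFT schedules tasks in decreasing order of upward rank.
order = sorted(w, key=lambda t: upward_rank(t, succ, w, c), reverse=True)
```

Because the rank of a task includes the longest remaining path to the exit task, scheduling in decreasing rank order respects all precedence constraints automatically.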

The fourth paper, titled “A multi-domain VNE algorithm based on multi-objective optimization for IoD architecture in Industry 4.0” by Zhang et al.,4 proposes a multidomain virtual network embedding algorithm to improve performance and reduce computational delay. The algorithm is based on a centralized hierarchical architecture and avoids local optima by extending the particle swarm optimization algorithm with a genetic variation factor. As the problem comprises multiple objectives, the proposed work simplifies it by decomposing it into a single-objective problem using a weighted-summation method. According to the obtained results, the proposed approach converges to an optimal solution quickly. Further, a candidate selection algorithm is proposed to reduce the cost associated with mapping: the physical domain calculates the mapping cost for all nodes and selects the node with the lowest mapping cost. The results show the efficiency of the proposed approach in terms of delay, cost, and several other performance indicators.
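The two scalar ingredients of this design, weighted-sum scalarization and lowest-cost candidate selection, can be sketched briefly. The per-node (delay, cost) pairs and the equal weights below are invented for illustration; the paper's actual objectives and weighting are not reproduced.

```python
def weighted_sum(objectives, weights):
    # collapse an objective vector f into a single scalar: sum_i w_i * f_i
    assert len(objectives) == len(weights)
    return sum(wi * fi for wi, fi in zip(weights, objectives))

def select_candidate(mapping_costs):
    # pick the physical node with the lowest scalarized mapping cost
    return min(mapping_costs, key=mapping_costs.get)

# Hypothetical per-node (delay, cost) objectives with equal weights.
nodes = {"n1": (4.0, 10.0), "n2": (6.0, 5.0), "n3": (3.0, 12.0)}
weights = (0.5, 0.5)
scalar = {n: weighted_sum(f, weights) for n, f in nodes.items()}
best = select_candidate(scalar)   # "n2": 0.5*6 + 0.5*5 = 5.5 is lowest
```

The weighted-sum method trades expressiveness for speed: a single scalar objective lets standard single-objective optimizers (such as the improved particle swarm here) be applied directly, at the cost of fixing the trade-off between objectives in advance through the weights.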

The fifth paper, titled “A Community-based Hierarchical User Authentication Scheme for Industry 4.0” by Sinha et al.,5 proposes a community-based hierarchical approach for deciding how to grant users access rights to the smart end devices in the Industry 4.0 ecosystem. The hierarchical structure helps ensure that only legitimate users obtain access rights after clearing a multilevel authorization process. The approach also guards against identity leakage, as the legitimate parties coordinate closely with each other during the authentication process. The validation shows that the proposed approach is resilient to various types of attacks.

The sixth paper, titled “PSSCC: Provably Secure Communication Framework for Crowdsourced Industrial Internet of Things Environments” by Dharminder et al.,6 proposes an identity-based signcryption method within a provably secure communication framework. During signcryption, the end user performs pairing-free computation, which proves to be computationally efficient. Based on the modified bilinear Diffie-Hellman inversion and strong Diffie-Hellman problems, the framework is proved to be secure in the Industrial Internet of Things environment. The evaluation, based on communication and computation costs, shows promising results.

The seventh paper, titled “Applying Artificial Bee Colony Algorithm to the Multi-depot Vehicle Routing Problem” by Gu et al.,7 uses an artificial bee colony algorithm to manage vehicular routes among multiple depots in an optimized and time-efficient manner. Initially, the multi-depot vehicle routing problem is decomposed into single-depot problems using depot clustering. A modified artificial bee colony algorithm then generates solutions for each depot. Finally, a coevolution strategy is proposed to assemble a complete solution to the multi-depot vehicle routing problem. The proposed algorithm was validated through extensive experiments, and the results were compared with greedy and genetic algorithms on different parameters. The results show a performance enhancement of around 70% over the greedy algorithm and 3% over the genetic algorithm.
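The depot-clustering decomposition step can be sketched with a simple nearest-depot assignment. This is an illustrative stand-in for the paper's clustering (the depot and customer coordinates are invented, and Euclidean distance is an assumption); each resulting cluster becomes an independent single-depot routing subproblem.

```python
import math

def cluster_by_depot(customers, depots):
    # assign each customer to its nearest depot, splitting the
    # multi-depot problem into single-depot subproblems
    clusters = {d: [] for d in depots}
    for cid, pos in customers.items():
        nearest = min(depots, key=lambda d: math.dist(pos, depots[d]))
        clusters[nearest].append(cid)
    return clusters

# Hypothetical coordinates for two depots and four customers.
depots = {"D1": (0.0, 0.0), "D2": (10.0, 0.0)}
customers = {"c1": (1.0, 1.0), "c2": (2.0, -1.0),
             "c3": (9.0, 1.0), "c4": (8.0, -2.0)}
clusters = cluster_by_depot(customers, depots)
# clusters: {'D1': ['c1', 'c2'], 'D2': ['c3', 'c4']}
```

After decomposition, each depot's bee colony search only has to explore routes over its own customers, which is what makes the subsequent optimization tractable; the coevolution step then reconciles the per-depot solutions.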

The eighth paper, titled “An IoT-enabled Decision Support System for Circular Economy Business Model” by Mboli et al.,8 proposes a decision support system for a circular-economy business model based on the Internet of Things. The system rests on an ontological model that allows businesses to predict, track, and monitor the residual value of a product, and to take circularity decisions complemented by a semantic decision support system built on a first-of-its-kind semantic ontological model. The proposed model was validated on a real-world use-case scenario to assess its viability and applicability.

The ninth paper, titled “Security Analytics for Real-Time Forecasting of Cyberattacks” by Javed et al.,9 proposes a pattern-identification framework for cyberthreats. After identifying the cyber patterns, a forecasting model suggests the growth pattern of an emerging network threat. The framework predicts the maximum threat intensity and when it will occur, thereby suggesting the likelihood of maximum intensity. It involves four steps: (1) continuous activity monitoring, (2) behavior forecasting, (3) estimating the intensity of a potential cyberattack, and (4) predicting the potential risk of cyberattacks over a predefined time window. The validation shows an average lead time of 1.75 h, sufficient to limit the potential impact of an attack.
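To give a flavor of the behavior-forecasting step (step 2) without reproducing the authors' model, the sketch below uses plain exponential smoothing over a monitored activity count. Everything here is an assumption for illustration: the hourly counts, the smoothing factor, and the choice of EWMA itself.

```python
def ewma(series, alpha=0.5):
    # exponentially weighted moving average: recent observations dominate,
    # giving a one-step-ahead estimate of the threat signal's level
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Hypothetical hourly counts of suspicious connections on a monitored link.
observed = [2, 3, 5, 9, 16]
next_hour_estimate = ewma(observed)   # 11.1875 for alpha = 0.5
```

A rising estimate like this would feed steps (3) and (4): the forecast intensity and its timing are what give defenders the lead time reported above.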

The tenth paper, titled “An Efficient Hadoop based Brain Tumor Detection Framework using Big Data Analytic” by Chahal et al.,10 proposes a brain-tumor segmentation approach based on a hybrid weighted fuzzy mechanism. The approach works in tandem with the MATLAB Distributed Computing Server and Hadoop to fuzzify pixel values and create meaningful clusters of large data. It is validated on large MR brain data across clusters of varying-sized DICOM datasets, using hybrid fuzzy clustering in MapReduce on Hadoop. The experiments compared the read, write, and processing times on each node. The outcomes show that read and write times rise as the data size grows in the multi-node setting. The processing time comes out to 35 min and 3.4 min on the single-node and three-node clusters, respectively. When the data size is further increased to 7.3 GB, the proposed approach processes the data in 235.4 min on the three-node cluster and 2085.2 min on a single node.
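The core of fuzzy clustering, as opposed to hard K-means, is that each pixel receives a graded membership in every cluster. A minimal sketch of the standard fuzzy c-means membership formula follows; the grayscale value and intensity centers are invented, and the paper's hybrid weighting on top of this is not reproduced.

```python
def fcm_membership(x, centers, m=2.0):
    # fuzzy c-means membership of point x in each cluster:
    #   u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
    # where d_i is the distance from x to center i, and m > 1 is the fuzzifier
    d = [abs(x - c) for c in centers]
    if any(di == 0.0 for di in d):          # x coincides with a center
        return [1.0 if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[k]) ** p for k in range(len(centers)))
            for i in range(len(centers))]

# Hypothetical grayscale pixel value against two intensity centers.
memberships = fcm_membership(120.0, centers=[100.0, 200.0])
# memberships sum to 1; the pixel belongs mostly to the 100.0 cluster
```

In a MapReduce formulation, mappers typically compute these memberships per pixel while reducers re-estimate the cluster centers, which is what makes the approach scale across Hadoop nodes.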

We hope that the seminal research contributions and findings presented in this special issue will benefit readers, enhance their knowledge base, and encourage them to work on various aspects of big data analytics.



