Abstract
In machine learning, offline training and online training are equally important because both arise in many real applications. The extreme learning machine (ELM) offers fast learning speed and high accuracy for offline training, and the online sequential ELM (OS-ELM) is a variant of ELM that supports online training. With the explosive growth of data volumes, running these algorithms on distributed computing platforms has become a necessity, yet no efficient distributed framework currently supports both ELM and OS-ELM. Apache Flink is an open-source stream-based distributed platform for both offline and online data processing, with good scalability, high throughput, and fault tolerance, so it can be used to accelerate both ELM and OS-ELM. In this paper, we first study the characteristics of ELM, OS-ELM, and distributed computing platforms, and then propose an efficient stream-based distributed framework for both ELM and OS-ELM, named ELM-SDF, implemented on Flink. We evaluate the algorithms in this framework with synthetic data on a distributed cluster. The advantages of the proposed framework are highlighted as follows. (1) FLELM, the ELM implementation in the framework, trains consistently faster than ELM on Hadoop and Spark, and scales better as well. (2) FLOS-ELM, the OS-ELM implementation, achieves better response time and throughput than OS-ELM on Hadoop and Spark when incremental training samples arrive. (3) The response time and throughput of FLOS-ELM improve further in native stream-processing mode when incremental data samples arrive continuously.
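To make the two training modes contrasted in the abstract concrete, the following is a minimal single-machine sketch of batch ELM and the OS-ELM recursive update, written with numpy. It is an illustration of the standard algorithms only, not the paper's distributed Flink implementation; all function names are hypothetical, and the OS-ELM initialization assumes the boot batch yields a full-column-rank hidden matrix.

```python
import numpy as np

def random_hidden_layer(n_features, n_hidden, rng):
    """ELM's input weights and biases are assigned randomly and never retrained."""
    W = rng.standard_normal((n_features, n_hidden))
    b = rng.standard_normal(n_hidden)
    return W, b

def hidden_output(X, W, b):
    """Hidden-layer output matrix H for input batch X."""
    return np.tanh(X @ W + b)

def elm_fit(X, T, W, b):
    """Batch (offline) ELM: output weights via the Moore-Penrose pseudoinverse."""
    H = hidden_output(X, W, b)
    return np.linalg.pinv(H) @ T

def oselm_init(X0, T0, W, b):
    """OS-ELM boot phase on an initial batch (needs at least n_hidden samples)."""
    H0 = hidden_output(X0, W, b)
    P = np.linalg.inv(H0.T @ H0)       # assumes H0 has full column rank
    beta = P @ H0.T @ T0
    return P, beta

def oselm_step(P, beta, Xk, Tk, W, b):
    """OS-ELM sequential phase: recursive least-squares update for one chunk,
    so old samples never need to be revisited."""
    H = hidden_output(Xk, W, b)
    S = np.linalg.inv(np.eye(len(Xk)) + H @ P @ H.T)
    P = P - P @ H.T @ S @ H @ P
    beta = beta + P @ H.T @ (Tk - H @ beta)
    return P, beta
```

With the same random hidden layer, feeding the data chunk by chunk through `oselm_step` recovers the same output weights as one batch `elm_fit` over all the data, which is the property that lets OS-ELM handle continuously arriving samples.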
Acknowledgements
This research is partially funded by the National Key Research and Development Program of China (Grant No. 2016YFC1401900), the National Natural Science Foundation of China (Grant Nos. 61872072, 61572119, 61572121, 61622202, 61732003, 61729201, 61702086, and U1401256), the Fundamental Research Funds for the Central Universities (Grant Nos. N171604007, and N171904007), the Natural Science Foundation of Liaoning Province (Grant No. 20170520164), and the China Postdoctoral Science Foundation (Grant No. 2018M631806).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Informed consent
Informed consent was obtained from all individual participants.
Human and Animal Rights
This article does not contain any studies involving human participants and/or animals by any of the authors.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Ji, H., Wu, G. & Wang, G. Accelerating ELM training over data streams. Int. J. Mach. Learn. & Cyber. 12, 87–102 (2021). https://doi.org/10.1007/s13042-020-01158-8