Building and Operating a Large-Scale Enterprise Data Analytics Platform
Big Data Research (IF 3.5), Pub Date: 2020-12-16, DOI: 10.1016/j.bdr.2020.100181
Daniel Bauer, Florian Froese, Luis Garcés-Erice, Chris Giblin, Abdel Labbi, Zoltán A. Nagy, Niels Pardon, Sean Rooney, Peter Urbanetz, Pascal Vetsch, Andreas Wespi
Over the last three years, we have been running a large-scale data processing platform that applies analytics to corporate data on an OpenStack private cloud instance. The platform makes a wide variety of corporate data assets (such as sales, marketing, and customer information), as well as data from less conventional sources (such as weather, news, and social media), available for analytics to hundreds of globally distributed teams across the company. We control every layer of the stack, from the processing engines down to the hardware. Here we report our experiences in building and operating such a system: we describe our technical choices and how they evolved as we observed the actual workloads that users created.