BAD to the bone: Big Active Data at its core
The VLDB Journal ( IF 2.8 ) Pub Date : 2020-05-23 , DOI: 10.1007/s00778-020-00616-7
Steven Jacobs , Xikui Wang , Michael J. Carey , Vassilis J. Tsotras , Md Yusuf Sarwar Uddin

Virtually all of today’s Big Data systems are passive in nature, responding to queries posted by their users. Instead, we are working to shift Big Data platforms from passive to active. In our view, a Big Active Data (BAD) system should continuously and reliably capture Big Data while enabling timely and automatic delivery of relevant information to a large pool of interested users, as well as supporting retrospective analyses of historical information. While various scalable streaming query engines have been created, their active behavior is limited to a (relatively) small window of the incoming data. To this end, we have created a BAD platform that combines ideas and capabilities from both Big Data and Active Data (e.g., publish/subscribe, streaming engines). It supports complex subscriptions that consider not only newly arrived items but also their relationships to past, stored data. Further, it can provide actionable notifications by enriching the subscription results with other useful data. Our platform extends an existing open-source Big Data Management System, Apache AsterixDB, with an active toolkit. The toolkit contains features to rapidly ingest semistructured data, share execution pipelines among users, manage scaled user data subscriptions, and actively monitor the state of the data to produce individualized information for each user. This paper describes the features and design of our current BAD platform and demonstrates its ability to scale without sacrificing query capabilities or result individualization.




Updated: 2020-05-23