Storing, combining and analysing turkey experimental data in the Big Data era.
Animal (IF 4.0) Pub Date: 2020-06-22, DOI: 10.1017/s175173112000155x
D Schokker 1, I N Athanasiadis 2, B Visser 3, R F Veerkamp 1, C Kamphuis 1

With the increasing availability of large amounts of data in the livestock domain, we face the challenge of storing, combining and analysing these data efficiently. In this study, we explored the use of a data lake for storing and analysing data to improve scalability and interoperability. Data originated from a 2-day animal experiment in which the gait score of approximately 200 turkeys was determined through visual inspection by an expert. Additionally, inertial measurement units (IMUs), a 3D-video camera and a force plate (FP) were installed to explore the effectiveness of these sensors in automating the visual gait scoring. We deployed a data lake using the IMU and FP data of a single day of that animal experiment. These data, from 84 turkeys, were preprocessed with an ‘extract, transform and load’ (ETL) procedure. To test the scalability of the ETL procedure, we simulated increasing volumes of the available data from this animal experiment and computed the ‘wall time’ (elapsed real time) for converting FP data into comma-separated files and storing these files. With a simulated data set of 30 000 turkeys, the wall time fell from 1 h with 1 core to less than 15 min with 12 cores. This demonstrated that the ETL procedure is scalable. Subsequently, a machine learning (ML) pipeline was developed to test the potential of a data lake to automatically distinguish between two classes, that is, very bad gait scores versus other scores. In conclusion, we have set up a dedicated customized data lake, loaded data and developed a prediction model via the creation of an ML pipeline. A data lake appears to be a useful tool for storing, combining and analysing increasing volumes of data of varying nature in an effective manner.
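
The scalability figures above can be read as a speedup and a parallel efficiency. The sketch below is only an illustration under stated assumptions: the record fields (`turkey_id`, `time_s`, `force_n`) and all function names are hypothetical and not taken from the study, and the 900 s value stands in for the reported upper bound of "less than 15 min", so the resulting speedup and efficiency are lower bounds.

```python
import csv
import io
import time


def convert_to_csv(records):
    """Serialise force-plate records (a list of dicts) to comma-separated text.

    The field names are hypothetical; the study's actual FP schema is not given.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["turkey_id", "time_s", "force_n"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def wall_time(func, *args):
    """Run func once and return (result, elapsed real time in seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def speedup(serial_seconds, parallel_seconds):
    """Ratio of single-core wall time to multi-core wall time."""
    return serial_seconds / parallel_seconds


def parallel_efficiency(serial_seconds, parallel_seconds, cores):
    """Speedup per core; 1.0 would mean perfect linear scaling."""
    return speedup(serial_seconds, parallel_seconds) / cores


# Reported figures: 1 h on 1 core, less than 15 min on 12 cores.
serial_s = 60 * 60
parallel_s = 15 * 60  # upper bound, so the values below are lower bounds
print("speedup >=", speedup(serial_s, parallel_s))
print("efficiency >=", round(parallel_efficiency(serial_s, parallel_s, 12), 2))

# Timing one conversion step on a single made-up record.
text, elapsed = wall_time(
    convert_to_csv, [{"turkey_id": 1, "time_s": 0.0, "force_n": 52.5}]
)
print(f"converted {len(text.splitlines()) - 1} record(s) in {elapsed:.6f} s")
```

Read this way, the reported result corresponds to a speedup of at least 4 on 12 cores, that is, a parallel efficiency of at least about one third. The wall time dropped substantially but did not scale linearly with the number of cores.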




Updated: 2020-06-22