Automated retrieval of information on threatened species from online sources using machine learning
Methods in Ecology and Evolution (IF 6.3), Pub Date: 2021-04-08, DOI: 10.1111/2041-210x.13608
Ritwik Kulkarni, Enrico Di Minin

  1. As resources for conservation are limited, gathering and analysing information from digital platforms can help investigate the global biodiversity crisis in a cost-efficient manner. Development and application of methods for automated content analysis of digital data sources are especially important in the context of investigating human–nature interactions.
  2. In this study, we introduce novel methods to automatically collect and analyse textual data on species of conservation concern from digital platforms. We construct an end-to-end pipeline that begins by searching for and downloading news articles about species listed in Appendix I of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES), along with news articles from specific Twitter handles, and then applies natural language processing and machine learning methods to filter and retain only relevant articles. A crucial aspect here is the automatic annotation of training data, which can be challenging in many machine learning applications. A Named Entity Recognition model is then used to extract additional relevant information from each article.
  3. The data collected over a 1-month period included 15,088 articles focusing on 585 species listed in Appendix I of CITES. The neural network detected relevant articles with an accuracy of 95.91%, while the Named Entity Recognition model helped extract information on prices, locations and quantities of traded animals and plants. The system generates a regularly updated database that can be queried and analysed for various research purposes and to inform conservation decision making.
  4. The results demonstrate that natural language processing can be used successfully to extract information from digital text content. The proposed methods can be applied to multiple digital data platforms at the same time and used to investigate human–nature interactions in conservation science and practice.
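
The filter-then-extract flow described in points 2 and 3 can be illustrated with a small sketch. This is not the authors' implementation: the abstract reports a neural network classifier and a Named Entity Recognition model, whereas the sketch below substitutes a keyword rule for the automatic annotation step and regular expressions for entity extraction. The keyword list, the unit list, and the names `weak_label`, `process` and `Record` are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical keyword list; the abstract does not state the annotation rules.
TRADE_TERMS = {"seized", "smuggled", "poached", "trafficking", "confiscated"}

# Regex stand-ins for the NER model (an assumption, not the paper's method).
PRICE_RE = re.compile(r"(?:US\$|\$|€|£)\s?\d[\d,]*(?:\.\d+)?")
QUANTITY_RE = re.compile(
    r"\b(\d[\d,]*)\s+(kg|kilograms|tonnes|tusks|skins|horns)\b",
    re.IGNORECASE,
)


@dataclass
class Record:
    text: str
    relevant: bool
    prices: List[str] = field(default_factory=list)
    quantities: List[Tuple[str, str]] = field(default_factory=list)


def weak_label(text: str) -> bool:
    """Label an article as relevant if it contains any trade-related keyword."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return bool(tokens & TRADE_TERMS)


def process(text: str) -> Record:
    """Filter one article, then extract prices and quantities if it is relevant."""
    if not weak_label(text):
        return Record(text, False)
    prices = PRICE_RE.findall(text)
    quantities = [
        (number.replace(",", ""), unit.lower())
        for number, unit in QUANTITY_RE.findall(text)
    ]
    return Record(text, True, prices, quantities)


if __name__ == "__main__":
    article = "Customs officers seized 120 kg of pangolin scales worth $45,000 at the port."
    print(process(article))
```

In a pipeline closer to the one described, labels produced by a rule such as `weak_label` would serve as training data for a classifier rather than as the final relevance decision, and the extracted records would be written to the regularly updated database.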


Updated: 2021-04-08