Thinking like a naturalist: Enhancing computer vision of citizen science images by harnessing contextual data
Methods in Ecology and Evolution (IF 6.6), Pub Date: 2020-01-13, DOI: 10.1111/2041-210x.13335
J. Christopher D. Terry 1, 2 , Helen E. Roy 1 , Tom A. August 1

  1. The accurate identification of species in images submitted by citizen scientists is currently a bottleneck for many data uses. Machine learning tools offer the potential to provide rapid, objective and scalable species identification for the benefit of many aspects of ecological science. Currently, most approaches only make use of image pixel data for classification. However, an experienced naturalist would also use a wide variety of contextual information such as the location and date of recording.
  2. Here, we examine the automated identification of ladybird (Coccinellidae) records from the British Isles submitted to the UK Ladybird Survey, a volunteer‐led mass participation recording scheme. Each image is associated with metadata: a date, a location and a recorder ID. These can be cross‐referenced with other data sources to determine the local weather at the time of recording, the habitat type and the experience of the observer. We built multi‐input neural network models that synthesize metadata and images to identify records to species level.
  3. We show that machine learning models can effectively harness contextual information to improve the interpretation of images. Against an image‐only baseline of 48.2%, we observe a 9.1 percentage‐point improvement in top‐1 accuracy with a multi‐input model, compared to only a 3.6% increase when using an ensemble of image and metadata models. This suggests that contextual data are being used to interpret the image itself, beyond just providing a prior expectation. We show that our neural network models appear to use pieces of evidence similar to those human naturalists use to make identifications.
  4. Metadata is a key tool for human naturalists. We show it can also be harnessed by computer vision systems. Contextualization offers considerable extra information, particularly for challenging species, even within small and relatively homogeneous areas such as the British Isles. Although complex relationships between disparate sources of information can be profitably interpreted by simple neural network architectures, there is likely considerable room for further progress. Contextualizing images has the potential to lead to a step change in the accuracy of automated identification tools, with considerable benefits for large‐scale verification of submitted records.
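
The contrast drawn in point 3 between an ensemble and a multi‐input model can be pictured with a minimal sketch. This is not the authors' implementation: the function names, feature vectors and weights below are hypothetical, and the product-and-renormalise rule is only one common way to combine two models' outputs. In the ensemble, the metadata model never sees the image, so it can only act as a prior over species. In the multi‐input model, image and metadata features pass through a shared hidden layer, so context can change how the image features are read.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    """Rectified linear activation applied element by element."""
    return [max(0.0, v) for v in x]


def ensemble(image_probs, metadata_probs):
    """Late fusion (assumed rule): multiply the per-species probabilities
    from two separately trained models, then renormalise."""
    joint = [p * q for p, q in zip(image_probs, metadata_probs)]
    total = sum(joint)
    return [j / total for j in joint]


def multi_input(image_features, metadata_features,
                hidden_w, hidden_b, out_w, out_b):
    """Joint model: concatenate both feature vectors and pass them through
    a shared hidden layer, so the two sources can interact."""
    x = list(image_features) + list(metadata_features)
    hidden = relu(dense(x, hidden_w, hidden_b))
    return softmax(dense(hidden, out_w, out_b))


# Hypothetical two-species example: the image model slightly favours
# species 0, while the metadata model strongly favours species 1.
fused = ensemble([0.6, 0.4], [0.2, 0.8])
print(fused.index(max(fused)))  # index of the species the ensemble picks

# Hypothetical, hand-picked weights: 2 image features + 1 metadata feature.
probs = multi_input(
    [1.0, 0.5], [0.3],
    hidden_w=[[0.2, -0.1, 0.4], [0.0, 0.3, -0.5]], hidden_b=[0.0, 0.1],
    out_w=[[1.0, -1.0], [-1.0, 1.0]], out_b=[0.0, 0.0],
)
print(probs)
```

In a real system the weights would be learned from labelled records rather than set by hand, and the image features would come from a convolutional network rather than a two-element list.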


Updated: 2020-01-13