Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations, arXiv - CS - Information Retrieval

arXiv - CS - Information Retrieval. Pub Date: 2020-07-07, DOI: arxiv-2007.03777
Huaiyi Huang, Yuqi Zhang, Qingqiu Huang, Zhengkui Guo, Ziwei Liu, and Dahua Lin

Place is an important element in visual understanding. Given a photo of a building, people can often tell its function (e.g., a restaurant or a shop), its cultural style (e.g., Asian or European), and its economic type (e.g., industry-oriented or tourism-oriented). While place recognition has been widely studied in previous work, comprehensive place understanding remains a long way off: it goes far beyond categorizing a place from an image and requires information about multiple aspects. In this work, we contribute Placepedia, a large-scale place dataset with more than 35M photos from 240K unique places. Besides the photos, each place comes with extensive multi-faceted information (e.g., GDP and population) and labels at multiple levels, including function, city, and country. With its large volume of data and rich annotations, the dataset supports a variety of studies. In particular, we develop 1) PlaceNet, a unified framework for multi-level place recognition, and 2) a method for city embedding, which produces a vector representation of a city that captures both visual and multi-faceted side information. These studies not only reveal key challenges in place understanding but also establish connections between visual observations and their underlying socioeconomic and cultural implications.
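To make the idea of a city vector that mixes visual and side information concrete, here is a minimal sketch. It is not the method from the paper, which the abstract does not describe. It assumes each photo already has a numeric feature vector, averages those vectors per city, and appends side attributes standardized across cities. The function names `zscore` and `city_embeddings`, and the attribute keys used with them, are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of numbers; a constant column maps to 0.0."""
    mu = mean(values)
    sigma = pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def city_embeddings(visual, side):
    """Build one vector per city (illustrative only, not the paper's method).

    visual: {city: [photo feature vectors]}  (assumed precomputed)
    side:   {city: {attribute name: number}} (e.g. GDP, population)
    """
    cities = sorted(visual)
    # Average the photo features of each city, dimension by dimension.
    avg = {c: [mean(col) for col in zip(*visual[c])] for c in cities}
    # Standardize each side attribute across cities so scales are comparable.
    keys = sorted(next(iter(side.values())))
    std = {
        k: dict(zip(cities, zscore([side[c][k] for c in cities])))
        for k in keys
    }
    # Concatenate the visual part and the side-information part.
    return {c: avg[c] + [std[k][c] for k in keys] for c in cities}
```

Simple concatenation is only one possible design. A learned model could weight or fuse the two sources differently, and the abstract does not say which approach the authors take.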


Updated: 2020-07-20
