Binary Classification With XOR Queries: Fundamental Limits and an Efficient Algorithm
IEEE Transactions on Information Theory (IF 2.2), Pub Date: 2021-05-04, DOI: 10.1109/tit.2021.3077461
Daesung Kim, Hye Won Chung

We consider a query-based data acquisition problem for binary classification of unknown labels, which has diverse applications in communications, crowdsourcing, recommender systems, and active learning. To ensure reliable recovery of the unknown labels with as few queries as possible, we consider an effective query type that asks for a “group attribute” of a chosen subset of objects. In particular, we consider the problem of classifying m binary labels with XOR queries, each of which asks whether the number of objects having a given attribute in a chosen subset of size d is even or odd. The subset size d, which we call the query degree, can vary across queries. We consider a general noise model in which the accuracy of an answer depends on both the worker (the data provider) and the query degree d. For this general model, we characterize the information-theoretic limit on the optimal number of queries needed to reliably recover the m labels, in terms of a given combination of degree-d queries and noise parameters. Further, we propose an efficient inference algorithm that achieves this limit even when the noise parameters are unknown.
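As a rough illustration of the query model described in the abstract, the following Python sketch simulates noisy XOR queries on a vector of binary labels. It is not the authors' inference algorithm, only a toy generator of query answers. The function names (`xor_query`, `sample_queries`), the callable `flip_prob(worker, d)`, and the uniform random choice of subsets are assumptions made for illustration; the abstract does not specify how subsets are drawn or how the noise is parameterized.

```python
import random


def xor_query(labels, subset, flip_prob, rng):
    """Return the parity of the labels in `subset`, flipped with probability `flip_prob`."""
    parity = 0
    for i in subset:
        parity ^= labels[i]  # parity is 1 iff an odd number of chosen labels equal 1
    if rng.random() < flip_prob:
        parity ^= 1  # noisy answer: the worker reports the wrong parity
    return parity


def sample_queries(labels, degrees, worker, flip_prob, rng):
    """Draw one query per entry of `degrees`; each entry d is that query's degree.

    `flip_prob(worker, d)` is a hypothetical callable giving the error
    probability for a given worker and query degree (an assumption of this sketch).
    """
    queries = []
    for d in degrees:
        subset = rng.sample(range(len(labels)), d)  # d distinct object indices
        answer = xor_query(labels, subset, flip_prob(worker, d), rng)
        queries.append((subset, answer))
    return queries
```

A call such as `sample_queries(labels, [1, 2, 3], worker=0, flip_prob=lambda w, d: 0.1, rng=random.Random(0))` would return three `(subset, answer)` pairs with query degrees 1, 2, and 3, each answer being wrong with probability 0.1 under this assumed noise function.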

Updated: 2021-05-04