Better Fewer but Better: Community Search with Outliers, arXiv - CS - Data Structures and Algorithms



Better Fewer but Better: Community Search with Outliers
arXiv - CS - Data Structures and Algorithms. Pub Date: 2020-12-01, DOI: arxiv-2012.00356
Francesco Bonchi, Lorenzo Severini, Mauro Sozio

Given a set of vertices in a network that we believe are of interest for the application under analysis, community search is the problem of producing a subgraph that potentially explains the relationships existing among the vertices of interest. In practice, this means that the solution should add some vertices to the query ones so as to create a connected subgraph that exhibits some "cohesiveness" property. This problem has received increasing attention in recent years: while several cohesiveness functions have been studied, the bulk of the literature looks for solution subgraphs containing all the query vertices. However, in many exploratory analyses we might only have a reasonable belief about the vertices of interest: if just one of them is not really related to the others, forcing the solution to include all of them might hide the existence of much more cohesive and meaningful subgraphs, which we could have found by allowing the solution to detect and drop the outlier vertex. In this paper we study the problem of community search with outliers, where we are allowed to drop up to $k$ query vertices, with $k$ being an input parameter. We consider three of the most widely used measures of cohesiveness: the minimum degree, the diameter of the subgraph, and the maximum distance from a query vertex. By optimizing one and using one of the others as a constraint, we obtain three optimization problems: we study their hardness and propose different exact and approximation algorithms.
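The abstract describes the setting only at a high level, so the following is a toy illustration rather than the authors' algorithms, which the abstract does not detail. This Python sketch computes the three cohesiveness measures on an induced subgraph and runs an exhaustive search for one possible combination: maximize the minimum degree subject to a bound on the query distance, while allowing up to k query vertices to be dropped. Assumptions: distances are measured inside the induced subgraph, "query distance" is read as the largest distance between a kept query vertex and any vertex of the solution, and the names `search_with_outliers`, `query_distance` and `max_dist` are hypothetical. The search enumerates every vertex subset, so it is exponential and only suitable for tiny graphs.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, nodes, src):
    """Hop distances from src, walking only inside the subgraph induced by `nodes`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in nodes and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def min_degree(adj, nodes):
    """Minimum degree of the subgraph induced by `nodes`."""
    return min(sum(1 for w in adj[v] if w in nodes) for v in nodes)


def diameter(adj, nodes):
    """Longest shortest path in the induced subgraph; inf if it is disconnected."""
    best = 0
    for v in nodes:
        dist = bfs_distances(adj, nodes, v)
        if len(dist) < len(nodes):
            return float("inf")
        best = max(best, max(dist.values()))
    return best


def query_distance(adj, nodes, queries):
    """Assumed reading: max distance from any kept query vertex to any solution vertex.
    Returns inf if some solution vertex is unreachable (this also enforces connectivity)."""
    best = 0
    for q in queries:
        dist = bfs_distances(adj, nodes, q)
        if len(dist) < len(nodes):
            return float("inf")
        best = max(best, max(dist.values()))
    return best


def search_with_outliers(adj, Q, k, max_dist):
    """Hypothetical exhaustive search: keep at least |Q| - k query vertices and
    maximize the minimum degree subject to query distance <= max_dist.
    Returns (-1, None) when no subset is feasible."""
    best_score, best_nodes = -1, None
    vertices = list(adj)
    for r in range(1, len(vertices) + 1):
        for subset in combinations(vertices, r):
            nodes = set(subset)
            kept = nodes & Q  # query vertices not in `nodes` count as dropped outliers
            if not kept or len(kept) < len(Q) - k:
                continue
            if query_distance(adj, nodes, kept) > max_dist:
                continue
            score = min_degree(adj, nodes)
            if score > best_score:
                best_score, best_nodes = score, nodes
    return best_score, best_nodes


# Toy graph: a 4-clique {a, b, c, d} plus a pendant path a - e - f.
adj = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a", "f"},
    "f": {"e"},
}
Q = {"a", "b", "f"}

score, nodes = search_with_outliers(adj, Q, k=0, max_dist=3)
print(score, sorted(nodes))  # 1 ['a', 'b', 'e', 'f']
score, nodes = search_with_outliers(adj, Q, k=1, max_dist=3)
print(score, sorted(nodes))  # 3 ['a', 'b', 'c', 'd']
```

On this example, insisting on all three query vertices (k = 0) forces the pendant vertex f into the solution and caps the minimum degree at 1; allowing one outlier (k = 1) lets the search drop f and return the 4-clique with minimum degree 3, which mirrors the motivation given in the abstract.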


Updated: 2020-12-02
