A self-stabilizing Hashed Patricia Trie

doi:10.1016/j.ic.2021.104697

Information and Computation

Volume 285, Part A, May 2022, 104697

https://doi.org/10.1016/j.ic.2021.104697

Abstract

While much research in distributed computing has covered solutions for self-stabilizing computing and topologies, there is far less work on self-stabilization for distributed data structures. However, when peers in peer-to-peer networks crash, a distributed data structure may not remain intact. We present a self-stabilizing protocol for a distributed data structure called the Hashed Patricia Trie (Kniesburges and Scheideler, WALCOM'11) that enables efficient prefix search on a set of keys. The data structure has many applications while offering low overhead and efficient operations when embedded on top of a Distributed Hash Table. In particular, longest prefix matching for x can be done in $O(\log |x|)$ hash table read accesses. We show how to maintain the structure in a self-stabilizing way, while ensuring a low overhead in a legal state and an asymptotically optimal memory demand of $\Theta(d)$ bits, where d is the number of bits needed for storing all keys.

Introduction

We consider the problem of maintaining a distributed data structure for efficient Longest Prefix Matching in peer-to-peer (P2P) systems. We focus on the Hashed Patricia Trie (HPT) introduced in [1] and present an algorithm rendering a self-stabilizing version of this data structure when applied on top of any reliable Distributed Hash Table (DHT).

Definition 1

(Longest Prefix Matching) Consider a set of binary strings called keys and a binary string x. The task of Longest Prefix Matching is to find a key y sharing the longest common prefix with x. A prefix of a binary string is a substring beginning with the first bit. We denote the longest common prefix of x and y by $\ell cp(x, y)$.

We denote a prefix p of x by $p \sqsubseteq x$. p is a proper prefix of x ($p \sqsubset x$) if p is a prefix of x and $|p| < |x|$, where $|p|$ is the length of p. Longest Prefix Matching is an old problem with applications in various areas including string matching problems and IP lookup in Internet routers. To solve it efficiently in a distributed P2P system, the HPT has been introduced [1]. The HPT is a distributed data structure applied to any common DHT which allows efficient prefix search for x in $O(\log |x|)$ read accesses to the hash table, i.e., the cost depends solely on the length of the search word x. Inserting x costs $O(\log |x|)$ read accesses and $O(1)$ write accesses, while deletion can be done in $O(1)$ accesses. The memory space used is asymptotically optimal, namely $\Theta(\sum_{k \in KEYS} |k|)$, i.e., linear in the sum of all key lengths. Moreover, Suffix Trees can be implemented efficiently using Patricia Tries (such implementations are called PAT Trees [2]) and thus also using Hashed Patricia Tries. This allows us to efficiently decide if a given string x is a substring of a text in a runtime depending only on the length of x.
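To make Definition 1 and the prefix notation concrete, the following minimal sketch computes $\ell cp(x, y)$ and solves Longest Prefix Matching by a linear scan over all keys. It is only a reference baseline, not the HPT search, which needs $O(\log |x|)$ hash table read accesses. The function names are illustrative and do not appear in the paper.

```python
def lcp(x: str, y: str) -> str:
    """Longest common prefix of two binary strings."""
    n = 0
    for a, b in zip(x, y):
        if a != b:
            break
        n += 1
    return x[:n]


def is_prefix(p: str, x: str) -> bool:
    """True if p is a prefix of x."""
    return x.startswith(p)


def is_proper_prefix(p: str, x: str) -> bool:
    """True if p is a prefix of x and strictly shorter than x."""
    return x.startswith(p) and len(p) < len(x)


def longest_prefix_match(keys, x: str) -> str:
    """Return a key sharing the longest common prefix with x (naive scan)."""
    return max(keys, key=lambda y: len(lcp(x, y)))


keys = ["0010", "0111", "1100"]
match = longest_prefix_match(keys, "0110")  # "0111", sharing the prefix "011"
```

The scan touches every key, so its cost grows with the number of keys, which is exactly what the HPT avoids.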

The usefulness of Patricia Tries motivates us to investigate how a HPT can be maintained in a P2P system where nodes may enter/leave or even fail, i.e., crash. While a lot of research has considered the design of self-stabilizing computation or topologies (see Section 1.2), to the best of our knowledge there are far fewer results concerning self-stabilizing distributed data structures. However, failures of peers may affect the correctness of any distributed data structure. For example, if a peer loses its memory or crashes, parts of the data structure that were distributed to that peer may be inaccessible or invalid. Therefore, we consider the problem of finding an efficient distributed protocol to maintain a HPT in a self-stabilizing way.

We assume the existence of a self-stabilizing Distributed Hash Table (DHT) providing operations to insert data and to retrieve data. These operations are carried out reliably on the stored data, i.e., no operation is canceled and a search operation for x succeeds if and only if x is stored in the system. We assume the existence of a collision-free hash function which maps binary strings to positions in $[0, 1)$ to store data in the DHT. The function is available locally at every peer. Each peer has a unique identifier, manages local variables and maintains a channel. When a peer sends a message m to peer p, it puts m in the channel of p. A channel has unbounded capacity and messages never get lost. If a peer processes a message in its channel, the message is removed from the channel afterwards.
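The DHT assumptions can be sketched with a toy in-memory table. This is only an illustration under stated assumptions: the operation names `put` and `get`, the class `ToyDHT`, and the rule that a key is stored at the first peer at or after its hash position (wrapping around) are all hypothetical and not taken from the paper. SHA-256 stands in for the collision-free hash function, which a real hash can only approximate.

```python
import bisect
import hashlib


def h(s: str) -> float:
    """Map a string to a position in [0, 1) using 53 bits of SHA-256."""
    digest = hashlib.sha256(s.encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


class ToyDHT:
    """In-memory stand-in for a reliable DHT (hypothetical interface)."""

    def __init__(self, peer_ids):
        # Peers are hashed to the same [0, 1) interval as the data.
        self.ring = sorted((h("peer:" + p), p) for p in peer_ids)
        self.store = {p: {} for p in peer_ids}

    def responsible(self, key: str) -> str:
        """Assumed rule: first peer at or after h(key), wrapping around."""
        i = bisect.bisect_left(self.ring, (h(key), ""))
        return self.ring[i % len(self.ring)][1]

    def put(self, key: str, value) -> None:
        self.store[self.responsible(key)][key] = value

    def get(self, key: str):
        # Succeeds if and only if the key is stored; otherwise returns None.
        return self.store[self.responsible(key)].get(key)


dht = ToyDHT(["p1", "p2", "p3"])
dht.put("0101", "payload")
```

Because every peer can evaluate `h` locally, any peer can determine where a binary string is stored without coordination.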

We distinguish between two types of actions: The first one is for standard procedures and has the form $\langle label \rangle (\langle parameters \rangle) : \langle command \rangle$ where label is the name of the action, parameters define the set of parameters and command defines the statements that are executed when calling the action. It may be executed locally or remotely. The second type has the form $\langle label \rangle : (\langle guard \rangle) \to \langle command \rangle$ where label and command are defined as above and guard is a predicate over local variables. An action at peer p can only be executed if its guard is true or a message in the channel of p requests to call it. We call such an action enabled. The guard of our protocol routine is always true.
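The two action types and the channel model can be sketched as a small simulation. All names here (`Peer`, `deliver`, `routine`, `step`) are hypothetical; in particular, `routine` stands in for the protocol routine with an always-true guard, and a simple round-robin scheduler is assumed to model fair execution.

```python
from collections import deque


class Peer:
    def __init__(self, pid: str):
        self.pid = pid
        self.channel = deque()  # unbounded capacity, messages are never lost
        self.received = []

    # Standard action of the form label(parameters): command.
    # It is enabled when a message in the channel requests to call it.
    def deliver(self, payload: str) -> None:
        self.received.append(payload)

    # Guarded action of the form label: (guard) -> command.
    def routine_guard(self) -> bool:
        return True  # the guard of the protocol routine is always true

    def routine(self, peers) -> None:
        # Remote call: put a request into the channel of every other peer.
        for q in peers:
            if q is not self:
                q.channel.append(("deliver", (f"hello from {self.pid}",)))


def step(peers) -> None:
    """One round-robin round: each peer processes one message, then its routine."""
    for p in peers:
        if p.channel:
            label, params = p.channel.popleft()  # message is removed afterwards
            getattr(p, label)(*params)
        if p.routine_guard():
            p.routine(peers)


a, b = Peer("a"), Peer("b")
for _ in range(3):
    step([a, b])
```

Processing exactly one message per peer and round is a design choice of this sketch; the model in the text only requires that every message is eventually processed.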

A state of the system is defined by the assignment of variables at every peer, the data items and their values stored at every peer, and all messages in the channels of peers. The system can transform from a state s to another state $s'$ by executing an enabled action at a peer. An infinite sequence of states $(s_1, s_2, \dots)$ is a computation if $s_{i+1}$ can be reached by executing an action enabled in $s_i$ for all $i \geq 1$. The state $s_1$ is called the initial state. We assume fair message receipt, i.e., every message contained in a channel is eventually processed. We also assume weakly fair action execution, i.e., any action that is enabled in all but finitely many states is executed infinitely often. This applies in particular to the protocol routine, whose guard is always true. We call a protocol self-stabilizing if it fulfills convergence and closure. Convergence means that starting from an arbitrary initial state, the protocol transforms the system to a legal state in finite time. Closure means that starting from a legal state, the protocol only transforms the system to subsequent legal states. Our goal is to provide a self-stabilizing HPT. We define the legal state of a HPT later in Section 4.1.

The basic data structure we consider here is the Patricia Trie (Practical Algorithm To Retrieve Information Coded In Alphanumeric). This compressed tree structure was introduced by Morrison in [3] and extended to the Hashed Patricia Trie by Kniesburges and Scheideler in [1]. In [2], Gonnet et al. presented PAT Trees, which are essentially Patricia Tries for special suffixes (sistrings) of a text. This widens the applications of Patricia Tries to general string problems such as deciding whether a word or sentence is contained in a text [2].

The work on self-stabilization started with Dijkstra [4], who analyzed self-stabilization in a token ring scenario. Since then, research has covered wide areas including self-stabilizing computation [5], [6] and coordination [4], [7], [8], [9]. Furthermore, with the rise of P2P systems [10], [11], self-stabilizing topologies in the sense of overlay networks gained attention [12], [13], [14], [15], [16], [17], [18]. We use approaches originally presented for topological self-stabilization. This includes linearization, a self-stabilizing technique for a sorted list topology presented by Onus et al. in [19].

A common approach for storing data in overlay networks is a Distributed Hash Table (DHT) like Chord [11]. Using hashing, data items as well as network peers are mapped to the $[0, 1)$ interval such that a mapping between them is established. There are various results on self-stabilizing DHTs in the literature (for example [14], [17], [20]). Further, most (self-stabilizing) overlay networks can easily be extended to a DHT given sortable unique identifiers for the peers, which is a common assumption.
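The idea behind linearization can be illustrated with a simplified, synchronous sketch. It is not the algorithm of [19] verbatim and not part of SHPT: the function names and the round-based execution are assumptions made for illustration. In each round, every node keeps its closest known neighbor on each side, introduces itself to that neighbor, and delegates every other known node to the next closer one. Starting from a weakly connected graph, the example below ends in the sorted doubly linked list (convergence), and another round leaves that list unchanged (closure).

```python
def linearize_round(nbrs):
    """One synchronous round; nbrs maps each node id to the set of ids it knows."""
    new = {u: set() for u in nbrs}
    for u, known in nbrs.items():
        left = sorted(v for v in known if v < u)
        right = sorted(v for v in known if v > u)
        if left:
            closest = left[-1]
            new[u].add(closest)   # keep the closest left neighbor
            new[closest].add(u)   # introduce u to it
            for a, b in zip(left, left[1:]):
                new[b].add(a)     # delegate a to the next closer node b
        if right:
            closest = right[0]
            new[u].add(closest)   # keep the closest right neighbor
            new[closest].add(u)   # introduce u to it
            for a, b in zip(right, right[1:]):
                new[a].add(b)     # delegate b to the next closer node a
    return new


def stabilize(nbrs, max_rounds=1000):
    """Repeat rounds until nothing changes or max_rounds is reached."""
    for _ in range(max_rounds):
        nxt = linearize_round(nbrs)
        if nxt == nbrs:
            break
        nbrs = nxt
    return nbrs


# Arbitrary weakly connected initial state on the ids 10, ..., 50.
start = {10: {50}, 20: set(), 30: {40}, 40: set(), 50: {20, 30}}
final = stabilize(start)
# final is the sorted list: 10 <-> 20 <-> 30 <-> 40 <-> 50
```

Every delegation replaces an edge by a strictly shorter one between nodes that remain connected, which is the intuition for why the process settles on the sorted list.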

The work which is probably closest to ours is [21], where the authors consider a self-stabilizing protocol for the construction of a prefix tree out of given data in a DHT-like model. However, all data that is in the system in an initial configuration is part of the final prefix tree. The main difference to our work is that we seek to repair the HPT with respect to the set of keys only. This implies that our self-stabilizing protocol also has to deal with insertion of new data items and deletion of unnecessary data items.

We present a self-stabilizing protocol called SHPT (Self-stabilizing Hashed Patricia Trie) to maintain a slightly modified version of the HPT as presented in [1]. Whenever we refer to HPT, we implicitly mean the modified version. The HPT and our modification are briefly introduced in Section 2. Afterwards, Section 3 gives a high-level description of the mechanisms of our protocol along with detailed pseudocode. For an initial state, we only require that the underlying DHT is in a legal state and that a set of unique keys is stored at DHT nodes. In Section 4, we formally show that our protocol stabilizes a HPT in finite time from any initial state. When the HPT is in a legal state, our protocol guarantees a low overhead of a constant number of hash table read accesses and messages generated at each DHT node per call of the protocol routine. Furthermore, we can bound the total memory consumption in a legal state by $\Theta(d)$ bits, where d is the number of bits needed to store all keys. We conclude the paper in Section 5 with some closing remarks on our work.


Hashed Patricia Trie

We consider a data structure called the Hashed Patricia Trie (HPT) as presented in [1]. The HPT is an extended Patricia Trie that is distributed in a P2P system by using a DHT. It supports prefix search and insertion for a binary string x in $O(\log |x|)$ read accesses on the hash table. Insertion takes an additional $O(1)$ write accesses, and deletion is supported in a constant number of hash table accesses. Furthermore, the memory space usage is in $\Theta(\sum_{k \in KEYS} |k|)$. Next, we describe the construction.

The Patricia

The SHPT protocol

In the following, we present SHPT (Self-stabilizing Hashed Patricia Trie), our self-stabilizing protocol for maintaining a HPT. The corrections of SHPT can be divided into several parts. We present our assumptions concerning the underlying DHT first. Afterwards, we give an intuition on the different types of repairs our protocol performs along with detailed pseudocode. For a better understanding of the connections between the actions of the pseudocode and our analysis in Section 4, we append

Protocol analysis

In this section, we show that SHPT is self-stabilizing and transfers the HPT in finite time to a legal state. Furthermore, we present results concerning memory usage and the number of hash table accesses and messages when the HPT is in a legal state.

Closing remarks

In this paper, we considered a self-stabilizing protocol for a data structure that can easily be distributed over any DHT. We remark that it should be possible to adapt our protocol to other distributed data structures that resemble a binary tree structure. This is because the essential ideas, e.g., the linearization for Branch Sets or the management of deletions and insertions, are based on exploiting the binary tree structure and not specific properties of a HPT.

When designing SHPT we did not

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (25)

Z. Collin et al., Self-stabilizing depth-first search, Inf. Process. Lett. (1994)

T. Clouser et al., Tiara: a self-stabilizing deterministic skip list and skip graph, Theor. Comput. Sci. (2012)

S. Kniesburges et al., Hashed Patricia Trie: efficient longest prefix matching in peer-to-peer systems

G.H. Gonnet et al., New indices for text: PAT trees and PAT arrays

D.R. Morrison, PATRICIA—practical algorithm to retrieve information coded in alphanumeric, J. ACM (1968)

E.W. Dijkstra, Self-stabilizing systems in spite of distributed control, Commun. ACM (1974)

B. Awerbuch et al., Distributed program checking: a paradigm for building self-stabilizing distributed protocols

Y. Afek et al., Memory-efficient self stabilizing protocols for general networks

A. Arora et al., Distributed reset

M. Flatebo et al., Two-state self-stabilizing algorithms for token rings, IEEE Trans. Softw. Eng. (1994)

A. Rowstron et al., Pastry: scalable, decentralized object location, and routing for large-scale peer-to-peer systems

I. Stoica et al., Chord: a scalable peer-to-peer lookup protocol for Internet applications, IEEE/ACM Trans. Netw. (2003)


^☆: This work was partially supported by the German Research Foundation (DFG) within the Collaborative Research Centre On-The-Fly Computing (GZ: SFB 901/3) under the project number 160364472.
