1 Introduction

1.1 Searchable encryption

Using protocols for Searchable Encryption [2, 10, 20, 29], clients with limited computing and storage resources can outsource encrypted data to a server or a collection of servers, perform search over the encrypted data (typically using encrypted keywords) and eventually retrieve the matching data while preserving its privacy against the servers. Existing searchable encryption schemes can be broadly split according to what the keyword search procedure requires on the user side: either high-entropy shared keys, as in Symmetric Searchable Encryption (SSE) schemes, or a private-public key pair, as in Public Key Encryption with Keyword Search (PEKS) schemes.

In practice, the requirement to maintain high-entropy keys on the user side results in less flexibility when it comes to the use of multiple, different devices for outsourcing and retrieval of data. The user is effectively prevented from using different devices unless the private key is made available to every such device.

1.1.1 Symmetric searchable encryption

Symmetric searchable encryption enables the user to encrypt the data, organizing it in an arbitrary way (before encryption), and to include additional data structures that allow for efficient access to relevant data. In this setting, the initial work for the user (i.e., for preprocessing the data) is at least as large as the data, but subsequent work (i.e., for accessing the data) is very small relative to the size of the data for both the user and the server. Ostrovsky demonstrated that symmetric searchable encryption can be achieved in its full generality and with optimal security using Oblivious RAM but with huge overhead [35]. Further works try to make the construction efficient with more rounds and a weaker security model to reduce the overhead. Song et al. [21] approached SSE using a new two-layered encryption, whose outer layer discloses whether a particular keyword is stored in an inner encryption using a trapdoor. Unfortunately, search requires computation linear in the size of each document and reveals statistical information about the distribution of the underlying plaintext. Both of these limitations were addressed by Goh [23] through associating secure indexes to each document in a collection. Goh's work also introduced the notion of semantic security against chosen-keyword attacks (called IND-CKA), which is the first formal notion of security defined for searchable encryption.

In the context of complex search queries, the above schemes are restricted to single-keyword equality queries. Ballard et al. [5] provided a secure and efficient system to perform Boolean keyword searches using Shamir’s secret sharing. Curtmola et al. [20] introduced two SSE variants (adaptively secure and non-adaptively secure) based on lookup tables. Chase et al. [17] introduced the notion of structured encryption, where arbitrarily structured data are encrypted in such a way that they can be queried through the use of a query-specific token that can only be generated with knowledge of the secret key. The scheme improves over the non-adaptive variant of [20], achieving keyword search through generating dictionaries of each keyword which contain pointer-output for each document. Kamara et al. [29] further refine the model to a dynamic searchable encryption scheme based on the inverted indexes approach of [20].

Other variants of SSE include Message Lock Encryption by Bellare et al. [8], where the key under which encryption and decryption are performed is derived from the message itself, and search pattern obfuscation by Orencik et al. [34] using preprocessed term frequency-inverse document frequency (tf-idf) weights of keyword-document pairs.

1.1.2 Public key encryption with keyword search (PEKS)

The notion of Public Key Encryption with Keyword Search was introduced by Boneh et al. [10] using bilinear maps and trapdoor permutations. The mechanism provided an efficient way to check whether a keyword is associated with a given document without leaking anything else about the document. However, due to the computation cost of public key encryption, the constructions were applicable to searching on a small number of keywords rather than an entire file. Moving beyond just equality-based keyword search, Park et al. [36] and Boneh et al. [10] extended PEKS for conjunctive [10, 36], subset and range [10] queries on encrypted data.

However, the PEKS construction does not allow the recipient to decrypt keywords, i.e., encryption is not invertible. This was addressed by Fuhr et al. [22], through introducing decryptable searchable encryption using an identity-based key encapsulation mechanism (ID-KEM). The concept also paved the way for management of encrypted data, since the decryption key and the trapdoor derivation key are generated independently from one another and hence data can be decrypted by one entity while trapdoors are generated by some other managing party. Abdalla et al. [2] defined the computational and statistical relaxations of the existing notion of perfect consistency, showing that [10] is computationally consistent, and providing a new scheme that is statistically consistent. Third party delegation was further studied by Ibraimi et al. [25], employing the notion of Public Key Encryption with Delegated Search (PKEDS) which enables a third party to search a document for a particular keyword encrypted by the user.

Other variants of public key encryption in the context of keyword search include Deterministic Searchable Encryption [6] and Plaintext-Checkable Encryption [15]. Bellare et al. [6] achieved deterministic searchable encryption using RSA-DOAEP, a length preserving deterministic encryption scheme. A plaintext-checkable encryption scheme is a probabilistic public-key encryption scheme with the additional functionality that anyone can test whether a ciphertext is the encryption of a given plaintext message under a public encryption key. Canard et al. [15] provided an efficient construction for plaintext checkable encryption using an ElGamal-based approach.

1.2 Password-authenticated searchable encryption (PASE)

The idea of basing searchable encryption solely on passwords, proposed in this paper, helps to avoid costly and risky key management on the user side and enables the whole process to be device-agnostic. This, however, comes with challenges considering that both passwords and keywords typically have low entropy. Amongst the core security properties of PASE, there is a need to guarantee that only the legitimate user, who knows the password, can outsource, search and retrieve data. Hence, basing security of searchable encryption schemes on passwords introduces the need for a distributed server environment where trust is spread across at least two non-colluding servers, as is also the case in many password-based protocols for authentication and secret sharing, e.g., [4, 12,13,14, 26,27,28, 30, 31, 40]. The use of two servers provides the most practical scenario and the minimum requirement to achieve protection against offline dictionary attacks, while a more general secret sharing architecture with t-out-of-n servers would be applicable as well. Chen et al. [18] further demonstrated the resilience of the two-server model against keyword guessing attacks. Thus, the two-server model of PASE offers the best trade-off between performance and protection for (public key-based) PEKS schemes, protecting against offline dictionary and keyword guessing attacks.

We model PASE as a searchable encryption scheme where users can register their passwords with the servers and then re-use these passwords for multiple sessions of the outsource and retrieval protocols. In each outsource session, the user can outsource encrypted keywords along with some (encrypted) document to both servers. The retrieval protocol realizes the search procedure based on the keyword that the user inputs to the protocol and provides the user with all documents associated with that keyword, allowing the user to also verify the integrity of the retrieved documents. We define security of the PASE scheme using BPR-like models [3, 9] that have been widely used for password-based protocols. We define privacy of PASE keywords through indistinguishability against chosen keyword attacks (IND-CKA) while considering active adversaries, possibly in control of at most one server, who can also register their own passwords in the system. While IND-CKA security prevents an adversary who does not know the password from successfully retrieving outsourced data, we additionally require authentication to protect the outsourcing operation itself, thus preventing the adversary from outsourcing data on behalf of the user; this requirement must also hold even if the adversary controls one of the servers.

Our direct PASE construction conceptually follows a more general approach that combines ideas behind Password Authenticated Secret Sharing (PASS) [4, 12,13,14, 26, 27, 40] and SSE [5, 20, 34]. In the registration phase, the user picks a password \(\pi \) and a high-entropy symmetric key \(K\) that will be used to encrypt keywords and secret-shares \(K\) protected with \(\pi \) across both servers. In order to outsource keywords, the user engages in the PASS reconstruction protocol to obtain \(K\) and then in the SSE outsource protocol to outsource the keywords. In order to search for keywords and retrieve data, the user again reconstructs \(K\) using PASS and performs the keyword search using SSE. We stress, however, that our construction is direct and does not use PASS and SSE as generic building blocks. A generic construction from these two primitives remains currently out of reach due to significant differences in the syntax, functionality and security amongst the existing PASS protocols. First, PASS protocols do not separate registration from the secret sharing phase and therefore do not enforce user authentication upon secret sharing, which would be required for the outsourcing protocol in PASE. Second, existing PASS protocols were proven in different security models, e.g., BPR-like in [4, 40] and UC-based in [12, 14, 27, 28], and do not necessarily follow the same functionality and syntax, which makes it hard to use PASS as a generic building block in PASE without revising the syntax and security models of those PASS protocols. While we could update the syntax of PASS protocols to allow for a generic usage in PASE, such an update would introduce changes to the original PASS protocols and require new security proofs. Moreover, generic constructions often lead to less efficient instantiations than directly constructed schemes. For all the aforementioned reasons, we are not formally proposing a generic PASE construction in this paper and opt for a direct and efficient scheme (cf. Sect. 3) based on well-known assumptions in the standard model.

1.3 Paper organization

Section 2 formally models PASE functionality and defines its main security properties. Section 3 introduces our direct PASE construction. We recall the underlying cryptographic building blocks and present a high-level design rationale for the scheme. This section also compares the efficiency of the key reconstruction phase of the proposed PASE scheme with existing PASS protocols and highlights additional support for multi-keyword operations and password change. Section 4 contains formal security analysis of the proposed scheme. In Sect. 5, we present our browser-based demonstrator with complete implementation of the proposed PASE functionality. This section also contains experimental results on the evaluation of performance and scalability of our implementation on commodity user devices. Section 6 concludes this paper.

2 PASE model and definitions

In this section, we model the functionality of PASE and provide definitions of its security requirements.

2.1 PASE functionality

2.1.1 Syntax of algorithms and protocols

In our PASE model, any user \(\mathtt {U}\) can perform an initial registration procedure with any two servers \(\mathtt {S}_{0}\) and \(\mathtt {S}_{1}\) in the system and then use the registered password \(\pi \) (from some dictionary \(\mathcal {D}\)) to outsource and retrieve data based on associated keywords \(w \in \mathcal {W}\). Each server \(\mathtt {S}_{d}\), \(d\in \{0,1\}\) maintains its own database where for each user it records the associated secret information \(\mathtt {info} _{d}\) obtained during the registration procedure and the outsourced data \((C, \mathtt {ix})\) obtained from multiple executions of the outsource protocol; \(C\) is used to represent a ciphertext for the keywords, whereas index \(\mathtt {ix}\) stands for the outsourced (and possibly encrypted) document that is associated with the encrypted keywords. Similar to other searchable encryption schemes (e.g., [2]) we do not explicitly model the encryption of outsourced documents and use indices \(\mathtt {ix}\in \mathcal {I}\) as placeholders for these documents.

  • \(\mathtt {Setup}(1^{\kappa })\) is an initialization algorithm that on input a security parameter \(\kappa \in \mathbb {N}\) generates public parameters \(\mathtt {par}\) of the scheme.

  • \(\mathtt {Register}\) is a registration protocol executed between some user \(\mathtt {U}\) (running interactive algorithm \(\mathtt {Register}\mathtt {U}\)) and two servers \(\mathtt {S}_{0}\) and \(\mathtt {S}_{1}\) (running interactive algorithms \(\mathtt {Register}\mathtt {S}_{d}\), \(d\in \{0,1\}\)) according to the following specification:

    • \(\mathtt {Register}\mathtt {U}(\mathtt {par},\pi ,\mathtt {S}_{0},\mathtt {S}_{1})\): on input \(\mathtt {par}\) and some password \(\pi \leftarrow \mathcal {D}\), this algorithm interacts with \(\mathtt {Register}\mathtt {S}_{d}\), \(d\in \{0,1\}\) and outputs a flag \(\mathtt {s}\in \{\mathtt {succ},\mathtt {fail}\}\). If \((\mathtt {s}=\mathtt {succ})\), the user remembers \(\pi \) and forgets all other information.

    • \(\mathtt {Register}\mathtt {S}_{d}(\mathtt {par},\mathtt {U},\mathtt {S}_{1\hbox {-}d})\): on input \(\mathtt {par}\), this algorithm interacts with \(\mathtt {Register}\mathtt {U}\) (and possibly \(\mathtt {Register}\mathtt {S}_{1\hbox {-}d}\)) and at the end of successful interaction stores some secret information \(\mathtt {info} _{d}\) associated with \(\mathtt {U}\) at \(\mathtt {S}_{d}\).

  • \(\mathtt {Outsource}\) is an outsourcing protocol executed between some user \(\mathtt {U}\) (running interactive algorithm \(\mathtt {Outsource}\mathtt {U}\)) and two servers \(\mathtt {S}_{0}\) and \(\mathtt {S}_{1}\) (running interactive algorithms \(\mathtt {Outsource}\mathtt {S}_{d}\), \(d\in \{0,1\}\)) according to the following specification:

    • \(\mathtt {Outsource}\mathtt {U}(\mathtt {par},\pi ,w,\mathtt {ix},\mathtt {S}_{0},\mathtt {S}_{1})\): on input \(\pi \), a keyword \(w\), and some index \(\mathtt {ix}\), this algorithm interacts with \(\mathtt {Outsource}\mathtt {S}_{d}\), \(d\in \{0,1\}\) and outputs a flag \(\mathtt {s}\in \{\mathtt {succ},\mathtt {fail}\}\).

    • \(\mathtt {Outsource}\mathtt {S}_{d}(\mathtt {par},\mathtt {U},\mathtt {info} _{d})\): on input \(\mathtt {info} _{d}\), this algorithm upon successful interaction with \(\mathtt {Outsource}\mathtt {U}\) (and possibly \(\mathtt {Outsource}\mathtt {S}_{1\hbox {-}d}\)) stores a record \((C, \mathtt {ix})\) in its database \(\pmb {C}_{d}\).

  • \(\mathtt {Retrieve}\) is a retrieval protocol executed between some user \(\mathtt {U}\) (running interactive algorithm \(\mathtt {Retrieve}\mathtt {U}\)) and two servers \(\mathtt {S}_{0}\) and \(\mathtt {S}_{1}\) (running interactive algorithms \(\mathtt {Retrieve}\mathtt {S}_{d}\), \(d\in \{0,1\}\)) according to the following specification:

    • \(\mathtt {Retrieve}\mathtt {U}(\mathtt {par},\pi ,w,\mathtt {S}_{0},\mathtt {S}_{1})\): on input \(\pi \) and a keyword \(w\), this algorithm upon successful interaction with \(\mathtt {Retrieve}\mathtt {S}_{d}\), \(d\in \{0,1\}\) outputs set \(\pmb {I}\) containing all \(\mathtt {ix}\) associated with \(w\).

    • \(\mathtt {Retrieve}\mathtt {S}_{d}(\mathtt {par},\mathtt {U},\mathtt {info} _{d})\): on input \(\mathtt {info} _{d}\), this algorithm interacts with \(\mathtt {Retrieve}\mathtt {U}\) (and possibly \(\mathtt {Retrieve}\mathtt {S}_{1\hbox {-}d}\)) and outputs a flag \(\mathtt {s}\in \{\mathtt {succ},\mathtt {fail}\}\).

2.1.2 Correctness

A PASE scheme is correct if for all \(\kappa \in \mathbb {N},\mathtt {ix}\in \mathcal {I},w\in \mathcal {W},\pi \in \mathcal {D}\), \(\mathtt {par}\leftarrow \mathtt {Setup}(1^{\kappa })\) the probability \(\Pr [\mathtt {ix}\in \pmb {I}]=1\) iff

$$\begin{aligned}&\langle \mathtt {succ},\mathtt {info} _{0},\mathtt {info} _{1}\rangle \leftarrow \langle \mathtt {Register}\mathtt {U}(\mathtt {par},\pi ,\mathtt {S}_{0},\mathtt {S}_{1}),\\&\quad \mathtt {Register}\mathtt {S}_{0}(\mathtt {par},\mathtt {U},\mathtt {S}_{1}),\mathtt {Register}\mathtt {S}_{1}(\mathtt {par},\mathtt {U},\mathtt {S}_{0})\rangle ;\\&\quad \langle \mathtt {succ},(C, \mathtt {ix}),(C, \mathtt {ix})\rangle \leftarrow \langle \mathtt {Outsource}\mathtt {U}(\mathtt {par},\pi ,w,\mathtt {ix},\mathtt {S}_{0},\mathtt {S}_{1}),\\&\quad \mathtt {Outsource}\mathtt {S}_{0}(\mathtt {par},\mathtt {U},\mathtt {info} _{0}),\mathtt {Outsource}\mathtt {S}_{1}(\mathtt {par},\mathtt {U},\mathtt {info} _{1})\rangle ;\\&\quad \langle \pmb {I},\mathtt {succ},\mathtt {succ}\rangle \leftarrow \langle \mathtt {Retrieve}\mathtt {U}(\mathtt {par},\pi ,w,\mathtt {S}_{0},\mathtt {S}_{1}),\\&\quad \mathtt {Retrieve}\mathtt {S}_{0}(\mathtt {par},\mathtt {U},\mathtt {info} _{0}),\mathtt {Retrieve}\mathtt {S}_{1}(\mathtt {par},\mathtt {U},\mathtt {info} _{1})\rangle ; \end{aligned}$$

In other words, the user should always be able to retrieve all indices \(\mathtt {ix}\) that were previously outsourced under some keyword \(w\) as long as this user is registered and has used its registered password \(\pi \) in these outsourcing and retrieval protocol sessions.

2.2 PASE security model

The security of PASE is defined based on two main security goals: indistinguishability against chosen keyword attacks (\(\mathtt {IND} \hbox {-}\mathtt {CKA}\)) and authentication. We adopt a BPR-like modeling approach [9] for password-based cryptographic protocols and define security through experiments (cf. Fig. 1) where a PPT adversary \(\mathcal {A}\) has full control over the communication channels and can interact with parties (controlled by a simulator) through the set of oracles defined in the following.

2.2.1 Adversarial model and oracles

For each user \(\mathtt {U}\), we allow \(\mathcal {A}\) to take full control over at most one of the two servers \(\mathtt {S}_{0}\) and \(\mathtt {S}_{1}\) that were chosen by \(\mathtt {U}\) during the registration phase to capture the required distributed trust relationship. We mostly use \(\mathtt {S}_{d}\) to denote the uncorrupted server and \(\mathtt {S}_{1\hbox {-}d}\) to denote the server controlled by the adversary. The oracles allow \(\mathcal {A}\) to invoke interactive algorithms for all protocols of PASE which will be executed (honestly) by the simulator. \(\mathcal {A}\) can interact with these algorithms and by this participate in the protocol. In particular, we allow \(\mathcal {A}\) to participate in outsourcing and retrieval protocols on behalf of some corrupted server and also as some (illegitimate) user who tries to guess the registered password during the execution of the protocol.

Let \(\pmb {\tau }\) be an initially empty array that will be populated with tuples of the form \(\pmb {\tau }[j]\leftarrow (d,\pi ,\mathtt {info} _{d})\) at the end of each successful j-th registration session such that \(\pi \) is the registered password and \(\mathtt {info} _{d}\) is the secret data stored at the server \(\mathtt {S}_{d}\) at the end of that session. We also use variables \(i^{*}\in \mathbb {Z}\), \(\mathtt {ix}^{*}\in \mathcal {I}\) and a set \(\mathtt {Set}\) that are maintained by the experiments. The adversary \(\mathcal {A}\) can access the following oracles.

  • Challenge oracle \(\mathtt {Ch_{\mathtt {ind}}}(b,\cdot ,\cdot ,\cdot ,\cdot )\): on input \((i,w_{0},w_{1},\mathtt {ix}^{*})\), the oracle aborts if \(((i^{*}\ge 0)\vee (i\ge j)\vee ((i,w_{0}) \in \mathtt {Set})\vee ((i,w_{1}) \in \mathtt {Set}))\). Otherwise, it sets \(i^{*}\leftarrow i\) and invokes oracle \(\mathtt {Out}(i^{*},w_{b},\mathtt {ix}^{*})\). Note that this oracle will be used to model \(\mathtt {IND} \hbox {-}\mathtt {CKA}\) security of PASE.

  • Registration oracle \(\mathtt {Reg}(\cdot )\): on input \(d\in \{0,1\}\), the experiment first initializes \(\pmb {C}_{d,j}\leftarrow \emptyset \) as a database for session j. Then, it randomly picks fresh \((\pi {\mathop {\leftarrow }\limits ^{\$}}\mathcal {D})\wedge ((i,\pi ,\cdot )\not \in \pmb {\tau })\) for all \(i\in [0,j-1]\). The \(\mathtt {Register}\) protocol is executed with \(\mathcal {A}\) where the oracle plays the roles of honest \(\mathtt {U}\) and \(\mathtt {S}_{d}\) executing algorithms \(\mathtt {Register}\mathtt {U}(\mathtt {par},\pi ,\mathtt {S}_{0},\mathtt {S}_{1})\) and \(\mathtt {Register}\mathtt {S}_{d}(\mathtt {par},\mathtt {U},\mathtt {S}_{1\hbox {-}d})\), respectively, and \(\mathcal {A}\) plays the role of corrupted \(\mathtt {S}_{1\hbox {-}d}\). After interactions, the experiment records \(\pmb {\tau }[j]\leftarrow (d,\pi ,\mathtt {info} _{d})\), delivers j to the adversary and increases \(j\leftarrow j+1\).

  • Outsource oracle \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\): on input \((i,w,\mathtt {ix})\), the oracle aborts if \(i\ge j\); otherwise, it obtains \((d,\pi ,\mathtt {info} _{d})\leftarrow \pmb {\tau }[i]\). The \(\mathtt {Outsource}\) protocol is then executed with \(\mathcal {A}\) where the oracle plays the roles of honest \(\mathtt {U}\) and \(\mathtt {S}_{d}\) executing algorithms \(\mathtt {Outsource}\mathtt {U}(\mathtt {par},\pi ,w,\mathtt {ix},\mathtt {S}_{0},\mathtt {S}_{1})\) and \(\mathtt {Outsource}\mathtt {S}_{d}(\mathtt {par},\mathtt {U},\mathtt {info} _{d})\), respectively, and \(\mathcal {A}\) plays the role of malicious \(\mathtt {S}_{1\hbox {-}d}\). In the \(\mathtt {Auth}\) experiment, the oracle additionally computes \(\mathtt {Set}\leftarrow \mathtt {Set}\cup (i,w,\mathtt {ix})\).

  • Outsource oracle (server only) \(\mathtt {OutS}(\cdot )\): on input i, the oracle aborts if \(i\ge j\); otherwise, it obtains \((d,\pi ,\mathtt {info} _{d})\leftarrow \pmb {\tau }[i]\). The \(\mathtt {Outsource}\) protocol is then executed with \(\mathcal {A}\) where the oracle plays the role of honest \(\mathtt {S}_{d}\) executing algorithm \(\mathtt {Outsource}\mathtt {S}_{d}(\mathtt {par},\mathtt {U},\mathtt {info} _{d})\) and \(\mathcal {A}\) plays the roles of (illegitimate) \(\mathtt {U}\) and corrupted \(\mathtt {S}_{1\hbox {-}d}\). Note that this oracle will be used to model authentication of PASE.

  • Retrieve oracle \(\mathtt {Ret}(\cdot ,\cdot )\): on input \((i,w)\), the oracle aborts if \(i\ge j\). In the \(\mathtt {IND} \hbox {-}\mathtt {CKA}\) experiment, the oracle also aborts if \(((i=i^{*})\wedge (w\in \{w_{0},w_{1}\}))\). Otherwise, it obtains the parameters \((d,\pi ,\mathtt {info} _{d})\leftarrow \pmb {\tau }[i]\). The \(\mathtt {Retrieve}\) protocol is then executed with \(\mathcal {A}\) where the oracle plays the roles of honest \(\mathtt {U}\) and \(\mathtt {S}_{d}\) executing algorithms \(\mathtt {Retrieve}\mathtt {U}(\mathtt {par},\pi ,w,\mathtt {S}_{0},\mathtt {S}_{1})\) and \(\mathtt {Retrieve}\mathtt {S}_{d}(\mathtt {par},\mathtt {U},\mathtt {info} _{d})\), respectively, and \(\mathcal {A}\) plays the role of corrupted \(\mathtt {S}_{1\hbox {-}d}\). In the \(\mathtt {IND} \hbox {-}\mathtt {CKA}\) experiment, if \((i^{*}=-1)\) the oracle additionally computes \(\mathtt {Set}\leftarrow \mathtt {Set}\cup (i,w)\).

  • Retrieve oracle (server only) \(\mathtt {RetS}(\cdot )\): on input i, the oracle aborts if \(i\ge j\); otherwise, it obtains \((d,\pi ,\mathtt {info} _{d})\leftarrow \pmb {\tau }[i]\). The \(\mathtt {Retrieve}\) protocol is then executed with \(\mathcal {A}\) where the oracle plays the role of honest \(\mathtt {S}_{d}\) executing algorithm \(\mathtt {Retrieve}\mathtt {S}_{d}(\mathtt {par},\mathtt {U},\mathtt {info} _{d})\) and \(\mathcal {A}\) plays the roles of (illegitimate) \(\mathtt {U}\) and corrupted \(\mathtt {S}_{1\hbox {-}d}\). Note that this oracle will be used to model \(\mathtt {IND} \hbox {-}\mathtt {CKA}\)-security of PASE.
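The bookkeeping that the challenge and retrieve oracles perform in the \(\mathtt {IND} \hbox {-}\mathtt {CKA}\) experiment can be sketched as follows. This is only an illustration of the abort conditions stated above: the class and method names are hypothetical, the protocol executions themselves are stubbed out, and the initial values \(j=0\), \(i^{*}=-1\) and \(\mathtt {Set}=\emptyset \) are assumptions inferred from the oracle descriptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IndCkaBookkeeping:
    """Hypothetical tracker for the state variables j, i*, Set and tau."""
    b: int                                     # challenge bit of the experiment
    j: int = 0                                 # registration sessions so far (assumed start)
    i_star: int = -1                           # challenge session, -1 = none yet (assumed start)
    w0: Optional[str] = None
    w1: Optional[str] = None
    queried: set = field(default_factory=set)  # the set "Set"
    tau: list = field(default_factory=list)    # registration records

    def reg(self, d: int) -> int:
        # Protocol run is stubbed: only a placeholder record (d, pi, info_d) is stored.
        self.tau.append((d, "<pi>", "<info_d>"))
        index = self.j
        self.j += 1
        return index

    def ret(self, i: int, w: str) -> bool:
        """Return False where the Ret oracle aborts, True where it would run Retrieve."""
        if i >= self.j:
            return False
        if i == self.i_star and w in (self.w0, self.w1):
            return False
        if self.i_star == -1:
            self.queried.add((i, w))           # recorded only before the challenge
        return True

    def challenge(self, i: int, w0: str, w1: str, ix_star: str):
        """Return None on abort, else the arguments passed on to Out."""
        if (self.i_star >= 0 or i >= self.j
                or (i, w0) in self.queried or (i, w1) in self.queried):
            return None
        self.i_star, self.w0, self.w1 = i, w0, w1
        return (i, (w0, w1)[self.b], ix_star)
```

The sketch makes the two-sided restriction visible: a keyword retrieved before the challenge cannot be used as a challenge keyword, and a challenge keyword cannot be retrieved afterwards for the challenge session.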

Fig. 1 PASE security experiments. The oracles are defined in Sect. 2.2

2.2.2 Indistinguishability against chosen keyword attacks (\(\mathtt {IND} \hbox {-}\mathtt {CKA}\))

The \(\mathtt {IND} \hbox {-}\mathtt {CKA}\) property for PASE is defined through the experiment \(Exp _{\mathtt {PASE},\mathcal {A}}^{\mathtt {IND} \hbox {-}\mathtt {CKA}\hbox {-}b}(\kappa )\) (cf. Fig. 1) and is closely related to [5] except that our setting is based on passwords. \(\mathcal {A}\) is given the public parameters \(\mathtt {par}\) and permitted to adaptively access oracles \(\mathtt {Ch_{\mathtt {ind}}}(b,\cdot ,\cdot ,\cdot ,\cdot )\), \(\mathtt {Reg}(\cdot )\), \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\), \(\mathtt {Ret}(\cdot ,\cdot )\) and \(\mathtt {RetS}(\cdot )\) at most 1, \(q_{r}\), \(q_{o}\), \(q_{t}\) and \(q_s\) times, respectively. In particular, our \(\mathtt {IND} \hbox {-}\mathtt {CKA}\) experiment captures the following ways that \(\mathcal {A}\) may try to retrieve data: (i) from interaction with an honest user \(\mathtt {U}\) and the honest server \(\mathtt {S}_{d}\) playing the role of corrupted \(\mathtt {S}_{1\hbox {-}d}\) (which is captured through the oracle \(\mathtt {Ret}(\cdot ,\cdot )\)), or (ii) from interaction with the honest server \(\mathtt {S}_{d}\) playing the role of illegitimate user, e.g., trying to guess the registered password, and the corrupted server \(\mathtt {S}_{1\hbox {-}d}\) (which is captured through the oracle \(\mathtt {RetS}(\cdot )\)).

Let \(Adv _{\mathtt {PASE},\mathcal {A}}^{\mathtt {IND} \hbox {-}\mathtt {CKA}}(\kappa ){\mathop {=}\limits ^{\mathrm {def}}}\Pr [b'=b:b'\leftarrow Exp _{\mathtt {PASE},\mathcal {A}}^{\mathtt {IND} \hbox {-}\mathtt {CKA}\hbox {-}b}(\kappa )]-\frac{1}{2}\) denote the advantage of \(\mathcal {A}\) in the \(\mathtt {IND} \hbox {-}\mathtt {CKA}\) security experiment. A PASE scheme is called \(\mathtt {IND} \hbox {-}\mathtt {CKA}\)-secure if for all PPT \(\mathcal {A}\) the advantage \(Adv _{\mathtt {PASE},\mathcal {A}}^{\mathtt {IND} \hbox {-}\mathtt {CKA}}(\kappa )\le \frac{q_s}{|\mathcal {D}|}+\epsilon (\kappa )\) where \(|\mathcal {D}|\) is the dictionary size and \(\epsilon (\kappa )\) is negligible in the security parameter \(\kappa \). Note that the term \(\frac{q_s}{|\mathcal {D}|}\) relates to the use of oracle \(\mathtt {RetS}(\cdot )\) that models on-line dictionary attacks and assumes uniform distribution of passwords within \(\mathcal {D}\), as is also common in BPR-like models.
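As a worked instance of this bound, assume a hypothetical dictionary of \(|\mathcal {D}|=10^{6}\) equally likely passwords and an adversary limited to \(q_s=100\) queries to \(\mathtt {RetS}(\cdot )\); these numbers are purely illustrative. The bound then reads

$$\begin{aligned} Adv _{\mathtt {PASE},\mathcal {A}}^{\mathtt {IND} \hbox {-}\mathtt {CKA}}(\kappa )\le \frac{100}{10^{6}}+\epsilon (\kappa )=10^{-4}+\epsilon (\kappa ), \end{aligned}$$

so the only non-negligible term stems from the 100 on-line password guesses.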

2.2.3 Authentication (\(\mathtt {Auth}\))

The property of authentication for PASE is defined using experiment \(Exp _{\mathtt {PASE},\mathcal {A}}^{\mathtt {Auth}}(\kappa )\) in Fig. 1. \(\mathcal {A}\) is given the public parameters \(\mathtt {par}\) and permitted to access oracles \(\mathtt {Reg}(\cdot )\), \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\), \(\mathtt {OutS}(\cdot )\) and \(\mathtt {Ret}(\cdot ,\cdot )\) at most \(q_{r}\), \(q_{o}\), \(q_s\) and \(q_{t}\) times, respectively. Our experiment effectively captures attacks where \(\mathcal {A}\) tries to outsource some data \(\mathtt {ix}^{*}\) on behalf of some user \(\mathtt {U}\) without knowing the registered password (via the \(\mathtt {OutS}(\cdot )\) oracle), possibly after having interacted with \(\mathtt {U}\) and the honest server \(\mathtt {S}_{d}\). In its attack on authentication, \(\mathcal {A}\) can play the role of a corrupted server \(\mathtt {S}_{1\hbox {-}d}\) and also mount man-in-the-middle attacks on sessions of \(\mathtt {Outsource}\) and \(\mathtt {Retrieve}\) protocols involving user \(\mathtt {U}\).

A PASE scheme provides authentication if for all PPT \(\mathcal {A}\) the probability \(Adv _{\mathtt {PASE},\mathcal {A}}^{\mathtt {Auth}}(\kappa )=\Pr [1\leftarrow Exp _{\mathtt {PASE},\mathcal {A}}^{\mathtt {Auth}}(\kappa )] \le \frac{q_s}{|\mathcal {D}|}+\epsilon (\kappa )\). As in the \(\mathtt {IND} \hbox {-}\mathtt {CKA}\) case, we again need to account for the possibility of online guessing attacks via the oracle \(\mathtt {OutS}(\cdot )\).

3 Our direct PASE construction

In this section, we propose a direct and efficient construction of PASE. It follows our general idea of combining suitable password-authenticated secret sharing with symmetric searchable encryption techniques. In the introduction, we explained the difficulties behind an attempt to construct PASE generically using PASS and SSE schemes and motivated our choice for a direct construction.

3.1 Cryptographic building blocks

In our PASE construction, we rely on a number of well-known cryptographic primitives that we briefly introduce in the following.

3.1.1 Pedersen commitments [37]

Let \(g,h\) be two generators of a multiplicative cyclic group \(\mathbb {G}\) of order q such that the discrete logarithm of \(h\) with respect to g is unknown. For a message \(m\in \mathbb {Z}_q^*\), the Pedersen commitment is computed as \(\mathtt {c}\leftarrow g^{r}h^{m}\) where \(r{\mathop {\leftarrow }\limits ^{\$}}\mathbb {Z}_q^*\) and is opened by providing \((r, m)\). We recall that Pedersen commitments offer computational binding based on the discrete logarithm problem, i.e., assuming \(Adv _{\mathcal {A}}^{\mathtt {DL}}(\kappa )\) is negligible, and provide perfect hiding.
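The commit and open operations can be illustrated with a minimal sketch. It assumes a toy group, namely the order-11 subgroup of \(\mathbb {Z}_{23}^*\) with generators 4 and 9, chosen here only so that the arithmetic can be checked by hand; in such a small group the discrete logarithm is trivial to compute, so the sketch offers no binding. The function names are hypothetical.

```python
import secrets

# Toy parameters (assumption, insecure): p = 2q + 1 with q = 11 prime.
# G and H are squares modulo P, hence generators of the subgroup of order Q.
P, Q = 23, 11
G, H = 4, 9


def commit(m, r=None):
    """Compute c = g^r * h^m mod p for m in Z_q^*; returns (c, r)."""
    if not 1 <= m < Q:
        raise ValueError("message must lie in Z_q^*")
    if r is None:
        r = 1 + secrets.randbelow(Q - 1)   # uniform in {1, ..., q-1}
    c = (pow(G, r, P) * pow(H, m, P)) % P
    return c, r


def open_commitment(c, r, m):
    """Check that (r, m) is a valid opening of c."""
    return c == (pow(G, r, P) * pow(H, m, P)) % P


c, r = commit(5)
print(open_commitment(c, r, 5))   # a correct opening is accepted
```

A real instantiation would use a group of cryptographic size and generators whose mutual discrete logarithm is unknown to the committer.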

3.1.2 Pseudorandom function (\(\mathtt {PRF}\)) [24, 33]

Let \(k \in \mathcal {K}_{\mathtt {PRF}}\) be a high min-entropy key in the \(\mathtt {PRF}\) key space. A pseudorandom function \(\mathtt {PRF}\) is called \((t,q,\epsilon (\kappa ))\)-secure if for any PPT algorithm \(\mathcal {A}\) running in time t with at most q oracle queries the probability \(Adv _{\mathcal {A}}^{\mathtt {PRF}}(\kappa )\le \epsilon (\kappa )\) for distinguishing the outputs of \(\mathtt {PRF}(k, m)\) from the outputs of a truly random function f of the same length, assuming that \(\mathcal {A}\) has oracle access to \(\mathcal {O}_{\mathtt {PRF}}(\cdot )\) which contains either \(\mathtt {PRF}(k, \cdot )\) or \(f(\cdot )\) and which cannot be queried on m.

3.1.3 Key derivation function (\(\mathtt {KDF}\)) [32]

Let \(\varSigma \) be a source of key material. A key derivation function \(\mathtt {KDF}\) is called \((t,q,\epsilon (\kappa ))\)-secure with respect to \(\varSigma \) if for any PPT algorithm \(\mathcal {A}\) running in time t with at most q oracle queries the probability \(Adv _{\mathcal {A}}^{\mathtt {KDF}}(\kappa )\le \epsilon (\kappa )\) for distinguishing the output of \(\mathtt {KDF}(k, c)\) from uniformly drawn random strings of the same length, assuming that \((k, \alpha ) \leftarrow \varSigma \) where k is the secret key material and \(\alpha \) is some side information. It is assumed that \(\mathcal {A}\) knows \(\alpha \), has control over the context information c and has oracle access to \(\mathtt {KDF}(k, \cdot )\) which cannot be queried on c.

3.1.4 Message authentication code [7]

A message authentication code \((\mathtt {KGen}, \mathtt {Tag}, \mathtt {Vrfy})\) comprises the following algorithms:

  • \(\mathtt {KGen}(\kappa )\): on input security parameter \(\kappa \), outputs key \(\mathtt {mk}\leftarrow \{0,1\}^\kappa \).

  • \(\mathtt {Tag}(\mathtt {mk},m)\): on input a key \(\mathtt {mk}\) and a message m, outputs tag \(\mu \leftarrow \mathtt {Tag}(\mathtt {mk},m)\).

  • \(\mathtt {Vrfy}(\mathtt {mk},m,\mu )\): on input a key \(\mathtt {mk}\), a message m and a tag \(\mu \), outputs 1 if \(\mu \) is valid or 0 otherwise.

A \(\mathtt {MAC}\) is secure if any PPT algorithm \(\mathcal {A}\) without knowledge of \(\mathtt {mk}\) has only negligible probability \(Adv _{\mathcal {A}}^{\mathtt {MAC}}(\kappa )\) to forge a tag \(\mu ^*\) for some message \(m^*\). \(\mathcal {A}\) has access to the tag oracle \(\mathcal {O}_{\mathtt {Tag}}(\cdot )\) which returns \(\mu \leftarrow \mathtt {Tag}(\mathtt {mk},m)\) on input m. The only restriction is that \(m^*\) is never queried to \(\mathcal {O}_{\mathtt {Tag}}(\cdot )\).
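As an illustration of the \((\mathtt {KGen}, \mathtt {Tag}, \mathtt {Vrfy})\) interface, the following minimal sketch instantiates it with HMAC-SHA256. The choice of HMAC and the function names `kgen`, `tag` and `vrfy` are assumptions made for this example; the construction only requires some secure MAC.

```python
import hashlib
import hmac
import secrets


def kgen(kappa_bits: int = 256) -> bytes:
    """KGen: sample a uniformly random key mk of kappa bits."""
    return secrets.token_bytes(kappa_bits // 8)


def tag(mk: bytes, m: bytes) -> bytes:
    """Tag: compute the tag mu on message m under key mk (HMAC is an assumption)."""
    return hmac.new(mk, m, hashlib.sha256).digest()


def vrfy(mk: bytes, m: bytes, mu: bytes) -> int:
    """Vrfy: return 1 if mu is a valid tag on m under mk, and 0 otherwise."""
    return 1 if hmac.compare_digest(tag(mk, m), mu) else 0
```

The comparison uses `hmac.compare_digest` so that verification time does not depend on how many tag bytes match.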

3.2 High-level design rationale

Our PASE protocol is inspired by the techniques used in the recent password-authenticated secret sharing protocol from [40], which we modified to address the functionality and requirements of PASE and extended with a suitable mechanism for symmetric searchable encryption of keywords. In particular, we define a new registration protocol \(\mathtt {Register}\) in which the user registers its password \(\pi \), encrypted in \(C_\pi \), with both servers and also picks a symmetric key \(K\) for which it computes appropriate shares \(K_{0}\) and \(K_{1}\) that are then sent to the corresponding servers. The reconstruction of \(K\) is protected by \(\pi \), and MAC tags \(\mu _{d}\) are used to ensure the validity of \(K\) upon its reconstruction. The protocols \(\mathtt {Outsource}\) and \(\mathtt {Retrieve}\) follow a similar pattern. First, the user reconstructs \(K\) using its password \(\pi \) after communication with both servers. Then, in the \(\mathtt {Outsource}\) protocol, \(\mathtt {U}\) uses \(K\) in combination with its keyword w to derive a trapdoor \(t\leftarrow \mathtt {KDF}_{2}(K,w)\) and fresh randomness e to derive the verifier \(v\leftarrow \mathtt {PRF}(t,e)\). The pair (e, v) becomes part of the outsourced ciphertext C, which is bound to some data \(\mathtt {ix}\). During the \(\mathtt {Retrieve}\) protocol, the user can recompute the trapdoor t for a given keyword w and then send it to the servers, who can then find all outsourced ciphertexts C for which \(v=\mathtt {PRF}(t,e)\) holds and hence identify which data \(\mathtt {ix}\) needs to be returned. In order to prevent servers from creating their own pairs (e, v) for a given t, the outsourced ciphertext C additionally includes a MAC tag \(\mu _c\) which authenticates (e, v) and also \(\mathtt {ix}\) and which can only be computed and verified using \(K\).
During the \(\mathtt {Retrieve}\) protocol, the user will ensure that its final search result contains only data that pass this integrity and authenticity check. In addition, both protocols make use of MACs to ensure authenticity of messages, where the MAC keys are derived from \(K\) on the user side. We emphasize that our PASE construction is in the password-only setting where servers are not required to possess any public keys for the security of the PASE scheme. However, if the registration protocol \(\mathtt {Register}\) is performed remotely over a public network, then this protocol needs to be executed over server-authenticated secure channels (e.g., TLS). In order to enable reconstruction of \(K\) by the user and to protect this phase with the password, both servers communicate with each other as part of the \(\mathtt {Outsource}\) and \(\mathtt {Retrieve}\) protocols. While in practice this communication between the two servers will likely be protected using a secure channel (e.g., TLS), we stress that in our protocols this communication can take place over an insecure channel.
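The trapdoor and verifier mechanism described above can be sketched as follows. This is only an illustration under stated assumptions: HMAC-SHA256 stands in for both \(\mathtt {KDF}_{2}\) and \(\mathtt {PRF}\), the key \(K\) is assumed to be already encoded as a byte string, and the helper names `kdf2`, `prf`, `make_pair` and `matches` are hypothetical.

```python
import hashlib
import hmac
import secrets

KAPPA_BYTES = 32  # assumed security parameter of 256 bits


def kdf2(key: bytes, keyword: str) -> bytes:
    """Stand-in for KDF_2(K, w): derive the keyword-specific trapdoor t."""
    return hmac.new(key, b"trapdoor|" + keyword.encode(), hashlib.sha256).digest()


def prf(trapdoor: bytes, e: bytes) -> bytes:
    """Stand-in for PRF(t, e): derive the verifier v from randomness e."""
    return hmac.new(trapdoor, e, hashlib.sha256).digest()


def make_pair(key: bytes, keyword: str) -> tuple[bytes, bytes]:
    """User side of Outsource: fresh e and verifier v for one keyword."""
    e = secrets.token_bytes(KAPPA_BYTES)
    return e, prf(kdf2(key, keyword), e)


def matches(trapdoor: bytes, e: bytes, v: bytes) -> bool:
    """Server side of Retrieve: test whether (e, v) belongs to trapdoor t."""
    return hmac.compare_digest(prf(trapdoor, e), v)
```

Because e is sampled afresh for every outsourced ciphertext, two ciphertexts for the same keyword carry different pairs (e, v), and a server can only link them once it has received the corresponding trapdoor.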

3.3 Detailed description

In the following, we provide a detailed description of all algorithms and protocols underlying our direct PASE scheme, along with Figs. 2 and 3 that illustrate the protocols \(\mathtt {Outsource}\) and \(\mathtt {Retrieve}\), respectively.

3.3.1 Initialization procedure \(\mathtt {Setup}(1^{\kappa })\)

The algorithm generates public parameters \(\mathtt {par}\) containing \(\{\mathbb {G},q,g,h,H,\mathtt {KDF}_{1},\mathtt {KDF}_{2},\mathtt {PRF},\mathtt {MAC}\}\), where \((\mathbb {G},q,g,h)\) represents a multiplicative cyclic group \(\mathbb {G}\) with a prime order q and generators \(g,h{\mathop {\leftarrow }\limits ^{\$}}\mathbb {G}\) such that the discrete logarithm of \(h\) with respect to base g remains unknown. \(H:\mathbb {G}\times \mathbb {G}\rightarrow \mathbb {Z}_q^*\) is a collision-resistant hash function. \(\mathtt {KDF}_{1}:\{0,1\}^{*}\rightarrow \mathcal {K}_{\mathtt {MAC}}\) and \(\mathtt {KDF}_{2}:\mathbb {G}\times \mathcal {W}\rightarrow \mathcal {K}_{\mathtt {PRF}}\) are two key derivation functions. \(\mathtt {PRF}:\mathcal {K}_{\mathtt {PRF}}\times \{0,1\}^{\kappa }\rightarrow \{0,1\}^{\kappa }\) is a pseudorandom function. \(\mathtt {MAC}= (\mathtt {KGen}, \mathtt {Tag}, \mathtt {Vrfy})\) is a message authentication code with \(\mathtt {Tag}:\mathcal {K}_{\mathtt {MAC}}\times \{0,1\}^{*}\rightarrow \{0,1\}^{\kappa }\) and \(\mathtt {Vrfy}:\mathcal {K}_{\mathtt {MAC}}\times \{0,1\}^{*}\times \{0,1\}^{\kappa }\rightarrow \{0,1\}\), where \(\mathcal {K}_{\mathtt {PRF}}\) and \(\mathcal {K}_{\mathtt {MAC}}\) are the \(\mathtt {PRF}\) and \(\mathtt {MAC}\) key spaces, respectively. We assume that passwords from \(\mathcal {D}\) are represented as elements of \(\mathbb {Z}_q^*\).

3.3.2 Registration protocol \(\mathtt {Register}\)

In order to register, a user \(\mathtt {U}\) picks \(r_{1},r_{2},x_{0},x_{1}{\mathop {\leftarrow }\limits ^{\$}}\mathbb {Z}_q^*\) and \(K,K_{0}{\mathop {\leftarrow }\limits ^{\$}}\mathbb {G}\); computes \(X\leftarrow g^{x_{0}+x_{1}}\), \(K_{1}\leftarrow X^{r_{1}}K(K_{0})^{\hbox {-}1}\) and \(C_{\pi }\leftarrow X^{r_{2}}h^{\pi }\). Then, for \(d\in \{0,1\}\), the user computes \(\mathtt {mk}_{d}\leftarrow \mathtt {KDF}_{1}(K,\mathtt {S}_{d} ,\hbox {`}1\hbox {'})\), sets \(\mathtt {info} _{d}\leftarrow (x_{d},g^{r_{1}},g^{r_{2}},C_{\pi }, K_{d},\mathtt {mk}_{d})\) and sends \(\mathtt {info} _{d}\) to each server \(\mathtt {S}_{d}\), \(d\in \{0,1\}\) over a server-authenticated secure channel. Finally, \(\mathtt {U}\) memorizes \(\pi \).
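The user-side computations of \(\mathtt {Register}\) can be sketched over a toy group as follows. The parameters (p = 2039, q = 1019, g = 4, h = 9) are assumptions chosen only so that the example runs quickly: they are far too small to be secure, and the discrete logarithm of h to base g is not hidden for such constants. The derivation of \(\mathtt {mk}_{d}\) is omitted, and the dictionary layout of \(\mathtt {info} _{d}\) is hypothetical.

```python
import secrets

# Toy parameters (assumption): p = 2q + 1 with q prime; g and h are squares
# mod p and therefore generate the subgroup of prime order q.
P, Q, G, H = 2039, 1019, 4, 9


def rand_exp() -> int:
    """Sample a uniform element of Z_q^*."""
    return secrets.randbelow(Q - 1) + 1


def rand_elem() -> int:
    """Sample a uniform element of the order-q subgroup."""
    return pow(G, secrets.randbelow(Q), P)


def register(pi: int) -> tuple[dict, dict, int]:
    """User side of Register; returns (info_0, info_1, K) without mk_d."""
    r1, r2, x0, x1 = (rand_exp() for _ in range(4))
    K, K0 = rand_elem(), rand_elem()
    X = pow(G, x0 + x1, P)
    K1 = pow(X, r1, P) * K * pow(K0, -1, P) % P   # K_1 = X^{r_1} K K_0^{-1}
    C_pi = pow(X, r2, P) * pow(H, pi, P) % P      # C_pi = X^{r_2} h^{pi}
    shared = {"g_r1": pow(G, r1, P), "g_r2": pow(G, r2, P), "C_pi": C_pi}
    info0 = {"x": x0, "K_share": K0, **shared}
    info1 = {"x": x1, "K_share": K1, **shared}
    return info0, info1, K
```

Neither server alone learns \(K\): the share \(K_{0}\) is uniform, and \(K_{1}\) is masked by both \(K_{0}\) and \(X^{r_{1}}\), which depends on the exponent held by the other server.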

3.3.3 Outsourcing protocol \(\mathtt {Outsource}\)

Fig. 2 The \(\mathtt {Outsource}\) protocol between user \(\mathtt {U}\) and server \(\mathtt {S}_{d}\). The server-side algorithm includes communication between servers \(\mathtt {S}_{d}\) and \(\mathtt {S}_{1\hbox {-}d}\)

The \(\mathtt {Outsource}\) protocol between the user \(\mathtt {U}\) and each server \(\mathtt {S}_{d}\), \(d\in \{0,1\}\) is illustrated in Fig. 2, and its steps are detailed in the following. Note that as part of the \(\mathtt {Outsource}\) protocol both \(\mathtt {S}_{0}\) and \(\mathtt {S}_{1}\) communicate with each other, possibly over an insecure channel.

  1. User \(\mathtt {U}\) randomly selects \(a{\mathop {\leftarrow }\limits ^{\$}}\mathbb {Z}_q^*\), \(e{\mathop {\leftarrow }\limits ^{\$}}\{0,1\}^{\kappa }\) and sends \(A\leftarrow g^{a}h^{\pi }\) to both servers.

  2. On input \(A\), server \(\mathtt {S}_{d}\) executes the following steps:

    (a) Pick \(s _{d},y_{d}{\mathop {\leftarrow }\limits ^{\$}}\mathbb {Z}_q^*\), compute \(Y_{d}\leftarrow g^{y_{d}}, R_{d}\leftarrow (g^{r_{2}})^{y_{d}}\).

    (b) Send Pedersen commitment \(\mathtt {c}_{d}\leftarrow g^{s _{d}}h^{H(Y_{d},R_{d})}\) to server \(\mathtt {S}_{1\hbox {-}d}\) and wait for its response \(\mathtt {c}_{1\hbox {-}d}\).

    (c) Send the opening \((Y_{d},R_{d},s _{d})\) to server \(\mathtt {S}_{1\hbox {-}d}\) and wait for its response \((Y_{1\hbox {-}d},R_{1\hbox {-}d},s _{1\hbox {-}d})\). If \(\mathtt {c}_{1\hbox {-}d}\ne g^{s _{1\hbox {-}d}}h^{H(Y_{1\hbox {-}d},R_{1\hbox {-}d})}\), then abort.

    (d) Send \((Y,Z_{d},\mu _{d})\) to \(\mathtt {U}\) where \(Y\leftarrow Y_{0}Y_{1}\), \(R\leftarrow R_{0}R_{1}\), \(\mu _{d}\leftarrow \mathtt {Tag}(\mathtt {mk}_{d},(A,Y,Z_{d}))\) and \(Z_{d}\leftarrow K_{d}(C_{\pi }A^{\hbox {-}1})^{y_{d}}(g^{r_{1}}R)^{\hbox {-}x_{d}}\).

  3. Upon receiving \((Y,Z_{0},\mu _{0})\) and \((Y,Z_{1},\mu _{1})\) from both servers, user \(\mathtt {U}\) executes the following steps:

    (a) If \(\mathtt {Vrfy}(\mathtt {mk}_{d},(A,Y,Z_{d}),\mu _{d})=0\) for any \(d\in \{0,1\}\), then abort, else compute \(K\leftarrow Z_{0}Z_{1}Y^{a}\).

    (b) Compute \(t\leftarrow \mathtt {KDF}_{2}(K,w)\), \(v\leftarrow \mathtt {PRF}(t,e)\), \(\mathtt {mk}_{u}\leftarrow \mathtt {KDF}_{1}(K,\mathtt {U} ,\hbox {`}0\hbox {'})\), \(\mu _{c}\leftarrow \mathtt {Tag}(\mathtt {mk}_{u},(e,v,\mathtt {ix}))\) and \(C\leftarrow (e,v,\mu _{c})\).

    (c) Send \(((C, \mathtt {ix}),\mu _{\mathtt {sk}_{d}})\) to server \(\mathtt {S}_{d}\), \(d\in \{0,1\}\) where \(\mu _{\mathtt {sk}_{d}}\leftarrow \mathtt {Tag}(\mathtt {sk}_{d},(C, \mathtt {ix}))\) using \(\mathtt {sk}_{d}\leftarrow \mathtt {KDF}_{1}(\mathtt {mk}_{d},A,Y ,\hbox {`}2\hbox {'})\).

  4. On input \(((C, \mathtt {ix}),\mu _{\mathtt {sk}_{d}})\), server \(\mathtt {S}_{d}\) stores \((C, \mathtt {ix})\) in its database \(\pmb {C}_{d}\) if \(\mathtt {Vrfy}(\mathtt {sk}_{d},(C, \mathtt {ix}),\mu _{\mathtt {sk}_{d}})=1\) for \(\mathtt {sk}_{d}\leftarrow \mathtt {KDF}_{1}(\mathtt {mk}_{d},A,Y ,\hbox {`}2\hbox {'})\), else \(\mathtt {S}_{d}\) aborts.
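The commit-then-open exchange between the two servers in steps 2(a) to 2(c) can be sketched as follows. The toy group constants, the SHA-256 based stand-in for \(H\) and the helper names are assumptions for illustration only; in particular, with such small parameters neither the binding property nor the collision resistance of \(H\) holds in any meaningful sense.

```python
import hashlib
import secrets

P, Q, G, H = 2039, 1019, 4, 9  # toy group (assumption), not secure


def hash_to_zq(Y: int, R: int) -> int:
    """Stand-in for H: G x G -> Z_q^*, built from SHA-256."""
    digest = hashlib.sha256(f"{Y}|{R}".encode()).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1


def commit(Y: int, R: int) -> tuple[int, int]:
    """Return (c, s) with c = g^s * h^{H(Y, R)} and opening randomness s."""
    s = secrets.randbelow(Q - 1) + 1
    c = pow(G, s, P) * pow(H, hash_to_zq(Y, R), P) % P
    return c, s


def check_opening(c: int, Y: int, R: int, s: int) -> bool:
    """Recompute the commitment from the opening (Y, R, s) and compare."""
    return c == pow(G, s, P) * pow(H, hash_to_zq(Y, R), P) % P


def server_round(g_r2: int) -> tuple[int, int, int, int, int]:
    """Steps 2(a) and 2(b) for one server: sample y_d and commit to (Y_d, R_d)."""
    y = secrets.randbelow(Q - 1) + 1
    Y, R = pow(G, y, P), pow(g_r2, y, P)
    c, s = commit(Y, R)
    return y, Y, R, c, s
```

Each server would send `c` first, wait for the other server's commitment, and only then reveal `(Y, R, s)`; the receiving side aborts when `check_opening` returns `False`, which mirrors the abort condition in step 2(c).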

3.3.4 Retrieval protocol \(\mathtt {Retrieve}\)

Fig. 3 The \(\mathtt {Retrieve}\) protocol between \(\mathtt {U}\) and \(\mathtt {S}_{d}\). The server-side algorithm includes communication between servers \(\mathtt {S}_{d}\) and \(\mathtt {S}_{1\hbox {-}d}\)

The \(\mathtt {Retrieve}\) protocol between the user \(\mathtt {U}\) and each server \(\mathtt {S}_{d}\), \(d\in \{0,1\}\) is illustrated in Fig. 3, and its steps are detailed in the following. Note that as part of the \(\mathtt {Retrieve}\) protocol both \(\mathtt {S}_{0}\) and \(\mathtt {S}_{1}\) communicate with each other, possibly over an insecure channel.

  1. User \(\mathtt {U}\) randomly selects \(a{\mathop {\leftarrow }\limits ^{\$}}\mathbb {Z}_q^*\) and sends \(A\leftarrow g^{a}h^{\pi }\) to both servers.

  2. On input \(A\), server \(\mathtt {S}_{d}\) executes the following steps:

    (a) Pick \(s _{d},y_{d}{\mathop {\leftarrow }\limits ^{\$}}\mathbb {Z}_q^*\), compute \(Y_{d}\leftarrow g^{y_{d}}, R_{d}\leftarrow (g^{r_{2}})^{y_{d}}\).

    (b) Send Pedersen commitment \(\mathtt {c}_{d}\leftarrow g^{s _{d}}h^{H(Y_{d},R_{d})}\) to server \(\mathtt {S}_{1\hbox {-}d}\) and wait for its response \(\mathtt {c}_{1\hbox {-}d}\).

    (c) Send the opening \((Y_{d},R_{d},s _{d})\) to server \(\mathtt {S}_{1\hbox {-}d}\) and wait for its response \((Y_{1\hbox {-}d},R_{1\hbox {-}d},s _{1\hbox {-}d})\). If \(\mathtt {c}_{1\hbox {-}d}\ne g^{s _{1\hbox {-}d}}h^{H(Y_{1\hbox {-}d},R_{1\hbox {-}d})}\), then abort.

    (d) Send \((Y,Z_{d},\mu _{d})\) to \(\mathtt {U}\) where \(Y\leftarrow Y_{0}Y_{1}\), \(R\leftarrow R_{0}R_{1}\), \(\mu _{d}\leftarrow \mathtt {Tag}(\mathtt {mk}_{d},(A,Y,Z_{d}))\) and \(Z_{d}\leftarrow K_{d}(C_{\pi }A^{\hbox {-}1})^{y_{d}}(g^{r_{1}}R)^{\hbox {-}x_{d}}\).

  3. Upon receiving \((Y,Z_{0},\mu _{0})\) and \((Y,Z_{1},\mu _{1})\) from both servers, \(\mathtt {U}\) executes the following steps:

    (a) If \(\mathtt {Vrfy}(\mathtt {mk}_{d},(A,Y,Z_{d}),\mu _{d})=0\) for any \(d\in \{0,1\}\), then abort, else compute \(K\leftarrow Z_{0}Z_{1}Y^{a}\).

    (b) Compute \(t\leftarrow \mathtt {KDF}_{2}(K,w)\) and \(\mu _{\mathtt {sk}_{d}}\leftarrow \mathtt {Tag}(\mathtt {sk}_{d},t)\) using \(\mathtt {sk}_{d}\leftarrow \mathtt {KDF}_{1}(\mathtt {mk}_{d},A,Y ,\hbox {`}2\hbox {'})\). Send \((t,\mu _{\mathtt {sk}_{d}})\) to \(\mathtt {S}_{d}\), \(d\in \{0,1\}\).

  4. On input \((t,\mu _{\mathtt {sk}_{d}})\), server \(\mathtt {S}_{d}\) executes the following steps:

    (a) Compute \(\mathtt {sk}_{d}\leftarrow \mathtt {KDF}_{1}(\mathtt {mk}_{d},A,Y ,\hbox {`}2\hbox {'})\). If \(\mathtt {Vrfy}(\mathtt {sk}_{d},t,\mu _{\mathtt {sk}_{d}})=0\), then abort.

    (b) Initialize set \(\pmb {A}_{d}\leftarrow \emptyset \). For all \((C, \mathtt {ix})\in \pmb {C}_{d}\), parse \((e,v,\mu _{c})\leftarrow C\) and add \((C, \mathtt {ix})\) to \(\pmb {A}_{d}\) if \(v=\mathtt {PRF}(t,e)\). Finally, send \(\pmb {A}_{d}\) to \(\mathtt {U}\).

  5. Upon receiving \(\pmb {A}_{0}\) and \(\pmb {A}_{1}\), user \(\mathtt {U}\) initializes an empty set \(\pmb {I}\leftarrow \emptyset \). Then, for all \((C, \mathtt {ix})\in (\pmb {A}_{0}\cup \pmb {A}_{1})\), it parses \((e,v,\mu _{c})\leftarrow C\) and adds \(\mathtt {ix}\) to \(\pmb {I}\) if \(v=\mathtt {PRF}(t,e)\) and \(\mathtt {Vrfy}(\mathtt {mk}_{u},(e,v,\mathtt {ix}),\mu _{c})=1\), where \(\mathtt {mk}_{u}\leftarrow \mathtt {KDF}_{1}(K,\mathtt {U} ,\hbox {`}0\hbox {'})\) is derived as in the \(\mathtt {Outsource}\) protocol. This step guarantees that only outsourced data for which the integrity check was performed successfully will be added to the output set \(\pmb {I}\).
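Steps 4(b) and 5 can be sketched as a linear scan on the server followed by a filter on the user side. HMAC-SHA256 as a stand-in for \(\mathtt {PRF}\) and \(\mathtt {Tag}\), the length-prefixed encoding of \((e,v,\mathtt {ix})\) and the function names are assumptions for this example; the database is modelled as a list of `((e, v, mu_c), ix)` entries.

```python
import hashlib
import hmac
import secrets


def prf(t: bytes, e: bytes) -> bytes:
    """Stand-in for PRF(t, e)."""
    return hmac.new(t, e, hashlib.sha256).digest()


def tag(mk: bytes, e: bytes, v: bytes, ix: bytes) -> bytes:
    """Stand-in for Tag(mk_u, (e, v, ix)); fields are length-prefixed."""
    msg = b"".join(len(x).to_bytes(4, "big") + x for x in (e, v, ix))
    return hmac.new(mk, msg, hashlib.sha256).digest()


def server_search(db, t):
    """Step 4(b): return every stored (C, ix) whose verifier matches t."""
    return [(C, ix) for (C, ix) in db if hmac.compare_digest(C[1], prf(t, C[0]))]


def user_filter(answers, t, mk_u):
    """Step 5: keep ix only if both the verifier and the tag mu_c check out."""
    result = set()
    for (e, v, mu_c), ix in answers:
        verifier_ok = hmac.compare_digest(v, prf(t, e))
        tag_ok = hmac.compare_digest(mu_c, tag(mk_u, e, v, ix))
        if verifier_ok and tag_ok:
            result.add(ix)
    return result
```

A server that knows t can fabricate a matching pair (e, v), but without \(\mathtt {mk}_{u}\) it cannot produce a valid \(\mu _{c}\), so such entries are dropped by `user_filter`.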

3.3.5 Correctness of our PASE scheme

In the following, we illustrate that if the initially registered password \(\pi \) is used by the user in the executions of the \(\mathtt {Outsource}\) and \(\mathtt {Retrieve}\) protocols then computing \(Z_{0}Z_{1}Y^{a}\) in the key reconstruction phase results in the original key \(K\):

$$\begin{aligned} Z_{0}Z_{1}Y^{a}=&K_{0}(C_{\pi }A^{\hbox {-}1})^{y_{0}}(g^{r_{1}}R)^{\hbox {-}x_{0}} \cdot \\&K_{1}(C_{\pi }A^{\hbox {-}1})^{y_{1}}(g^{r_{1}}R)^{\hbox {-}x_{1}}\cdot g^{a(y_{0}+y_{1})}\\ =&X^{r_{1}}K(X^{r_{2}}g^{\hbox {-}a})^{y_{0}+y_{1}}(g^{r_{1}} R)^{\hbox {-}(x_{0}+x_{1})}g^{a(y_{0}+y_{1})}\\ =&X^{r_{1}} KX^{r_{2}(y_{0}+y_{1})}X^{\hbox {-}(r_{1})}X^{\hbox {-}r_{2}(y_{0}+y_{1})}=K\end{aligned}$$
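The derivation can also be checked numerically. The sketch below runs the registration and key reconstruction computations over a toy group (p = 2039, q = 1019, g = 4, h = 9, all of which are assumptions and far too small for real use) and returns the registered key together with the key reconstructed by the user. Commitments and MACs are left out since they do not affect the algebra.

```python
import secrets

P, Q, G, H = 2039, 1019, 4, 9  # toy group (assumption), not secure


def rnd() -> int:
    """Sample a uniform element of Z_q^*."""
    return secrets.randbelow(Q - 1) + 1


def reconstruct(pi_registered: int, pi_used: int) -> tuple[int, int]:
    """Run Register and the key-reconstruction steps; return (K, K')."""
    # Register
    r1, r2, x0, x1 = rnd(), rnd(), rnd(), rnd()
    K, K0 = pow(G, rnd(), P), pow(G, rnd(), P)
    X = pow(G, x0 + x1, P)
    K1 = pow(X, r1, P) * K * pow(K0, -1, P) % P
    C_pi = pow(X, r2, P) * pow(H, pi_registered, P) % P
    g_r1, g_r2 = pow(G, r1, P), pow(G, r2, P)
    # Step 1: the user blinds the password it typed in
    a = rnd()
    A = pow(G, a, P) * pow(H, pi_used, P) % P
    # Step 2: the servers contribute y_d and compute Z_d
    y = [rnd(), rnd()]
    Y = pow(G, y[0] + y[1], P)
    R = pow(g_r2, y[0] + y[1], P)
    base = C_pi * pow(A, -1, P) % P
    Z = [
        Kd * pow(base, yd, P) * pow(g_r1 * R % P, -xd, P) % P
        for Kd, yd, xd in ((K0, y[0], x0), (K1, y[1], x1))
    ]
    # Step 3(a): the user combines both responses
    return K, Z[0] * Z[1] * pow(Y, a, P) % P
```

With a wrong password the factor \(h^{(\pi -\pi ')(y_{0}+y_{1})}\) does not cancel, so the user obtains an unrelated group element except when \(y_{0}+y_{1}\equiv 0 \pmod q\), which is a noticeable event only because the toy group is tiny.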

3.4 Efficiency analysis and improvements

3.4.1 Efficiency comparison with existing PASS protocols

Given that our direct PASE construction follows the general idea of building PASE protocols based on the techniques used for password-authenticated secret sharing, we compare its performance with existing PASS protocols. Since our PASE scheme assumes the password-only setting (except for the registration), we restrict our comparison to password-only PASS schemes [4, 26, 27, 28, 40] and compare only the costs that arise from the sharing and retrieval of the symmetric key \(K\). Note that in our PASE scheme sharing of \(K\) is performed as part of the \(\mathtt {Register}\) protocol, whereas retrieval of \(K\) is part of both \(\mathtt {Outsource}\) and \(\mathtt {Retrieve}\) protocols and is accomplished in step 3(a) of these protocols. Since our PASE scheme adopts a two-server architecture, but the aforementioned PASS schemes were designed for a more general t-out-of-n threshold setting, we consider their costs for the special case of \(t=n=2\) to ease the comparison. The results of the comparison are presented in Table 1.

Table 1 Efficiency comparison of our PASE key reconstruction phase with password-only PASS schemes

We compare computation costs through the number of modular exponentiations for the user and each of the servers during the sharing and retrieval phases of the symmetric key \(K\). We also compare communication costs in the number of bits communicated in both phases, while considering user–server and server–server communications. For the lengths of elements in \(\mathbb {G}\) and \(\mathbb {Z}_q^*\), we use \(|\mathbb {G}|=q\) and \(|q|=\kappa \) bits, respectively. We also compare the number of rounds needed for the sharing and retrieval of \(K\).

We observe that in terms of computation and communication costs, the key sharing and reconstruction phases in our PASE scheme compare fairly well with those of existing PASS protocols. In particular, only the protocols from [27, 28], which are the most computationally efficient PASS protocols to date, offer better overall computation and communication performance. We stress, however, that for PASE protocols the efficiency of the retrieval phase is of greater importance than that of the sharing phase. This is because in PASE sharing of \(K\) is performed only once as part of the registration procedure, but retrieval of \(K\) occurs each time the user wants to outsource data or search for keywords. Furthermore, due to the simplified key management (i.e., reliance on passwords only), PASE offers device-agnostic use of the functionality to the user and can possibly be executed on different client devices (ranging from desktops to smartphones). In this case, it becomes important to keep the costs associated with computations on the user side and the user-server communication low. Considering this, we observe that in comparison with [27, 28] our PASE scheme achieves similar and even partly better performance for computations and communication involving the user device.

As a result of our comparison, we conclude that our PASE scheme is sufficiently practical since the additional costs arising from the encrypted keyword search functionality within our PASE protocols are negligible (due to the nature of computations involved) in comparison with the costly key sharing and retrieval steps.

3.4.2 PASE with sublinear search complexities

Our PASE construction supports all CRUD operations required for a database, but its search complexity is \(\mathcal {O}(D)\) for a database DB of size D, whilst state-of-the-art schemes achieve a better bound of \(\mathcal {O}(\log {}D)\). The search complexity of our PASE can be decreased using the techniques from [16, 19, 38], yet at the cost of some security and/or functionality limitations. For instance, using the techniques from [16, 19] would require limiting PASE functionality to static databases, losing dynamic updates. The latter can be preserved with an ORAM but at a higher cost of \(\mathcal {O}(D\log {}D)\) for periodic oblivious sorting [38].

The state-of-the-art approach in [19] uses dynamic databases with limited updates. We adopt it here because it offers the best trade-off between efficiency and functionality. Currently, within each \(\mathtt {Outsource}\) round we outsource one keyword \(w\) associated with some document \(\mathtt {ix}\). Using [19], within each \(\mathtt {Outsource}\) round we can outsource a batch of documents by treating them as a static database DB. The optimization is achieved by constructing a look-up table T in the setup phase which holds pointers to locations of the documents in DB such that the table inputs depend on the document keyword \(w\). This restricts the functionality to dynamic databases that allow only the addition of documents but not their removal, as the latter would require an update of the look-up table resulting in worse than linear complexity.

In order to extend the \(\mathtt {Outsource}\) protocol (cf. Fig. 2) to outsource a database \(DB = \{( w_i, \mathtt {ix}_i) | 1 \le i \le N\}\), for all unique keywords \(w_i\) in DB, we compose a list \(L_i = \{ (w_i , \mathtt {ind}(\mathtt {ix}_{i_j}), \mathtt {ix}_{i_j}) | (w_i , \mathtt {ix}_{i_j}) \in DB , 1 \le j \le n_i \}\) where \(\mathtt {ind}(\mathtt {ix}_{i_j})\) is the index of the file in DB and \(|L_i| = n_i\). The protocol is then executed for each tuple \((w_i , \mathtt {ind}(\mathtt {ix}_{i_j}), \mathtt {ix}_{i_j} )\), followed by the computation of an additional key \(o_{i_j} \leftarrow \mathtt {KDF}_{2}(t_i, j)\) and the look-up table entry \(T[o_{i_j}] := \mathtt {ind}(\mathtt {ix}_{i_j})\). Once these values are calculated for all entries in DB, the entire encrypted database \(\pmb {C}_{d}\) is sent along with the look-up table T to each server. Notice that \(\pmb {C}_{d}\) preserves the same order of elements from DB, and \(\mathtt {ind}(\cdot )\) should give the same location for both \(\pmb {C}_{d}\) and DB. The \(\mathtt {Retrieve}\) protocol is performed in the same way (cf. Fig. 3), except that each server receives \((t, \mu _{\mathtt {sk}_{d}}, {\{o_{i_j}\}}_{j = 1}^{n_i} )\) for \(|L_i| = n_i\). Servers use the keys \({\{o_{i_j}\}}_{j = 1}^{n_i}\) to identify entries \({\{ (C_i, \mathtt {ix}_{i_j})\}}_{j=1}^{n_i}\) in \(\pmb {C}_{d}\) based on \(\mathtt {ind}(\mathtt {ix}_{i_j})\) from the look-up table T. To stop adversaries from trivially differentiating based on the list size, [19] extends DB to \(DB^*\) with dummy documents, such that all lists have the same size \(|L_i| = n\), for \(n = \max _i\{n_i\}\). The resulting version of PASE would achieve the lower search bound of \(\mathcal {O}(\log {}D)\) but have the aforementioned limitation on the removal of documents.
It intuitively satisfies the same security guarantees as the original version based on the fact that each outsource operation can be seen as an outsource of a new independent static database.
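The look-up table idea can be sketched as follows. HMAC-SHA256 as a stand-in for \(\mathtt {KDF}_{2}\), the byte encoding of the counter j and the function names are assumptions for this illustration. Padding with dummy documents and the encryption of the entries themselves are omitted, and the sketch is not meant to reproduce the exact data structure of [19].

```python
import hashlib
import hmac
from collections import defaultdict


def kdf2(key: bytes, label: bytes) -> bytes:
    """Stand-in for KDF_2, used for t_i <- KDF_2(K, w_i) and o_ij <- KDF_2(t_i, j)."""
    return hmac.new(key, label, hashlib.sha256).digest()


def build_table(K: bytes, db: list[tuple[str, str]]) -> dict[bytes, int]:
    """Build T with T[o_ij] = ind(ix_ij) for a static batch db of (w, ix) pairs."""
    counters: dict[str, int] = defaultdict(int)
    table: dict[bytes, int] = {}
    for ind, (w, _ix) in enumerate(db):
        counters[w] += 1  # j runs from 1 to n_i for each keyword w_i
        t = kdf2(K, w.encode())
        table[kdf2(t, counters[w].to_bytes(4, "big"))] = ind
    return table


def lookup_keys(K: bytes, w: str, n: int) -> list[bytes]:
    """User side of Retrieve: the keys o_i1, ..., o_in for keyword w."""
    t = kdf2(K, w.encode())
    return [kdf2(t, j.to_bytes(4, "big")) for j in range(1, n + 1)]


def server_lookup(table: dict[bytes, int], keys: list[bytes]) -> list[int]:
    """Server side: resolve each key with one table access instead of a scan."""
    return [table[o] for o in keys if o in table]
```

For a batch `[("a", "d0"), ("b", "d1"), ("a", "d2")]`, the keys derived for keyword `"a"` resolve to positions 0 and 2, so the server touches only the matching entries rather than evaluating the PRF on every stored ciphertext.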

3.5 Extensions with multiple keywords

In the given specification of our PASE construction, users can use only one keyword w in each execution of \(\mathtt {Outsource}\) and \(\mathtt {Retrieve}\) protocols at a time. Often, users may want to be able to outsource or search for documents associated with multiple keywords. Our PASE scheme can be extended to provide efficient support for multiple keywords. Let \(\pmb {w}=(w_1,\ldots ,w_n)\) be a set of outsourced keywords for some document \(\mathtt {ix}\) and let \(\pmb {w}'=(w'_1,\ldots ,w'_m)\) be a set of searched keywords. In the following, we show how to support (i) outsourcing of \(\mathtt {ix}\) with \(\pmb {w}\) through a single session of the \(\mathtt {Outsource}\) protocol and (ii) search for all suitable documents \(\mathtt {ix}\) using \(\pmb {w}'\) through a single session of the \(\mathtt {Retrieve}\) protocol, based on three different types of search queries [11]: conjunctive queries (\(\pmb {w}=\pmb {w}'\)), disjunctive queries \((|\pmb {w}\cap \pmb {w}'|>0)\), and those for a subset of keywords (\(\pmb {w}'\subseteq \pmb {w}\)).

3.5.1 Outsourcing documents with multiple keywords

In order to outsource some document \(\mathtt {ix}\) associated with multiple keywords \(\pmb {w} = (w_1, \ldots ,w_n)\), user \(\mathtt {U}\) can compute \(\pmb {v}=(v_1,\ldots ,v_n)\), \(t_i \leftarrow \mathtt {KDF}_{2}(K, w_i)\) and \(v_i \leftarrow \mathtt {PRF}(t_i, e)\) for \(i = 1, \cdots , n\), and \(\mu _{c}\leftarrow \mathtt {Tag}(\mathtt {mk}_{u},(e, \pmb {v}))\) as part of the same \(\mathtt {Outsource}\) execution and outsource \(C \leftarrow (e, \pmb {v},\mu _{c})\) as the resulting ciphertext to both servers.

3.5.2 Search queries with multiple keywords

In order to search for documents using multiple keywords, i.e., \(w'_1 , \ldots , w'_m\), \(m\le n\), within a single execution of the \(\mathtt {Retrieve}\) protocol, user \(\mathtt {U}\) can send a set of authenticated trapdoors \(t_i = \mathtt {KDF}_{2}(K, w'_i)\) for all searched keywords \(w'_i\), \(i = 1,\cdots , m\) to both servers. Then, server \(\mathtt {S}_{d}\) can initialize an empty output set \(\pmb {A}_{d}\) and, for all \((C, \mathtt {ix})=(e,\pmb {v},\mu _{c},\mathtt {ix})\) stored in the database \(\pmb {C}_{d}\), compute \(\pmb {v}'=(v'_1 ,\cdots ,v'_m)\) where \(v'_i = \mathtt {PRF}(t_i, e)\), \(i = 1, \ldots , m\), and update the output set \(\pmb {A}_{d}\leftarrow \pmb {A}_{d}\cup (C, \mathtt {ix})\) according to the following conditions, depending on the type of search query, i.e.,

  • for conjunctive queries \(w'_1 \wedge \ldots \wedge w'_m\): if \(\pmb {v}=\pmb {v}'\)

  • for disjunctive queries \(w'_1 \vee \ldots \vee w'_m\): if \(|\pmb {v}\cap \pmb {v}'| > 0\)

  • for subset queries \((w'_1, \ldots , w'_m)\subseteq \pmb {w}\): if \(\pmb {v}'\subseteq \pmb {v}\).
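The three matching conditions can be sketched as set operations over verifier values. HMAC-SHA256 as a stand-in for \(\mathtt {PRF}\) and the function name `match` are assumptions, and the sketch compares \(\pmb {v}\) and \(\pmb {v}'\) as unordered sets, which is one possible reading of the conditions above.

```python
import hashlib
import hmac


def prf(t: bytes, e: bytes) -> bytes:
    """Stand-in for PRF(t, e)."""
    return hmac.new(t, e, hashlib.sha256).digest()


def match(query_type: str, trapdoors: list[bytes], e: bytes, v: list[bytes]) -> bool:
    """Decide whether ciphertext (e, v) satisfies the multi-keyword query."""
    stored = set(v)                               # verifiers in the ciphertext
    searched = {prf(t, e) for t in trapdoors}     # v' recomputed by the server
    if query_type == "conjunctive":               # v = v'
        return stored == searched
    if query_type == "disjunctive":               # v and v' share an element
        return bool(stored & searched)
    if query_type == "subset":                    # v' is contained in v
        return searched <= stored
    raise ValueError(f"unknown query type: {query_type}")
```

Since all \(v_i\) of one ciphertext share the same randomness e, the server evaluates the PRF m times per stored ciphertext regardless of the query type.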

3.6 Password change

Our PASE scheme allows users to change their passwords without changing the encryption key \(K\). This property is crucial since otherwise all outsourced keywords would need to be re-encrypted. In the following, we describe how a user can change the current password \(\pi \) to a new password \(\pi ^{*}\), depending on whether the user still knows \(\pi \) or has forgotten it.

3.6.1 Changing known passwords

A new password \(\pi ^{*}\) can be registered with the knowledge of the current \(\pi \) as follows:

  1. User \(\mathtt {U}\) sends \(A\leftarrow g^{a}h^{\pi }\) to both servers (as in \(\mathtt {Outsource}\) and \(\mathtt {Retrieve}\)). Each server \(\mathtt {S}_{d}\), \(d\in \{0,1\}\) uses its \(\mathtt {info} _{d}\) to respond with \((Y,Z_{d},g^{r_2},C_{\pi },\mu _{d})\) where \(\mu _{d}\leftarrow \mathtt {Tag}(\mathtt {mk}_{d},(Y,Z_{d},g^{r_2},C_{\pi }))\).

  2. Upon reconstructing \(\mathtt {mk}_{d}\) as in \(\mathtt {Outsource}\) and \(\mathtt {Retrieve}\) protocols and verifying \(\mu _{d}\), the user picks random \(r^{*}{\mathop {\leftarrow }\limits ^{\$}}\mathbb {Z}_q^*\), computes \(C_{\pi ^{*}}\leftarrow (C_{\pi }h^{\hbox {-}\pi })^{r^{*}}h^{\pi ^{*}}\) and \(\mu _{d}^{*}\leftarrow \mathtt {Tag}(\mathtt {mk}_{d},(g^{r_2})^{r^{*}},C_{\pi ^{*}})\), and sends \((g^{r_2r^{*}},C_{\pi ^{*}},\mu _{d}^{*})\) to both servers.

  3. If \(\mathtt {Vrfy}(\mathtt {mk}_{d},(g^{r_2r^{*}},C_{\pi ^{*}}),\mu _{d}^{*})=1\), then each \(\mathtt {S}_{d}\) replaces \((g^{r_2},C_{\pi })\) in its \(\mathtt {info} _{d}\) with \((g^{r_2r^{*}},C_{\pi ^{*}})\).

Note that the current \(\pi \) is used implicitly to authenticate the user toward both servers.
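To see why the servers can keep using the replaced pair, recall that \(C_{\pi }= X^{r_{2}}h^{\pi }\) from the registration. Substituting this into the computation of the user gives

$$\begin{aligned} C_{\pi ^{*}}=(C_{\pi }h^{\hbox {-}\pi })^{r^{*}}h^{\pi ^{*}}=(X^{r_{2}})^{r^{*}}h^{\pi ^{*}}=X^{r_{2}r^{*}}h^{\pi ^{*}}, \end{aligned}$$

so \((g^{r_{2}r^{*}},C_{\pi ^{*}})\) has the same form as the originally registered pair \((g^{r_{2}},C_{\pi })\), with \(r_{2}\) replaced by \(r_{2}r^{*}\) and \(\pi \) replaced by \(\pi ^{*}\), while \(X\), \(K_{d}\) and \(x_{d}\) remain untouched.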

3.6.2 Changing forgotten passwords

The above procedure cannot be executed if the user has forgotten its current password \(\pi \). In this case, the user can no longer implicitly authenticate itself during the password change procedure. Since our PASE construction relies only on passwords, we naturally need to assume some alternative fall-back authentication mechanism (e.g., similar to those used on the web) that would be able to distinguish legitimate users from potential impersonators. We assume that such a fall-back authentication mechanism is in place and allows the user to independently set up secure channels with each of the two servers \(\mathtt {S}_{d}\), \(d\in \{0,1\}\). The establishment of such channels still leaves us with the challenge of registering a new password \(\pi ^{*}\) for that user without changing the previously registered encryption key \(K\). We observe that upon the initial registration the encryption key \(K\) satisfies the equation \(K_{0}K_{1}= X^{r_{1}}K\) where \(X= g^{x_{0}+x_{1}}\) and \((K_{d}, x_{d})\) is known only to the corresponding \(\mathtt {S}_{d}\). Moreover, the current password \(\pi \) is encrypted in the ElGamal ciphertext \((g^{r_{2}}, C_{\pi }= X^{r_{2}}h^{\pi })\) stored on both servers. In the following password change protocol, this ElGamal ciphertext is replaced with \((g^{r_{2}^{*}}, C_{\pi ^{*}}= X^{r_{2}^{*}}h^{\pi ^{*}})\) for the new password \(\pi ^{*}\) such that the underlying base \(X\) remains unchanged:

  1. Each server \(\mathtt {S}_{d}\), \(d\in \{0,1\}\), computes \(X_{d}\leftarrow g^{x_{d}}\) using \(x_{d}\) from \(\mathtt {info} _{d}\) and sends \(X_{d}\) to \(\mathtt {U}\) over the previously established secure channel.

  2. User \(\mathtt {U}\) picks random \(r_{2}^{*}{\mathop {\leftarrow }\limits ^{\$}}\mathbb {Z}_q^*\), computes \(C_{\pi ^{*}}\leftarrow (g^{x_{0}}g^{x_{1}})^{r_{2}^{*}}h^{\pi ^{*}}\) and responds with \((g^{r_{2}^{*}},C_{\pi ^{*}})\) to \(\mathtt {S}_{d}\).

  3. Each server \(\mathtt {S}_{d}\) replaces \((g^{r_{2}}, C_{\pi })\) with \((g^{r_{2}^{*}},C_{\pi ^{*}})\) in its \(\mathtt {info} _{d}\).

This password change protocol can be seen as a compressed version of the registration protocol. Anticipating the analysis in Sect. 4, we observe that the newly registered password \(\pi ^{*}\) remains protected against an adversary who can compromise at most one of the two servers under the same assumptions as the old password \(\pi \).
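As a sanity check, the key reconstruction from Sect. 3.3.5 can be repeated for the replaced ciphertext. Assuming the user now sends \(A\leftarrow g^{a}h^{\pi ^{*}}\) and the servers use \((g^{r_{2}^{*}},C_{\pi ^{*}})\), so that \(R=g^{r_{2}^{*}(y_{0}+y_{1})}\), the same cancellation goes through with \(r_{2}\) replaced by \(r_{2}^{*}\):

$$\begin{aligned} Z_{0}Z_{1}Y^{a}=&X^{r_{1}}K(X^{r_{2}^{*}}g^{\hbox {-}a})^{y_{0}+y_{1}}(g^{r_{1}}R)^{\hbox {-}(x_{0}+x_{1})}g^{a(y_{0}+y_{1})}\\ =&X^{r_{1}}KX^{r_{2}^{*}(y_{0}+y_{1})}X^{\hbox {-}r_{1}}X^{\hbox {-}r_{2}^{*}(y_{0}+y_{1})}=K\end{aligned}$$

Hence the shares \(K_{0}\) and \(K_{1}\) and the key \(K\) do not need to be touched when the password is replaced.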

4 Security analysis

In the following, we prove the security of our direct PASE scheme using our definitions from Sect. 2.2. In the proofs, we adopt the standard game-hopping technique. Let \(\mathtt {succ}_n\) denote the event that the adversary wins in the experiment n.

4.1 \(\mathtt {IND} \hbox {-}\mathtt {CKA}\)-security of our PASE scheme

Theorem 1

Our direct PASE construction is \(\mathtt {IND} \hbox {-}\mathtt {CKA}\)-secure assuming the hardness of the \(\mathtt {DDH}\) problem and security of \(\mathtt {KDF}_{1}\), \(\mathtt {KDF}_{2}\), \(\mathtt {PRF}\) and \(\mathtt {MAC}\).

Proof

Experiment \(Exp_{0}^{\mathtt {IND}}\). The simulator initializes \(\pmb {\tau },i^{*},j, \mathtt {Set}\) and \(\mathtt {par}\leftarrow \{\mathbb {G},q,g,h,H,\mathtt {KDF}_{1},\mathtt {KDF}_{2},\mathtt {PRF},\mathtt {MAC}\}\) as defined in the real security experiment \(Exp _{\mathtt {PASE},\mathcal {A}}^{\mathtt {IND} \hbox {-}\mathtt {CKA}\hbox {-}b}(\kappa )\). The oracles \(\mathtt {Ch_{\mathtt {ind}}}(b,\cdot ,\cdot ,\cdot ,\cdot )\), \(\mathtt {Reg}(\cdot )\), \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\), \(\mathtt {Ret}(\cdot ,\cdot )\) and \(\mathtt {RetS}(\cdot )\) are implemented as follows.

  • \(\mathtt {Ch_{\mathtt {ind}}}(b,\cdot ,\cdot ,\cdot ,\cdot )\): on input \((i,w_{0},w_{1},\mathtt {ix}^{*})\), the oracle aborts if \(((i^{*}\ge 0)\vee (i\ge j)\vee ((i,w_{0}) \in \mathtt {Set})\vee ((i,w_{1}) \in \mathtt {Set}))\); otherwise, it sets \(i^{*}\leftarrow i\) and invokes oracle \(\mathtt {Out}(i^{*},w_{b},\mathtt {ix}^{*})\).

  • \(\mathtt {Reg}(\cdot )\): on input \(d\in \{0,1\}\), the simulator randomly selects fresh \(\pi {\mathop {\leftarrow }\limits ^{\$}}\mathcal {D}\) and \(K{\mathop {\leftarrow }\limits ^{\$}}\mathbb {G}\) and initializes an empty database \(\pmb {C}_{d,j}\). The simulator and \(\mathcal {A}\) complete the \(\mathtt {Register}\) protocol, where the simulator plays the roles of \(\mathtt {U}\) and \(\mathtt {S}_{d}\), and \(\mathcal {A}\) plays the role of \(\mathtt {S}_{1\hbox {-}d}\). The oracle sends j to \(\mathcal {A}\) as a session identifier. Finally, it records \(\pmb {\tau }[j]\leftarrow (d, \pi , \mathtt {info} _{d}, r_{2}, x_{1\hbox {-}d})\), \(\mathtt {info} _{d}\leftarrow (\mathtt {S}_{1\hbox {-}d},x_{d},g^{r_{1}},g^{r_{2}},C_{\pi }, K_{d},\mathtt {mk}_{d})\), increments \(j\leftarrow j+1\), and stores \(r_{2}\) and \(x_{1\hbox {-}d}\) for later use in the proof.

  • \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\): on input \((i,w,\mathtt {ix})\), the simulator aborts if \((i\ge j)\); otherwise, it obtains \((d, \pi , \mathtt {info} _{d}, r_{2}, x_{1\hbox {-}d})\leftarrow \pmb {\tau }[i]\). Then, the simulator plays the roles of \(\mathtt {U}\) and \(\mathtt {S}_{d}\) and interacts with \(\mathcal {A}\) who plays the role of \(\mathtt {S}_{1\hbox {-}d}\) in the \(\mathtt {Outsource}\) protocol.

  • \(\mathtt {Ret}(\cdot ,\cdot )\): on input \((i,w)\), the simulator aborts if \((i\ge j)\vee ((i=i^{*})\wedge (w\in \{w_{0},w_{1}\}))\); otherwise, it obtains \((d, \pi , \mathtt {info} _{d}, r_{2}, x_{1\hbox {-}d})\leftarrow \pmb {\tau }[i]\). Then, it plays the roles of \(\mathtt {U}\) and \(\mathtt {S}_{d}\) and interacts with \(\mathcal {A}\) who plays the role of \(\mathtt {S}_{1\hbox {-}d}\) in the \(\mathtt {Retrieve}\) protocol. Finally, the simulator computes \(\mathtt {Set}\leftarrow \mathtt {Set}\cup (i,w)\) if \((i^{*}=-1)\).

  • \(\mathtt {RetS}(\cdot )\): on input i, the simulator aborts if \((i\ge j)\); otherwise, it obtains parameters \((d, \pi , \mathtt {info} _{d}, r_{2}, x_{1\hbox {-}d})\leftarrow \pmb {\tau }[i]\) and executes \(\mathtt {Retrieve}\mathtt {S}_{d}(\mathtt {par},\mathtt {U},\mathtt {info} _{d})\).

\(\square \)

Lemma 1

\(Adv _{\mathtt {PASE},\mathcal {A}}^{\mathtt {IND} \hbox {-}\mathtt {CKA}}(\kappa )=\Pr [\mathtt {succ}_{0}^{\mathtt {IND}}]-1/2 \)

Experiment \(Exp_{1}^{\mathtt {IND}}\). This experiment is similar to \(Exp_{0}^{\mathtt {IND}}\) except that the simulator aborts if some value for \(y_{d}\) used on behalf of honest server \(\mathtt {S}_{d}\) appears in two different protocol sessions through oracles \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\), \(\mathtt {Ret}(\cdot ,\cdot )\) and \(\mathtt {RetS}(\cdot )\).

Lemma 2

\(\Pr [\mathtt {succ}_{1}^{\mathtt {IND}}]=\Pr [\mathtt {succ}_{0}^{\mathtt {IND}}]\)

Experiment \(Exp_{2}^{\mathtt {IND}}\). This experiment is similar to \(Exp_{1}^{\mathtt {IND}}\) except that the simulator aborts if some value for \(Y\) appears in two different protocol sessions executed through oracles \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\), \(\mathtt {Ret}(\cdot ,\cdot )\) and \(\mathtt {RetS}(\cdot )\).

  1. 1.

    By the perfect hiding property of Pedersen commitments, value \(Y_{1\hbox {-}d}\) is guaranteed to be independent from \(Y_{d}\) because the adversary acquires nothing from \(\mathtt {c}_{d}\).

  2. 2.

    Due to the binding property of Pedersen commitments, which is based on the hardness of the DL problem, it is hard to open \(\mathtt {c}_{1\hbox {-}d}\) to a different \(Y_{1\hbox {-}d}'\ne Y_{1\hbox {-}d}\).

Since \(Y_{1\hbox {-}d}\) is guaranteed to be independent from \(Y_{d}\); and \(Y_{d}\) is fresh, we can follow that \(Y\) is fresh based on the hardness of the DL problem.

Lemma 3

\(|\Pr [\mathtt {succ}_{1}^{\mathtt {IND}}]-\Pr [\mathtt {succ}_{2}^{\mathtt {IND}}]|\le Adv _{\mathcal {A}}^{\mathtt {DL}}(\kappa )\)
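The perfect hiding property used in the argument for Lemma 3 can be checked exhaustively in a toy group. The following sketch is only an illustration: the parameters \(p=23\), \(q=11\), \(g=4\) and \(h=9\) are hypothetical and far too small for real use, and the helper names `modPow`, `commit` and `commitmentSet` are not taken from the PASE code base. It enumerates all commitments to two different messages and checks that both messages produce the same set of values, so a single commitment reveals nothing about the committed message.

```javascript
// Toy Pedersen commitment in the order-11 subgroup of Z_23^* (illustrative parameters only).
const p = 23n, q = 11n, g = 4n, h = 9n;

// Square-and-multiply modular exponentiation on BigInt values.
function modPow(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

// commit(m, r) = g^m * h^r mod p
function commit(m, r) {
  return (modPow(g, m, p) * modPow(h, r, p)) % p;
}

// Collect every commitment to m over all possible randomness r in Z_q.
function commitmentSet(m) {
  const values = new Set();
  for (let r = 0n; r < q; r++) values.add(commit(m, r).toString());
  return values;
}

const setA = commitmentSet(3n);
const setB = commitmentSet(7n);
// True when both messages induce exactly the same set of commitments.
const sameDistribution = setA.size === setB.size && [...setA].every((c) => setB.has(c));
console.log(sameDistribution);
```

Binding, in contrast, is only computational: anyone who knows \(\log _{g}h\) could open a commitment to a different message, which is why the lemma carries a DL term.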

Experiment \(Exp_{3}^{\mathtt {IND}}\). This experiment is similar to \(Exp_{2}^{\mathtt {IND}}\) except that in oracles \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\), \(\mathtt {Ret}(\cdot ,\cdot )\) and \(\mathtt {RetS}(\cdot )\), the message \((Z_{d},\mu _{d})\) from the honest server \(\mathtt {S}_{d}\) to the user is replaced with \((E,\mu _{d}')\) where \(E{\mathop {\leftarrow }\limits ^{\$}}\mathbb {G}\) and \(\mu _{d}'\leftarrow \mathtt {Tag}(\mathtt {mk}_{d},A,Y,E)\). We discuss the following two cases:

  1. For the oracles \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\) and \(\mathtt {Ret}(\cdot ,\cdot )\), let \((g,g^{\alpha },g^{\beta },Q)\) be an instance of the \(\mathtt {DDH}\) problem; the simulator aims to output 1 if \(Q=g^{\alpha \beta }\) and 0 otherwise. The simulator sets \(A\leftarrow g^{\alpha }h^{\pi }\), \(Y_{d}\leftarrow g^{\beta }\), \(R_{d}\leftarrow (g^{\beta })^{r_{2}}\) and

    $$\begin{aligned} Z_{d}\leftarrow K_{d}(g^{\beta })^{r_{2}(x_{0}+x_{1})}Q^{\hbox {-}1}(g^{r_{1}} \cdot (g^{\beta })^{r_{2}} \cdot R_{1\hbox {-}d})^{\hbox {-}x_{d}} \end{aligned}$$

    If \(Q=g^{\alpha \beta }\), this experiment is identical to \(Exp_{2}^{\mathtt {IND}}\); otherwise, it is identical to \(Exp_{3}^{\mathtt {IND}}\). The hardness of the DDH problem implies the indistinguishability of \(Exp_{2}^{\mathtt {IND}}\) from \(Exp_{3}^{\mathtt {IND}}\).

  2. For the oracle \(\mathtt {RetS}(\cdot )\), assume \(\pi '\) is the password tried by \(\mathcal {A}\). The key \(K\) (in \(Exp_{2}^{\mathtt {IND}}\)) is equal to \(Z_{0}Z_{1}Y^{a}h^{(\pi -\pi ')(y_{0}+y_{1})}\); under the \(\mathtt {DDH}\) assumption, the adversary cannot distinguish \(h^{(\pi -\pi ')(y_{0}+y_{1})}\) (in \(Exp_{2}^{\mathtt {IND}}\)) from a random element of \(\mathbb {G}\) (in \(Exp_{3}^{\mathtt {IND}}\)) unless \(\pi '=\pi \), which denotes a successful on-line dictionary attack. By the uniform distribution of passwords, its probability is estimated as \(q_s\cdot Adv _{\mathcal {A}}^{\mathtt {DDH}}(\kappa )+\frac{q_s}{|\mathcal {D}|}\).

Lemma 4

\(|\Pr [\mathtt {succ}_{2}^{\mathtt {IND}}]-\Pr [\mathtt {succ}_{3}^{\mathtt {IND}}]|\le (q_s+1)Adv _{\mathcal {A}}^{\mathtt {DDH}}(\kappa )+\frac{q_s}{|\mathcal {D}|}\)

Experiment \(Exp_{4}^{\mathtt {IND}}\). This experiment is similar to \(Exp_{3}^{\mathtt {IND}}\) except that in each session i, values \(\mathtt {mk}_{u}\leftarrow \mathtt {KDF}_{1}(K,\mathtt {U} ,\hbox {`}0\hbox {'})\), \(\mathtt {mk}_{d}\leftarrow \mathtt {KDF}_{1}(K,\mathtt {S}_{d} ,\hbox {`}1\hbox {'})\), \(\mathtt {mk}_{1\hbox {-}d}\leftarrow \mathtt {KDF}_{1}(K,\mathtt {S}_{1\hbox {-}d} ,\hbox {`}1\hbox {'}) \) are replaced with \(\mathtt {mk}_{u}\leftarrow F_{1}(i,\mathtt {U},\hbox {`}0\hbox {'})\), \(\mathtt {mk}_{d}\leftarrow F_{1}(i,\mathtt {S}_{d},\hbox {`}1\hbox {'})\) and \(\mathtt {mk}_{1\hbox {-}d}\leftarrow F_{1}(i,\mathtt {S}_{1\hbox {-}d},\hbox {`}1\hbox {'})\), respectively. A table \(T_{1}\) is initialized to be empty in the beginning of \(Exp_{4}^{\mathtt {IND}}\). The deterministic function \(F_{1}:\{0,1\}^{*}\rightarrow \mathcal {K}_{\mathtt {MAC}}\) is defined as follows: if \(\exists (i, id, k ,\mathtt {mk})\in T_{1}\), then \(F_{1}(i, id, k)\) returns \(\mathtt {mk}\); otherwise, the simulator randomly picks a fresh \(\mathtt {mk}{\mathop {\leftarrow }\limits ^{\$}}\mathcal {K}_{\mathtt {MAC}}\), stores \((i, id, k ,\mathtt {mk})\) in \(T_{1}\) and returns \(\mathtt {mk}\), where fresh means that no record of the form \((\cdot ,\cdot ,\cdot ,\mathtt {mk})\in T_{1}\) exists so far. Since \(\mathcal {A}\) only acquires \(\mathtt {mk}_{1\hbox {-}d}\), by the uniform distribution of \(K\) and the security of \(\mathtt {KDF}_{1}\), we obtain

Lemma 5

\(|\Pr [\mathtt {succ}_{3}^{\mathtt {IND}}]-\Pr [\mathtt {succ}_{4}^{\mathtt {IND}}]|\le q_{r}\cdot Adv _{\mathcal {A}}^{\mathtt {KDF}}(\kappa )\)
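The table-based function \(F_{1}\) used in \(Exp_{4}^{\mathtt {IND}}\) is a lazily sampled random function. A minimal sketch of this bookkeeping is given below, assuming Node's built-in `crypto` module as the randomness source; the name `makeLazyRandomFunction` and the 32-byte key length are hypothetical choices for illustration and are not part of the proof.

```javascript
const crypto = require('crypto');

// Returns a function F(i, id, k) that behaves like the table-based F_1:
// repeated inputs yield the stored value, new inputs yield a fresh random key.
function makeLazyRandomFunction(keyBytes = 32) {
  const table = new Map(); // plays the role of T_1: (i, id, k) -> mk
  const used = new Set();  // all mk values handed out so far
  return function F(i, id, k) {
    const label = JSON.stringify([i, id, k]);
    if (table.has(label)) return table.get(label);
    let mk;
    do {
      // Resample until mk is fresh, i.e. not yet stored for any other input.
      mk = crypto.randomBytes(keyBytes).toString('hex');
    } while (used.has(mk));
    used.add(mk);
    table.set(label, mk);
    return mk;
  };
}

const F1 = makeLazyRandomFunction();
const mkU = F1(0, 'U', '0');
const mkS = F1(0, 'S_d', '1');
```

The same pattern, with inputs \((i,w)\) and outputs in \(\mathcal {K}_{\mathtt {PRF}}\), would model a table-based replacement for \(\mathtt {KDF}_{2}\).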

Experiment \(Exp_{5}^{\mathtt {IND}}\). This experiment is similar to \(Exp_{4}^{\mathtt {IND}}\) except that in each session i of oracles \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\) and \(\mathtt {Ret}(\cdot ,\cdot )\), value \(t\leftarrow \mathtt {KDF}_{2}(K,w)\) is replaced with \(t\leftarrow F_{2}(i,w)\). \(T_{2}\) is initialized as an empty table in the beginning of \(Exp_{5}^{\mathtt {IND}}\). \(F_{2}\) returns \(t\) if \(\exists (i,w,t)\in T_{2}\); otherwise, \(F_{2}\) picks a fresh \(t{\mathop {\leftarrow }\limits ^{\$}}\mathcal {K}_{\mathtt {PRF}}\), stores \((i,w,t)\) in \(T_{2}\) and returns \(t\), where fresh means that no record of the form \((\cdot ,\cdot ,t)\) exists in \(T_{2}\). By the uniform distribution of \(K\) and the security of \(\mathtt {KDF}_{2}\), we have

Lemma 6

\(|\Pr [\mathtt {succ}_{4}^{\mathtt {IND}}]-\Pr [\mathtt {succ}_{5}^{\mathtt {IND}}]|\le (q_{o}+q_{t})Adv _{\mathcal {A}}^{\mathtt {KDF}}(\kappa )\)

Experiment \(Exp_{6}^{\mathtt {IND}}\). This experiment is similar to \(Exp_{5}^{\mathtt {IND}}\) except for one of the following cases:

  1. For the oracle \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\), the adversary successfully forges \(((C, \mathtt {ix}),\mu _{\mathtt {sk}_{d}})\) which satisfies \(\mathtt {Vrfy}(\mathtt {sk}_{d},(C, \mathtt {ix}),\mu _{\mathtt {sk}_{d}})=1\).

  2. For the oracles \(\mathtt {Ret}(\cdot ,\cdot )\) or \(\mathtt {RetS}(\cdot )\), the adversary successfully forges \((t,\mu _{\mathtt {sk}_{d}})\) which satisfies \(\mathtt {Vrfy}(\mathtt {sk}_{d},t,\mu _{\mathtt {sk}_{d}})=1\).

By the unforgeability of \(\mathtt {MAC}\), we have

Lemma 7

\(|\Pr [\mathtt {succ}_{5}^{\mathtt {IND}}]-\Pr [\mathtt {succ}_{6}^{\mathtt {IND}}]|\le (q_{o}+q_{t}+q_s)Adv _{\mathcal {A}}^{\mathtt {MAC}}(\kappa )\)

Experiment \(Exp_{7}^{\mathtt {IND}}\). This experiment is similar to \(Exp_{6}^{\mathtt {IND}}\) except that in oracles \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\) and \(\mathtt {Ret}(\cdot ,\cdot )\), the value \(v\) is set in a different way. Let \(\mathcal {O}_{\mathtt {PRF}}(\cdot )\) be the oracle from the security experiment of the pseudorandom function \(\mathtt {PRF}\); and let \(T_{v}\) be initialized as an empty table in the beginning of \(Exp_{7}^{\mathtt {IND}}\). When the simulator needs to compute \(v\leftarrow \mathtt {PRF}(t,e)\) in session i, it obtains \(v\) using table \(T_{v}\). If \(\exists (i,t,e,r_{v},v)\in T_{v}\), the simulator uses \(v\) from \(T_{v}\); otherwise, it randomly picks \(r_{v} {\mathop {\leftarrow }\limits ^{\$}}\{0,1\}^{\kappa }\), stores \((i,t,e,r_{v},\mathcal {O}_{\mathtt {PRF}}(r_{v}))\) in \(T_{v}\) and obtains \(v\leftarrow \mathcal {O}_{\mathtt {PRF}}(r_{v})\). Assuming the pseudorandomness of \(\mathtt {PRF}\), we have

Lemma 8

\(\Pr [\mathtt {succ}_{7}^{\mathtt {IND}}]\le 1/2 +(q_{o}+q_{t})Adv _{\mathcal {A}}^{\mathtt {PRF}}(\kappa ) \)

As a consequence, based on Lemmas 1 to 8 we can conclude that our proposed PASE construction is \(\mathtt {IND} \hbox {-}\mathtt {CKA}\)-secure assuming the intractability of the \(\mathtt {DDH}\) problem and security of \(\mathtt {KDF}_{1}\), \(\mathtt {KDF}_{2}\), \(\mathtt {PRF}\) and \(\mathtt {MAC}\).
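For concreteness, the individual bounds can be combined into a single inequality. The following is a sketch obtained by reading Lemmas 2 to 7 as bounds between consecutive experiments and chaining them with the triangle inequality, starting from Lemma 1 and ending with Lemma 8; it is not stated in this form in the proof itself:

$$\begin{aligned} Adv _{\mathtt {PASE},\mathcal {A}}^{\mathtt {IND} \hbox {-}\mathtt {CKA}}(\kappa )\le {}&Adv _{\mathcal {A}}^{\mathtt {DL}}(\kappa )+(q_s+1)Adv _{\mathcal {A}}^{\mathtt {DDH}}(\kappa )+\frac{q_s}{|\mathcal {D}|}+(q_{r}+q_{o}+q_{t})Adv _{\mathcal {A}}^{\mathtt {KDF}}(\kappa )\\ &+(q_{o}+q_{t}+q_s)Adv _{\mathcal {A}}^{\mathtt {MAC}}(\kappa )+(q_{o}+q_{t})Adv _{\mathcal {A}}^{\mathtt {PRF}}(\kappa ) \end{aligned}$$

The term \(\frac{q_s}{|\mathcal {D}|}\) accounts for on-line password guessing. The DL term is covered by the \(\mathtt {DDH}\) assumption, since the hardness of DDH implies the hardness of DL in the same group.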

4.2 Authentication property of our PASE scheme

Theorem 2

Our proposed PASE construction provides authentication based on the hardness of the \(\mathtt {DDH}\) problem and security of \(\mathtt {KDF}_{1}\), \(\mathtt {KDF}_{2}\) and \(\mathtt {MAC}\).

Proof

Experiment \(Exp_{0}^{\mathtt {Auth}}\). The simulator initializes \(\pmb {\tau },j, \mathtt {Set}\) and \(\mathtt {par}\leftarrow \{\mathbb {G},q,g,h,H, \mathtt {KDF}_{1}, \mathtt {KDF}_{2}, \mathtt {PRF}, \mathtt {MAC}\}\) as defined in the real security experiment \(Exp _{\mathtt {PASE},\mathcal {A}}^{\mathtt {Auth}}(\kappa )\). The oracles \(\mathtt {Reg}(\cdot )\), \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\), \(\mathtt {OutS}(\cdot )\) and \(\mathtt {Ret}(\cdot ,\cdot )\) are executed by the simulator as follows.

  • \(\mathtt {Reg}(\cdot )\): on input \(d\in \{0,1\}\), the simulator randomly selects a fresh \(\pi {\mathop {\leftarrow }\limits ^{\$}}\mathcal {D}\) and \(K{\mathop {\leftarrow }\limits ^{\$}}\mathbb {G}\) and initializes an empty database \(\pmb {C}_{d,j}\). Then, the simulator and \(\mathcal {A}\) execute the \(\mathtt {Register}\) protocol, where the simulator plays the role of \(\mathtt {U},\mathtt {S}_{d}\) and \(\mathcal {A}\) plays the role of \(\mathtt {S}_{1\hbox {-}d}\). The simulator then sends j to \(\mathcal {A}\) as a session identifier. Finally, the simulator records \(\pmb {\tau }[j]\leftarrow (d, \pi , \mathtt {info} _{d}, r_{2}, x_{1\hbox {-}d})\), \(\mathtt {info} _{d}\leftarrow (\mathtt {S}_{1\hbox {-}d},x_{d},g^{r_{1}},g^{r_{2}},C_{\pi }, K_{d},\mathtt {mk}_{d})\), increments \(j\leftarrow j+1\), and stores \(r_{2}\) and \(x_{1\hbox {-}d}\) for later use in the proof.

  • \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\): on input \((i,w,\mathtt {ix})\), the simulator aborts if \((i\ge j)\); otherwise, it obtains \((d, \pi , \mathtt {info} _{d}, r_{2}, x_{1\hbox {-}d})\leftarrow \pmb {\tau }[i]\). Then, it sets \(\mathtt {Set}\leftarrow \mathtt {Set}\cup \{(i,w,\mathtt {ix})\}\). Finally, it plays the roles of \(\mathtt {U}\) and \(\mathtt {S}_{d}\) and interacts with \(\mathcal {A}\), who plays the role of \(\mathtt {S}_{1\hbox {-}d}\) in the \(\mathtt {Outsource}\) protocol.

  • \(\mathtt {OutS}(\cdot )\): on input i, the simulator aborts if \((i\ge j)\); otherwise, it obtains parameters \((d, \pi , \mathtt {info} _{d}, r_{2}, x_{1\hbox {-}d})\leftarrow \pmb {\tau }[i]\) and executes \(\mathtt {Outsource}\mathtt {S}_{d}(\mathtt {par},\mathtt {U},\mathtt {info} _{d})\).

  • \(\mathtt {Ret}(\cdot ,\cdot )\): on input \((i,w)\), the simulator aborts if \((i\ge j)\); otherwise, it obtains parameters \((d, \pi , \mathtt {info} _{d}, r_{2}, x_{1\hbox {-}d})\leftarrow \pmb {\tau }[i]\). Then, it plays the roles of \(\mathtt {U}\) and \(\mathtt {S}_{d}\) and interacts with \(\mathcal {A}\) who plays the role of \(\mathtt {S}_{1\hbox {-}d}\) in the \(\mathtt {Retrieve}\) protocol.

\(\square \)

Lemma 9

\(Adv _{\mathtt {PASE},\mathcal {A}}^{\mathtt {Auth}}(\kappa )=\Pr [\mathtt {succ}_{0}^{\mathtt {Auth}}]\)

Experiment \(Exp_{1}^{\mathtt {Auth}}\). This experiment is similar to \(Exp_{0}^{\mathtt {Auth}}\) except that the value \(y_{d}\) is ensured to be fresh in every session executed by the simulator through the oracles \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\), \(\mathtt {OutS}(\cdot )\) and \(\mathtt {Ret}(\cdot ,\cdot )\).

Lemma 10

\(\Pr [\mathtt {succ}_{0}^{\mathtt {Auth}}]=\Pr [\mathtt {succ}_{1}^{\mathtt {Auth}}]\)

Experiment \(Exp_{2}^{\mathtt {Auth}}\). This experiment is similar to \(Exp_{1}^{\mathtt {Auth}}\) except that the simulator aborts if a value for \(Y\) repeats in two different sessions of the protocol executed by the simulator through oracles \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\), \(\mathtt {OutS}(\cdot )\), and \(\mathtt {Ret}(\cdot ,\cdot )\).

  1. By the perfect hiding of Pedersen commitments, values of \(Y_{1\hbox {-}d}\) are guaranteed to be independent from \(Y_{d}\) because the adversary acquires nothing from \(\mathtt {c}_{d}\).

  2. Because of the binding property of Pedersen commitments, which is based on the hardness of the DL problem, it is hard to open \(\mathtt {c}_{1\hbox {-}d}\) to a different value \(Y_{1\hbox {-}d}'\ne Y_{1\hbox {-}d}\).

Since \(Y_{1\hbox {-}d}\) is guaranteed to be independent from \(Y_{d}\) and \(Y_{d}\) is fresh, the freshness of \(Y\) is implied by the hardness of the DL problem.

Lemma 11

\(|\Pr [\mathtt {succ}_{1}^{\mathtt {Auth}}]-\Pr [\mathtt {succ}_{2}^{\mathtt {Auth}}]|\le Adv _{\mathcal {A}}^{\mathtt {DL}}(\kappa )\)

Experiment \(Exp_{3}^{\mathtt {Auth}}\). This experiment is similar to \(Exp_{2}^{\mathtt {Auth}}\) except that in oracles \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\), \(\mathtt {Ret}(\cdot ,\cdot )\) and \(\mathtt {OutS}(\cdot )\), the message \((Z_{d},\mu _{d})\) from the honest server \(\mathtt {S}_{d}\) to the user is replaced with \((E,\mu _{d}')\) where \(E{\mathop {\leftarrow }\limits ^{\$}}\mathbb {G}\) and \(\mu _{d}'\leftarrow \mathtt {Tag}(\mathtt {mk}_{d},A,Y,E)\). We consider the following two cases:

  1. For oracles \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\) and \(\mathtt {Ret}(\cdot ,\cdot )\), let \((g,g^{\alpha },g^{\beta },Q)\) be an instance of the \(\mathtt {DDH}\) problem; the simulator aims to output 1 if \(Q=g^{\alpha \beta }\) and 0 otherwise. The simulator sets \(A\leftarrow g^{\alpha }h^{\pi }\), \(Y_{d}\leftarrow g^{\beta }\), \(R_{d}\leftarrow (g^{\beta })^{r_{2}}\) and

    $$\begin{aligned} Z_{d}\leftarrow K_{d}(g^{\beta })^{r_{2}(x_{0}+x_{1})}Q^{\hbox {-}1}(g^{r_{1}} \cdot (g^{\beta })^{r_{2}} \cdot R_{1\hbox {-}d})^{\hbox {-}x_{d}} \end{aligned}$$

    If \(Q=g^{\alpha \beta }\), then this experiment is identical to \(Exp_{2}^{\mathtt {Auth}}\); otherwise, it is identical to \(Exp_{3}^{\mathtt {Auth}}\). The hardness of the DDH problem directly implies the indistinguishability of \(Exp_{2}^{\mathtt {Auth}}\) from \(Exp_{3}^{\mathtt {Auth}}\).

  2. For the oracle \(\mathtt {OutS}(\cdot )\), assume \(\pi '\) is a password used by the adversary. The key \(K\) (in \(Exp_{2}^{\mathtt {Auth}}\)) is equal to \(Z_{0}Z_{1}Y^{a}h^{(\pi -\pi ')(y_{0}+y_{1})}\); under the \(\mathtt {DDH}\) assumption, the adversary cannot distinguish \(h^{(\pi -\pi ')(y_{0}+y_{1})}\) (in \(Exp_{2}^{\mathtt {Auth}}\)) from a random element of \(\mathbb {G}\) (in \(Exp_{3}^{\mathtt {Auth}}\)) unless \(\pi '=\pi \), which denotes a successful on-line dictionary attack. By the uniform distribution of passwords, its probability is estimated as \(q_s\cdot Adv _{\mathcal {A}}^{\mathtt {DDH}}(\kappa )+\frac{q_s}{|\mathcal {D}|}\).

Lemma 12

\(|\Pr [\mathtt {succ}_{2}^{\mathtt {Auth}}]-\Pr [\mathtt {succ}_{3}^{\mathtt {Auth}}]|\le (q_s+1)Adv _{\mathcal {A}}^{\mathtt {DDH}}(\kappa )+\frac{q_s}{|\mathcal {D}|}\)

Experiment \(Exp_{4}^{\mathtt {Auth}}\). This experiment is similar to \(Exp_{3}^{\mathtt {Auth}}\) except that in each session i, values for \(\mathtt {mk}_{u}\leftarrow \mathtt {KDF}_{1}(K,\mathtt {U} ,\hbox {`}0\hbox {'})\), \(\mathtt {mk}_{d}\leftarrow \mathtt {KDF}_{1}(K,\mathtt {S}_{d} ,\hbox {`}1\hbox {'})\), \(\mathtt {mk}_{1\hbox {-}d}\leftarrow \mathtt {KDF}_{1}(K,\mathtt {S}_{1\hbox {-}d} ,\hbox {`}1\hbox {'}) \) are replaced with \(\mathtt {mk}_{u}\leftarrow F_{1}(i,\mathtt {U},\hbox {`}0\hbox {'})\), \(\mathtt {mk}_{d}\leftarrow F_{1}(i,\mathtt {S}_{d},\hbox {`}1\hbox {'})\) and \(\mathtt {mk}_{1\hbox {-}d}\leftarrow F_{1}(i,\mathtt {S}_{1\hbox {-}d},\hbox {`}1\hbox {'})\), respectively. A table \(T_{1}\) is initialized to be empty in the beginning of \(Exp_{4}^{\mathtt {Auth}}\). A deterministic function \(F_{1}:\{0,1\}^{*}\rightarrow \mathcal {K}_{\mathtt {MAC}}\) is defined as follows: if \(\exists (i, id, k ,\mathtt {mk})\in T_{1}\), then \(F_{1}(i, id, k)\) returns \(\mathtt {mk}\); otherwise, the simulator randomly picks a fresh \(\mathtt {mk}{\mathop {\leftarrow }\limits ^{\$}}\mathcal {K}_{\mathtt {MAC}}\), stores \((i, id, k ,\mathtt {mk})\) in \(T_{1}\) and returns \(\mathtt {mk}\) as the value of \(F_{1}(i, id, k)\), where fresh means that no record of the form \((\cdot ,\cdot ,\cdot ,\mathtt {mk})\in T_{1}\) exists so far. Since the adversary only acquires \(\mathtt {mk}_{1\hbox {-}d}\), by the uniform distribution of \(K\) as well as the security of \(\mathtt {KDF}_{1}\), we obtain

Lemma 13

\(|\Pr [\mathtt {succ}_{3}^{\mathtt {Auth}}]-\Pr [\mathtt {succ}_{4}^{\mathtt {Auth}}]|\le q_{r}\cdot Adv _{\mathcal {A}}^{\mathtt {KDF}}(\kappa )\)

Experiment \(Exp_{5}^{\mathtt {Auth}}\). This experiment is similar to \(Exp_{4}^{\mathtt {Auth}}\) except that in each session i for the oracles \(\mathtt {Out}(\cdot ,\cdot ,\cdot )\) and \(\mathtt {Ret}(\cdot ,\cdot )\), the value \(t\leftarrow \mathtt {KDF}_{2}(K,w)\) is replaced with \(t\leftarrow F_{2}(i,w)\). \(T_{2}\) is initialized as an empty table in the beginning of \(Exp_{5}^{\mathtt {Auth}}\). Function \(F_{2}\) returns \(t\) if \(\exists (i,w,t)\in T_{2}\); otherwise, the simulator randomly picks a fresh \(t{\mathop {\leftarrow }\limits ^{\$}}\mathcal {K}_{\mathtt {PRF}}\), stores \((i,w,t)\) in table \(T_{2}\) and returns \(t\), where fresh means that no record of the form \((\cdot ,\cdot ,t)\) exists so far in \(T_{2}\). By the uniform distribution of \(K\) and the security of \(\mathtt {KDF}_{2}\), we obtain

Lemma 14

\(|\Pr [\mathtt {succ}_{4}^{\mathtt {Auth}}]-\Pr [\mathtt {succ}_{5}^{\mathtt {Auth}}]|\le (q_{o}+q_{t})Adv _{\mathcal {A}}^{\mathtt {KDF}}(\kappa )\)

We observe that \(Exp_{5}^{\mathtt {Auth}}\) is simulated independently of the key \(K\). The adversary can therefore win \(Exp_{5}^{\mathtt {Auth}}\) only by successfully forging \(\mu _{c}\) for \((e,v,\mathtt {ix})\) such that \(\mathtt {Vrfy}(\mathtt {mk}_{u},(e,v,\mathtt {ix}),\mu _{c})= 1\). Assuming that \(\mathtt {MAC}\) is unforgeable, we obtain

Lemma 15

\(\Pr [\mathtt {succ}_{5}^{\mathtt {Auth}}]=Adv _{\mathcal {A}}^{\mathtt {MAC}}(\kappa )\)

To sum up, by Lemmas 9 to 15, we can conclude that our direct PASE scheme provides authentication based on the hardness of the \(\mathtt {DDH}\) problem and security of \(\mathtt {KDF}_{1}\), \(\mathtt {KDF}_{2}\) and \(\mathtt {MAC}\).
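As a sketch, chaining Lemmas 9 to 15 with the triangle inequality gives the following combined bound, which is not written out explicitly in the proof:

$$\begin{aligned} Adv _{\mathtt {PASE},\mathcal {A}}^{\mathtt {Auth}}(\kappa )\le Adv _{\mathcal {A}}^{\mathtt {DL}}(\kappa )+(q_s+1)Adv _{\mathcal {A}}^{\mathtt {DDH}}(\kappa )+\frac{q_s}{|\mathcal {D}|}+(q_{r}+q_{o}+q_{t})Adv _{\mathcal {A}}^{\mathtt {KDF}}(\kappa )+Adv _{\mathcal {A}}^{\mathtt {MAC}}(\kappa ) \end{aligned}$$

As in the \(\mathtt {IND} \hbox {-}\mathtt {CKA}\) proof, the term \(\frac{q_s}{|\mathcal {D}|}\) corresponds to on-line password guessing.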

5 PASE in practice: browser-based implementation and performance evaluation

In order to demonstrate the functionality of our PASE scheme, we implemented a stateful web application that can be accessed from any web or mobile browser. Our PASE demonstrator implements the client and server sides of the protocol and comes with a single portal (cf. Fig. 4) through which users can register, outsource/retrieve files based on multiple keywords and change their passwords. The source code is available from https://github.com/Spockuto/surrey-paks.

The entire PASE implementation is written in JavaScript, with the client side backed by the browser’s V8 engine (Footnote 1) and the server side backed by a NodeJS server (Footnote 2). By choosing JavaScript, we could use the Stanford JavaScript Crypto Library (Footnote 3) in the implementation of both sides (client and server), thereby reusing some parts of the code. An alternative would be to use libsodium (Footnote 4) or OpenSSL with a wrapper based on PHP. Since modern applications heavily adopt JavaScript, our implementation can in turn be used as a library to provide support for other applications that wish to use the functionality of PASE.

In the following, we provide a more detailed description of our PASE demonstrator and evaluate performance of its functionality.

Fig. 4 PASE portal including \(\mathtt {Outsource}\) and \(\mathtt {Retrieve}\) forms

5.1 Cryptographic implementation choices

The following choices of cryptographic parameters and algorithms have been made for our implementation. For the cyclic group \(\mathbb {G}\) of prime order q and its generator g, we use the parameters of the NIST P-384 elliptic curve group (Footnote 5). The additional generator h is chosen at random. For the hash function \(H\), we adopt SHA-256 (256 bits). Both key derivation functions \(\mathtt {KDF}_{1}\) and \(\mathtt {KDF}_{2}\) are implemented as PBKDF2 (256 bits) (Footnote 6). Although PBKDF2 might not be the most efficient on mobile devices, better alternatives such as Argon2 (Footnote 7) have not been adopted yet in major cryptographic libraries. Our pseudorandom function \(\mathtt {PRF}\) uses AES-256 in GCM mode with the output truncated to 256 bits (Footnote 8). For the message authentication code \(\mathtt {MAC}\), we adopt the standard HMAC (Footnote 9) construction.
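To make these choices more tangible, the following sketch shows how the key derivation and MAC primitives could be instantiated. It uses Node's built-in `crypto` module as a stand-in for the Stanford JavaScript Crypto Library, so it is not the demonstrator's actual code. The iteration count, the way the inputs \((\mathtt {U},\hbox {`}0\hbox {'})\) or \(w\) are mapped to the PBKDF2 salt, and the use of SHA-256 inside HMAC are assumptions made for illustration; the function names `kdf1`, `kdf2`, `tag` and `vrfy` are hypothetical.

```javascript
const crypto = require('crypto');

const ITERATIONS = 10000; // assumed value; the text does not state an iteration count

// KDF_1(K, id, label): derive a 256-bit MAC key. Using "id|label" as salt is an assumption.
function kdf1(K, id, label) {
  return crypto.pbkdf2Sync(K, `${id}|${label}`, ITERATIONS, 32, 'sha256');
}

// KDF_2(K, w): derive a 256-bit keyword-specific key. Using the keyword as salt is an assumption.
function kdf2(K, w) {
  return crypto.pbkdf2Sync(K, `kw|${w}`, ITERATIONS, 32, 'sha256');
}

// Tag(mk, m): HMAC over the message (SHA-256 assumed as the underlying hash).
function tag(mk, message) {
  return crypto.createHmac('sha256', mk).update(message).digest();
}

// Vrfy(mk, m, mu): recompute the tag and compare in constant time.
function vrfy(mk, message, mu) {
  const expected = tag(mk, message);
  return mu.length === expected.length && crypto.timingSafeEqual(mu, expected);
}

const K = Buffer.from('reconstructed-key-placeholder'); // placeholder for the group element K
const mkU = kdf1(K, 'U', '0');
const mu = tag(mkU, 'e|v|ix');
```

A verifier holding the same `mkU` would accept `mu` for the message `'e|v|ix'` and reject it for any other message.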

5.2 PASE client and servers

The JavaScript code running on the client includes the main.js file (about 800 LoC) to manage the requests and formatting and the crypto.js file (300 LoC) to execute the protocols on the client side. The code running on each server is split into multiple files, with protocol.js (400 LoC) occupying the major part of the implementation. The demonstrator requires a NodeJS environment to run and can be deployed instantly.

In our implementation, one server acts as a primary server in that it serves the PASE website and is also used to store all outsourced files. It is helped by the secondary server during the registration, outsourcing and retrieval protocols. The database adopted in our PASE implementation is MongoDB (Footnote 10), which is particularly suited for storing and retrieving files. MongoDB is also used to store all user-related information from the registration process. Each server runs its own instance of the database.

Our browser-based demonstrator offers a registration interface where a user can provide its username (e.g., email address) and a chosen password to execute the registration protocol with both servers. Once registered, the user can outsource files and associate them with multiple keywords. Similarly, the user can retrieve outsourced files based on the keywords entered into the corresponding form. Note that for outsourcing and retrieval, the login form must contain the registered username and password. Our demonstrator supports outsourcing and retrieval using multiple keywords, which can be entered into the corresponding box separated by commas. If multiple keywords are used in the retrieval protocol, then the output produced currently will be based on the logic used to define subset queries (cf. Sect. 3.5).

5.2.1 Communication

In our PASE protocols, there are two types of communication which are implemented using different techniques as discussed in the following:

  • The client-server communication, which is present in the registration, outsourcing and retrieval protocols, is realized in our implementation using AJAX queries which are executed asynchronously to provide better functionality. This is only possible if the server accepts Cross-Origin Resource Sharing (CORS), which can be easily set up through the NodeJS core library.

  • The server-server communication in the outsourcing and retrieval protocols requires both servers to maintain short-lived state information for the two communication rounds of the protocol session (cf. Fig. 2). This is realized in our implementation using the NodeCache (Footnote 11) functionality, which provides simple and fast internal caching for NodeJS servers.
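Both mechanisms can be sketched in a few lines. The code below is an illustration only and does not reproduce the demonstrator: `corsHeaders` is a hypothetical helper returning the standard CORS response headers that a NodeJS server would attach to its replies, and `SessionStateCache` is a hand-written stand-in for an expiring cache, not the NodeCache API.

```javascript
// Standard CORS response headers allowing browser AJAX calls from the given origin.
function corsHeaders(allowedOrigin) {
  return {
    'Access-Control-Allow-Origin': allowedOrigin,
    'Access-Control-Allow-Methods': 'POST, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type',
  };
}

// Minimal in-memory store whose entries expire ttlMs milliseconds after being set.
class SessionStateCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;           // injectable clock, which keeps the sketch testable
    this.entries = new Map(); // sessionId -> { value, expiresAt }
  }

  // Remember the state produced in the first round of a protocol session.
  set(sessionId, value) {
    this.entries.set(sessionId, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Return the state for the second round, or undefined if missing or expired.
  get(sessionId) {
    const entry = this.entries.get(sessionId);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(sessionId);
      return undefined;
    }
    return entry.value;
  }
}
```

Expiring the state bounds how long a half-finished session can occupy server memory, which is the purpose of the short-lived state described above.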

5.3 Encryption of outsourced files

In our demonstrator, we expand the implemented PASE functionality with the encryption of outsourced files, in addition to encrypted keywords. For this purpose, the client could use the secret-shared key \(K\) which it reconstructs on the client side during the execution of the outsourcing and retrieval protocols. More precisely, the client could use \(K\) to derive another key and use it with some standard symmetric encryption scheme, e.g., AES, to encrypt outsourced files and decrypt them upon retrieval. In addition, to minimize information leakage and distribute the encrypted data among the two servers, we XOR the encrypted data \(\mathtt {ENC}\) with a random stream of data \(\mathtt {RND}\) of the same length as the encrypted data, generated using the Fortuna (Footnote 12) PRNG. The resulting files \(\mathtt {F_0} \leftarrow \mathtt {ENC} \oplus \mathtt {RND}\) and \(\mathtt {F_1} \leftarrow \mathtt {RND}\) are sent to the respective servers \(\mathtt {S_0}\) and \(\mathtt {S_1}\). The data \(\mathtt {ix}\) (cf. Fig. 2), in this case, would be a concatenation of the encrypted file name with the random IV generated for symmetric encryption (\(\mathtt {ix} \leftarrow \mathtt {Enc(Name) || IV }\)). The encrypted file name acts as the encrypted file identifier in the server for querying. During \(\mathtt {Retrieve}\), the file name is decrypted from \(\mathtt {ix}\) using the reconstructed key \(K\) and \(\mathtt {IV}\). The encrypted file name is used to retrieve files \(\mathtt {F_0}\) and \(\mathtt {F_1}\) from servers \(\mathtt {S_0}\) and \(\mathtt {S_1}\), respectively. The file is recovered by \(\mathtt {Dec(}\mathtt {F_0}\oplus \mathtt {F_1}\mathtt {)}\) and made available to the user.
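The split \(\mathtt {F_0} \leftarrow \mathtt {ENC} \oplus \mathtt {RND}\), \(\mathtt {F_1} \leftarrow \mathtt {RND}\) and the recovery \(\mathtt {Dec(}\mathtt {F_0}\oplus \mathtt {F_1}\mathtt {)}\) can be sketched as follows. This is a minimal illustration under several assumptions: it uses Node's built-in `crypto` module, AES-256 in CBC mode and `crypto.randomBytes` in place of the library and Fortuna PRNG used by the demonstrator, and it takes an already derived 32-byte `fileKey` as input because the text does not specify how that key is derived from \(K\). The function names are hypothetical.

```javascript
const crypto = require('crypto');

// XOR two equal-length buffers byte by byte.
function xorBuffers(a, b) {
  if (a.length !== b.length) throw new Error('length mismatch');
  const out = Buffer.alloc(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] ^ b[i];
  return out;
}

// Encrypt the file, then split the ciphertext into two shares F0 and F1.
function splitEncrypt(fileKey, plaintext) {
  const iv = crypto.randomBytes(16);
  const cipher = crypto.createCipheriv('aes-256-cbc', fileKey, iv);
  const enc = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  const rnd = crypto.randomBytes(enc.length); // random stream RND of ciphertext length
  return { iv, f0: xorBuffers(enc, rnd), f1: rnd };
}

// Recombine the shares and decrypt: Dec(F0 xor F1).
function joinDecrypt(fileKey, iv, f0, f1) {
  const enc = xorBuffers(f0, f1);
  const decipher = crypto.createDecipheriv('aes-256-cbc', fileKey, iv);
  return Buffer.concat([decipher.update(enc), decipher.final()]);
}

const fileKey = crypto.randomBytes(32); // placeholder for a key derived from K
const shares = splitEncrypt(fileKey, Buffer.from('example file contents'));
const recovered = joinDecrypt(fileKey, shares.iv, shares.f0, shares.f1);
```

Each share on its own is uniformly distributed over byte strings of the ciphertext length, so a single server learns only the length of the encrypted file.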

5.4 Evaluation of performance and scalability

5.4.1 Performance of PASE

Our performance evaluation focuses on the computational overheads of the PASE scheme and does not consider the varying network latency. All experiments were performed on a MacBook Pro laptop with a 2.2 GHz Intel Core i7 and 16 GB RAM (server and client instances) and a OnePlus 5 smartphone with a Qualcomm Snapdragon 835 octa-core 2.45 GHz and 8 GB RAM (client instance).

Table 2 Performance evaluation of our PASE implementation

The results of our measurements are summarized in Table 2 with separate timings provided for the client and server side computations. For the client side, the table contains measurements performed on both the laptop and smartphone. The registration procedure includes all steps of the \(\mathtt {Register}\) protocol. In the table, the time needed to reconstruct the symmetric key \(K\), which is accomplished in step 3a of our PASE \(\mathtt {Outsource}\) and \(\mathtt {Retrieve}\) protocols (cf. Sect. 3), is measured separately from the time needed to outsource keywords (steps 3b and 3c of \(\mathtt {Outsource}\)) and retrieve files (steps 3b and 5 of \(\mathtt {Retrieve}\)). We observe that key reconstruction on the client side is more than twice as fast as on the server side (when measured on the same device). Note that the key reconstruction procedure is identical in both protocols and its time is independent of the used keywords. In contrast, the measurements provided for outsourcing and retrieval procedures in Table 2 cover only keyword-dependent steps. Table 2 provides average timings based on one keyword, which are computed from multiple executions of the protocol involving a set of 100 randomly generated keywords, with each keyword being between 5 and 10 characters long. For each execution, a random keyword was chosen from the set and the resulting average was computed over 1000 executions.
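The measurement procedure described above can be approximated by a small harness such as the following. It is a sketch, not the script used to produce Table 2: the helper names `randomKeywords` and `averageTimeMs` are hypothetical, keywords are assumed to consist of lowercase letters, and the measured `operation` is assumed to be synchronous.

```javascript
const { performance } = require('perf_hooks');

const ALPHABET = 'abcdefghijklmnopqrstuvwxyz';

// Generate `count` random lowercase keywords with lengths between minLen and maxLen (inclusive).
function randomKeywords(count, minLen = 5, maxLen = 10) {
  const keywords = [];
  for (let i = 0; i < count; i++) {
    const len = minLen + Math.floor(Math.random() * (maxLen - minLen + 1));
    let w = '';
    for (let j = 0; j < len; j++) {
      w += ALPHABET[Math.floor(Math.random() * ALPHABET.length)];
    }
    keywords.push(w);
  }
  return keywords;
}

// Average wall-clock time in milliseconds of operation(keyword) over `runs` executions,
// picking a random keyword from the set for each execution.
function averageTimeMs(operation, keywords, runs = 1000) {
  let total = 0;
  for (let i = 0; i < runs; i++) {
    const w = keywords[Math.floor(Math.random() * keywords.length)];
    const start = performance.now();
    operation(w);
    total += performance.now() - start;
  }
  return total / runs;
}
```

For example, `averageTimeMs(outsourceKeyword, randomKeywords(100))` would time a hypothetical `outsourceKeyword` function covering the keyword-dependent steps only.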

Based on the measurements, we can highlight that our PASE registration procedure remains well under 1 s on both the laptop and the smartphone. The time for outsourcing and retrieval is clearly dominated by the time needed to reconstruct \(K\), which also remains well under 1 s. The keyword-dependent computations in both protocols are very efficient, taking less than 100 ms per keyword. On the client side, the outsourcing procedure is slightly more efficient than the retrieval procedure due to the additional integrity checks performed in step 5 of the \(\mathtt {Retrieve}\) protocol.

5.4.2 Performance of file encryption

We evaluated the performance of our file encryption scheme by using test files of size 100 KB, 1 MB and 10 MB on our client instances (MacBook Pro and OnePlus 5). We limited the upper bound of file size to 10 MB because of JavaScript’s bottleneck on the encryption size. Each encryption round includes the time taken to encrypt the file using AES, generate the random stream of data and perform the XOR operation (cf. Sect. 5.3). On the laptop, the encryption scheme during \(\mathtt {Outsource}\) averaged 15 ms for 100 KB, 64 ms for 1 MB and 476 ms for 10 MB files. During the execution, the random stream generation accounted for 8 ms for 100 KB, 52 ms for 1 MB and 380 ms for 10 MB files. Decrypting the file in \(\mathtt {Retrieve}\) took 15 ms for 100 KB, 45 ms for 1 MB and 270 ms for 10 MB files. On the smartphone, the encryption scheme during \(\mathtt {Outsource}\) averaged 25 ms for 100 KB, 105 ms for 1 MB and 540 ms for 10 MB files. During the execution, the random stream generation accounted for 16 ms for 100 KB, 65 ms for 1 MB and 310 ms for 10 MB files. Decrypting the file in \(\mathtt {Retrieve}\) took 90 ms for 100 KB, 300 ms for 1 MB and 3 s for 10 MB files.

Based on our measurements, we can highlight that an encryption scheme (e.g., AES) can be practically integrated into our protocol with little computational overhead. The total time taken for encryption, at large file sizes, is clearly dominated by random stream generation. Hence, we propose a configuration setting where the random stream generation is available as an optional security enhancement for the user to choose. With this configuration, users can leverage their computational flexibility to securely encrypt and distribute files of their choice.

5.4.3 Scalability of PASE

In addition to the measurements involving one keyword per execution, we are interested in the scalability of our PASE implementation. For this purpose, we have extended our measurements to calculate an average time for keyword-dependent outsourcing and retrieval computations with up to 30 keywords (which is, for example, the maximum number of hashtags allowed per image on Instagram (Footnote 13)). In our experiments, for each execution multiple keywords were randomly chosen from the same set of 100 keywords that were used in the experiments behind Table 2. A linear regression model was then applied to the average discrete timings to derive a linear approximation.
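A linear approximation of this kind can be obtained with an ordinary least-squares fit. The following sketch shows one way to compute it; the function name `linearFit` is hypothetical, and the text does not state which regression tool was actually used.

```javascript
// Ordinary least-squares fit of y = slope * x + intercept,
// where xs are keyword counts and ys are the average timings.
function linearFit(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error('need at least two paired samples');
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}
```

In this setting, the slope estimates the additional time per keyword and the intercept estimates the fixed per-execution cost of the keyword-dependent steps.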

Fig. 5 Scalability of keyword-dependent outsourcing and retrieval operations on the client side using MacBook Pro laptop and OnePlus 5 smartphone

Our experimental results for client-side keyword-dependent computations are plotted in Fig. 5. These timings suggest that our implementation remains scalable on commodity user devices such as laptops and smartphones. For example, client-side processing of 10 keywords in the outsourcing phase requires about 256 ms (laptop) and 455 ms (smartphone), whereas computations associated with a subset query of 10 keywords during the retrieval phase require about 289 ms (laptop) and 523 ms (smartphone). If we add constant key reconstruction costs from Table 2, then the overall time for client-side processing of 10 keywords would be about 356 ms (laptop) and 879 ms (smartphone) in outsourcing and about 389 ms (laptop) and 947 ms (smartphone) in retrieval phases.

5.4.4 Strengthening password-based authentication

The PASE protocol is proven sound by rigorous mathematical analysis, but the use of passwords for authentication inherits several issues associated with passwords, which act as a single point of failure for the entire architecture. Beyond brute-force and online attacks, passwords are vulnerable to reuse, leakage and social engineering attacks. A study [39] on password usage states that 38% reused the same password for two different online services, and 21% slightly modified an old one to sign up for a new service. The study also shows that users with more passwords are more likely to reuse them or use variations. Have I Been Pwned (HIBP) (Footnote 14), a popular website which reports data breaches, provides records of over 500 million unique passwords leaked in various data breaches through a variety of attacks including credential stuffing and phishing. The 2020 Verizon Data Breach Investigations Report (DBIR) [1] reports that over 80% of breaches within hacking involve brute force or the use of lost or stolen credentials. To protect against such password weaknesses, the PASE protocol can be extended modularly with two-factor authentication (2FA), a secondary authentication mechanism which provides a one-time password (OTP) or code generated or received by an authenticator (e.g., a security token or smartphone) that only the user possesses, to complement the primary password used for authentication. The PASE scheme allows the inclusion of an additional complementary authentication scheme without compromising the integrity of the internal PASE protocol, which relies on high-entropy keys generated from the primary password.
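The text does not fix a particular OTP mechanism. As one possible instantiation, the following sketch computes counter-based one-time passwords in the style of HOTP (RFC 4226) with Node's built-in `crypto` module; choosing HOTP here is an assumption made purely for illustration, and the function name `hotp` is hypothetical.

```javascript
const crypto = require('crypto');

// HOTP-style one-time password: HMAC-SHA1 over an 8-byte big-endian counter,
// followed by dynamic truncation to a fixed number of decimal digits.
function hotp(secret, counter, digits = 6) {
  const msg = Buffer.alloc(8);
  msg.writeBigUInt64BE(BigInt(counter));
  const mac = crypto.createHmac('sha1', secret).update(msg).digest();
  const offset = mac[mac.length - 1] & 0x0f;               // low 4 bits of the last byte select the offset
  const binary = mac.readUInt32BE(offset) & 0x7fffffff;    // 31-bit value taken at that offset
  return String(binary % 10 ** digits).padStart(digits, '0');
}

// Shared secret used in the RFC 4226 test vectors.
const otpSecret = Buffer.from('12345678901234567890');
const firstCode = hotp(otpSecret, 0);
```

Such a code would be checked by the servers in addition to, and independently of, the password-based PASE protocol run.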

6 Conclusion

Password-Authenticated Searchable Encryption (PASE) introduced in this paper is a new concept for searchable encryption where the search over encrypted keywords can be performed solely with the help of a human-memorizable password. The main advantage over previous concepts is a simplified key management which removes the need for storing and managing high-entropy keys on the user side and makes the whole process device-agnostic. Basing searchable encryption on passwords introduces major design challenges; in particular, creating the need for a distributed server architecture to achieve security against offline dictionary attacks.

We modeled the functionality and security properties of PASE, including \(\mathtt {IND} \hbox {-}\mathtt {CKA}\)-security for keyword privacy and authentication for outsourcing for the search procedure, and proposed a direct PASE construction whose security and privacy have been proven under standard assumptions. Our direct PASE construction is an optimized version of a more general concept for building PASE protocols based on techniques underlying password-authenticated secret sharing and symmetric searchable encryption.

We evaluated the practicality of our PASE scheme through implementation of a JavaScript-based web application that can readily be executed on any (mobile) browser. The conducted performance and scalability evaluation of our implementation shows that the proposed PASE approach remains practical on commodity user devices such as laptops and smartphones.