1 Introduction

In recent years, the field of quantum computing has developed significantly. One of its branches is quantum game theory, which merges ideas from quantum information [1] and game theory [2] to open up new opportunities for finding optimal strategies in many games. The concept of a quantum strategy was first introduced in [3], where a simple extensive-form game called PQ Penny Flip was studied. The paper showed that one player could always win if he was allowed to use quantum strategies against an opponent restricted to classical ones. Next, Eisert, Wilkens and Lewenstein proposed a quantum scheme for the Prisoner's Dilemma based on entanglement [4]. Their solution leads to a Nash equilibrium that is also a Pareto-optimal payoff point.

Since then, many other examples of quantum games have been proposed. A good overview of quantum game theory can be found in [5]. One of the latest trends is the study of quantum repeated games [6, 7]. In particular, the quantum repeated Prisoner's Dilemma [8, 9] has been investigated. In [8], the idea was to classically repeat the Prisoner's Dilemma with strategy sets extended to include some special unitary strategies. This made it possible to study conditional strategies similar to the ones defined in the classical repeated Prisoner's Dilemma, for example, the "tit for tat" or Pavlov strategies.

We present a different approach, taking advantage of the fact that a repeated game is a particular case of an extensive-form game. A twice repeated \(2\times 2\) game is an extensive-form game with five information sets for each of the two players. Instead of using a classically repeated scheme based on two entangled qubits [8], we consider a twice repeated game as a single quantum system, which requires ten maximally entangled qubits. Our scheme uses the quantum framework introduced in [10] and recently generalized in [11], according to which choosing an action at an information set is identified with applying a unitary operation to a qubit.

In this paper, we examine one of the most interesting cases in quantum game theory: the problem in which one of the players has access to the full range of unitary strategies, whereas the other player can only choose from unitary operators that correspond to the classical strategies. Additionally, we examine the quantum game in terms of the players' limited awareness of the available strategies. We use the concept of games with unawareness [12,13,14] to check to what extent two different factors, access to quantum strategies and game perception, affect the result of the game.

2 Preliminaries

In what follows, we give a brief review of the basic concepts of games with unawareness. The reader who is not familiar with this topic is encouraged to consult [12]. Introductory examples and applications of the notion of games with unawareness to quantum games can be found in [15, 16].

2.1 Strategic game with unawareness

A strategic-form game with unawareness is defined as a family of strategic-form games. The family specifies how each player perceives the game, how she perceives the other players' perceptions of the game, and so on. To be more precise, let \(G = (N, (S_{i})_{i\in N}, (u_{i})_{i\in N})\) be a strategic-form game. This is the game actually played by the players, also called the modeler's game. Each player may have a restricted view of the game, i.e., she may not be aware of the full description of G. Hence, \(G_{\text {v}} = (N_{\text {v}}, ((S_{i})_{\text {v}})_{i\in N_{\text {v}}}, ((u_{i})_{\text {v}})_{i\in N_{\text {v}}})\) denotes player \({\text {v}}\)'s view of the game for \({\text {v}} \in N\). That is, player \({\text {v}}\) views the set of players, the sets of players' strategies, and the payoff functions as \(N_{{\text {v}}}\), \((S_{i})_{{\text {v}}}\) and \((u_{i})_{{\text {v}}}\), respectively. In general, each player also considers how each of the other players views the game. Formally, with a finite sequence of players \(v = (i_{1}, \dots , i_{n})\) there is associated a game \(G_{v} = (N_{v},((S_{i})_{v})_{i\in N_{v}}, ((u_{i})_{v})_{i\in N_{v}})\). This is the game that player \(i_{1}\) considers that player \(i_{2}\) considers that \(\dots\) player \(i_{n}\) is considering. A sequence v is called a view. The empty sequence \(v = \emptyset \) is assumed to be the modeler's view, i.e., \(G_{\emptyset } = G\). We denote a strategy profile \(\prod _{i\in N_{v}}s_{i}\) in \(G_{v}\), where \(s_{i} \in (S_{i})_{v}\), by \((s)_{v}\). The concatenation of a view \({\bar{v}} = (i_{1}, \dots , i_{n})\) followed by a view \({\tilde{v}} = (j_{1}, \dots , j_{m})\) is defined to be \(v = {\bar{v}}{^{\hat{\,}}}{\tilde{v}} = (i_{1}, \dots , i_{n}, j_{1}, \dots , j_{m})\). The set of all potential views is \(V = \bigcup ^{\infty }_{n=0}N^{(n)}\), where \(N^{(n)} = \prod ^{n}_{j=1}N\) and \(N^{(0)} = \{\emptyset\} \).

Definition 1

A collection \(\{G_{v}\}_{v\in {\mathcal {V}}}\) where \({\mathcal {V}} \subset V\) is a collection of finite sequences of players is called a strategic-form game with unawareness and the collection of views \({\mathcal {V}}\) is called its set of relevant views if the following properties are satisfied:

  1.

    For every \(v \in {\mathcal {V}}\),

    $$\begin{aligned} v{^{\hat{\,}}}{\text {v}} \in {\mathcal {V}} ~\text {if and only if}~{\text {v}} \in N_{v}. \end{aligned}$$
    (1)
  2.

    For every \(v{^{\hat{\,}}}{\tilde{v}} \in {\mathcal {V}}\),

    $$\begin{aligned} v \in {\mathcal {V}}, \quad \emptyset \ne N_{v{^{\hat{\,}}}{\tilde{v}}} \subset N_{v}, \quad \emptyset \ne (S_{i})_{v{^{\hat{\,}}}{\tilde{v}}} \subset (S_{i})_{v} ~\text {for all}~i \in N_{v{^{\hat{\,}}}{\tilde{v}}}. \end{aligned}$$
    (2)
  3.

    If \({v{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\bar{v}} \in {\mathcal {V}}}\), then

    $$\begin{aligned} {v{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\bar{v}} \in {\mathcal {V}} ~\text {and}~ G_{v{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\bar{v}}} = G_{v{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\bar{v}}}.} \end{aligned}$$
    (3)
  4.

    For every strategy profile \((s)_{v{^{\hat{\,}}}{\tilde{v}}} = \{s_{j}\}_{j\in N_{v{^{\hat{\,}}}{\tilde{v}}}}\), there exists a completion to a strategy profile \((s)_{v} = \{s_{j}, s_{k}\}_{j\in N_{v{^{\hat{\,}}}{\tilde{v}}}, k\in N_{v}{\setminus } N_{v{^{\hat{\,}}}{\tilde{v}}}}\) such that

    $$\begin{aligned} (u_{i})_{{v{^{\hat{\,}}}{\tilde{v}}}}((s)_{v{^{\hat{\,}}}{\tilde{v}}}) = (u_{i})_{v}((s)_{v}). \end{aligned}$$
    (4)

2.2 Extended Nash equilibrium

A basic solution concept for predicting players’ behavior is a Nash equilibrium [17].

Definition 2

A strategy profile \(s^* = (s^*_{1}, s^*_{2}, \dots , s^*_{n})\) is a Nash equilibrium if for each player \(i\in \{1, \dots , n\}\) and each strategy \(s_{i}\) of player i

$$\begin{aligned} u_{i}(s^*) \ge u_{i}(s_{i}, s^*_{-i}), \end{aligned}$$
(5)

where \(s^*_{-i} {:}{=}(s_{j})_{j \ne i}\).
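
Condition (5) can be checked directly for finite bimatrix games. The following minimal sketch (Python with NumPy; the function name `pure_nash` is ours) enumerates all pure Nash equilibria of a bimatrix game:

```python
import itertools

import numpy as np

def pure_nash(A, B):
    """Return all pure Nash equilibria (5) of the bimatrix game (A, B)."""
    eqs = []
    for i, j in itertools.product(range(A.shape[0]), range(A.shape[1])):
        # (i, j) is an equilibrium if no unilateral deviation pays off
        if A[i, j] >= A[:, j].max() and B[i, j] >= B[i, :].max():
            eqs.append((i, j))
    return eqs

# Prisoner's Dilemma with example values (T, R, P, S) = (5, 3, 1, 0):
# the unique equilibrium is (1, 1), i.e., the profile (D, D).
print(pure_nash(np.array([[3, 0], [5, 1]]), np.array([[3, 5], [0, 1]])))
```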

In order to define the Nash-type equilibrium for a strategic-form game with unawareness, we need to redefine the notion of a strategy profile.

Definition 3

Let \(\{G_{v}\}_{v\in {\mathcal {V}}}\) be a strategic-form game with unawareness. An extended strategy profile (ESP) in this game is a collection of (pure or mixed) strategy profiles \(\{(\sigma )_{v}\}_{v\in {\mathcal {V}}}\), where \((\sigma )_{v}\) is a strategy profile in the game \(G_{v}\), such that for every \({v{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\bar{v}} \in {\mathcal {V}}}\) the following holds:

$$\begin{aligned} {(\sigma _{{\text {v}}})_{v} = (\sigma _{{\text {v}}})_{v{^{\hat{\,}}}{\text {v}}} ~\text {as well as}~ (\sigma )_{v{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\bar{v}}} = (\sigma )_{v{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\bar{v}}}.} \end{aligned}$$
(6)

To illustrate (6), let us take the game \(G_{12}\), i.e., the game that player 1 thinks that player 2 is considering. If player 1 assumes that player 2 plays strategy \((\sigma _{2})_{12}\) in the game \(G_{12}\), she must assume the same strategy in the game \(G_{1}\) that she considers, i.e., \((\sigma _{2})_{1} = (\sigma _{2})_{12}\).

The next step is to extend rationalizability from strategic-form games to games with unawareness.

Definition 4

An ESP \(\{(\sigma )_{v}\}_{v\in {\mathcal {V}}}\) in a game with unawareness is called extended rationalizable if for every \(v{^{\hat{\,}}}{\text {v}} \in {\mathcal {V}}\) strategy \((\sigma _{{\text {v}}})_{v}\) is a best reply to \((\sigma _{-{\text {v}}})_{v{^{\hat{\,}}}{\text {v}}}\) in the game \(G_{v{^{\hat{\,}}}{\text {v}}}\).

Consider a strategic-form game with unawareness \(\{G_{v}\}_{v\in {\mathcal {V}}}\). For every relevant view \(v \in {\mathcal {V}}\), the relevant views as seen from v are defined to be \({\mathcal {V}}^{v} = \{{\tilde{v}} \in {\mathcal {V}}:v{^{\hat{\,}}}{\tilde{v}} \in {\mathcal {V}}\}\). Then, the game with unawareness as seen from v is defined by \(\{G_{v{^{\hat{\,}}}{\tilde{v}}}\}_{{\tilde{v}} \in {\mathcal {V}}^{v}}\). We are now in a position to define the Nash equilibrium in strategic-form games with unawareness.

Definition 5

An ESP \(\{(\sigma )_{v}\}_{v\in {\mathcal {V}}}\) in a game with unawareness is called an extended Nash equilibrium (ENE) if it is rationalizable and for all \(v, {\bar{v}} \in {\mathcal {V}}\) such that \(\{G_{v{^{\hat{\,}}}{\tilde{v}}}\}_{{\tilde{v}} \in {\mathcal {V}}^{v}} = \{G_{\hat{{\bar{v}}} {\tilde{v}}}\}_{{\tilde{v}} \in {\mathcal {V}}^{{\bar{v}}}}\) we have that \((\sigma )_{v} = (\sigma )_{{\bar{v}}}\).

The first part of the definition (rationalizability) is similar to the standard Nash equilibrium, where it is required that each strategy in the equilibrium is a best reply to the other strategies of that profile. For example, according to Definition 4, player 2's strategy \((\sigma _{2})_{1}\) in the game of player 1 has to be a best reply to player 1's strategy \((\sigma _{1})_{12}\) in the game \(G_{12}\). On the other hand, in contrast to the concept of Nash equilibrium, \((\sigma _{1})_{12}\) does not have to be a best reply to \((\sigma _{2})_{1}\) but to the strategy \((\sigma _{2})_{121}\).

The following proposition shows that the notion of extended Nash equilibrium coincides with the standard one for strategic-form games when all views share the same perception of the game.

Proposition 1

Let G be a strategic-form game and \(\{G_{v}\}_{v \in {\mathcal {V}}}\) a strategic-form game with unawareness such that for some \(v\in {\mathcal {V}}\), we have \(G_{v{^{\hat{\,}}}{\bar{v}}} = G\) for every \({\bar{v}}\) such that \(v{^{\hat{\,}}}{\bar{v}} \in {\mathcal {V}}\). Let \(\sigma \) be a strategy profile in G. Then,

  1.

    \(\sigma \) is rationalizable for G if and only if \((\sigma )_{v} = \sigma \) is part of an extended rationalizable profile in \(\{G_{v}\}_{v\in {\mathcal {V}}}\).

  2.

    \(\sigma \) is a Nash equilibrium for G if and only if \((\sigma )_{v} = \sigma \) is part of an ENE for \(\{G_{v}\}_{v\in {\mathcal {V}}}\), and this ENE also satisfies \((\sigma )_{v} = (\sigma )_{v{^{\hat{\,}}}{\bar{v}}}\) for every \({\bar{v}}\) such that \(v{^{\hat{\,}}}{\bar{v}} \in {\mathcal {V}}\).

Remark 1

We see from (3) and (6) that for every \({v{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\bar{v}} \in {\mathcal {V}}}\) a normal-form game \({G_{v{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\bar{v}}}}\) and a strategy profile \({(\sigma )_{v{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\bar{v}}}}\) determine the games and profiles of the form \({G_{v{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\dots }{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\bar{v}}}}\) and \({(\sigma )_{v{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\dots }{^{\hat{\,}}}{\text {v}}{^{\hat{\,}}}{\bar{v}}}}\), respectively; for example, \(G_{121}\) determines \(G_{122\dots 21}\). Hence, in general, a game with unawareness \(\{G_{v}\}_{v\in {\mathcal {V}}}\) and an extended strategy profile \(\{(\sigma )_{v}\}_{v\in {\mathcal {V}}}\) are determined by \(\{G_{v}\}_{v\in {\mathcal {N}} \cup \{\emptyset \}}\) and \(\{(\sigma )_{v}\}_{v\in {\mathcal {N}} \cup \{\emptyset \}}\), where

$$\begin{aligned} {\mathcal {N}} = \{v\in {\mathcal {V}} \mid v=(i_{1}, \dots , i_{n}) ~\text {with}~i_{k} \ne i_{k+1} ~\text {for all}~k\}. \end{aligned}$$
(7)

Then, we get \(\{G_{v}\}_{v\in {\mathcal {V}}}\) from \(\{G_{v}\}_{v\in {\mathcal {N}} \cup \{\emptyset \}}\) by setting \(G_{{\tilde{v}}} = G_{v}\) for \(v=(i_{1},\dots , i_{n})\in {\mathcal {N}}\) and \({\tilde{v}} = (i_{1}, \dots , i_{k}, i_{k}, i_{k+1}, \dots , i_{n}) \in {\mathcal {V}}\). For this reason, we often restrict ourselves to \({\mathcal {N}} \cup \{\emptyset \}\) throughout the paper.
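
In computational terms, Remark 1 says that a game with unawareness can be stored on the non-repeating views only. A minimal sketch in Python (the dictionary `games` mentioned in the comment is a hypothetical representation of the family \(\{G_{v}\}\)):

```python
def canonical(view):
    """Collapse consecutive repetitions of a player index in a view,
    e.g. (1, 2, 2, 2, 1) -> (1, 2, 1), in line with Remark 1 and (7)."""
    out = []
    for i in view:
        if not out or out[-1] != i:
            out.append(i)
    return tuple(out)

# With the family stored on non-repeating views only, e.g.
#   games = {(): G, (1,): G1, (2,): G2, (1, 2): G12, (2, 1): G21, ...},
# the game seen from any relevant view is games[canonical(view)];
# for instance, G_{1221} is looked up as games[(1, 2, 1)].
assert canonical((1, 2, 2, 1)) == (1, 2, 1)
```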

3 Twice repeated \(2\times 2\) game

The concept of a finitely repeated game assumes playing a normal-form game (a stage of the repeated game) a fixed number of times (see, for example, [18]). The players are informed about the results of consecutive stages. Let us consider a \(2\times 2\) bimatrix game

$$\begin{aligned} \begin{pmatrix} (a_{00}, b_{00}) &{} (a_{01}, b_{01}) \\ (a_{10}, b_{10}) &{} (a_{11}, b_{11}) \end{pmatrix}. \end{aligned}$$
(8)

The two-stage \(2\times 2\) bimatrix game can be easily depicted as an extensive-form game (see Fig. 1). The first stage of the twice repeated \(2\times 2\) game is the part of the game where the players specify an action C or D at the information sets 1.1 and 2.1. Once the players have chosen their actions, the result of the first stage is announced. Since the players know the result of the first stage, they can choose different actions at the second stage depending on that result. Hence, the next four game trees in Fig. 1 are required to describe the repeated game. Each player has five information sets at which they specify their own actions: player 1's information sets are denoted by 1.1, 1.2, 1.3, 1.4 and 1.5, and player 2's information sets are 2.1, 2.2, 2.3, 2.4 and 2.5. Note that player 2's information sets consist of two nodes connected by dotted lines. This is intended to show the lack of knowledge of player 2 about the previous move of player 1. Recall that a player's strategy is a function that assigns to each information set of that player an action available at that information set. In our example, this means that each player's strategy specifies an action at the first stage and four actions at the second stage. For example, the strategy (CCDDC) of a player in the game given in Fig. 1 says that the player chooses action C at the first stage and, depending on one of the four possible results of the first stage, he chooses actions C, D, D, C, respectively.

Fig. 1 Twice repeated \(2\times 2\) game represented by an extensive-form game

If player 1 plays that strategy whereas player 2 chooses, for example, (DCDCC), then the resulting strategy profile determines the unique path from the node 1.1 that passes through the nodes 2.1, 1.2 and 2.3 and gives the payoff outcome \((a_{01} + a_{10}, b_{01} + b_{10})\).

The players can also choose their actions at random, i.e., according to probability distributions of their own choosing. Such strategies are called behavioral strategies (see, for example, [2]).

Definition 6

A behavioral strategy of a player in an extensive-form game is a function mapping each of his information sets to a probability distribution over the set of actions available at that information set.

For example, in the case of the game given by Fig. 1, player 1's and player 2's behavioral strategies are determined by quintuples \((p_{1}, p_{2}, p_{3}, p_{4}, p_{5})\) and \((q_{1}, q_{2}, q_{3}, q_{4}, q_{5})\), respectively, in which \(p_{i}\) and \(q_{i}\) are the probabilities of choosing the first action at information set i. The expected payoff outcome resulting from the players' general behavioral strategies is

$$\begin{aligned}&(2a_{00}, 2b_{00})p_{1}q_{1}p_{2}q_{2} + (a_{00}+ a_{01}, b_{00}+b_{01})p_{1}q_{1}p_{2}(1-q_{2})\nonumber \\&\quad +\, (a_{00}+a_{10}, b_{00}+b_{10})p_{1}q_{1}(1-p_{2})q_{2}\nonumber \\&\quad +\, (a_{00} + a_{11}, b_{00}+b_{11})p_{1}q_{1}(1-p_{2})(1-q_{2}) \nonumber \\&\quad +\, (a_{01}+a_{00}, b_{01} + b_{00})p_{1}(1-q_{1})p_{3}q_{3}\nonumber \\&\quad +\, (2a_{01}, 2b_{01})p_{1}(1-q_{1})p_{3}(1-q_{3})\nonumber \\&\quad +\, (a_{01} + a_{10}, b_{01}+b_{10})p_{1}(1-q_{1})(1-p_{3})q_{3}\nonumber \\&\quad +\, (a_{01}+a_{11}, b_{01}+b_{11})p_{1}(1-q_{1})(1-p_{3})(1-q_{3})\nonumber \\&\quad +\, (a_{10}+a_{00}, b_{10}+b_{00})(1-p_{1})q_{1}p_{4}q_{4}\nonumber \\&\quad +\, (a_{10}+a_{01}, b_{10}+b_{01})(1-p_{1})q_{1}p_{4}(1-q_{4}) \nonumber \\&\quad +\, (2a_{10}, 2b_{10})(1-p_{1})q_{1}(1-p_{4})q_{4}\nonumber \\&\quad +\, (a_{10}+a_{11}, b_{10}+b_{11})(1-p_{1})q_{1}(1-p_{4})(1-q_{4}) \nonumber \\&\quad +\, (a_{11}+a_{00}, b_{11} + b_{00})(1-p_{1})(1-q_{1})p_{5}q_{5}\nonumber \\&\quad +\, (a_{11}+a_{01}, b_{11}+b_{01})(1-p_{1})(1-q_{1})p_{5}(1-q_{5})\nonumber \\&\quad +\, (a_{11}+a_{10}, b_{11}+b_{10})(1-p_{1})(1-q_{1})(1-p_{5})q_{5}\nonumber \\&\quad +\, (2a_{11}, 2b_{11})(1-p_{1})(1-q_{1})(1-p_{5})(1-q_{5}). \end{aligned}$$
(9)
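
For later reference, the expected payoffs (9) can be evaluated programmatically. The sketch below (Python with NumPy; the function name `repeated_payoff` and the explicit loop over stage results are ours) assumes the ordering of information sets described above, i.e., the first-stage result \((i,j)\) leads to the information sets indexed by \(2i+j+2\):

```python
import itertools

import numpy as np

def repeated_payoff(A, B, p, q):
    """Expected payoffs (9) of the twice repeated 2x2 game.

    A, B -- 2x2 arrays with A[i, j] = a_ij and B[i, j] = b_ij as in (8);
    p, q -- quintuples of probabilities of choosing the first action
            at the five information sets of players 1 and 2.
    """
    u1 = u2 = 0.0
    for i, j in itertools.product((0, 1), repeat=2):      # first-stage result
        pr1 = (p[0] if i == 0 else 1 - p[0]) * (q[0] if j == 0 else 1 - q[0])
        k = 2 * i + j + 1                                 # info set used at stage two
        for m, n in itertools.product((0, 1), repeat=2):  # second-stage result
            pr2 = (p[k] if m == 0 else 1 - p[k]) * (q[k] if n == 0 else 1 - q[k])
            u1 += pr1 * pr2 * (A[i, j] + A[m, n])
            u2 += pr1 * pr2 * (B[i, j] + B[m, n])
    return u1, u2

# Example: if both players always choose their second action (all
# probabilities equal to zero), the payoff is (2*a_11, 2*b_11).
A = np.array([[3., 0.], [5., 1.]]); B = A.T
print(repeated_payoff(A, B, [0.] * 5, [0.] * 5))          # (2.0, 2.0)
```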

4 Construction of a twice repeated \(2\times 2\) quantum game

We propose a scheme for playing a twice repeated \(2\times 2\) game. It is based on the protocol introduced in [10], where a quantum approach to general finite extensive-form games was considered. A two-stage \(2\times 2\) game is an example of an extensive-form game with ten information sets. According to the idea presented in [10], we associate choosing an action at an information set with a unitary operation performed on a qubit. As a result, each player specifies a unitary operation on each of his five qubits. To be more specific, let us consider the \(2\times 2\) bimatrix game (8). We define a triple

$$\begin{aligned} \Gamma _{QQ} = ({\mathcal {H}}, \{{{\mathsf {S}}}{{\mathsf {U}}}(2)^{\otimes 5}, {{\mathsf {S}}}{{\mathsf {U}}}(2)^{\otimes 5}\}, (u_{1}, u_{2})), \end{aligned}$$
(10)

where

  • \({\mathcal {H}}\) is a Hilbert space \(\left( \mathbb {C}^2\right) ^{\otimes 10}\).

  • \({{\mathsf {S}}}{{\mathsf {U}}}(2)\) is the special unitary group of degree 2. The commonly used parameterization for \(U\in {{\mathsf {S}}}{{\mathsf {U}}}(2)\) is given by

    $$\begin{aligned} \begin{pmatrix} \text {e}^{\text {i}\alpha }\cos {\frac{\theta }{2}} &{} \text {i}\text {e}^{\text {i}\beta }\sin {\frac{\theta }{2}}\\ \text {i}\text {e}^{-\text {i}\beta }\sin {\frac{\theta }{2}} &{} \text {e}^{-\text {i}\alpha }\cos {\frac{\theta }{2}} \end{pmatrix}, \theta \in [0,\pi ], \alpha , \beta \in [0, 2\pi ). \end{aligned}$$
    (11)
  • \(|\Psi _{\text {f}}\rangle \) is the final state determined by a strategy \(\bigotimes ^5_{i=1}U_{i}(\theta _{i}, \alpha _{i}, \beta _{i}) \in {{\mathsf {S}}}{{\mathsf {U}}}(2)^{\otimes 5}\) of player 1 and a strategy \(\bigotimes ^{10}_{j=6}U_{j}(\theta _{j}, \alpha _{j}, \beta _{j}) \in {{\mathsf {S}}}{{\mathsf {U}}}(2)^{\otimes 5}\) of player 2 according to the following formula:

    $$\begin{aligned} |\Psi _{\text {f}}\rangle = J^{\dag }\left( \bigotimes ^{10}_{i=1}U_{i}(\theta _{i}, \alpha _{i}, \beta _{i})\right) J|0\rangle ^{\otimes 10}, \quad J=\frac{1}{\sqrt{2}}\left( \mathbb {1}^{\otimes 10} + i\sigma ^{\otimes 10}_{x}\right) , \end{aligned}$$
    (12)
  • the payoff vector function \((u_{1}, u_{2})\) is given by

    $$\begin{aligned} (u_{1}, u_{2})\left( \bigotimes ^{10}_{i=1}U_{i}(\theta _{i}, \alpha _{i}, \beta _{i}) \right) = {\text {tr}}\left( X|\Psi _{\text {f}}\rangle \langle \Psi _{\text {f}}|\right) , \end{aligned}$$
    (13)

    where

$$\begin{aligned} X= & {} (2a_{00}, 2b_{00})|00\rangle \langle 00|\otimes \mathbb {1}^{\otimes 3}\otimes |00\rangle \langle 00| \otimes \mathbb {1}^{\otimes 3}\nonumber \\&+\, (a_{00} + a_{01}, b_{00} + b_{01})|00\rangle \langle 00|\otimes \mathbb {1}^{\otimes 3}\otimes |01\rangle \langle 01|\otimes \mathbb {1}^{\otimes 3}\nonumber \\&+\, (a_{00} + a_{10}, b_{00} + b_{10})|01\rangle \langle 01|\otimes \mathbb {1}^{\otimes 3}\otimes |00\rangle \langle 00|\otimes \mathbb {1}^{\otimes 3} \nonumber \\&+\, (a_{00}+a_{11}, b_{00}+b_{11})|01\rangle \langle 01|\otimes \mathbb {1}^{\otimes 3} \otimes |01\rangle \langle 01|\otimes \mathbb {1}^{\otimes 3}\nonumber \\&+\, (a_{01} + a_{00}, b_{01} + b_{00})|0\rangle \langle 0| \otimes \mathbb {1} \otimes |0\rangle \langle 0|\otimes \mathbb {1}^{\otimes 2} \otimes |1\rangle \langle 1|\otimes \mathbb {1} \otimes |0\rangle \langle 0| \otimes \mathbb {1}^{\otimes 2}\nonumber \\&+\, (2a_{01}, 2b_{01})|0\rangle \langle 0| \otimes \mathbb {1} \otimes |0\rangle \langle 0|\otimes \mathbb {1}^{\otimes 2} \otimes |1\rangle \langle 1|\otimes \mathbb {1} \otimes |1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 2} \nonumber \\&+\, (a_{01}+a_{10}, b_{01} + b_{10})|0\rangle \langle 0| \otimes \mathbb {1} \otimes |1\rangle \langle 1|\otimes \mathbb {1}^{\otimes 2} \otimes |1\rangle \langle 1|\otimes \mathbb {1} \otimes |0\rangle \langle 0| \otimes \mathbb {1}^{\otimes 2} \nonumber \\&+\, (a_{01}+a_{11}, b_{01}+b_{11})|0\rangle \langle 0| \otimes \mathbb {1} \otimes |1\rangle \langle 1|\otimes \mathbb {1}^{\otimes 2} \otimes |1\rangle \langle 1|\otimes \mathbb {1} \otimes |1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 2} \nonumber \\&+\, (a_{10}+a_{00}, b_{10} + b_{00})|1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 2} \otimes |0\rangle \langle 0|\otimes \mathbb {1} \otimes |0\rangle \langle 0| \otimes \mathbb {1}^{\otimes 2} \otimes |0\rangle \langle 0| \otimes \mathbb {1} \nonumber \\&+\, (a_{10}+a_{01}, b_{10} + b_{01})|1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 2} \otimes |0\rangle \langle 0|\otimes \mathbb {1} \otimes |0\rangle \langle 0| \otimes \mathbb {1}^{\otimes 2} \otimes |1\rangle \langle 1| \otimes \mathbb {1} \nonumber \\&+\, (2a_{10}, 2b_{10})|1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 2} \otimes |1\rangle \langle 1|\otimes \mathbb {1} \otimes |0\rangle \langle 0| \otimes \mathbb {1}^{\otimes 2} \otimes |0\rangle \langle 0| \otimes \mathbb {1} \nonumber \\&+\, (a_{10}+a_{11}, b_{10} + b_{11})|1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 2}\otimes |1\rangle \langle 1| \otimes \mathbb {1} \otimes |0\rangle \langle 0| \otimes \mathbb {1}^{\otimes 2} \otimes |1\rangle \langle 1|\otimes \mathbb {1} \nonumber \\&+\, (a_{11}+a_{00}, b_{11} + b_{00})|1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 3}\otimes |0\rangle \langle 0| \otimes |1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 3} \otimes |0\rangle \langle 0|\nonumber \\&+\, (a_{11}+a_{01}, b_{11} + b_{01})|1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 3}\otimes |0\rangle \langle 0| \otimes |1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 3} \otimes |1\rangle \langle 1|\nonumber \\&+\, (a_{11}+a_{10}, b_{11} + b_{10})|1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 3}\otimes |1\rangle \langle 1| \otimes |1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 3} \otimes |0\rangle \langle 0| \nonumber \\&+\, (2a_{11}, 2b_{11})|1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 3}\otimes |1\rangle \langle 1| \otimes |1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 3} \otimes |1\rangle \langle 1|. \end{aligned}$$
    (14)

The construction (14) of the operator X results from the following reasoning. First, note that the information sets 1.1, ..., 1.5 of player 1 are associated with the first five qubits, and the information sets 2.1, ..., 2.5 of player 2 are associated with the remaining five qubits. Now consider, for example, the outcome \((2a_{00}, 2b_{00})\). In the classical case, that payoff outcome is obtained if the players choose their first actions at the information sets 1.1, 2.1, 1.2 and 2.2. These information sets are assigned to the first, sixth, second and seventh qubit, respectively. Therefore, measuring the state 0 on those qubits results in the outcome \((2a_{00}, 2b_{00})\) in the quantum game. In a similar way, we can justify the other terms of (14).
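
The scheme (10)–(14) is small enough to be simulated directly on \(2^{10}\)-dimensional state vectors. The following sketch (Python with NumPy; the names `U`, `quantum_payoffs` and the explicit loop over the sixteen projectors are ours) implements (11)–(14) with the qubit assignment just described:

```python
import itertools
from functools import reduce

import numpy as np

I2 = np.eye(2)
SX = np.array([[0., 1.], [1., 0.]])
PR = [np.diag([1., 0.]), np.diag([0., 1.])]              # |0><0| and |1><1|

def U(theta, alpha=0.0, beta=0.0):
    """The SU(2) parameterization (11)."""
    return np.array(
        [[np.exp(1j * alpha) * np.cos(theta / 2),
          1j * np.exp(1j * beta) * np.sin(theta / 2)],
         [1j * np.exp(-1j * beta) * np.sin(theta / 2),
          np.exp(-1j * alpha) * np.cos(theta / 2)]])

def kron(ops):
    return reduce(np.kron, ops)

N = 10
J = (kron([I2] * N) + 1j * kron([SX] * N)) / np.sqrt(2)  # entangling operator of (12)

def quantum_payoffs(us, A, B):
    """Expected payoffs (13); `us` lists the ten one-qubit unitaries."""
    psi = np.zeros(2 ** N, dtype=complex); psi[0] = 1.0
    psi = J.conj().T @ (kron(us) @ (J @ psi))            # final state (12)
    u1 = u2 = 0.0
    for i, j, m, n in itertools.product((0, 1), repeat=4):
        k = 2 * i + j + 1                                # second-stage info set index
        ops = [I2] * N
        ops[0], ops[k] = PR[i], PR[m]                    # player 1: qubits 1 and k+1
        ops[5], ops[5 + k] = PR[j], PR[n]                # player 2: qubits 6 and k+6
        prob = np.real(psi.conj() @ (kron(ops) @ psi))   # one projector of (14)
        u1 += prob * (A[i, j] + A[m, n])
        u2 += prob * (B[i, j] + B[m, n])
    return u1, u2
```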

The scheme defined by (10)–(14) is an extension of the classical way of playing the game. As in the case of the standard Eisert–Wilkens–Lewenstein (EWL) scheme, the model \(\Gamma _{QQ}\) reduces to a game equivalent to the classical one when the players' strategy sets are appropriately restricted.

Proposition 2

The game determined by

$$\begin{aligned} \left( {\mathcal {H}}, \left\{ \bigotimes ^5_{i=1}U_{i}(\theta _{i}, 0,0), \bigotimes ^{10}_{i=6}U_{i}(\theta _{i}, 0,0) \right\} , (u_{1}, u_{2})\right) \end{aligned}$$
(15)

is outcome-equivalent to the two-stage bimatrix \(2\times 2\) game.

Proof

Let us first consider the outcome \((2a_{00}, 2b_{00})\). Denote by P the projection of (14) corresponding to that outcome,

$$\begin{aligned} P = |00\rangle \langle 00|\otimes \mathbb {1}^{\otimes 3}\otimes |00\rangle \langle 00| \otimes \mathbb {1}^{\otimes 3}. \end{aligned}$$
(16)

If players 1 and 2 choose \(\bigotimes ^5_{i=1}U_{i}(\theta _{i}, 0,0)\) and \(\bigotimes ^{10}_{i=6}U_{i}(\theta _{i}, 0,0)\), respectively, the final state becomes

$$\begin{aligned} |\Psi _{\text {f}}\rangle = J^{\dag }\left( \bigotimes ^{10}_{i=1}U_{i}(\theta _{i}, 0, 0)\right) J|0\rangle ^{\otimes 10}, \end{aligned}$$
(17)

and the probability of obtaining \((2a_{00}, 2b_{00})\) is

$$\begin{aligned} {\text {tr}}(P|\Psi _{\text {f}}\rangle \langle \Psi _{\text {f}}|) = \cos ^2{\frac{\theta _{1}}{2}}\cos ^2{\frac{\theta _{2}}{2}}\cos ^2{\frac{\theta _{6}}{2}}\cos ^2{\frac{\theta _{7}}{2}}. \end{aligned}$$
(18)

So, by substituting

$$\begin{aligned} \cos ^2{(\theta _{1}/2)} = p_{1},~\cos ^2{(\theta _{2}/2)} = p_{2},~\cos ^2{(\theta _{6}/2)} = q_{1},~\cos ^2{(\theta _{7}/2)} = q_{2}, \end{aligned}$$
(19)

the right-hand side of (18) multiplied by \((2a_{00}, 2b_{00})\) is equal to the first term of (9). Similarly, the outcome \((a_{10}+a_{11}, b_{10} + b_{11})\) is associated with the projection

$$\begin{aligned} P' = |1\rangle \langle 1| \otimes \mathbb {1}^{\otimes 2}\otimes |1\rangle \langle 1| \otimes \mathbb {1} \otimes |0\rangle \langle 0| \otimes \mathbb {1}^{\otimes 2} \otimes |1\rangle \langle 1|\otimes \mathbb {1}. \end{aligned}$$
(20)

In this case,

$$\begin{aligned} {\text {tr}}(P'|\Psi _{\text {f}}\rangle \langle \Psi _{\text {f}}|) = \sin ^2{\frac{\theta _{1}}{2}}\sin ^2{\frac{\theta _4}{2}}\cos ^2{\frac{\theta _6}{2}} \sin ^2{\frac{\theta _9}{2}}. \end{aligned}$$
(21)

Substituting

$$\begin{aligned} \cos ^2{(\theta _{1}/2)} = p_{1},~ \cos ^2{(\theta _{4}/2)} = p_{4},~ \cos ^2{(\theta _{6}/2)} = q_{1},~\cos ^2{(\theta _{9}/2)} = q_{4}, \end{aligned}$$
(22)

we obtain \((1-p_{1})q_{1}(1-p_{4})(1-q_{4})\), i.e., the corresponding coefficient in (9). In general, a strategy profile of the form

$$\begin{aligned} \bigotimes ^5_{i=1}U_{i}(2\arccos {\sqrt{p_{i}}}, 0,0)\otimes \bigotimes ^{10}_{i=6}U_{i}(2\arccos {\sqrt{q_{i-5}}}, 0,0) \end{aligned}$$
(23)

results in the outcome (9). \(\square \)
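
Proposition 2 can also be confirmed numerically. The check below is a sketch reusing `repeated_payoff` from Sect. 3 and `U`, `quantum_payoffs` from Sect. 4; it compares a profile of the form (23) with the classical payoffs (9) for random data:

```python
import numpy as np

rng = np.random.default_rng(7)
A, B = rng.random((2, 2)), rng.random((2, 2))            # random bimatrix (8)
p, q = rng.random(5), rng.random(5)                      # random behavioral strategies

# Profile (23): theta = 2 arccos(sqrt(probability)), alpha = beta = 0.
us = [U(2 * np.arccos(np.sqrt(x))) for x in np.concatenate([p, q])]
assert np.allclose(quantum_payoffs(us, A, B), repeated_payoff(A, B, p, q))
```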

5 Twice-repeated quantum Prisoner’s Dilemma with unawareness

The Prisoner’s Dilemma is one of the most interesting problems in game theory. It shows how the individual rationality of the players can lead them to an inefficient result. Let us consider a general form of the Prisoner’s Dilemma

$$\begin{aligned} \begin{pmatrix} (R, R) &{} (S, T) \\ (T, S) &{} (P, P) \end{pmatrix}, \end{aligned}$$
(24)

where \(T>R>P>S\). The payoff profile (RR) of (24) is more beneficial to both players than (PP). However, each player obtains a higher payoff by choosing D instead of C (in other words, the strategy C is strictly dominated by D). As a result, the rational strategy profile is (DD), and it implies the payoff P for each player. A similar scenario occurs in the case of the finitely repeated Prisoner's Dilemma. By induction, it can be shown that playing the action D at each stage of the finitely repeated Prisoner's Dilemma constitutes the unique Nash equilibrium.

We assume that the modeler's game \(G_{\emptyset }\) (the game that is actually played by the players) is defined by (10). Player 1, being aware of all the unitary strategies, also views the quantum game, i.e., \(G_{1} = \Gamma _{QQ}\). Next, we assume that player 2 perceives the game to be the classical one. In other words, player 2 views the game in the form

$$\begin{aligned} \Gamma _{CC} = ({\mathcal {H}}, \{\{\mathbb {1}, \text {i}\sigma _{x}\}^{\otimes 5}, \{\mathbb {1}, \text {i}\sigma _{x}\}^{\otimes 5}\}, (u_{1}, u_{2})). \end{aligned}$$

We then assume that player 1 finds that player 2 is considering \(\Gamma _{CC}\), and higher-order views \(v\in \{21, 121, 212, \dots \}\) are associated with \(\Gamma _{CC}\). We thus obtain a game with unawareness \(\{\Gamma _{v}\}_{v\in {\mathcal {V}}_{0}}\) defined as follows:

$$\begin{aligned} \Gamma _{v} = {\left\{ \begin{array}{ll} \Gamma _{QQ} &{}\text {if}~v\in \{\emptyset , 1\},\\ \Gamma _{CC} &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
(25)

In what follows, we determine the players' rational strategies by applying the notion of extended Nash equilibrium. First, we formulate a lemma that specifies player 1's best reply to the Nash equilibrium strategy of the classical twice repeated Prisoner's Dilemma. Recall that the action D corresponds to \(\text {i}\sigma _{x}\) in the quantum scheme (10). This implies that \((\text {i}\sigma _{x})^{\otimes 5}\) is the counterpart of the unique Nash equilibrium (DDDDD) in the classical game. The following result is a part of the extended Nash equilibrium.

Lemma 1

Player 1’s best reply to \((\text {i}\sigma _{x})^{\otimes 5}\) in the set \({{\mathsf {S}}}{{\mathsf {U}}}(2)^{\otimes 5}\) is of the form

$$\begin{aligned} \tau ^* = U_{z}(\gamma _{1})\otimes U_{x}(\theta _{2})U_{z}(\gamma _{2}) \otimes U_{x}(\theta _{3})U_{z}(\gamma _{3})\otimes U_{z}(\gamma _{4})\otimes U_{x}(\theta _{5})U_{z}(\gamma _{5}), \end{aligned}$$
(26)

where \(\theta _{i} \in [0,\pi /2]\), \(\sum _{i}\gamma _{i} = k\pi /2\), \(k\in 2\mathbb {Z}+1\).

The complete proof of Lemma 1 is given in "Appendix A." Here, we derive the result of playing the strategy profile \(\tau ^*\otimes (\text {i}\sigma _{x})^{\otimes 5}\). Player 1's payoff resulting from playing the strategy (26) against \((\text {i}\sigma _{x})^{\otimes 5}\) is

$$\begin{aligned}&u_{1}\left( \tau ^*\otimes (\text {i}\sigma _{x})^{\otimes 5}\right) = 2S\cos ^2{\left( \sum ^5_{i=1}\gamma _{i}\right) }\cos ^2{\frac{\theta _{3}}{2}}\nonumber \\&\quad + (S+P)\cos ^2{\left( \sum ^5_{i=1}\gamma _{i}\right) }\sin ^2{\frac{\theta _{3}}{2}} + 2T\sin ^2{\left( \sum ^5_{i=1}\gamma _{i}\right) }. \end{aligned}$$
(27)

Thus, player 1 obtains the maximal payoff 2T by choosing \(\sum ^5_{i=1}\gamma _{i} = k\pi /2\), \(k\in 2\mathbb {Z}+1\).
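
Lemma 1 can be illustrated with the simulation sketch from Sect. 4 (reusing `U` and `quantum_payoffs`; the payoff values and angles below are arbitrary choices satisfying the constraints of the lemma):

```python
import numpy as np

T_, R_, P_, S_ = 5., 3., 1., 0.                          # example values, T>R>P>S
A = np.array([[R_, S_], [T_, P_]]); B = A.T              # Prisoner's Dilemma (24)

Uz = lambda g: U(0, g)                                   # U_z(gamma) = diag(e^{ig}, e^{-ig})
Ux = lambda t: U(t)                                      # U_x(theta)

th2, th3, th5 = 0.3, 0.7, 1.1                            # arbitrary angles in [0, pi/2]
g = [0.2, 0.4, 0.3, 0.1, np.pi / 2 - 1.0]                # gammas with sum = pi/2
tau = [Uz(g[0]), Ux(th2) @ Uz(g[1]), Ux(th3) @ Uz(g[2]),
       Uz(g[3]), Ux(th5) @ Uz(g[4])]                     # strategy (26)
D5 = [U(np.pi)] * 5                                      # (i sigma_x)^{otimes 5}

print(quantum_payoffs(tau + D5, A, B))                   # u_1 equals 2T = 10
```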

Remark 2

It is worth noting that the strategy (26) turns out to be a nontrivial extension of the quantum player's best reply to the strategy \(\text {i}\sigma _{x}\) in the one-stage Prisoner's Dilemma. Recall that, according to [4, 19], the Eisert–Wilkens–Lewenstein approach to the game (24) is defined by the final state

$$\begin{aligned} |\Psi \rangle= & {} J^{\dag }(U_{1}(\theta _{1}, \alpha _{1}, \beta _{1})\otimes U_{2}(\theta _{2}, \alpha _{2}, \beta _{2}))J|00\rangle , ~\nonumber \\&J = \frac{1}{\sqrt{2}}(\mathbb {1}\otimes \mathbb {1} + \text {i}\sigma _{x}\otimes \sigma _{x}), \end{aligned}$$
(28)

and the measurement operator

$$\begin{aligned} Y = \sum _{i,j=0,1}(a_{ij}, b_{ij})|ij\rangle \langle ij|. \end{aligned}$$
(29)

In the case

$$\begin{aligned} |\Psi _{1}\rangle&= J^{\dag }\left( U_{1}(\theta _{1}, \alpha _{1}, \beta _{1})\otimes \text {i}\sigma _{x}\right) J|00\rangle \end{aligned}$$
(30)
$$\begin{aligned}&= \sin {\beta _{1}}\sin {\frac{\theta _{1}}{2}}|00\rangle + \text {i}\cos {\alpha _{1}}\cos {\frac{\theta _{1}}{2}}|01\rangle \nonumber \\&\quad + \text {i}\sin {\alpha _{1}}\cos {\frac{\theta _{1}}{2}}|10\rangle - \cos {\beta _{1}}\sin {\frac{\theta _{1}}{2}}|11\rangle , \end{aligned}$$
(31)

player 1’s payoff \(u_{1}(U_{1}(\theta _{1}, \alpha _{1}, \beta _{1})\otimes \text {i}\sigma _{x}) = {\text {tr}}\left( Y|\Psi _{1}\rangle \langle \Psi _{1}|\right) = T\) if \(\theta _{1} = 0\) and \(\alpha _{1} \in \{\pi /2, 3\pi /2\}\). Thus, the set of player 1’s best replies to \(\text {i}\sigma _{x}\) is

$$\begin{aligned} \left\{ U_{1}(0, \alpha _{1}, \beta _{1}), \alpha _{1} \in \left\{ \frac{\pi }{2},\frac{3\pi }{2}\right\} , \beta _{1} \in [0,2\pi )\right\} = \left\{ U_{z}(\gamma ), \gamma \in \left\{ \frac{\pi }{2}, \frac{3\pi }{2}\right\} \right\} . \end{aligned}$$
(32)
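
This one-stage best reply is easy to verify directly. A self-contained sketch (Python with NumPy; the payoff values are example choices with \(T>R>P>S\)) evaluates (30) for \(U_{1}(0, \pi /2, \beta _{1})\):

```python
import numpy as np

T_, R_, P_, S_ = 5., 3., 1., 0.                          # example values, T>R>P>S
I2, SX = np.eye(2), np.array([[0., 1.], [1., 0.]])
J2 = (np.kron(I2, I2) + 1j * np.kron(SX, SX)) / np.sqrt(2)   # J of (28)

def U(theta, alpha=0.0, beta=0.0):
    """The SU(2) parameterization (11)."""
    return np.array(
        [[np.exp(1j * alpha) * np.cos(theta / 2),
          1j * np.exp(1j * beta) * np.sin(theta / 2)],
         [1j * np.exp(-1j * beta) * np.sin(theta / 2),
          np.exp(-1j * alpha) * np.cos(theta / 2)]])

psi0 = np.array([1., 0., 0., 0.], dtype=complex)
psi = J2.conj().T @ np.kron(U(0, np.pi / 2), 1j * SX) @ J2 @ psi0
probs = np.abs(psi) ** 2                                 # outcomes |00>,|01>,|10>,|11>
print(probs @ np.array([R_, S_, T_, P_]))                # player 1's payoff: T = 5
```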

Proposition 3

Let \(\{\Gamma _{v}\}_{v\in {\mathcal {V}}_{0}}\) be a game with unawareness defined by (25). Then, all extended Nash equilibria \(\{(\sigma )_{v}\}\) of \(\{\Gamma _{v}\}_{v\in {\mathcal {V}}_{0}}\) are of the form:

$$\begin{aligned} (\sigma )_{v} = {\left\{ \begin{array}{ll} \left( \tau ^*, (i\sigma _{x})^{\otimes 5}\right) &{}\text {if}~v\in \{\emptyset , 1\},\\ \left( (i\sigma _{x})^{\otimes 5}, (i\sigma _{x})^{\otimes 5} \right) &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
(33)

Proof

Since \(\Gamma _{v} = \Gamma _{CC}\) for \(v\in {\mathcal {V}}_{0}{\setminus } \{\emptyset , 1\}\), it follows that \((\sigma )_{v}\) is a Nash equilibrium in \(\Gamma _{CC}\). We know from classical game theory that the unique Nash equilibrium in the twice repeated Prisoner's Dilemma is (DDDDD). In terms of the EWL scheme, that strategy can be written as \((i\sigma _{x})^{\otimes 5}\). Therefore,

$$\begin{aligned} (\sigma )_{v} = \left( (i\sigma _{x})^{\otimes 5}, (i\sigma _{x})^{\otimes 5} \right) , \quad v \in {\mathcal {V}}_{0}{\setminus } \{\emptyset , 1\}. \end{aligned}$$
(34)

In order to prove that \((\sigma )_{1} = (\sigma _{1}, \sigma _{2})_{1} = \left( \tau ^*, (i\sigma _{x})^{\otimes 5}\right) \), we first note from the definition of extended strategy profile that

$$\begin{aligned} (\sigma _{2})_{1} = (\sigma _{2})_{12} = (i\sigma _{x})^{\otimes 5}. \end{aligned}$$
(35)

According to Definition 4, player 1’s strategy \((\sigma _{1})_{1}\) has to be a best reply to \((\sigma _{2})_{1} = (i\sigma _{x})^{\otimes 5}\) in the game \(\Gamma _{1} = \Gamma _{QQ}\). Since player 1 has access to all the unitary actions, by Lemma 1, his best reply to \((i\sigma _{x})^{\otimes 5}\) is \((\sigma _{1})_{1} = \tau ^*\) given by (26). Finally, (6) implies that

$$\begin{aligned} (\sigma _{1})_{\emptyset } = (\sigma _{1})_{1} =\tau ^* ~~\text {and}~~ (\sigma _{2})_{\emptyset } = (\sigma _{2})_{2} = (i\sigma _{x})^{\otimes 5}. \end{aligned}$$
(36)

\(\square \)

6 Higher-order unawareness

In the previous section, we considered a typical case in which one of the players is aware of quantum strategies, whereas the other player views the classical game. We showed that the quantum player obtains the best possible payoff by playing an extended Nash equilibrium. An interesting question that arises here is whether the strategic position of the classical player can be improved by increasing his awareness of the game. Let us consider the case in which player 1 views the quantum game. In addition, player 2 is aware that player 1 can use quantum strategies (\(\Gamma '_{2} = \Gamma _{QC}\)), and he knows that player 1 views the quantum strategies (\(\Gamma '_{21} = \Gamma _{QC}\)). Here, \(\Gamma _{QC} = ({\mathcal {H}}, \{{{\mathsf {S}}}{{\mathsf {U}}}(2)^{\otimes 5}, \{\mathbb {1}, \text {i}\sigma _{x}\}^{\otimes 5}\}, (u_{1}, u_{2}))\) denotes the game in which only player 1 has access to the full range of unitary strategies. The formal description of the problem is twofold. Player 1 can perceive the game with quantum strategies for both players (\(\Gamma '_{1} = \Gamma _{QQ}\)), or he may think that he is the only one who has access to all the unitary strategies (\(\Gamma ''_{1} = \Gamma _{QC}\)). As long as player 1 finds that player 2 is considering the classical game \(\Gamma _{CC}\) (i.e., \(\Gamma '_{12} = \Gamma ''_{12} = \Gamma _{CC}\)), both ways describe the same problem. Formally, the case in which the classical player is aware that player 1 can use quantum strategies is given by the collections of games \(\{\Gamma '_{v}\}\) or \(\{\Gamma ''_{v}\}\), where

$$\begin{aligned} \Gamma _{v}' = {\left\{ \begin{array}{ll} \Gamma _{QQ} &{}\text {if}~v\in \{\emptyset , 1\}, \\ \Gamma _{QC} &{}\text {if}~ v\in \{2,21\},\\ \Gamma _{CC} &{}\text {otherwise}, \end{array}\right. } \quad \text {or}\quad \Gamma _{v}'' = {\left\{ \begin{array}{ll} \Gamma _{QC} &{}\text {if}~ v\in \{\emptyset , 1, 2,21\},\\ \Gamma _{CC} &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
(37)

In order to determine the reasonable outcome of (37), we need to find player 2's best reply to \(\tau ^*\).

Lemma 2

Player 2’s best reply to \(\tau ^*\) in the set \(\{\mathbb {1}, \text {i}\sigma _{x}\}^{\otimes 5}\) is of the form

$$\begin{aligned} \tau _2^*= \mathbb {1} \otimes \{\mathbb {1}, i\sigma _{x}\}^{\otimes 3} \otimes \mathbb {1}. \end{aligned}$$
(38)

Proof

Since player 2's payoff function is linear in the probabilities defining a mixed strategy over \(\{\mathbb {1}, \text {i}\sigma _{x}\}^{\otimes 5}\) when player 1's strategy is fixed, no mixed reply can lead to a higher payoff than the best pure reply. It is therefore sufficient to compare the expected payoffs of player 2 that correspond to the strategy profiles from \(\tau ^*\otimes \{\mathbb {1}, \text {i}\sigma _{x}\}^{\otimes 5}\). We obtain the following four different outcomes:

  • \(\tau ^* \otimes \left( \mathbb {1}\otimes \{\mathbb {1}, \text {i}\sigma _{x}\}^{\otimes 3}\otimes \mathbb {1}\right) \): player 2's payoff \((P+T)\sin ^2{(\theta _{5}/2)} + 2P\cos ^2{(\theta _{5}/2)}\);

  • \(\tau ^* \otimes \left( \mathbb {1}\otimes \{\mathbb {1}, \text {i}\sigma _{x}\}^{\otimes 3}\otimes \text {i}\sigma _{x} \right) \): player 2's payoff \((P+R)\sin ^2{(\theta _{5}/2)} + (P+S)\cos ^2{(\theta _{5}/2)}\);

  • \(\tau ^* \otimes \left( \text {i}\sigma _{x} \otimes \{\mathbb {1}, \text {i}\sigma _{x}\}^{\otimes 2} \otimes \mathbb {1}\otimes \{\mathbb {1}, \text {i}\sigma _{x}\} \right) \): player 2's payoff \(S+P\);

  • \(\tau ^*\otimes \left( \text {i}\sigma _{x} \otimes \{\mathbb {1}, \text {i}\sigma _{x}\}^{\otimes 2}\otimes \text {i}\sigma _{x} \otimes \{\mathbb {1}, \text {i}\sigma _{x}\}\right) \): player 2's payoff \(2S\).

From the fact that \(T>R>P>S\), we see that player 2’s best reply is given by (38) for every \(\theta _{5} \in [0,\pi /2]\). \(\square \)
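
Lemma 2 can be checked by brute force with the simulation sketch from Sect. 4 (reusing `U` and `quantum_payoffs`, together with `A`, `B` and `tau` from the sketch following Lemma 1):

```python
import itertools

import numpy as np

best, best_u2 = [], -np.inf
for bits in itertools.product((0, 1), repeat=5):         # 1 -> i sigma_x, 0 -> identity
    reply = [U(np.pi) if b else np.eye(2) for b in bits]
    u2 = quantum_payoffs(tau + reply, A, B)[1]
    if u2 > best_u2 + 1e-9:
        best, best_u2 = [bits], u2
    elif abs(u2 - best_u2) <= 1e-9:
        best.append(bits)

# All best replies have the identity on the first and fifth qubit,
# i.e., the form (0, *, *, *, 0) corresponding to (38).
print(best)
```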

Lemma 2 enables us to determine all the extended Nash equilibria in \(\{\Gamma _{v}\}\) defined by (37).

Proposition 4

Let \(\{\Gamma _{v}\}_{v\in {\mathcal {V}}_{0}}\) be a game with unawareness defined by (37). Then, all extended Nash equilibria \(\{(\sigma )_{v}\}\) of \(\{\Gamma _{v}\}_{v\in {\mathcal {V}}_{0}}\) are of the form:

$$\begin{aligned} (\sigma )_{v} = {\left\{ \begin{array}{ll} \left( \tau ^*, \tau ^*_{2}\right) &{}\text {if}~v\in \{\emptyset , 2\},\\ \left( \tau ^*, (\text {i}\sigma _{x})^{\otimes 5}\right) &{}\text {if}~v\in \{1,21\}, \\ \left( (i\sigma _{x})^{\otimes 5}, (i\sigma _{x})^{\otimes 5} \right) &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
(39)

Proof

The proof proceeds along the same lines as the proof of Proposition 3. Without loss of generality, we can assume that \(\{\Gamma _{v}\} = \{\Gamma ''_{v}\}\) according to (37). Arguments similar to those in the proof of Proposition 3 show that

$$\begin{aligned} (\sigma )_{v} = \left( (i\sigma _{x})^{\otimes 5}, (i\sigma _{x})^{\otimes 5} \right) , \quad v \in \{12, 212, 121, \dots \} \end{aligned}$$
(40)

and

$$\begin{aligned} (\sigma )_{v} = \left( \tau ^*, (\text {i}\sigma _{x})^{\otimes 5}\right) , \quad v\in \{1, 21\}. \end{aligned}$$
(41)

Since \((\sigma _{{\text {v}}})_{v} = (\sigma _{{\text {v}}})_{v{^{\hat{\,}}}{\text {v}}}\) (see Definition 3),

$$\begin{aligned} (\sigma _{1})_{2} = (\sigma _{1})_{21} = \tau ^*. \end{aligned}$$
(42)

Now, \((\sigma _{2})_{2}\) is a best reply to \((\sigma _{1})_{2}\) in the game \(\Gamma ''_{2}\). By Lemma 2, \((\sigma _{2})_{2} = \tau ^*_{2}\). Consequently,

$$\begin{aligned} (\sigma )_{2} = \left( \tau ^*, \tau ^*_{2}\right) . \end{aligned}$$
(43)

\(\square \)

According to (39), the result of the game is \((\sigma )_{2} = (\sigma )_{\emptyset } = (\tau ^*, \tau ^*_{2})\). It corresponds to the following payoffs: \((P+S)\sin ^2(\theta _{5}/2) + 2P\cos ^2(\theta _{5}/2)\) for player 1 and \((P+T)\sin ^2(\theta _{5}/2) + 2P\cos ^2(\theta _{5}/2)\) for player 2. Since player 1 has no preferred value of the parameter \(\theta _{5}\) in \(\tau ^*\), the difference \((T-S)\sin ^2(\theta _{5}/2)\) between player 2's and player 1's payoffs can take any value in its range. If we assume that the parameter \(\theta _{5}\) is uniformly distributed over \([0,\pi ]\), then, on average, player 2 gets

$$\begin{aligned} \frac{1}{\pi }(T-S)\int ^{\pi }_{0}\sin ^2{\frac{\theta _{5}}{2}}d\theta _{5} = \frac{1}{2}(T-S) \end{aligned}$$
(44)

more than player 1.
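
The elementary average used in (44) can be confirmed symbolically, e.g., with SymPy:

```python
import sympy as sp

theta, T, S = sp.symbols('theta T S', positive=True)
avg = (T - S) / sp.pi * sp.integrate(sp.sin(theta / 2) ** 2, (theta, 0, sp.pi))
print(sp.simplify(avg))                                  # (T - S)/2
```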

7 Summary and conclusions

In this paper, we proposed a new scheme for a twice repeated quantum game based on the fact that it is a particular case of an extensive-form game. We analyzed the scheme for the twice repeated Prisoner's Dilemma, with a focus on situations where the players have different perceptions of the game, described by the formalism of games with unawareness [12].

In particular, we determined the extended Nash equilibrium for the case where one player has access to the full range of quantum strategies, while the other perceives the game as a classical one. We found the best replies of the quantum player to the classical equilibrium strategy. This result extends the corresponding one for the one-stage version of the game and similarly allows the quantum player to obtain the best possible outcome.

We also discussed higher-order unawareness, in which the game perception of the classical player is slightly increased, so that he knows that his opponent is actually a quantum player, while the quantum player is not aware of that knowledge. We showed that this situation improves the strategic position of the classical player. As a result of playing the extended Nash equilibrium, the difference between the classical and the quantum player's payoffs is always nonnegative, and strictly positive as long as the parameter \(\theta _{5} \ne 0\) in player 1's equilibrium action \(\tau ^*\). Therefore, the average payoff of the classical player is greater than the payoff of the quantum player.

Our results showed that the proposed scheme is a nontrivial generalization of the well-known EWL scheme. It can easily be extended to any finitely repeated \(2\times 2\) quantum game. Additionally, in the future it should be possible to implement our scheme on existing quantum hardware such as the IBM Q or Rigetti machines. Research based on the proposed scheme is also promising in the coming era of the quantum internet, as indicated by emerging quantum network simulators such as SimulaQron, which can be used to simulate two players playing over a quantum network.