Rationality of Irrational Choice: The Logic of the Prisoner’s Dilemma

Preprint

Article

Eugene Kagan^*

Alexander Rybalov,Ronald Yager

This version is not peer-reviewed

Submitted:

19 March 2024

Posted:

21 March 2024

Abstract

The goal of the paper is to clarify the observed irrationality of decision making in conflict situations considered as one-step games of two players. To solve such situations, we consider the asymmetry in the relation of the players to their own rewards and the rewards of the opponents. Formalization of the decision-making process is based on recently developed non-commutative operators of multivalued logic algebra. The suggested method is applied to solve the well-known Prisoners’ dilemma game and the other situations of conflict, where it results in the expected strategies.

Keywords:

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

“An Investigation of the Laws of Thought” by Boole [1] set the direction of rational reasoning and analysis: the truth of statements is determined by formal logic, and the chances of events are predicted by probability theory.

However, with the development of quantum mechanics and further discussions on the nature of logic and probability, it was realized that logical implications and probabilistic reasoning are not universally valid.

For example, Birkhoff and von Neumann [2] demonstrated that, because of the influence of the observer, the logic of quantum mechanics is not distributive, and Ramsey [3] (see also [4], Appendix I) considered subjective probabilities, which are defined from the point of view of the subject involved in the objects’ activity and are not necessarily equivalent to the objective probabilities.

Later, Kahneman and Tversky [5] confirmed the irrationality of human decision making and demonstrated that people’s reasoning usually results neither in the maximal expected reward nor in the minimal expected payoff. Recently, Ruggeri et al. [6] confirmed these results in experiments with participants from different countries.

Several methods are used to describe irrational reasoning and to predict subjective decisions. Some of them follow utility theory [7], which considers the choices with respect to a utility function and the attitude of the decision maker to possible risks. Others implement non-Bayesian beliefs derived from a game-theoretical approach to the analyzed situation [8].

The methods that follow logical analysis are based on different versions of non-standard logic, from the logic of quantum mechanics [2] mentioned above to probabilistic logic [9], fuzzy logic [10,11] and possibility theory [12]. The origins of such extensions of Boolean logic can be traced back to the Łukasiewicz three-valued logic [13] and its further extension, the Łukasiewicz-Tarski $ℵ_{0}$-valued logic [14].

In parallel, Lambek [15] initiated the study of non-commutative logics, which were applied to the description of the structures of natural languages [16,17] and then adopted for modeling preference relations [18,19]. These results allowed a direct logical description of statements whose truth depends on the order of the terms, and the modeling of decisions with preferences; for the problems and the state of the art in the field of decision-making with preferences, see, e.g., [20,21].

In this paper, we apply recently developed non-commutative logical operators [22] to a well-known decision-making problem, the Prisoners’ dilemma, and demonstrate that considering the asymmetry in the prisoners’ judgements leads to the solution of the game.

2. Problem Formulation

The Prisoners’ dilemma is a game of two players, $a_{1}$ and $a_{2}$, with the strategies $s_{1}$ and $s_{2}$, such that each player chooses a strategy without any knowledge of the strategy chosen by the other player.

The payoffs of the players in the game are defined as follows

$$P = \begin{pmatrix} (v, v) & (y, u) \\ (u, y) & (x, x) \end{pmatrix},$$

where the rows correspond to the strategies $s_{1}$, $s_{2}$ of player $a_{1}$, the columns correspond to the strategies $s_{1}$, $s_{2}$ of player $a_{2}$, each entry lists the payoff of $a_{1}$ followed by the payoff of $a_{2}$, and $u < v < x < y$. Following this table,

- if both players choose the strategy $s_{1}$, then each of them pays $v$;
- if both players choose the strategy $s_{2}$, then each of them pays $x$;
- if the first player $a_{1}$ chooses the strategy $s_{1}$ and the second player $a_{2}$ chooses the strategy $s_{2}$, then player $a_{1}$ pays $y$ and player $a_{2}$ pays $u$; and
- if the first player $a_{1}$ chooses the strategy $s_{2}$ and the second player $a_{2}$ chooses the strategy $s_{1}$, then player $a_{1}$ pays $u$ and player $a_{2}$ pays $y$.

In its original form, the Prisoners’ dilemma is formulated as follows. Let the strategies be $s_{1}$ (to keep silent) and $s_{2}$ (to testify), and let the payoffs $u = 0$, $v = 1$, $x = 2$ and $y = 3$ be the years that a prisoner will serve in prison. Then each prisoner faces the dilemma of whether to keep silent ($s_{1}$) or to testify ($s_{2}$).

The payoff of each prisoner depends on the choice of the other prisoner. The dilemma of the prisoner $a_{1}$ is as follows:

- if $a_{1}$ keeps silent and $a_{2}$ keeps silent, then each of the prisoners serves $1$ year in prison,
- if $a_{1}$ testifies and $a_{2}$ testifies, then each of them serves $2$ years in prison,
- if $a_{1}$ keeps silent but $a_{2}$ testifies, then $a_{1}$ serves $3$ years in prison and $a_{2}$ goes free, and
- if $a_{1}$ testifies but $a_{2}$ keeps silent, then $a_{1}$ goes free and $a_{2}$ serves $3$ years in prison.

The dilemma of the prisoner $a_{2}$ is the same.

Certainly, the optimal joint strategy for both prisoners is mutual silence $(s_{1}, s_{1})$. But since neither of them is aware of the choice of the other prisoner, the best response of each prisoner is to testify: whatever the other prisoner does, testifying shortens the prisoner's own term by one year. Thus, the Nash equilibrium in the game is mutual testifying $(s_{2}, s_{2})$, which is not optimal.
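The best-response argument can be checked mechanically. The following is a minimal sketch, assuming the years-in-prison payoffs of the original formulation ($u = 0$, $v = 1$, $x = 2$, $y = 3$); the names `PAYOFF`, `best_response_a1`, `best_response_a2` and `nash_equilibria` are hypothetical and serve only this illustration.

```python
# Years in prison (to be minimized). Index 0 = s1 (keep silent), 1 = s2 (testify).
# PAYOFF[i][j] = (years for a1, years for a2) when a1 plays i and a2 plays j.
PAYOFF = [[(1, 1), (3, 0)],
          [(0, 3), (2, 2)]]


def best_response_a1(j):
    """Strategy of a1 that minimizes its own term, given that a2 plays j."""
    return min((0, 1), key=lambda i: PAYOFF[i][j][0])


def best_response_a2(i):
    """Strategy of a2 that minimizes its own term, given that a1 plays i."""
    return min((0, 1), key=lambda j: PAYOFF[i][j][1])


def nash_equilibria():
    """Pure-strategy profiles in which each strategy is a best response to the other."""
    return [(i, j) for i in (0, 1) for j in (0, 1)
            if best_response_a1(j) == i and best_response_a2(i) == j]


print(nash_equilibria())  # [(1, 1)], i.e. mutual testifying (s2, s2)
```

The only profile that survives is mutual testifying, although mutual silence gives both prisoners a shorter term.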

The Prisoners’ dilemma demonstrates that even if the player is informed about optimal strategies, the chosen strategy can be irrational because of the influence of the unknown choice of the other player.

Such irrationality gave rise to numerous studies of communication and conscience in conflict situations, aimed at investigating the strategies which lead to the optimal choice; probably the most remarkable books in the field are [23,24]. For the repeated version of the game, it was found that the most successful strategy of each prisoner is tit-for-tat, according to which each prisoner mirrors the previous action of the opponent and returns to cooperation after retaliation.

In this paper, we consider the problem from the opposite point of view and seek a method which predicts the rational or irrational choice of the prisoner with respect to the given payoffs of each prisoner. In other words, the problem is to define a method which demonstrates the rationality of the irrational choice of the prisoner.

3. Suggested Solution

The suggested solution considers the asymmetry in the relation of a player to their own payoff or reward and to the payoff or reward of the other player. We assume that the player treats the decision of the other player as a background, or context, for their own decision and makes the decision using this context.

3.1. Non-Commutative Multivalued Logic Operators

The decision-making process uses the recently developed non-commutative uninorm and absorbing norm aggregators [22], which implement the operators of the non-commutative logic algebra [19].

Let $\oplus_{θ} : [0, 1] \times [0, 1] \to [0, 1]$ be the uninorm [25] with neutral or identity element $θ \in [0, 1]$ and $\otimes_{ϑ} : [0, 1] \times [0, 1] \to [0, 1]$ be the absorbing norm [26] with absorbing element $ϑ \in [0, 1]$. With respect to the value $θ$, the uninorm $\oplus_{1}$ is the $t$-norm (or multivalued $and$ operator) and $\oplus_{0}$ is the $t$-conorm (or multivalued $or$ operator), and the absorbing norm $\otimes_{ϑ}$ is a multivalued version of the Boolean $not\ xor$ operator.

The uninorm $\oplus_{θ}$ and the absorbing norm $\otimes_{ϑ}$ act on the interval $[0, 1]$ and form an algebra [27,28]

$$A_{η} = \langle [0,1], \oplus_{θ}, \otimes_{ϑ} \rangle, \tag{1}$$

in which $\oplus_{θ}$ plays the role of summation with the zero $θ$ such that $θ \oplus_{θ} x = x$, and $\otimes_{ϑ}$ plays the role of multiplication with the unit $ϑ$ such that $ϑ \otimes_{ϑ} x = ϑ$, $x \in [0, 1]$. If $θ = ϑ$ and $u_{θ} (x) = v_{ϑ} (x)$ for any $x \in [0, 1]$ (with the generator functions $u_{θ}$ and $v_{ϑ}$ defined below), then the algebra $A_{η}$ is distributive.

It was proven [29] that there exist functions $u_{θ} : (0, 1) \to (- \infty, \infty)$ and $v_{ϑ} : (0, 1) \to (- \infty, \infty)$, called generator functions, such that for any $x, y \in (0, 1)$

$$x \oplus_{θ} y = u_{θ}^{- 1} (u_{θ} (x) + u_{θ} (y)), \tag{2}$$

$$x \otimes_{ϑ} y = v_{ϑ}^{- 1} (v_{ϑ} (x) \times v_{ϑ} (y)) . \tag{3}$$

For the boundary values $x, y \in \{0, 1\}$, it is assumed that the norms $\oplus_{θ}$ and $\otimes_{ϑ}$ are Boolean operators: $\oplus_{θ}$ is the $and$ or the $or$ operator with respect to the value of $θ$, and $\otimes_{ϑ}$ is the $not\ xor$ operator for any $ϑ$.

Generator functions $u_{θ}$ and $v_{ϑ}$ are monotonically increasing functions, which can be defined following different assumptions. It was demonstrated [27] that the inverse generator functions $u_{θ}^{- 1}$ and $v_{ϑ}^{- 1}$ meet the requirements of cumulative probability distributions, which relates the multivalued logic algebra $A_{η}$ to probability theory and probabilistic logic [9].
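As a minimal illustration of representations (2) and (3), the sketch below builds both norms from a single generator. The logit generator and all function names are assumptions chosen for this illustration and are not taken from the cited works; with this choice the neutral element of the uninorm and the absorbing element of the absorbing norm both equal $0.5$.

```python
import math


def gen(x):
    """Assumed generator: the logit, mapping (0, 1) onto the real line, gen(0.5) = 0."""
    return math.log(x / (1.0 - x))


def gen_inv(xi):
    """Inverse of the assumed generator: the logistic function."""
    return 1.0 / (1.0 + math.exp(-xi))


def uninorm(x, y):
    """Representation (2): generator values are added."""
    return gen_inv(gen(x) + gen(y))


def absorbing_norm(x, y):
    """Representation (3): generator values are multiplied."""
    return gen_inv(gen(x) * gen(y))


print(uninorm(0.5, 0.3))         # close to 0.3: 0.5 acts as the neutral element
print(absorbing_norm(0.5, 0.3))  # 0.5: 0.5 acts as the absorbing element
```

Both operators are commutative here, because the same generator is applied to both arguments.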

The non-commutative multivalued logic algebra $A_{l | η | r}$ [22] extends the algebra $A_{η}$ using the representations (2) and (3) by generator functions and conforms to the definition of non-commutative logic algebras [19].

The non-commutative uninorm $\oplus_{θ_{l} | θ | θ_{r}} : [0, 1] \times [0, 1] \to [0, 1]$ and absorbing norm $\otimes_{ϑ_{l} | ϑ | ϑ_{r}} : [0, 1] \times [0, 1] \to [0, 1]$ are defined as follows

$$x \oplus_{θ_{l} | θ | θ_{r}} y = u_{θ}^{- 1} (u_{θ_{l}} (x) + u_{θ_{r}} (y)), \tag{4}$$

$$x \otimes_{ϑ_{l} | ϑ | ϑ_{r}} y = v_{ϑ}^{- 1} (v_{ϑ_{l}} (x) \times v_{ϑ_{r}} (y)), \tag{5}$$

where for convenience we assume that $θ_{l} \leq θ \leq θ_{r}$ and $ϑ_{l} \leq ϑ \leq ϑ_{r}$. If $θ_{l} \neq θ_{r}$ and $ϑ_{l} \neq ϑ_{r}$, then, respectively, the uninorm $\oplus_{θ_{l} | θ | θ_{r}}$ and absorbing norm $\otimes_{ϑ_{l} | ϑ | ϑ_{r}}$ are non-commutative, and if $θ = θ_{l} = θ_{r}$ and $ϑ = ϑ_{l} = ϑ_{r}$, then these operators are equivalent to the norms $\oplus_{θ}$ and $\otimes_{ϑ}$.
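The effect of the left and right parameters in definitions (4) and (5) can be seen in a short sketch. It assumes the parametric generator that appears later in the example as equation (20), taken as printed and used for both norms; the function names and the sample arguments are hypothetical.

```python
import math


def w(x, eta):
    """Assumed parametric generator, eq. (20) as printed; x in (0, 1)."""
    return -math.log(x ** (-1.0 / eta) - 1.0)


def w_inv(xi, eta):
    """Inverse of the assumed generator, eq. (21) as printed."""
    return (1.0 + math.exp(-xi)) ** (-eta)


def nc_uninorm(x, y, left, mid, right):
    """Definition (4): the left and right arguments use different generators."""
    return w_inv(w(x, left) + w(y, right), mid)


def nc_absorbing(x, y, left, mid, right):
    """Definition (5): same construction with multiplication."""
    return w_inv(w(x, left) * w(y, right), mid)


params = (0.25, 0.5, 0.75)
# Swapping the arguments changes the result when left != right.
print(nc_uninorm(0.3, 0.6, *params), nc_uninorm(0.6, 0.3, *params))
print(nc_absorbing(0.3, 0.6, *params), nc_absorbing(0.6, 0.3, *params))
# With equal parameters the operators are commutative again.
print(nc_uninorm(0.3, 0.6, 0.5, 0.5, 0.5), nc_uninorm(0.6, 0.3, 0.5, 0.5, 0.5))
```

The order dependence comes only from applying different generators to the first and second arguments; the addition and multiplication inside remain commutative.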

The logic algebra

$$A_{l | η | r} = \langle [0,1], \oplus_{θ_{l} | θ | θ_{r}}, \otimes_{ϑ_{l} | ϑ | ϑ_{r}} \rangle \tag{6}$$

with the operators defined by the uninorm $\oplus_{θ_{l} | θ | θ_{r}}$ and absorbing norm $\otimes_{ϑ_{l} | ϑ | ϑ_{r}}$ is the non-commutative version of the algebra $A_{η}$.

3.2. Application of the Non-Commutative Operators to the Prisoners’ Dilemma

Let us consider the Prisoners’ dilemma in the form of a bi-matrix game [30], where the matrices

$$R^{1} = {(r_{i j}^{1})}_{2 \times 2} \quad \text{and} \quad R^{2} = {(r_{i j}^{2})}_{2 \times 2} \tag{7}$$

represent the payoffs of the first and the second player, respectively, as negative rewards. In other words, if a player’s payoff is $p$, then the reward received by this player is $r = - p$, and vice versa.

In different versions of the game the values of the rewards can be defined arbitrarily. Therefore, they are first normalized as follows. Let

$$r_{\max}^{1} = \max \{|r_{i j}^{1}|, i, j = 1, 2\} \quad \text{and} \quad r_{\max}^{2} = \max \{|r_{i j}^{2}|, i, j = 1, 2\} \tag{8}$$

be the maximal absolute rewards of the players. The maximal absolute reward in the game is

$$r_{\max} = \max \{r_{\max}^{1}, r_{\max}^{2}\} . \tag{9}$$

Usually, in the Prisoners’ dilemma the payoffs, and hence the rewards, of both players take the same values; therefore the maximal absolute values are also equal: $r_{\max} = r_{\max}^{1} = r_{\max}^{2}$.

Then, the matrices of the normalized rewards are

$$A^{1} = {(a_{i j}^{1})}_{2 \times 2} \quad \text{and} \quad A^{2} = {(a_{i j}^{2})}_{2 \times 2}, \tag{10}$$

where ($i, j = 1, 2$)

$$a_{i j}^{1} = r_{i j}^{1} / r_{\max} \quad \text{and} \quad a_{i j}^{2} = r_{i j}^{2} / r_{\max} . \tag{11}$$

Note that the normalization preserves the signs of the rewards: the negative rewards, which are the payoffs, remain negative, and the positive rewards remain positive.

This normalization does not change the structure of the game. In addition, the values $r_{\max}^{1}$ and $r_{\max}^{2}$ correspond to the best rewards or the worst payoffs, which are usually the starting point of the judgements aimed at better decisions.

The next normalization transforms the rewards $A^{1}$ and $A^{2}$ to nonnegative values. For convenience, we apply the inverse generator functions such that the resulting matrices

$$B^{1} = {(b_{i j}^{1})}_{2 \times 2} \quad \text{and} \quad B^{2} = {(b_{i j}^{2})}_{2 \times 2} \tag{12}$$

include the values ($i, j = 1, 2$)

$$b_{i j}^{1} = u_{θ}^{- 1} (a_{i j}^{1}), \; b_{i j}^{2} = u_{θ}^{- 1} (a_{i j}^{2}) \quad \text{or} \quad b_{i j}^{1} = v_{ϑ}^{- 1} (a_{i j}^{1}), \; b_{i j}^{2} = v_{ϑ}^{- 1} (a_{i j}^{2}), \tag{13}$$

where $u_{θ}^{- 1}$ and $v_{ϑ}^{- 1}$ are the inverse generator functions of the uninorm and absorbing norm, respectively.

Following the probabilistic interpretation of the uninorm and absorbing norm [27], the values $b_{i j}^{1}$ and $b_{i j}^{2}$, $i, j = 1, 2$, are the probabilities that the normalized rewards are at most $a_{i j}^{1}$ and $a_{i j}^{2}$, correspondingly. Hence, the normalized values $b_{i j}^{1}$ and $b_{i j}^{2}$, $i, j = 1, 2$, can be interpreted as subjective beliefs of the players in the equitable rewards.

Such an interpretation follows the line of Ramsey’s interpretation of probabilities [3]. In terms of the Prisoners’ dilemma, since each of the prisoners is a criminal and knows about the crime, each of them completely believes that the maximal payoff is justified, and believes less in the justification of the smaller payoffs.

The game with the reward matrices $B^{1}$ and $B^{2}$ is equivalent to the game with the reward matrices $R^{1}$ and $R^{2}$, but in contrast to the values $r_{i j}^{1}$ and $r_{i j}^{2}$, which are real rewards of the players, the values $b_{i j}^{1}$ and $b_{i j}^{2}$ are considered as subjective beliefs of the players to obtain the corresponding rewards $r_{i j}^{1}$ and $r_{i j}^{2}$.

To define the choice of the players’ strategies, we assume that the relation of a player to their own belief in obtaining a certain reward differs from the relation to the belief of the opponent in obtaining this reward. We consider the beliefs as the arguments of the operators $\oplus_{θ_{l} | θ | θ_{r}}$ and $\otimes_{ϑ_{l} | ϑ | ϑ_{r}}$ in the algebra $A_{l | η | r}$. The resulting values are the trusts $t_{i j}^{1}$ and $t_{i j}^{2}$ of the players in their strategies based on the beliefs $b_{i j}^{1}$ and $b_{i j}^{2}$, $i, j = 1, 2$.

The trust matrices are defined by the absorbing norm as follows

$$T^{1} = {(t_{i j}^{1})}_{2 \times 2} \quad \text{and} \quad T^{2} = {(t_{i j}^{2})}_{2 \times 2} \tag{14}$$

where ($i, j = 1, 2$)

$$t_{i j}^{1} = b_{i j}^{1} \otimes_{ϑ_{l} | ϑ | ϑ_{r}} b_{i j}^{2} \quad \text{and} \quad t_{i j}^{2} = b_{i j}^{2} \otimes_{ϑ_{l} | ϑ | ϑ_{r}} b_{i j}^{1} . \tag{15}$$

This definition assumes that the players act as opponents and implements their tit-for-tat relations. Each player considers their own belief and the belief of the opponent and forms the aggregated trust with the stress on their own belief.

The choice of the strategy is conducted using the uninorm, which aggregates the trusts of the players in their strategies. The vectors of the aggregation results are

$$D^{1} = (d_{1}^{1}, d_{2}^{1}) \quad \text{and} \quad D^{2} = (d_{1}^{2}, d_{2}^{2}) \tag{16}$$

where

$$d_{1}^{1} = t_{11}^{1} \oplus_{θ_{l} | θ | θ_{r}} t_{12}^{1} \quad \text{and} \quad d_{2}^{1} = t_{21}^{1} \oplus_{θ_{l} | θ | θ_{r}} t_{22}^{1}, \tag{17}$$

$$d_{1}^{2} = t_{11}^{2} \oplus_{θ_{l} | θ | θ_{r}} t_{21}^{2} \quad \text{and} \quad d_{2}^{2} = t_{12}^{2} \oplus_{θ_{l} | θ | θ_{r}} t_{22}^{2} . \tag{18}$$

Note that in the last aggregations each player considers the own trusts and aggregates them for each strategy.

Finally, the strategy chosen by each player is the strategy for which the aggregated trusts reach their maximum (ties are broken randomly)

$$s^{1} = \arg \max (d_{i}^{1}, i = 1, 2) \quad \text{and} \quad s^{2} = \arg \max (d_{i}^{2}, i = 1, 2) . \tag{19}$$

By equation (19), the strategies are defined by the indices $s^{1}, s^{2} \in \{1, 2\}$, and the meaning of each strategy is specified by the game formulation, that is, to keep silent or to testify.
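The procedure of equations (8) to (19) can be sketched in a few lines. The sketch assumes the parametric generator of equations (20) to (22) from the example in Section 3.3, taken as printed and shared by both norms; all function names are hypothetical, ties are resolved by taking the first maximum rather than randomly, and no attempt is made to reproduce tabulated intermediate values.

```python
import math


def w(x, eta):
    """Generator function, eq. (20) as printed; x in (0, 1)."""
    return -math.log(x ** (-1.0 / eta) - 1.0)


def w_inv(xi, eta):
    """Inverse generator function, eq. (21) as printed."""
    return (1.0 + math.exp(-xi)) ** (-eta)


def absorbing(x, y, eta_l, eta, eta_r):
    """Non-commutative absorbing norm, eq. (5)."""
    return w_inv(w(x, eta_l) * w(y, eta_r), eta)


def uninorm(x, y, eta_l, eta, eta_r):
    """Non-commutative uninorm, eq. (4)."""
    return w_inv(w(x, eta_l) + w(y, eta_r), eta)


def choose_strategies(R1, R2, eta=0.5):
    """Return (s1, s2, D1, D2) for 2x2 reward matrices R1, R2 (not all zero)."""
    p = (eta / 2.0, eta, (eta + 1.0) / 2.0)                       # eq. (22)
    idx = (0, 1)
    r_max = max(abs(r) for R in (R1, R2) for row in R for r in row)  # eqs. (8)-(9)
    # Normalized rewards (11) mapped to beliefs (13).
    B1 = [[w_inv(R1[i][j] / r_max, eta) for j in idx] for i in idx]
    B2 = [[w_inv(R2[i][j] / r_max, eta) for j in idx] for i in idx]
    # Trusts (15): the player's own belief is the left argument.
    T1 = [[absorbing(B1[i][j], B2[i][j], *p) for j in idx] for i in idx]
    T2 = [[absorbing(B2[i][j], B1[i][j], *p) for j in idx] for i in idx]
    # Aggregated trusts (17)-(18): player 1 over rows, player 2 over columns.
    D1 = [uninorm(T1[i][0], T1[i][1], *p) for i in idx]
    D2 = [uninorm(T2[0][j], T2[1][j], *p) for j in idx]
    # Strategy choice (19), reported with indices 1 and 2 as in the text.
    s1 = 1 + max(idx, key=lambda i: D1[i])
    s2 = 1 + max(idx, key=lambda j: D2[j])
    return s1, s2, D1, D2


# Rewards of the Prisoners' dilemma: negated years in prison.
R1 = [[-1, -3], [0, -2]]
R2 = [[-1, 0], [-3, -2]]
s1, s2, D1, D2 = choose_strategies(R1, R2)
print(s1, s2)
```

Under these assumptions the aggregated trust of the second strategy exceeds that of the first for both players, so the sketch selects strategy 2 (to testify) for each of them.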

3.3. Example of the Prisoners’ Dilemma

To clarify the solution presented above, let us consider the Prisoners’ dilemma with the payoffs $u = 0$, $v = 1$, $x = 2$ and $y = 3$. The payoff matrix of this game is

$$P = \begin{pmatrix} (1, 1) & (3, 0) \\ (0, 3) & (2, 2) \end{pmatrix},$$

and the reward matrices of the players are

$$R^{1} = \begin{pmatrix} - 1 & - 3 \\ 0 & - 2 \end{pmatrix} \quad \text{and} \quad R^{2} = \begin{pmatrix} - 1 & 0 \\ - 3 & - 2 \end{pmatrix} .$$

The maximal absolute reward in both matrices is $r_{\max} = r_{\max}^{1} = r_{\max}^{2} = 3$; hence, the normalized rewards are

$$A^{1} = \begin{pmatrix} - 1 / 3 & - 1 \\ 0 & - 2 / 3 \end{pmatrix} \quad \text{and} \quad A^{2} = \begin{pmatrix} - 1 / 3 & 0 \\ - 1 & - 2 / 3 \end{pmatrix} .$$
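The normalization of equations (8) to (11) for this example can be checked directly. The sketch below uses exact fractions; the function name `normalize` is hypothetical, and integer rewards with at least one nonzero entry are assumed.

```python
from fractions import Fraction


def normalize(R1, R2):
    """Divide both reward matrices by the maximal absolute reward, eqs. (8)-(11)."""
    r_max = max(abs(r) for R in (R1, R2) for row in R for r in row)
    A1 = [[Fraction(r, r_max) for r in row] for row in R1]
    A2 = [[Fraction(r, r_max) for r in row] for row in R2]
    return A1, A2, r_max


R1 = [[-1, -3], [0, -2]]
R2 = [[-1, 0], [-3, -2]]
A1, A2, r_max = normalize(R1, R2)
print(r_max)                              # 3
print([[str(a) for a in row] for row in A1])  # [['-1/3', '-1'], ['0', '-2/3']]
print([[str(a) for a in row] for row in A2])  # [['-1/3', '0'], ['-1', '-2/3']]
```

The signs of the rewards are preserved, and all normalized values lie in $[-1, 1]$.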

To define the players’ beliefs, assume that the uninorm and absorbing norm are defined by the same generator function $w_{η} = u_{η} = v_{η}$,

$$w_{η} (x) = - \ln (x^{- 1 / η} - 1), \quad x \in (0,1), \tag{20}$$

with the parameter $η = θ = ϑ$. Consequently, the inverse function is

$$w_{η}^{- 1} (ξ) = 1 / {(1 + \exp (- ξ))}^{η}, \quad ξ \in (- \infty, \infty) . \tag{21}$$

The left-side and right-side values of the parameters are defined by the linear transform

$$θ_{l} = η_{l} = η / 2 \quad \text{and} \quad η_{r} = (η + 1) / 2 . \tag{22}$$

Let $η = 0.5$; then $η_{l} = 0.25$ and $η_{r} = 0.75$, which satisfy the values of the subjective false and subjective truth [31]. Then, the beliefs matrices defined by the equations (12) and (13) are

$$B^{1} = \begin{pmatrix} 0.42 & 0.27 \\ 0.5 & 0.34 \end{pmatrix} \quad \text{and} \quad B^{2} = \begin{pmatrix} 0.42 & 0.5 \\ 0.27 & 0.34 \end{pmatrix} .$$
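For readers who want to reproduce the beliefs, the following sketch evaluates the inverse generator on the normalized rewards $A^{1}$. It is an observation rather than a statement about the authors' implementation: the tabulated values agree, to two decimals, with the standard logistic function, that is, with the form of equation (21) when the exponent is set to $1$. The text does not state whether $η$ is parametrized differently from the printed formula, so the exponent used here is an assumption.

```python
import math


def w_inv(xi, exponent):
    """Inverse generator in the form of eq. (21); the exponent is left explicit."""
    return (1.0 + math.exp(-xi)) ** (-exponent)


# Normalized rewards of the first player in the Prisoners' dilemma example.
A1 = [[-1 / 3, -1.0], [0.0, -2 / 3]]

# Assumption: exponent 1, i.e. the standard logistic function.
B1 = [[round(w_inv(a, 1.0), 2) for a in row] for row in A1]
print(B1)  # [[0.42, 0.27], [0.5, 0.34]]
```

The matrix for the second player follows in the same way from $A^{2}$, which is the transpose of $A^{1}$ in this game.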

Analysis of these matrices together with the payoff matrix $P$ shows that subjectively each player is nearly sure that the payoff will be $3$ years served in prison, less sure that the payoff will be $2$ years, nearly unsure that the payoff will be $1$ year and unsure that the payoff will be $0$. Note that both the payoffs and the beliefs are defined separately for each player.

Now let us calculate the trusts of each player, which depend both on the player's own belief and on the belief of the other player. Applying the absorbing norm with the generator function (20) and its inverse function (21) with the parameters $η = 0.5$, $η_{l} = 0.25$ and $η_{r} = 0.75$, we obtain

$$T^{1} = \begin{pmatrix} 0.22 & 0.06 \\ 0.41 & 0.24 \end{pmatrix} \quad \text{and} \quad T^{2} = \begin{pmatrix} 0.22 & 0.41 \\ 0.06 & 0.24 \end{pmatrix} .$$

The trusts aggregated by the uninorm with the same generator function and the same parameters are

$$D^{1} = (0.02, 0.18) \quad \text{and} \quad D^{2} = (0.02, 0.18) .$$

As a result, each player chooses the second strategy, to testify,

$$s^{1} = 2 \quad \text{and} \quad s^{2} = 2,$$

which coincides with the Nash equilibrium indicated above, which is not optimal.

4. Two Other Examples

Let us consider two other examples of bi-matrix games. Below, we define the matrices of the game and present the calculations without additional comments.

The battle of sexes [4]. In this game, the players choose which concert to attend: Stravinsky (strategy $1$) or Bach (strategy $2$). The first player prefers the concert of Bach (strategy $2$), the second prefers the concert of Stravinsky (strategy $1$), and both prefer to attend any concert together.

The reward matrices of the players are

$$R^{1} = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} \quad \text{and} \quad R^{2} = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix},$$

which result in the beliefs matrices

$$B^{1} = \begin{pmatrix} 0.66 & 0.5 \\ 0.5 & 0.73 \end{pmatrix} \quad \text{and} \quad B^{2} = \begin{pmatrix} 0.73 & 0.5 \\ 0.5 & 0.66 \end{pmatrix},$$

and the trust matrices

$$T^{1} = \begin{pmatrix} 0.38 & 0.23 \\ 0.23 & 0.56 \end{pmatrix} \quad \text{and} \quad T^{2} = \begin{pmatrix} 0.56 & 0.23 \\ 0.23 & 0.38 \end{pmatrix} .$$

Then, the aggregated trusts are

$$D^{1} = (0.15, 0.27) \quad \text{and} \quad D^{2} = (0.27, 0.15),$$

and the resulting strategies are

$$s^{1} = 2 \quad \text{and} \quad s^{2} = 1,$$

as expected from the stated preferences.

The zero-sum game. In this abstract game we assume that the reward matrices of the players are

$$R^{1} = \begin{pmatrix} 2 & - 1 \\ - 3 & 2 \end{pmatrix} \quad \text{and} \quad R^{2} = \begin{pmatrix} - 2 & 1 \\ 3 & - 2 \end{pmatrix} .$$

Then, the beliefs matrices are

$$B^{1} = \begin{pmatrix} 0.66 & 0.42 \\ 0.27 & 0.66 \end{pmatrix} \quad \text{and} \quad B^{2} = \begin{pmatrix} 0.34 & 0.58 \\ 0.73 & 0.34 \end{pmatrix} .$$

The trust matrices are

$$T^{1} = \begin{pmatrix} 0.46 & 0.10 \\ 0.01 & 0.46 \end{pmatrix} \quad \text{and} \quad T^{2} = \begin{pmatrix} 0.03 & 0.37 \\ 0.51 & 0.03 \end{pmatrix},$$

and the aggregated trusts are

$$D^{1} = (0.09, 0.01) \quad \text{and} \quad D^{2} = (0.03, 0.02) .$$

Then, the resulting strategies are

$$s^{1} = 1 \quad \text{and} \quad s^{2} = 1 .$$

The presented examples demonstrate that the suggested method correctly specifies the strategies of the players in cases where the decisions seem irrational. In other words, it demonstrates the rationality of the irrational choices of the players and can be used both to explain decisions already made and to forecast subjective decisions that will be made in the future.

5. Discussion

The goal of the paper is to clarify the principles of decision making in situations where the choices of the agents do not follow the usual principles of rationality. We suggest using recently developed non-commutative operators of multivalued logic algebra in decision-making with irrational decisions. We apply these operators to specify the strategies in the well-known two-player Prisoners’ dilemma game.

The uninorm and absorbing norm operators used here aggregate the subjective beliefs of the players in obtaining certain rewards, such that the arguments of the aggregators have different influence on the resulting value. In a certain sense, such aggregation of the beliefs follows the line of using a utility function [7]. However, in contrast to the utility function, which is defined arbitrarily, the suggested aggregators are part of a formally defined logic algebra and are related to probability distributions, which allows their consideration in a wider and, at the same time, more formal framework.

The presented procedure starts with the specification of the players’ beliefs, which are based on the normalized rewards. Here we use the maximal absolute rewards (see equations (8) and (9)). The other possibility is to use the sums $r_{sum}^{1} = \sum_{i j} |r_{i j}^{1}|$ and $r_{sum}^{2} = \sum_{i j} |r_{i j}^{2}|$ of the absolute rewards and to define $r_{\max} = \max \{r_{sum}^{1}, r_{sum}^{2}\}$, which is more natural from the probabilistic point of view, but is hard to interpret in the considered framework.

Also, instead of defining beliefs using the inverse generator functions (see equation (13)), the simple formulas $b_{i j}^{1} = (a_{i j}^{1} + 1) / 2$ and $b_{i j}^{2} = (a_{i j}^{2} + 1) / 2$, $i, j = 1, 2$, can be used. However, despite their formal correctness, such formulas are hard to interpret. Since inverse generator functions are cumulative distribution functions, they specify the probabilities of the appropriate events, which are the levels of knowledge or beliefs of the players, while the indicated formulas have no such interpretation.

Note that the same simple formulas are used in the definition of the left-side and right-side values of the parameters (see equation (22)); since no interpretation is required there, the use of such formulas is justified.

The considered example of Prisoners’ dilemma and additional two games demonstrate that the suggested method results in the strategies which are chosen by the players. Such verification, certainly, does not provide complete justification or proof of the method, but explains the choices and confirms the asymmetry in the consideration of their own rewards and the rewards of the opponents.

6. Conclusion

In the paper, we suggest a method of decision-making under uncertainty which resolves an observed irrationality of the judgements. The method is applied to the one-step games of two players where it successfully predicts the players’ choices.

The method utilizes the asymmetry in the relation of a player to their own reward and to the reward of the opponent, which is formalized using the non-commutative operators of multivalued logic algebra.

The obtained results explain the appearing irrationality in the players’ judgements and demonstrate the rationality of irrational choices.

References

1. Boole, G. An Investigation of the Laws of Thought, on Which are Founded the Mathematical Theories of Logic and Probabilities; Walton and Maberly: London, UK, 1854.
2. Birkhoff, G.; von Neumann, J. The logic of quantum mechanics. Annals of Mathematics 1936, 37, 823–843.
3. Ramsey, F.P. Truth and probability. In The Foundations of Mathematics and other Logical Essays; 1926; pp. 156–198.
4. Luce, R.D.; Raiffa, H. Games and Decisions; John Wiley & Sons: New York, 1957.
5. Kahneman, D.; Tversky, A. Prospect theory: An analysis of decision under risk. Econometrica 1979, 47, 263–292.
6. Ruggeri, K.; Ali, S.; Berge, M.L.; Bertoldo, G.; Bjørndal, L.D.; Cortijos-Bernabeu, A.; Davison, C.; Demić, E.; Esteban-Serna, C.; Friedemann, M.; et al. Replicating patterns of prospect theory for decision under risk. Nature Human Behaviour 2020, 4, 622–633.
7. Friedman, M.; Savage, L. The utility analysis of choices involving risks. J. Political Economy 1948, 56, 279–304.
8. Wald, A. Statistical decision functions. The Annals of Mathematical Statistics 1949, 20, 165–205.
9. Nilsson, N.J. Probabilistic logic. Artificial Intelligence 1986, 28, 71–87.
10. Zadeh, L.A. Fuzzy sets. Information and Control 1965, 8, 338–353.
11. Zadeh, L.A. Fuzzy logic and approximate reasoning. Synthese 1975, 30, 407–428.
12. Dubois, D.; Prade, H. Possibility Theory; Plenum: New York, NY, 1988.
13. Łukasiewicz, J. On three-valued logic. Ruch Filozoficzny 1920, 5, 169–171.
14. Łukasiewicz, J.; Tarski, A. Untersuchungen über den Aussagenkalkül. Comptes Rendus des Séances de la Société des Sciences et des Lettres de Varsovie Classe III 1930, 23, 30–50.
15. Lambek, J. The mathematics of sentence structure. American Mathematical Monthly 1958, 65, 154–170.
16. Schmerling, S. Asymmetric conjunction and rules of conversation. In Syntax and Semantics. Speech Acts; Cole, P., Morgan, J., Eds.; Academic Press: New York, NY, USA, 1975; Volume 3, pp. 211–231.
17. Na, Y.; Huck, G. On extracting from asymmetrical structures. In The Joy of Grammar: A Festschrift in Honor of James D. McCawley; Brentari, D., Larson, G., MacLeod, L., Eds.; John Benjamins: Amsterdam, The Netherlands, 1992; pp. 119–136.
18. Yager, R.; Rybalov, A. Non-commutative self-identity aggregation. Fuzzy Sets Syst. 1997, 85, 73–82.
19. Ciungu, L. Non-Commutative Multi-Valued Logic Algebras; Springer: Cham, Switzerland; Heidelberg, Germany, 2014.
20. Fodor, J.; De Baets, B.; Perny, P. (Eds.) Preferences and Decisions under Incomplete Knowledge; Springer: Berlin/Heidelberg, Germany, 2000.
21. Greco, S.; Pereira, R.; Squillante, M.; Yager, R.; Kacprzyk, J. (Eds.) Preferences and Decisions. Models and Applications; Springer: Berlin/Heidelberg, Germany, 2010.
22. Kagan, E.; Novoselsky, A.; Ramon, D.; Rybalov, A. Non-Commutative logic for collective decision-making with perception bias. Robotics 2023, 12, 76.
23. Axelrod, R. The Evolution of Cooperation; Basic Books: NY, 1984.
24. Rapoport, A. Strategy and Conscience; Harper and Row: NY, 1964.
25. Yager, R.; Rybalov, A. Uninorm aggregation operators. Fuzzy Sets and Systems 1996, 80, 111–120.
26. Batyrshin, I.; Kaynak, O.; Rudas, I. Fuzzy modeling based on generalized conjunction operations. IEEE Trans. Fuzzy Systems 2002, 10, 678–683.
27. Kagan, E.; Rybalov, A.; Siegelmann, H.; Yager, R. Probability-generated aggregators. Int. J. Intelligent Systems 2013, 28, 709–727.
28. Fodor, J.; Rudas, I.; Bede, B. Uninorms and absorbing norms with applications to image processing. In Proceedings of the Information Conference SISY, 4th Serbian-Hungarian Joint Symposium on Intelligent Systems, Subotica, Serbia, 29–30 September 2006; pp. 59–72.
29. Fodor, J.; Yager, R.; Rybalov, A. Structure of uninorms. Int. J. Uncertainty, Fuzziness and Knowledge-Based Systems 1997, 5, 411–427.
30. Owen, G. Game Theory; Academic Press: San Diego, CA, 1995.
31. Kagan, E.; Rybalov, A.; Yager, R. Subjective Markov process with fuzzy aggregations. In Proceedings of the 12th International Conference Agents and Artificial Intelligence ICAART 2020, Valletta, Malta, 22–24 February 2020; Volume 2, pp. 386–394.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
