1. Introduction
“An Investigation of the Laws of Thought” by Boole [1] set the direction of rational reasoning and analysis: the truth of statements is determined by formal logic, and the chances of events are predicted by probability theory.
However, with the development of quantum mechanics and further discussions on the nature of logic and probability, it was realized that logical implications and probabilistic reasoning are not universally valid.
For example, Birkhoff and von Neumann [2] demonstrated that, because of the influence of the observer, the logic of quantum mechanics is not distributive, and Ramsey [3] (see also [4], Appendix I) considered subjective probabilities, which are defined from the point of view of a subject involved in the objects’ activity and are not necessarily equivalent to objective probabilities.
Later, Kahneman and Tversky [5] confirmed the irrationality of human decision making and demonstrated that people’s reasoning usually does not result in the maximal expected reward or the minimal expected payoff. Recently, Ruggeri et al. [6] replicated these results in experiments with thousands of participants from 19 countries.
Several methods have been used to describe irrational reasoning and to predict subjective decisions. Some of them follow utility theory [7], which considers choices with respect to a utility function and the decision maker’s attitude towards possible risks. Others implement non-Bayesian beliefs derived from a game-theoretical approach to the analyzed situation [8].
The methods that follow logical analysis are based on different versions of non-standard logic, from the above-mentioned logic of quantum mechanics [2] to probabilistic logic [9], fuzzy logic [10,11] and possibility theory [12]. The origins of such extensions of Boolean logic can be traced back to the Łukasiewicz three-valued logic [13] and its further extension, the Łukasiewicz–Tarski many-valued logic [14].
In parallel, Lambek [15] initiated the studies of non-commutative logics, which were applied to describe the structures of natural languages [16,17] and were then adopted for modeling preference relations [18,19]. These results allowed a direct logical description of statements whose truth depends on the order of the terms, as well as the modeling of decisions with preferences; for the problems and the state of the art in the field of decision-making with preferences, see, e.g., [20,21].
In this paper, we apply the recently developed non-commutative logical operators [22] to a well-known decision-making problem, the Prisoners’ dilemma, and demonstrate that taking into account the asymmetry in the prisoners’ judgements leads to the solution of the game.
2. Problem Formulation
The Prisoners’ dilemma is a game of two players, $P_1$ and $P_2$, with the strategies $s_1$ and $s_2$, such that each player chooses a strategy without any knowledge of the strategy chosen by the other player.
The payoffs of the players in the game are defined as follows:
$$
\begin{array}{c|cc}
 & P_2\colon s_1 & P_2\colon s_2 \\ \hline
P_1\colon s_1 & (a,\ a) & (c,\ d) \\
P_1\colon s_2 & (d,\ c) & (b,\ b)
\end{array}
\qquad (1)
$$
where $d < a < b < c$ and each cell lists the payoffs of $P_1$ and $P_2$, respectively. Following this table,
- $(s_1, s_1)$: if both players choose the strategy $s_1$, then each of them pays $a$;
- $(s_2, s_2)$: if both players choose the strategy $s_2$, then each of them pays $b$;
- $(s_1, s_2)$: if the first player chooses the strategy $s_1$ and the second player chooses the strategy $s_2$, then player $P_1$ pays $c$ and player $P_2$ pays $d$; and
- $(s_2, s_1)$: if the first player chooses the strategy $s_2$ and the second player chooses the strategy $s_1$, then player $P_1$ pays $d$ and player $P_2$ pays $c$.
In its original form, the Prisoners’ dilemma is formulated as follows. Let the strategies be $s_1$, to keep silent, and $s_2$, to testify, and let the payoffs $a$, $b$, $c$ and $d$ be the years that a prisoner will serve in prison. Then each prisoner faces the dilemma of either keeping silent ($s_1$) or testifying ($s_2$).
The payoff of each prisoner depends on the choice of the other prisoner. The dilemma of prisoner $P_1$ is as follows:
- $(s_1, s_1)$: if $P_1$ keeps silent and $P_2$ keeps silent, then each of the prisoners serves $a$ years in prison;
- $(s_2, s_2)$: if $P_1$ testifies and $P_2$ testifies, then each of them serves $b$ years in prison;
- $(s_1, s_2)$: if $P_1$ keeps silent but $P_2$ testifies, then $P_1$ serves $c$ years in prison and $P_2$ goes free; and
- $(s_2, s_1)$: if $P_1$ testifies but $P_2$ keeps silent, then $P_1$ goes free and $P_2$ serves $c$ years in prison.
The dilemma of prisoner $P_2$ is the same.
Certainly, the optimal strategy profile for both prisoners is mutual silence $(s_1, s_1)$. But since neither of them knows the choice of the other prisoner, the best response of each prisoner is to testify. Thus, the Nash equilibrium of the game is mutual testifying $(s_2, s_2)$, which is not optimal.
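The equilibrium claim can be checked by brute force over the four strategy profiles. The Python sketch below is illustrative only: it assumes hypothetical payoffs of 1, 2, 3 and 0 years (any values ordered as in the dilemma would behave the same way), treats payoffs as costs to be minimized, and tests every profile for a profitable unilateral deviation.

```python
from itertools import product

SILENT, TESTIFY = 0, 1
STRATEGIES = (SILENT, TESTIFY)

# Hypothetical payoffs in years (costs to be minimized), chosen only for illustration:
# (cost of P1, cost of P2) for each profile (strategy of P1, strategy of P2).
COST = {
    (SILENT, SILENT): (1, 1),
    (SILENT, TESTIFY): (3, 0),
    (TESTIFY, SILENT): (0, 3),
    (TESTIFY, TESTIFY): (2, 2),
}


def pure_nash_equilibria(cost):
    """Profiles in which no player can reduce their own cost by deviating alone."""
    equilibria = []
    for s1, s2 in product(STRATEGIES, repeat=2):
        c1, c2 = cost[(s1, s2)]
        p1_best = all(c1 <= cost[(alt, s2)][0] for alt in STRATEGIES)
        p2_best = all(c2 <= cost[(s1, alt)][1] for alt in STRATEGIES)
        if p1_best and p2_best:
            equilibria.append((s1, s2))
    return equilibria


def pareto_dominates(cost, p, q):
    """True if profile p costs no player more than q and costs someone strictly less."""
    cp, cq = cost[p], cost[q]
    return all(x <= y for x, y in zip(cp, cq)) and cp != cq


if __name__ == "__main__":
    print(pure_nash_equilibria(COST))  # [(1, 1)]: mutual testifying
    print(pareto_dominates(COST, (SILENT, SILENT), (TESTIFY, TESTIFY)))  # True
```

With these numbers, testifying is cheaper for each prisoner whatever the other does, so mutual testifying is the only pure equilibrium even though mutual silence is cheaper for both.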
The Prisoners’ dilemma demonstrates that even if a player is informed about the optimal strategies, the chosen strategy can be irrational because of the influence of the unknown choice of the other player.
Such irrationality gave rise to numerous studies of communication and conscience in conflict situations, aimed at finding the strategies that lead to the optimal choice; probably the most remarkable books in the field are [23,24]. For the repeated version of the game, it was found that a highly successful strategy for each prisoner is tit-for-tat, according to which each prisoner repeats the opponent’s previous move and thus returns to cooperation after retaliation.
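A minimal simulation of the repeated game illustrates the tit-for-tat rule described above: keep silent in the first round, then copy the opponent’s previous move. The per-round costs (1, 2, 3 and 0 years) and the opponent strategies `always_testify` and `defect_once` are hypothetical choices made only for this sketch.

```python
SILENT, TESTIFY = "silent", "testify"

# Hypothetical per-round costs in years (lower is better): (cost of P1, cost of P2).
COST = {
    (SILENT, SILENT): (1, 1),
    (SILENT, TESTIFY): (3, 0),
    (TESTIFY, SILENT): (0, 3),
    (TESTIFY, TESTIFY): (2, 2),
}


def tit_for_tat(opponent_moves):
    """Keep silent in the first round, then repeat the opponent's previous move."""
    return opponent_moves[-1] if opponent_moves else SILENT


def always_testify(opponent_moves):
    """Hypothetical opponent that testifies in every round."""
    return TESTIFY


def defect_once(opponent_moves):
    """Hypothetical opponent that testifies only in the third round."""
    return TESTIFY if len(opponent_moves) == 2 else SILENT


def play(strategy1, strategy2, rounds):
    """Play the repeated game; return both move histories and accumulated costs."""
    moves1, moves2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        # Each strategy sees only the opponent's moves from earlier rounds.
        m1, m2 = strategy1(moves2), strategy2(moves1)
        c1, c2 = COST[(m1, m2)]
        total1, total2 = total1 + c1, total2 + c2
        moves1.append(m1)
        moves2.append(m2)
    return moves1, moves2, total1, total2


if __name__ == "__main__":
    moves1, moves2, _, _ = play(tit_for_tat, defect_once, 5)
    print(moves1)  # ['silent', 'silent', 'silent', 'testify', 'silent']
    print(moves2)  # ['silent', 'silent', 'testify', 'silent', 'silent']
```

Tit-for-tat retaliates exactly once, in the round after the opponent testifies, and returns to silence as soon as the opponent does.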
In this paper, we consider the problem from the opposite point of view and seek a method that predicts the rational or irrational choice of a prisoner from the given payoffs of each prisoner. In other words, the problem is to define a method that demonstrates the rationality of the prisoner’s irrational choice.
3. Suggested Solution
The suggested solution takes into account the asymmetry between a player’s attitude to their own payoff or reward and to the payoff or reward of the other player. We assume that a player considers the decision of the other player as a background, or context, for their own decision and makes the decision using this context.
3.1. Non-Commutative Multivalued Logic Operators
The decision-making process uses the recently developed non-commutative uninorm and absorbing norm aggregators [22], which implement the operators of the non-commutative logic algebra [19].
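Let $\oplus_\theta$ be the uninorm [25] with the neutral or identity element $\theta \in [0,1]$, and let $\otimes_\vartheta$ be the absorbing norm [26] with the absorbing element $\vartheta \in [0,1]$. With respect to the value of $\theta$, the uninorm $\oplus_1$ is the $t$-norm (or multivalued $\wedge$ operator) and $\oplus_0$ is the $t$-conorm (or multivalued $\vee$ operator), and the absorbing norm $\otimes_\vartheta$ is a multivalued version of the Boolean $\neg\mathrm{xor}$ operator.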
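The uninorm $\oplus_\theta$ and the absorbing norm $\otimes_\vartheta$ act on the interval $[0,1]$ and form an algebra [27,28]
$$\mathcal{A} = \langle [0,1], \oplus_\theta, \otimes_\vartheta \rangle,$$
in which $\oplus_\theta$ plays the role of summation with the zero $\theta$, such that $x \oplus_\theta \theta = x$, and $\otimes_\vartheta$ plays the role of multiplication with a unit element $e$ and the absorbing element $\vartheta$, such that $x \otimes_\vartheta e = x$ and $x \otimes_\vartheta \vartheta = \vartheta$, $x \in [0,1]$. If $x \otimes_\vartheta (y \oplus_\theta z) = (x \otimes_\vartheta y) \oplus_\theta (x \otimes_\vartheta z)$ for any $x, y, z \in [0,1]$, then the algebra $\mathcal{A}$ is distributive.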
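It was proven [29] that there exist functions $u\colon [0,1] \to [-\infty, \infty]$ and $v\colon [0,1] \to [-\infty, \infty]$, called generator functions, such that for any $x, y \in (0,1)$
$$x \oplus_\theta y = u^{-1}\big(u(x) + u(y)\big), \qquad (2)$$
$$x \otimes_\vartheta y = v^{-1}\big(v(x)\, v(y)\big), \qquad (3)$$
where $u(\theta) = 0$ and $v(\vartheta) = 0$. For the boundary values $x, y \in \{0, 1\}$, it is assumed that the norms $\oplus_\theta$ and $\otimes_\vartheta$ are Boolean operators: $\oplus_\theta$ is the $\wedge$ or $\vee$ operator with respect to the value of $\theta$, and $\otimes_\vartheta$ is the $\neg\mathrm{xor}$ operator for any $\vartheta$.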
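The generator representations $x \oplus_\theta y = u^{-1}(u(x) + u(y))$ and $x \otimes_\vartheta y = v^{-1}(v(x)\,v(y))$ are straightforward to compute once a generator is fixed. The Python sketch below assumes, purely for illustration, the shifted logit generator $g(x) = \ln\frac{x}{1-x} - \ln\frac{c}{1-c}$ with $c = \theta$ for the uninorm and $c = \vartheta$ for the absorbing norm; its inverse is a logistic cumulative distribution function. This is a hypothetical choice, not necessarily the generator used in the examples of this paper, and the helper names are likewise illustrative.

```python
import math


def logistic(t: float) -> float:
    """Numerically stable logistic function (a cumulative distribution function on R)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def logit(p: float) -> float:
    """Inverse of the logistic function on the open interval (0, 1)."""
    return math.log(p / (1.0 - p))


def make_generator(center: float):
    """Hypothetical generator g(x) = logit(x) - logit(center) and its inverse.

    g is increasing on (0, 1) with g(center) = 0; g^{-1}(t) = logistic(t + logit(center)).
    """
    shift = logit(center)
    return (lambda x: logit(x) - shift), (lambda t: logistic(t + shift))


def uninorm(x: float, y: float, theta: float) -> float:
    """x (+)_theta y = u^{-1}(u(x) + u(y)) for x, y in (0, 1); theta is the neutral element."""
    u, u_inv = make_generator(theta)
    return u_inv(u(x) + u(y))


def absorbing_norm(x: float, y: float, vartheta: float) -> float:
    """x (x)_vartheta y = v^{-1}(v(x) * v(y)) for x, y in (0, 1); vartheta is absorbing."""
    v, v_inv = make_generator(vartheta)
    return v_inv(v(x) * v(y))


if __name__ == "__main__":
    print(uninorm(0.3, 0.6, theta=0.6))            # ~0.3: theta is neutral
    print(absorbing_norm(0.2, 0.4, vartheta=0.4))  # ~0.4: vartheta is absorbing
    # Not-xor-like behaviour of the absorbing norm for vartheta = 0.5:
    print(absorbing_norm(0.9, 0.9, 0.5), absorbing_norm(0.1, 0.1, 0.5), absorbing_norm(0.9, 0.1, 0.5))
    # Distributivity holds when both operators share the same generator (theta == vartheta):
    x, y, z = 0.7, 0.2, 0.9
    lhs = absorbing_norm(x, uninorm(y, z, 0.5), 0.5)
    rhs = uninorm(absorbing_norm(x, y, 0.5), absorbing_norm(x, z, 0.5), 0.5)
    print(math.isclose(lhs, rhs, abs_tol=1e-9))  # True
```

With $\theta = 1/2$ this generator yields the well-known representable uninorm $xy/(xy + (1-x)(1-y))$. When the two operators use different generators (here, different centers), the distributivity check generally fails.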
The generator functions $u$ and $v$ are monotonically increasing functions, which can be defined following different assumptions. It was demonstrated [27] that the inverse generator functions $u^{-1}$ and $v^{-1}$ meet the requirements of cumulative probability distributions, which relates the multivalued logic algebra $\mathcal{A}$ to probability theory and probabilistic logic [9].
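The non-commutative multivalued logic algebra $\tilde{\mathcal{A}}$ [22] extends the algebra $\mathcal{A}$ using the representations (2) and (3) by generator functions and conforms to the definition of non-commutative logic algebras [19].
The non-commutative uninorm $\oplus_{\theta_l\theta_r}$ and absorbing norm $\otimes_{\vartheta_l\vartheta_r}$ are defined in [22] through generator functions with separate left-side parameters $\theta_l$, $\vartheta_l$ and right-side parameters $\theta_r$, $\vartheta_r$ (see [22] for the explicit formulas). If $\theta_l \neq \theta_r$ and $\vartheta_l \neq \vartheta_r$, then, respectively, the uninorm $\oplus_{\theta_l\theta_r}$ and absorbing norm $\otimes_{\vartheta_l\vartheta_r}$ are non-commutative, and if $\theta_l = \theta_r = \theta$ and $\vartheta_l = \vartheta_r = \vartheta$, then these operators are equivalent to the norms $\oplus_\theta$ and $\otimes_\vartheta$.
The logic algebra $\tilde{\mathcal{A}} = \langle [0,1], \oplus_{\theta_l\theta_r}, \otimes_{\vartheta_l\vartheta_r} \rangle$, with the operators defined by the uninorm $\oplus_{\theta_l\theta_r}$ and absorbing norm $\otimes_{\vartheta_l\vartheta_r}$, is the non-commutative version of the algebra $\mathcal{A}$.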
3.2. Application of the Non-Commutative Operators to the Prisoners’ Dilemma
Let us consider the Prisoners’ dilemma in the form of a bi-matrix game [30], where the matrices $R^1$ and $R^2$ represent the payoffs of the first and the second player, respectively, as negative rewards. In other words, if a player’s payoff is $p$, then the reward received by this player is $r = -p$, and vice versa.
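In different versions of the game, the values of the rewards can be defined arbitrarily; therefore, they are first normalized as follows. Let
$$m^1 = \max_{i,j} |r^1_{ij}|, \qquad m^2 = \max_{i,j} |r^2_{ij}|, \qquad (8)$$
where $r^k_{ij}$ are the entries of the reward matrix of player $k$, be the maximal absolute rewards of the players. The maximal absolute reward in the game is
$$m = \max\{m^1, m^2\}. \qquad (9)$$
Usually, in the Prisoners’ dilemma the payoffs, and therefore the rewards, have the same values for both players; hence the maximal absolute values are also equal: $m^1 = m^2$.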
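Then, the matrices of the normalized rewards are $\tilde R^1 = (\tilde r^1_{ij})$ and $\tilde R^2 = (\tilde r^2_{ij})$, where
$$\tilde r^k_{ij} = \frac{r^k_{ij}}{m}, \qquad i, j = 1, 2, \quad k = 1, 2.$$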
Note that the normalization preserves the signs of the rewards: the negative rewards, which are the payoffs, remain negative, and the positive rewards remain positive.
The normalization does not change the structure of the game. At the same time, the maximal absolute rewards used for normalization correspond to the best rewards or the worst payoffs, from which the judgements aimed at better decisions usually start.
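The conversion from payoffs to rewards and the normalization by the maximal absolute reward of the game can be sketched as follows. This assumes that the normalized rewards are obtained by dividing every reward by $m$, which is consistent with the sign-preservation remark above; the payoff values in the usage example are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def rewards_from_payoffs(payoffs: Matrix) -> Matrix:
    """Payoffs are negative rewards: a payoff p gives the reward r = -p."""
    return [[-p for p in row] for row in payoffs]


def max_abs(matrix: Matrix) -> float:
    """Maximal absolute entry of a matrix."""
    return max(abs(x) for row in matrix for x in row)


def normalize(r1: Matrix, r2: Matrix) -> Tuple[Matrix, Matrix]:
    """Divide both reward matrices by the maximal absolute reward m of the game.

    Division by m > 0 keeps every sign and maps all rewards into [-1, 1].
    """
    m = max(max_abs(r1), max_abs(r2))
    if m == 0:
        raise ValueError("all rewards are zero; normalization is undefined")

    def scale(r: Matrix) -> Matrix:
        return [[x / m for x in row] for row in r]

    return scale(r1), scale(r2)


if __name__ == "__main__":
    # Hypothetical payoffs in years; rows: strategy of P1, columns: strategy of P2.
    p1 = [[1, 3], [0, 2]]
    p2 = [[1, 0], [3, 2]]
    n1, n2 = normalize(rewards_from_payoffs(p1), rewards_from_payoffs(p2))
    print(n1)
    print(n2)
```

Because both matrices are divided by the same positive number, the ordering of outcomes for each player, and hence the structure of the game, is unchanged.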
The next normalization transforms the normalized rewards $\tilde r^1_{ij}$ and $\tilde r^2_{ij}$ into nonnegative values. For convenience, we apply the inverse generator functions $u^{-1}$ and $v^{-1}$ of the uninorm and absorbing norm, respectively, such that the resulting matrices $B^1 = (b^1_{ij})$ and $B^2 = (b^2_{ij})$ include the values $b^1_{ij}, b^2_{ij} \in [0,1]$, $i, j = 1, 2$.
Following the probabilistic interpretation of the uninorm and absorbing norm [27], the values $b^1_{ij}$ and $b^2_{ij}$, $i, j = 1, 2$, are the probabilities that the normalized rewards are at most $\tilde r^1_{ij}$ and $\tilde r^2_{ij}$, respectively. Hence, the values $b^1_{ij}$ and $b^2_{ij}$ can be interpreted as the players’ subjective beliefs in the equitability of the rewards.
Such an interpretation follows the line of Ramsey’s interpretation of probabilities [3]. In terms of the Prisoners’ dilemma, since each of the prisoners is a criminal and knows about the crime, each of them completely believes that the maximal payoff is justified and believes less in the justification of smaller payoffs.
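The game with the reward matrices $\tilde R^1$ and $\tilde R^2$ is equivalent to the game with the matrices $B^1$ and $B^2$; however, in contrast to the values $\tilde r^1_{ij}$ and $\tilde r^2_{ij}$, which are the real rewards of the players, the values $b^1_{ij}$ and $b^2_{ij}$ are considered as the players’ subjective beliefs that they will obtain the corresponding rewards $\tilde r^1_{ij}$ and $\tilde r^2_{ij}$.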
To define the players’ choice of strategies, we assume that a player’s attitude to their own belief in obtaining a certain reward differs from their attitude to the opponent’s belief in obtaining this reward. We consider the beliefs as the arguments of the operators $\oplus_{\theta_l\theta_r}$ and $\otimes_{\vartheta_l\vartheta_r}$ in the algebra $\tilde{\mathcal{A}}$. The resulting values are the trusts $t^1_{ij}$ and $t^2_{ij}$ of the players in their strategies, based on the beliefs $b^1_{ij}$ and $b^2_{ij}$, $i, j = 1, 2$.
The trust matrices $T^1 = (t^1_{ij})$ and $T^2 = (t^2_{ij})$, $i, j = 1, 2$, are defined by the absorbing norm $\otimes_{\vartheta_l\vartheta_r}$ applied to the beliefs $b^1_{ij}$ and $b^2_{ij}$ of both players.
Such a definition assumes that the players act as opponents and implements their tit-for-tat relations: each player considers their own belief and the belief of the opponent and forms the aggregated trust with the stress on their own belief.
The strategy is chosen using the uninorm, which aggregates the trusts of the players in their strategies. The aggregation results form the vectors $\tau^1 = (\tau^1_1, \tau^1_2)$ and $\tau^2 = (\tau^2_1, \tau^2_2)$, where $\tau^k_i$ is the aggregated trust of player $k$ in their $i$th strategy.
Note that in these aggregations each player considers only their own trusts and aggregates them for each strategy.
Finally, each player chooses the strategy for which the aggregated trust reaches its maximum (ties are broken randomly):
$$i^*_k = \arg\max_{i = 1, 2} \tau^k_i, \qquad k = 1, 2. \qquad (19)$$
By equation (19), the strategies are defined by their indices, and the meaning of each strategy, to keep silent or to testify, is specified by the game formulation.
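The final choice rule is an arg-max over a player’s aggregated trusts with random tie-breaking. A minimal Python sketch follows; the trust values in the usage example are hypothetical (the values computed in the paper are not reproduced here), strategies are numbered from 1 to match the text’s “first” and “second” strategy, and the tolerance parameter `tol` is an added assumption for comparing floating-point trusts.

```python
import random
from typing import Optional, Sequence


def choose_strategy(
    aggregated_trusts: Sequence[float],
    rng: Optional[random.Random] = None,
    tol: float = 1e-12,
) -> int:
    """Return the 1-based index of a strategy with maximal aggregated trust.

    Strategies whose trust is within `tol` of the maximum are treated as tied,
    and one of them is picked uniformly at random.
    """
    if rng is None:
        rng = random.Random()
    best = max(aggregated_trusts)
    tied = [i for i, t in enumerate(aggregated_trusts, start=1) if best - t <= tol]
    return rng.choice(tied)


if __name__ == "__main__":
    # Hypothetical aggregated trusts of one player in strategies 1 and 2.
    print(choose_strategy([0.31, 0.78]))  # 2
    print(choose_strategy([0.5, 0.5], random.Random(0)))  # 1 or 2, depending on the seed
```

Passing an explicitly seeded `random.Random` makes tie-breaking reproducible, which is convenient when the same game is analyzed repeatedly.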
3.3. Example of the Prisoners’ Dilemma
To clarify the solution presented above, let us consider the Prisoners’ dilemma with the payoffs
,
,
and
. The payoff matrix of this game is
and the reward matrices of the players are
Maximal absolute reward in both matrices is
; hence, the normalized rewards are
To define the players’ beliefs, assume that the uninorm and absorbing norm are defined by the same generator function
with the parameter
. Consequently, the inverse function is
The left-side and right-side values of the parameters are defined by the linear transform
Let
; then
and
which correspond to the values of the subjective false and the subjective truth [31]. Then, the belief matrices defined by equations (12) and (13) are
Analysis of these matrices together with the payoff matrices and shows that subjectively each player is nearly sure that the payoff will be years served in prison, less sure that the payoff will be years, nearly unsure that the payoff will be year and unsure that the payoff will be . Note that both the payoffs and the beliefs are defined separately for each player.
Now let us calculate the trusts of each player, which depend both on the player’s own belief and on the belief of the other player. Applying the absorbing norm with the generator function (20) and its inverse function (21) with the parameters
,
and
, we obtain
The trusts aggregated by the uninorm with the same generator function and the parameters are
As a result, each player chooses the second strategy, to testify; this choice coincides with the non-optimal Nash equilibrium indicated above.
4. Two Other Examples
Let us consider two other examples of bi-matrix games. Below, we present the matrices of each game and the subsequent calculations without additional comments.
The battle of the sexes [4]. In this game, the players choose which concert to attend: Stravinsky (strategy $s_1$) or Bach (strategy $s_2$). The first player prefers the Bach concert (strategy $s_2$), and the second player prefers the Stravinsky concert (strategy $s_1$), but both prefer attending a concert together to attending different concerts.
The reward matrices of the players are
which result in the belief matrices
and the trust matrices
Then, the aggregated trusts are
and the resulting strategies are
as declared.
The zero-sum game. In this abstract game, we assume that the reward matrices of the players are
Then, the belief matrices
and the aggregated trusts are
Then, the resulting strategies are
The presented examples demonstrate that the suggested method correctly specifies the players’ strategies in cases where the decisions seem irrational. In other words, it demonstrates the rationality of the players’ irrational choices and can be used both to explain decisions already made and to forecast subjective decisions to be made in the future.
5. Discussion
The goal of this paper is to clarify the principles of decision making in situations where the agents’ choices do not follow the usual principles of rationality. We suggest using the recently developed non-commutative operators of multivalued logic algebra in decision-making with irrational decisions. We apply these operators to specify the strategies in the well-known two-player Prisoners’ dilemma game.
The uninorm and absorbing norm operators used here aggregate the players’ subjective beliefs in obtaining certain rewards, such that the arguments of the aggregators have different influences on the resulting value. In a certain sense, such aggregation of beliefs follows the line of the utility function approach [7]. However, in contrast to the utility function, which is defined arbitrarily, the suggested aggregators are part of a formally defined logic algebra and are related to probability distributions, which allows their consideration in a wider and, at the same time, more formal framework.
The presented procedure starts with the specification of the players’ beliefs, which are based on the normalized rewards. Here we use the maximal absolute rewards (see equations (8) and (9)). Another possibility is to use the sums $S^1$ and $S^2$ of the absolute rewards of the players and to normalize the rewards by these sums, which is more natural from the probabilistic point of view but is hard to interpret in the considered framework.
Also, instead of defining beliefs using the inverse generator functions (see equation (13)), simple linear formulas that map the normalized rewards onto the interval $[0,1]$ can be used. However, despite their formal correctness, such formulas are hard to interpret. Since the inverse generator functions are cumulative distribution functions, they specify the probabilities of the appropriate events, which are the levels of knowledge or beliefs of the players, whereas the indicated formulas have no such interpretation.
Note that the same simple formulas are used in the definition of the left-side and right-side values of the parameters (see equation (22)); since no interpretation is required there, the use of such formulas is justified.
The considered example of the Prisoners’ dilemma and the two additional games demonstrate that the suggested method yields the strategies that are chosen by the players. Such verification certainly does not provide a complete justification or proof of the method, but it explains the choices and supports the assumed asymmetry between the players’ consideration of their own rewards and of the rewards of their opponents.
6. Conclusion
In this paper, we suggest a method of decision-making under uncertainty that resolves the observed irrationality of judgements. The method is applied to one-step two-player games, where it successfully predicts the players’ choices.
The method utilizes the asymmetry between a player’s attitude to their own reward and to the reward of the opponent, which is formalized using the non-commutative operators of multivalued logic algebra.
The obtained results explain the apparent irrationality of the players’ judgements and demonstrate the rationality of irrational choices.
References
- [1] Boole, G. An Investigation of the Laws of Thought, on Which Are Founded the Mathematical Theories of Logic and Probabilities; Walton and Maberly: London, UK, 1854.
- [2] Birkhoff, G.; von Neumann, J. The logic of quantum mechanics. Annals of Mathematics 1936, 37, 823–843.
- [3] Ramsey, F.P. Truth and probability. In The Foundations of Mathematics and Other Logical Essays; 1926; pp. 156–198.
- [4] Luce, R.D.; Raiffa, H. Games and Decisions; John Wiley & Sons: New York, NY, USA, 1957.
- [5] Kahneman, D.; Tversky, A. Prospect theory: An analysis of decision under risk. Econometrica 1979, 47, 263–292.
- [6] Ruggeri, K.; Ali, S.; Berge, M.L.; Bertoldo, G.; Bjørndal, L.D.; Cortijos-Bernabeu, A.; Davison, C.; Demić, E.; Esteban-Serna, C.; Friedemann, M.; et al. Replicating patterns of prospect theory for decision under risk. Nature Human Behaviour 2020, 4, 622–633.
- [7] Friedman, M.; Savage, L.J. The utility analysis of choices involving risk. Journal of Political Economy 1948, 56, 279–304.
- [8] Wald, A. Statistical decision functions. The Annals of Mathematical Statistics 1949, 20, 165–205.
- [9] Nilsson, N.J. Probabilistic logic. Artificial Intelligence 1986, 28, 71–87.
- [10] Zadeh, L.A. Fuzzy sets. Information and Control 1965, 8, 338–353.
- [11] Zadeh, L.A. Fuzzy logic and approximate reasoning. Synthese 1975, 30, 407–428.
- [12] Dubois, D.; Prade, H. Possibility Theory; Plenum: New York, NY, USA, 1988.
- [13] Łukasiewicz, J. On three-valued logic. Ruch Filozoficzny 1920, 5, 169–171.
- [14] Łukasiewicz, J.; Tarski, A. Untersuchungen über den Aussagenkalkül. Comptes Rendus des Séances de la Société des Sciences et des Lettres de Varsovie, Classe III 1930, 23, 30–50.
- [15] Lambek, J. The mathematics of sentence structure. American Mathematical Monthly 1958, 65, 154–170.
- [16] Schmerling, S. Asymmetric conjunction and rules of conversation. In Syntax and Semantics, Volume 3: Speech Acts; Cole, P., Morgan, J., Eds.; Academic Press: New York, NY, USA, 1975; pp. 211–231.
- [17] Na, Y.; Huck, G. On extracting from asymmetrical structures. In The Joy of Grammar: A Festschrift in Honor of James D. McCawley; Brentari, D., Larson, G., MacLeod, L., Eds.; John Benjamins: Amsterdam, The Netherlands, 1992; pp. 119–136.
- [18] Yager, R.; Rybalov, A. Non-commutative self-identity aggregation. Fuzzy Sets and Systems 1997, 85, 73–82.
- [19] Ciungu, L. Non-Commutative Multi-Valued Logic Algebras; Springer: Cham, Switzerland; Heidelberg, Germany, 2014.
- [20] Fodor, J.; De Baets, B.; Perny, P. (Eds.) Preferences and Decisions under Incomplete Knowledge; Springer: Berlin/Heidelberg, Germany, 2000.
- [21] Greco, S.; Pereira, R.; Squillante, M.; Yager, R.; Kacprzyk, J. (Eds.) Preferences and Decisions: Models and Applications; Springer: Berlin/Heidelberg, Germany, 2010.
- [22] Kagan, E.; Novoselsky, A.; Ramon, D.; Rybalov, A. Non-commutative logic for collective decision-making with perception bias. Robotics 2023, 12, 76.
- [23] Axelrod, R. The Evolution of Cooperation; Basic Books: New York, NY, USA, 1984.
- [24] Rapoport, A. Strategy and Conscience; Harper & Row: New York, NY, USA, 1964.
- [25] Yager, R.; Rybalov, A. Uninorm aggregation operators. Fuzzy Sets and Systems 1996, 80, 111–120.
- [26] Batyrshin, I.; Kaynak, O.; Rudas, I. Fuzzy modeling based on generalized conjunction operations. IEEE Transactions on Fuzzy Systems 2002, 10, 678–683.
- [27] Kagan, E.; Rybalov, A.; Siegelmann, H.; Yager, R. Probability-generated aggregators. International Journal of Intelligent Systems 2013, 28, 709–727.
- [28] Fodor, J.; Rudas, I.; Bede, B. Uninorms and absorbing norms with applications to image processing. In Proceedings of the Information Conference SISY, 4th Serbian-Hungarian Joint Symposium on Intelligent Systems, Subotica, Serbia, 29–30 September 2006; pp. 59–72.
- [29] Fodor, J.; Yager, R.; Rybalov, A. Structure of uninorms. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 1997, 5, 411–427.
- [30] Owen, G. Game Theory; Academic Press: San Diego, CA, USA, 1995.
- [31] Kagan, E.; Rybalov, A.; Yager, R. Subjective Markov process with fuzzy aggregations. In Proceedings of the 12th International Conference on Agents and Artificial Intelligence (ICAART 2020), Valletta, Malta, 22–24 February 2020; Volume 2, pp. 386–394.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).