The first layer of the Tsetlin Machine consists of a set of binary features or input variables, which are represented as literals (positive statements) and negated literals (negative statements). The input variables can be represented as follows:
Notice how the composition of one clause differs from that of another depending on the indexes of the literals included in its set.
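As a minimal illustration (assuming a NumPy-based representation; the helper name `to_literals` is ours, not from the original implementation), an input vector can be expanded into its literals and negated literals as follows:

```python
import numpy as np

def to_literals(x: np.ndarray) -> np.ndarray:
    """Expand a binary feature vector x into the literal vector [x, NOT x]."""
    x = x.astype(np.uint8)
    return np.concatenate([x, 1 - x])

# Example: 3 input features yield 6 literals (originals followed by negations).
print(to_literals(np.array([1, 0, 1])))  # -> [1 0 1 0 1 0]
```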
The State Layer in the Tsetlin machine architecture stores the states of the clauses in the memory matrix and updates those states while processing an input pattern. The state update rule depends on the input pattern and the bias parameters associated with each clause, which can be learned during training.
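For concreteness, here is a small sketch of such a state store, under the common convention that each clause keeps one Tsetlin Automaton state per literal and that states above the midpoint map to the "Include" action (array names, sizes, and the midpoint convention are illustrative, not the paper's exact implementation):

```python
import numpy as np

n_clauses, n_literals, n_states = 4, 6, 200   # illustrative sizes

# One TA state per (clause, literal) pair, initialised near the include/exclude boundary.
ta_state = np.full((n_clauses, n_literals), n_states // 2, dtype=np.int32)

def included(ta_state: np.ndarray, n_states: int = 200) -> np.ndarray:
    """A literal is included in a clause when its TA state lies above the midpoint."""
    return ta_state > n_states // 2

def clause_outputs(ta_state: np.ndarray, literals: np.ndarray) -> np.ndarray:
    """Each clause is a conjunction: it outputs 1 iff every included literal equals 1."""
    inc = included(ta_state)
    # Where a literal is excluded, it does not constrain the clause (treated as True here).
    return np.all(np.where(inc, literals.astype(bool), True), axis=1).astype(np.uint8)
```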
The TM structure is shown in Figure 2. The TM classifies data into two classes; therefore, the subpatterns associated with each class have to be learned separately. For this purpose, the clauses are divided into two groups, where one group learns the subpatterns of class 1 while the other learns the subpatterns of class 0. For simplicity, clauses with odd indexes are assigned positive polarity ($C_j^+$), and they are supposed to capture subpatterns of output $y = 1$. Clauses with even indexes, on the other hand, are assigned negative polarity ($C_j^-$), and they are supposed to capture subpatterns of output $y = 0$. To classify a sample, we sum the clause outputs of each group and assign the sample to the class with the highest sum. A higher sum means that more subpatterns of the designated class have been identified, so the input sample has a higher chance of belonging to that class. Hence, with $v$ being the difference between the two sums of clause outputs, $v = \sum_j C_j^+ - \sum_j C_j^-$, the sample is assigned to class 1 when $v \ge 0$ and to class 0 otherwise.
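A minimal sketch of this voting step (the function name and the assumption that the two clause groups are given as 0/1 output vectors are ours):

```python
import numpy as np

def predict(pos_clause_outputs: np.ndarray, neg_clause_outputs: np.ndarray) -> int:
    """v = sum of positive-polarity clause outputs minus sum of negative-polarity outputs."""
    v = int(pos_clause_outputs.sum()) - int(neg_clause_outputs.sum())
    return 1 if v >= 0 else 0

# Example: three positive-polarity clauses fire, one negative-polarity clause fires -> v = 2 -> class 1.
print(predict(np.array([1, 1, 1, 0]), np.array([1, 0, 0, 0])))
```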
2.3.1. Learning Procedure
The procedure to learn which literals to include is based on two types of reinforcement: Type I and Type II. Type I feedback produces frequent patterns, while Type II feedback increases the discrimination power of the patterns. Both are described in detail in the following.
Type I Feedback: Reduce False Negatives
Type I feedback is formulated to enhance the true positive outputs of clauses while mitigating false negative outputs. In order to reinforce a clause's true positive output (where the clause output should be 1), Type I feedback reinforces the "Include" actions of Tsetlin Automata (TAs) corresponding to literal values of 1. Concurrently, within the same clause, Type I feedback reinforces the "Exclude" actions of TAs linked to literal values of 0. To address incorrect negative clause outputs (where the clause output should be 0), the currently identified pattern is gradually erased by reinforcing the "Exclude" actions of TAs, irrespective of their corresponding literal values. As a result, clauses with positive polarity receive Type I feedback when $y = 1$, while clauses with negative polarity receive Type I feedback when $y = 0$.
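As a rough sketch of this rule for a single clause (a deterministic single-step update; the reward/penalty probabilities and asymmetric transitions of Table 1 are deliberately omitted, and all names are illustrative):

```python
import numpy as np

def type_i_feedback(ta_state: np.ndarray, literals: np.ndarray, clause_output: int,
                    n_states: int = 200) -> np.ndarray:
    """Simplified Type I feedback for one clause (per-literal TA states in ta_state).

    clause_output == 1: push TAs of 1-valued literals toward "Include" and
                        TAs of 0-valued literals toward "Exclude".
    clause_output == 0: push all TAs toward "Exclude" (gradually erase the pattern).
    """
    if clause_output == 1:
        ta_state[literals == 1] = np.minimum(ta_state[literals == 1] + 1, n_states)
        ta_state[literals == 0] = np.maximum(ta_state[literals == 0] - 1, 1)
    else:
        ta_state = np.maximum(ta_state - 1, 1)
    return ta_state
```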
The classical Tsetlin machine incorporates an inaction probability, i.e., the likelihood of a Tsetlin Automaton remaining in its present state. To achieve the asymmetric transitions that are crucial for enhanced performance, this inaction probability is replaced by a reward or penalty probability, depending on which of the two is more likely. Eliminating the inaction probability makes the transition probabilities asymmetric: the likelihood of moving from one state to another now differs depending on whether the corresponding action includes or excludes a literal in the clause. The result is a dynamic shift in the machine's behavior that affects the system's responsiveness and adaptability during operation.
Table 1 displays the modified Type I feedback table with the necessary adjustments for achieving asymmetric transitions. Here, "s" is a user-defined parameter that governs the granularity of the patterns the clauses capture and their production rate.
Transition Probability
The behavior of the Tsetlin Automata in response to input features is strongly influenced by the transition probabilities between their states. These probabilities depend on several factors, including the current state of the Tsetlin Automaton, the literal value, and the current clause value. However, Tsetlin machines can become trapped in suboptimal states if their exploration of the state space is insufficient.
To mitigate this issue, randomness is introduced into the state transitions. This adjustment allows the Tsetlin machine to traverse diverse states, preventing it from becoming trapped in unfavorable states due to inadequate exploration. The randomness is incorporated through a zero-mean normal (Gaussian) random variable whose standard deviation diminishes as epochs progress, following the exponential decay

$\sigma_i = \sigma_0 \, e^{-d\, i},$

where $\sigma_0$ represents the initial standard deviation, $d$ stands for the rate of decay, and $i$ corresponds to the $i$th epoch. This formulation means that the standard deviation of the randomness decreases over epochs, reducing the randomness in state transitions over time. Consequently, the Tsetlin machine gradually shifts its focus from exploration towards exploiting the optimal state.
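Assuming the decay follows the reconstructed form $\sigma_i = \sigma_0 e^{-d\,i}$ above, a minimal sketch of sampling this epoch-dependent noise looks as follows (parameter values and names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def transition_noise(epoch: int, sigma0: float = 1.0, d: float = 0.05) -> float:
    """Zero-mean Gaussian noise whose standard deviation decays exponentially with the epoch."""
    sigma_i = sigma0 * np.exp(-d * epoch)
    return float(rng.normal(loc=0.0, scale=sigma_i))

# Early epochs explore more (large sigma); later epochs exploit (sigma approaches 0).
print(transition_noise(epoch=0), transition_noise(epoch=100))
```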
Through the incorporation of this randomness with a decaying standard deviation, the Tsetlin machine effectively balances its exploration and exploitation tendencies: the introduced variability shrinks as epochs progress, adapting the machine's behavior over time.
As a result of incorporating this randomized search mechanism, the updated probabilities for both reward and penalty can be expressed as:
Comparing the likelihoods directly yields quantitative insight into how often one event occurs relative to the other and lets decisions and assessments rest on actual probability values. To assess precisely whether the reward is more likely than the penalty, or inversely, whether the penalty outweighs the reward, we follow a procedure based on statistical measures, notably the cumulative distribution function (CDF) of the normal distribution.
Upon observation from Table 1, it becomes apparent that there are precisely two distinct values for the probabilities. Let us define the random variables X and Y corresponding to these two values. To calculate $P(X > Y)$, we can leverage the property that the difference between two independent normal random variables also follows a normal distribution, which allows us to compute its mean and variance directly from the means and variances of the original variables. The mean and variance of X are:
Similarly, the mean and variance of Y are:
Now, we can find the mean and variance of the difference Z = X - Y. Since X and Y are independent, $\mathrm{mean}(Z) = \mathrm{mean}(X) - \mathrm{mean}(Y)$ and $\mathrm{var}(Z) = \mathrm{var}(X) + \mathrm{var}(Y)$. Therefore, Z is a normal random variable with mean(Z) and var(Z).
Now, we can calculate the probability of the inequality as $P(X > Y) = P(Z > 0)$. Using the mean and variance of Z, we standardize Z by subtracting its mean and dividing by its standard deviation, so the inequality can be rewritten as

$P\!\left(\dfrac{Z - \mathrm{mean}(Z)}{\sqrt{\mathrm{var}(Z)}} > \dfrac{-\mathrm{mean}(Z)}{\sqrt{\mathrm{var}(Z)}}\right).$

Finally, we can use the cumulative distribution function (CDF) of the normal distribution to calculate the probability:

$P(X > Y) = 1 - \Phi\!\left(\dfrac{-\mathrm{mean}(Z)}{\sqrt{\mathrm{var}(Z)}}\right) = \Phi\!\left(\dfrac{\mathrm{mean}(Z)}{\sqrt{\mathrm{var}(Z)}}\right),$

where $\Phi$ is the CDF of the standard normal distribution. Hence, the probability of one event prevailing over the other is obtained in closed form.
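As a small numeric check of this result, using only the Python standard library (the means and variances below are placeholder values, since the paper's specific X and Y come from Table 1):

```python
import math

def p_x_greater_y(mean_x: float, var_x: float, mean_y: float, var_y: float) -> float:
    """P(X > Y) for independent normal X, Y, via Z = X - Y and the standard normal CDF."""
    mean_z = mean_x - mean_y
    var_z = var_x + var_y
    z = mean_z / math.sqrt(var_z)
    # Phi(z) expressed through the error function: Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Placeholder example: X ~ N(0.8, 0.1), Y ~ N(0.2, 0.1)  ->  P(X > Y) ≈ 0.91
print(p_x_greater_y(0.8, 0.1, 0.2, 0.1))
```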
The outcome of this computation is the precise probability of either the reward or the penalty prevailing over the other. In this context, the computed probability gauges the relative dominance of the reward and penalty probabilities, enabling a probabilistic evaluation of which event is more likely and providing essential input for decision-making and for the system's state transitions.
With reference to Table 1 and in light of Equation (3), the state update protocol for Tsetlin Automata can be deduced by taking into account the feedback received from the environment and the action associated with the respective clause. These update rules are consolidated in Figure 3.
In the Asymmetric Probabilistic Tsetlin Machine paradigm, the parameters "s," "a," and "b" hold paramount significance, as they intricately shape the system’s behavior. Parameter "s," often referred to as feedback strength, finely controls the feedback magnitude directed at each Tsetlin Automaton during the learning phase. This adjustment process reinforces favorable decisions while discouraging repetition of incorrect ones. Crucially, the value of "s" significantly impacts the magnitude of these adjustments. Opting for higher "s" values fosters accelerated learning but may introduce instability due to oscillatory behavior. Conversely, lower "s" values encourage a more gradual and stable learning process, albeit at a slower pace.
In this context, the interplay between "s," "a," and "b" becomes evident. The parameters "a" and "b" not only dictate the feedback strength but also influence the rate of transition among Tsetlin Automaton states. Their roles are intertwined, amplifying their collective importance. Recognizing the profound impact of these parameters on system performance, it is crucial to thoughtfully tailor "a" and "b" according to the specific problem and dataset at hand, aiming to achieve the optimal balance between exploration and exploitation.
To mitigate the challenge of parameter tuning and streamline the optimization process, we can link the number of transition steps to the parameter "s": the variable "a" is set to the largest integer less than or equal to s - 1, i.e., a = ⌊s - 1⌋. Consequently, the value of the variable "b" can be computed using Equation (3). This effectively simplifies both the training and optimization phases of the system.
Type II Feedback: Reduce False Positives
Type II feedback aims to reduce false positive clause outputs: it focuses on turning a clause output from 1 to 0 when it should be 0. To achieve this, this feedback type includes a literal of value 0 in the clause. Clauses with positive polarity need Type II feedback when the desired output (y) is 0, and clauses with negative polarity need it when y is 1, since they should not vote for the opposite class.
Adaptations to the learning process, such as introducing asymmetric transition probabilities and implementing a decay-based standard deviation for randomness, are specifically targeted toward Type I feedback. These adjustments are essential to enhance performance by minimizing false negatives. In contrast, Type II feedback operates with a distinct objective and does not necessitate these particular modifications. In Type II feedback, the mere incorporation of a literal with a value of 0 within the clause proves adequate for achieving the desired reduction in false-positive clause outputs. As a result, the procedure outlined for Type II feedback can be summarized in
Table 2.
Therefore, the state transition rules follow those of the classical Tsetlin machine, excluding randomness and employing a single-step transition in both directions.
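To make this concrete, here is a minimal sketch of such a single-step Type II update for one clause (deterministic, with the same illustrative midpoint convention as the earlier sketches; the caller is assumed to dispatch this feedback only when the clause should output 0):

```python
import numpy as np

def type_ii_feedback(ta_state: np.ndarray, literals: np.ndarray, clause_output: int,
                     n_states: int = 200) -> np.ndarray:
    """Simplified Type II feedback for one clause: combat false positives.

    When the clause (wrongly) outputs 1, move each currently excluded TA whose literal
    value is 0 one step toward "Include", so the clause evaluates to 0 on this pattern.
    """
    if clause_output == 1:
        mask = (literals == 0) & (ta_state <= n_states // 2)   # excluded, 0-valued literals
        ta_state[mask] = ta_state[mask] + 1
    return ta_state
```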