The first layer of the Tsetlin Machine consists of a set of binary features or input variables, which are represented as literals (positive statements) and negated literals (negative statements). The input variables can be represented as follows:
Notice how the composition of one clause differs from that of another depending on the indexes of the literals included in its set.
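As a minimal illustration (assuming a NumPy-based representation; the helper name `to_literals` is ours, not from the original implementation), an input vector can be expanded into its literals and negated literals as follows:

```python
import numpy as np

def to_literals(x: np.ndarray) -> np.ndarray:
    """Expand a binary feature vector x into the literal vector [x, NOT x]."""
    x = x.astype(np.uint8)
    return np.concatenate([x, 1 - x])

# Example: 3 input features yield 6 literals (originals followed by negations).
print(to_literals(np.array([1, 0, 1])))  # -> [1 0 1 0 1 0]
```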
The State Layer in the Tsetlin machine architecture stores the states of the clauses in the memory matrix and updates those states while processing an input pattern. The state update rule depends on the input pattern and the bias parameters associated with each clause, which can be learned during training.
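For concreteness, here is a small sketch of such a state store, under the common convention that each clause keeps one Tsetlin Automaton state per literal and that states above the midpoint map to the "Include" action (array names, sizes, and the midpoint convention are illustrative, not the paper's exact implementation):

```python
import numpy as np

n_clauses, n_literals, n_states = 4, 6, 200   # illustrative sizes

# One TA state per (clause, literal) pair, initialised near the include/exclude boundary.
ta_state = np.full((n_clauses, n_literals), n_states // 2, dtype=np.int32)

def included(ta_state: np.ndarray, n_states: int = 200) -> np.ndarray:
    """A literal is included in a clause when its TA state lies above the midpoint."""
    return ta_state > n_states // 2

def clause_outputs(ta_state: np.ndarray, literals: np.ndarray) -> np.ndarray:
    """Each clause is a conjunction: it outputs 1 iff every included literal equals 1."""
    inc = included(ta_state)
    # Where a literal is excluded, it does not constrain the clause (treated as True here).
    return np.all(np.where(inc, literals.astype(bool), True), axis=1).astype(np.uint8)
```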
The TM structure is shown in Figure 2. The TM classifies data into two classes; therefore, the subpatterns associated with each class have to be learned separately. For this purpose, the clauses are divided into two groups, where one group learns the subpatterns of class 1 while the other learns the subpatterns of class 0. For simplicity, clauses with odd indexes are assigned positive polarity ($C_j^+$), and they are supposed to capture subpatterns of output $y = 1$. Clauses with even indexes, on the other hand, are assigned negative polarity ($C_j^-$), and they are supposed to capture subpatterns of output $y = 0$. To classify a sample, we sum the clause outputs of each group and assign the sample to the class with the highest sum. A higher sum means that more subpatterns of the designated class have been identified, so the input sample has a higher chance of belonging to that class. Hence, with $v$ being the difference between the two sums of clause outputs, $v = \sum_j C_j^+ - \sum_j C_j^-$, the sample is assigned to class 1 when $v \ge 0$ and to class 0 otherwise.
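A minimal sketch of this voting step (the function name and the assumption that the two clause groups are given as 0/1 output vectors are ours):

```python
import numpy as np

def predict(pos_clause_outputs: np.ndarray, neg_clause_outputs: np.ndarray) -> int:
    """v = sum of positive-polarity clause outputs minus sum of negative-polarity outputs."""
    v = int(pos_clause_outputs.sum()) - int(neg_clause_outputs.sum())
    return 1 if v >= 0 else 0

# Example: three positive-polarity clauses fire, one negative-polarity clause fires -> v = 2 -> class 1.
print(predict(np.array([1, 1, 1, 0]), np.array([1, 0, 0, 0])))
```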
2.3.1. Learning Procedure
The procedure to learn which literals to include is based on two types of reinforcement: Type I and Type II. Type I feedback produces frequent patterns, while Type II feedback increases the discrimination power of the patterns. Both are described in detail in the following.
Type I Feedback: Reduce False Negatives
Type I feedback is formulated to enhance the true positive outputs of clauses while mitigating false negative outputs. In order to reinforce a clause's true positive output (where the clause output should be 1), Type I feedback reinforces the "Include" actions of Tsetlin Automata (TAs) corresponding to literal values of 1. Concurrently, within the same clause, Type I feedback reinforces the "Exclude" actions of TAs linked to literal values of 0. To address incorrect negative clause outputs (where the clause output should be 0), the currently identified pattern is gradually erased by reinforcing the "Exclude" actions of TAs, irrespective of their corresponding literal values. As a result, clauses with positive polarity receive Type I feedback when $y = 1$, while clauses with negative polarity receive Type I feedback when $y = 0$.
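As a rough sketch of this rule for a single clause (a deterministic single-step update; the reward/penalty probabilities and asymmetric transitions of Table 1 are deliberately omitted, and all names are illustrative):

```python
import numpy as np

def type_i_feedback(ta_state: np.ndarray, literals: np.ndarray, clause_output: int,
                    n_states: int = 200) -> np.ndarray:
    """Simplified Type I feedback for one clause (per-literal TA states in ta_state).

    clause_output == 1: push TAs of 1-valued literals toward "Include" and
                        TAs of 0-valued literals toward "Exclude".
    clause_output == 0: push all TAs toward "Exclude" (gradually erase the pattern).
    """
    if clause_output == 1:
        ta_state[literals == 1] = np.minimum(ta_state[literals == 1] + 1, n_states)
        ta_state[literals == 0] = np.maximum(ta_state[literals == 0] - 1, 1)
    else:
        ta_state = np.maximum(ta_state - 1, 1)
    return ta_state
```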
The classical Tsetlin machine incorporates an inaction probability, i.e., the likelihood of a Tsetlin Automaton remaining in its present state. To achieve the asymmetric transitions that are crucial for enhanced performance, this inaction probability is replaced by a reward or penalty probability, depending on which of the two is more likely. Eliminating the inaction probability makes the transition probabilities asymmetric: the likelihood of moving from one state to another now differs depending on whether the corresponding action includes or excludes a literal in the clause. The result is a dynamic shift in the machine's behavior that affects the system's responsiveness and adaptability during operation.
Table 1 displays the modified Type I feedback table with the necessary adjustments for achieving asymmetric transitions. Here, "s" is a user-defined parameter that governs the granularity of the patterns the clauses capture and their production rate.
Transition Probability
The behavior of the Tsetlin Automata in response to input features is strongly influenced by the transition probabilities between their states. These probabilities depend on several factors, including the current state of the Tsetlin Automaton, the literal value, and the current clause value. However, Tsetlin machines can become trapped in suboptimal states if their exploration of the state space is insufficient.
To mitigate this issue, randomness is introduced into the state transitions. This adjustment allows the Tsetlin machine to traverse diverse states, preventing it from becoming trapped in unfavorable states due to inadequate exploration. The randomness is incorporated through a zero-mean normal (Gaussian) random variable whose standard deviation diminishes as epochs progress, following the exponential decay

$\sigma_i = \sigma_0 \, e^{-d\, i},$

where $\sigma_0$ represents the initial standard deviation, $d$ stands for the rate of decay, and $i$ corresponds to the $i$th epoch. This formulation means that the standard deviation of the randomness decreases over epochs, reducing the randomness in state transitions over time. Consequently, the Tsetlin machine gradually shifts its focus from exploration towards exploiting the optimal state.
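Assuming the decay follows the reconstructed form $\sigma_i = \sigma_0 e^{-d\,i}$ above, a minimal sketch of sampling this epoch-dependent noise looks as follows (parameter values and names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def transition_noise(epoch: int, sigma0: float = 1.0, d: float = 0.05) -> float:
    """Zero-mean Gaussian noise whose standard deviation decays exponentially with the epoch."""
    sigma_i = sigma0 * np.exp(-d * epoch)
    return float(rng.normal(loc=0.0, scale=sigma_i))

# Early epochs explore more (large sigma); later epochs exploit (sigma approaches 0).
print(transition_noise(epoch=0), transition_noise(epoch=100))
```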
Through the incorporation of this randomness with a decaying standard deviation, the Tsetlin machine effectively balances its exploration and exploitation tendencies: the introduced variability shrinks as epochs progress, adapting the machine's behavior over time.
As a result of incorporating this randomized search mechanism, the updated probabilities for both reward and penalty can be expressed as:
Comparing the likelihoods directly yields quantitative insight into how often one event occurs relative to the other and lets decisions and assessments rest on actual probability values. To assess precisely whether the reward is more likely than the penalty, or inversely, whether the penalty outweighs the reward, we follow a procedure based on statistical measures, notably the cumulative distribution function (CDF) of the normal distribution.
Upon observation from Table 1, it becomes apparent that there are precisely two distinct values for the probabilities. Let us define the random variables X and Y corresponding to these two values. To calculate $P(X > Y)$, we can leverage the property that the difference between two independent normal random variables also follows a normal distribution, which allows us to compute its mean and variance directly from the means and variances of the original variables. The mean and variance of X are:
Similarly, the mean and variance of Y are:
Now, we can find the mean and variance of the difference Z = X - Y. Since X and Y are independent, $\mathrm{mean}(Z) = \mathrm{mean}(X) - \mathrm{mean}(Y)$ and $\mathrm{var}(Z) = \mathrm{var}(X) + \mathrm{var}(Y)$. Therefore, Z is a normal random variable with mean(Z) and var(Z).
Now, we can calculate the probability of the inequality as $P(X > Y) = P(Z > 0)$. Using the mean and variance of Z, we standardize Z by subtracting its mean and dividing by its standard deviation, so the inequality can be rewritten as

$P\!\left(\dfrac{Z - \mathrm{mean}(Z)}{\sqrt{\mathrm{var}(Z)}} > \dfrac{-\mathrm{mean}(Z)}{\sqrt{\mathrm{var}(Z)}}\right).$

Finally, we can use the cumulative distribution function (CDF) of the normal distribution to calculate the probability:

$P(X > Y) = 1 - \Phi\!\left(\dfrac{-\mathrm{mean}(Z)}{\sqrt{\mathrm{var}(Z)}}\right) = \Phi\!\left(\dfrac{\mathrm{mean}(Z)}{\sqrt{\mathrm{var}(Z)}}\right),$

where $\Phi$ is the CDF of the standard normal distribution. Hence, the probability of one event prevailing over the other is obtained in closed form.
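As a small numeric check of this result, using only the Python standard library (the means and variances below are placeholder values, since the paper's specific X and Y come from Table 1):

```python
import math

def p_x_greater_y(mean_x: float, var_x: float, mean_y: float, var_y: float) -> float:
    """P(X > Y) for independent normal X, Y, via Z = X - Y and the standard normal CDF."""
    mean_z = mean_x - mean_y
    var_z = var_x + var_y
    z = mean_z / math.sqrt(var_z)
    # Phi(z) expressed through the error function: Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Placeholder example: X ~ N(0.8, 0.1), Y ~ N(0.2, 0.1)  ->  P(X > Y) ≈ 0.91
print(p_x_greater_y(0.8, 0.1, 0.2, 0.1))
```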
The outcome of this computation is the precise probability of either the reward or the penalty prevailing over the other. In this context, the computed probability gauges the relative dominance of the reward and penalty probabilities, enabling a probabilistic evaluation of which event is more likely and providing essential input for decision-making and for the system's state transitions.
With reference to Table 1 and in light of Equation (3), the state update protocol for Tsetlin Automata can be deduced by taking into account the feedback received from the environment and the action associated with the respective clause. These update rules are consolidated in Figure 3.
In the Asymmetric Probabilistic Tsetlin Machine paradigm, the parameters "s," "a," and "b" hold paramount significance, as they intricately shape the system’s behavior. Parameter "s," often referred to as feedback strength, finely controls the feedback magnitude directed at each Tsetlin Automaton during the learning phase. This adjustment process reinforces favorable decisions while discouraging repetition of incorrect ones. Crucially, the value of "s" significantly impacts the magnitude of these adjustments. Opting for higher "s" values fosters accelerated learning but may introduce instability due to oscillatory behavior. Conversely, lower "s" values encourage a more gradual and stable learning process, albeit at a slower pace.
In this context, the interplay between "s," "a," and "b" becomes evident. The parameters "a" and "b" not only dictate the feedback strength but also influence the rate of transition among Tsetlin Automaton states. Their roles are intertwined, amplifying their collective importance. Recognizing the profound impact of these parameters on system performance, it is crucial to thoughtfully tailor "a" and "b" according to the specific problem and dataset at hand, aiming to achieve the optimal balance between exploration and exploitation.
To mitigate the challenge of parameter tuning and streamline the optimization process, we can link the number of transition steps to the parameter "s": the variable "a" is set to the largest integer less than or equal to s - 1, i.e., a = ⌊s - 1⌋. Consequently, the value of the variable "b" can be computed using Equation (3). This effectively simplifies both the training and optimization phases of the system.
Type II Feedback: Reduce False Positives
Type II feedback aims to reduce false positive clause outputs: it focuses on turning a clause output from 1 to 0 when it should be 0. To achieve this, this feedback type includes a literal of value 0 in the clause. Clauses with positive polarity need Type II feedback when the desired output (y) is 0, and clauses with negative polarity need it when y is 1, since they should not vote for the opposite class.
Adaptations to the learning process, such as introducing asymmetric transition probabilities and implementing a decay-based standard deviation for randomness, are specifically targeted toward Type I feedback. These adjustments are essential to enhance performance by minimizing false negatives. In contrast, Type II feedback operates with a distinct objective and does not necessitate these particular modifications. In Type II feedback, the mere incorporation of a literal with a value of 0 within the clause proves adequate for achieving the desired reduction in false-positive clause outputs. As a result, the procedure outlined for Type II feedback can be summarized in
Table 2.
Therefore, the state transition rules follow those of the classical Tsetlin machine, excluding randomness and employing a single-step transition in both directions.
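To make this concrete, here is a minimal sketch of such a single-step Type II update for one clause (deterministic, with the same illustrative midpoint convention as the earlier sketches; the caller is assumed to dispatch this feedback only when the clause should output 0):

```python
import numpy as np

def type_ii_feedback(ta_state: np.ndarray, literals: np.ndarray, clause_output: int,
                     n_states: int = 200) -> np.ndarray:
    """Simplified Type II feedback for one clause: combat false positives.

    When the clause (wrongly) outputs 1, move each currently excluded TA whose literal
    value is 0 one step toward "Include", so the clause evaluates to 0 on this pattern.
    """
    if clause_output == 1:
        mask = (literals == 0) & (ta_state <= n_states // 2)   # excluded, 0-valued literals
        ta_state[mask] = ta_state[mask] + 1
    return ta_state
```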