The Targeted Opinion Word Extraction (TOWE) challenge is effectively a sequence labeling task. Given a sentence $W = [w_1, w_2, \ldots, w_N]$ consisting of $N$ words, where $w_t$ is the specified target word ($1 \le t \le N$), our objective is to assign a label $l_i$ to each word $w_i$. This labeling results in a sequence $L = [l_1, l_2, \ldots, l_N]$ that accurately identifies the opinion words relevant to $w_t$. In line with prior research [1], we employ the BIO tagging scheme for the labels $l_i$ in TOWE, categorizing each word as the Beginning, Inside, or Outside of an opinion phrase. The Syntactic-Enhanced Deep Learning Model (SEDLM) for TOWE comprises four integral components: (i) Sentence Encoding, (ii) Syntax-Model Consistency, (iii) Graph Convolutional Networks, and (iv) Representation Regularization.
3.2. Syntax-Model Consistency
As highlighted earlier, this component leverages the dependency tree of $W$ to compute syntax-based opinion possibility scores for the words, guiding the model’s representation learning through their alignment with model-based possibility scores. We hypothesize that words closer to the target word $w_t$ in the dependency tree are more likely to express opinions about the target. Therefore, we calculate the distance $d_i$ from each word $w_i$ to $w_t$ in the dependency tree (i.e., the length of the path between them), and subsequently derive the syntax-based score $s^{syn}_i$ as a decreasing function of $d_i$, normalized over the sentence so that the scores form a distribution.
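Since the exact scoring function is not reproduced here, the sketch below assumes one simple instantiation consistent with the description: breadth-first-search distances over the undirected dependency tree, turned into a distribution with a softmax over negative distances.

```python
from collections import deque
import numpy as np

def tree_distances(edges, n_words, target):
    """BFS distances from the target word over the undirected dependency tree.
    edges: list of (head, dependent) index pairs; target: index of w_t."""
    adj = [[] for _ in range(n_words)]
    for h, d in edges:
        adj[h].append(d)
        adj[d].append(h)
    dist = [None] * n_words
    dist[target] = 0
    queue = deque([target])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return np.array(dist, dtype=float)       # a dependency tree is connected

def syntax_scores(dist):
    """Syntax-based possibility scores: closer words get more mass (assumed softmax form)."""
    logits = -dist
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()
```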
To achieve syntax-model consistency, SEDLM produces model-based scores $s^{mod}_i$ for each word $w_i$ in $W$. These scores are obtained using Ordered-Neuron Long Short-Term Memory Networks (ON-LSTM) [11], an advanced LSTM variant. The consistency is reinforced by incorporating the Kullback-Leibler divergence $\mathcal{L}_{KL}$ between the syntax-based and model-based score distributions into the overall loss function (see Section 3.4).
ON-LSTM differs from a traditional LSTM by implementing master input and forget gates, which grant each word differential access to the neurons of the hidden vectors depending on its contextual significance. This approach allows SEDLM to assign a larger number of active neurons, and hence a higher importance score, to words that carry more contextual information for TOWE, as reflected in their master gate activations. The model-based possibility scores $s^{mod}_i$ are derived from these informativeness scores, facilitating the integration of syntactic information into ON-LSTM’s structure for enhanced word representation in TOWE. We denote the hidden vectors produced by ON-LSTM as $H = [h_1, h_2, \ldots, h_N]$ for the input sequence of word vectors $X = [x_1, x_2, \ldots, x_N]$.
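One way to realize these scores and the consistency term is sketched below; using the mean master-forget-gate activation as the informativeness score, the softmax normalization, and the direction of the KL divergence are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def model_scores(master_forget_gates):
    """master_forget_gates: tensor of shape (N, hidden_dim) with values in [0, 1],
    one row per word. The mean activation serves as an (assumed) informativeness score."""
    informativeness = master_forget_gates.mean(dim=-1)   # (N,)
    return F.softmax(informativeness, dim=-1)            # model-based distribution

def consistency_loss(syntax_dist, model_dist, eps=1e-8):
    """KL divergence between the syntax-based and model-based score distributions."""
    return torch.sum(syntax_dist * torch.log((syntax_dist + eps) / (model_dist + eps)))
```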
3.3. Graph Convolutional Networks
This section delves into the extraction of pivotal context words to refine the representation vectors $H$ for sentence $W$, focusing on the syntactic relationships among the words in the context of TOWE. As previously mentioned, for any given word $w_i$, two sets of context words are crucial: (i) $w_i$’s syntactic neighbors, and (ii) the syntactic neighbors of the target word $w_t$. These context words should significantly influence the representation of $w_i$ for accurate opinion word prediction. Our approach involves constructing two importance score matrices of dimensions $N \times N$, representing the weights of the contextual contribution of each word $w_j$ to the representation of $w_i$. One matrix emphasizes the syntactic neighbors of $w_i$, while the other focuses on the neighbors of the target word $w_t$. These matrices are then integrated and processed through a Graph Convolutional Network (GCN) model [12] for enhanced representation learning.
For the syntactic neighbors of the current words, we employ the adjacency matrix $A^{dep}$ of the dependency tree as the first importance score matrix. Its entries $A^{dep}_{ij}$ are set to 1 if there is a direct connection between $w_i$ and $w_j$ in the dependency tree, or if $i = j$ (self-loops). To account for the target word’s neighbors, we compute the second importance score matrix $A^{tgt}$ from the syntactic distances of the words to the target, $d_i$ and $d_j$. Each score $A^{tgt}_{ij}$ is produced by a feed-forward network followed by the sigmoid function, applied to combinations of these distances. We then merge the two matrices into a single matrix $A$ for GCN processing, using a weighted sum controlled by a trade-off parameter $\lambda$, i.e., $A = \lambda A^{dep} + (1 - \lambda) A^{tgt}$.
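A minimal sketch of how these two matrices might be built and merged follows; treating the feed-forward scorer as a small two-input network over the distance pair $(d_i, d_j)$ and naming the trade-off parameter `lam` are assumptions for illustration.

```python
import torch
import torch.nn as nn

def dependency_matrix(edges, n_words):
    """First matrix: 1 for dependency-tree edges and self-loops, 0 elsewhere."""
    a = torch.eye(n_words)
    for h, d in edges:
        a[h, d] = 1.0
        a[d, h] = 1.0
    return a

class TargetDistanceScorer(nn.Module):
    """Second matrix: sigmoid(FFN([d_i, d_j])) from the words' distances to the target."""
    def __init__(self, hidden=16):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, dist):                  # dist: float tensor (N,) of distances to w_t
        n = dist.shape[0]
        pairs = torch.stack(
            [dist.unsqueeze(1).expand(n, n), dist.unsqueeze(0).expand(n, n)], dim=-1
        )                                     # (N, N, 2): [d_i, d_j] for every pair
        return torch.sigmoid(self.ffn(pairs)).squeeze(-1)   # (N, N)

def combined_matrix(a_dep, a_tgt, lam=0.5):
    """Weighted sum of the two importance matrices, controlled by lam."""
    return lam * a_dep + (1.0 - lam) * a_tgt
```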
The GCN model takes the ON-LSTM hidden vectors $H$ and processes them using the combined adjacency matrix $A$. This enriches each word’s representation with information from its syntactic context, enhancing the model’s ability to predict opinion words. Specifically, the GCN consists of several layers; at each layer, the representation vector for word $w_i$ is updated by aggregating its neighbors’ vectors according to $A$, applying a ReLU activation, and normalizing by the sum of the corresponding entries of $A$. The final representation vector for each word after the GCN layers is denoted by $\hat{h}_i$. Ultimately, for each word $w_i$, we concatenate its representation vectors from ON-LSTM and GCN to form a comprehensive feature vector $v_i = [h_i; \hat{h}_i]$. This vector is then fed into a feed-forward network with a softmax function to predict the probability distribution over the possible opinion labels for $w_i$. The model is trained using a negative log-likelihood loss $\mathcal{L}_{pred}$.
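A compact sketch of this computation is given below; the layer sizes, the number of layers, and the exact use of the row-sum normalization are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One GCN layer: neighborhood aggregation via A, row-sum normalization, ReLU."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, a):                   # h: (N, dim), a: (N, N)
        agg = a @ self.linear(h)               # aggregate information from neighbors
        deg = a.sum(dim=1, keepdim=True).clamp(min=1e-8)
        return F.relu(agg / deg)               # normalize by the row sums of A

class TOWEHead(nn.Module):
    """Concatenate ON-LSTM and GCN vectors, then predict BIO label distributions."""
    def __init__(self, dim, n_labels=3, n_layers=2):
        super().__init__()
        self.layers = nn.ModuleList(GCNLayer(dim) for _ in range(n_layers))
        self.classifier = nn.Linear(2 * dim, n_labels)

    def forward(self, h, a):
        h_hat = h
        for layer in self.layers:
            h_hat = layer(h_hat, a)
        v = torch.cat([h, h_hat], dim=-1)      # v_i = [h_i; h_hat_i]
        return F.log_softmax(self.classifier(v), dim=-1)

# Training uses negative log-likelihood over the gold BIO labels:
# loss_pred = F.nll_loss(log_probs, gold_labels)
```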
3.4. Representation Regularization
In TOWE, we categorize the words in $W$ into three groups: the target word $w_t$, the target-oriented opinion words $O$, and the remaining words $R$. After processing by abstraction layers such as ON-LSTM and GCN, the representation vectors for these groups should reflect their distinct semantic roles. Specifically, the vectors for $w_t$ and $O$ should align closely in terms of sentiment polarity. To enforce this representation distinction, we introduce a triplet loss term $\mathcal{L}_{reg}$, which encourages the similarity of the vectors for $w_t$ and $O$ while differentiating them from that of $R$. For $w_t$, we use its GCN-derived representation vector $\hat{h}_t$. To aggregate representations for $O$ and $R$, which may involve multiple words, a simple option is max-pooling over their GCN vectors; however, to preserve the syntactic order and structure, we instead generate pruned trees from the dependency tree, centered around the target word, for $O$ and for $R$. These pruned trees help maintain the syntactic context in the representations, and GCN is applied to them to generate the aggregated vectors $\bar{h}_O$ and $\bar{h}_R$.
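A minimal sketch of one standard triplet-style formulation of this regularizer follows; the Euclidean distance and the margin hyperparameter are assumptions for illustration rather than the exact form used in SEDLM.

```python
import torch
import torch.nn.functional as F

def representation_regularizer(h_t, h_o, h_r, margin=1.0):
    """Triplet-style regularizer: pull the target vector h_t toward the opinion-word
    vector h_o and push it away from the remaining-words vector h_r by at least `margin`."""
    pos = F.pairwise_distance(h_t.unsqueeze(0), h_o.unsqueeze(0))
    neg = F.pairwise_distance(h_t.unsqueeze(0), h_r.unsqueeze(0))
    return F.relu(pos - neg + margin).squeeze(0)
```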
In our study, the representation vector for the target word is taken from the final layer of the Graph Convolutional Network (GCN), namely $\hat{h}_t$. However, the sets $O$ and $R$ consist of multiple words, so an aggregation strategy is required to form the unified representation vectors $\bar{h}_O$ and $\bar{h}_R$. A typical baseline is a max-pooling operation over the GCN-derived vectors of the words in each set; however, this technique does not account for the structural and sequential relationships among the words within $O$ and $R$, and it ignores how these words relate to the target word.
To address these limitations, we propose a method that retains the syntactic structures among the words in both $O$ and $R$ for a more tailored representation. This involves constructing pruned trees from the original dependency tree of the sentence $W$, oriented specifically towards the words in $O$ and in $R$. These pruned trees are then processed by the GCN model to generate representation vectors that reflect both the syntactic relationships and the focus on the target word. The pruned tree for $O$ is encoded by an adjacency matrix $A^{O}$ whose connections follow the shortest dependency paths between the target word $w_t$ and the words in $O$. Similarly, an adjacency matrix $A^{R}$ is created for $R$, following the same principle.
Applying the GCN model to these adjacency matrices along with the ON-LSTM vectors $H$ results in two sequences of hidden vectors, corresponding to $A^{O}$ and $A^{R}$, respectively. The final representation vectors for these sets, $\bar{h}_O$ and $\bar{h}_R$, are then obtained by selecting the GCN-produced hidden vector of the target word in each sequence. This method ensures that both $\bar{h}_O$ and $\bar{h}_R$ are directly comparable with $\hat{h}_t$, providing a more coherent and unified representation framework.
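The construction of the pruned adjacency matrices and the selection of the target word's hidden vector can be sketched as follows; computing shortest paths on the undirected dependency tree and reusing GCN layers like those sketched above are assumptions made for illustration.

```python
import torch
from collections import deque

def shortest_path(edges, n_words, src, dst):
    """Shortest path (list of node indices) between src and dst on the undirected tree."""
    adj = [[] for _ in range(n_words)]
    for h, d in edges:
        adj[h].append(d)
        adj[d].append(h)
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def pruned_adjacency(edges, n_words, target, word_set):
    """Adjacency matrix keeping only the edges on the shortest dependency paths
    between the target word and the words in word_set (plus self-loops)."""
    a = torch.eye(n_words)
    for w in word_set:
        path = shortest_path(edges, n_words, target, w)
        for u, v in zip(path, path[1:]):
            a[u, v] = 1.0
            a[v, u] = 1.0
    return a

# The aggregated vector for a set is the GCN output at the target position, e.g.:
# h_bar_O = gcn(H, pruned_adjacency(edges, N, t, O))[t]   # gcn as sketched earlier
```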
The overall loss function for SEDLM combines the prediction loss $\mathcal{L}_{pred}$ with the Kullback-Leibler divergence $\mathcal{L}_{KL}$ and the representation regularization term $\mathcal{L}_{reg}$, balanced by trade-off parameters $\alpha$ and $\beta$. This composite loss function aims to optimize the model’s performance in accurately identifying target-oriented opinion words through an integrated approach that capitalizes on syntactic dependency features.
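Assuming a simple additive combination, as the description suggests, the overall objective can be written as
$\mathcal{L} \;=\; \mathcal{L}_{pred} \;+\; \alpha\,\mathcal{L}_{KL} \;+\; \beta\,\mathcal{L}_{reg}$.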