1. Introduction
Natural Language Processing (NLP) encompasses a broad array of tasks, with sentence modeling standing as a foundational element in disciplines such as Sentence Classification [1], Paraphrase Identification [2,3], Question Answering [4,5], Sentiment Analysis [6,7], and Semantic Similarity Scoring [8,9]. The crux of these tasks lies in the effective representation of word meanings, typically achieved through neural embedding techniques [10]. These embeddings serve as the building blocks for deriving sentence semantics, traditionally approached through either Bag-of-Words (BOW) [11] or sequential models [7,12]. While the BOW model simplifies sentences to mere word collections, employing basic vector operations for composition [11], sequential models view text as a linear progression of words, leveraging deep learning techniques such as Long Short-Term Memory (LSTM) networks [13] and Convolutional Neural Networks (CNN) [6,14] for semantic matching tasks.
Despite their widespread usage, these traditional models often struggle to capture the complex, non-linear dependencies prevalent in natural language. This is a critical shortcoming, as the nuanced interplay of words in a sentence often holds the key to its true meaning. To address this gap, tree-structured models have been developed, utilizing architectures that mirror the parse trees of sentences [7,8,15,16,17,18,19] or employing latent trees designed for specific linguistic tasks [20,21]. Among these, Tree-RNNs, particularly those grounded in grammatical structures, offer a more sophisticated approach, organizing neural network units along the nodes of either a binarized constituency parse tree (CT) or a dependency parse tree (DT). The distinction between CT-RNN and DT-RNN architectures lies primarily in their approach to compositional parameters [7]. The CT-RNN model places word embeddings exclusively at leaf nodes, with parent nodes emerging from the combination of their left and right children. Conversely, in the DT-RNN setup, each node is associated with a word embedding and receives an unordered list of child nodes as inputs. Empirical studies [7] have shown that DT-LSTM models excel in tasks like semantic relatedness scoring, while CT-LSTM models are better suited for sentiment classification.
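To make the dependency-tree composition concrete, the sketch below shows a child-sum composition step of the kind used in Dependency Tree-LSTMs [7]: each node receives its own word embedding and an unordered set of children, whose hidden states are summed before gating. This is a minimal PyTorch illustration, not the exact implementation evaluated in this paper; all module and variable names are ours.

```python
# Minimal sketch of a child-sum Dependency Tree-LSTM composition step (after [7]).
# All module and variable names are illustrative; this is not the paper's implementation.
import torch
import torch.nn as nn


class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.W_iou = nn.Linear(input_dim, 3 * hidden_dim)                # input/output/update gates from the word
        self.U_iou = nn.Linear(hidden_dim, 3 * hidden_dim, bias=False)   # ... and from the summed children
        self.W_f = nn.Linear(input_dim, hidden_dim)
        self.U_f = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, x, child_h, child_c):
        # x: (input_dim,) embedding of the word at this node
        # child_h, child_c: (num_children, hidden_dim); children form an unordered set
        h_sum = child_h.sum(dim=0)                                       # order-insensitive aggregation
        i, o, u = torch.chunk(self.W_iou(x) + self.U_iou(h_sum), 3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.W_f(x) + self.U_f(child_h))               # one forget gate per child
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c


cell = ChildSumTreeLSTMCell(input_dim=300, hidden_dim=150)
h, c = cell(torch.randn(300), torch.randn(2, 150), torch.randn(2, 150))
print(h.shape, c.shape)  # torch.Size([150]) torch.Size([150])
```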
However, a significant limitation of these tree-based models is their oversight of the type of word relationships, a factor that plays a vital role in the overall semantic understanding of a sentence. Words in a sentence can form various grammatical relationships, reflected in the sentence’s dependency tree structure. These relationships, known as Typed Dependencies [27] (encompassing both Stanford typed dependencies and Universal Dependencies), are instrumental in contributing to the sentence’s semantic fabric.
To illustrate the importance of these typed dependencies, consider the sentences: 1) "Dogs chased cats in the garden" and 2) "Cats chased dogs in the garden." In both instances, the word pair (Dogs, chased) maintains a parent-child relationship within the respective dependency trees. However, the typed dependency in the first sentence is "nsubj" (nominal subject), signifying "Dogs" as the subject performing the action "chased", while in the second sentence it is "dobj" (direct object), indicating "dogs" as the object of the action "chased". This subtle difference in grammatical relations is pivotal in shaping the overall meaning of the sentences. Yet, models that do not consider these distinctions fail to accurately capture the nuanced semantics of such sentences.
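These relations can be read directly off a dependency parser's output. The snippet below uses spaCy merely as a convenient off-the-shelf parser to print the typed dependency attaching each word to its head in the two example sentences; the paper itself refers to Stanford typed dependencies and Universal Dependencies, so label names may differ slightly across toolkits.

```python
# spaCy is used here only as a convenient dependency parser; label inventories differ
# slightly between toolkits (e.g., "dobj" in spaCy's English models vs. "obj" in UD).
import spacy

nlp = spacy.load("en_core_web_sm")

for text in ["Dogs chased cats in the garden", "Cats chased dogs in the garden"]:
    print(text)
    for token in nlp(text):
        # Each word is attached to its head by a typed dependency (token.dep_).
        print(f"  {token.dep_:>6}({token.head.text}, {token.text})")

# In the first sentence (chased, Dogs) is linked by "nsubj";
# in the second, (chased, dogs) is linked by "dobj"/"obj".
```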
The implications of typed dependencies extend beyond semantic interpretation to sentiment analysis. Sentiment analysis aims to ascertain the sentiment polarity of a text, typically categorizing it as positive, negative, or neutral. For instance, phrases like "white blood cells destroying an infection" and "an infection destroying white blood cells" [19], despite sharing the same words and tree structure, convey starkly different sentiments. The former is imbued with a positive connotation, while the latter carries a negative sentiment. Models relying solely on bag-of-words techniques or parse structures devoid of dependency-type considerations are incapable of distinguishing between these semantic and sentiment nuances. This highlights the necessity for a deep learning model that is not only aware of but also actively incorporates typed dependencies.
Recent research efforts [28,29,30,31] have attempted to address the limitations of classic deep learning models in handling syntactic categories by incorporating POS and constituency tags. However, the role of typed dependencies remains relatively unexplored. A notable exception is the work of [32], whose Semantic Dependency Tree-RNN (SDT-RNN) is a recursive neural network formulated over the dependency tree. The SDT-RNN distinguishes itself by training a separate feed-forward neural network for each dependency type, a method that, while promising, is data-intensive and complex to train.
In the realm of Dependency-based Long Short-Term Memory (D-LSTM) networks, [31] introduced a weighted supporting component, derived from the subject, object, and predicate of the sentence, to augment the basic sentence representation generated by a standard LSTM. Similarly, the Part-of-Speech based LSTM (pos-LSTM) model of [42] computes the supporting component from the hidden representations of constituent words and their tag-specific weights. Intriguingly, the pos-LSTM model achieved its best results when it used only the hidden representations of nouns to compute the supporting component, underscoring the predominant role of nouns in sentence semantics. Both the D-LSTM and pos-LSTM models, while making strides in their respective domains, are not inherently syntax-aware: they follow a sequential approach and employ grammatical structure only to identify semantic roles or POS tags. Even so, their improvements over the baseline Manhattan LSTM (MaLSTM) model [12] are noteworthy.
Building on these insights, [8] proposed the Tag-Guided Hyper Tree-LSTM (TG-HTreeLSTM) model, which consists of a primary Tree-LSTM complemented by a hyper Tree-LSTM network. The hyper Tree-LSTM network utilizes a hypernetwork to dynamically predict the parameters of the main Tree-LSTM, leveraging POS tag information at each node of the constituency parse tree. Similarly, the Structure-Aware Tag Augmented Tree-LSTM (SATA Tree-LSTM) [28] employs an additional tag-level Tree-LSTM to supply auxiliary syntactic information to its word-level Tree-LSTM. These models have demonstrated significant improvements over their tag-unaware counterparts in sentiment analysis tasks, reinforcing the potential benefits of integrating syntactic information into deep learning models.
Motivated by these developments, our research aims to delve deeper into the role of grammatical relationships, particularly typed dependencies, in the semantic composition of language. This endeavor is guided by two primary objectives: 1) To conceptualize and introduce a versatile LSTM architecture capable of discerning and utilizing the type of relationships between elements in a sequence. 2) To develop an advanced deep neural network model, the Semantic Dependency Tree-LSTM (SDT-LSTM), optimized for learning a more nuanced semantic representation of sentences by harnessing both the dependency parse structure and the intricacies of typed dependencies between words.
In pursuit of these objectives, we have designed an additional neural network module, the "Relation Gate", and integrated it into the LSTM architecture. This module serves as a regulatory mechanism, modulating the information flow between LSTM units based on an additional control parameter, termed "relation". Leveraging this "Relation-Gated LSTM" (R-LSTM), we have crafted the SDT-LSTM model, a sophisticated approach for computing sentence representations. This model, drawing inspiration from the Dependency Tree-LSTM of [7], has demonstrated superior performance in two key sub-tasks: Semantic Relatedness Scoring and Sentiment Analysis.
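As a rough illustration of the idea (the precise gating equations are given in Section 3), the sketch below adds one extra gate to a standard LSTM cell, computed from a learned embedding of the dependency type. Everything here, including names, dimensions, and where the gate is applied, is an assumption for exposition rather than the paper's exact formulation.

```python
# Rough illustration only: a standard LSTM cell extended with one extra gate computed
# from a learned dependency-type ("relation") embedding. The exact equations are defined
# in Section 3; names, dimensions, and gate placement here are assumptions.
import torch
import torch.nn as nn


class RelationGatedLSTMCell(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, relation_dim: int):
        super().__init__()
        self.gates = nn.Linear(input_dim + hidden_dim, 4 * hidden_dim)
        self.relation_gate = nn.Linear(relation_dim + hidden_dim, hidden_dim)

    def forward(self, x, relation_emb, h_prev, c_prev):
        i, f, o, u = torch.chunk(self.gates(torch.cat([x, h_prev], dim=-1)), 4, dim=-1)
        i, f, o, u = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(u)
        c = f * c_prev + i * u
        # The relation gate modulates what is passed on to the next unit, so the same
        # word pair can be composed differently under, e.g., nsubj vs. dobj.
        r = torch.sigmoid(self.relation_gate(torch.cat([relation_emb, h_prev], dim=-1)))
        h = r * o * torch.tanh(c)
        return h, c


cell = RelationGatedLSTMCell(input_dim=300, hidden_dim=150, relation_dim=25)
h, c = cell(torch.randn(300), torch.randn(25), torch.zeros(150), torch.zeros(150))
print(h.shape)  # torch.Size([150])
```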
The contributions of this paper are threefold:
The introduction of the Relation-Gated LSTM (R-LSTM) architecture, a novel framework that incorporates an additional control input to regulate the LSTM hidden state based on the type of relationship between words.
The development of the Semantic Dependency Tree-LSTM (SDT-LSTM) model, which utilizes R-LSTMs for learning sentence semantic representations over dependency parse trees, with a specific focus on typed dependencies.
An in-depth qualitative analysis that sheds light on the pivotal role of typed dependencies in enhancing language understanding, particularly in the context of semantic relatedness and sentiment analysis.
The subsequent sections of this paper are organized as follows: Section 2 reviews Long Short-Term Memory (LSTM) networks and elaborates on the Dependency Tree-LSTM architecture. Section 3 details the construction and underlying principles of the Semantic Relationship-Guided LSTM (SRG-LSTM), along with a description of the Semantic Dependency Tree-LSTM (SDT-LSTM) model's framework. Section 4 describes our experimental methodology, and Section 5 presents and discusses the results, demonstrating the performance of the proposed models. Section 6 concludes with the insights gained from our research and proposes avenues for future investigation into the use of syntactic dependency features for tasks such as information extraction.
5. Results and Discussion
5.1. Semantic Relatedness Assessment with SRG-LSTM and SDT-LSTM
In evaluating the efficacy of our SRG-LSTM and SDT-LSTM models for semantic relatedness scoring, we use the Pearson correlation coefficient and mean squared error (MSE) to compare the actual relatedness score y with the predicted score ŷ. Our models were benchmarked against existing deep learning frameworks that employ LSTMs, dependency trees, or both, for semantic analysis. These frameworks fall into four categories: the baseline mean-vector approach, sequential models using LSTMs/GRUs (Gated Recurrent Units), tree-structured neural networks (tree-NNs) that leverage dependency trees, and models that integrate grammatical information into sequential models. SRG-LSTM and SDT-LSTM belong to the fourth category, utilizing both dependency structure and dependency type.
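For reference, the two metrics can be computed as follows; the scores used here are toy values, not results from the SICK experiments.

```python
# Toy illustration of the two reported metrics; values are not from the SICK experiments.
import numpy as np
from scipy.stats import pearsonr

y_true = np.array([4.5, 3.6, 2.9, 4.9, 1.7])   # gold relatedness scores (1-5 scale)
y_pred = np.array([4.2, 3.8, 3.1, 4.7, 2.0])   # model predictions

r, _ = pearsonr(y_true, y_pred)                # Pearson correlation coefficient
mse = np.mean((y_true - y_pred) ** 2)          # mean squared error
print(f"Pearson r = {r:.4f}, MSE = {mse:.4f}")
```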
Table 2 demonstrates that LSTM models surpass traditional RNN models, owing to the LSTM’s superior sequence information retention capabilities. Hierarchical models like DT-LSTMs and DT-GRUs outperform sequential ones, underscoring the advantage of tree-structured representations in semantic understanding.
The D-LSTM and various pos-LSTM models, which incorporate grammatical roles into sentence representation, showed enhancements over the MaLSTM model. Notably, the pos-LSTM-n, focusing solely on nouns, outperformed models considering a broader range of grammatical roles. This suggests that certain grammatical elements play a more pivotal role in semantic composition.
In our category, the SRG-LSTM and SDT-LSTM models, which consider both the structure and type of dependencies, show a marked improvement over other models, including the SDT-RNN. The SDT-RNN, originally designed for image captioning tasks, does not significantly outperform the DT-RNN in relatedness scoring, likely due to its complexity and the size of the network. Conversely, the SRG-LSTM and SDT-LSTM models, by representing each dependency type with a lower-dimensional vector, achieve better performance with fewer parameters, highlighting the efficiency of our approach.
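A back-of-the-envelope count makes the efficiency argument concrete: with a separate composition matrix per dependency type (the SDT-RNN strategy), the extra parameters grow with the square of the hidden size, whereas a low-dimensional embedding per type adds comparatively little. The dimensions below are assumptions chosen only for illustration, not the actual hyperparameters of either model.

```python
# Illustrative parameter count: a separate composition matrix per dependency type
# (SDT-RNN style) versus one small embedding vector per type. All numbers are assumed
# for illustration and are not the actual hyperparameters of either model.
hidden_dim = 150        # assumed composition/hidden size
relation_dim = 25       # assumed dependency-type embedding size
num_dep_types = 40      # roughly the number of Universal Dependencies relations

per_type_matrices = num_dep_types * hidden_dim * hidden_dim   # 900,000 parameters
type_embeddings = num_dep_types * relation_dim                # 1,000 parameters
print(per_type_matrices, type_embeddings)
```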
Table 3 showcases how SRG-LSTM and SDT-LSTM retrieve the most semantically similar sentences for given queries from the SICK test dataset. The SRG-LSTM and SDT-LSTM models demonstrate a heightened ability to discern semantic nuances, as evidenced by their differentiated scoring for sentences with subtle variations.
Table 4 presents a selection of sentence pairs from the SICK test dataset with their actual (G) and predicted (S) similarity scores, focusing on the pairs where the models' predictions diverge most from human ratings. The examination of these results underscores the significance of considering both dependency structure and type in semantic relatedness tasks, and the models' improved performance metrics reflect their ability to integrate this information efficiently. In the first two pairs, however, the models underestimate the semantic relatedness, indicating a difficulty in encoding subtle semantic nuances, such as understanding that sliding on a rail is a type of skateboarding stunt, or that taking the ball amounts to fighting for the ball in basketball. The remaining pairs show overestimations of relatedness, suggesting that the models may overemphasize certain semantic similarities.
5.2. Typed Dependency Embeddings Analysis
As detailed in Section 3, the SRG-LSTM and SDT-LSTM models incorporate specialized relation gates that learn task-specific embeddings for typed dependencies.
An analysis of the magnitudes of these typed dependency embeddings (Table 5) reveals their relative contributions to sentence meaning composition. This analysis confirms intuitive expectations: dependencies such as direct object, nominal modifier, adjectival modifier, and nominal subject are crucial, while others like goes-with and adjectival clause are less influential. Interestingly, the 'root' dependency does not rank at the top.
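The ranking in Table 5 can be reproduced by sorting the learned dependency-type vectors by their L2 norm; the sketch below uses a random stand-in dictionary, since only the ranking procedure, not the trained embeddings, is the point here.

```python
# Sketch of the magnitude analysis behind Table 5: rank dependency types by the L2 norm
# of their learned embeddings. The dictionary is a random stand-in for the embeddings
# actually learned by the relation gate.
import numpy as np

rng = np.random.default_rng(0)
dep_embeddings = {dep: rng.normal(size=25) for dep in ["dobj", "nmod", "amod", "nsubj", "case"]}

ranked = sorted(dep_embeddings, key=lambda d: np.linalg.norm(dep_embeddings[d]), reverse=True)
for dep in ranked:
    print(f"{dep:>6}: M = {np.linalg.norm(dep_embeddings[dep]):.2f}")
```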
Examining sentence structures, we noticed that in active sentences, dependencies such as 'nsubj' and 'dobj' are prominent, while in passive constructions these shift to 'nmod' and 'nsubjpass'. The model's embedding vectors reflect a consistent pattern in these transformations, suggesting that SRG-LSTM and SDT-LSTM effectively capture changes in grammatical relationships across different sentence voices.
5.3. Sentiment Analysis Proficiency
Our evaluation of sentiment analysis capabilities (Table 6) indicates that the SRG-LSTM and SDT-LSTM models surpass standard LSTM and DT-LSTM models in accuracy. The performance is competitive with bidirectional LSTMs but slightly lower than that of CT-LSTMs (Constituency Tree-LSTMs).
The SRG-LSTM and SDT-LSTM models achieve a precision of 88.4%, recall of 84.8%, and F1-score of 86.6% in detecting positive sentiment, compared with the DT-LSTM's precision of 87.62%, recall of 82.28%, and F1-score of 85.39%. This comparison illustrates the advantage of our models in sentiment recognition. The optimal hyperparameters for these models were determined empirically (Table 1).
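For completeness, the reported F1-score is the harmonic mean of precision and recall, so the 86.6% figure follows directly from the precision and recall above.

```python
# F1 is the harmonic mean of precision and recall.
def f1_score(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(88.4, 84.8), 1))  # 86.6, the reported F1 for SRG/SDT-LSTM
```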
Table 7 presents examples from the SST test dataset, highlighting the accurate and erroneous sentiment predictions by the SRG-LSTM and SDT-LSTM models. The table categorizes predictions into true positives, true negatives, false positives, and false negatives, offering insights into the models’ performance nuances.
6. Conclusion and Prospects for Future Research
This study has yielded several key advancements in the field of natural language processing. Firstly, we have introduced the Semantic Relationship-Guided LSTM (SRG-LSTM) architecture, a novel approach that enables the network to learn distinct gating vectors for the various types of relationships in input sequences. Secondly, the Semantic Dependency Tree-LSTM (SDT-LSTM) model has been proposed, leveraging both the dependency parse structure and the grammatical relations between words to enhance sentence semantic modeling. Finally, we have conducted a qualitative analysis exploring the impact of typed dependencies on language comprehension.
Our experimental results demonstrate that the SDT-LSTM model surpasses the DT-LSTM in both performance and learning efficiency for tasks such as semantic relatedness scoring and sentiment analysis. The SDT-LSTM exhibits a heightened ability to discern nuanced relationships within sentence pairs, achieving a closer alignment with human semantic interpretation compared to other contemporary methods.
The exploration of dependency types in semantic composition within a deep learning framework is a relatively uncharted territory. The computational models proposed in this research are poised to catalyze further investigations in this direction. Future endeavors will involve applying the SDT-LSTM model to a variety of NLP tasks, including paraphrase detection, natural language inference, question answering, and image captioning, where dependency tree-based models have shown significant promise.
A comprehensive analysis of the typed dependency embeddings learned by the SDT-LSTM model has unveiled intriguing insights into linguistic comprehension. These embeddings, from a linguistic standpoint, merit further exploration to deepen our understanding of language processing mechanisms.
In conclusion, the SRG-LSTM architecture introduced in this work presents a versatile concept that can be adapted to a range of domains. Given the prominence of LSTMs in numerous sequence modeling applications, the SRG-LSTM offers a compelling alternative for tasks that require modeling not just nodes but also the diverse types of links connecting them. Our findings suggest that SRG-LSTMs are adept at learning relationships between LSTM units, and the potential of incorporating a relation gate in other gated architectures, such as Tree-GRUs, is an exciting avenue for future exploration.
Table 1. Hyperparameter range used for model tuning, with the optimal values in bold.
Parameter | SICK-R | SST
Learning rate | 0.01/0.05/0.1/0.2/0.25/0.3 | 0.01/0.05/0.1/0.2/0.25/0.3
Batch size | 25/50/100 | 25/50/100
Memory dimension | 120/150/100 | 165/168/170/175
Weight decay | 0.0001 | 0.0001
Optimizer | adagrad/adam/nadam | adagrad/adam/nadam
Table 2. Semantic relatedness scoring comparison of SRG-LSTM and SDT-LSTM with other LSTM models. Values are derived from existing literature.
Model | Pearson's r | MSE
Mean vectors | 0.7577 | 0.4557
Sequential LSTM | 0.8528 | 0.2831
Bidirectional LSTM (Bi-LSTM) | 0.8567 | 0.2736
GRU Model | 0.8595 | 0.2689
MaLSTM | 0.8177 | –
Dependency Tree-RNN (DT-RNN) | 0.7923 | 0.3848
DT-LSTM | 0.8676 | 0.2532
DT-GRU | 0.8672 | 0.2573
Dependency-based LSTM (D-LSTM) | 0.8270 | 0.3527
POS-based LSTM (pos-LSTM-n) | 0.8263 | –
POS-based LSTM (pos-LSTM-v) | 0.8149 | –
POS-based LSTM (pos-LSTM-nv) | 0.8221 | –
POS-based LSTM (pos-LSTM-all) | 0.8173 | –
Semantic Dependency Tree-RNN (SDT-RNN) | 0.7900 | 0.3848
SRG-LSTM and SDT-LSTM | 0.8731 | 0.2427
Table 3. Illustrative examples of sentences retrieved by SRG-LSTM and SDT-LSTM for three sample queries from the SICK test set, with the relatedness score assigned by the Dependency Tree-LSTM (DT-LSTM) and by SRG-LSTM/SDT-LSTM.
Query and Retrieved Sentences | DT-LSTM score | SRG-LSTM/SDT-LSTM score
Sample Query 1 | |
Sentence A | 4.48 | 4.81
Sentence B | 4.48 | 4.66
Sentence C | 4.48 | 4.57
Sample Query 2 | |
Sentence D | 4.12 | 4.51
Sentence E | 4.11 | 4.15
Sentence F | 4.16 | 4.11
Sample Query 3 | |
Sentence G | 4.79 | 4.87
Sentence H | 4.82 | 4.88
Sentence I | 4.85 | 4.87
Table 4. Selection of sentence pairs from the SICK test dataset with notable discrepancies between the predicted score (S) and ground truth (G).
Sentence 1 | Sentence 2 | S | G
A skateboarder is performing a stunt | A skateboarder is sliding on a rail | 2.43 | 4.0
A basketball player is lying on the court with the ball being taken by another player | Two players are fighting for the ball on the basketball court | 2.7 | 4.7
A motorcyclist is showing off tricks | The motorcyclist is being tricked by a performer | 3.94 | 2.6
A gathering of five elderly people indoors | Five young people are hanging out indoors | 4.64 | 3.4
A dog is leaping onto a diving board | A dog is jumping on a trampoline | 4.12 | 2.9
Table 5. Universal Dependencies ranked by their magnitude (M) as learned by the SRG-LSTM and SDT-LSTM models.
Dependencies | Examples | Notation | M
Direct object | Chef prepared the meal | dobj(prepared, meal) | 9.16
Nominal modifier | Meal was cooked by the chef | nmod(cooked, chef) | 7.62
Adjectival modifier | Sam likes spicy food | amod(food, spicy) | 7.31
Nominal subject | Chef prepared the meal | nsubj(prepared, chef) | 7.27
Conjunction | Bill is tall and kind | conj(tall, kind) | 6.97
Negation modifier | Bill is not a scientist | neg(scientist, not) | 6.90
Case marking | I saw a cat under the table | case(table, under) | 6.76
Table 6. Performance comparison of SRG-LSTM and SDT-LSTM with other LSTM models for binary sentiment classification on the SST dataset. The values are sourced from prior studies.
Model | Accuracy (%)
LSTM | 84.9
Bi-LSTM | 87.5
2-layer LSTM | 86.3
2-layer Bi-LSTM | 87.2
CT-LSTM | 88.0
DT-LSTM | 85.7
SRG-LSTM & SDT-LSTM | 86.9
Table 7. Sentiment analysis examples from the Stanford Sentiment Treebank (SST) with predicted sentiment score (S) and actual label (G). 0 denotes negative and 1 positive sentiment.
Input Sentences | S | G
"I wish I could enjoy the weekend, but I was relieved when it ended." | 0 | 0
"Despite high expectations, the movie barely managed to move me." | 0 | 0
"Quirky yet endearing, the film captures the essence of its theme." | 1 | 1
"A deep dive into the complexities of love and sacrifice." | 1 | 1
"Starts as an intriguing exploration but ends up as an underwhelming gimmick." | 1 | 0
"Tedious for anyone except the most devoted fans." | 1 | 0
"The film resonates with raw emotion, leaving a lasting impression." | 0 | 1
"Subtle performances that deserve recognition and acclaim." | 0 | 1