1. Introduction
In today's dynamic and interconnected world, information matters across many critical domains, including the legal, political, commercial, and personal. Because opinions play a pivotal role in shaping decisions and influencing outcomes, there is a growing need for automated tools that analyze sentiment effectively, and sentiment analysis has emerged as a key technique for this purpose. Sentiment analysis, also called sentiment mining, is a natural language processing approach that identifies and classifies the emotional tone and subjective content of textual data. People now express their thoughts ever more quickly, making manual processing of the many resulting viewpoints difficult. Sentiment analysis has therefore proved extremely useful in this setting [1,2,3]. By employing sentiment analysis, stakeholders can also navigate the intricate layers of precedents and decisions, enhancing their capacity for nuanced interpretation and contributing to more informed decision-making and policy formulation [2].
Recently, a significant amount of research has applied machine learning and deep learning to opinion mining and sentiment analysis in various domains [4,5,6]. These tasks have benefited from several neural network architectures, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), including the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) variants [7]. Machine learning and deep learning models excel at analyzing short texts, leveraging abundant datasets from social networks to identify opinions quickly. Longer documents, however, present a harder challenge, given their higher word count and the complex semantic links between sentences. Researchers are therefore increasingly invested in developing techniques that extract nuanced points of view on specific subjects from such large bodies of text, with the aim of improving sentiment analysis accuracy and gaining deeper insight into complex topics. In the legal field, there is a discernible trend towards integrating technologies such as machine learning and sentiment analysis to enhance the analytical capabilities of legal practitioners. Rhanoui et al. [8] used a CNN-BiLSTM model to analyze press articles and blog posts and reported approximately 90.66% accuracy. Similarly, Tripathy et al. [9] employed a hybrid machine learning model to classify document-level sentiment and reported positive results. This technological infusion holds particular promise for Canadian maritime case law, where the complexity of legal texts and the need for precise forecasting of court decisions pose significant challenges [10].
Thus, this study aims to use a combination of deep learning techniques, namely Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, to extract sentiment from Canadian maritime case law documents. By focusing on the application of deep learning to Canadian maritime case law, the approach fills a gap in the literature and offers a novel strategy for using sentiment analysis in law [13]. Legal sentiment analysis has the potential to change how lawyers and judges examine massive collections of case law. A document's emotional tone, judgments, and sentiment dynamics are informative for attorneys, judges, politicians, and academics. The deep learning models introduced here may pick up on subtleties of sentiment that more conventional approaches would miss. The study also explores the potential effects of these computational methods on legal analytics, policy formation, and the creation of AI-powered legal tools.
The paper begins by presenting the case law, followed by a sentiment evaluation of the findings. An extensive literature review explores sentiment analysis and its relevance to the legal profession. The study then gathers data to develop a machine learning model and analyzes the experimental outcomes in relation to prior research [14].
4. Proposed Model: CNN-LSTM and Doc2Vec for Document-Level Sentiment Analysis
Cutting-edge methods in document-level sentiment analysis, like CNN-LSTM and Doc2Vec (see Figure 3), extract sentiment information from extensive texts such as reviews, articles, and reports. These methods aim to decipher the text's underlying meaning and emotional nuances by employing deep learning and vector representations.
Although more commonly associated with image processing, CNNs can also be effectively trained for text analysis. In this context, CNNs are crucial in mining textual data to discern sentiment at the document level. Their functionality involves passing the input text through convolutional filters, adept at identifying local patterns and characteristics that may serve as sentiment indicators. The strength of CNN lies in its ability to pinpoint key phrases or sequences of words inside texts, thereby contributing significantly to the overall sentiment of the document.
Long short-term memory (LSTM) RNNs excel in modeling sequential data, making them highly effective for understanding text's natural flow and context. In document-level sentiment analysis, LSTMs are crucial for extracting word and sentence dependencies, allowing them to capture evolving attitudes throughout lengthy texts. Their ability to selectively retain and forget information over extended sequences ensures consistent and nuanced sentiment analysis, making LSTMs indispensable in natural language processing.
Gates:
LSTMs use three types of gates: i) the forget gate (f), ii) the input gate (i), and iii) the output gate (o).
These gates control the flow of information into and out of the cell state.
a. Cell State
The cell state represents the memory of the LSTM. It can be updated and modified using the gates.
The cell state is updated using the forget gate, input gate, and a new candidate cell state.
b. Hidden State
The hidden state carries information about the current time step's input and the previous hidden state.
It is used to make predictions and updated using the output gate.
The gate equations below follow the standard LSTM formulation, starting with the forget gate:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
Here, $\sigma$ represents the sigmoid activation function.
c. Input Gate
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
d. Candidate Cell State
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
e. Update Cell State
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
This equation combines the old cell state and the new candidate cell state based on the forget and input gates.
f. Output Gate
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
g. Hidden State
$$h_t = o_t \odot \tanh(C_t)$$
The output gate controls the information that is passed to the hidden state. Here, $x_t$ represents the input at time step t and $h_{t-1}$ the hidden state at time step t-1. Similarly, $W$ and $b$ represent the weight matrices and bias vectors of the gates, respectively. Finally, $\sigma$ stands for the sigmoid activation function, $\tanh$ for the hyperbolic tangent activation function, and $\odot$ for element-wise multiplication.
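The gate updates described above, in their standard form, can be traced in a few lines of plain Python. This is a minimal sketch rather than the implementation used in this study: the helper names (`affine`, `lstm_step`), the parameter layout, and the all-zero toy weights are assumptions chosen only to make the arithmetic easy to follow.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i
            for row, b_i in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps a gate name to (W, b) acting on [h_prev, x_t]."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(*params["f"], z)]    # forget gate
    i = [sigmoid(a) for a in affine(*params["i"], z)]    # input gate
    g = [math.tanh(a) for a in affine(*params["c"], z)]  # candidate cell state
    o = [sigmoid(a) for a in affine(*params["o"], z)]    # output gate
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t

# Toy parameters (assumed): hidden size 2, input size 1, all weights and biases zero,
# so every gate equals 0.5 and the candidate state equals 0.
zero = ([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0])
params = {"f": zero, "i": zero, "c": zero, "o": zero}
h_t, c_t = lstm_step([1.0], [0.0, 0.0], [1.0, -1.0], params)
```

With these toy parameters the forget gate simply halves the previous cell state, which makes the role of each gate easy to check by hand.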
In deep learning, LSTM models are pivotal for language modeling, machine translation, and speech recognition due to their selective updating and retrieval processes. Combining a convolutional neural network (CNN) with a long short-term memory (LSTM) model is a powerful approach to sentiment analysis, classifying entire articles as positive, negative, or neutral. In contrast to Word2Vec, which represents individual words, Doc2Vec represents entire documents. This allows it to capture emotional context and sentiment without labeled data, by analyzing documents as a whole and modeling the relationships between their words.
4.1. Model Overview and Motivation
An overview of the model and of the reasons behind its development establishes the study's context and justifies its significance.
The sentiment analysis model has an architecture built from advanced deep learning components, such as Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, or other deep learning approaches. A detailed account of the model's major components is necessary, from the initial data preparation phase to the subsequent training and testing stages [27]. This breakdown supports a thorough understanding of the model's structure and the rationale behind the integration of specific neural network architectures.
In applying sentiment analysis to Canadian maritime case law, this model seeks to illuminate the distinctive features of this legal domain, emphasizing how it differs from broader legal corpora. The motivation for this specialized approach stems from gaps in the existing literature, where the intricacies of maritime law often remain underserved. By addressing these gaps, the model aims to introduce fresh perspectives and techniques for sentiment analysis. Its practical applications promise to enhance legal research, decision-making, and policy formulation within Canadian maritime case law. Furthermore, the article examines the challenges and complexities inherent in sentiment analysis of legal documents, shedding light on the shortcomings of current methodologies. Through a straightforward presentation of the model's structure and a justification of its design [28], this study not only fills an existing gap in the understanding of sentiment in Canadian maritime case law but also has the potential to significantly influence the broader field of sentiment analysis within legal contexts.
4.2. Document Representation
Effective sentiment analysis in Canadian maritime case law requires transforming legal texts into a format suitable for deep learning models. The initial phase includes data cleansing, removal of irrelevant information, and standardization. Tokenization then breaks the text down into individual words, and word embedding represents them as numerical vectors. To produce document-level representations, we use methods such as averaging, TF-IDF weighting, or more complex algorithms like Doc2Vec (see Figure 4).
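As an illustration of the pipeline described above (cleaning, tokenization, embedding, and TF-IDF-weighted averaging), the following sketch builds a document vector from word vectors. It is a hedged example, not the study's preprocessing code: the regular-expression tokenizer, the smoothed IDF formula, the two sample sentences, and the two-dimensional `toy_embeddings` are all assumptions made for demonstration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately simple cleaner)."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_weights(docs_tokens):
    """Per-document TF-IDF weights with smoothed IDF = log((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs_tokens)
    df = Counter(tok for toks in docs_tokens for tok in set(toks))
    weights = []
    for toks in docs_tokens:
        tf = Counter(toks)
        weights.append({t: (c / len(toks)) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return weights

def document_vector(weights, embeddings, dim):
    """TF-IDF-weighted average of word vectors; out-of-vocabulary tokens are skipped."""
    total = [0.0] * dim
    norm = 0.0
    for tok, w in weights.items():
        if tok in embeddings:
            total = [a + w * e for a, e in zip(total, embeddings[tok])]
            norm += w
    return [a / norm for a in total] if norm else total

# Hypothetical sentences and embeddings, used only to exercise the functions.
docs = ["The appeal is dismissed.", "The appeal is allowed with costs."]
toy_embeddings = {"dismissed": [-1.0, 0.0], "allowed": [1.0, 0.0], "appeal": [0.0, 1.0]}
tokens = [tokenize(d) for d in docs]
vectors = [document_vector(w, toy_embeddings, 2) for w in tfidf_weights(tokens)]
```

Words that occur in every document ("appeal") receive a lower IDF than words that distinguish documents ("dismissed", "allowed"), so the distinguishing words dominate each document vector.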
In Canadian maritime case law, sequence padding is vital for efficiently processing legal documents of varying lengths. Deep learning models expect inputs of uniform length, so shorter token sequences are padded (and longer ones truncated) to a common size. Despite challenges such as intricate legalese and lengthy documents, this step gives legal texts a consistent representation, enabling deep learning to discern emotional patterns within previously unstructured legal material.
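Sequence padding can be sketched as follows. The function name, the choice of right-padding with id 0, and truncation from the end are assumptions for illustration; deep learning libraries offer equivalent utilities with their own defaults.

```python
def pad_sequences(sequences, max_len, pad_id=0):
    """Right-pad with pad_id and truncate from the end so every row has max_len ids."""
    return [seq[:max_len] + [pad_id] * (max_len - len(seq)) for seq in sequences]

# Three tokenized documents of different lengths, encoded as hypothetical word ids.
batch = pad_sequences([[5, 8, 2], [7], [4, 9, 1, 3, 6]], max_len=4)
```

For long judgments, the choice of `max_len` trades coverage of the document against memory and training time.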
The Continuous Bag of Words (CBOW) algorithm predicts a word from its context. It is efficient because it uses few resources, and the context can be as simple as a single word or a group of words. The method is trained by minimizing the negative log probability of a word given its context.
Unlike CBOW, Skip-gram receives a word as input and predicts the surrounding context words as output.
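Word2Vec (Skip-gram with Negative Sampling):
Skip-gram:
Maximize the probability of the context words given the target word.
The loss function (negative log-likelihood), written in its standard form for a corpus of T words and a context window of size c:
$$J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$$
Negative sampling (approximation to improve efficiency):
Maximize the probability of the true context words and minimize the probability of randomly sampled "negative" context words. With $v_{w_I}$ the vector of the target word, $u_{w_O}$ the vector of a true context word, and $u_{w_k}$ the vectors of K sampled negative words, the terms to maximize are:
$$\log \sigma(u_{w_O}^{\top} v_{w_I})$$
for true context words
$$\sum_{k=1}^{K} \log \sigma(-u_{w_k}^{\top} v_{w_I})$$
for negative samples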
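The negative-sampling objective described above can be evaluated for a single (target, context) pair as in the sketch below. The function names and the toy vectors are assumptions; the loss is the negated sum of the true-context term and the negative-sample terms.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def sgns_loss(v_target, u_context, u_negatives):
    """Negative-sampling loss for one (target, context) pair and its sampled negatives."""
    loss = -log_sigmoid(dot(u_context, v_target))   # pull the true context closer
    for u_neg in u_negatives:
        loss -= log_sigmoid(-dot(u_neg, v_target))  # push sampled negatives away
    return loss

# Toy vectors (assumed): the loss is low when the true context aligns with the target
# and the negative sample points away from it, and high in the opposite case.
good = sgns_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
bad = sgns_loss([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
```

Only the sampled negatives are touched at each update, which is what makes this approximation cheaper than normalizing over the whole vocabulary.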
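GloVe:
Minimize the weighted squared difference between the dot product of two word vectors (plus bias terms) and the logarithm of their co-occurrence count.
$$J(\theta) = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$$
where
$X_{ij}$ is the co-occurrence count of words $i$ and $j$ in the corpus,
$f$ is a weighting function,
$\theta$ represents the model parameters (the word vectors $w_i$, the context vectors $\tilde{w}_j$, and the biases $b_i$ and $\tilde{b}_j$).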
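The weighted least-squares objective described above, which is the form used by GloVe, can be computed directly from a table of co-occurrence counts. This sketch assumes the weighting function from the GloVe paper, $f(x) = (x / x_{max})^{\alpha}$ capped at 1 with $x_{max} = 100$ and $\alpha = 0.75$; the function names and the dictionary-based data layout are illustrative.

```python
import math

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x): (x / x_max)^alpha below x_max, and 1 at or above it."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(cooc, w, w_ctx, b, b_ctx):
    """Weighted least-squares loss over the nonzero co-occurrence counts in cooc."""
    total = 0.0
    for (i, j), x_ij in cooc.items():
        pred = sum(a * c for a, c in zip(w[i], w_ctx[j])) + b[i] + b_ctx[j]
        total += glove_weight(x_ij) * (pred - math.log(x_ij)) ** 2
    return total

# Toy check (assumed values): one word pair that co-occurs 100 times.
cooc = {(0, 1): 100.0}
vecs = [[0.0, 0.0], [0.0, 0.0]]
untrained = glove_loss(cooc, vecs, vecs, [0.0, 0.0], [0.0, 0.0])
fitted = glove_loss(cooc, vecs, vecs, [math.log(100.0), 0.0], [0.0, 0.0])
```

The weighting function keeps very frequent pairs from dominating the loss while still discounting rare, noisy pairs.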
FastText:
FastText introduces subword information and computes word vectors based on subword embeddings.
The subword vectors are combined to represent a word.
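The word vector of a word $w$ is the sum of its subword vectors:
$$v_w = \sum_{g \in G_w} z_g$$
where $G_w$ is the set of subwords (character n-grams) of $w$ and $z_g$ is the vector of subword $g$.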
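The subword summation described above can be sketched as follows. This is an illustrative example under stated assumptions: n-grams of length 3 to 6 with `<` and `>` as word-boundary markers follow the FastText paper, while the dictionary lookup and the toy vectors are simplifications (the real library hashes n-grams into a fixed number of buckets).

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of '<word>' plus the whole bracketed word."""
    token = f"<{word}>"
    grams = {token[i:i + n] for n in range(n_min, n_max + 1)
             for i in range(len(token) - n + 1)}
    grams.add(token)
    return grams

def word_vector(word, subword_vectors, dim):
    """Sum the vectors of all known subwords; unknown subwords contribute nothing."""
    vec = [0.0] * dim
    for g in char_ngrams(word):
        if g in subword_vectors:
            vec = [a + b for a, b in zip(vec, subword_vectors[g])]
    return vec

# Hypothetical subword table: only two of the n-grams of "ship" are known.
toy_subwords = {"<sh": [1.0, 0.0], "ip>": [0.0, 2.0], "xyz": [5.0, 5.0]}
ship_vec = word_vector("ship", toy_subwords, 2)
```

Because rare legal terms share character n-grams with more common words, this construction can still produce a vector for a word that never appeared in training.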
4.3. Convolution Layer
In Canadian maritime case law sentiment analysis, the Conv1D layer is crucial in CNN+LSTM models. It extracts sentiment patterns from complex legal texts by identifying local textual patterns, which helps the model cope with the length and intricacy of the documents (see Figure 5). By spotlighting these linguistic nuances, Conv1D filters enhance the model's capacity to distinguish between positive and negative sentiments [29].
"LeNet" and "AlexNet," two prominent CNNs, share the linear neuron model of traditional MLPs. Unlike MLPs, however, CNNs incorporate weight sharing and restricted connectivity in their convolutional layers.
Integrating Conv1D with LSTM layers proves a powerful tool for discerning emotions in complex legal documents. Conv1D's localized receptive field captures the sentiment of particular segments, which contributes to the document's overall sentiment, while max-pooling layers help prevent excessive feature loss. This approach benefits Canadian maritime case law, providing a more accessible understanding of rulings and accommodating diverse perspectives.
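The 1D forward propagation (1D-FP) expression in each CNN layer is as follows:
$$x_k^l = b_k^l + \sum_{i=1}^{N_{l-1}} \mathrm{conv1D}\left(w_{ik}^{l-1}, s_i^{l-1}\right)$$
The input is denoted by the symbol $x_k^l$, where $b_k^l$ is the bias of the k-th neuron at layer l, $s_i^{l-1}$ is the output of the i-th neuron at layer l-1, $w_{ik}^{l-1}$ is the kernel from the i-th neuron at layer l-1 to the k-th neuron at layer l, and $N_{l-1}$ is the number of neurons at layer l-1.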
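The forward propagation just described, in which a neuron's input is its bias plus the sum of 1D convolutions of the previous layer's outputs with the corresponding kernels, can be sketched in plain Python. The function names, the "valid" (no zero-padding) convention, and the toy numbers are assumptions; a small max-pooling helper is included because pooling follows the convolution in the model described earlier.

```python
def conv1d_valid(signal, kernel):
    """'Valid' 1D cross-correlation, the operation deep learning libraries call convolution."""
    k = len(kernel)
    return [sum(kernel[j] * signal[t + j] for j in range(k))
            for t in range(len(signal) - k + 1)]

def layer_input(prev_outputs, kernels, bias):
    """Input of one neuron: bias plus the sum over i of conv1D(kernel_i, prev_output_i)."""
    maps = [conv1d_valid(s, w) for s, w in zip(prev_outputs, kernels)]
    return [bias + sum(vals) for vals in zip(*maps)]

def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

# Two feature maps from the previous layer and one kernel per map (toy values).
x_k = layer_input([[1, 2, 3, 4], [0, 1, 0, 1]], [[1, 0, -1], [1, 1, 1]], bias=0.5)
pooled = max_pool1d([1, 3, 2, 5, 4], size=2)
```

The same kernel slides over every position of the sequence, which is the weight sharing that distinguishes a convolutional layer from a fully connected one.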
The back-propagation procedure begins at the output MLP layer. Let l = 1 denote the input layer and l = L the output layer, and let $N_L$ be the number of classes in the dataset. For an input vector p with target vector $t^p$ and output vector $[y_1^L, \ldots, y_{N_L}^L]$, the mean-squared error (MSE) in the output layer is:
$$E_p = \mathrm{MSE}\left(t^p, [y_1^L, \ldots, y_{N_L}^L]\right) = \sum_{i=1}^{N_L} \left(y_i^L - t_i^p\right)^2$$
The derivative of $E_p$ with respect to each network parameter may be calculated using the delta error, $\Delta_k^l = \partial E / \partial x_k^l$. More precisely, the chain rule of derivatives may be used to update not just the bias of the current neuron but also the weights of all of the neurons in the preceding layer.
Figure 6.
CNNs back-propagation and forward propagation.
CNNs with several layers use both forward propagation and back-propagation. In both passes, the last hidden CNN layer is linked to the first hidden MLP layer (see Figure 7). The training procedure is as follows:
1) Initialize the weights and biases of the network (e.g., randomly, ~U(-0.1, 0.1)).
2) For each BP iteration, DO:
a. For each training sample in the dataset, DO:
FP: Forward-propagate the input through the network to compute the output of each layer.
BP: Compute the delta error at the output layer and back-propagate it to the first hidden layer to compute the delta errors.
PP: Post-process to compute the weight and bias updates.
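The loop structure of the procedure above can be illustrated on the smallest possible network: a single linear output neuron trained with squared error. This is a sketch under stated assumptions, not the study's training code. The learning rate, the number of iterations, the per-sample update, and the toy regression data are all hypothetical, and a forward pass is included because the delta error cannot be computed without it.

```python
import random

def train(samples, n_inputs, epochs=200, lr=0.1, seed=0):
    """Per-sample forward pass, delta error, and update for one linear neuron."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]  # step 1: ~U(-0.1, 0.1)
    b = rng.uniform(-0.1, 0.1)
    for _ in range(epochs):                                # step 2: each BP iteration
        for x, t in samples:                               # step 2a: each sample
            y = sum(wi * xi for wi, xi in zip(w, x)) + b   # forward pass
            delta = 2.0 * (y - t)                          # delta error for E = (y - t)^2
            grad_w = [delta * xi for xi in x]              # sensitivity of each weight
            w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
            b -= lr * delta                                # the bias sensitivity is delta
    return w, b

# Toy data generated from t = 2*x1 - x2 + 0.5 (an assumption for this example).
samples = [([0.0, 0.0], 0.5), ([1.0, 0.0], 2.5), ([0.0, 1.0], -0.5), ([1.0, 1.0], 1.5)]
w, b = train(samples, n_inputs=2)
```

In a multi-layer network the same delta error is propagated backwards layer by layer before the updates are applied, but the order of operations inside the loop is unchanged.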
4.4. Activation Layer
In the CNN+LSTM architecture used for sentiment analysis of Canadian maritime case law texts, the activation layer, which applies the activation function, is a pivotal element. This non-linear layer greatly enhances the model's ability to capture complicated relationships and produce precise predictions. By introducing non-linearity, the model becomes adept at discerning complex patterns, enabling more accurate predictions and finer insights into sentiment from the nuanced language of legal documents.
The activation layer is one of the most important parts of a deep learning model: it helps the model interpret the sentiment patterns and nuanced emotional expressions contained in complex legal documents. It is central to learning recurring structures, making decisions, and controlling gradient flow within the model. The activation layer is not without challenges, however: saturation, in which the activation's gradient approaches zero for inputs of large magnitude, can slow learning and reduce overall effectiveness. Nevertheless, the layer remains indispensable, since the success of deep learning models in sentiment analysis of legal texts is closely tied to its proper functioning, as underscored by empirical evidence [30].
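The saturation issue mentioned above can be made concrete by comparing the gradients of common activation functions at small and large inputs. The choice of sigmoid, tanh, and ReLU is an assumption for illustration, since the text does not name the activation used in the model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

# Print the gradient of each activation as the input grows.
for x in (0.0, 2.0, 10.0):
    print(x, sigmoid_grad(x), tanh_grad(x), relu_grad(x))
```

The sigmoid and tanh gradients shrink towards zero as the input grows, so weights feeding a saturated unit barely change, whereas the ReLU gradient stays at 1 for every positive input.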
4.5. Regularization
Deep learning models that combine CNN and LSTM layers for sentiment analysis in Canadian maritime case law rely heavily on regularization techniques to counteract overfitting. Overfitting occurs when a model performs exceptionally well on training data but poorly on new data, and the complexity of legal language patterns makes an accurate, generalizable representation critical. To achieve robust sentiment analysis in this field, regularization is essential for ensuring that the model handles both the complexities of the training data and the wide variety of legal text patterns.
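The text does not name the regularization techniques used, so the sketch below shows two common ones purely as assumed examples: inverted dropout, which randomly zeroes activations during training, and an L2 penalty, which is added to the loss to discourage large weights. The function names and rates are hypothetical.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def l2_penalty(weights, lam):
    """lam times the sum of squared weights, added to the training loss."""
    return lam * sum(w * w for w in weights)

rng = random.Random(0)
dropped = dropout([1.0] * 1000, rate=0.5, rng=rng)  # about half become 0.0, the rest 2.0
penalty = l2_penalty([1.0, -2.0], lam=0.1)
```

Rescaling the surviving activations by `1 / keep` keeps their expected value unchanged, so no correction is needed when dropout is switched off at inference time.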
4.6. Optimization
Sentiment analysis is crucial in Canadian maritime case law for understanding the complex feelings expressed in legal documents, and combined CNN and LSTM models are one example of the complex models used for this purpose. These models undergo optimization as part of the training process, which entails iteratively adjusting many parameters. The main goal is to improve the model's predictive accuracy by minimizing its loss function. Optimization is therefore crucial for capturing the emotional undercurrents of maritime case law in detail. Through this repeated tuning process, the model's sentiment detection capabilities are enhanced and the intricacies of legal language are more accurately reflected.
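To show what reducing a loss function through repeated parameter updates looks like, the sketch below runs full-batch gradient descent on a one-feature logistic model with binary cross-entropy. The optimizer, the loss, the learning rate, and the toy data are assumptions chosen for brevity; the text does not state which optimizer or loss the model uses.

```python
import math

def bce_loss(w, b, data):
    """Mean binary cross-entropy of a one-feature logistic model p = sigmoid(w*x + b)."""
    total = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)

def gradient_descent(data, lr=0.5, steps=100):
    """Full-batch gradient descent; returns the parameters and the loss after each step."""
    w, b, history = 0.0, 0.0, []
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x / len(data)
            grad_b += (p - y) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
        history.append(bce_loss(w, b, data))
    return w, b, history

# Toy data (assumed): a sentiment score x and a binary label y, not perfectly separable.
data = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 0), (1.0, 1), (2.0, 1)]
w, b, history = gradient_descent(data)
```

Plotting `history` gives the familiar training curve: the loss starts below log 2 (the loss of an untrained model at w = b = 0) and decreases as the parameters are tuned.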
4.7. BiLSTM Layer
This layer processes the data as an ordered sequence, in both the forward and backward directions, which yields a structured representation. At its entry point, the outputs of the max-pooling operations are concatenated, and the layer then models the relationships between inputs and outcomes. This concatenation enhances the model's capacity to capture diverse patterns, promoting a robust understanding of the input data and contributing to effective learning and predictions.
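A bidirectional layer reads the sequence once from left to right and once from right to left, then joins the two hidden states at each position. The sketch below shows only this wiring. To keep it short, a simple decaying-sum recurrence stands in for the LSTM cell; the function names and the toy step function are assumptions.

```python
def run_direction(inputs, step, h0):
    """Apply a recurrent step over inputs and collect the hidden state at each position."""
    h, states = h0, []
    for x in inputs:
        h = step(x, h)
        states.append(h)
    return states

def bidirectional(inputs, step_fwd, step_bwd, h0):
    """Concatenate forward and backward hidden states position by position."""
    fwd = run_direction(inputs, step_fwd, h0)
    bwd = run_direction(inputs[::-1], step_bwd, h0)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

def toy_step(x, h):
    """Stand-in recurrence (not an LSTM): new state = half the old state plus the input."""
    return [0.5 * h[0] + x]

states = bidirectional([1.0, 2.0, 4.0], toy_step, toy_step, [0.0])
```

Each position thus sees both what precedes it and what follows it, which is useful in judgments where the decisive wording often comes after the reasoning it qualifies.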