1. Introduction
In the real world, emergency events[1] such as traffic accidents, fires, forest fires, earthquakes, and public health incidents seriously threaten human life and property, and therefore attract widespread concern from all walks of life. In the Internet era, the rapid spread and fermentation of information about emergency events through online media can breed public opinion events and affect social public safety. Accurate and efficient access to structured information about emergency events can help relevant staff detect and handle emergencies early, curb the generation of public opinion events, and maintain social public safety.
The goal of event extraction is to identify pre-specified types of events and the corresponding event arguments from plain text. A large number of previous studies have focused on sentence-level event extraction[2,3,4], and most of these studies were evaluated on ACE[5]. Approaches based on sentence-level event extraction make predictions within a single sentence and cannot extract events that span sentences. In the real world, the arguments of an event often cannot be fully obtained from a single sentence. As shown in Figure 1, the arguments of the "Accident" event, "重庆永川吊水洞煤矿" and "12月4日17时许", are distributed across two different sentences, s1 and s2, which motivates our study of document-level event extraction (DEE). To extract structured event information from documents, researchers have proposed a large number of models and datasets in previous work, which we review below in terms of DEE datasets and DEE models and methods.
To train and validate DEE methods, researchers have constructed a large number of DEE datasets. For example, the MUC-4[6] dataset consists of 1,700 documents annotated with associated role-filler templates; the Twitter dataset was constructed by collecting and annotating English tweets posted in December 2010, covering 20 event types and 1,000 tweets; the WIKIEVENTS dataset, published by Li et al.[7] as a document-level benchmark, uses English Wikipedia articles as its data source; and Yang et al.[8] conducted experiments on four types of financial events, namely equity freeze, equity pledge, equity repurchase, and equity increase events, tagging a total of 2,976 announcements. Although a large number of DEE datasets have been constructed by domestic and foreign researchers, most of them are English datasets that cannot be used to train and validate Chinese DEE methods, and none of them targets the field of emergency events. Therefore, constructing a Chinese document-level emergency event extraction dataset to fill this gap is a current priority.
In terms of DEE models and methods, a large number of scholars have focused on two challenges: argument scattering and multiple events. Argument scattering refers to the fact that the event arguments of an event are scattered across multiple sentences of a document; for example, in Figure 1, the arguments of the "Rescure" event, "重庆永川吊水洞煤矿" and "30个小时", are distributed across two different sentences, s1 and s5. Multiple events means that a document may contain several different events; for example, Figure 1 contains three different events: "InjureDead", "Rescure", and "Accident".
In previous studies, Yang et al.[8] proposed the DCFEE model, which first extracts trigger words and arguments sentence by sentence and then uses a convolutional neural network to classify each sentence as a key sentence or not. To obtain complete event arguments, an argument-completion strategy retrieves arguments from the sentences surrounding the key sentence. Zheng et al.[9] redesigned DEE as a table-filling task, using a trigger-word-free approach to populate candidate entities into a predefined event table. Specifically, they modeled DEE as a sequential prediction paradigm in which arguments are predicted in a predefined role order and multiple events are extracted in a predefined event order. This method accomplishes the DEE task without trigger words, but because arguments are predicted in a fixed role order, earlier argument predictions cannot take later ones into account, which leads to error propagation. Yang et al.[10] proposed an end-to-end model that, after obtaining an overall document representation with multiple encoders, uses a multi-granularity decoder to extract multiple events and their arguments from the document in parallel. Based on this previous work, we divide the DEE task into three subtasks: candidate entity extraction, event type detection, and argument identification. Candidate entity extraction extracts event-related entities from the text; event type detection determines the types of events present in the text; and argument identification identifies, among the candidate entities, the arguments belonging to each event. As the first subtask of DEE, candidate entity extraction affects the effectiveness of the two subsequent subtasks. Previous work has been devoted to solving the argument scattering and multiple events problems while ignoring the role overlapping problem present in the first subtask, which greatly affects the performance of the two subsequent subtasks and of the overall DEE task. Role overlapping refers to the phenomenon of a candidate entity playing multiple roles in the same event or in multiple different events. For example, in Figure 1, the entity "12人" plays the roles of "InjureDead" and "DeadPerson" in the "InjureDead" event; the entity "重庆永川吊水洞煤矿" plays the "RescurePlace" role in the "Rescure" event and the "HappenPlace" role in the "Accident" event.
To cope with the above problems of missing datasets and role overlapping, we do two things. On the one hand, to address the lack of datasets, we define a framework for emergency event extraction by analyzing and summarizing information about Chinese emergency events, and construct CDEEE, a Chinese document-level emergency event extraction dataset. We define 4 event types and 19 role types in this dataset and annotate all three problems of argument scattering, multiple events, and role overlapping. The final CDEEE dataset consists of 5,000 documents and 10,755 events. On the other hand, to cope with the role overlapping problem, we propose RODEE, a DEE model targeting role overlapping. In this model, we first embed the text with the pre-trained language model RoBERTa[11] and then encode it with a Transformer to obtain the text representation, giving the model an overall understanding of the text. We then design two independent modules to represent the start and end position information of candidate entities, and use multiplicative attention to let the two interact and produce a scoring matrix, so as to predict candidate entities and assist the event extraction task.
Overall, our main contributions are in the following three areas:
We constructed CDEEE, a Chinese document-level emergency event extraction dataset, using manual annotation. During annotation, we annotated the role overlapping problem in addition to the argument scattering problem and the multiple events problem.
We propose RODEE, a DEE model for the role overlapping problem, which first uses two independent matrices to represent the start and end position information of candidate entities, then uses multiplicative attention to obtain a score matrix for predicting candidate entities with overlapping roles, and finally uses the extracted entities to assist the event extraction task.
We compare RODEE with existing DEE models on the CDEEE dataset, and the experimental results show that RODEE outperforms them.
4. Proposed method
The model in this paper consists of three main parts, corresponding to the three subtasks: candidate entity extraction, event type detection, and argument identification. First, the text embedding and text representation are obtained with the pre-trained language model RoBERTa and a Transformer[29] encoder; two independent modules then produce the head and tail position information of candidate entities, which interact through a multiplicative attention mechanism to yield the score matrix for candidate entity prediction. Next, the candidate entity representations are fused with the sentence representations, and the document representation is obtained with a Transformer for event type detection. Finally, a multi-granularity decoder decodes event and role information to identify and predict the arguments. The structure of the proposed model is shown in Figure 7.
The DEE task can be described as extracting one or more structured events from an input document $D=\{s_1,s_2,\dots,s_{N_s}\}$ consisting of $N_s$ sentences, where $k$ is the number of events contained in the document. Each event extracted from the document contains an event type and its associated roles; we denote the set of all event types by $\mathcal{T}$ and the set of all role types by $\mathcal{R}$. The structured event information extracted from the document is denoted by $Y=\{(t_i, R_i, A_i)\}_{i=1}^{k}$, where $t_i \in \mathcal{T}$ denotes the event type, and $R_i$ and $A_i$ denote, respectively, the role information corresponding to each event type and the arguments used to fill those roles.
4.1. Candidate entity extraction
Candidate entity extraction, as the first subtask of DEE, has a large impact on the performance of the two subsequent subtasks, event type detection and argument identification. In previous work, candidate entities are usually treated as flat entities and extracted via sequence labeling. Although sequence labeling achieves good results on flat entity extraction, it ignores the role overlapping problem and cannot accurately extract candidate entities with multiple roles. To solve this problem, we first use two different matrices to represent the head and tail position information of candidate entities, then use multiplicative attention to let the two interact and mine deeper information, obtain a score matrix, and complete candidate entity extraction according to that score matrix.
Specifically, given a document $D$, for each sentence $s_i$ we first obtain an embedded representation $X_i$ using the pre-trained language model RoBERTa, where $L$ is the sentence length. Then, to obtain the textual representation, we encode the embedding with a Transformer encoder, yielding the text representation $H_i$ for each sentence $s_i$ as follows:
$$H_i = \mathrm{Transformer}(\mathrm{RoBERTa}(s_i))$$
where $X_i \in \mathbb{R}^{L \times d}$, $H_i \in \mathbb{R}^{L \times d}$, and $d$ is the hidden layer size.
Finally, in order to accurately extract candidate entities with overlapping roles, we use two FFNs to generate two different matrices, $H^{s}_i$ and $H^{e}_i$, which represent the head and tail position information of candidate entities, i.e., the contextual information of the target characters. By representing and training the head and tail position information with separate matrices, the start and end positions of candidate entities can be identified independently. Since the contexts of the start and end positions of a candidate entity differ, using separate matrices greatly improves task accuracy compared with using the Transformer output directly. On top of this, as shown in Figure 8, we use multiplicative attention to let the start and end position information interact and generate a score matrix $S \in \mathbb{R}^{L \times L \times c}$ for candidate entity prediction, where $c$ is the number of predefined role types plus one (for the non-predefined role type). We obtain the score matrix in the following way:
$$h^{s}_{j} = \mathrm{FFN}_{s}\left(H_{i,\,j-w:j+w}\right), \quad h^{e}_{k} = \mathrm{FFN}_{e}\left(H_{i,\,k-w:k+w}\right), \quad S_{j,k} = h^{s}_{j}\, W\, (h^{e}_{k})^{\top}$$
where $W$ is a learnable weight, $w$ is a hyperparameter indicating the window size used to obtain the contextual embedding of the target character (we set it to 64 here, meaning that the 64 characters before and after the target character serve as its contextual representation, i.e., the start or end position information), and $h^{s}_{j}$, $h^{e}_{k}$ denote the vector representations of the start and end position information of a candidate entity with role type $r$ and entity span $(j, k)$, $1 \le j \le k \le L$.
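To make the multiplicative-attention scoring concrete, the following is a minimal sketch of a biaffine-style score computation between start-position and end-position representations. All shapes, names, and the random toy inputs (`hs`, `he`, `Wb`, `score_matrix`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def score_matrix(h_start, h_end, W):
    """Biaffine-style multiplicative attention between start- and
    end-position representations.

    h_start : (L, d)  start-position representation per character
    h_end   : (L, d)  end-position representation per character
    W       : (c, d, d) one bilinear weight matrix per role class
    returns : (L, L, c) score tensor S, where S[j, k, r] scores the
              span (j, k) as a candidate entity of role class r
    """
    # S[j, k, r] = h_start[j] @ W[r] @ h_end[k]
    return np.einsum("jd,rde,ke->jkr", h_start, W, h_end)

rng = np.random.default_rng(0)
L, d, c = 6, 8, 4          # toy sizes: sentence length, hidden dim, roles + 1
hs = rng.normal(size=(L, d))
he = rng.normal(size=(L, d))
Wb = rng.normal(size=(c, d, d))
S = score_matrix(hs, he, Wb)
```

Because every span gets a full vector of role scores, the same span can later be assigned different roles in different events, which is what the role-overlap setting requires.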
After obtaining the score matrix $S$, we perform candidate entity prediction according to the following equation:
$$\hat{r}_{j,k} = \arg\max_{r \in \{1,\dots,c\}} \mathrm{softmax}(S_{j,k})_{r}$$
where $1 \le j \le k \le L$. Finally, after this transformation, we obtain the prediction results shown in Figure 8 for the start position, end position, and role type of the candidate entities. For the candidate entities extracted from each sentence, we use the triplet $(b, e, r)$, where $b$ is the start position of the candidate entity, $e$ is its end position, and $r$ is its role type. For all candidate entities extracted from the whole document, we use the set $Q = \{q_1, \dots, q_m\}$, denoting each candidate entity by the quadruplet $q = (i, b, e, r)$, where $i$ denotes the sentence index of the candidate entity.
Regarding the loss function for this stage, we use the cross-entropy loss:
$$\mathcal{L}_{1} = -\sum_{j \le k} \sum_{r=1}^{c} y_{j,k,r} \log \mathrm{softmax}(S_{j,k})_{r}$$
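The span-decoding step described above can be sketched as follows. This is a simplified illustration under the assumption that each span's role class is chosen by argmax and that one class index means "no predefined role"; the function name and toy scores are hypothetical.

```python
def decode_entities(S, none_id=0):
    """Decode a score tensor S[j][k] (j <= k, a list of c class scores)
    into candidate-entity triplets (start, end, role).  Because every
    span is classified independently, the same character can appear in
    several triplets with different roles, which is exactly what the
    role-overlap setting needs.
    """
    triplets = []
    L = len(S)
    for j in range(L):
        for k in range(j, L):
            scores = S[j][k]
            best = max(range(len(scores)), key=scores.__getitem__)
            if best != none_id:
                triplets.append((j, k, best))
    return triplets

# toy example: 3 characters, 3 classes (0 = none, 1 and 2 = real roles)
S = [[[0.1, 0.8, 0.1], [0.2, 0.1, 0.9], [0.9, 0.0, 0.1]],
     [[0.0, 0.0, 0.0], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
     [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.9, 0.05, 0.05]]]
triplets = decode_entities(S)
```

Note that spans (0, 0) and (0, 1) overlap on character 0 yet both survive decoding, unlike in a sequence-labeling scheme where each character receives a single tag.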
4.2. Event type detection
Before performing event type detection, our model needs to understand the document as a whole, i.e., obtain document-level contextual encoding information. To obtain this holistic representation, we use a Transformer encoder to let all sentence information interact with the candidate entity information. Specifically, we first apply MaxPooling to the textual representations of the candidate entities and of each sentence so that the two share the same dimension and can interact. Then, sentence information and candidate entity information are embedded in two respects: on the one hand, the sentence position information is fused into the max-pooled sentence representation using a sentence position encoder; on the other hand, an entity type encoder embeds the role type $r$ of each candidate entity into its max-pooled representation, where a candidate entity with multiple role types receives a separate embedding for each role type and the embeddings are fused. Finally, the resulting sentence representations $\tilde{H}$ and candidate entity representations $\tilde{C}$ are fed to the Transformer encoder, and the whole document representation is obtained through their interaction, as in the following equation:
$$[H^{d}; C^{d}] = \mathrm{Transformer}\left([\tilde{H}; \tilde{C}]\right)$$
where $H^{d} \in \mathbb{R}^{N_s \times d}$ is the sentence part of the document representation, $C^{d} \in \mathbb{R}^{m \times d}$ is the candidate entity part, and $m$ is the number of candidate entities extracted from the document.
After obtaining the overall document representation, we use its sentence part $H^{d}$ for event type detection. Specifically, we perform a binary classification for each event type after applying MaxPooling to $H^{d}$, as follows:
$$\hat{u}_{t} = \mathrm{sigmoid}\left(\mathrm{MaxPooling}(H^{d})\, W_{t}\right)$$
where $\hat{u}_{t}$ denotes the probability that the document contains an event of type $t$, and $W_{t}$ is a learnable matrix.
Regarding the loss function for this stage, we use the cross-entropy loss:
$$\mathcal{L}_{2} = -\sum_{t=1}^{T} \left[ u_{t} \log \hat{u}_{t} + (1 - u_{t}) \log (1 - \hat{u}_{t}) \right]$$
where $T$ indicates the number of defined event types.
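The max-pool-then-classify step can be sketched in a few lines. This is an illustrative simplification (no learned position or type embeddings); the function name, toy vectors, and weights are all hypothetical.

```python
import math

def detect_event_types(doc_repr, weights, threshold=0.5):
    """Multi-label event type detection: max-pool the sentence
    representations element-wise, then score each event type
    independently with a sigmoid over a dot product.

    doc_repr : list of sentence vectors (the sentence part of the
               document representation)
    weights  : one weight vector per event type
    returns  : (list of detected type indices, per-type probabilities)
    """
    pooled = [max(col) for col in zip(*doc_repr)]    # MaxPooling over sentences
    probs = []
    for w in weights:                                # one binary decision per type
        logit = sum(p * wi for p, wi in zip(pooled, w))
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return [t for t, p in enumerate(probs) if p > threshold], probs

# toy: 2 sentences with 3-dim representations, 2 event types
types, probs = detect_event_types(
    [[0.5, -1.0, 2.0], [1.5, 0.3, -0.2]],
    [[1.0, 0.0, 0.0], [0.0, 0.0, -2.0]])
```

Because each type is scored independently, a document containing several different events (the multiple-events setting) simply yields several indices above the threshold.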
4.3. Argument identification
In this stage, we need to match and fill the arguments and event roles for the detected events. Following Yang et al.[10], we use a multi-granularity decoder to extract events in parallel. The decoder consists of three parts: an event decoder, a role decoder, and an event-to-role decoder.
The event decoder supports parallel extraction of all events and models the interactions between events. A learnable query matrix $Q_{ev} \in \mathbb{R}^{K \times d}$ is generated for event extraction, where $K$ is a hyperparameter representing the number of events a document may contain. The event query matrix $Q_{ev}$ is then fed into a non-autoregressive decoder composed of multiple identical stacked Transformer layers. Each layer contains a multi-head self-attention mechanism to model the interactions between events and a multi-head cross-attention mechanism to integrate the document-aware representation $H^{d}$ into the event queries:
$$\tilde{Q}_{ev} = \mathrm{Decoder}_{ev}\left(Q_{ev}, H^{d}\right)$$
where $\tilde{Q}_{ev} \in \mathbb{R}^{K \times d}$.
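The cross-attention step inside one decoder layer, in which each event query gathers information from the document representation, can be sketched as follows. This is a single-head, projection-free simplification for illustration; real Transformer layers add learned projections, multiple heads, residual connections, and layer normalization.

```python
import numpy as np

def cross_attention(queries, context):
    """One simplified cross-attention step of the non-autoregressive
    event decoder: each of the K event queries computes scaled
    dot-product attention over the document representation and returns
    a weighted mix of the context vectors.

    queries : (K, d) event query matrix
    context : (S, d) document-aware representation
    returns : (K, d) updated event queries
    """
    d = queries.shape[1]
    scores = queries @ context.T / np.sqrt(d)      # (K, S) similarities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over context rows
    return attn @ context                          # convex mix of context

rng = np.random.default_rng(1)
K, S_len, d = 3, 5, 8
q = rng.normal(size=(K, d))
ctx = rng.normal(size=(S_len, d))
updated = cross_attention(q, ctx)
```

Since the attention weights form a convex combination, every updated query lies inside the range of the context values, i.e., it is literally a summary of the document representation.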
The role decoder is designed similarly, supporting parallel filling of all roles of an event and modeling the interactions between roles. A learnable query matrix $Q_{role} \in \mathbb{R}^{N \times d}$ is generated, where $N$ is the number of roles of the corresponding event. The role query matrix $Q_{role}$ is then fed into a decoder with the same architecture as the event decoder: the self-attention mechanism models relationships between roles, and the cross-attention mechanism integrates the candidate entity representations $C^{d}$ from the document representation:
$$\tilde{Q}_{role} = \mathrm{Decoder}_{role}\left(Q_{role}, C^{d}\right)$$
where $\tilde{Q}_{role} \in \mathbb{R}^{N \times d}$.
In order to generate different events and their associated roles, we design an event-to-role decoder to model the interaction between event information and role information:
$$\tilde{Q} = \mathrm{Decoder}_{ev2role}\left(\tilde{Q}_{role}, \tilde{Q}_{ev}\right)$$
where $\tilde{Q} \in \mathbb{R}^{K \times N \times d}$.
Finally, after decoding with the multi-granularity decoder, we transform the $K$ event queries and $N$ role queries into $K$ predicted events and their corresponding $N$ predicted roles. To filter out false events, we assess whether each predicted event is non-empty. Specifically, predicted events are obtained through the following approach:
$$\hat{p}_{i} = \mathrm{softmax}\left(\tilde{q}_{i}\, W_{p}\right)$$
where $W_{p}$ is a learnable matrix.
Afterwards, for each predicted event with predefined roles, we decode the predicted arguments by filling candidate entity indices or a null value with an $(m+1)$-class classifier:
$$\hat{a}_{i,j} = \mathrm{softmax}\left(\tanh\left(\tilde{q}_{i,j} W_{q} + C^{d} W_{c}\right) v\right)$$
where $W_{q}$, $W_{c}$, and $v$ are learnable matrices, and $\hat{a}_{i,j} \in \mathbb{R}^{m+1}$.
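The role-filling decision, choosing for each role slot either one of the m candidate entities or the null class, reduces to an argmax over an (m+1)-way score vector. The sketch below illustrates this; the role names and score values are hypothetical examples, not actual model outputs.

```python
def fill_roles(role_scores, entities):
    """For each role slot of a predicted event, pick either one of the
    m candidate entities or the null class (index m) by argmax over an
    (m+1)-way score vector."""
    filled = {}
    m = len(entities)
    for role, scores in role_scores.items():
        assert len(scores) == m + 1          # m entity classes + 1 null class
        best = max(range(m + 1), key=scores.__getitem__)
        filled[role] = entities[best] if best < m else None
    return filled

# m = 3 candidate entities extracted from the document
entities = ["重庆永川吊水洞煤矿", "12月4日17时许", "12人"]
role_scores = {                              # hypothetical classifier outputs
    "HappenPlace": [2.1, 0.3, 0.1, 0.5],     # argmax -> entity 0
    "HappenTime":  [0.2, 1.9, 0.1, 0.4],     # argmax -> entity 1
    "OtherRole":   [0.1, 0.2, 0.3, 1.5],     # argmax -> null class
}
result = fill_roles(role_scores, entities)
```

Because the same candidate entity index can win in several role slots (of the same event or of different events), this formulation naturally accommodates role overlapping.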
So far, we have obtained the predicted events and, for each role of each event, the corresponding candidate entities. This completes event extraction and the identification, matching, and filling of the corresponding arguments.
Regarding the loss function for this part, we first use the assignment problem from operations research[30,31] to find the optimal assignment between the predicted events $\hat{y}$ and the ground-truth events $y$:
$$\pi^{*} = \arg\min_{\pi \in \Pi(K)} \sum_{i=1}^{K} \mathcal{C}\left(y_{i}, \hat{y}_{\pi(i)}\right)$$
where $\Pi(K)$ is the permutation space of length $K$, and $\mathcal{C}(y_{i}, \hat{y}_{\pi(i)})$ is the pairwise matching cost between the ground truth $y_{i}$ and the prediction $\hat{y}_{\pi(i)}$ with index $\pi(i)$. Considering all predicted roles of an event, we define $\mathcal{C}$ as follows:
$$\mathcal{C}\left(y_{i}, \hat{y}_{\pi(i)}\right) = -\mathbb{1}_{\{t_{i} \neq \varnothing\}} \sum_{j=1}^{N} \hat{a}_{\pi(i), j}\left(a_{i,j}\right)$$
where $t_{i} \neq \varnothing$ indicates that the event is not empty. The optimal assignment $\pi^{*}$ can be computed efficiently with the Hungarian algorithm[30]. Then, based on the optimal assignment, we define a negative log-likelihood loss:
$$\mathcal{L}_{3} = -\sum_{i=1}^{K} \left[ \log \hat{p}_{\pi^{*}(i)}\left(t_{i}\right) + \sum_{j=1}^{N} \log \hat{a}_{\pi^{*}(i), j}\left(a_{i,j}\right) \right]$$
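To illustrate what the optimal assignment computes, here is a brute-force sketch that finds the cost-minimizing permutation by exhaustive search. The Hungarian algorithm produces the same answer in polynomial time; exhaustive search is shown only because it is transparent and fine for the small K used per document. The cost matrix below is a made-up toy example.

```python
from itertools import permutations

def optimal_assignment(cost):
    """Find the permutation pi minimizing sum_i cost[i][pi(i)].

    cost[i][j] : matching cost between ground-truth event i and
                 predicted event j
    returns    : (best permutation as a tuple, its total cost)
    """
    K = len(cost)
    best_pi, best_cost = None, float("inf")
    for pi in permutations(range(K)):            # all K! assignments
        c = sum(cost[i][pi[i]] for i in range(K))
        if c < best_cost:
            best_pi, best_cost = pi, c
    return best_pi, best_cost

# toy 3x3 cost matrix between ground-truth and predicted events
cost = [[4.0, 1.0, 3.0],
        [2.0, 0.0, 5.0],
        [3.0, 2.0, 2.0]]
pi, c = optimal_assignment(cost)
```

In practice one would call an off-the-shelf Hungarian solver (e.g., `scipy.optimize.linear_sum_assignment`) instead of enumerating permutations.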
Finally, our overall loss function combines the candidate entity extraction loss $\mathcal{L}_{1}$, the event type detection loss $\mathcal{L}_{2}$, and the event argument recognition loss $\mathcal{L}_{3}$, which covers filling the entity-role pairs:
$$\mathcal{L} = \lambda_{1} \mathcal{L}_{1} + \lambda_{2} \mathcal{L}_{2} + \lambda_{3} \mathcal{L}_{3}$$
where $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$ are hyperparameters.
5. Experiments
5.1. Experimental Setting
We use our annotated Chinese document-level emergency event extraction dataset CDEEE as experimental data. The dataset contains 5,000 documents covering four event types: "InjureDead", "Rescure", "Accident", and "NaturalHazard".
Regarding evaluation metrics, this paper adopts the criteria used by the Doc2EDAG model. Specifically, for each golden event in a document, we find, without replacement, the predicted event with the same event type and the highest number of correct roles and arguments, and compute precision (P), recall (R), and F1 score as the model's prediction results. Since event types usually include multiple roles, the role-level Micro-F1 value is taken as the final metric. The experimental settings and hyperparameters are described in detail in Appendix A.
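The role-level Micro-F1 described above pools true-positive, false-positive, and false-negative role counts across all event types before computing precision, recall, and F1. A minimal sketch, with made-up per-type counts purely for illustration:

```python
def micro_prf(tp, fp, fn):
    """Micro-averaged precision / recall / F1 from pooled role-level
    counts (true positives, false positives, false negatives)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# hypothetical per-event-type (tp, fp, fn) role tallies
per_type = {"InjureDead": (30, 10, 5), "Accident": (50, 10, 15)}
tp = sum(v[0] for v in per_type.values())   # pool counts across types
fp = sum(v[1] for v in per_type.values())
fn = sum(v[2] for v in per_type.values())
p, r, f1 = micro_prf(tp, fp, fn)
```

Pooling before averaging means that event types with many role instances weigh more heavily than rare ones, unlike a macro average over per-type F1 scores.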
5.2. Comparison experiment and result analysis
Given that our dataset follows a trigger-word-free annotation scheme, we use the following models as baselines for comparison experiments and as quality validation models for the dataset:
Doc2EDAG: An end-to-end model that converts DEE into a table-filling task, directly filling event tables via entity-based path expansion.
GreedyDec: A baseline of the Doc2EDAG model that fills the event table with a greedy strategy.
DE-PPN: A model that uses multi-granularity decoders to extract events in parallel, improving extraction speed while effectively addressing the challenges of multiple events and argument scattering at the document level.
Based on the experimental setup in Section 5.1, on the one hand we manually extract event information and compare it with the baseline models' results on the CDEEE dataset; on the other hand, we train our proposed model RODEE and compare its results with the baseline experiments under the same experimental conditions.
Table 2 and Table 3 show the per-event-type and overall results obtained by each model on the CDEEE dataset. We observe that the scores achieved by humans on CDEEE are much higher than those of existing DEE models, which indicates both the high quality of our annotated dataset and the considerable room for improvement remaining in the DEE task.
Existing DEE models use sequence labeling to complete candidate entity prediction in the candidate entity extraction stage and embed the role types of candidate entities to assist the DEE task, whereas our CDEEE dataset is annotated for the candidate entity role overlapping problem. Therefore, we modify the baselines by removing their role type embedding modules and name the modified baselines Doc2EDAG*, GreedyDec*, and DE-PPN*.
As can be seen from Table 2 and Table 3, on the CDEEE dataset annotated with the role overlapping problem, the models that embed a single role type to assist the DEE task perform worse overall than their counterparts without role type embedding. We therefore believe that embedding incomplete role type information not only fails to help the overall DEE task but may even harm its performance to some extent.
We can also observe from Table 2 and Table 3 that the overall performance of our proposed RODEE model is better than that of the existing DEE models: its precision P improves by 7.8 percentage points over the best-performing baseline, DE-PPN*, and its F1 value improves by 3.9 points over the best baseline in terms of F1, Doc2EDAG*. In addition, the recall R and F1 values for the "InjureDead" event are lower than those of Doc2EDAG*, which we attribute to the fact that "InjureDead" has only four role types and a low role overlapping rate, so our model, designed for the role overlapping problem, performs slightly worse than Doc2EDAG* on this class of events.
To further analyze the performance of RODEE, we also conducted experiments on the candidate entity extraction subtask; the results are shown in Figure 9. RODEE outperforms the other models not only on the full DEE task but also on the candidate entity extraction subtask, improving F1 by at least 11 percentage points over the other models. We attribute this to the deeper textual information our model captures, which improves candidate entity extraction. This demonstrates that candidate entity extraction, as the first subtask of DEE, has a significant impact on the overall task, and that improving it has a positive effect on DEE as a whole.
5.3. Ablation experiment
To verify the effectiveness of our model improvements, we conducted ablation experiments on several modules. First, we ablate the fusion of candidate entity role information features. Since removing the role information features from each baseline model yielded better results than the source models, we must verify whether our own fusion of role information features has a positive or negative impact on the overall task. Second, since we use the pre-trained language model RoBERTa, we must verify whether the overall performance improvement is due entirely to the pre-trained language model. Pre-trained language models are known to improve performance substantially across natural language processing tasks, and we adopt RoBERTa because solving the candidate entity role overlapping problem requires capturing more fine-grained textual information. For these reasons, we also performed ablation experiments on this component; however, instead of removing the pre-trained language model, which would conflict with our need for finer-grained information, we added a pre-trained language model to the baseline DE-PPN.
In Table 4, we report results for -RoleType, our model with the candidate entity role information features removed, and +BERT, the baseline DE-PPN with the same kind of pre-trained language model added. Our full model remains advantageous compared with both. It still achieves a 1.1-point F1 improvement over the model with the role information features removed, demonstrating that incorporating correct candidate role information boosts the DEE task, whereas incomplete or incorrect role features drag down the overall performance of document-level event extraction, consistent with the experimental results in Section 5.2. Comparing our model with the baseline augmented with a pre-trained language model, our model still shows a 3.2-point F1 improvement. We conclude that although our model leverages the power of a pre-trained language model, our proposed improvements for the candidate entity role overlapping problem still contribute significantly to DEE performance.