1. Introduction
Most natural language processing (NLP) tasks suffer performance degradation when encountering long, complex sentences, including abstract meaning representation (AMR) parsing [1], semantic dependency parsing [2], constituency parsing [3], semantic role labeling [4], machine translation [5], and text summarization [6]. An intuitive way to address this issue is to first decompose complex sentences and then re-link the resulting simple ones, an idea shared with tasks such as RST-style discourse parsing (hereafter RST parsing) [7], split-and-rephrase (SPRP) [8], text simplification (TS) [9], and simple-sentence-decomposition (SSD) [10].
However, previous works following this intuition are not practical for semantic parsing tasks such as AMR parsing [11] and semantic dependency parsing [12]. RST parsing, which aims to extract the rhetorical relations among elementary discourse units (EDUs) [13] at the document level, is still an open problem: the state-of-the-art model achieves only 55.4 and 80.4 Parseval-Full scores for multi- and intra-sentential parsing, respectively. Besides, the blurry definitions of EDUs and the misalignments between rhetorical relations and semantic relations make RST parsing unsuitable for semantic parsing. The SPRP task keeps a coarse splitting granularity, so its outputs may still be complex sentences. The TS and SSD tasks, which decompose complex sentences into simple ones, cannot preserve the original semantics, because they rephrase sentences into simpler syntax (TS) or drop discourse connectives (SSD).
In this paper, we propose a novel task, hierarchical clause annotation (HCA), based on the linguistic research of clause hierarchy [14], where clauses are fundamental text units centering on a verb phrase and sentences with multiple clauses form a complex hierarchy. HCA is a more lightweight task at the sentence level: it has explicit definitions of clauses and appropriate mappings between inter-clause and semantic relations (vs. RST parsing), and it aims to annotate complex sentences into a clause hierarchy (vs. SPRP) without changing or dropping any semantics (vs. TS and SSD).
To show the potential of HCA to facilitate semantic parsing of complex sentences, we demonstrate the HCA tree, AMR graph, and semantic dependency graph (SDG) of a complex sentence from the AMR 2.0 dataset in Figure 1. The sentence is segmented into five clauses: two of them are coordinate and contrastive; two are a conditional adverbial clause and a relative clause of their matrix clauses, respectively; and one is a resultative adverbial clause of its matrix clause. In the HCA tree, the coordinate relation and the clauses are nodes, and subordinate relations are directed edges. As demonstrated in Figure 1, the HCA tree shares the same hierarchy with the two semantic parsing representations, indicating the possibility of incorporating HCA's structural information into semantic parsing of complex sentences.
Our main contributions are as follows:
We propose a novel framework, hierarchical clause annotation (HCA), to segment complex sentences into clauses and capture their interrelations based on the linguistic research of clause hierarchy, aiming to provide clause-level structural features to facilitate semantic parsing tasks.
We elaborate on our experience developing a large HCA corpus, including determining an annotation framework, creating a silver-to-gold manual annotation tool, and ensuring annotation quality. The resulting HCA corpus contains English sentences from AMR 2.0, each including at least two clauses.
We decompose HCA into two subtasks, i.e., clause segmentation and parsing, and adapt discourse segmentation and parsing models for the HCA subtasks. Experimental results show that the adapted models achieve satisfactory performances in providing reliable silver HCA data.
3. Hierarchical Clause Annotation Framework
To present a framework for annotating the clause hierarchy of complex sentences, we reference and modify Payne's version [14] of Clause Hierarchy, owing to its clear and comprehensive definitions of clause-combination cases. As demonstrated in Figure 2, we do not consider compound verbs as a clause combination, as these cases are uncommon and produce one-verb clauses after annotation.
With this version of Clause Hierarchy, we synthesize the HCA framework and build a dataset under its guidance. The annotation work consists of a preprocessing stage that produces silver annotations transformed from existing schemas (constituency parsing and syntactic dependency parsing) and a manual proofreading phase that produces gold annotations on an elaborate browser-based annotation tool.
3.1. Concepts
We list the major concepts of the HCA framework below.
3.1.1. Sentence and Clause
Sentences, typically starting with a capitalized word and ending with a full stop, are the principal units of written grammar and the annotation inputs in HCA. A sentence must consist of at least one clause.
Clauses, considered core units of grammar, center on a verb phrase that largely determines what else must or may occur [25]. Clauses can be categorized by the type of their inner verb:
Finite: clauses that contain tensed verbs;
Non-finite: clauses that contain only non-tensed verbs such as ing-participles, ed-participles, and to-infinitives.
In the HCA framework, all finite clauses are annotated, and non-finite clauses set off by a comma are also segmented.
3.1.2. Clause Combination
The main ways in which clauses combine to form sentences are joining clauses of equal syntactic status (coordination) and placing one clause in a dependent relation to another (subordination).
- (1)
Coordination and Coordinator
Coordination is an interrelation between clauses that share the same syntactic status and are typically connected by a coordinator such as and, or, or but. In addition, coordinators can be correlative structures (e.g., either...or... and not only...but also...) or simply replaced by a comma.
- (2)
Subordination, Subordinator, and Antecedent
Subordination involves a subordinate clause and a matrix clause that is superordinate to it. Subordinate clauses can be categorized as follows:
Nominative: Function as clausal arguments or noun phrases in the matrix clause and can be subdivided into Subjective, Objective, Predicative and Appositive.
Relative: Define or describe a preceding noun head in the matrix clause.
Adverbial: Function as a Condition, Concession, Reason, and such for the matrix clause.
Subordinators are the words that introduce a subordinate clause and indicate a semantic relation between the subordinate clause and its matrix clause, including subordinate conjunctions, relative pronouns, and relative adverbs. Simple subordinators contain a single word, e.g., that, wh-words, if, etc., while complex ones consist of more than one word, e.g., as if, so that, even though, etc.
Antecedents are the nouns or pronouns that a relative clause modifies and the nouns that an appositive clause explains.
To better explain these HCA definitions, we demonstrate in Table 4 some example sentences that are segmented into multiple clauses.
3.2. HCA Representation
As illustrated in
Figure 3, we model the two basic hierarchical schemas with concepts defined in
Section 3.1, characterizing inter-clause relations with the same
nucleus-satellite pattern in RST. To be specific, coordination is a multinuclear relation that involves two or more clauses (denoted as
nucleus node
) dominated by the coordination node
, while subordination is a mononuclear relation (denoted as a directed edge
) pointing from the matrix clause
(
nucleus) to its subordinate clause
(
satellite).
When a sentence consists of multiple clauses, its HCA representation is a tree, where each node is a clause or an inter-clause coordination and each directed edge is an inter-clause subordination. A three-layer HCA tree of a complex sentence involving five clauses and four interrelations is shown in Figure 1a.
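To make this representation concrete, the sketch below models an HCA tree with clause and coordination nodes and labeled subordination edges; the class names and relation strings are illustrative assumptions for exposition, not the released corpus format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HCANode:
    """A node in an HCA tree: either a clause or an inter-clause coordination."""
    kind: str                        # "clause" or "coordination"
    label: str                       # clause text, or the coordination type (e.g., "But")
    conjuncts: List["HCANode"] = field(default_factory=list)     # nucleus children of a coordination
    subordinates: List["HCAEdge"] = field(default_factory=list)  # outgoing mononuclear edges


@dataclass
class HCAEdge:
    """A directed subordination edge from a matrix (nucleus) clause to a satellite clause."""
    relation: str                    # e.g., "Adverbial: Conditional", "Relative"
    satellite: Optional[HCANode] = None


# A two-clause fragment of the example in Figure 1:
matrix = HCANode(kind="clause", label="I get very anxious,")
condition = HCANode(kind="clause", label="If I do not check,")
matrix.subordinates.append(HCAEdge(relation="Adverbial: Conditional", satellite=condition))
```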
3.3. HCA Corpus
With the annotation framework discussed above, we aim to build an HCA corpus for further research on the possibilities of applying clausal structure features to semantic parsing tasks. We choose the AMR 2.0 dataset as our corpus base, whose sentences are collected from the DARPA BOLT and DEFT programs, various newswire data, and weblog data.
The annotation work is conducted in two phases. First, two existing syntactic representations, i.e., constituency and syntactic dependency parse trees, are employed to produce silver HCA annotations with transformation rules. Second, human annotators with prior English grammar research experience and extensive hands-on annotation training review and modify the silver annotations on a browser-based annotation tool.
3.3.1. Silver Data from Existing Schemas
Previous researchers [26,27,28,29] utilized constituency-based and syntactic dependency parse trees to extract clauses from sentences with manual rules. Following these works, we employ Stanza [30] as our constituency parser and syntactic dependency parser to obtain silver HCA data.
The constituency parse tree (CPT) represents the syntactic structure of a sentence as a tree, where the nodes are sub-phrases belonging to specific categories of the grammar and the edges are unlabeled. The transformation from a CPT to silver HCA data consists of three phases (a code sketch follows the list):
Traverse the non-leaf nodes in the CPT and find the clause-type nodes: S, SBAR, SBARQ, SINV, and SQ.
Identify the tokens dominated by a clause-type node as a clause.
When a clause-type node dominates another one, an inter-clause relation between them is determined without an exact relation type.
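A minimal sketch of this traversal is given below, assuming the constituency parse is available as an nltk.Tree (our pipeline uses Stanza's parser); the SBAR/S merging heuristic discussed in the next paragraph is omitted here for brevity, so nested clause-type nodes each yield a clause.

```python
from nltk import Tree

# Clause-type constituent labels (Penn Treebank style), as listed above.
CLAUSE_LABELS = {"S", "SBAR", "SBARQ", "SINV", "SQ"}


def extract_silver_hca(node, parent_clause=None, clauses=None, relations=None):
    """Collect clause spans and (matrix, subordinate) clause pairs from a constituency tree."""
    if clauses is None:
        clauses, relations = [], []
    if isinstance(node, Tree):
        current = parent_clause
        if node.label() in CLAUSE_LABELS:
            clause = " ".join(node.leaves())      # tokens dominated by the clause-type node
            clauses.append(clause)
            if parent_clause is not None:         # nested clause-type nodes -> unlabeled relation
                relations.append((parent_clause, clause))
            current = clause
        for child in node:
            extract_silver_hca(child, current, clauses, relations)
    return clauses, relations


# Toy parse of the first two clauses of the example in Section 1.
tree = Tree.fromstring(
    "(ROOT (S (SBAR (IN If) (S (NP (PRP I)) (VP (VBP do) (RB not) (VP (VB check)))))"
    " (, ,) (NP (PRP I)) (VP (VBP get) (ADJP (RB very) (JJ anxious)))))"
)
clauses, relations = extract_silver_hca(tree)
```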
As demonstrated in Figure 4, the first two clauses of the sentence exemplified in Section 1 are identified through their constituency parse tree. The SBAR node and its child node S are combined into a single clause, as no VP is dominated by SBAR's other child constituent IN. Moreover, the top S node dominates the SBAR node, indicating a subordination between the two clauses in dashed boxes.
The syntactic dependency parse tree (SDPT) consists of a set of directed syntactic relations between the words of a sentence, whose root is either a non-copular verb or the subject complement of a copular verb. The transformation from an SDPT to silver HCA data consists of three phases:
Use a mapping from dependency relations to clause constituents: the subject (S) and its governor, i.e., a non-copular verb (V), via relations such as nsubj; objects (O) and complements (C) among V's dependents via relations such as dobj, iobj, xcomp, and ccomp; adverbials (A) among V's dependents via relations such as advmod, advcl, and prep_in.
When a verb is detected in the sentence, a corresponding clause, consisting of the verb and its dependent constituents, can be identified.
If a clause governs another clause via a dependency relation, the interrelation between them can be determined by the relation label (see the sketch after this list):
Coordination: conj:and, conj:or, conj:but
Subjective: nsubj
Objective: dobj, iobj
Predicative: xcomp, ccomp
Appositive: appos
Relative: ccomp, acl:relcl, rcmod
Adverbial: advcl
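This label-to-relation mapping can be expressed as a small lookup table, as in the illustrative sketch below; it is not the exact rule set used to build the corpus, and ccomp, which appears under both Predicative and Relative above, is left for manual disambiguation.

```python
# Mapping from the dependency label between two clausal heads to a silver inter-clause
# relation, following the list above. "ccomp" is ambiguous (Predicative or Relative)
# and is resolved during manual proofreading.
DEP_TO_RELATION = {
    "conj:and": "Coordination",
    "conj:or": "Coordination",
    "conj:but": "Coordination",
    "nsubj": "Subjective",
    "dobj": "Objective",
    "iobj": "Objective",
    "xcomp": "Predicative",
    "ccomp": "Predicative|Relative",
    "appos": "Appositive",
    "acl:relcl": "Relative",
    "rcmod": "Relative",
    "advcl": "Adverbial",
}


def silver_relation(dep_label: str) -> str:
    """Return the silver inter-clause relation for a clause-to-clause dependency label."""
    return DEP_TO_RELATION.get(dep_label, "Unlabeled")


# e.g., the advcl edge in the Section 1 example yields "Adverbial"
assert silver_relation("advcl") == "Adverbial"
```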
As demonstrated in Figure 5, the first two clauses of the sentence exemplified in Section 1 are identified through their syntactic dependency parse tree. Moreover, the inter-clause relation can be inferred as Adverbial: Conditional from the dependency relation advcl and the subordinator “If”.
3.3.2. Gold Data from Manual Annotator
As discussed above, the syntactic structures of the CPT and SDPT can be transformed into clauses and inter-clause relations. However, these silver annotations still cannot fulfill the need to build an HCA corpus, for the following reasons:
Specific inter-clause relations cannot be obtained from the two syntactic structures: a CPT only indicates the existence of a relation without a label, and the dependency relations in an SDPT have multiple mappings (e.g., ccomp to Predicative or Relative) or no mapping (e.g., advcl to an exact Adverbial subtype such as Conditional).
The pre-set transformation rules identify clauses beyond the HCA definitions. For example, extracted non-finite clauses (e.g., to-infinitives) embedded in their matrix clauses are too short and lead to hierarchical redundancies in the HCA tree.
The performance of the two syntactic parsers degrades on long and complex sentences, which are the core concern of our HCA corpus.
Therefore, we recruit a group of human annotators with prior English grammar research experience to proofread these silver HCA annotations on a browser-based tool, ClausAnn, created for the annotation work. The Java Web application ClausAnn provides convenient operations and efficient keyboard shortcuts for annotators, and we open-source it on our GitHub repository. A typical annotation trial on ClausAnn consists of the following steps:
Review annotations from the CPT, the SDPT, or other annotators by switching the name tags in Figure 6a.
Choose an existing annotation to proofread, or start from the original sentence.
Segment a text span into two by double-clicking the blank space at a split point, and select the relation between the two spans, as in Figure 6b.
- (a) If the two spans are coordinated, select a specific coordination and label the coordinators that indicate the interrelation, as in Figure 6c.
- (b) If the two spans are subordinated, designate the superordinate one, select a specific subordination, and label the subordinators that indicate the interrelation, as in Figure 6d.
Remerge two text spans into one by clicking the two spans successively.
Repeat the segmentation and remerging steps until all text spans are segmented into a set of clauses that form a correct HCA tree.
3.3.3. Quality Assurance
Two measures are taken jointly to ensure the quality of the final HCA corpus: multi-round annotation and consistency measurement.
The annotation work is arranged in three rounds covering 5%, 5%, and 90% of the total sentences and is conducted with a progressive and negotiable strategy. Before the first round, every annotator thoroughly understands the HCA framework and can use the ClausAnn tool proficiently after adequate hands-on training. During the first two rounds, the lead annotator, who majors in English grammar, organizes discussions on complex or abnormal cases with the other annotators.
For consistency measurement, we track inter-annotator agreement (IAA) after each round of the annotation work. As discussed in Section 2.1, the HCA and RST parsing tasks both aim to extract a clause/EDU hierarchy from texts. Thus, the evaluation metrics of the discourse segmentation [15] and discourse parsing [16] subtasks in RST parsing are adopted as IAA metrics for evaluating the annotation quality of the HCA corpus:
P/R/F1 on clauses: precision, recall, and F1 score on the segmented clauses, where a positive match means that the segmented clauses from the two annotators share the same start and end boundaries (a minimal sketch of this computation follows the list).
RST-Parseval [31] on interrelations: Span, Nuclearity, Relation, and Full scores are used to evaluate unlabeled, nuclearity-labeled, relation-labeled, and fully labeled interrelations, respectively, between the matched clauses of two annotators.
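For illustration, a minimal sketch of the clause-level P/R/F1 computation is given below (not any released evaluation script): clause spans are compared as (start, end) token offsets, and one annotator's spans are treated as the reference.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets of a segmented clause


def clause_prf(pred: List[Span], ref: List[Span]) -> Tuple[float, float, float]:
    """Precision/recall/F1 on clause spans; a match requires identical start and end boundaries."""
    matched = len(set(pred) & set(ref))
    p = matched / len(pred) if pred else 0.0
    r = matched / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Two annotators agree on two of three clause spans -> P = R = F1 = 2/3
print(clause_prf([(0, 5), (5, 12), (12, 20)], [(0, 5), (5, 12), (12, 21)]))
```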
In the first two rounds of annotation, a total of 10% of the sentences are double-annotated, and the ratio rises to 16% in the last round, higher than the 13.8% in the RST-DT corpus [7]. According to the statistics, the IAA measured by the above two metrics grows as the annotation rounds proceed, indicating that multi-round annotation and consistency measurement play a significant role in ensuring annotation quality.
As shown in Table 5, the final IAA achieves high consistency: F1 scores on clauses range from 98.4 to 100, and RST-Parseval scores on interrelations range from 97.3, 97.0, 93.6, and 93.4 to 98.1, 97.8, 94.2, and 94.1 in Span, Nuclearity, Relation, and Full, respectively.
Compared with the RST-DT corpus, whose IAA scores on EDUs range from 95.1 to 100 and whose IAA scores on rhetorical relations range from 77.8, 69.5, and 59.7 to 92.9, 88.2, and 79.2 in Span, Nuclearity, and Relation, our HCA corpus reaches better consistency, as the HCA framework has stricter definitions of the elementary unit (i.e., the clause) and fewer interrelation types.
3.3.4. Dataset Detail
The resulting HCA-AMR2.0 dataset is based on AMR 2.0. In total, 49.4% of its sentences are paired with an HCA tree, while the rest are simple sentences with only one clause. The train, dev, and test split follows the original split of AMR 2.0. Detailed statistics are listed in Table 6.
7. Conclusions
In this paper, we propose a novel framework, hierarchical clause annotation (HCA), to segment complex sentences into clauses and capture inter-clause relations with strict definitions drawn from linguistic research on clause hierarchy. We aim to explore the potential of integrating HCA structural features into semantic parsing of complex sentences, avoiding the deficiencies of previous approaches such as RST parsing, SPRP, TS, and SSD. Following the HCA framework, we build a large HCA corpus comprising English sentences from AMR 2.0. The annotation consists of silver data transformed from constituency and syntactic dependency parse trees and gold data produced by experienced human annotators using a newly created tool, ClausAnn. Moreover, we decompose HCA into two subtasks, i.e., clause segmentation and clause parsing, and provide effective baseline models for both subtasks to generate more HCA data.
Figure 1.
Clauses in (a) correspond to subgraphs in (b) and (c), respectively. Colored directed edges in (a) are inter-clause relations, mapping the same colored AMR nodes and edges in (b) and semantic dependencies in (c). Note that reentrant AMR relations in (b) introduced by the pronoun “I” are omitted to save space, as well as semantic dependencies between orphan tokens and the root token “If” in (c).
Figure 2.
Modified version of Payne’s Clause Hierarchy.
Figure 3.
Two basic hierarchical schemas in HCA, where the two node types and the edge type represent a clause, a coordination, and a subordination, respectively.
Figure 4.
Extract clauses and inter-clause relations via the constituency parse tree. The two clauses in dashed boxes are identified by the underlined clause-type node S and its child node SBAR. Note that the child constituent nodes of the left VP and the right ADJP are omitted to save space.
Figure 5.
Extract clauses and inter-clause relations via the syntactic dependency parse tree. Two clauses in dashed boxes are identified by the underlined verb and the governed constituents. The inter-clause relation Adverbial can be determined by the dependency advcl between the two clauses.
Figure 6.
Operating steps of an annotation trial in the browser-based tool, ClausAnn
Figure 7.
Bottom-Up Clause Parsing.
Figure 8.
Abstract Meaning Representation (AMR) Graph predicted by the state-of-the-art AMR parser. Red dotted relation edges, which are missed by the parser, can be recovered by transformation rules derived from the HCA tree. Red solid relation edges, which are mistakenly predicted by the parser, can be deleted by transformation rules derived from the HCA tree.
Figure 9.
Semantic dependency graph (SDG) predicted by the state-of-the-art semantic dependency parser, DynGL-SDP. Dotted red dependency edges, which are missed by the parser, can be recovered by transformation rules derived from the HCA tree.
Table 1.
Comparison between our HCA task and the RST parsing task. Two exemplified sentences are from the RST Discourse Tagging Reference Manual. Units (i.e., clauses or EDUs) are segmented by square brackets. Relations between units are represented as arrows directed from a matrix clause or a nucleus EDU to a subordinate clause or a satellite EDU with a specific relation.
| Task | Output Description | Output Example: Unit | Output Example: Relation |
| --- | --- | --- | --- |
| Hierarchical Clause Annotation | Clause trees built up by clauses and inter-clause relations | (1) [But some big brokerage firms said] [they don’t expect major problems as a result of margin calls.] (2) [Despite their considerable incomes and assets, one-fourth don’t feel] [that they have made it.] | |
| RST Parsing | Discourse trees built up by EDUs and rhetorical relations | (1) [But some big brokerage firms said] [they don’t expect major problems as a result of margin calls.] (2) [Despite their considerable incomes and assets,] [one-fourth don’t feel that they have made it.] | |
Table 2.
Comparison between our HCA task and similar tasks that decompose complex sentences into parts. The input sentence “If I do not check, I get very anxious, which does sort of go away after 15-30 mins, but the anxiety is so much that I can not wait that long.” is selected from the AMR 2.0 dataset and exemplified in Section 1. Underlined words in the Output Example column of each task are modified from the original sentence, while crossed words are deleted from the original sentence.
| Task | Output Description | Output Example |
| --- | --- | --- |
| Hierarchical Clause Annotation | Finite clauses and non-finite clauses separated by a comma | (1) If I do not check, (2) I get very anxious, (3) which does sort of go away after 15-30 mins, (4) but often the anxiety is so much (5) that I can not wait that long. |
| Clause Identification | Finite clauses, non-tensed verb phrases, coordinators, and subordinators | (1) If (2) I do not check, (3) I get very anxious, (4) which (5) does sort of go away after 15-30 mins, (6) but (7) often the anxiety is so much (8) that (9) I can not wait that long. |
| Split-and-Rephrase | Shorter sentences | (1) If I do not check, I get very anxious. (2) The anxieties does sort of go away after 15-30 mins. (3) But often the anxiety is so much that I can not wait that long. |
| Text Simplification | Sentences with simpler syntax | (1) If I do not check. (2) I get very anxious (3) The anxiety lasts for 15-30 mins. (4) But I am often too anxious to wait that long. |
| Simple-Sentence-Decomposition | Simple sentences with only one clause | (1) If I do not check. (2) I get very anxious, (3) The anxiety does sort of go away after 15-30 mins. (4) But Often the anxiety is so much. (5) that I can not wait that long. |
Table 3.
Three main types of clause hierarchy and the clines of their clause integration tightness degree.
| Type | Cline of Clause Integration Tightness Degree |
| --- | --- |
| Matthiessen [21] | Embedded > Hypotaxis > Parataxis > Cohesive devices > Coherence |
| Hopper & Traugott [22] | Subordination > Hypotaxis > Parataxis |
| Payne [14] | Compound verb > Clausal argument > Relative > Adverbial > Coordinate > Sentence |
Table 4.
Examples of sentences with different types of inter-clause relations. Clauses are segmented by square brackets and clause marks. The underlined, double-underlined, and wave-underlined words are coordinators, subordinators, and antecedents, respectively.
Table 5.
Inter-annotator agreement (IAA) of the 16% double-annotated sentences in the HCA corpus by ten annotators marked as 1 to 10. Note that bold and underlined figures indicate the highest and lowest consistencies in the corresponding metrics, respectively.
| Annotator | Clause P | Clause R | Clause F1 | Interrel. Span | Interrel. Nuc. | Interrel. Rel. | Interrel. Full |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1, 2 | 99.9 | 99.8 | 99.8 | 97.9 | 97.5 | 94.3 | 94.1 |
| 1, 3 | 100 | 100 | 100 | 98.1 | 97.8 | 94.2 | 94.1 |
| 2, 4 | 99.8 | 99.5 | 99.6 | 97.5 | 97.3 | 93.9 | 93.8 |
| 1, 5 | 99.0 | 98.3 | 98.6 | 98.0 | 97.7 | 94.2 | 94.0 |
| 6, 7 | 99.6 | 99.1 | 99.3 | 97.3 | 97.0 | 93.8 | 93.6 |
| 4, 8 | 99.4 | 99.3 | 99.3 | 97.9 | 93.9 | 93.7 | 93.5 |
| 1, 9 | 99.0 | 98.6 | 98.8 | 97.2 | 97.0 | 93.9 | 93.8 |
| 5, 10 | 98.8 | 98.0 | 98.4 | 97.3 | 97.1 | 93.6 | 93.4 |
Table 6.
Main statistics of the Hierarchical Clause Annotation dataset based on AMR 2.0 (HCA-AMR2.0). MulSnt* means that some input sequences contain multiple sentences, and a coordination is necessary for these inter-sentence relations. ★ indicates that the Adverbial relation can be divided into nine sub-types.
| Item | Occurrence | Relation | Occurrence |
| --- | --- | --- | --- |
| # Sentences (S) | | MulSnt* | |
| # S with HCA | | And | |
| # S in Train Set | | Or | 974 |
| # S in Dev Set | 740 | But | |
| # S in Test Set | 751 | Subjective | 992 |
| # Tokens (T) | 521,805 | Objective | |
| # Clauses (C) | 57,330 | Predicative | |
| # Avg. T/S | 26.9 | Appositive | 667 |
| # Avg. T/C | 9.1 | Relative | |
| # Avg. C/S | 3.1 | Adverbial★ | |
Table 7.
Main statistics of HCA-AMR2.0, GUM, STAC, and RST-DT dataset. Note that Unit in the header represents clause or elementary discourse unit (EDU), Rel. represents inter-clause/EDU relation, #Avg. U/S means the average number of units per sentence, and #Avg. R/U means the average number of inter-clause/EDU relations per unit.
| Dataset | #Unit/#Sentence (Train) | #Unit/#Sentence (Dev) | #Unit/#Sentence (Test) | #Avg. U/S | #Rel. | #Rel. Type | #Avg. Rel./U |
| --- | --- | --- | --- | --- | --- | --- | --- |
| HCA-AMR2.0 | 52,758/17,885 | 2,222/740 | 2,350/751 | 3.1 | 49,160 | 18 | 0.86 |
| GUM | 14,766/6,346 | 2,219/976 | 2,283/999 | 2.3 | - | - | - |
| STAC | 9,887/8,754 | 1,154/991 | 1,547/1,342 | 1.1 | - | - | - |
| RST-DT | 17,646/6,671 | 1,797/716 | 2,346/928 | 2.6 | 19,778 | 18 | 0.91 |
Table 8.
Hardware and software used in our experiments.
| Category | Environment | Value |
| --- | --- | --- |
| Hardware | CPU | Intel i9-10900K @ 3.7 GHz (10-core) |
| Hardware | GPU | NVIDIA RTX 3090Ti (24G) |
| Hardware | Memory | 64 GB |
| Software | Python | 3.8.16 |
| Software | Pytorch | 1.12.1 |
| Software | Anaconda | 4.10.1 |
| Software | CUDA | 11.3 |
| Software | IDE | PyCharm 2022.2.3 |
Table 9.
Final hyper-parameter configurations of the clause segmentation and clause parsing models.
| Layer | Hyper-Parameter | Value |
| --- | --- | --- |
| Clause Segmentation Model | | |
| Character Embedding (Bi-LSTM) | layer | 1 |
| | hidden_size | 64 |
| | dropout | 0.2 |
| Word Embedding | fastText | 300 |
| | Electra | 1024 (large) |
| Feature Embedding | POS/Lemma/DP | 100 |
| Bi-LSTM | layer | 1 |
| | hidden_size | 512 |
| | dropout | 0.1 |
| Trainer | optimizer | AdamW |
| | learning rate | 5e-4, 1e-4 |
| | # epochs | 60 |
| | patience of early stopping | 10 |
| | validation criteria | +span_f1 |
| Clause Parsing Model | | |
| Word Embedding | pretrained language model | 768/1024 (base/large) |
| FFN | hidden_size | 512 |
| | dropout | 0.2 |
| Trainer | optimizer | AdamW |
| | learning rate | 2e-4, 1e-5 |
| | weight decay | 0.01 |
| | batch size (# spans/actions) | 5 |
| | # epochs | 20 |
| | patience of early stopping | 5 |
| | gradient clipping | 1.0 |
| | validation criteria | RST-Parseval-Full |
Table 10.
Performances of the adapted DisCoDisCo model on HCA-AMR2.0 for clause segmentation, and performances of DisCoDisCo and GumDrop on three datasets for the contrastive task, discourse segmentation. Note that * and ∘ indicate gold annotated features from the corresponding dataset and silver features annotated by Stanza, respectively. Bold numbers are the best scores in each dataset. All the experiments on the clause segmentation task are conducted for five runs with different seeds, and the results are averaged.
| Task | Dataset | Model | P | R | F1 |
| --- | --- | --- | --- | --- | --- |
| Discourse Segmentation | GUM | GumDrop [37] | 96.5 | 90.8 | 93.5 |
| | | - all feats.* | 97.7 | 87.4 | 92.3 |
| | | DisCoDisCo [15] | 93.9 | 94.4 | 94.2 |
| | | - all feats.* | 92.7 | 92.6 | 92.6 |
| | STAC | GumDrop [37] | 95.3 | 95.4 | 95.3 |
| | | - all feats.* | 85.0 | 76.7 | 80.6 |
| | | DisCoDisCo [15] | 96.3 | 93.6 | 94.9 |
| | | - all feats.* | 91.8 | 92.1 | 91.9 |
| | RST-DT | GumDrop [37] | 94.9 | 96.5 | 95.7 |
| | | - all feats.* | 96.3 | 94.6 | 95.4 |
| | | DisCoDisCo [15] | 96.4 | 96.9 | 96.6 |
| | | - all feats.* | 96.8 | 95.9 | 96.4 |
| Clause Segmentation | HCA-AMR2.0 | DisCoDisCo | 92.9 | 89.7 | 91.3 |
| | | - lem.* | 86.8 | 93.9 | 90.2 |
| | | - dp.∘ | 91.0 | 87.7 | 89.3 |
| | | - pos∘ | 91.4 | 87.6 | 89.4 |
| | | - all feats.∘ | 89.2 | 85.4 | 87.2 |
| | | - fastText | 90.5 | 82.7 | 86.4 |
Table 11.
Performances of the top-down and bottom-up parsers with various pretrained language models (PLMs) for clause parsing and discourse parsing tasks, which are evaluated by four RST-Parseval metrics, i.e., Span, Nuclearity (Nuc.), Relation (Rel.), and Full. Standard deviations for three runs are shown in parentheses. Bold numbers are the best scores in each task with each model. Note that we only conduct experiments on HCA-AMR2.0 and RST-DT datasets for clause parsing and discourse parsing, respectively.
| Model | PLM | Discourse Span | Discourse Nuc. | Discourse Rel. | Discourse Full | Clause Span | Clause Nuc. | Clause Rel. | Clause Full |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Top-Down | BERT | 92.6±0.53 | 85.7±0.41 | 75.4±0.45 | 74.7±0.54 | 96.0±0.20 | 92.1±0.27 | 85.8±0.51 | 85.7±0.47 |
| | RoBERTa | 94.1±0.46 | 88.4±0.46 | 79.6±0.17 | 78.7±0.11 | 96.5±0.02 | 93.1±0.09 | 87.1±0.23 | 87.1±0.22 |
| | SpanBERT | 94.1±0.15 | 88.8±0.19 | 79.4±0.49 | 78.5±0.39 | 96.4±0.13 | 92.6±0.12 | 86.1±0.15 | 85.9±0.15 |
| | XLNet | 94.8±0.39 | 89.5±0.39 | 80.5±0.59 | 79.5±0.53 | 96.6±0.07 | 93.5±0.20 | 87.0±0.57 | 86.9±0.56 |
| | DeBERTa | 94.2±0.33 | 89.0±0.16 | 80.1±0.43 | 79.1±0.32 | 96.6±0.02 | 93.3±0.07 | 87.2±0.04 | 87.2±0.10 |
| Bottom-Up | BERT | 91.9±0.34 | 84.4±0.31 | 74.4±0.37 | 73.8±0.30 | 96.3±0.22 | 92.3±0.36 | 86.1±0.37 | 86.0±0.32 |
| | RoBERTa | 94.4±0.12 | 89.0±0.34 | 80.4±0.47 | 79.7±0.51 | 96.4±0.16 | 92.8±0.20 | 86.6±0.56 | 86.6±0.58 |
| | SpanBERT | 93.9±0.24 | 88.2±0.19 | 79.3±0.37 | 78.4±0.29 | 96.5±0.09 | 92.6±0.06 | 86.4±0.20 | 86.2±0.22 |
| | XLNet | 94.7±0.31 | 89.4±0.24 | 81.2±0.27 | 80.4±0.34 | 96.9±0.10 | 93.6±0.09 | 87.4±0.33 | 87.3±0.31 |
| | DeBERTa | 94.6±0.38 | 89.8±0.65 | 81.0±0.64 | 80.2±0.70 | 97.0±0.10 | 94.0±0.17 | 87.8±0.39 | 87.7±0.38 |
Table 12.
Clause Parsing Results with large versions of pretrained language models (PLMs), XLNet, and DeBERTa (RST-Parseval). † indicates PLMs with a large version. Standard deviations for three runs are shown in parentheses. Bold numbers are the best scores in each model.
| Model | PLM | Span | Nuc. | Rel. | Full |
| --- | --- | --- | --- | --- | --- |
| Top-Down | XLNet | 96.6±0.07 | 93.5±0.20 | 87.0±0.57 | 86.9±0.56 |
| | XLNet† | 96.7±0.27 | 93.6±0.33 | 87.6±0.36 | 87.6±0.37 |
| | DeBERTa | 96.6±0.02 | 93.3±0.07 | 87.2±0.04 | 87.2±0.10 |
| | DeBERTa† | 97.0±0.14 | 94.0±0.29 | 87.6±0.69 | 87.6±0.61 |
| Bottom-Up | XLNet | 96.9±0.10 | 93.6±0.09 | 87.4±0.33 | 87.3±0.31 |
| | XLNet† | 97.0±0.34 | 93.7±0.46 | 87.6±0.67 | 87.6±0.67 |
| | DeBERTa | 97.0±0.10 | 94.0±0.17 | 87.8±0.39 | 87.7±0.38 |
| | DeBERTa† | 97.4±0.08 | 94.5±0.13 | 88.6±0.27 | 88.5±0.28 |