Event Narrative Module, version 3 Deliverable D5.1.3

Transcription

Event Narrative Module, version 3
Deliverable D5.1.3
Version FINAL
Authors: Piek Vossen1 , Tommaso Caselli1 , Agata Cybulska1 , Antske Fokkens1 ,
Filip Ilievski1 , Anne-Lyse Minard2 , Paramita Mirza2 , Itziar Aldabe3 ,
Egoitz Laparra3 , German Rigau3
Affiliation: (1) VUA, (2) FBK, (3) UPV/EHU
Building structured event indexes of large volumes of financial and economic
data for decision making
ICT 316404
2/148
Grant Agreement No.
Project Acronym
Project Full Title
316404
NEWSREADER
Building structured event indexes of
large volumes of financial and economic
data for decision making.
Funding Scheme
FP7-ICT-2011-8
Project Website
http://www.newsreader-project.eu/
Prof. dr. Piek T.J.M. Vossen
VU University Amsterdam
Tel. + 31 (0) 20 5986466
Project Coordinator
Fax. + 31 (0) 20 5986500
Email: [email protected]
Document Number
Deliverable D5.1.3
Status & Version
FINAL
Contractual Date of Delivery
October 2015
Actual Date of Delivery
January 2016
Type
Report
Security (distribution level)
Public
Number of Pages
148
WP Contributing to the Deliverable
WP05
WP Responsible
VUA
EC Project Officer
Susan Fraser
1
1
Authors: Piek Vossen , Tommaso Caselli , Agata Cybulska1 , Antske Fokkens1 ,
Filip Ilievski1 , Anne-Lyse Minard2 , Paramita Mirza2 , Itziar Aldabe3 , Egoitz
Laparra3 , German Rigau3
Keywords: Event detection, event-coreference, event components, NAF, RDF,
SEM, GAF, event relations, timelines, storylines, attribution and perspective, crosslingual event extraction
Abstract: This deliverable describes the final version of the modules that convert the Natural Language
Processing output in NAF to the Semantic Web interpretation in SEM-RDF. We describe the way we represent
instances in SEM-RDF, relations between these instances and the GAF pointers to the mentions in the text. Since
instances can have many different mentions in different sources, we define identity criteria for mentions. The resulting
SEM-RDF representations are loaded into the KnowledgeStore as triples that can be queried through SPARQL.
In addition to the data on individual events, we also extract causal and temporal relations between events. Time
anchoring of events and the relations are used to create timelines of events. Timelines are then used to derive
storylines. In addition to the event data, we also derive the perspective of the sources of information with respect
to the data. This is represented in a separate RDF structure that models provenance and attribution. Finally, we
describe how the system extract semantic data across different languages.
NewsReader: ICT-316404
February 1, 2016
3/148
Table of Revisions
Version
Date
Description and reason
By
1.1
1 October 2015
First structure
1.2
5 October 2015
Streaming architecture
1.3
11 October 2015
Event relations
1.4
13 October 2015
Perspectives
1.5
13 October 2015
Storylines
1.6
21 October 2015
RDF evaluation and clean up
1.7
21 October 2015
Review
1.8
5 October 2015
Streaming architecture revision
1.9
21 October 2015
Revision all
2.0
January 2015
Event coreference
2.1
January 2015
Event coreference
2.2
January 2016
Cross-lingual extraction
2.2
January 2016
Reviewed by
2.3
January 2016
Revised after review
2.4
29 January 2016
Check by coordinator
Piek
Vossen,
VUA
Filip
Ilievski,
VUA
Anne-Lyse Minard, FBK
Antske
Fokkens, VUA
Tommaso
Caselli, VUA
Piek
Vossen,
VUA
Marieke
van
Erp, VUA
Filip
Ilievski,
VUA
Piek
Vossen,
VUA
Piek
Vossen,
VUA
Agata Cybulska, VUA
Piek
Vossen,
VUA
Egoitz Laparra,
EHU
Piek
Vossen,
VUA
VUA
Affected
tions
all
2
4
6
5
2
all
2
all
3
3
7
all
all
-
February 1, 2016
sec-
4/148
February 1, 2016
5/148
Executive Summary
This deliverable describes the final version of the modules that convert the Natural Language Processing output in NAF to the Semantic Web interpretation in SEM-RDF. We
describe the way we represent instances in SEM-RDF, relations between these instances
and the GAF pointers to the mentions in the text. Since instances can have many different
mentions in different sources, we define identity criteria for mentions. The resulting SEMRDF representations are loaded into the KnowledgeStore as triples that can be queried
through SPARQL. Two different approaches have been defined for processing NAF files:
1) assuming an empty KnowledgeStore and a batch of NAF files, all NAF files are processed and compared after which the KnowledgeStore is populated with the RDF and 2)
assuming a streaming set-up in which each NAF file is processed one-by-one and the result
is compared with the data that is already in the KnowledgeStore. The data is event-centric,
which means we provide only the data relevant for the events detected. Every event is assumed to be anchored to a date or period in time and have at least one participant. Next
to the data on individual events, we also extract causal and temporal relations between
events. Time anchoring of events and the relations are used to create timelines of events.
Timelines are then used to derive storylines. In addition to the event data, we also derive
the perspective of the sources of information with respect to the data. This is represented in
a separate RDF structure that models provenance and attribution. Finally, since the RDF
representation is agnostic for the expression in language and our NLP modules for English,
Spanish, Dutch and Italian are interoperable through NAF, the NAF2SEM module can
extract the same RDF representation across different languages. This provides a unique
opportunity for comparing the semantic processing of text across languages. We describe
the cross-lingual processing and the comparison across the language-system output.
In comparison with the previous deliverable D5.1.2 almost all sections changed. Minor
changes were made for sections 4, 5 and 6. Other sections were changed drastically.
February 1, 2016
6/148
February 1, 2016
7/148
Contents
Table of Revisions
3
1 Introduction
13
2 Interpreting NAF-XML as SEM-RDF
2.1 Extracting instances from NAF layers . . . .
2.1.1 Entities and non-entities . . . . . . .
2.1.2 Events . . . . . . . . . . . . . . . . .
2.1.3 Participant and event relations . . .
2.1.4 Temporal anchoring . . . . . . . . . .
2.2 Identity across events . . . . . . . . . . . . .
2.2.1 Event comparison in batch mode . .
2.2.2 Event comparison in streaming mode
2.3 Evaluation . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
16
19
19
23
25
28
32
35
39
44
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
47
47
47
48
49
51
51
52
52
54
54
57
58
61
67
77
.
.
.
.
.
.
.
.
79
79
79
80
83
83
84
87
87
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3 Event Coreference
3.1 Bag of Events Approach . . . . . . . . . . . . . . . . . . . . . . . .
3.1.1 The Overall Approach . . . . . . . . . . . . . . . . . . . . .
3.1.2 Two-step Bag of Events Approach . . . . . . . . . . . . . . .
3.1.3 Step 1: Clustering Documents Using Bag of Events Features
3.1.4 Step 2: Clustering Sentence Templates . . . . . . . . . . . .
3.1.5 One-step Bag of Events Approach . . . . . . . . . . . . . . .
3.1.6 Corpus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1.7 Experimental Set Up . . . . . . . . . . . . . . . . . . . . . .
3.1.8 Baseline . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1.9 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1.10 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2 Evaluation of the NewsReader pipeline . . . . . . . . . . . . . . . .
3.2.1 NewsReader output . . . . . . . . . . . . . . . . . . . . . . .
3.2.2 Maximizing the event detection . . . . . . . . . . . . . . . .
3.2.3 Conclusion NewsReader cross-document event coreference .
4 Event Relations
4.1 Temporal Relations . . . . . . . . . .
4.1.1 Annotation Schema . . . . . .
4.1.2 Temporal Relation Extraction
4.2 Causal Relation . . . . . . . . . . . .
4.2.1 Annotation Scheme . . . . . .
4.2.2 Causal Relation Extraction .
4.3 Predicate Time Anchors . . . . . . .
4.3.1 Annotation Scheme . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
February 1, 2016
4.3.2
8/148
Predicate Time Anchor Relation Extraction . . . . . . . . . . . . .
5 From TimeLines to StoryLines
5.1 TimeLine extraction . . . . . . . . . . . . . . . . . . . . . . .
5.1.1 TimeLines: task description . . . . . . . . . . . . . . .
5.1.2 System Description and Evaluation . . . . . . . . . . .
5.1.3 Document level time-anchoring for TimeLine extraction
5.2 Storylines . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.1 StoryLines aggregated from entity-centered TimeLines
5.2.2 Storylines aggregated from climax events . . . . . . . .
5.2.3 Workshop on Computing News Storylines . . . . . . .
6 Perspectives
6.1 Basic Perspective Module . .
6.2 Factuality module . . . . . . .
6.2.1 Event factuality . . . .
6.2.2 Identifying factualities
6.2.3 Factuality module . . .
6.2.4 Future work . . . . . .
6.3 A perspective model . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
88
.
.
.
.
.
.
.
.
89
. 90
. 90
. 92
. 93
. 97
. 98
. 101
. 107
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
108
108
113
113
114
116
117
118
7 Cross-lingual extraction
7.1 Crosslingual extraction of entities .
7.2 Crosslingual extraction of events . .
7.3 Crosslingual extraction of relations
7.4 Conclusions . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
121
127
130
134
137
.
.
.
.
.
.
.
.
.
.
.
.
.
.
8 Conclusions
138
9 Appendix
139
February 1, 2016
9/148
List of Tables
1
Cross-document event coreference arguments for stream processing . . . . .
41
2
Quality triple evaluation of SEM-RDF extracted from Wikinews. . . . . . .
45
3
Detailed quality triple evaluation of SEM-RDF extracted from Wikinews
with and without taking even-coreference into account. . . . . . . . . . . .
46
4
Sentence template ECB topic 1, text 7, sentence 1 . . . . . . . . . . . . . .
49
5
Sentence template ECB topic 1, text 7, sentence 2 . . . . . . . . . . . . . .
49
6
Document template ECB topic 1, text 7, sentences 1-2 . . . . . . . . . . .
49
7
ECB+ statistics
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
50
8
Features grouped into four categories: L-Lemma based, A-Action similarity,
D-location within Discourse, E-Entity coreference and S-Synset based. . . .
53
9
Bag of events approach to event coreference resolution, evaluated on the
ECB+ in MUC, B3, mention-based CEAF, BLANC and CoNLL F measures. 55
10
Baseline results on the ECB+: singleton baseline and lemma match of event
triggers evaluated in MUC, B3, mention-based CEAF, BLANC and CoNLL
F. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
55
11
Best scoring two-step bag of events approach, evaluated in MUC, B3, entitybased CEAF, BLANC and CoNLL F in comparison with related studies.
Note that the BOE approach uses gold while related studies system mentions. 56
12
BLANC refererence results macro averaged over ECB+ topics in terms of
recall (R), precision (P) and F1 (F) for NewsReader output with different
proportions of WordNet synsets to match: S=only synset matches, SL=
Synsets matches if synsets and lemma matches if no synsets associated,
L=lemmas only. Different columns represent proportions in steps of 10%
from 1% to 100%. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
63
recall (R-SL30), precision (P-SL30) and F1 (F-SL30). AR is stable across
the results, meaning that a single participant in any role needs to match.
We varied the hypernyms (H) and lowest-common-subsumer (L) for action
matches and the time constraints: no time constraint (NT), year (Y), month
(M) and day (D) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
66
recall (R-SL30), precision (P-SL30) and F1 (F-SL30). The hypernyms (H),
lowest-common-subsumer (L) and time constraint month (M) are kept stable. We varied the role-participant constraints: NR=no constraint, A0 role
participant should match, A1 should match, A2 should match, A0 and A1
should match, A0 and A2 should match, A1 and A2 should match . . . . .
66
13
14
February 1, 2016
15
16
17
18
19
20
21
22
23
24
10/148
Macro averaged Mention identification for ECB+ topics. NWR=NewsReader
pipeline v3.0 without adaptation, EDg(old)=NWR augmented with EventDetection trained with gold data, EDg(old)EC= same as EDg(old) but skipping predicates with an Event class, EDs(ilver)= NWR augmented with
EventDetection trained with silver data, EDs(ilver)EC= same as EDs but
skipping predicates with an Event class. . . . . . . . . . . . . . . . . . . .
Predicates missed more than once by NewsReader extended with EventDetection (silver) and Event class filter as events in ECB+ . . . . . . . . . .
Predicates missed once by NewsReader extended with EventDetection (silver) and Event class filter as events in ECB+ . . . . . . . . . . . . . . . .
Predicates invented and occurring more than once by NewsReader extended
with EventDetection (silver) and Event class filter as events in ECB+ . . .
Predicates invented and occurring only once by NewsReader extended with
EventDetection (silver) and Event class filter as events in ECB+ . . . . . .
Reference results macro averaged over ECB+ topics with different options
for event detection. kNWR=NewsReader event detection without invented
mentions, maximizing precision, NWR=NewsReader pipeline v3.0 without
adaptation, EDg=NWR augmented with EventDetection trained with gold
data, EDgEC= same as EDg but skipping predicates with an Event class,
EDs= NWR augmented with EventDetection trained with silver data, EDsEC= same as EDs but skipping predicates with an Event class. ARM=
standard setting one participant in any role (AR), time month match and
action concept and phrase match 30%, mR= maximizing recall by no constraints on participant match and time, action concept and phrase match
1%, mP=maximizing precision by participant roles A0A1, time day match
and action concept and phrase match is set to 100%. . . . . . . . . . . . .
Reference results macro averaged over ECB+ corpus with different options
for event detection. kNWR=NewsReader event detection without invented
mentions, maximizing precision, NWR=NewsReader pipeline v3.0 without
adaptation, EDg=NWR augmented with EventDetection trained with gold
data, EDgEC= same as EDg but skipping predicates with an Event class,
EDs= NWR augmented with EventDetection trained with silver data, EDsEC= same as EDs but skipping predicates with an Event class. ARM=
standard setting one participant in any role (AR), time month match and
action concept and phrase match 30%, mR= maximizing recall by no constraints on participant match and time, action concept and phrase match
1%, mP=maximizing precision by participant roles A0A1, time day match
and action concept and phrase match is set to 100%. . . . . . . . . . . . .
Reference results macro averaged over ECB+ corpus as reported by Yang
et al. (2015) for state-of-the-art machine learning systems . . . . . . . . . .
Distribution of tell, kill and election over all text and annotated text per
mention, document and topic in ECB+ . . . . . . . . . . . . . . . . . . . .
Temporal relations in TimeML annotation . . . . . . . . . . . . . . . . . .
67
69
70
71
72
73
75
75
76
79
February 1, 2016
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
11/148
Tempeval-3 evaluation on temporal relation classification . . . . . . . . . . 82
CLINK extraction system’s performance. . . . . . . . . . . . . . . . . . . . 87
System Results (micro F1 score) for the SemEval 2015 Task 4 Task A - Main 92
System Results (micro F1 score) for the SemEval 2015 Task 4 Task A Subtask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Results on the SemEval-2015 task . . . . . . . . . . . . . . . . . . . . . . . 97
Figures of the StoryLine gold dataset. . . . . . . . . . . . . . . . . . . . . . 99
Results of the StoryLine extraction process. . . . . . . . . . . . . . . . . . 101
Certainty, polarity and tense values . . . . . . . . . . . . . . . . . . . . . . 114
DBpedia entities extracted for English, Spanish, Italian and Dutch Wikinews
with proportion of coverage, measured as macro and micro coverage. I=instances,
M=mentions, O=overlap, maC=macro-average over all document results,
miC=microAverage over all mentions . . . . . . . . . . . . . . . . . . . . . 130
DBpedia entities in the Wikinews Airbus corpus most frequent in English
with Spanish, Italian and Dutch frequencies . . . . . . . . . . . . . . . . . 132
DBpedia entities in the Wikinews Apple corpus most frequent in English
DBpedia entities in the Wikinews GM, Chrysler, Ford corpus most frequent
in English with Spanish, Italian and Dutch frequencies . . . . . . . . . . . 133
DBpedia entities in the Wikinews stock market corpus most frequent in
English with Spanish, Italian and Dutch frequencies . . . . . . . . . . . . . 133
ILI-based events extracted for English, Spanish, Italian and Dutch Wikinews
ILI-based events in the Wikinews Airbus corpus most frequent in English
ILI-based events in the Wikinews Apple corpus most frequent in English
ILI-based events in the Wikinews GM, Chrysler, Ford corpus most frequent
in English with Spanish, Italian and Dutch frequencies . . . . . . . . . . . 135
ILI-based events in the Wikinews stock market corpus most frequent in
English with Spanish and Dutch frequencies . . . . . . . . . . . . . . . . . 135
Triple predicates that are most frequent in the English Wikinews corpus
with coverage in Spanish, Italian and Dutch . . . . . . . . . . . . . . . . . 136
ILI-based Triples extracted for English, Spanish, Italian and Dutch Wikinews
FrameNet frames for contextualEvents . . . . . . . . . . . . . . . . . . . . 139
FrameNet from for sourceEvents . . . . . . . . . . . . . . . . . . . . . . . . 140
FrameNet frames for grammaticalEvents . . . . . . . . . . . . . . . . . . . 140
February 1, 2016
12/148
February 1, 2016
1
13/148
Introduction
The goal of the NewsReader project1 is to automatically process massive streams of daily
news in 4 different languages to reconstruct longer term story lines of events. For this purpose, we extract events mentioned in news articles, the place and date of their occurrence
and who is involved, using a pipeline of Natural Language Processing modules (for the
details see Agerri et al. (2015)) that process each news article and store the interpretation
in the form of the Natural Language Processing Annotation Format (NAF (Fokkens et al.,
2014)). Representations in NAF are mention-based. Mentions of an event, entity, place or
time are not unique. The same instance (event, entity, place or date) can be mentioned
several times in a single text or in different texts and, likewise NAF represents each mention separately. Consider the following short fragments of two news articles published on
the same day:
http://www.telegraph.co.uk/finance/newsbysector/industry/engineering/10125280/Porsche-family-buys-back10pc-stake-from-Qatar.html
17 Jun 2013 Porsche family buys back 10pc stake from Qatar
Descendants of the German car pioneer Ferdinand Porsche have bought back a 10pc stake in the company
that bears the family name from Qatar Holding, the investment arm of the Gulf State’s sovereign wealth
fund.
————————————————————————————–
http://english.alarabiya.net/en/business/banking-and-finance/2013/06/17/Qatar-Holding-sells-10-stake-in-Porscheto-family-shareholders.html
Monday, 17 June 2013 Qatar Holding sells 10% stake in Porsche to founding families
Qatar Holding, the investment arm of the Gulf state’s sovereign wealth fund, has sold its 10 percent stake in
Porsche SE to the luxury carmaker’s family shareholders, four years after it first invested in the firm.
Both fragments describe the same event but do this very differently. The first fragment
talks about a buy event of 10pc stake in the Porsche company from Qatar holding by
the Porsche family. The second fragments frames this as a sell event of the same stake
by Qatar Holding to Porsche. Both articles make reference to the Porsche company, the
Porsche family and Qatar Holding in various ways e.g. Descendants of the German car
pioneer Ferdinand Porsche and the investment arm of the Gulf state’s sovereign wealth
fund. However, there is no difference in the content across the texts. The NewsReader
pipeline for processing text deals with this by detecting DBpedia URIs for each entity and
resolving coreference relations in the text to connect other expressions to these entities. The
same is done for events mentioned in the text: buys back and bought back are represented
through a unique URI as a single event, and so are sells and solds in the second fragment.
Obviously, these events are not in DBpedia and therefore unique blank URIs are created.
In NewsReader, we ultimately generate representations for unique instances represented
from these URIs in RDF across all these mentions and the relations between them as RDFtriples according to the Simple Event Model (SEM, van Hage et al. (2011)). Within SEM,
1
FP7-ICT-316404 Building structured event indexes of large volumes of financial and economic data for
decision making, www.newsreader-project.eu/
February 1, 2016
1
2
3
4
5
6
7
: e v e n t #23
a
rdfs : label
f n : Buyer
fn : S e l l e r
f n : Goods
sem : hasAtTime
14/148
sem : Event , f n : C o m m e r c e s e l l , f n : Commerce buy ;
” buy ” , ” s e l l ” ;
dbp : r e s o u r c e / P o r s c h e ;
dbp : r e s o u r c e / Q a t a r I n v e s t m e n t A u t h o r i t y ;
: non− e n t i t i e s /10 p c s t a k e ;
:20150120.
Figure 1: SEM event instance
1
2
3
4
5
6
7
8
9
: e v e n t #23
g a f : denotedBy
<h t t p : / /www. t e l e g r a p h . c o . uk#c h a r =15,19> ,
<h t t p : / / e n g l i s h . a l a r a b i y a . n e t#c h a r =13 ,19 >.
dbp : r e s o u r c e / P o r s c h e
g a f : denotedBy <h t t p : / /www. t e l e g r a p h . c o . uk#c h a r =0 ,7 >.
: non− e n t i t i e s /10 p c s t a k e
g a f : denotedBy <h t t p : / /www. t e l e g r a p h . c o . uk#c h a r =25 ,35 >.
dbp : r e s o u r c e / Q a t a r I n v e s t m e n t A u t h o r i t y
g a f : denotedBy <h t t p : / / e n g l i s h . a l a r a b i y a . n e t#c h a r =0 ,13 >.
Figure 2: SEM event instance with mentions
data objects are defined for events, actors, places and time with relations between them.
Sources that contain the same events, entities and relations thus should result in the same
structure in SEM. A SEM representation for the above two fragments would look as in
Figure 1.
For the event, we created the so-called blank URI event#23 which is an instance of the
ontological types sem:Event, fn:Commerce sell, fn:Commerce buy. The two latter types
come from FrameNet Baker et al. (1998). Furthermore, we see the labels buy and sell that
are aggregated from the two sources. Furthermore, there are triples that relate the event to
the entities in the text through FrameNet relations, while the event is anchored to a data
through a sem:hasAtTime relation. From both text, only a single event representation is
derived with the same set of triples.
Since we do not want to lose the connection to the textual mentions of the information,
we developed the Grounded Annotation Framework (GAF, Fokkens et al. (2013)), which
formally distinguishes between mentions of events and entities in NAF and instances of
events and entities in SEM, connected through denotedBy links between representations.
For each object in SEM, we therefore provide the pointers to the character offsets in the
text where the information is mentioned as additional triples, as is shown in Figure 2.
In this deliverable, we describe the modules that read NLP interpretations from NAF
files and create the RDF representations according to SEM and GAF. The RDF representations are eventually stored into the KnowledgeStore (Corcoglioniti et al. (2013)). The
following Figure 3 shows the position of modules in the overall process.
In this deliverable, we describe the final implementation of the modules that interpret
NAF data as SEM/GAF. The document is structured as follows. In section 2, we describe
the implementation of the NAF2SEM module that carries out the conversion. There
are two implementations: one for batch processing and one for stream processing. The
latter implementation is part of the streaming end-to-end architecture through which a
text file is completely processed through the NLP pipeline, interpreted as SEM-RDF and
integrated into the KnowledgeStore without intermediate storage. In section 3, we report
February 1, 2016
15/148
Figure 3: Input-output schema for Work Packages in NewsReader
on the progress made on event coreference, which is the basis for establishing instance
representation, detailing the current implementation of the NAF2SEM module. In section
4, we describe two modules for detecting temporal and causal relations. These relations are
used in section 5.2, where we report on the modules that extract timelines and storylines.
Section 6 discusses the implementation of the perspective and attribution module. Events
are divided into contextual events and source events. The former describe the statements
about the changes in the world, while the latter events indicate the relations between
sources and these statements. We developed a separate module that derives the perspective
and attribution values from these two event types, incorporating the output of the opinion
layer and the attribution layer in NAF. Finally in section 7, we provide our results on crosslingual event extraction (i.e. combining the event information extracted from documents
in different languages). The English Wikinews corpus was translated to Dutch, Italian and
Spanish. By processing the translations with the Spanish, Italian and Dutch pipelines,
we were able to create SEM representation from each translated data set. These SEM
representations should entail the same instance information. We report on the results of
this comparison. Conclusions are given in section 8.
February 1, 2016
2
16/148
Interpreting NAF-XML as SEM-RDF
Where NAF contains the semantic interpretations of mentions in text, SEM represents
instances of entities, events and relations linked to these mentions through GAF. Following
GAF, we create SEM instances from NAF layers by combining different mentions of the
same event or entity into a unique URI representation with gaf:denotedBy links to their
mentions. Identity across mentions is based on various measures, among which identical
DBPedia URIs, overlap with spans of tokens or terms in coreference sets, normalisation of
dates, and similarity in WordNet. Next we show the output for processing just the titles
in the two Qatar-Porsche examples as separate NAF sources. The RDF-TRiG has 3 parts:
nwr:instances a named graph of all the instances detected: events, participants and time
instances or intervals. See Figure 4.
event relations a set of named-graphs, each representing a relation with an event instance. See Figure 5.
nwr:provenance a named graph with gaf:denotedBy relations between the named graphs
of the event relations and the offset positions in the sources . See Figure 6.
The TRiG example shows a graph that includes all the instances. We see here 4
types of instances: events, entities, non-entities and time descriptions. Instances are based
on coreference of mentions. Entity mention coreference is established by a the nominal
coreference module but also indirectly by the URI assigned to the entities. Since URIs
are unique and most URIs are based on DBpedia, mentions for which we create the same
URI are automatically merged and all statements will apply to this unique representation
across different NAF files. In this example we see that the entity
<dbp:Porsche>
has two different mentions originated from the two different sources. In other cases such as
Qatar and Qatar Holding, different URIs have been chosen by the Named Entity Disambiguation (NED) module and consequently the entities are represented as distinct instances.
There are also event components that are not considered as entities, such as:
<nwr:data/porsche/non-entities/10pc+stake>
To identify the concept, we create a URI based on the phrase that is only unique across
data sets. So all reference to 10p stake within the database will become corerefential and
point to the same instance representation. Obviously, this is not a proper representation
since each 10 percent can be a different one. We also see that the events sell and buy
are kept separately here by the software. This is because the similarity between these
words is not high enough to merge them. Finally, the two time expressions in the two
documents represent the day on which the document was processed, which resolves to the
same time:Instant instance. This is because the meta data was lacking and there is no
other time description in the title.
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
@prefix
@prefix
@prefix
@prefix
@prefix
@prefix
@prefix
@prefix
@prefix
@prefix
@prefix
@prefix
@prefix
@prefix
@prefix
@prefix
17/148
prov :
<h t t p : / /www. w3 . o r g / n s / p r o v#> .
gaf :
<h t t p : / / g r o u n d e d a n n o t a t i o n f r a m e w o r k . o r g / g a f#> .
wn :
<h t t p : / /www. n e w s r e a d e r −p r o j e c t . eu / o n t o l o g i e s /pwn3.0/ > .
n w r o n t o l o g y : <h t t p : / /www. n e w s r e a d e r −p r o j e c t . eu / o n t o l o g i e s /> .
ili :
<h t t p : / / g l o b a l w o r d n e t . o r g / i l i /> .
rdfs :
<h t t p : / /www. w3 . o r g / 2 0 0 0 / 0 1 / r d f −schema#> .
time :
<h t t p : / /www. w3 . o r g /TR/ owl−t i m e#> .
eso :
<h t t p : / /www. n e w s r e a d e r −p r o j e c t . eu / domain−o n t o l o g y#> .
pb :
<h t t p : / /www. n e w s r e a d e r −p r o j e c t . eu / o n t o l o g i e s / propbank/> .
owl :
<h t t p : / /www. w3 . o r g / 2 0 0 2 / 0 7 / owl#> .
nwr :
<h t t p : / /www. n e w s r e a d e r −p r o j e c t . eu/> .
rdf :
<h t t p : / /www. w3 . o r g /1999/02/22 − r d f −s y n t a x−n s#> .
sem :
<h t t p : / / s e m a n t i c w e b . c s . vu . n l / 2 0 0 9 / 1 1 / sem/> .
fn :
<h t t p : / /www. n e w s r e a d e r −p r o j e c t . eu / o n t o l o g i e s / f r a m e n e t/> .
skos :
<h t t p : / /www. w3 . o r g / 2 0 0 4 / 0 2 / s k o s / c o r e#> .
nwrdata : <h t t p : / /www. n e w s r e a d e r −p r o j e c t . eu / d a t a/> .
nwr : i n s t a n c e s {
#e n t i t i e s
<dbp : P o r s c h e>
rdfs : label
g a f : denotedBy
<dbp : Qatar>
rdfs : label
g a f : denotedBy
” Porsche ” ;
<h t t p : / /www. t e l e g r a p h . c o . uk#c h a r =0,7>
, <h t t p : / / e n g l i s h . a l a r a b i y a . n e t#c h a r =33,40> .
” Qatar ” ;
<h t t p : / /www. t e l e g r a p h . c o . uk#c h a r =41,46> .
<dbp : Q a t a r I n v e s t m e n t A u t h o r i t y >
rdfs : label
” Qatar H o l d i n g ” ;
g a f : denotedBy <h t t p : / / e n g l i s h . a l a r a b i y a . n e t#c h a r =0,13> .
<nwr : d a t a / p o r s c h e / non− e n t i t i e s /10 pc+s t a k e >
rdfs : label
”10 pc s t a k e ” ;
g a f : denotedBy <h t t p : / /www. t e l e g r a p h . c o . uk#c h a r =25,35> .
#non− e n t i t i e s
<nwr : d a t a / p o r s c h e / non− e n t i t i e s /10+\%25+ s t a k e+i n+p o r s c h e >
rdfs : label
”10 \% s t a k e i n P o r s c h e ” ;
<nwr : d a t a / p o r s c h e / non− e n t i t i e s / t o+f o u n d i n g+f a m i l i e s >
rdfs : label
” to founding family ” ;
#e v e n t s
<h t t p : / / e n g l i s h . a l a r a b i y a . n e t#ev1>
a
sem : Event , n w r o n t o l o g y : c o n t e x t u a l E v e n t ,
fn : Commerce sell , eso : S e l l i n g , i l i : i32963 , i l i : i32953
rdfs : label
”sell” ;
<h t t p : / /www. t e l e g r a p h . c o . uk#ev2>
a
f n : Commerce buy , e s o : Buying , i l i : i 3 2 7 8 8 , i l i : i 3 4 9 0 1
rdfs : label
” buy ” ;
#t i m e
<nwr : t i m e /20150324 >
a
t i m e : day
t i m e : month
time : unitType
time : year
;
;
time : DateTimeDescription ;
”−−−24”ˆˆ< h t t p : / /www. w3 . o r g / 2 0 0 1 /XMLSchema#gDay> ;
”−−03”ˆˆ< h t t p : / /www. w3 . o r g / 2 0 0 1 /XMLSchema#gMonth> ;
t i m e : unitDay ;
”2015”ˆˆ < h t t p : / /www. w3 . o r g / 2 0 0 1 /XMLSchema#gYear> .
<h t t p : / /www. t e l e g r a p h . c o . uk#tmx0>
a
time : I n s t a n t ;
t i m e : inDateTime <nwr : t i m e /20150324 > .
<h t t p : / / e n g l i s h . a l a r a b i y a . n e t#tmx0>
a
}
Figure 4: Instances of events, entities, non-entities and time
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
18/148
<h t t p : / / e n g l i s h . a l a r a b i y a . n e t#pr1 , r l 1 > {
sem : h a s A c t o r
<dbp : Q a t a r I n v e s t m e n t A u t h o r i t y > ;
e s o : p o s s e s s i o n −o w n e r 1
<nwr : o n t o l o g i e s / f r a m e n e t / C o m m e r c e s e l l @ S e l l e r >
pb : A0
<dbp : Q a t a r I n v e s t m e n t A u t h o r i t y > .
}
<nwr : d a t a / p o r s c h e / non− e n t i t i e s /10+\%25+ s t a k e+i n+p o r s c h e > ;
<nwr : o n t o l o g i e s / f r a m e n e t / Commerce sell@Goods>
<nwr : d a t a / p o r s c h e / non− e n t i t i e s /10+\%25+ s t a k e+i n+p o r s c h e > ;
pb : A1
<nwr : d a t a / p o r s c h e / non− e n t i t i e s /10+\%25+ s t a k e+i n+p o r s c h e > .
}
<nwr : d a t a / p o r s c h e / non− e n t i t i e s / t o+f o u n d i n g+f a m i l i e s > ;
<nwr : o n t o l o g i e s / f r a m e n e t / Commerce sell@Buyer>
pb : A2
<nwr : d a t a / p o r s c h e / non− e n t i t i e s / t o+f o u n d i n g+f a m i l i e s > .
}
<h t t p : / /www. t e l e g r a p h . c o . uk#pr2 , r l 2 > {
<dbp : P o r s c h e> ;
<nwr : o n t o l o g i e s / f r a m e n e t / Commerce buy@Buyer>
pb : A0
<dbp : P o r s c h e> .
}
<nwr : d a t a / p o r s c h e / non− e n t i t i e s /10 pc+s t a k e > ;
<nwr : o n t o l o g i e s / f r a m e n e t / Commerce buy@Goods>
<nwr : d a t a / p o r s c h e / non− e n t i t i e s /10 pc+s t a k e > ;
pb : A1
<nwr : d a t a / p o r s c h e / non− e n t i t i e s /10 pc+s t a k e > .
}
<dbp : Qatar> ;
<nwr : o n t o l o g i e s / f r a m e n e t / Commerce buy@Means>
<dbp : Qatar> ;
pb : A2
<dbp : Qatar> .
}
<h t t p : / / e n g l i s h . a l a r a b i y a . n e t#t r 1 > {
sem : hasAtTime <h t t p : / / e n g l i s h . a l a r a b i y a . n e t#tmx0> ;
sem : hasTime
<h t t p : / / e n g l i s h . a l a r a b i y a . n e t#tmx0> .
}
<h t t p : / /www. t e l e g r a p h . c o . uk#t r 2 > {
sem : hasAtTime <h t t p : / /www. t e l e g r a p h . c o . uk#tmx0> ;
sem : hasTime
<h t t p : / /www. t e l e g r a p h . c o . uk#tmx0> .
}
Figure 5: Sem triples embedded in named-graphs
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
19/148
nwr : p r o v e n a n c e {
<h t t p : / / e n g l i s h . a l a r a b i y a . n e t#pr1 , r l 1 >
<h t t p : / /www. t e l e g r a p h . c o . uk#pr2 , r l 2 >
}
Figure 6: Provenance information indicating the mentions of relations
These examples show that the interpretation of mentions as instances, is a delicate
process that is on the one hand the result of creating identical URIs for the interpretation
of a mention (e.g. DBpedia references or normalised dates) and on the other hand the
result of specific strategies to match the semantics. The module NAF2SEM2 defines this
interpretation. The core function to interpret the mentions in NAF is the class GetSemFromNaf.java. It creates the SEM objects and relations in memory. At that point there
are two API options: either the information is stored on disk as a binary object file or
the SEM objects are serialized to an RDF file or stream. The former is used for batch
processing described in Section 2.2.1, whereas the latter is used for the streaming architecture described in Section 2.2.2. Below we first describe the representation of the instances
and their relations in more detail and next the general way in which we establish identity
across events. In the next subsections, we describe in detail the interpretation as instances
and relations and how identity is established.
2.1
Extracting instances from NAF layers
The NAF2SEM module combines information from various NAF layers to define the SEM
Objects (internal data structures defined in Java) for entities, events, time and event
relations. We discuss each type of object in more details below.
2.1.1
Entities and non-entities
Genuine entities are represented in the entity layer in NAF and have a DBpedia URI that
identifies them and are a participant in an event that is extracted. However, there are
many important objects that do not fit all 3 of these requirements. There are entities
without a reference to DBPedia, there are entities that do not play a role in an event
that is represented and sometimes important participants in events are not detected as
entities. As a result, we can find the different entities and entity-like objects in the RDF
representations.
Regular entities that have been detected by the Named Entity Recognizer (NERC) and
got an external reference to an URI by the spotlight program are represented as discussed
2
https://github.com/cltl/EventCoreference
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
<e n t i t y i d =”e 1 ” t y p e=”ORGANIZATION”>
<r e f e r e n c e s >
<!−−P o r s c h e−−>

<t a r g e t i d =”t 1 ” />

</ r e f e r e n c e s >
<e x t e r n a l R e f e r e n c e s >
<e x t e r n a l R e f r e s o u r c e =” s p o t l i g h t
r e f t y p e =”en ” />
r e f t y p e =”en ” />
r e f t y p e =”en ” />
r e f t y p e =”en ” />
r e f t y p e =”en ” />
r e f t y p e =”en ” />
r e f t y p e =”en ” />
r e f t y p e =”en ” />
r e f t y p e =”en ” />
r e f t y p e =”en ” />
</ e x t e r n a l R e f e r e n c e s >
</ e n t i t y >
20/148
v 1 ” r e f e r e n c e =”dbp : P o r s c h e ” c o n f i d e n c e = ” 0 . 9 9 9 9 3 2 1 7 ”
v 1 ” r e f e r e n c e =”dbp : P o r s c h e D e s i g n G r o u p ” c o n f i d e n c e = ” 3 . 9 7 6 1 5 2E−5”
v 1 ” r e f e r e n c e =”dbp : P o r s c h e 9 1 1 ” c o n f i d e n c e = ” 1 . 5 3 5 9 3 9 7E−5”
v 1 ” r e f e r e n c e =”dbp : P o r s c h e 9 1 4 ” c o n f i d e n c e = ” 7 . 1 7 1 3 5 8E−6”
v 1 ” r e f e r e n c e =”dbp : F e r d i n a n d P o r s c h e ” c o n f i d e n c e = ” 5 . 5 2 2 5 6 3 7E−6”
v 1 ” r e f e r e n c e =”dbp : P o r s c h e f a m i l y ” c o n f i d e n c e = ” 2 . 4 9 3 9 7 5 6E−10”
v 1 ” r e f e r e n c e =”dbp : P o r s c h e i n m o t o r s p o r t ” c o n f i d e n c e = ” 2 . 9 5 7 0 8 0 8E−14”
v 1 ” r e f e r e n c e =”dbp : P o r s c h e 5 5 0 ” c o n f i d e n c e = ” 8 . 5 1 7 1 6 5E−17”
v 1 ” r e f e r e n c e =”dbp : P o r s c h e 3 5 1 2 ” c o n f i d e n c e = ” 2 . 5 0 4 6 7 4 6E−17”
v 1 ” r e f e r e n c e =”dbp : P o r s c h e R S S p y d e r ” c o n f i d e n c e = ” 1 . 9 3 9 9 4 5 2E−17”
Figure 7: Organisation Entity in NAF with URIs
1
2
3
4
5
6
7
8
<e n t i t y i d =”e 4 5 ” t y p e=”LOCATION”>
<!−−N o r t h e a s t China ’ s−−>
<t a r g e t i d =”t 6 9 7”/>< t a r g e t

</ e n t i t y >
<e n t i t y i d =”e 4 6 ” t y p e=”LOCATION”>
i d =”t 6 9 8”/>< t a r g e t
i d =”t 6 9 9 ”/>
Figure 8: Location Entity in NAF without URI
before through their URI. The NAF representation for the Porsche instance shown in TRiG
example above is shown in Figure 7. There are however many phrases detected as entities
by the NERC that did not receive an external reference to DBPedia by spotlight as is
shown in Figure 8. For these so-called dark entities we create a blank URI within the set
of entities in a project. The entity type assigned by the NERC is used to create a subclass
relation. This is shown in Figure 9.
When the same URI was recovered for different mentions of an entity, this results in a
single representation of that entity with gaf:denotedBy links to each mention. The spans
of these mentions can however overlap with the span in the entity coreference set in a
NAF layer. In these cases, we can extend the mentions of an entity with the mentions of
the coreference set but in some cases, distinct entities also get tight to the same instance
1
2
3
4
<nwr : d a t a / c a r s / e n t i t i e s / N o r t h e a s t C h i n a s >
a
n w r o n t o l o g y : LOCATION ;
rdfs : label
” N o r t h e a s t China ’ s ” ;
g a f : denotedBy <nwr : d a t a / c a r s /57DD−HR81−JB4B−V3T6 . xml#c h a r =3582 ,3599 > .
Figure 9: Entity in SEM with a blank URI
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
21/148
<e n t i t y i d =”e 5 ” t y p e=”PERSON”>
<!−− D i d i e r Drogba−−>

<t a r g e t i d =”t 6 8 ”/>
<t a r g e t i d =”t 6 9 ”/>

<e x t e r n a l R e f r e s o u r c e =” s p o t l i g h t v 1 ” r e f e r e n c e =”dbp : D i d i e r D r o g b a ” c o n f i d e n c e =”1.0”
r e f t y p e =”en ” s o u r c e =”en”/>
<e x t e r n a l R e f r e s o u r c e =”dbp ” r e f e r e n c e =”dbp : D i d i e r D r o g b a ” c o n f i d e n c e =”1.0” s o u r c e =”POCUS”/>
</ e n t i t y >
<e n t i t y i d =”e 2 ” t y p e=”PERSON”>
<!−− D i d i e r Yves Drogba T e b i l y −−>

<t a r g e t i d =”t 2 ”/>
<t a r g e t i d =”t 3 ”/>
<t a r g e t i d =”t 4 ”/>
<t a r g e t i d =”t 5 ”/>

</ e n t i t y >
Figure 10: Drogba as an entity in NAF
through the coreference set. Consider the example of Didier Drogba and Didier Yves Drogba
Tébily in Figure 10 that was in both cases detected as an entity but only the former is
mapped to DBpedia. Within the same NAF we also find the coreference set shown in
Figure 11. The spans in the coreference set overlap with the spans of the two entities. As
a result of that, we can merge all these representation into a single entity and list all of the
mentions of the entity layer and the coreference set. Consequently, the rdfs:label predicate
is also extended with all the different labels used to make reference. This is shown in the
SEM representation in Figure 12.
Although the NAF2SEM module reads all entities from a NAF file, we currently only
output entities that play a role in events. To determine whether an entity plays a role in
an event, we need to match the mentions of an entity with the span of the roles of events.
We will discuss this later below.
In addition, there are phrases that play a role but are not detected as entities. We have
seen two examples in the TRiG example shown in the beginning of this section. To limit
the amount of entities and triples, we only consider roles that have a FrameNet role element
assigned. We consider these roles essential for understanding what the event is about. In
all cases that these roles cannot be assigned to an entity, we represent the concept as a
so-called non-entity. To find these non-entities, we match the span of the roles with all the
spans of the entities extracted before. We assign the type NONENTITY to these instances
as showin in Figure 13.
Spotlight is not only applied to entities in the text but also to all other content phrases.
This is represented in the NAF layer with the markables, see Figure 14. We use these
markables to find potential DBpedia references that somehow relate to the non-entity.
Again the overlap in the span elements is used to determine relevance. Since we do not
know the precise semantic relation, we use the skos:relatedMatch predicate to relate the
non-entity to the DBpedia entry. Through the skos:relatedMatch it is possible to query the
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
< c o r e f i d =”c o 2”>
<!−−His−−>
 <t a r g e t i d =”t 1 2 9 ”/> 
 <t a r g e t i d =”t 5 3 1 ”/> <t a r g e t i d =”t 5 3 2 ”/> 
<!−−Drogba−−>
<!−−he−−>
 <t a r g e t i d =”t 7 4 ”/> 
<!−−h i s −−>
 <t a r g e t i d =”t 9 ”/> 
 <t a r g e t i d =”t 6 2 6 ”/> <t a r g e t i d =”t 6 2 7 ”/> 
 <t a r g e t i d =”t 6 8 ”/> <t a r g e t i d =”t 6 9 ”/>
<!−−His−−>
<t a r g e t i d =”t 1 0 8 ”/>
<!−−h i s −−>
<t a r g e t i d =”t 8 4 ”/>
<!−− D i d i e r Yves Drogba T e b i l y −−>
<t a r g e t i d =”t 2”/>< t a r g e t i d =”t 3”/>< t a r g e t i d =”t 4”/>< t a r g e t
</ c o r e f >
22/148
i d =”t 5 ”/>
Figure 11: Coreference set with Drogba in NAF
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
dbp : D i d i e r D r o g b a
rdfs : label
” Drogba ” , ” D i d i e r Yves Drogba T e b i l y ” , ” h i s ”
” D i d i e r Drogba ” , ” D i d i e r Drogba ’ s ” ;
g a f : denotedBy
<nwr : d a t a / c a r s /57R0−J5K1−JC86−C1N7 . xml#c h a r =11,36> ,
<nwr : d a t a / c a r s /57R0−J5K1−JC86−C1N7 . xml#c h a r =694 ,697 > ,
<nwr : d a t a / c a r s /57R0−J5K1−JC86−C1N7 . xml#c h a r =53,56> ,
<nwr : d a t a / c a r s /59 JJ −0761−DY2M−33MF. xml#c h a r =4277 ,4292 > ,
<nwr : d a t a / c a r s /59 JJ −0761−DY2M−33MF. xml#c h a r =4693 ,4699 > .
,
Figure 12: Drogba as an instance in SEM
1
2
3
4
5
6
7
8
9
10
11
<nwr : d a t a / c a r s / non− e n t i t i e s / from+a+n e w s p a p e r+o r+magazine>
a
n w r o n t o l o g y :NONENTITY ;
rdfs : label
” from a n e w s p a p e r o r m a g a z i n e ” ;
g a f : denotedBy
<nwr : d a t a / 2 0 0 4 / 0 3 / 2 6 / 4 C16−2HY0−01JV−13BX. xml#c h a r =4099 ,4127 > ;
skos : relatedMatch
dbp : Magazine
<nwr : d a t a / c a r s / non− e n t i t i e s / a+l a w s u i t >
a
n w r o n t o l o g y :NONENTITY ;
rdfs : label
”a l a w s u i t ” ;
g a f : denotedBy
<nwr : d a t a / c a r s /5629−K3P1−F190−V098 . xml#c h a r =3239 ,3248 > ;
skos : relatedMatch
dbp : L a w s u i t .
Figure 13: Non-entity instance in SEM
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
23/148
<!−−magazine−−>


<t a r g e t i d =”w23”/>

<e x t e r n a l R e f r e s o u r c e =” s p o t l i g h t ” r e f e r e n c e =”dbp : Magazine ” c o n f i d e n c e =”1.0”

<!−− l a w s u i t −−>


<t a r g e t i d =”w601”/>

<e x t e r n a l R e f r e s o u r c e =” s p o t l i g h t ” r e f e r e n c e =”dbp : L a w s u i t ” c o n f i d e n c e =”1.0”

Figure 14: Markables in NAF
non-entities through the DBpedia entries and classes rather than just on the string values.
2.1.2
Events
We cannot recover a URI from DBpedia to represent an event instance. We therefore create a blank URI using the meta data of the document and an event counter. Furthermore,
we give the type information, the labels, the mentions for each event object. The event
coreference sets in NAF are the basis for defining the event instances in a single document.
In Figure 15, we show several examples of event coreference sets. Event coreference sets
are derived from the predicates in the Semantic Role Layer (SRL). Every predicate is represented in a coreference set. Predicates with the same lemma are represented in the same
set, as well as predicates that are semantically similar. As a result we get both singleton
sets (noreference) and multiform sets (coreference). For each mention, we determine the
highest scoring wordnet synsets and across different mentions we select the most dominant
senses from the highest scoring senses. Likewise, each coreference set gets external references to the most dominant senses with the averaged word-sense-disambiguation score.
Whenever the similarity across these dominant senses is above the threshold, we merge
coreference sets and add the lowest common subsumer to the merged set. This is shown
for the last example in Figure 15, were shot and injured are merged into a single set with
the lowest common subsumer synset eng-30-00069879-v, injure:1, wound:1 and a similarity
score of 2.6390574 using the method described by Leacock and Chodorow (1998).
Each coreference set becomes a potential event instance, where we use the mentions for
the labels and the references to the WordNet synset as a subclass relation.3 Furthermore,
we collect a subset of the ontological labels assigned to each predicate in the SRL. For
example for chased, we find the following typing in the SRL layer shown in Figure 16, from
which we copy the FrameNet and ESO classes into the instance representation.
3
We convert all reference to WordNet synset to InterLingualIndex concepts to allow cross-lingual match-
ing
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
< c o r e f i d =” c o e v e n t 7 2 ” t y p e=” e v e n t ”>
<!−−c h a s e d−−>

<t a r g e t i d =”t 3 7 1 ”/>

<e x t e r n a l R e f r e s o u r c e =”WordNet −3.0”
s o u r c e =” d o m i n a n t s e n s e ”/>
</ c o r e f >
<!−− g i v i n g −−>

<t a r g e t i d =”t 1 5 0 ”/>

</ c o r e f
<!−−w i n n i n g−−>
<!−−won−−>
<!−−won−−>
</ c o r e f >
24/148
r e f e r e n c e =” i l i −30−02001858−v ” c o n f i d e n c e =”1.0”
r e f e r e n c e =” i l i −30−02200686−v ” c o n f i d e n c e = ” 0 . 8 6 5 3 8 1 9 6 ”
r e f e r e n c e =” i l i −30−02199590−v ” c o n f i d e n c e = ” 0 . 9 1 3 9 8 1 ”
r e f e r e n c e =” i l i −30−01629403−v ” c o n f i d e n c e = ” 0 . 9 7 3 0 2 9 3 ”
r e f e r e n c e =” i l i −30−02288295−v ” c o n f i d e n c e = ” 0 . 9 7 5 4 5 7 1 3 ”
<!−−s h o t −−>
<!−−s h o t −−>
<!−− i n j u r e d −−>
 <t a r g e t i d =”t 6 8 4 ”/>
<e x t e r n a l R e f r e s o u r c e =” P r i n c e t o n WordNet 3 . 0 ” r e f e r e n c e =”eng −30−00069879−v ” c o n f i d e n c e = ” 2 . 6 3 9 0 5 7 4 ”
s o u r c e =”l o w e s t c o m m o n s u b s u m e r ”/>
<e x t e r n a l R e f r e s o u r c e =”WordNet −3.0” r e f e r e n c e =” i l i −30−02055267−v ” c o n f i d e n c e = ” 0 . 8 1 2 2 6 4 8 6 ”
<e x t e r n a l R e f r e s o u r c e =”WordNet −3.0” r e f e r e n c e =” i l i −30−01134781−v ” c o n f i d e n c e = ” 0 . 9 5 1 7 7 9 2 ”
<e x t e r n a l R e f r e s o u r c e =”WordNet −3.0” r e f e r e n c e =” i l i −30−01137138−v ” c o n f i d e n c e = ” 0 . 9 2 4 6 8 0 7 ”
<e x t e r n a l R e f r e s o u r c e =”WordNet −3.0” r e f e r e n c e =” i l i −30−02484570−v ” c o n f i d e n c e = ” 0 . 9 3 0 1 4 6 4 6 ”
</ c o r e f >
Figure 15: Event coreference set in NAF
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
25/148
<!−−t 3 7 1 c h a s e d : A1 [ t 3 6 7 England ] AM−DIR [ t 3 7 2 down]−−>

<!−−c h a s e d−−>

<t a r g e t i d =”t 3 7 1 ”/>

<e x t e r n a l R e f r e s o u r c e =”PropBank ” r e f e r e n c e =” c h a s e .01”/ >
<e x t e r n a l R e f r e s o u r c e =”VerbNet ” r e f e r e n c e =”c h a s e −51.6”/ >
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”Cotheme”/>
<e x t e r n a l R e f r e s o u r c e =”ESO” r e f e r e n c e =” T r a n s l o c a t i o n ”/>
<e x t e r n a l R e f r e s o u r c e =”EventType ” r e f e r e n c e =” c o n t e x t u a l ”/>
<e x t e r n a l R e f r e s o u r c e =”WordNet” r e f e r e n c e =” i l i −30−02001858−v”/>
<nwr : d a t a / c a r s /55XK−XGX1−JBKJ−C3CF . xml#ev72>
a
e s o : T r a n s l o c a t i o n , f n : Cotheme , i l i : i 3 1 7 4 7 ;
rdfs : label
” chase ” ;
g a f : denotedBy <nwr : d a t a / c a r s /55XK−XGX1−JBKJ−C3CF . xml#c h a r =2074 ,2080 > .
Figure 16: SRL in NAF with event types for the predicate chased
In the case of the coreference set of injured and shot, we derive no classes for injured
but various FrameNet classes from the predicate shot, seen in Figure 17.
In addition to the specific semantic classes, every event is of the type sem:Event and one
of the 3 main event classes in NewsReader:
sourceEvent events that introduce a source of information as the semantic subject, such
as speech-acts (say, claim, declare) and cognitive-verbs (think, believe, hope, fear ).
grammaticalEvent auxiliary verbs (be, have, should, will ), aspectual verbs (stop, begin,
take place, happen) that do not introduce other participants in an event, do not define
a different time interval for events and express properties of other events.
contextualEvent all other events than the above that take place in some world and do
not introduce a source of events or express a property of an event.
On the basis of an analysis of the car data set, we create lists of FrameNet frames classified
as contextual, source or grammatical. In Appendix 9, we give a complete list of FrameNet
frames that distinguish between these three main event types.
2.1.3
Participant and event relations
Once the instances for entities, non-entities and events have been established, we determine
the relations between them. We first extract from the SRL layer all the roles with a valid
Propbank role. The following role labels are considered valid:
PRIMEPARTICIPANT a0, arg0, a-0, arg-0
NONPRIMEPARTICIPANT a1, a2, a3, a4, arg1, arg2, arg3, a-1, a-2, a-3, a-4, arg-1,
arg-2, arg-3, arg-4, am-dir, argm-dir
LOCATION am-loc, argm-loc, am-dir
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
26/148
<!−−t 5 7 9 i n j u r e d : A0 [ t 5 7 7 he ] AM−MNR[ t 5 8 0 s o ] A1 [ t 5 8 1 t h a t]−−>

<!−− i n j u r e d −−>

<t a r g e t i d =”t 5 7 9 ”/>

<e x t e r n a l R e f r e s o u r c e =”PropBank ” r e f e r e n c e =” i n j u r e .01”/ >

<!−−s h o t −−>

<t a r g e t i d =”t 5 0 0 ”/>

<e x t e r n a l R e f r e s o u r c e =”PropBank ” r e f e r e n c e =” s h o o t .02”/ >
<e x t e r n a l R e f r e s o u r c e =”VerbNet ” r e f e r e n c e =”p o i s o n −42.2”/ >
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =” H i t t a r g e t ”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =” S h o o t p r o j e c t i l e s ”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =” U s e f i r e a r m ”/>
<e x t e r n a l R e f r e s o u r c e =”PropBank ” r e f e r e n c e =” s h o o t .02”/ >
<nwr : d a t a / c a r s /59JB−GV01−JBSN−30SP . xml#ev84>
a
sem : Event , n w r o n t o l o g y : c o n t e x t u a l E v e n t , f n : H i t t a r g e t ,
i l i : i27293 , i l i : i27278 , i l i : i34141 ,
fn : S h o o t p r o j e c t i l e s
i l i : i22125 , fn : Use firearm ;
rdfs : label
” shoot ” , ” i n j u r e ” ;
g a f : denotedBy <nwr : d a t a / c a r s /59JB−GV01−JBSN−30SP . xml#c h a r =2215 ,2219 > ,
<nwr : d a t a / c a r s /59JB−GV01−JBSN−30SP . xml#c h a r =2588 ,2595 > .
,
Figure 17: SRL in NAF for predicate injured
Next, we intersect the span of the role with the span of any mention of the entities and nonentities that were extracted before. Since the spans are established in very different ways
across the NLP modules that create the NAF layers, we implemented a loose matching
principle that the number of matching content words across the role and the entity needs
exceed 75% of each. This prevents excessively long spans to match with short spans.
Content words are terms with part-of-speech that start with R (adverb), N (noun), V
(verb), A (adjective) or G (adjective). Note that non-entities always exactly match at
least one role since they are derived from the roles. If there is a match between an entity
and a role, we create a SEM relation between the event instance and the entity instance
and copy the semRole value and the external references from the role as predicates for
the relation. In Figure 18 we show the SRL structure for the chased predicate given
before, followed by the SEM relations that are extracted from it. The SEM relations are
combined in named-graphs for which we created blank URIs based on the identifiers for the
predicate and the role, e.g. pr71,rl175. Since the role is considered to be an actor role, the
sem:hasActor predicate is added. Furthermore, only the PropBank, ESO and FrameNet
roles are kept. Note that the NAF role down the ball is not represented in RDF. This is
because there is no matching entity for this role and there is no FrameNet role to promote
it as a non-entity. We thus constraint the representation of the events to components that
are somehow modeled and crystallized.
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
27/148
<!−−t 3 7 1 c h a s e d : A1 [ t 3 6 7 England ] AM−DIR [ t 3 7 2 down]−−>

<!−−c h a s e d−−>

<t a r g e t i d =”t 3 7 1 ”/>

<e x t e r n a l R e f r e s o u r c e =”VerbNet ” r e f e r e n c e =”c h a s e −51.6”/ >
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”Cotheme”/>
<e x t e r n a l R e f r e s o u r c e =”ESO” r e f e r e n c e =” T r a n s l o c a t i o n ”/>
< r o l e i d =” r l 1 7 5 ” semRole=”A1”>
<!−−England d e f e n d e r John Terry−−>

<t a r g e t i d =”t 3 6 7 ”/>
<t a r g e t i d =”t 3 6 8 ” head=” y e s ”/>
<t a r g e t i d =”t 3 6 9 ”/>
<t a r g e t i d =”t 3 7 0 ”/>

<e x t e r n a l R e f r e s o u r c e =”VerbNet ” r e f e r e n c e =”c h a s e −51.6@Theme”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”Cotheme@Cotheme”/>
<e x t e r n a l R e f r e s o u r c e =”PropBank ” r e f e r e n c e =” c h a s e . 0 1 @1”/>
<e x t e r n a l R e f r e s o u r c e =”ESO” r e f e r e n c e =” T r a n s l o c a t i o n @ t r a n s l o c a t i o n −theme”/>
</ r o l e >
< r o l e i d =” r l 1 7 6 ” semRole=”AM−DIR”>
<!−−down t h e b a l l −−>

<t a r g e t i d =”t 3 7 2 ” head=” y e s ”/>
<t a r g e t i d =”t 3 7 3 ”/>
<t a r g e t i d =”t 3 7 4 ”/>

</ r o l e >

<nwr : d a t a / c a r s /55XK−XGX1−JBKJ−C3CF . xml#pr71 , r l 1 7 5 > {
<nwr : d a t a / c a r s /55XK−XGX1−JBKJ−C3CF . xml#ev72>
dbp : J o h n T e r r y ;
f n : Cotheme@Cotheme
e s o : t r a n s l o c a t i o n −theme
pb : A1
dbp : J o h n T e r r y .
}
Figure 18: SRL in NAF with roles and corresponding SEM triples for actor relations
February 1, 2016
2.1.4
28/148
Temporal anchoring
Time objects and temporal relations play an important role in NewsReader. Without a
proper time anchoring, we cannot compare one event to another. The same type of event
with the same participants on a different day is by definition not the same event: compare
telling the same story today and next week. As disjoint spatial boundaries define distinct
objects, so do disjoint temporal boundaries for events. Time objects for delineating events
are derived from the timex3 layer in NAF. There are two types of time expressions: DATE
and DURATION. Both have span elements to the words expressing it and can have value
attributes. DATE expressions have a value attribute that usually points to a specific
normalised ISO date. DURATION expressions usually have ISO periods as values but can
have optional attributes for beginPoint and endPoint whose values are DATE expressions
represented elsewhere. A special timex3 element is the document creation time (if known).
This is a DATE expression with the attribute functionInDocument=”CREATION TIME”
and usually the identifier tm0. The document creation time is derived from the meta data
of a document and has no span element pointing to its expression in the document. Some
examples for each type taken from different NAF files are given in Figure 19.
From the timex3 elements, we derive two types of instances: time:Instant and time:Interval.
We can only do this if we can obtain at least the year from the normalised value (the month
and day can remain underspecified). If the values are relative and the year is not explicit,
we cannot interpret the expression as a time object and we have to ignore it. The next
examples in Figure 20show the representation of time objects in RDF-TRiG derived from
different time expressions such as the ones shown above. Time expressions as well as the
document creation time are represented as instances with unique URIs. Instances of the
type time:Instant have a time:inDateTime relation to a date object, whereas instances of
time:Interval have a time:hasBeginning and/or time:hasEnd relation to a date. Dates are
represented as separate instances of the type time:DateTimeDescription with values for
the year, month and/or day according to owl-time4 .
Each event in SEM-RDF needs to be anchored to at least one time expression that
resolves to a time instance. Events without time anchoring are ignored in the output.
Anchoring relations are expressed using a sem:hasTime relation between an event URI
and a time expression URI. The triples are embedded inside a named graph just as the
participant-event relations we have seen before:
<http://www.telegraph.co.uk#tr2> {
<http://www.telegraph.co.uk#ev2> sem:hasTime <http://www.telegraph.co.uk#tmx0> .
}
We use the following heuristic to detect relations between events and time instances in
a NAF file:
1. The NAF layer temporalRelations provides explicit anchor relations between predicates and time expressions;
4
http://www.w3.org/TR/owl-time
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
29/148
<t i m e x 3 i d =”tmx0” t y p e=”DATE” f u n c t i o n I n D o c u m e n t=”CREATION\ TIME”
v a l u e =”2007−01−10T00 : 0 0 : 0 0 ” / >
<t i m e x 3 i d =”tmx1” t y p e=”DATE” v a l u e =”2007−01−10”>
<!−−J a n u a r y 10 , 2007−−>
<t a r g e t i d =”w7”/>< t a r g e t i d =”w8”/>< t a r g e t
</timex3>
i d =”w9”/>< t a r g e t
i d =”w10”/>
<t i m e x 3 i d =”tmx1” t y p e=”DURATION” b e g i n P o i n t =”tmx19 ” e n d P o i n t=”tmx0 ” v a l u e =”P10Y”>
<!−−10 y e a r s −−>
 <t a r g e t i d =”w70”/>< t a r g e t i d =”w71”/>
</timex3>
<t i m e x 3 i d =”tmx2” t y p e=”DATE” v a l u e =”PRESENT REF”>
<!−−now−−>
<t a r g e t i d =”w73”/>
</timex3>
<t i m e x 3 i d =”tmx2” t y p e=”DATE” v a l u e =”2008−09”>
<!−−September 2008−−>
<t a r g e t i d =”w76”/>< t a r g e t i d =”w77”/>
</timex3>
<!−−March−−>
</timex3>
<!−−June−−>
</timex3>
<t i m e x 3 i d =”tmx5” t y p e=”DURATION” e n d P o i n t=”tmx0 ” v a l u e =”PXM”>
<!−− r e c e n t months−−>
</timex3>
<t i m e x 3 i d =”tmx4” t y p e=”DURATION” b e g i n P o i n t =”tmx7 ” e n d P o i n t=”tmx0” v a l u e =”P8Y”>
<!−− e i g h t y e a r s −−>
</timex3>
<t i m e x 3 i d =”tmx5” t y p e=”DURATION” v a l u e =”PXY”>
<!−−t h e y e a r s −−>
</timex3>
<t i m e x 3 i d =”tmx6” t y p e=”DURATION” b e g i n P o i n t =”tmx8 ” e n d P o i n t=”tmx0” v a l u e =”P21Y”>
<!−−21 y e a r s −−>
</timex3>
<t i m e x 3 i d =”tmx7” t y p e=”DATE” v a l u e =”2003−10−20”/>
<t i m e x 3 i d =”tmx8” t y p e=”DATE” v a l u e =”1990−10−20”/>
Figure 19: Timex3 elements in NAF
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
# document c r e a t i o n time , which h a s no m e n t i o n s
<nwr : w s j 1 0 1 3 . xml#tmx0>
a
rdfs : label
” nwr : t i m e / 1 9 8 9 1 0 2 6 ”
t i m e : inDateTime
nwr : t i m e / 1 9 8 9 1 0 2 6 .
30/148
;
# DURATION w i t h b e g i n and end p o i n t
<nwr : 4 M1J−3MC0−TWKJ−V1W8 . xml#tmx2>
a
time : I n t e r v a l ;
rdfs : label
” week ” ;
g a f : denotedBy
nwr : 4 M1J−3MC0−TWKJ−V1W8 . xml#c h a r =822 ,825 ,
nwr : 4 M1J−3MC0−TWKJ−V1W8 . xml#c h a r =826 ,830
t i m e : h a s B e g i n n i n g nwr : t i m e / 2 0 0 5 1 0 0 3 ;
t i m e : hasEnd
nwr : 2 0 0 6 1 0 0 2 .
,
,
;
# Q u a r t e r i n t e r p r e t e d a s I n t e r v a l w i t h b e g i n and end p o i n t
<nwr : w s j 1 0 1 3 . xml#tmx3>
a
time : I n t e r v a l ;
rdfs : label
” quarter ” ;
g a f : denotedBy
nwr : w s j 1 0 1 3 . xml#c h a r =363 ,366 ,
nwr : w s j 1 0 1 3 . xml#c h a r =367 ,373
nwr : w s j 1 0 1 3 . xml#c h a r =374 ,381 ;
t i m e : hasEnd
nwr : t i m e / 1 9 8 9 0 9 3 0 .
# DATE i n t e r p r e t e d a s a I n s t a n t
<nwr : w s j 1 0 1 3 . xml#tmx4>
a
rdfs : label
”earlier” ;
g a f : denotedBy
nwr : w s j 1 0 1 3 . xml#c h a r =401 ,402 ,
nwr : w s j 1 0 1 3 . xml#c h a r =403 ,407
nwr : w s j 1 0 1 3 . xml#c h a r =408 ,415 ;
nwr : t i m e /1988> .
<nwr : t i m e /19890701 >
a
t i m e : day
t i m e : month
time : unitType
time : year
t i m e : unitDay ;
<nwr : t i m e /19890930 >
a
t i m e : day
t i m e : month
time : unitType
time : year
t i m e : unitDay ;
<nwr : t i m e /1988>
a
time : unitType
time : year
t i m e : unitDay ;
,
,
Figure 20: SEM representations for time instants and time intervals
February 1, 2016
1
2
3
4
<t a r g e t i d =”p r 1”/>
anchorTime=”tmx2”><t a r g e t i d =”p r 6”/>
b e g i n P o i n t =”tmx2”><t a r g e t i d =”p r 9”/> 
e n d P o i n t=”tmx0”><t a r g e t i d =”p r 4 3”/>
Figure 21: Time anchoring of predicates in NAF
2. If there is no anchor relation through the temporalRelations, check the sentence in
which the event is mentioned for time expressions; else the preceding and following
sentence, and finally two sentences before the event mention;
3. If there is still no anchor relation, then attach the event to the document creation
time;
4. If there is also no document creation time, then anchor the event to the year zero:
0000-12-25.
The next example in Figure 21 shows how predicates are anchored to time expressions
in the temporalRelations layer in NAF, where the predicate identifier is given as the span
and the time expression as an attribute value for either the anchorTime, beginPoint or
endPoint.
To deal with the complexity of the temporal relations, we had to extend SEM with
more specific time relations. In addition to the generic sem:hasTime, events are linked
through any of the following relations:
sem:hasAtTime: we assume the event took place at this time Instant or during this
Interval;
sem:hasFutureTime: we assume the event took place in the future relative to this time
Instant object;
sem:hasEarliestBeginTime: we assume the event began at this time Instant object;
sem:hasEarliestEndTime: we assume the event ended at this time Instant object;
The procedure is as follows: we first check if the event is explicitly anchored in NAF
to a time expression, where:
1. an anchorTime attribute results in a sem:hasAtTime relation
2. a value for beginPoint results in a sem:hasEarliestBeginTime relation
3. an endPoint value in sem:hasEarliestEndTime.
If there is no such relation, we check if the factuality module located the event in the
future. If that is the case, we create a sem:hasFutureTime relation relative to the document
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
32/148
: e v e n t#Worked
sem : hasTime : tmxWeek .
: e v e n t#S l e p t
sem : h a s E a r l i e s t B e g i n T i m e
sem : h a s E a r l i e s t B e g i n T i m e
: tmxMonday ;
: tmxFriday .
: tmxWeek
a
time : I n t e r v a l
rdfs : label
” week ” ;
t i m e : hasEnd
nwr : t i m e / 1 9 8 9 0 7 0 7 .
;
: tmxMonday
a
rdfs : label
time : I n s t a n t
”Monday” ;
nwr : t i m e / 1 9 8 9 0 7 0 1 .
;
a
rdfs : label
time : I n s t a n t
” Friday ” ;
nwr : t i m e / 1 9 8 9 0 7 0 5 .
;
: tmxFriday
Figure 22: SEM relations between events and time expressions
,
creation time.5 For all other cases: the event is related to a time expression in the same or
a close sentence or it is simply related to the document creation time, we relate the event
through the sem:hasAtTime relation.
It is important to realise that intervals are represented in two different ways. An event
can have a sem:hasAtTime relation to an interval object or it can have a:
sem:hasEarliestBeginTime and/or sem:hasEarliestEndTime
relation to two different time instance objects. The former applies to cases where the
interval itself is explicitly referred to in the text, e.g. I worked for a week, whereas the
latter applies to cases where the interval is not mentioned directly in the text but the
boundaries of the event are mentioned, e.g. I slept from Monday till Friday. We represent
these cases as shown in Figure 22, assuming that the expressions have been normalised to
ISO dates. The begin and end points of intervals are therefore either found at the instance
level or at the SEM relation level. Although this complicates the querying of the data for
events at points in time, we believe it better expresses the way time intervals are represented
in language. To ease the retrieval of the events, we always provide a sem:hasTime relation
in addition to any of these specific relations. All events related to a specific point in time
can be retrieved through this relation, regardless of the specific time relation.
2.2
Identity across events
So far, we discussed the way data in a single NAF file is interpreted as SEM instances and
represented in RDF-TRiG. However, the GAF framework is designed to capture identity of
mentions across sources. This means that we need to compare SEM objects across different
5
Note that events can be located in the future in two ways: we either created an explicit
sem:hasFutureTime relation with respect to the document creation time as described here or there was
an explicit anchoring to a time that is in the future with respect to the document creation time. In the
latter case, the time is known whereas in the former case we only know it may happen after the document
creation time
February 1, 2016
33/148
sources to establish identity. For entities, non-entities and normalised time objects this
comes naturally through the way the URIs are defined. For these instances, we assume
that the URI is defining the identity and no further measures are taken.
In the case of events, this is more complicated. The blank URIs are meaningless across
documents and we need to define identity on terms of the triples defined for each event. We
follow Quine (1985) here who assumes that time and place are defining criteria. Without
time and place, actions are just denotations of abstract classes of concepts. They need
to be anchored in time and space to become instantiated. Furthermore, we assume that
the type of action and the participants play a role in establishing identity (Cybulska and
Vossen, 2013a). We therefore defined so-called Composite Event structure combining all
the information relevant for an event. It contains the following SEM components:
1. the event instance
2. the entity instances that participate in the event (both actors and locations)
3. time instants and intervals that apply to the event
4. the relations between the event and the other elements
Since the instance representations generalise over the mentions, relations between event
components are aggregated from different mentions. For example, the time of an event can
be mentioned in one sentence whereas a participant can be mentioned in another sentence
that is coreferential. Likewise, the instance representation is the aggregation of all the
information that directly relates to the events expressed over the complete document.
The Composite Event structures are compared across different NAF files to establish
cross-document coreference. For this comparison, again all the information related to the
instances is used. In case of identify, Composite Event structures are merged, if not, they
are kept separate. Figure 23 shows an overview of the two-step approach. The NAF
files represent mentions of events (labeled as e), entities that participate in these events
(labeled as p), time expressions linked to the events (labelled as t) and locations labeled
as l. As far as mentions have identity relations they are coreferential across the mentions
in a single NAF file. NAF2SEM then first creates SEM instances from these mentions in a
single NAF file and next compares the Composite Events based on the SEM representation
across NAF files to establish cross-document identity of events on the basis of the similarity
of the action, the participants and the time. The details of this approach are described in
more detail below.
We developed two different implementations of the above general approach. One for
batch processing in which an empty KnowledgeStore is filled with the result of processing
a fixed set of NAF documents. Another for a continuous stream of NAF documents, in
which the interpretation of incoming NAF files is compared to data already stored in the
KnowledgeStore. In the former case, a file structure is used to define which Composite
Events need to be compared. In the latter case, each Composite Event extracted from a
NAF file is compared to the data in the KnowledgeStore and the result is directly stored.
We describe both processes in more detail in next subsection.
February 1, 2016
34/148
Figure 23: Identity and event relations between NAF mentions and SEM instances using
Composite Event structures
February 1, 2016
2.2.1
35/148
Event comparison in batch mode
The main steps for batch processing are:
1. ClusterEventObjects
(a) Read each NAF file one by one;
(b) Create Composite Event objects from single NAF files;
(c) Store the Composite Events into temporal buckets as NAF objects;
2. MatchEventObjects
(a) Read all Composite Events from the NAF objects in the same temporal bucket;
(b) Compare the events to decide on cross-document event coreference and merge
data if necessary;
(c) Output the Composite event data to a SEM.trig file for each temporal bucket;
3. Populate the KnowledgeStore with the SEM.trig files;
In Figure 24, we show an overview of the batch processing architecture. From single
NAF files, we extract Composite Event structures which are stored in so-called temporal
buckets for the different main types of events: contextualEvent, source and grammatical
events (see below for more details). After processing all NAF files, the binary object files in
each bucket are loaded in memory and compared for identity. Identical events are merged.
The result is stored in SEM RDF-TRiG files. When the process has finished, the SEM
RDF-TRiG files are used to populate the KnowledgeStore.
After creating the time-event relations, we use the time relation to create temporal
buckets, where we apply the following rules:
1. Events without a temporal relation or with too many temporal relations are ignored.
The threshold for the maximum number of temporal relations is now set to 5. If
more than 5 time expressions are associated with the event, we assume the temporal
relation is too complex to interpret.
2. If there is a single time relation to a time:Instant Object, we use the date to create
a folder, e.g. e-1989-10-26 to store the event data. If the time Object is of the type
time:Interval, we create a period using the begin and end points, e.g. e-2005-10-032006-10-02.
3. If there is more than one time Object related to an event we check if the relations are
sem:hasEarliestBeginTime and sem:hasEarliestEndTime. These relations implicitly
define a period without the use of an interval expression, i.e. there is no linguistic
expression of the interval but only of the begin and end of the event. If so, we create
a period bucket by first taking the begin points and then taking the end points, e.g.
e-1989-10-12-1989-10-26. Note that events that have only a begin or end point, still
get a simple event bucket similar to events with a single event relation.
February 1, 2016
36/148
Figure 24: NAF2SEM batch processing overview
4. If the multiple time Objects are not the begin or end points, we create different time
buckets for each time object and the event is duplicated to each bucket. This means
that we assume that the event can be matched with different events at different points
of time with other events.6
The above time buckets are created within the basic event type distinction between
contextual, source and grammatical events except for those events that are positioned in
the future (see below). For each event, we check the associated frames against these lists to
decide on the main event type. If no frame is associated, the event is considered as contextual. Events with the sem:hasFutureEvent relation are stored in a separate folder without
distinguishing the basic event types. Figure 25 shows the structure that is created for storing Composite Event data in batch processing mode. The events folder is subdivided into
4 subfolders: contextualEvent, sourceEvent, grammaticalEvent and futureEvent. Within
each subfolder, binary object files (.obj) are stored in temporal buckets for events anchored
to that time from each NAF file with all the data relevant for that event.
6
There is a positive and negative side effect to this strategy. If the events get different instance URIs
they get automatically split. If the URIs remain the same they get merged back into a single instance
eventually. Whether this happens depends on the order of the comparison since the event instance URI is
based on the first URI used in the comparison. A future fix of this arbitrary effect is that merging events
forces the creation of a new unique URI. A side-effect of this fix is that split events will never get merged
and are treated as distinct events
February 1, 2016
37/148
Figure 25: Example of an event type and temporal bucket structure created by the
NAF2SEM module. First the events are divided into contextual, source and grammatical events, and within these into temporal buckets. In each temporal bucket, an object
file is created for the events from each NAF file that are associated with the time. Across
the different NAF object files, a single sem.trig file is created with all the event data.
February 1, 2016
38/148
The purpose of the division into event type and temporal buckets is to compare events
for cross-document event coreference, after which the SEM RDF-TRiG files are created.
We create a sem.trig file for each bucket. We apply different strategies for the different
event types and for future events. Events are considered identical:
IFF events have the same time anchoring;
IFF events share the same dominant senses or the same lemmas;
IFF events shared sufficient participants, which varies for the type of event and whether
events are labeled as future or not;
The temporal anchoring follows from the comparison of events within the same temporal
buckets. The events with the same time anchor are first compared for the type of action
expressed. Events are derived from the coreference sets in the NAF file and each Composite
Event can be based on more than one mention of the event. These mentions can have the
same lemma and share wordnet synsets. Across the Composite Events, we first check if
they have sufficient overlap in the wordnet synsets (defined by a threshold that can be
adapted, default is set to 50%). This prevents different meanings of the same word to be
matched for the wrong reasons. If there are no wordnet synsets associated with either of
the Composite Events, we check if the lemmas sufficiently overlap across the coreference
set (this can also be set through a proportional threshold, default 50%).7
Finally, events need to share participants. What participants and how many participants need to be matched can be specified through the API by specifying the role-labels
that need to be matched. For the different types of events, we follow the following principles:
1. source events must share the PropBank A0 participant to be coreferential. This
correlates to the source of the event. The target of the source event is more difficult
to compare because the descriptions are usually longer and can vary across mentions.8
2. grammatical events typically have PropBank A1 and A2 arguments that need to
match since these arguments usually refer to the main event, e.g. They stopped the
meeting on Monday, The meeting was stopped on Monday. We also assume that the
lemmas need to match instead of synsets since the meanings of these expressions are
abstract.
3. contextual events typically can vary with respect to the participants expressed. We
typically assume that the A1 role is most informative and needs to match. Depending
on the domain other roles can be choosen.
7
We also may have the synset of Lowest-Common-Subsumers (LCS) from NAF coreference sets that
have different lemmas. The LCS represents the hypernym synset that mention pairs share. Currently, this
is not used to match events.
8
A future extension could do a word vector comparison of the target roles of these events to constraint
coreference more
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
39/148
j a v a −Xmx2000m −cp . . / l i b / E v e n t C o r e f e r e n c e −1.0−SNAPSHOT−j a r −with−d e p e n d e n c i e s . j a r
eu . n e w s r e a d e r . e v e n t c o r e f e r e n c e . n a f . C l u s t e r E v e n t O b j e c t s
−−n a f−f o l d e r ” . . / t e s t ” −−e v e n t −f o l d e r ” . . / t e s t ” −−e x t e n s i o n ” . xml ” −−p r o j e c t c a r s
−−s o u r c e −f r a m e s ” . . / r e s o u r c e s / s o u r c e . t x t ” −−g r a m m a t i c a l −f r a m e s ” . . / r e s o u r c e s / g r a m m a t i c a l . t x t ”
−−c o n t e x t u a l −f r a m e s ” . . / r e s o u r c e s / c o n t e x t u a l . t x t ”
−−non− e n t i t i e s
eu . n e w s r e a d e r . e v e n t c o r e f e r e n c e . n a f . M a t c h E v e n t O b j e c t s
−−e v e n t −f o l d e r ” . . / t e s t / e v e n t s / c o n t e x t u a l E v e n t ” −−match−t y p e i l i l e m m a −− r o l e s ” a n y r o l e ”
−−c o n c e p t −match 30 −−p h r a s e −match 10 −− i l i . . / r e s o u r c e s / i l i . t t l −−p e r s p e c t i v e
−−e v e n t −f o l d e r ” . . / t e s t / e v e n t s / s o u r c e E v e n t ” −−match−t y p e i l i l e m m a −− r o l e s ” a0 , a1 ”
−−e v e n t −f o l d e r ” . . / t e s t / e v e n t s / g r a m m a t i c a l E v e n t ” −−match−t y p e lemma −− r o l e s ” a1 , a2 ”
−−p h r a s e −match 50 −− i l i . . / r e s o u r c e s / i l i . t t l −−p e r s p e c t i v e
−−e v e n t −f o l d e r ” . . / t e s t / e v e n t s / f u t u r e E v e n t ” −−match−t y p e lemma −− r o l e s ” a l l ”
−−p h r a s e −match 80 −− i l i . . / r e s o u r c e s / i l i . t t l −−p e r s p e c t i v e
Figure 26: NAF2SEM calls for clustering and matching with different paramters
1
2
3
4
5
6
7
8
9
10
11
eu . n e w s r e a d e r . e v e n t c o r e f e r e n c e . n a f . N o C l u s t e r E v e n t O b j e c t s
−−n a f−f o l d e r ” . . / t e s t ” −−e v e n t −f o l d e r ” . . / t e s t ” −−e x t e n s i o n ” . xml ” −−p r o j e c t c a r s
−−s o u r c e −f r a m e s ” . . / r e s o u r c e s / s o u r c e . t x t ” −−g r a m m a t i c a l −f r a m e s ” . . / r e s o u r c e s / g r a m m a t i c a l . t x t ”
−−c o n t e x t u a l −f r a m e s ” . . / r e s o u r c e s / c o n t e x t u a l . t x t ”
−−non− e n t i t i e s −−t o p i c s
−−e v e n t −f o l d e r ” . . / t e s t / e v e n t s / a l l ” −−match−t y p e i l i l e m m a −− r o l e s ” a n y r o l e ” −−t i m e ”month”
Figure 27: NAF2SEM calls without clustering and matching all events
4. future events need to have identical participants. Since there is no specific time
anchoring, we can only assume identity if all other information matches.
In Figure 26 we show the current settings that are used to call the function for creating the
binary object files for NAF files (ClusterEventObjects) and the functions for processing
the temporal buckets in each type of event (MatchEventObjects):
Alternatively, one can choose also not to cluster events using the class:
eu.newsreader.eventcoreference.naf.NoClusterEventObjects
and in addition use the temporal anchoring of the events as a matching constraint, as shown
in Figure 27. In the latter case, you can set the granularity of the temporal dimension to
year, month or day. Optionally, you can use the –topic parameter to subdivide into topics
within the main folder. Note that the –topic option also works for the above event-type
and temporal clustering option.
2.2.2
Event comparison in streaming mode
The NAF2SEM stream architecture has been introduced to enable real-time, incremental
processing of NAF documents. The batch processing mode works quite solid for transformNewsReader: ICT-316404
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
40/148
nwr : i n s t a n c e s {
<nwr : d a t a / 2 0 1 4 / 0 6 / 1 1 / 5CDF−W191−F15C−G01S . xml#ev13>
a
sem : Event , n w r o n t o l o g y : c o n t e x t u a l E v e n t , wn : eng −14422488−n ,
wn : eng −13456567−n , wn : eng −13457378−n ,
f n : C h a n g e p o s i t i o n o n a s c a l e , e s o : QuantityChange , wn : eng −00203866−v ;
rdfs : label
” decline ” ;
g a f : denotedBy <nwr : d a t a / 2 0 1 4 / 0 6 / 1 1 / 5CDF−W191−F15C−G01S . xml#c h a r =33,40> .
<nwr : d a t a / c a r s / non− e n t i t i e s /30+%25>
rdfs : label
”30 %” ;
<nwr : t i m e /20140611 >
a
t i m e : day
t i m e : month
time : unitType
time : year
t i m e : unitDay ;
<nwr : d a t a / 2 0 1 4 / 0 6 / 1 1 / 5CDF−W191−F15C−G01S . xml#tmx0>
a
...
}
<nwr : d a t a / 2 0 1 4 / 0 6 / 1 1 / 5CDF−W191−F15C−G01S . xml#t r 3 > {
sem : hasTime <nwr : d a t a / 2 0 1 4 / 0 6 / 1 1 / 5CDF−W191−F15C−G01S . xml#tmx0> .
}
nwr : p r o v e n a n c e {
<nwr : d a t a / 2 0 1 4 / 0 6 / 1 1 / 5CDF−W191−F15C−G01S . xml#pr3 , r l 4 >
<nwr : d a t a / 2 0 1 4 / 0 6 / 1 1 / 5CDF−W191−F15C−G01S . xml#t r 1 0 >
g a f : denotedBy <nwr : d a t a / 2 0 1 4 / 0 6 / 1 1 / 5CDF−W191−F15C−G01S . xml#> .
<nwr : d a t a / 2 0 1 4 / 0 6 / 1 1 / 5CDF−W191−F15C−G01S . xml#pr15 , r l 3 4 >
g a f : denotedBy <nwr : d a t a / 2 0 1 4 / 0 6 / 1 1 / 5CDF−W191−F15C−G01S . xml#c h a r =296 ,330 > .
...
}
Figure 28: Example snippet of the TRiG stream generated by NAF2SEM
ing very big sets of news articles from NAF to SEM. However, in realistic scenarios the
news articles come incrementally, minute by minute, day by day. The batch architecture
is unable to handle this situation, as it requires the batch to be reprocessed each time new
NAFs need to be processed. Clearly, the streaming architecture is a promising solution for
this challenge. Once the initial set of NAF files would be processed and inserted using the
batch mode, the follow-up NAF files are being processed through a streaming architecture.
Unlike the batch architecture which mainly pays off once we have a significant amount of
NAF files, the stream architecture is meant to work with a small set of NAF files at a time
(maximum of 1000 at a time).
Figure 31 presents an overview of the streaming architecture. Once a new (set of)
NAF(s) has arrived, the NAF2SEM module is first fired to convert each NAF file into
an RDF-TRiG representation. In the streaming architecture, NAF2SEM simply creates
a TRiG set of events by looking at the current file. Similarly as in the batch mode, the
events are extracted together with their attached actors, relations and time expressions.
What is different than in the batch mode, is that the extracted events are not matched
with events coming from other news articles. An example snippet of a TRiG created by
NAF2SEM is given in Figure 28.
The output from the NAF2SEM module is then used as an input for the streaming
February 1, 2016
41/148
Table 1: Cross-document event coreference arguments for stream processing
Argument
–concept-match INT
–phrase-match INT
–contextual-match-type PATH
–contextual-lcs
–contextual-roles PATH
–source-match-type PATH
–source-lcs
–source-roles PATH
–grammatical-match-type PATH
–grammatical-lcs
–grammatical-roles PATH
–future-match-type PATH
–future-lcs
–recent-span INT
Comment
Threshold for conceptual matches of events,
default is 50.
Threshold for phrase matches of events,
default is 50.
Indicates what is used to match events across
resources. Default value is ILILEMMA.
Use lowest-common-subsumers. Default value is ON.
String with roles for which there must be a match,
e.g. ”pb:A1, sem:hasActor”
Indicates what is used to match events across resources.
Default value is ILILEMMA.
Use lowest-common-subsumers. Default value is OFF.
resources. Default value is LEMMA.
resources. Default value is ”LEMMA”
Amount of past days which are still considered recent and
are treated differently.
cross-document event coreference module. Apart from the SEM-TRiG input, the user is
also allowed to specify a rich set of matching arguments, such as match type and a list of
needed roles for each event type. The full list of input arguments for event coreference is
given in Table 1.
The cross-document event coreference module then processes the piped TRiG stream
based on the argument configuration specified by the user. This module essentially builds
a SPARQL SELECT query for each of the events from the TRiG, based on its components:
role participants, relations and time expressions (see Figure 32 for an example query). The
SPARQL query incorporates all criteria for matching of the current event with respect to
the KnowledgeStore. Once it has been composed, the query is sent to the KnowledgeStore.
Every result returned from the KnowledgeStore (if any) is a coreferential event to the event
in question. For each of these events, we create an owl:sameAs relation to the current
event, and append all the sameAs identity relations to the input TRiG in a separate graph
(nwr:identities). The nwr:identity graph is shown in Figure 29.
Now the TRiG contains all the events from the current news article and their identity
relations to already processed events in the KnowledgeStore. It can be the case that the
current event has no coreferential event yet in the KnowledgeStore: in this case, the event
has no entry in the graph with identity relations. Nevertheless, at this point the TRiG
is ready to be inserted into the KnowledgeStore. The module functionality hence finishes
with an INSERT request to the KnowledgeStore to store the resulting TRiG.
Once this is done, the new events still need to be merged to their coreferential events
(if any) in the KnowledgeStore. The remainder of the events (with no coreferential correspondent) is directly inserted in KS.
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
42/148
nwr : i d e n t i t y {
owl : sameAs <nwr : d a t a / c a r s /52 F6−KBJ1−DYWJ−P42R . xml#ev113> .
owl : sameAs <nwr : d a t a / 2 0 1 4 / 0 6 / 1 1 / 5CDF−W191−F15C−G01S . xml#ev47> .
<nwr : d a t a / 2 0 1 4 / 0 6 / 1 1 / 5CDF−W191−F15C−G01S . xml#ev50 tmx0>
}
Figure 29: Examples of owl:sameAs relations generated in streaming mode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
#!/ b i n / bash
FILES = ” . . / n a f s t o p r o c e s s / ∗ . xml ”
f o r f i n $FILES ;
do
c a t $ f | j a v a −Xmx2000m −cp . . / t a r g e t / E v e n t C o r e f e r e n c e −1.0−SNAPSHOT−j a r −with−d e p e n d e n c i e s . j a r
eu . n e w s r e a d e r . e v e n t c o r e f e r e n c e . n a f . GetSemFromNafStream −−p r o j e c t c a r s −−s o u r c e −f r a m e s
” . . / r e s o u r c e s / s o u r c e . t x t ” −−g r a m m a t i c a l −f r a m e s ” . . / r e s o u r c e s / g r a m m a t i c a l . t x t ” −−c o n t e x t u a l −f r a m e s
” . . / r e s o u r c e s / c o n t e x t u a l . t x t ” −−non− e n t i t i e s −−timex−max 5 | j a v a −Xmx2000m −cp
. . / t a r g e t / E v e n t C o r e f e r e n c e −1.0−SNAPSHOT−j a r −with−d e p e n d e n c i e s . j a r
eu . n e w s r e a d e r . e v e n t c o r e f e r e n c e . n a f . P r o c e s s E v e n t O b j e c t s S t r e a m
−−c o n t e x t u a l −match−t y p e ”ILILEMMA” −−c o n t e x t u a l −l c s −−s o u r c e −match−t y p e ”ILILEMMA”
−−s o u r c e −r o l e s ”pb : A0”
−−g r a m m a t i c a l −match−t y p e ”LEMMA” −−g r a m m a t i c a l −r o l e s ”pb : A1”
−−c o n c e p t −match 25 −−p h r a s e −match 25 > . . / t r i g s / $ f . t r i g
done
Figure 30: Script for NAF2SEM in streaming architecture
An example call to the streaming architecture is shown in Figure 30.
The streaming architecture is an end-to-end architecture spreading from NAF to the
KnowledgeStore. It turns NewsReader into a real-time incremental system, ranging from
a news article to KnowledgeStore, in the following manner:
1. A fresh news article appears
2. This file is processed by the NewsReader pipeline.
3. The pipeline creates a NAF file, which is fed to NAF2SEM.
4. NAF2SEM extracts RDF-TRiG data from NAF and pipes this to the cross-document
Event coreference module.
5. This module communicates with the KnowledgeStore and decides on the final set
of events and their coreferential relations to the KS events. This is represented in
RDF-TRiG format.
6. KS finally “digests” the new RDF, by merging the coreferential events and by simply
inserting the events which do not have coreferential event in KS.
From a scientific perspective, this architecture introduces the flexibility to adjust the
manner of event matching and facilitates experimentation with different parameters. Through
February 1, 2016
43/148
Figure 31: Overview of the NAF2SEM and event coreference stream architecture.
Figure 32: Example SPARQL query used to match events from code in the KnowledgeStore.
February 1, 2016
44/148
the list of configuration arguments, we are offered an easy option to experiment with various matching options concerning: roles, matching thresholds, matching types, etc. This list
of arguments is furthermore not finite and may easily expand/shrink as we go along. We
are planning to experiment with using lowest-common-subsumers (LCS) in the matching
of event relations. We also plan to work towards using WordNet relations for coreference.
Additionally, time is an important factor to be further researched. We may want to introduce a recency relevance, which would first try to match an event to the most recent set
of events (from the last few days). This idea comes from the fact that the news articles
about an event often spread in a rather small interval of time. Another time aspect which
we may look into is the matching of future events.
As mentioned above, this architectural solution requires certain caution with respect to
the amount of files being fed to the system. Components from this architecture may easily
become bottlenecks, for instance, the KnowledgeStore may become slow if the frequency
and number of event merges is too high.
2.3
Evaluation
In this section, we present an evaluation of the quality of the SEM-RDF built with our
approach (forthcoming: Rospocher et al. (2016)). Due to the lack of a proper gold standard
to compare with, we relied on human judgment for the triples describing some randomly
sampled events of the graph. A similar approach was applied to evaluate YAGO2 Hoffart
et al. (2013), a large knowledge graph automatically extracted from Wikipedia pages.
We conducted an evaluation of the SEM-RDF triples extracted from the NewsReader
Wikinews corpus consisting of 120 news documents. We sampled 100 events from the
resulting RDF, splitting them in 4 groups of 25 events each. For each event, we retrieved
all its triples, obtaining 4 subgraphs (labeled: S1 , S2 , S3 , S4 ) of approx. 260 triples each.
Each subgraph was submitted to a pair of human raters, which independently evaluated
each triple of their subgraph. The triples of each subgraph were presented to the raters
grouped by event, and for each event the link to its corresponding mentions in the text
were provided, so that raters were able to inspect the original text to assess the correctness
of the extracted triples. In total, 8 human raters evaluated a total of 1,043 triples of the
RDF-SEM data, with each triple independently evaluated by two raters.
Raters were given precise criteria to follow for evaluating their subgraph. For instance,
in case of an event extracted from many event mentions, raters were instructed to first
assess if all its mentions actually refer to the same event: if at least one of these mentions
is referring to an event different than the other ones, all triples of the resulting instance
have to be considered incorrect.9 This is a quite strict and potentially highly-penalizing
criterion, if considered in absolute terms from a triple perspective: one “wrong” mention
out of many coreferring mentions, potentially contributing with few triples to the event,
may hijack all the triples of the corresponding event. There were for example several
instances in which 4 mentions were identified by the pipeline as referring to the same
9
A similar criterion was adopted for cases where something was wrongly identified as an event.
February 1, 2016
45/148
Table 2: Quality triple evaluation of SEM-RDF extracted from Wikinews.
S1
S2
S3
S4
All
Triples
267
256
261
259 1043
Accuracy 0.607 0.525 0.552 0.548 0.551
κ 0.623 0.570 0.690 0.751
event instance, of which 3 were indeed referring to the same instance. Due to our strict
evaluation method, all four mentions were considered incorrect. Performing a pairwise
evaluation would have been less strict, but as our goal is to accurately extract knowledge
graphs from text, and in particular to obtain correctly structured description of events, we
believe this criterion goes in this direction.
Table 2.3 presents the resulting triple accuracy on the whole evaluation dataset, as well
as the accuracy on each subgraph composing it, obtained as average of the assessment
of the each rater pair. For each subgraph, the agreement between the rater pair is also
reported, computed according to the Cohen’s kappa coefficient (κ).
The results show an overall accuracy of 0.551, varying between 0.525 and 0.607 on each
subgraph. The Cohen’s kappa values, ranging from 0.570 and 0.751, show a substantial
agreement between the raters of each pair. Drilling down these numbers on the type of
triples considered — typing triples (rdf:type), annotation triples (rdfs:label), participation
triples (properties modelling event roles according to PropBank, FrameNet, and ESO), the
accuracy on annotation triples is higher (0.772 on a total of 101 triples), while it is slightly
lower for typing (0.522 on 496 triples) and participation triples (0.534 on 446 triples).
Indeed, further drilling down on participation triples, the accuracy is higher for PropBank
roles (0.559) while it is lower on FrameNet (0.438) and ESO roles (0.407), which reflects
the fact that the SRL tool used is trained on PropBank, while FrameNet and ESO triples
are obtained via mapping.
Looking at the event candidates in the evaluation dataset, 69 of them (out of 100) were
confirmed as proper events by both raters. Of the 17 candidate coreferring events (i.e.
those having multiple mentions), only 4 of them were marked as correct by both raters
(i.e. both raters stated that all mentions were actually referring to the same event) while
in a couple of cases an event was marked as incorrect because of one wrong mention out
of 4, thus causing all the triples of the event to be marked as incorrect. To remark the
aforementioned strict evaluation criterion adopted, we note that ignoring all coreferring
events (and their corresponding triples) in the evaluation dataset, the triple accuracy rises
to 0.697 on a total of 782 triples. Table 3 shows the details for both the full evaluation
and when ignoring the even-coreference effects.
Finally, these numbers need to be interpreted in relation to the task-specific evaluation
of event detection and event coreference on benchmark corpora. In Wikinews recall of
event mention detection is about 77% against the gold annotation of events, while within
document event-coreference has a precision of 40.7%. This is a different type of evaluation
February 1, 2016
46/148
Table 3: Detailed quality triple evaluation of SEM-RDF extracted from Wikinews with
and without taking even-coreference into account.
Group 1
Trp
Avg
κ
ALL
267
0.607
0.623
TYPES
122
0.594
0.649
LABELS
28
0.768
0.7
ROLES
117
0.581
0.591
PROPBANK
39
0.628
0.629
FRAMENET
77
0.461
0.559
ESO
26
0.5
0.692
Without Event Coreference:
ALL
207
0.732
0.447
TYPES
91
0.731
0.483
LABELS
23
0.87
0.617
ROLES
93
0.699
0.419
PROPBANK
32
0.734
0.472
FRAMENET
56
0.58
0.387
ESO
17
0.706
0.433
Trp
256
115
26
115
39
73
25
188
91
20
77
25
58
20
Group 2
Avg
κ
0.525
0.57
0.539
0.585
0.788
0.661
0.452
0.509
0.423
0.633
0.438
0.445
0.4
0.5
0.601
0.566
0.925
0.558
0.56
0.457
0.375
0.491
0.578
0
0.37
0.516
0.41
0.468
Trp
261
137
24
100
30
90
38
184
96
17
71
20
70
21
Group 3
Avg
κ
0.552
0.69
0.504
0.65
0.729
0.684
0.575
0.735
0.583
0.796
0.489
0.645
0.368
0.435
0.731
0.672
0.912
0.768
0.825
0.607
0.571
0.683
0.649
0.638
0.724
0.828
0.615
0.438
Trp
259
122
23
114
28
105
29
203
93
19
91
23
79
21
Group 4
Avg
κ
0.548
0.751
0.578
0.748
0.804
0.862
0.465
0.718
0.554
0.928
0.424
0.63
0.345
0.545
0.7
0.758
0.974
0.582
0.674
0.563
0.476
0.625
0.561
0
0.639
0.901
0.512
0.432
Overall
Trp
Avg
1043
0.551
496
0.522
101
0.772
446
0.534
136
0.559
345
0.438
118
0.407
782
371
79
332
100
263
79
0.697
0.663
0.937
0.678
0.72
0.548
0.557
since the annotators only labelled the important events and systems tend to recognise all
possible events in text, which down-ranks the precision. We cannot expect coreference
results within a document and cross-document to exceed the recall of event mentions. In
the next section, we describe the evaluation for cross-document event coreference.
February 1, 2016
3
47/148
Event Coreference
Event coreference resolution is the task of determining whether two event mentions refer
to the same event instance. In this section, we describe a new, robust “bag of events”
approach to cross-textual event coreference resolution on news articles. We discuss two
variations of the approach, a one- and a two-step implementation.
We first delineate the approach and then describe some experiments with the new
method on the ECB+ data set. 10 In the following section, we describe the current
implementation within the NewsReader pipeline.
3.1
3.1.1
Bag of Events Approach
The Overall Approach
It is pretty much common practice to use information coming from event arguments for
event coreference resolution (Humphreys et al. (1997), Chen and Ji (2009a), Chen and Ji
(2009b), Chen et al. (2011), Bejan and Harabagiu (2010a), Lee et al. (2012), Cybulska
and Vossen (2013b), Liu et al. (2014) amongst others). The research community seems
to agree that event context information regarding time and place of an event as well as
information about other participants play an important role in resolution of coreference
between event mentions. Using entities for event coreference resolution is complicated by
the fact that event descriptions within a sentence often lack pieces of information. As
pointed out by Humphreys et al. (1997) it could be the case however that a lacking chunk
of information might be available elsewhere within discourse borders. News articles, which
are the focus of the NewsReader project, can be seen as a form of public discourse (van
Dijk (1988)). As such the news follows the Gricean Maxim of quantity (Grice (1975)).
Authors do not make their contribution more informative than necessary. This means that
information previously communicated within a unit of discourse, unless required, will not
be mentioned again. This is a challenge for models comparing separate event mentions
with one another on the sentence level. To be able to fully make use of information coming
from event arguments, instead of looking at event information available within the same
sentence, we propose to take a broader look at event descriptions surrounding the event
mention in question within a unit of discourse. For the purpose of this study, we consider
a document (a news article) to be our unit of discourse.
We experimented with an “event template” approach which employs the structure
of event descriptions for event coreference resolution. In the proposed heuristic event
mentions are looked at through the perspective of five slots, as annotated in the ECB+
dataset created within the NewsReader project (Cybulska and Vossen (2014b)). The five
slots correspond to different elements of event information such as the action slot (or event
trigger following the ACE (LDC (2005)) terminology) and four kinds of event arguments:
time, location, human and non-human participant slots (see Cybulska and Vossen (2014)).
10
The experiments with the two-step bag of events approach that are reported in this section are described in Cybulska and Vossen (2015).
February 1, 2016
48/148
The ECB+ corpus is used in the experiments described here. Our approach determines
coreference between descriptions of events through compatibility of slots of the five slot
template. The next quote shows an excerpt from topic one, text number seven of the ECB
corpus (Bejan and Harabagiu (2010b)).
The “American Pie” actress has entered Promises for undisclosed reasons.
The actress, 33, reportedly headed to a Malibu treatment facility on Tuesday.
Consider two event templates presenting the distribution of event information over the five
event slots in the two example sentences (tables 4 and 5).
An event template can be created on different levels of information, such as a sentence,
a paragraph or an entire document. We propose a novel “bag of events” approach to
event coreference that explicitly employs event- and discourse structure to account for
implications of Gricean Maxim of quantity. The approach fills in two event templates: a
sentence and a document template. A “sentence template” collects event information
from the sentence of an active action mention (tables 4 and 5). By filling in a “document
template”, one creates a “bag of events” for a document, that could be seen as a kind
of document “summary” (table 6). The bag of events heuristic employs clues coming
from discourse structure and namely those implied by discourse borders. Descriptions of
different event mentions occurring within a discourse unit, whether coreferent or related
in some other way, unless stated otherwise, tend to share elements of their context. In our
example text fragment the first sentence reveals that an actress has entered a rehab facility.
From the second sentence the reader finds out where the facility is located (Malibu) and
when the “American Pie” actress headed to the treatment center. It is clear to the reader
of the example text fragment from the quotation that both events described in sentence
one and two, happened on Tuesday. Also both sentences mention the same rehab center
in Malibu. These observations are crucial for the “bag of events” approach proposed here.
The bag of events method can be implemented as a one- or two-step classification. In
a two-step approach bag of events (document) features are used for preliminary document
clustering. Then per document cluster coreference is solved between action mentions in a
pairwise model, based on information available in the sentence. In a one-step implementation bag of events features are added to sentence-based feature vectors generated per
action mention. Coreference is solved by a classifier in a pairwise model.
3.1.2
Two-step Bag of Events Approach
As the first step of the approach a document template is filled, accumulating instances of
the five event slot mentions from a document, as exemplified in table 6. Pairs of document
templates are clustered by means of supervised classification. In the second step of the
approach coreference is solved between event mentions within document clusters created
in step 1. For this task again an event template is filled but this time, it is a “sentence
template” which per event mention gathers information from the sentence. A supervised
classifier solves coreference between pairs of event mentions and finally pairs sharing comNewsReader: ICT-316404
February 1, 2016
49/148
mon mentions are chained into coreference clusters. Figure 33 depicts the implications of
the approach for the training data. Figure 34 presents how the test set is processed.
Table 4: Sentence template ECB topic 1, text 7, sentence 1
Action
Time
Location
Human Participant
Non-Human Participant
entered
N/A
Promises
actress
N/A
Table 5: Sentence template ECB topic 1, text 7, sentence 2
Action
Time
Location
Human Participant
headed
on Tuesday
to a Malibu treatment facility
actress
N/A
Table 6: Document template ECB topic 1, text 7, sentences 1-2
Action
Time
Location
Human Participant
3.1.3
entered, headed
on Tuesday
Promises, to a Malibu treatment facility
actress
N/A
Step 1: Clustering Documents Using Bag of Events Features
The first step in this approach is filling in an event template per document. We create a
document template by collecting mentions of the five event slots: actions, locations, times,
human and non-human participants from a single document. In a document template there
is no distinction made between pieces of event information coming from different sentences
of a document and no information is kept about elements being part of different mentions.
A document template can be seen as a bag of events and event arguments. The template
stores unique lemmas, to be precise a set of unique lemmas per event template slot. On
the training set of the data, we train a pairwise binary classifier determining whether two
document templates share corefering event mentions.
This is a supervised learning task in which we determine “compatibility” of two document templates if any two mentions from those templates were annotated in the corpus
as coreferent. Let m be an event mention, and doc a collection of mentions from a single
document template such that {mı : 1 ≤ ı ≤ doc} where ı is the index of a mention and
February 1, 2016
50/148
Table 7: ECB+ statistics
ECB+
Topics
Texts
Action mentions
Location mentions
Time mentions
Human participant mentions
Non-human participant mentions
Coreference chains
#
43
982
6833
1173
1093
4615
1408
1958
Figure 33: Bag of events approach - training set processing
Figure 34: Bag of events approach - test set processing
February 1, 2016
51/148
 indexes document templates; doc : 1 ≤  ≤ DOC where DOC are all document templates from the corpus. Let ma and mb be mentions from different document templates.
“Compatibility” of a pair of document templates (doc, doc+1 ) is determined based on
coreference of any mentions (maı, mbı) from a pair of document templates such that:
coref erence(∃maı ∈ doc, ∃mbı ∈ doc+1 ) =⇒ compatibility (doc, doc+1 ).
On the training set we train a binary decision tree classifier (hereafter DT ) to find
pairs of document templates containing corefering event mentions. After all unique pairs
of document templates from the test set have been classified by means of the DT document
template classifier, “compatible” pairs are merged into document clusters based on pair
overlap.
3.1.4
Step 2: Clustering Sentence Templates
The aim of the second step is to solve coreference between event mentions from document
clusters which are the output of the classification task from step 1. We experiment with a
supervised decision tree sentence template classifier but this time in the classification task
pairs of sentence templates are considered. A sentence template is created for every action
mention annotated in the data set (see examples of sentence templates in table 4 and 5).
All possible unique pairs of event mentions (and their sentence templates) are generated
within clusters of document templates sharing corefering event mentions in the training
set. Pairs of sentence templates that translate into features indicating compatibility across
five template slots are used to train a DT sentence template classifier. On the test set; after
output clusters of the DT document template classifier from step 1 are turned to mention
pairs (all unique pairs within a document cluster), pairs of sentence templates are classified
by means of the DT sentence template classifier. To identify the final equivalence classes
of corefering event mentions, within each document cluster, event mentions are grouped
based on corefering pair overlap.
3.1.5
One-step Bag of Events Approach
In the one-step implementation of the approach all possible unique pairs of action mentions
from the corpus are used as the starting point for classification. No initial document
clustering is performed. For every action mention a sentence template is filled (see examples
in table 4 and 5). Also, for every corpus document a document template is filled. Five
bag of events features indicating the degree of overlap between documents, from which two
active mentions come from, are used for classification. In the one-step approach document
features are used by a classifier together with sentence-based features; all in one go. One DT
classifier is trained to determine event coreference. Pairs of mentions are classified based
on a mix of information from a sentence and from a document. Corefering pairs with
overlap are merged into equivalence classes. The one-step classification is implementationwise simpler but it is computationally much more expensive. Ultimately every action
mention has to be compared with every other action mention. This is a drawback of the
one-step method. On the other hand, it could be of advantage to have different types of
February 1, 2016
52/148
information (sentence- and document-based) available simultaneously to determine event
mention coreference.
3.1.6
Corpus
For the experiments we used true mentions from the ECB+ corpus (Cybulska and Vossen
(2014b)) which is an extended and re-annotated version of the ECB corpus (Bejan and
Harabagiu (2010b)). ECB+ is particularly interesting for this experiment because we
extended the ECB topics with texts about different event instances but from the same
event type (see Cybulska and Vossen (2014)). For example in addition to the earlier
mentioned topic of a celebrity checking into a rehab, we added descriptions of another
event involving a different celebrity checking into another rehab facility. Likewise, we
increased the referential ambiguity for the event mentions. Since the events are similar,
we expect that the only way to solve this is through analysis of the event slots. Figure
35 shows some examples of the seminal events represented in ECB+ with different event
instances.
Figure 35: Overview of seminal events in ECB and ECB+, topics 1-10
For the experiments on event coreference we used a subset of ECB+ annotations (based
on a list of 1840 selected sentences), that were additionally reviewed with focus on coreference relations. Table 7 presents information about the data set used for the experiments.
We divided the corpus into a training set (topics 1-35) and test set (topics 36-45).
3.1.7
Experimental Set Up
The ECB+ texts are available in the XML format. The texts are tokenized, hence no sentence segmentation nor tokenization needed to be done. We POS-tagged (for the purpose
of proper verb lemmatization) and lemmatized the corpus sentences. For the experiments
we used tools from the Natural Language Toolkit (Bird et al. (2009), NLTK version 2.0.4):
February 1, 2016
53/148
the NLTK’s default POS tagger, Word-Net lemmatizer11 as well as WordNet synset assignment by the NLTK12 . For machine learning experiments we used scikit-learn (Pedregosa
et al. (2011)).
Table 8: Features grouped into four categories: L-Lemma based, A-Action similarity, Dlocation within Discourse, E-Entity coreference and S-Synset based.
Event Slot
Action
Location
Mentions
Active
mentions
Sent. or doc.
mentions
Sent. or doc
mentions
Time
Sent. or doc
mentions
Human
Participant
Sent. or doc
mentions
NonHuman
Participant
Sent. or doc
mentions
Feature Kind
Lemma overlap (L)
Synset overlap (S)
Action similarity (A)
Discourse location (D)
- document
- sentence
Lemma overlap (L)
Synset overlap (S)
Lemma overlap (L)
Entity coreference (E)
Synset overlap (S)
Lemma overlap (L)
Synset overlap (S)
Lemma overlap (L)
Synset overlap (S)
Lemma overlap (L)
Synset overlap (S)
Explanation
Numeric feature: overlap %.
Numeric: overlap %.
Numeric: Leacock and Chodorow.
Binary:
- the same document or not.
- the same sentence or not.
Numeric: overlap %.
Numeric: overlap %.
Numeric: overlap %.
Numeric: cosine similarity.
Numeric: overlap %.
Numeric: overlap %.
Numeric: overlap %.
Numeric: overlap %.
Numeric: overlap %.
Numeric: overlap %.
Numeric: overlap %.
In the experiments different features were assigned values per event slot (see Table
8). The lemma overlap feature (L) expresses a percentage of overlapping lemmas between
two instances of an event slot, if instantiated in the sentence or in a document (with the
exclusion of stop words). Frequently, one ends up with multiple entity mentions from the
same sentence for an action mention (the relation between an action and involved entities
is not annotated in ECB+). All entity mentions from the sentence (or a document in
case of bag of events features) are considered. There are two features indicating action
mentions’ location within discourse (D), specifying if two active mentions come from the
same sentence and the same document. Action similarity (A) was calculated for a pair of
11
12
www.nltk.org/modules/nltk/stem/wordnet.html
http://nltk.org/ modules/nltk/corpus/reader/wordnet.html
February 1, 2016
54/148
active action mentions using the Leacock and Chodorow measure Leacock and Chodorow
(1998). Per entity slot (location, time, human and non-human participant) we checked if
there is coreference between entity mentions from the sentence of the two compared actions;
we used cosine similarity to express this feature (E). For all five slots a percentage of synset
overlap is calculated (S). In case of document templates features referring to active action
mentions were disregarded, instead action mentions from a document were considered. All
feature values were rounded to the first decimal point.
We experimented with a few feature sets, considering per event slot lemma features
only (L), or combining them with other features described in Table 8. Before fed to a
classifier, missing values were imputed (no normalization was needed for the scikit-learn
DT algorithm). All classifiers were trained on an unbalanced number of pairs of examples
from the training set. We used grid search with ten fold cross-validation to optimize the
hyper-parameters (maximum depth, criterion, minimum samples leafs and split) of the
decision-tree algorithm.
3.1.8
Baseline
We will look at two baselines: a singleton baseline and a rule-based lemma match baseline.
The singleton baseline considers event coreference evaluation scores generated taking into
account all action mentions as singletons. In the singleton baseline response there are no
“coreference chains” of more than one element. The rule-based lemma baseline generates
event coreference clusters based on full overlap between lemma or lemmas of compared
event triggers (action slot) from the test set. Table 10 presents baselines’ results in terms of
recall (R), precision (P) and F-score (F) by employing the coreference resolution evaluation
metrics: MUC (Vilain et al. (1995)), B3 (Bagga and Baldwin (1998)), CEAF (Luo (2005)),
BLANC (Recasens and Hovy (2011)), and CoNLL F1 (Pradhan et al. (2011)). When
discussing event coreference scores must be noted that some of the commonly used metrics
depend on the evaluation data set, with scores going up or down with the number of
singleton items in the data Recasens and Hovy (2011). Our singleton baseline gives us zero
scores in MUC, which is understandable due to the fact that the MUC measure promotes
longer chains. B3 on the other hand seems to give additional points to responses with
more singletons, hence the remarkably high scores achieved by the baseline in B3. CEAF
and BLANC as well as the CoNLL measures (the latter being an average of MUC, B3 and
entity CEAF) give more realistic results. The lemma baseline reaches 62% CoNLL F1. A
baseline only considering event triggers, will allow for an interesting comparison with our
event template approach, employing event argument features.
3.1.9
Results
Table 9 evaluates the final clusters of corefering event mentions produced in the experiments
by means of the DT algorithm when employing different features.
When considering bag of events classifiers using exclusively lemma features L (row two
and three), the two-step approach reached a 1% higher CoNLL F-score than the one-step
February 1, 2016
55/148
approach with document-based lemma features (docL). The one-step method achieved in
BLANC a 2% better precision but a 2% lower recall. This is understandable. In a two-step
implementation when document clusters are created some precision is lost. In a one-step
classification specific sentence information is always available for the classifier hence we see
slightly higher precision scores (also in other metrics).
The best coreference evaluation scores with the highest CoNLL F-score of 73% and
BLANC F of 72% were reached by the two-step bag of events approach with a combination
of the DT document classifier using feature set L (document-based hence docL) across five
event slots and the DT sentence classifier when employing features LDES (see Table 8 for
a description of features). Adding action similarity (A) on top of LDES features in step
two, does not make any difference on decision tree classifiers with a maximum depth of 5
using five slot templates. Our best CoNLL F-score of 73% is an 11% improvement over
the strong rule based event trigger lemma baseline, and a 34% increase over the singleton
baseline.
Table 9: Bag of events approach to event coreference resolution, evaluated on the ECB+
in MUC, B3, mention-based CEAF, BLANC and CoNLL F measures.
Alg
DT
DT
DT
DT
Step1
Slot
Nr
5
5
2
5
Features
docL
docL
docL
docL
Alg
DT
DT
DT
DT
DT
DT
Step2
Slot
FeaNr
tures
5
L
5
L+docL
5
L
5
LDES
2
LDES
5
LADES
R
61
65
71
71
76
71
MUC
P
F
76
80
75
75
70
75
68
71
73
73
73
73
R
B3
P
F
CEAF
F
R
66
68
71
71
74
71
79
83
77
78
68
78
72
75
74
74
71
74
61
64
64
64
61
64
67
69
71
72
74
72
BLANC
P
F
69
73
71
71
68
71
68
71
71
72
70
72
CoNLL
F
70
72
73
73
70
73
Table 10: Baseline results on the ECB+: singleton baseline and lemma match of event
triggers evaluated in MUC, B3, mention-based CEAF, BLANC and CoNLL F.
Baseline
Singleton Baseline
Action Lemma Baseline
R
0
71
MUC
P
F
0
0
60
65
R
45
68
B3
P
100
58
F
62
63
CEAF
R/P/F
45
51
R
50
65
BLANC
P
F
50
50
62
63
CoNLL
F
39
62
To quantify the contribution of document features, we contrast the results of classifiers
using bag of events features with scores achieved when disregarding document features.
The results reached with sentence template classification only (without any document
features, row one in table 9), give us some insights into the impact of the document
features on our experiment. Note that one-step classification without preliminary document
template clustering is computationally much more expensive than a two-step approach,
which ultimately takes into account much less item pairs thanks to the initial document
template clustering. The DT sentence template classifier trained on an unbalanced training
February 1, 2016
56/148
set reaches 70% CoNLL F. This is 8% better than the strong baseline disregarding event
arguments, but only 3% less than the two-step bag of events approach and 2% less than
the one-step classification with document features. The reason for the relatively small
contribution by document features could owe to the fact that in the ECB+ corpus not
that many sentences are annotated per text. 1840 sentences are annotated in 982 corpus
texts, i.e. 1.87 sentence per text. We expect that the impact of document features would
be bigger, if more event descriptions from a discourse unit were taken into account than
only the ground truth mentions.
We run an additional experiment with the two-step approach in which four entity types
were bundled into one entity slot. Locations, times, human and non-human participants
were combined into a cumulative entity slot resulting in a simplified two-slot template.
When using two-slot templates for both, document and sentence classification on the ECB+
70% CoNLL F score was reached. This is 3% less than with five-slot templates.
Table 11: Best scoring two-step bag of events approach, evaluated in MUC, B3, entitybased CEAF, BLANC and CoNLL F in comparison with related studies. Note that the
BOE approach uses gold while related studies system mentions.
Approach
Data
B&H
LEE
BOE-2
BOE-5
BOE-2
BOE-5
ECB B&H 2010
ECB Lee et al. 2012
ECB annot. ECB+
ECB annot. ECB+
ECB+
ECB+
Model
HDp
LR
DT+DT
DT+DT
DT+DT
DT+DT
MUC
R
52
63
65
64
76
71
P
90
63
59
52
70
75
B3
F
66
63
62
57
73
73
R
69
63
77
76
74
71
P
96
74
75
68
68
78
CEAF
F
80
68
76
72
71
74
F
71
34
72
68
67
71
BLANC
R
NA
68
66
65
74
72
P
NA
79
70
66
68
71
CoNLL
F
NA
72
67
65
70
72
F
NA
55
70
66
70
73
To the best of our knowledge, the only related study using clues coming from discourse
structure for event coreference resolution was done by Humphreys et al. (1997) who perform
coreference merging between event template structures. Both approaches determine event
compatibility within a discourse representation but we achieve that in a different way, with
a much more restricted template (five slots only) which in our two-step approach facilitates
merging of all event and entity mentions from a text as the starting point. Humphreys et
al. consider discourse events and entities for event coreference resolution while operating
on the level of mentions, more similar to our one-step approach. They did not report any
event coreference evaluation scores.
Some of the metrics used to score event coreference resolution are dependent on the
number of singleton events in the evaluation data set (Recasens and Hovy, 2011). Hence
for the sake of a meaningful comparison it is important to consider similar data sets. The
ECB and ECB+ are the only available resources annotated with both: within- and crossdocument event coreference. To the best of our knowledge no baseline has been set yet
for event coreference resolution on the ECB+ corpus. So in Table 11 we will also look at
results achieved on the ECB corpus which is a subset of ECB+, and so the closest to the
data set used in our experiments but capturing less ambiguity of the annotated event types
February 1, 2016
57/148
(Cybulska and Vossen, 2014b). We will focus on the CoNLL F measure that was used for
comparison of competing coreference resolution systems in the CoNLL 2011 shared task.
The best results of 73% CoNLL F were achieved on the ECB+ by the two-step bag of
events approach using five slot event templates (BOE-5 in Table 11). When using twoslot templates we get 3% less CoNLL F on ECB+. For the sake of comparison, we run
an additional experiment on the ECB part of the corpus (annotation by Cybulska and
Vossen (2014b)). The ECB was used in related work although with different versions of
annotation so not entirely comparable. We run two tests, one with the simplified templates
considering two slots only: action and entity slot (as annotated in the ECB by Lee et al.
(2012)) and one with five-slot templates. The two slot bag of events (BOE-2 ) on the ECB
part of the corpus reached comparable results to related works: 70% CoNLL F, while the
five-slot template experiment (BOE-5 ) results in 66% CoNLL F. The approach of Lee et
al. (2012) (LEE ) using linear regression (LR) reached 55% CoNLL F although on a much
more difficult task entailing event extraction as well. The component similarity method
of Cybulska and Vossen (2013b) resulted in 70% CoNLL F but on a simpler within topic
task (not considered in Table 11). B&H in the table refers to the approach of Bejan and
Harabagiu (2010) using hierarchical Dirichlet process (in the table referred to by HDp); for
this study no CoNLL F was reported. In the BOE experiments reported in Table 11 we
used the two-step approach. During step 1 only (document based) lemma features (docL)
were used and for sentence template classification (step 2) LDES features were employed.
In the tests with the bag of events approach, ground truth mentions were used.
3.1.10
Conclusion
In this section we experimented with two variations of a new bag of events approach to event
coreference resolution: a one-step method and a higher scoring two-step heuristic. Instead
of performing topic classification before solving coreference between event mentions, as
is done in most studies, the two-step bag of events approach first compares document
templates created per discourse unit and only after that, does it compare single event
mentions and their arguments. In contrast to a heuristic using a topic classifier, that
might have problems distinguishing between different instances of the same event type, the
bag of events approach facilitates context disambiguation between event mentions from
different discourse units. Grouping events depending on compatibility of event context
(time, place and participants) on the discourse level, allows one to take advantage of event
context information, which is mentioned only once per unit of discourse and consequently is
not always available on the sentence level. From the perspective of performance, the robust
two-step bag of events approach using a very small feature set, also significantly restricts
the number of compared items. Therefore, it has much lower memory requirements than
a pairwise approach operating on the mention level. Given that this approach does not
consider any syntactic features and that the evaluation data set is only annotated with 1.8
sentences per text, the evaluation results are highly encouraging.
February 1, 2016
3.2
58/148
Evaluation of the NewsReader pipeline
The ’bag-of-events’ approach described in the previous sections uses the annotations of the
event components. By taking the gold annotation, we can more purely evaluate the impact
of the approach without any impact of the other processing that may introduce errors. In
this section, we describe the performance of the NewsReader pipeline as is on the same
ECB+ data set starting from the text. The same tokenized text was processed with the
NewsReader pipeline version 3.0 (Agerri et al. (2015)). There were 4 files out of 982 for
which the pipeline gave no output. This resulted in 20 events that were not recovered.
The starting point for the cross-document coreference is the intra-document coreference
layer in NAF. Event coreference sets are generated using the EventCoreference module that
makes a distinction between the event types contextualEvent, sourceEvent and grammaticalEvent. For the latter two, no coreference relations are generated within the same document. For contextualEvents, we first group all lemmas into a candidate coreference set,
next decide on the dominant sense of the lemma and finally measure the similarity across
candidate coreference sets. If the dominant senses of lemmas are sufficiently similar, the
candidate sets are merged. Dominant senses are derived from all occurrences of a lemma
in a document by cumulating the WSD score of each occurrence. We take those senses
with the 80% highest cumulated WSD scores as the dominant senses (program setting: –
wsd 0.8). When we compare different lemma-based coreference sets we use these dominant
senses to measure similarity in WordNet according to the Leacock-Chodorow method (Leacock and Chodorow (1998)). We use the hypernym relations and the cross-part-of-speech
event relations from WordNet to establish similarity (program setting: –wn-lmf wneng30.lmf.xml.xpos). The threshold for similarity was set to 2.0 (program setting –sim 2.0). If
different lemmas are considered coreferential, we store the lowest-common-subsumer synset
that established the similarity as an external reference in the coreference set. Once all the
coreference sets are established (both singletons and multiforms), we add all the hypernym
synsets for the dominant senses as external references.
For the cross-document evaluation, we merged all the ECB+ CoNLL files from each
topic into a single key file, effectively mixing ECB and ECB+ files into a single task. Since
ECB+ reflects a systematic referential ambiguity for seminal topic events, we thus create a
task that reflects this ambiguity as well. Since ECB has 43 different topics, 43 unique key
files were created. Below, we show a fragment from such a key file for topic 1 in which Tara
Reid is checking into rehab in the ECB file number 10, whereas Lindsay Lohan is checking
into rehab in the ECB+ file number 10. The identifiers in the key file are created using
the CROMER tool across the different documents, each annotated in NAF, as explained
in Cybulska and Vossen (2014b).13
#begin document (1);
1_10ecb 0 1 Perennial 1_10ecb 0 2 party 1_10ecb 0 3 girl 1_10ecb 0 4 Tara -
13
Note that we reduced multiword phrases such as checked into to the first token only, since the NewsReader system does not mark multiwords as events. Hence in the example below, we removed the idenitifer
(132016236402809085484) from token 1 10ecb 0 8 into.
February 1, 2016
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
0
0
0
0
0
0
0
0
0
0
0
0
0
5 Reid 6 checked (132016236402809085484)
7 herself 8 into 9 Promises 10 Treatment 11 Center 12 ,13 her 14 rep 15 told (132016235311629112331)
16 People 17 . -
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
59/148
A friend of the actress told (110088372)
People she went (132016236402809085484)
to Promises on Tuesday and that her friends and family supported (110088386)
her decision (132016236402809085484)
. -
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1
1
1
1
1
1
1
30
31
32
33
34
35
36
Lindsay Lohan checks (132015738091639707092)
into Betty Ford Center -
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
After skipping out on entering (132015832182464413376)
a Newport Beach rehabilitation facility and facing (132015992713150306962)
the prospect of arrest for violating (132015992916565253818)
her probation (132015993252785693471)
,Lindsay Lohan has checked (132015738091639707092)
into the Betty Ford Center to begin (132015992988761097172)
a 90 -day court -mandated stay (132015736700251185985)
in her -
February 1, 2016
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
3
3
3
3
83
84
85
86
60/148
reckless driving (132015732766174435569)
conviction (132015993650409892802)
. -
The NAF2SEM module from NewsReader was used to generate SEM-RDF files for
each topic by processing all the NAF files within that topic. From the SEM-RDF files, we
extract unique numerical identifiers from the event instances and insert them in a CoNLL
response file for the tokens that form the mentions of the event. Below we show the same
fragment for topic 1 with the output from the NewsReader system added to the tokens:
#begin document (1);
1_10ecb 0 1 Perennial 1_10ecb 0 2 party 1_10ecb 0 3 girl 1_10ecb 0 4 Tara 1_10ecb 0 5 Reid 1_10ecb 0 6 checked (139)
1_10ecb 0 7 herself 1_10ecb 0 8 into 1_10ecb 0 9 Promises 1_10ecb 0 10 Treatment 1_10ecb 0 11 Center 1_10ecb 0 12 ,1_10ecb 0 13 her 1_10ecb 0 14 rep 1_10ecb 0 15 told (139)
1_10ecb 0 16 People 1_10ecb 0 17 . 1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
1_10ecb
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
A friend of the actress told (139)
People she went to Promises (445)
on Tuesday and that her friends and family supported (239)
her decision (18)
. -
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1
1
1
1
1
1
1
30
31
32
33
34
35
36
Lindsay Lohan checks (153)
into Betty Ford Center -
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
After skipping (399)
out on entering (173)
a Newport Beach rehabilitation facility and facing (499)
the prospect (265)
of arrest (42)
for violating (138)
her -
February 1, 2016
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
1_10ecbplus
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
61/148
probation (367)
,Lindsay Lohan has checked (153)
into the Betty Ford Center to begin (464)
a 90 -day court -mandated stay (407)
in her reckless driving conviction (10)
. -
We used the latest version (v8.01) of the official CoNLL scorer (Luo et al., 2014) to
compare the response output with the key data. We generate the BLANC score for each
topic and then macro average the results across all topics. In the next subsection, we
describe the results for the NewsReader pipeline without any adaptation. We vary the
parameters in the NAF2SEM module to see the impact on the performance. In subsection
3.2.2, we measure the result of augmenting the event detection in the NewsReader pipeline
output.
3.2.1
NewsReader output
We first applied the NAF2SEM system on the NewsReader pipeline output as is. Since
ECB+ is an experimental data set with systematic but limited ambiguity (each seminal
event mention within a topic can have two referential values), we ran the NAF2SEM
system by comparing all CompositeEvents within each topic, using temporal anchoring as
an additional parameter. The CompositeEvent RDF data consist of:
• Event action data
– the WordNet synsets and ontology types associated to all event mentions based
on the intra-document event coreference sets
– the labels used to make reference to the event action and the most frequent
label as the preferred label
– all the mentions in the documents
• Participant data
– the URI of the participant, either derived from DBpedia or created from the
phrase as an entity or non-entity
February 1, 2016
62/148
– the labels used to make reference to the entity and the most frequent label as
the preferred label
• Time data
– the URI of the time expression
– the labels used to make reference to the time
– the owl-time object that represents the normalised ISO value for the time expression, with a specification of at least the year and possibly the month and
day
• SEM relations
– Any specific PropBank, FrameNet or ESO role between event actions and participants
– Any sem:hasTime relations between event actions and time expressions
When comparing the CompositeEvents across documents, we can use any of the above
features to compare events. We experimented with a number of parameters to measure
their impact on the performance of the system:
Event matching Event mentions across documents need to match proportionally in terms
of associated synsets, lemmas or combinations of these
Temporal anchoring No temporal constraint, same year, same month or same day
Participant match No participant constraint, at least one participant needs to match,
either through the URI or through the preferred label
For Event matching, we can apply different strategies: compare the WordNet synsets
(program parameter: –match-type ili), lemmas of mentions (program parameter: –matchtype lemma) or a combination of synsets and lemmas (program parameter: –match-type
ililemma). For using synsets associated with the coreference sets, we can choose the dominant senses, the lowest-common-subsumers (program parameter: –lcs) and the hypernyms
of the dominant senses ((program parameter: –hypers).
For the Temporal anchoring, we can set the granularity to years, months and days.
This means that, depending on the amount of detail of the time anchoring, events need to
have matching time according to these settings. When we leave out the –time value, the
time anchoring is not considered.
Finally for participants, we can define the precise roles to be matched (PropBank,
VerbNet, FrameNet, ESO), any role to be matched or none. To match, the program
February 1, 2016
63/148
considers the role label and the role object separately. If the label is specified (e.g. program
parameter: –roles a0,a1,a2), we need to match at least one participant object with that
label. If no label is specified (program parameter: –roles anyrole), at least one participant
needs to match regardless of the role label. Matching the objects is done on the basis of the
URI. If the URIs do not match, we check of the preferred label of a participant is among
the labels of the other participant.
Event coreference depends on the detection of the events. The coreference scorer (Luo
et al., 2014) also gives the scores for the detection of the mentions. Without adaptation,
the event mention recall for the NewsReader output is 72,66% and the precision is 66,29%
(F1 69,01%). For the results below, we cannot expect the coreference to perform above
the results for the event mention detection. The event mention detection defines the upper
bound.
We first experimented with the proportion of WordNet synsets (program parameter:
–match-type ili, –hypers, –lcs) that need to match across two CompositeEvents. Using
these settings, events only match if they share some of the meanings as scored by the WSD
system or through direct hyernyms or lowest-common-subsumers. Events are not matched
through their lemma We set the time constraint to match the year and month and the
participant have at least one match regardless of the role. We varied the proportion of
WordNet synsets to match in steps of 10 between 0% to 100%. The threshold defines
minimal the proportion of synsets that needs to be associated for both compared events to
be merged. Likewise, events with excessive synsets cannot absorb events with few synsets.
The results are shown in Table 12 and in Figure 36.
Table 12: BLANC refererence results macro averaged over ECB+ topics in terms of recall
(R), precision (P) and F1 (F) for NewsReader output with different proportions of WordNet
synsets to match: S=only synset matches, SL= Synsets matches if synsets and lemma
matches if no synsets associated, L=lemmas only. Different columns represent proportions
in steps of 10% from 1% to 100%.
R-SL
R-S
R-L
P-SL
P-S
P-L
F-SL
F-S
F-L
1
36.07%
36.04%
27.60%
30.57%
30.95%
46.12%
30.42%
30.78%
24.77%
10
36.14%
35.97%
20
35.98%
35.77%
37.60%
38.58%
39.40%
39.96%
34.22%
34.31%
34.50%
34.49%
30
35.74%
35.68%
27.52%
41.25%
41.35%
45.98%
34.79%
34.72%
24.63%
40
35.59%
35.40%
50
35.42%
35.17%
60
35.12%
35.09%
70
34.84%
34.82%
80
34.76%
34.70%
90
34.53%
34.41%
41.64%
41.72%
42.13%
42.07%
42.13%
42.18%
42.37%
42.44%
42.54%
42.54%
42.58%
42.55%
34.71%
34.56%
34.72%
34.47%
34.43%
34.40%
34.24%
34.22%
34.19%
34.13%
34.03%
33.88%
100
34.48%
34.33%
27.49%
42.83%
42.81%
45.61%
33.99%
33.82%
24.57%
The highest recall is obtained using first synsets and lemmas in addition with 10%
overlap (R-SL=36.14%). We can see that recall drops when more overlap is required. We
see the opposite for precision but the highest precision is obtained using solely the lemmas,
where 1% overlap is sufficient (R-L=46.12%).14 The highest f-measure is obtained using
14
We did not test all proportions of overlap for lemmas because there is 1.7 lemma per coreference set
February 1, 2016
64/148
Figure 36: Impact of more strict WordNet synset matching on the macro average BLANC
for recall (R), precision (P) and F1 for NewsReader output
first synsets and lemmas in addition with 30% overlap (F-SL=34.79%). Differences are
however very small. We used the latter settings in the further experiments described
below.
Next, we varied the participant constraints and their roles, whether or not hypernyms
and lowest-common-subsumers can be used to match actions and the temporal constraints.
We used the following combinations of properties, where we kept the settings for WordNet
synsets and lemmas to match for 30% proportionally (which gave the best f-measure so
far):
AR-H-L-M AR= a single participant match and role is not considered, H=Hypernyms,
L=Lowest-common-subsumer, M = Month
AR—M AR= a single participant match and role is not considered, Hypernyms and
Lowest-common-subsumer are not considered, HM = Month
AR–L-M AR= a single participant match and role is not considered, Hypernyms are not
considered, L=Lowest-common-subsumer, M = Month
AR-H–M AR= a single participant match and role is not considered, H=Hypernyms,
Lowest-common-subsumer are not considered, M = Month
AR-H-L- AR= a single participant match and role is not considered, H=Hypernyms,
L=Lowest-common-subsumer, time is not considered
on average. It makes little difference to have 1% or 100% overlap.
February 1, 2016
65/148
AR-H-L-Y AR= a single participant match and role is not considered, H=Hypernyms,
L=Lowest-common-subsumer, Y = Year
AR-H-L-D AR= a single participant match and role is not considered, H=Hypernyms,
L=Lowest-common-subsumer, D = Day
-H-L-M participants are not considered, H=Hypernyms, L=Lowest-common-subsumer,
M = Month
A0-H-L-M A0= a single participant match and role should be A0, H=Hypernyms, L=Lowestcommon-subsumer, M = Month
A0A1-H-L-M A0A1= a participant and role match for A0 and A1, H=Hypernyms,
A0A2-H-L-M A0A2= a participant and role match for A0 and A2, H=Hypernyms,
A1A2-H-L-M A1A2= a participant abd role match for A1 and A2, H=Hypernyms,
A0A1A2-H-L-M A0A1A2= a participant and role match for A0, A1 and A2, H=Hypernyms,
In Table 13, we first show the impact of using hypernyms and lowest-common-subsumbers
for matching event actions. We kept the participant constraint stable to require a single
participant to match regardless of the role. The time is first set to month matching. In the
second part of the table, we maintained the participant constraint and also fixed the use
of hypernyms and lowest-common-subsumers. In this case, we varied the time constraint
to no time constraint, year, month and day. In all cases, we use WordNet hypernyms and
lemmas for event action matching with a proportion of 30% (SL30).
We first of all observe that the differences are small. The differences in recall and
f-measure are not significant. Most notably, adding the day as time-constraint give the
highest precision: 45.12% but als the lowest recall and f-measure. In Table 13, we kept
the standard setting for hypernyms and lowest-common-subsumer with the time constraint
to month matching but now varied the specification of the roles for which there should be
a participant match. In principle, we can test roles from PropBank, FrameNet and ESO.
However since the PropBank roles are more general and are always given, we restricted the
testing to the most important PropBank roles A0, A1 and A2. We also tested combination
of roles: A0A1, A0A2 and A1A2. The first column (NR) represents no role restriction.
February 1, 2016
66/148
(R-SL30), precision (P-SL30) and F1 (F-SL30). AR is stable across the results, meaning that a single participant in any role needs to match. We varied the hypernyms (H)
and lowest-common-subsumer (L) for action matches and the time constraints: no time
constraint (NT), year (Y), month (M) and day (D)
R-SL30
P-SL30
F-SL30
AR—M
35.77%
40.78%
34.68%
AR–L-M
35.79%
41.00%
34.74%
AR-H–M
35.72%
40.98%
34.71%
AR-H-L-NT
35.78%
41.26%
34.84%
AR-H-L-Y
35.74%
41.25%
34.80%
AR-H-L-M
35.74%
41.25%
34.79%
AR-H-L-D
30.40%
45.12%
29.35%
Again in all cases, we use WordNet hypernyms and lemmas for event action matching with
a proportion of 30% (SL30).
(R-SL30), precision (P-SL30) and F1 (F-SL30). The hypernyms (H), lowest-commonsubsumer (L) and time constraint month (M) are kept stable. We varied the roleparticipant constraints: NR=no constraint, A0 role participant should match, A1 should
match, A2 should match, A0 and A1 should match, A0 and A2 should match, A1 and A2
should match
R-SL30
P-SL30
F-SL30
NR-H-L-M
41.54%
36.21%
37.58%
A0-H-L-M
30.58%
42.57%
29.20%
A1-H-L-M
31.53%
46.33%
30.76%
A2-H-L-M
28.48%
43.62%
26.06%
A0A1-H-L-M
28.66%
48.74%
26.60%
A0A2-H-L-M
27.48%
46.09%
24.56%
A1A2-H-L-M
27.48%
46.09%
24.56%
Using no constraints on the participant and their role give the highest recall (41.54%)
and f-measure (37.58%) so far. The results are even higher than for a single participant
in any role (recall 35.74% and f-measure 34.79%), although the precision is lower. We
can observe that adding more specific role constraints lowers the recall and increases the
precision, with both prime participants required (A0A1) giving highest precision so far:
48.75%. Note that such a constraint can only be applied to semantic role structures where
both participants have been detected in the sentence. There are many cases where either
the A0 or A1 is not expressed or not recovered.
Concluding In general, we can conclude that ECB+ is probably not rich enough to
see any impact of constraints at a very specific level. Since the referential ambiguity is
restricted to two seminal event, differentiating between them can be done using more global
features such as any participant or just the year rather than the precise role and day. We
expect that in realistic news data sets with thousands of sources reporting on the similar
events, the details are needed to make a more fine-grained comparison. Nevertheless, we
remain dependent on the quality of the NLP software to detect all these details correctly
and sufficiently. We can also see that adding constraints increases the precision but that
the low recall remains a problem. Finally, it is important to realise that over 95% of all
event mentions are not coreferential in ECB+. Detecting coreference relations, even in
an artificial data set such as ECB+, is a very delicate task. Since BLANC averages the
results of noreference (singleton event mentions without a coreference link to any other
February 1, 2016
67/148
event mention) and coreference relations, any drastic approach on establish coreference
relations will be penalised by the noreference results.
3.2.2
Maximizing the event detection
To improve the performance of the event detection in terms of both precision and recall, we
developed a Conditional Random Fields (CRF) classifier, Event Detection system, inspired
by the TIPSem system (Llorens et al., 2010), a state-of-the-art system from the SemEval2010 TempEval task (Verhagen et al., 2010). We implemented the classifier using the
SemEval 2013 - TempEval 3 data (UzZaman et al., 2013b), on which it performs with
F1 scores of 82.6% and 85.9% using gold and silver training data respectively. We used
this re-implemention to either confirm or disqualify predicates that were detected by the
NewsReader SRL. The classifier also adds new events to the SRL output not detected by
NewsReader. Note that for the latter, we only obtain the predicates and not the roles.
When creating the intra-document coreference sets, we only consider predicates from the
SRL that were not disqualified (status=”false”).
We applied two versions of the Event Detection system to the NAF files to augment
the SRL layer. One was trained with the gold data (EDg(old)) and one with the silver
data (EDs(ilver)). In addition, we restricted the augmentation to those predicates that do
not have an event class from VerbNet, FrameNet or ESO. We assumed that correct events
are likely to have some typing from these resources through the PredicateMatrix, whereas
wrong events are expected to have no typing. The Event detection systems that skip
events with event classes are called EDg(old)EC and EDs(ilver)EC respectively. Finally,
we report on event mention detection if we would consider only those tokens that we
annotated as events in the key data (NWR-key in the last column). In this case, we can
assume maximum precision of the predicates detected in relation to the real recall.
In Table 15, the first column (NWR) shows the results for event mention detection
using the NewsReader system. The other settings are ili30, hypers, lcs, month and anyrole
Table 15: Macro averaged Mention identification for ECB+ topics. NWR=NewsReader
pipeline v3.0 without adaptation, EDg(old)=NWR augmented with EventDetection
trained with gold data, EDg(old)EC= same as EDg(old) but skipping predicates with
an Event class, EDs(ilver)= NWR augmented with EventDetection trained with silver
data, EDs(ilver)EC= same as EDs but skipping predicates with an Event class.
NWR NWR-key EDg
EDgEC EDs
EDsEC
recall
72.66%
72.66% 61.59% 72.64% 53.34% 72.43%
precision 66.29%
99.91% 88.23% 76.25% 92.10% 76.37%
f1
69.01%
83.80% 72.15% 74.01% 67.12% 73.94%
First of all, we see in Table 15 that the Event Detection variants have very high precision
(EDg(old) 88.23% and EDs(ilver) 92.10%) compared to NewsReader. Both variants score
only less than 9 points lower than the NWR-key with a maximum precision of 99.91%. In
February 1, 2016
68/148
terms of recall, however, both variants score considerably lower: 61.59% and 34% respectively. The F1 scores are consequently not very different from NewsReader and significantly
lower than for the SemEval-2013 task. The latter is not so surprising since it trained on
a data set from the same task and thus annotated in a similar way. We see that skipping
disqualifying predicates with Event classes, almost fully recovers the loss in recall while
precision drops less with about 15 points. Maintaining predicates with event classes thus
provides the highest F-measures (72.15% and 73.94% respectively), only 10 points lower
when considering the key event tokens (83.80%).
At the end of this section, we provide tables with the most frequent missed and invented predicates according to EDs(ilver)EC and also the full list of hapaxes, i.e. events
only missed and invented once. The most frequently missed events (Table 16) are missed
mostly due to their part-of-speech (nouns, adjectives, prepositions and names): dead, earthquake, Watergate, magnitude, guilty, according. Some tokenization errors are also frequent:
Shooting, Checks were not down-cased and 1 is in all 22 cases actually 6.1 which was split
into separate tokens and annotated as an event indicating the magnitude of the earthquake
by the annotators. Especially Table 17 listing predicates missed once as an event makes
clear that downcasing and lemmatising may solve many cases. Dealing with parts-of-speech
other than verbs more extensively and more proper tokenization will solve the majority of
the missed events.
Tables 18 and 19 show the predicates for the invented events occurring more than
once and only once respectively. In this case, the solution is less clear. Some of the more
frequent predicates such as murder, patch, driving all seem correct events but have not
been annotated for some reason. Others, such as mother, official, home, police clearly
are unlikely to be events regardless off the context. Finally, store, camp, administration
are ambiguous. Inspecting cases such as mother and police show that they appear to have
eventual readings that were falsely assigned to these mentions. A better filtering of nominal
predicates that do not have any eventual meaning (the second group) seems beneficial.
The detection of event mentions defines a natural upper bound for the event detection.
In Table 20, we give the reference results when maximizing the event detection compared
to the standard NewsReader output and using different settings to maximize the recall, the
precision and the f-measure. Results are reported in all different measures: MUC, BCUB,
CEAFe and BLANC, while the F measure of the first 3 scores are averaged as CoNLL F.
The table is divided into 3 sets of rows with different settings for running the system:
ARM at least one participant should match regardless of the role (any role), time anchor
match at granularity month and action match with phrases and concepts should
overlap with 30%.
mR maximizes recall: no matching of participants and time is required and action matches
with phrases and concepts is set to 1%.
mP maximizes precision: two participants should match with PropBank roles A0 and A1,
time anchors should match with granularity of the day and actions should match
100% in terms of concepts and phrases associated.
February 1, 2016
69/148
Table 16: Predicates missed more than once by NewsReader extended with EventDetection
(silver) and Event class filter as events in ECB+
dead
earthquake
according
Watergate
fire
magnitude
guilty
injured
1
Oscars
quake
security
deal
DUI
party
heart
shooting
playoff
Shooting
degree
it
murder
playoffs
pregnant
record
death
first
job
sexual
Checks
scoring
arson
emergency
drunken
game
piracy
riots
career
business
downtime
drunk
market
outage
problem
role
Run
tsunami
WWDC
According
Charged
42
37
35
30
28
27
23
23
22
21
20
18
17
17
17
16
16
14
14
13
13
13
13
13
13
12
12
12
12
11
11
10
10
9
9
9
9
8
7
7
7
7
7
7
7
7
7
7
6
6
double
first-degree
IR
merger
natural
problems
rehab
went
win
acquisition
be
design
Injured
It
Killed
list
Macworld
MVC
operation
Oscar
polygamy
second-degree
sequel
stint
suicide
traffic
Trial
trial
Valley
accident
basis
blaze
campaign
Convicted
custody
damage
Dead
earthquakes
fatally
Fire
going
Murder
news
operations
Playoff
prison
refresh
rounds
Sequel
swimming
6
6
6
6
6
6
6
6
6
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
temblor
top
treatment
triggered
which
”spiritual
Acquisition
affected
air
arraignment
bank
Charges
Clinch
Conference
congestive
crash
Cut
cut
damaging
data
dies
DOUBLE
Earthquake
event
go
gone
Guilty
health
hit
Industry
issues
Magnitude
manslaughter
matters
Missouri
mortar
move
new
NFL
Nominee
one
Pregnant
privacy
quarterfinals
Rehab
safe
Science
Seacom
season
senior
4
4
4
4
4
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
SET
striking
technology
that
touchdown
TRIAL
Win
worth
7.5
7.6
abortion
Accident
AFC
announcement
ARRESTED
Arson
attacks
Attorney
attorney
Availability
availability
basket
battery
Beat
bigamy
Bombing
Bombs
bombs
Burns
Business
California
changes
chase
clashes
clinch
coma
communications
Consulting
consulting
Convicts
crimes
critical
crossfire
cuts
Damaged
deaths
definitive
democracy
Direct
done
3
3
3
3
3
3
3
3
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
door
DWI
Endangered
energy
engineering
Fall
fatal
Financial
financial
Found
Francisco
furious
games
GUILTY
Heist
heist
hijack
history
Hit
Host
Hurt
incident
Indonesia
injures
injury
interception
Internet
journalism
Killing
landslides
lawless
lawyer
leaner
learning
Leaves
MAKE
more
musicals
offer
Overturn
panic
Placed
policy
Polygamy
position
pricing
program
public
purchase
quakes
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
Raid
reasons
rehires
reigning
remote
repairs
Reports
reports
rescue
Returns
Riot
rioting
San
second
Services
sex
Sexual
sexually
Shot
smash
specializes
speculation
Stolen
strategy
suspicion
takeover
telecommunications
tensions
unpatched
unrest
Unveils
Victory
vigil
violence
violent
what
February 1, 2016
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
70/148
Table 17: Predicates missed once by NewsReader extended with EventDetection (silver)
and Event class filter as events in ECB+
Riot
’This
’murder”
#Oscars
’Knows
’bombs
2004
ACCIDENT
ACCUSED
ARRESTS
Accuses
Acquire
Acquiring
Affects
Aftershocks
Alzheimer
Anger
Arrest
Arrested
Attacks
Awards
BEAT
BLAZE
Begins
Bombed
Bumped
CHARGES
CLINCH
Call
Caught
Celebrates
Cloud
Congrats
Consultations
Continue
Cooling
County
Cruises
DEAD
DIRECT
DNA
Deal
Death
Defensive
Dies
Drunk
ESCAPE
Emergency
Engineers
Enraged
Escape
Expand
Expo
FALLOUT
FIRE
FIRED
Fatal
Feigned
Fired
Flagship
Follow
Football
Game
Goes
Handcuffs
Hire
I.R
Insanity
Instated
Interdicts
Investigation
Investment
Jeopardy
Jolts
Lake
Launches
Lead
Levels
MIA
MURDER
MacWorld
Make
Marriage
Musical
NEGOTIATIONS
Named
Negotiation
Negotiations
New
News
Nominates
Nomination
OFFER
Operation
Oscars
PLAYOFF
PLAYOFFS
POLYGAMY
Pending
Picked
Plus
Powder
Pre
Preorders
Protection
Protections
Protests
Pulls
Quake
REACH
REMAINS
Rampage
Reasons
Record
Recorded
Releases
Rescue
Restored
Riots
Rumors
Rushes
SAYSSEPTEMBER
SEQUEL
STATEMENT
Sacked
Semis
Server
Shelter
Shoots
Six
Spree
Strategy
Strike
Strikes
Suspicious
TOP
Takes
Testimony
That
Undisclosed
Vigil
Voting
Vying
WWDC12
War
Winter
Wounded
ablaze
access
accidents
achievement
activities
additional
adultery
affair
aftershocks
aid
alert
anger
announcements
any
are
armed
arrest
arrival
artery
assault
attack
attempted
attractiveness
available
balloting
banking
barefoot
basketball
battleground
behind
betrayal
bid
bids
blow
bout
brak
breadth
break
built-in
bust
cable-news
casting
cause
ceasefire
celebration
chance
chaotic
cheaper
checking
chip
chip-making
circumstances
code
cold
communication
compared
complete
computer
computing
conference
connectivity
coronary
count
counts
coverup
coveted
crazy
credit
credits
culture
dangerous
daunting
debut
delighted
die
die’
disappointing
disaster
disguised
dismissal
disrupts
do
domestic
double-team
down
drama
drinkdrinking
drive
driving
due
effectiveness
efficient
eighth
elections
emotion
equal
era
escalate
escort
experiences
extinction
failures
fall
famed
feedback
felony
fighting
finale
fir
fixes
flagship
flurry
foils
footing
free
frustrating
fuels
fun
game-winning
gaminggeothermal
gettingready
good
graphics
great
green
guide
gunplay
hacked
hacking
harbors
has
have
health-care
healthcare
help
herself
home
homicide
homicides
hostage
hundreds
hungry
hurt
impending
implications
incarnation
indecency
infected
inferno
ink
insanity
integration
intoxicated
investigation
is
isolation
jail
jumper
keyboard
keynote
large
largest
latest
lay-offs
lead
leading
leaves
lies
life
little-used
living
loan
long
longer
lose
lunacy
made
making
manhunt
mark
markets
matter
measuring
media-oriented
menace
mental
microservers
misdemeanor
mixed
modeling
money
murders
musicals”
needs
negotiation
negotiations
neutrality
next
nice
nine
nominee
normal
office
official
opportunity
outperform
outraged
overdue
overturns
payoffs
pending
percent
pick-six
pirate
place
playoffs
playing
pneumonia
polygamous
postpower
practice
pre
prediction
preorder
presidency
press
profile
protections
published
raids
rare
re-arrest
reason
recession
recorded
repair
repercussions
reserve
resolve
return
returned
returns
review
riot
ripening
robbery
running
sacks
saga
scenario
scheduled
school
scientific
screen
secret
sector
seismic
semifinals
series
service
share
shelter
shocking
shot
show
show’s
significant
situation
sixth
slain
slow
snafus
snowballs
sobriety
sobs
spate
spending
spiritual
spiritual’
star
stardom
status
stillness
stopgap
straight-talking
stretch
subject
supported
supremacy
surgery-enhanced
swoop
system
systems
tackled
talks
task
tech
telecom
telecoms
teleconference
telephone
temblors
terrestrial
thwarts
time
time-travelling
tour
trade
tradition
transaction
tremor
turned
underage
underway
underwent
undisclosed
update
upgrade
upping
use
vacant
verdict
vetting
vigils
war
way
weather
weed
wellness
wet
wheel
whereabouts
willing
winter
woes
word
worst
years
February 1, 2016
71/148
Table 18: Predicates invented and occurring more than once by NewsReader extended with
EventDetection (silver) and Event class filter as events in ECB+
mother
murder
patch
store
camp
administration
driving
official
home
police
star
had
pirates
running
season
including
run
have
host
update
left
officials
services
assault
attack
head
player
team
time
building
say
source
suspicion
’s
coast
director
following
life
sequel
show
sources
support
users
28
26
23
20
19
17
16
16
15
14
14
13
13
13
13
12
12
11
11
11
10
10
10
9
9
9
9
9
9
8
8
8
8
7
7
7
7
7
7
7
7
7
7
branch
causes
center
climber
figure
made
portfolio
receiver
said
seed
sentences
shot
tackle
age
cables
came
coach
death
details
employee
estimated
fix
forces
has
part
place
Promises
route
shooting
spot
workers
aged
agent
became
belts
berth
bombs
choice
come
contract
denied
deputy
disclosed
6
6
6
6
6
6
6
6
6
6
6
6
6
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
4
4
4
4
4
4
4
4
4
4
4
4
end
endangered
failure
group
hit
knee
maker
management
manager
operator
products
report
reports
robbers
seems
strike
times
tournament
Catching
affect
aid
appears
authorities
believed
border
boss
Breaking
candidate
case
caused
centers
committed
connection
convicted
crimes
date
defenders
execution
factory
film
get
got
groups
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
help
history
image
judge
law
leading
lineup
looks
manufacturer
name
pass
planned
point
processors
refused
reporter
residents
rule
seeded
service
statement
suspect
term
Voters
warship
weapons
players
presumed
processor
projects
quake
questions
received
record
referee
registration
rest
resulting
reveal
rival
routes
rules
saw
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
sentence
says
section
seen
standings
start
statements
steps
stuck
style
suffered
surrounding
suspected
suspects
take
target
teams
telecast
terms
territory
toll
trial
trustee
type
unit
used
want
watches
waters
wave
where
wholesaler
worker
parked
partners
parts
pick
pin
ACCUSED
actor
agreement
analyst
assaulting
3
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
attackers
attempt
back
backed
began
boats
brazen
carried
cause
chance
charges
couple
crown
deal
display
Employees
employees
evidence
face
failed
feed
Fix
found
generation
government
held
helped
hired
homes
house
information
leaving
link
marriages
meaning
measuring
minister
model
nickname
nominee
number
offender
offerings
opening
operators
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
February 1, 2016
72/148
Table 19: Predicates invented and occurring only once by NewsReader extended with
EventDetection (silver) and Event class filter as events in ECB+
‘undeserved
accepted
accused
address
advantage
advocates
aim
aimed
alive
alleged
analysts
answer
appeared
armed
arrest
asked
assurances
attempted
audience
audiences
backs
band
Based
based
battle
become
belt
BERTH
blown
bomb
born
box
breach
break
BREAKING
bug
call
camps
Case
centred
ceremony
challenger
change
charge
climbers
close
Coach
coaching
combination
coming
comment
commented
commit
communications
competition
complications
computing
condition
conditions
conference
confirmed
consulting
contain
contention
continuing
control
convention
conversations
convict
Counts
counts
crews
criminals
critics
cure
daring
data
decision
declined
declining
defender
defending
defense
deficiency
delays
denying
devastated
did
didn’t
disclose
disputed
documents
dodging
duty
earnings
edge
efforts
employed
employers
ending
entrance
eWEEK
excess
exercise
expected
exploit
extended
fact
fading
father
favorite
feat
feeding
feuds
finds
fire
flag
followed
followers
Following
force
gave
given
going
grab
growing
guard
guards
guilt
hand
happened
having
he
hold
hole
holiday
husband
impact
indicated
inmate
input
intended
investigation
issued
killer
kittens
last
launched
leave
leg
let
level
live
located
look
loss
lost
loved
making
managed
managing
manufacturers
margin
mark
matter
mean
meant
meetings
memory
merged
mermaid
moving
murderer
named
names
need
network
numbers
occurred
offense
offer
Officials
onslaught
operations
opinion
opposition
Order
order
ordered
organizers
owner
page
park
parking
participants
Parts
passes
past
patrolled
perform
photographs
pioneer
pipeline
plagued
planning
play
portables
portions
poses
present
price
priority
prisoner
prisoners
producers
product
proposed
prosecution
prospect
protected
protection
protesters
provider
providers
put
quarter
range
ranging
reach
reaction
recommendation
refugee
related
relations
remain
remained
remaining
repaired
representative
required
researcher
reserve
resident
resigning
result
Results
results
return
revamped
revenue
review
revolution
revolving
reward
rewards
risks
rivals
Robbers
Ruling
Running
salesman
saying
scoring
searching
secret
security
see
semifinal
serve
serves
set
setting
shock
shots
sign
signals
signed
skipping
slide
sorts
sounds
spasm
specify
speculate
spoke
spree
stabbing
stages
stand
standard
standards
stars
started
steering
stock
stockholders
stop
stores
students
success
supplier
survivors
Take
tapes
tell
tenant
test
tests
thieves
threatened
throw
tip
total
touched
tourist
touting
transcripts
transition
treatment
tribute
tried
try
trying
turned
types
user
value
valued
vessel
vessels
view
viewers
violence
visiting
voters
Wanted
warehouse
warming
warning
wasn’t
way
ways
weapon
weighing
when
whips
winner
wiretapped
worry
February 1, 2016
73/148
Within each set of rows, the first row (kNWR) shows the results when we only use the
key event annotations as markables for the system response. That means that we maximize
the precision but not the recall. We thus expect higher results due to more precision.
We can compare the real results against these maximized results, where NWR represents
the NewsReader results without any adaptation. In case of EDg(old) and EDs(ilver),
events are detected by CRF Event Detection system trained with gold data and silver
data respectively. In case of EDg(old)EC and EDs(ilver)EC, the system only disqualifies
predicates as events that do not have an event ontology type. Each of these are then
combined with the above settings ARM, mR and mP. Best scores of the true systems per
metric are in bold.
Table 20: Reference results macro averaged over ECB+ topics with different options for
event detection. kNWR=NewsReader event detection without invented mentions, maximizing precision, NWR=NewsReader pipeline v3.0 without adaptation, EDg=NWR augmented with EventDetection trained with gold data, EDgEC= same as EDg but skipping
predicates with an Event class, EDs= NWR augmented with EventDetection trained with
silver data, EDsEC= same as EDs but skipping predicates with an Event class. ARM=
standard setting one participant in any role (AR), time month match and action concept
and phrase match 30%, mR= maximizing recall by no constraints on participant match
and time, action concept and phrase match 1%, mP=maximizing precision by participant
roles A0A1, time day match and action concept and phrase match is set to 100%.
kNWR-ARM
NWR-ARM
EDg-ARM
EDgEC-ARM
EDs-ARM
EDsEC-ARM
kNWR-mR
NWR-mR
EDg-mR
EDgEC-mR
EDs-mR
EDsEC-mR
kNWR-mP
NWR-mP
EDg-mP
EDgEC-mP
EDs-mP
EDsEC-mP
R
33.55
33.55
23.54
33.10
23.26
33.03
53.77
53.77
38.94
54.31
38.16
53.98
11.09
11.09
8.15
11.35
7.96
11.15
MUC
P
74.63
53.78
69.78
59.78
70.54
59.91
68.75
48.59
63.47
54.60
64.15
54.58
70.31
54.42
62.77
57.45
63.02
57.97
F
45.54
40.64
34.45
41.92
34.12
41.91
59.73
50.54
47.58
53.84
47.10
53.65
18.77
18.02
14.12
18.54
13.82
18.31
R
41.10
41.10
28.49
39.98
28.63
40.15
52.05
52.05
36.26
51.83
36.01
51.75
31.59
31.59
22.34
30.99
22.45
31.09
BCUB
P
85.21
55.68
78.50
63.94
79.44
64.10
62.70
39.68
58.55
45.27
59.31
45.35
95.93
63.18
87.09
72.52
88.11
72.65
F
54.83
46.73
41.25
48.42
41.55
48.61
56.14
44.55
44.11
47.62
44.18
47.62
46.97
41.44
35.19
42.75
35.40
42.85
R
60.59
60.23
43.61
58.14
43.94
58.38
45.62
44.41
33.53
42.71
34.01
42.90
62.71
62.54
44.47
60.60
44.96
60.85
CEAFe
P
52.09
32.73
43.69
36.64
45.56
37.02
69.41
40.27
57.25
47.29
60.02
47.66
39.20
25.31
34.29
28.60
35.58
28.79
F
55.12
41.82
42.78
44.33
43.87
44.66
54.26
41.51
41.57
44.19
42.70
44.44
47.47
35.52
37.83
38.31
38.91
38.53
R
34.75
34.75
19.90
34.24
19.26
34.14
41.44
41.44
25.24
41.95
24.28
41.58
27.73
27.73
15.58
27.72
15.00
27.56
BLANC
P
77.69
42.64
68.97
51.31
70.22
51.85
67.59
34.96
59.73
43.22
60.62
43.23
81.31
47.35
69.13
55.17
70.39
55.56
F
45.40
34.05
28.96
37.21
28.40
37.27
49.30
36.89
33.77
41.20
33.05
40.98
35.64
25.03
22.42
28.58
21.95
28.53
CoNLL
F
51.83
43.06
39.49
44.89
39.85
45.06
56.71
45.53
44.42
48.55
44.66
48.57
37.73
31.66
29.04
33.20
29.38
33.23
We first discuss the ARM output which is supposed to give the most balanced results
for precision and recall and thus the highest F measure. EDs(ilver) and EDs(ilver)EC
in most cases have the highest score. The EDg(old) and EDg(old)EC results are slightly
lower. This is in line with the difference in event mention detection observed earlier. The
best CoNLL F score of 45.06% is obtained by EDs(ilver)EC, which is 5 points less than the
kNWR version, with maximum precision. Overall the best scores are about 5 points lower
than the maximum precision scores of kNWR. BLANC F scores are lower than CoNLL.
This is mainly due to the low recall. The best precision of BLANC is 70.22% (EDs(ilver)),
while the kNWR precision is 77.69%. These scores are high and comparable to the stateof-the-art. This shows that most improvement can be expected from improving the recall
and especially the recall in the event detection.
February 1, 2016
74/148
If we look at the results for maximizing recall (mR), we see that in most cases the
recall is higher and the precision is lower, while when maximizing the precision (mP), the
precision is significantly higher and the recall is lower. The exception is CEAFe where
recall and precision are exactly the reversed. Ignoring the CEAFe results, we see that
the maximum recall is 54.31% MUC for EDg(old)EC-mR and the maximum precision is
88.11% BCUB for EDs(ilver). As for ARM, most results are about 5 points below the key
results kNWR. Highest F measures are obtained with maximized recall for EDs(ilver)EC
(40.98% F-BLANC and 48.57% F-CoNLL).
In the state-of-the-art literature, cross-document coreference is not only tested across
documents within the same topic but also across the whole data set. To compare our
results with the state-of-the-art, we abandoned the topic structure and ran the NAF2SEM
program on all the 982 ECB+ files processed by NewsReader. This results in a single RDF
file after comparing all events in the data set with each other.
We compare our results with Yang et al. (2015), who report best results on ECB+
and compare their results to other systems that have so far only been tested on ECB and
not on ECB+. Yang et al use a distance-dependent Chinese Restaurant Process (DDCRP
(Blei and Frazier, 2011)), which is an infinite clustering model that can account for data
dependencies. They define a hierarchical variant (HDDCRP) in which they first cluster
event mentions and data within a document and next cluster the within document clusters
across documents. Their hierarchical strategy is similar to our CompositeEvent approach,
in the sense that event data can be scattered over multiple sentences in a document and
needs to be gathered first. Our approach differs in that we use a semantic representation
to capture all event properties and do a logical comparison, while Yang et al and all the
other methods they report on are based on machine learning methods (both unsupervised
clustering and supervised mention based comparison). Yang et al test their system on
topics 23-43 while they used topics 1-20 as training data and topics 21-23 as development
set. They do not report on topics 44 and 45. To compare our results with theirs, we also
used topics 23-43 for testing. Since our system is fully unsupervised for the task itself,
the training and development sets are irrelevant. In Table 21, we give the NewsReader
results using the ARM settings that give the best F-measure. Table 22 is an exact copy
of the results as reported by Yang et al. (2015). They follow a machine-learning approach
to event-coreference, exploiting both clustering techniques and supervised techniques with
rich feature sets. They also implemented a number of other state-of-the-art systems that
use variations on clustering or supervised learning and applied them to the same data set
within ECB+:
LEMMA a heuristic method that groups all event mentions, either within or across documents, which have the same lemmatized head word.
AGGLOMERATIVE a supervised clustering method for within-document event coreference following Chen and Ji (2009b).
HDP-LEX an unsupervised Bayesian clustering model for within- and cross-document
event coreference (Bejan and Harabagiu, 2010a). It is a hierarchical Dirichlet process
February 1, 2016
75/148
(HDP) model with the likelihood of all the lemmatized words observed in the event
mentions.
DDCRP a Distance-dependent Chinese Restaurant Process model that ignores document
boundaries.
HDDCRP* a variant of the proposed HDDCRP that only incorporates the withindocument dependencies but not the cross-document dependencies.
HDDCRP their preferred HDDCRP system that also uses cross-document dependencies.
Table 21: Reference results macro averaged over ECB+ corpus with different options for
event detection. kNWR=NewsReader event detection without invented mentions, maximizing precision, NWR=NewsReader pipeline v3.0 without adaptation, EDg=NWR augmented with EventDetection trained with gold data, EDgEC= same as EDg but skipping
predicates with an Event class, EDs= NWR augmented with EventDetection trained with
silver data, EDsEC= same as EDs but skipping predicates with an Event class. ARM=
standard setting one participant in any role (AR), time month match and action concept
and phrase match 30%, mR= maximizing recall by no constraints on participant match
and time, action concept and phrase match 1%, mP=maximizing precision by participant
roles A0A1, time day match and action concept and phrase match is set to 100%.
kNWR-ARM
NWR-ARM
EDg
EDgEC
EDs
EDsEC
R
33.21%
33.21%
23.56%
32.73%
24.73%
32.63%
MUC
P
73.38%
50.97%
69.09%
57.49%
69.35%
57.52%
F
45.73%
40.22%
35.14%
41.71%
36.46%
41.64%
R
40.05%
40.05%
27.05%
39.00%
28.49%
39.04%
BCUB
P
83.49%
53.28%
76.77%
61.98%
77.43%
62.00%
F
54.13%
45.73%
40.00%
47.88%
41.66%
47.91%
R
55.12%
54.76%
37.78%
52.74%
39.34%
52.54%
CEAFe
P
53.92%
33.30%
43.71%
37.99%
46.96%
38.48%
F
54.52%
41.42%
40.53%
44.17%
42.81%
44.43%
CoNLL
F
51.46%
42.46%
38.56%
44.59%
40.31%
44.66%
Table 22: Reference results macro averaged over ECB+ corpus as reported by Yang et al.
(2015) for state-of-the-art machine learning systems
MUC
LEMMA
HDP-LEX
AGGLOMERATIVE
DDCRP
HDDCRP*
HDDCRP
R
55.40%
63.50%
59.20%
58.20%
66.40%
67.10%
P
75.10%
75.50%
78.30%
79.60%
77.50%
80.30%
F
63.80%
69%
67.40%
67.10%
71.50%
73.10%
R
39.60%
43.70%
40.20%
39.60%
48.10%
40.60%
BCUB
P
71.70%
65.60%
73.20%
78.10%
69%
73.10%
F
51%
52.50%
51.90%
52.60%
56.70%
53.50%
R
61.10%
60.20%
65.60%
69.40%
63%
68.90%
CEAFe
P
36.20%
34.80%
30.20%
31.80%
38.20%
38.60%
F
45.50%
44.10%
41.40%
43.60%
47.60%
49.50%
CoNLL
F
53.40%
55.20%
53.60%
54.40%
58.60%
58.70%
In line with Yang et al, we averaged the different F-measures to obtain a CoNLL-F value.
BLANC results are not reported by Yang et al. We can see that the best NewsReader
CoNLL-F scores about 14 points lower (44.66% against 58.70%), while the NewsReader
key version (kNWR) scores about 5 points lower. Looking more precisely at the recall
and precision scores, we can see that NewsReader scores significantly lower in recall but
often equal and in some cases higher in precision. In case of EDs(ilver), NewsReader scores
77.43% for BCUB precision, while HDDCRP scores 73.1% and the best precision is 78.1%
by DDCRP. In the case of CEAFe, EDs(ilver) even has a precision of 46.96% and the
February 1, 2016
76/148
best score by Yang et al is 38.6% for HDDCRP. Yang et al noticed that event detection
of a standard SRL system performs low (56% recall) and therefore trained a separate
CRF event detection system for event detection using the ECB+ training documents.15
Their CRF classifier obtains 93.5%, 89.0%, 95.0%, and 72.8% recall of the annotated event
mentions, participants, time and locations on the test data set. The NewsReader system
scores at most 72.66% recall on event detection. This shows that there is a big potential
for NewsReader to improve with respect to the above results, when specifically trained
on detecting events. Increasing the recall of events by 20 points will have a big impact
on the recall for event coreference as well. On the other hand, we can assume that the
NewsReader results are more realistic as an indication of the performance on any data set
since it has not been trained on a specific data set. The results reported by Yang et al are
likely to drop when moving from ECB+ to another data set unless separate training data is
provided. For comparison, our own CRF Even Detection system trained on SemEval 2013
TempEval 3 data performed with F-scores above 80% but performed much lower (more
than 10 points) when applied to ECB+.
Given the nature of the ECB+ data set, it makes sense to consider the actual distribution of predicates over documents, topics and the complete data set to measure the
complexity of within-topic and across-topic comparison. The extend that predicates occur
across topics can be seen as an indication for the referential ambiguity since the main referential events are separated by topics with a systematic ambiguity to two seminal events
within a single topic. We thus expect that contextual events tend to occur only within a
single topic, while for example source events and grammatical events are not restricted to
a specific seminal event. Table 23 shows the division of mentions of three predicates across
documents and across topics. We first give the distribution on the basis of the full news
articles and next the distribution in the annotated part of the article, where on average
1.8 sentence has been annotated per article.
Table 23: Distribution of tell, kill and election over all text and annotated text per mention,
document and topic in ECB+
tell
kill
election
mentions
397
420
77
documents
235
207
29
Full text
ment/doc
1.69
2.03
2.66
topics
39
22
6
ment/top
10.18
19.09
12.83
mentions
23
141
17
Annotated text
documents
ment/doc
21
1.10
129
1.09
10
1.70
topics
8
14
1
ment/top
2.88
10.07
17.00
We can see in Table 23 that a source event such as tell occurs even less than a contextual
event such as kill when considering the full article. Nevertheless, tell occurs in more
documents and more topics than kill. We can see that a specific event such as election
has a low frequency, the highest average document frequency and the lowest document
frequency. These full text distributions confirm that a source event has a high dispersion
compared to contextuals. However when we consider the annotated text, we see that there
are hardly any mentions of tell left (5.7%) in comparison to kill (33.6%) and election
(22.1%). Average mentions per document are lower and more equal for all three predicates
15
In fact they also noted this for the detection of participants: 76%, timex-expressions: 65% of times
and locations: 13%
February 1, 2016
77/148
since there is on average 1.8 sentence per document that is annotated. However, the average
mentions per topic ratio is much higher for election and dropped drastically for tell and
substantially for kill.
Given the distribution of the annotation in ECB+, we thus can see that source events
are marginally annotated and this plays a minor role while some contextuals such as kill
have a high dispersion across topics but others such as election do not. The seminal
nature of the topics and the little overlap in events across topics, supports the fact that
the results for within-topic and across-topic are relatively close. It also suggests that in
real-life contexts when dealing with large volumes of news, it makes sense to apply some
form of prior topic clustering to avoid excessive ambiguity for especially source events (and
grammatical events) that are found in many texts regardless of the context of the event.
3.2.3
Conclusion NewsReader cross-document event coreference
We have seen that NewsReader event coreference can be tuned towards either high recall
(up to 62% at topic level and 41% at corpus level) or high precision (up to 81% at topic level
and 72% at corpus level). We have seen that recall is still limited and that this is mainly
due to the event detection. This is hopeful because it shows the validity of the approach
and it is relatively easy to improve in comparison to the more complex event coreference
process. We thus expect that further improving the event detection will also directly boost
the quality of the event coreference. It is important to note that our state-of-the-art results
are obtained using generic technology using logical comparison and processing without any
domain adaptation and without any machine-learning on the specific data set that has been
used for testing. This is an important feature of the system, since machine learning based
systems tend to have lower results when applied to different data sets than trained on.
We compared the performance of NewsReader against the latest state-of-the-art system
by Yang et al. (2015). Although NewsReader performs lower for CoNLL-F and recall
than the reported systems, it tends to have higher precision scores. The state-of-theart systems implemented by Yang et al benefit from training on ECB+ data, whereas
NewsReader is not adapted to the annotations and the data set. We can thus expect
that the NewsReader performance is more representative for other data sets than ECB+,
whereas the methods reported by Yang et al will be expected to perform much lower on
other data sets. Furthermore, Yang et al boosted the event mention detection from 56%
to 95% (as well as the participants, locations and time detection) by training a separate
classifier on ECB+, whereas the recall of the event detection of NewsReader is 72%. What
events are annotated and what events are not is often dependent on the style of annotation
and thus differs from data set to data set. We can assume that boosting the NewsReader
event detection on a specific data set by training on annotated events will also lead to a
significant boost in the event detection and consequently in the event coreference results.
Finally, the ECB+ data set is an artificially created data set that does not represent a
natural stream of news. Within natural daily news streams, there may not be two seminal
events that compete for interpretation on the same day (e.g. two attacks in Paris) and
there will be many more topics than the 42 topics in ECB+. The best settings for ECB+
February 1, 2016
78/148
therefore may not be the best settings for dealing with daily news streams. Evaluating the
best set up and best settings for event-coreference in a daily news stream is a lot of work
and very complex. Within this project, we did not had the resources to carry out such an
evaluation. Another problem is that there is no freely sharable data set that can be used.
Such a data set needs to contain the news for a certain period (say one month) from many
different sources so that we could follow the daily cumulation.
February 1, 2016
4
79/148
Event Relations
Event relation extraction, in particular temporal relation extraction, is a crucial step to
anchor an event in time, to build event timelines and to reconstruct the plot of a story.
In this section, we describe the task of detecting temporal relations, causal relations and
predicate time anchor relations. The description of each task begins with the presentation
of the annotation schema we have followed.
4.1
4.1.1
Temporal Relations
Annotation Schema
The annotation schema for temporal relations is based on the TimeML specification language (Pustejovsky et al. (2005)). In the TimeML annotation, temporal links are used to i)
establish the temporal order of two events (event-event pair); ii) anchor an event to a time
expression (event-timex pair); and iii) establish the temporal order of two time expressions
(timex-timex pair). In TimeML, temporal links are annotated with the <TLINK> tag.
The full set of temporal relations specified in TimeML version 1.2.1 (Saurı́ et al. (2006))
contains 14 types of relations, as illustrated in Table 24. Among them there are six paired
relations (i.e. with one relation being the inverse of the paired one). These relations map
one-to-one to 12 of Allen’s 13 basic relations.16
a |———|
b |———|
a |———|
b |———|
a |——|
b |————|
a |——|
b |————|
a |——|
b |——————|
a |——————|
b |——|
a |———|
b |———|
a |———| b
a is before b
b is after a
a is ibefore b
b is iafter a
a begins b
b is begun by a
a ends b
b is ended by a
a is during b
b is during inv a
a includes b
b is included in a
a is simultaneous with b
a is identity with b
Table 24: Temporal relations in TimeML annotation
According to the TimeML 1.2.1 annotation guidelines (Saurı́ et al. (2006)), the difference between during and is included (also their inverses) is that the during relation
is specified when an event persists throughout a temporal duration (e.g. John drove for 5
16
Allen’s overlaps relation is not represented in TimeML.
February 1, 2016
80/148
hours), while is included relation is specified when an event happens within a temporal
expression (e.g. John arrived on Tuesday).
In the NewsReader annotation guidelines (Tonelli et al. (2014)), we have simplified
the set of relations by not considering the relation types during and identity which
is a coreferential relation. A new relation has been added with respect to TimeML: the
measure relation. It is used to connect an event and a timex of type duration which
provides information on the duration of the related event.
Example:
The first A380 superjumbo, made pr1 by Airbus, was delivered pr2 today tmx2 to Singapore Airlines (SIA) 18 months tmx3 behind schedule. After the plane was delivered pr4 in
Singapore, it was flown pr3 to Toulouse, France for the ceremony pr5 of about 500 guests.
(DCT tmx1 : 2007-10-15)
The NAF representation of a part of the temporal relations extracted from the sentences
is as follows:
<t e m p o r a l R e l a t i o n s >
<!−−IS INCLUDED ( tmx1 , tmx2)−−>
< t l i n k i d =” t l i n k 6 ” from=”tmx1”
<!−−BEFORE( pr1 , p r 2)−−>
< t l i n k i d =” t l i n k 2 2 ” from=”p r 1 ”
<!−−BEFORE( pr4 , p r 3)−−>
<!−−BEFORE( pr4 , p r 5)−−>
<!−−IS INCLUDED ( pr2 , tmx2)−−>
</ t e m p o r a l R e l a t i o n s >
4.1.2
t o=”tmx2” fromType=”t i m e x ” toType=”t i m e x ” r e l T y p e =”SIMULTANEOUS”/>
t o=”p r 2 ” fromType=” e v e n t ” toType=” e v e n t ” r e l T y p e =”BEFORE”/>
t o=”tmx2 ” fromType=” e v e n t ” toType=”t i m e x ” r e l T y p e =”IS INCLUDED”/>
Temporal Relation Extraction
The temporal relation extraction module extracts temporal relations holding between two
events or between an event and a time expression or between two time expressions.
Two methods are used to extract temporal relations: machine learning method based
on SVM for classifying relations between two events or between an event and a time
expression; rule based method for ordering two time expressions.
Extraction of relations between two events or between an event and a timex.
We consider all combinations of event/event and event/timex pairs within the same
sentence (in a forward manner) as candidate temporal links. For example, if we have a
sentence with entity order such as “...ev1 ...ev2 ...tmx1 ...ev3 ...”, the candidate pairs are (ev1 ,
ev2 ), (ev1 , tmx1 ), (ev1 , ev3 ), (ev2 , tmx1 ), (ev2 , ev3 ) and (ev3 , tmx1 ). We remove event pairs
if the two events are part of the same verbal phrase. We also identify relations between
verbal events and document creation times and between main events 17 of two consecutive
sentences.
The problem of determining the label (i.e. temporal relation type) of a given temporal
link can be regarded as a classification problem. Given an ordered pair of entities (e1 , e2 )
that could be either event/event or event/timex pair, the classifier has to assign a certain
label, namely one of the 14 TimeML temporal relation types.
17
Main events correspond to the ROOT element of the parsed sentence.
February 1, 2016
81/148
A classification model is trained for each type of entity pair (event/event and event/timex), as suggested in several previous works (Mani et al. (2006); Chambers (2013)). We
build our classification models (Mirza and Tonelli (2014b)) using the Support Vector Machine (SVM) implementation provided by YamCha 18 and train them with the TempEval3
training corpus UzZaman et al. (2013a). The feature vectors built for each pair of entities
(e1 , e2 ) are as follows:
• String and grammatical features. Tokens, lemmas, PoS tags and flat constituent
(noun phrase or verbal phrase) of e1 and e2 , along with a binary feature indicating
whether e1 and e2 have the same PoS tags (only for event/event pairs).
• Textual context. Pair order (only for event/timex pairs, i.e. event/timex or
timex/event), textual order (i.e. the appearance order of e1 and e2 in the text)
and entity distance (i.e. the number of entities occurring between e1 and e2 ).
• Entity attributes. Event attributes (class, tense, aspect and polarity), and timex
type attribute19 of e1 and e2 as specified in TimeML annotation. Four binary features
are used to represent whether e1 and e2 have the same event attributes or not (only
for event/event pairs).
• Dependency information. Dependency relation type existing between e1 and
e2 , dependency order (i.e. governor-dependent or dependent-governor ), and binary
features indicating whether e1 /e2 is the root of the sentence.
• Temporal signals. We take the list of temporal signals extracted from the TimeBank 1.2 corpus into account. We found that the system performance benefit from
distinguishing between event-related signals and timex-related signals, therefore we
manually split the signals into two separate lists. Signals such as when, as and then
are commonly used to temporally connect events, while signals such as at, for and
within more likely occur with time expressions. There are also signals that are used
in both cases such as before, after and until, and those kind of signals are added to
both lists. Tokens of temporal signals occurring around e1 and e2 and and their positions with respect to e1 and e2 (i.e. between e1 and e2 , before e1 , or at the beginning
of the sentence) are used as features.
• Temporal discourse connectives. Consider the following sentences: i) “John has
been taking that driving course since the accident that took place last week.” and
ii) “John has been taking that driving course since he wants to drive better.” In
order to label the temporal link holding between two events, it is important to know
whether there are temporal connectives in the surrounding context, because they may
contribute in identifying the relation type. For instance, it may be relevant to distinguish whether since is used as a temporal or a causal cue (example i) and ii) resp.).
18
http://chasen.org/~taku/software/yamcha/
The value attribute tends to decrease the classifier performance as shown in Mirza and Tonelli (2014b),
and therefore, it is excluded from the feature set.
19
February 1, 2016
82/148
This information about discourse connectives is acquired using the addDiscourse tool
(Pitler and Nenkova (2009)), which identifies connectives and assigns them to one of
four semantic classes in the framework of the Penn Discourse Treebank (The PDTB
Research Group (2008)): Temporal, Expansion, Contingency and Comparison. We
include as feature whether a discourse connective belonging to the Temporal class
occurs in the textual context of e1 and e2 . Similar to temporal signals, we also include
in the feature set the position of the discourse connective with respect to the events.
The machine learning based module is available on github 20 and technical details about
it can be found in the Deliverable 4.2.2 (Section 3.12).
The result for relation classification (identification of the relation type given the relations) on TempEval3 test corpus (UzZaman et al., 2013a) is: 58.8% precision, 58.2% recall
and 58.5% F1-measure. We compare the performance of tempRelPro to the other systems
participating in the Tempeval-3 task in Table 25. According to the figures reported in UzZaman et al. (2013a), tempRelPro is the best performing system both in terms of precision
and of recall.
System
tempRelPro
UTTime-1, 4
UTTime-3, 5
UTTime-2
NavyTime-1
NavyTime-2
JU-CSE
F1
58.48%
56.45%
54.70%
54.26%
46.83%
43.92%
34.77%
Precision
58.80%
55.58%
53.85%
53.20%
46.59%
43.65%
35.07%
Recall
58.17%
57.35%
55.58%
55.36%
47.07%
44.20%
34.48%
Table 25: Tempeval-3 evaluation on temporal relation classification
Our complete system (relation identification and classification) attempts to extract a
document’s entire temporal graph, i.e. it extracts a high number of relations in a text. In
the evaluation (see Deliverable 4.2.3) this leads to good performance in terms of recall but
low in terms of precision due to the incompletness of the manually annotated corpora used
as gold standard.
Indeed annotating a corpus with all temporal relations between events and time expressions is a difficult and time consuming task. Consequently, in most of the available
corpora only small portions of the temporal graph are annotated. For example in the
NewsReader annotation guidelines five subtasks were defined to help annotators annotate
the most important temporal relations, but many relations are not considered, such as
relations between nominal events and document creation times.
Cassidy et al. (2014) propose a new annotated corpus called TimeBank-Dense which is
composed of files from the TimeBank corpus annotated with ten more temporal relations
with respect to the original annotation. Currently, we are not able to evaluate our system
20
https://github.com/paramitamirza/TempCauseRelPro
February 1, 2016
83/148
on the TimeBank-Dense corpus because the set of relations annotated is slightly different
from the one annotated by our system and because TimeBank is part of our training
corpus.
Relation extraction between two time expressions. The second step of the extraction of temporal relation in a document is the detection of timex/timex relations for all
dates and times. This step is performed using rules depending on the normalized form of
the value of time expressions. If the two time expressions are dates or times, we compare
first the years, then the months, weeks, days, etc. In the case of fuzzy expressions with
one of the following value PRESENT REF, PAST REF or FUTURE REF, then we use the relation
between the second time expression of the pair and the Document Creation Time to order
them.
This step enables us to make the temporal relations between time expressions explicit.
If the normalization of time expressions is correct, then the right order between them is
extracted. Wrong relations are identified only if the normalization fails.
Examples of timex/timex relations:
Apple Computer announced today tmx1 another special event to be held on October 12 tmx2 .
(DCT tmx0 : 2005-10-04)
• Normalization: tmx1 : 2005-10-04; tmx2 : 2005-10-12
• Relations: tmx1 before tmx2 ; tmx0 simultaneous tmx1 ; tmx0 before tmx2
He will be repatriated to Cuba between now tmx3 and Feb. 10 tmx4 . (DCT tmx5 :
2000-01-07)
• Normalization: tmx3 : PRESENT REF; tmx4 : 2000-02-10
• Relations: tmx3 before tmx4 ; tmx5 simultaneous tmx3 ; tmx5 before tmx4
4.2
4.2.1
Causal Relation
Annotation Scheme
The annotation scheme has been newly defined for the NewsReader project (see the NewsReader guidelines (Tonelli et al., 2014; Mirza et al., 2014)). Similar to the <TLINK>
tag in TimeML for temporal relations, we introduce the <CLINK> tag to mark a causal
relation between two events. Both TLINKs and CLINKs mark directional relations, i.e.
they involve a source and a target event. However, while a list of relation types is part of
the attributes for TLINKs (e.g. before, after, includes, etc.), for CLINKs only one relation
type is foreseen, going from a source (the cause, indicated with s in the examples) to a
target (the effect, indicated with t ).
We also introduce the notion of causal signals through the <C-SIGNAL> tag. CSIGNALs are used to mark-up textual elements signalling the presence of causal relations,
which include all causal uses of prepositions (e.g. because of, as a result of, due to),
February 1, 2016
84/148
conjunctions (e.g. because, since, so that), adverbial connectors (e.g. so, therefore, thus)
and clause-integrated expressions (e.g. the reason why, the result is, that is why).
Wolff (2007) claims that causation covers three main types of causal concepts, i.e.
CAUSE, ENABLE and PREVENT. These causal concepts are lexicalized through three
types of verbs listed in Wolff and Song (2003): i) CAUSE-type verbs, e.g. cause, prompt,
force; ii) ENABLE-type verbs, e.g. allow, enable, help; and iii) PREVENT-type verbs,
e.g. block, prevent, restrain. These categories of causation are taken into account as an
attribute of CLINKs.
Given two annotated events, a CLINK is annotated if there is an explicit causal construction linking them. Such construction can be expressed in one of the following ways:
1. Expressions containing affect verbs (affect, influence, determine, change, etc.), e.g.
Ogun ACN crisis s influences the launch t of the All Progressive Congress.
2. Expressions containing link verbs (link, lead, depend on, etc.), e.g. An earthquake t
in North America was linked to a tsunami s in Japan.
3. Basic constructions involving causative verbs of CAUSE, ENABLE and PREVENT type, e.g. The purchase s caused the creation t of the current building.
4. Periphrastic constructions involving causative verbs of CAUSE, ENABLE
and PREVENT type, e.g. The blast s caused the boat to heel t violently. With
“periphrastic” we mean constructions where a causative verb (caused ) takes an embedded clause or predicate as a complement expressing a particular result (heel ).
5. Expressions containing CSIGNALs, e.g. Its shipments declined t as a result of
a reduction s in inventories by service centers.
Example:
The departure from France of the new Airbus A380 superjumbo airliner on a tour of Asia
and Australia has been delayed pr4 , leading to a rearrangement pr8 of its public appearances.
The NAF representation of the causal relation holding between delayed and rearrangement is as follows:
<c a u s a l R e l a t i o n s >
<!−−(pr4 , pr8)−−>
<c l i n k i d=” c l i n k 5 ” from=”pr4 ” t o=”pr8”/>
</ c a u s a l R e l a t i o n s >
4.2.2
Causal Relation Extraction
We start with an assumption that causality may only occur between events in the same
sentence and between events in two consecutive sentences. Therefore, every possible combination of events in the same sentence and in two consecutive sentences, in a forward
manner, is considered as a candidate event pair.
February 1, 2016
85/148
The problem of detecting causal relations (CLINKs) between events is taken as a supervised classification task. Given an ordered pair of events (e1 ,e2 ), the classifier has to
decide whether there is a causal relation or not. However, since causality is a directional
relation between a cause (source) and an effect (target), the classifier has to assign one of
three possible labels: (i) clink (where e1 is the source and e2 is the target), (ii) clink-r
(with the reverse order of source and target), and (iii) o for no relation.
The classification model is built with YamCha21 (Kudo and Matsumoto (2003)), which
implements Support Vector Machines (SVMs) algorithm. We employ one-vs-one strategy for multi-class classification, and use the polynomial kernel. The overall approach
is inspired by an existing work for identifying causal relations between events (Mirza
and Tonelli (2014a)), with some differences in the feature set. The implemented features are explained in the following sections. The module is available on github https:
//github.com/paramitamirza/TempCauseRelPro.
Event features We implement some morphological, syntactical and textual context features of e1 and e2 , such as:
• lemma and part-of-speech (PoS) tags;
• sentence distance (e.g. 0 if e1 and e2 are in the same sentence, 1 if they are in adjacent
sentences);
• entity distance (i.e. the number of entities occurring between e1 and e2 , which is only
measured if e1 and e2 are in the same sentence);
• dependency path existing between e1 and e2 ;
• binary features indicating whether e1 /e2 is the root of the sentence;
• event attributes of e1 /e2 , including tense, aspect and polarity; and
• a binary feature indicating whether e1 and e2 co-refer.22
Causal marker features We consider three types of causal markers that can cue a
causal relation between events:
1. Causal signals. We extracted a list of causal signals from the annotated C-SIGNALs
in the Causal-TimeBank corpus.23
2. Causal connectives, i.e. the discourse connectives under the Contingency class
according to the output of the addDiscourse tool (Pitler and Nenkova (2009)).
3. Causal verbs. The three types of verbs lexicalizing causal concepts as listed in Wolff
and Song (2003): i) CAUSE-type verbs, e.g. cause, prompt, force; ii) ENABLE-type
verbs, e.g. allow, enable, help; and iii) PREVENT-type verbs, e.g. block, prevent,
restrain.
21
http://chasen.org/~taku/software/yamcha/
When two events co-refer, there is almost no chance that they hold a causal relation.
23
For some causal signals that can have some other tokens in between, e.g. due (mostly) to, we instead
include their regular expression patterns, e.g. /due .*to/, in the list.
22
February 1, 2016
86/148
We further enriched the list of causal signals and causal verbs with the Paraphrase
Database (PPDB, Ganitkevitch et al. (2013)), using the initial list of signals and verbs as
seeds.
Based on the existence of causal markers around e1 and e2 , exactly in that priority
order24 , we include as features:
• causal marker string;
• causal marker position, i.e. between e1 and e2 , before e1 , or at the beginning of the
sentence where e1 /e2 is in; and
• dependency path between the causal marker and e1 /e2 .
TLINKs Mirza and Tonelli (2014a) showed that even though only 32% of the gold
annotated causal links have the underlying temporal relations, the temporal relation type
of an event pair (e1 , e2 ), if any, contributes in determining the direction of the causal
relation (clink vs clink-r), if any. Therefore, we include the information of temporal
relation types in the feature set.
In building the causal relation extraction system, we use Causal-TimeBank25 (Mirza et
al. (2014)) with the previously explained annotation scheme as our development dataset.
Causal-TimeBank is the TimeBank corpus26 taken from the TempEval3 evaluation campaign, which is completed with causal information as well. There are 318 causal links
(CLINKs), only around 6.2% of the total temporal links (TLINKs) found in the corpus,
containing 183 documents in total.
The developed causal relation (CLINK) relation system is then evaluated in a five-fold
cross-validation setting. Table 26 shows the performance of the system, compared with
the system of Mirza and Tonelli (2014a) as a baseline.
Given the limited amount of data annotated with causality, the supervised systems
still do not yield satisfactory results. Mirza and Tonelli (2014a) report issues with data
sparseness, and suggest that other training data could be derived, for instance, from the
Penn Discourse Treebank (Prasad et al. (2008)). We adopt a different approach by combining the small data set available with unlabelled dataset in a semi-supervised setting,
specifically with the self-training method.
We exploit the remaining of TempEval-3 corpus besides TimeBank, i.e. AQUAINT and
TE3-platinum (TempEval-3 evaluation corpus), with gold events and TLINKs. The idea
of using the corpus with gold standard events and TLINKs for semi-supervised learning is
because event’s attributes (tense, aspect and polarity) and TLINKs are quite important
as features. There are 90 additional documents in total. The self-training method is done
with 9 iterations with 10 documents per iteration.
Two different schemes of the self-training are explored: (1) adding all extracted CLINKs
24
We first look for causal signals. If we do not find any, then we continue looking for causal connectives.
And so on.
25
http://hlt.fbk.eu/technologies/causal-timebank
26
Dataset annotated with temporal entities such as temporal expressions, events and temporal relations
in TimeML annotation framework.
February 1, 2016
87/148
as new training data (with imbalanced number of positive and negative examples) and (2)
adding balanced number of positive and negative CLINKs. Self-training with the (1)
scheme improves the precision but lowers the recall. Meanwhile, self-training with the (2)
scheme reduces the precision but improves the recall, but increases the overal performance
in terms of F1-score.
System
Mirza and Tonelli (2014a)
CLINK extraction
self-training (1)
self-training (2)
P
0.6729
0.6917
0.7167
0.6382
R
0.2264
0.2921
0.2730
0.3079
F1
0.3388
0.4107
0.3954
0.4154
Table 26: CLINK extraction system’s performance.
4.3
Predicate Time Anchors
The amount of temporal relations extracted by the previously described Temporal Relation
Extraction modules grows with the number of annotated events and temporal expressions.
Some events are linked to a time expression with a relation of type simultaneous or
is included, but some are only linked with relations of type after or before either to
a time expression or to another event. With the main goal of structuring timelines from
events in texts, we propose to use these relations and other textual information in order to
build a relation “PredicateTimeAnchor” between all events that can be anchored in time
and time expressions.
4.3.1
Annotation Scheme
A narrative container is defined by Styler IV et al. (2014) as a temporal expression or an
event explicitly mentioned in the text into which other events temporally fall. For the
TimeLine shared task at SemEval 2015 (Minard et al. (2015)) we proposed the notion of
temporal anchoring of an event, which is a specific type of temporal relation that links an
event to the temporal expression to which it is anchored. The anchoring in time of an event
can be realized in two ways: either the event is anchored in time through a time expression
which can be a DATE or a DURATION, or the event is anchored to an interval through a
begin point and an end point. Time expressions can be text consuming or not dependent
on the fact that they are explicitly expressed in the text or that they were derived from
another time expression.
Examples: 27
Stock markets around the world have fallen pr1 dramatically today tmx1 .
27
In order to make examples more readable they all contain the time expression and the event in the
same sentence. But the module also anchors in time events with time expressions that are in different
sentences.
February 1, 2016
88/148
PredicateTimeAnchor (pr1 ): time anchor: 2008-09-17 (tmx1 )
The U.S. dollar rose against the yen after six straight days tmx2 of losses pr2 .
PredicateTimeAnchor (pr2 ): time anchor: P6D (tmx2 , begin point: 2008-10-21, end point:
2008-10-27)
The Japanese economy contracted pr3 by 0.9% between April tmx3 and June tmx4 .
PredicateTimeAnchor (pr3 ): begin point: 2008-04 (tmx3 ); end point: 2008-06 (tmx4 )
The Russian government has continued to hold all stock markets closed pr4 until Friday tmx5 .
PredicateTimeAnchor (pr4 ): end point: 2008-09-19 (tmx5 )
In NAF PredicateTimeAnchor relation is described through three attributes which have
as value a reference link to a time expression:
anchorTime: indicate the point in time when the event occured
beginPoint: indicate the begin of the interval in which the event occured
endPoint: indicate the end of the interval in which the event occured
The NAF representation is as follows:
<t e m p o r a l R e l a t i o n s >


<t a r g e t i d=”pr1”/>




<t a r g e t i d=”pr3”/>


</t e m p o r a l R e l a t i o n s >
4.3.2
Predicate Time Anchor Relation Extraction
Following the definition of time anchoring of events in the TimeLine shared task at SemEval
2015 (Minard et al. (2015)), we have developed a system to extract “anchoring” relations
between an event and a time expression in the text. The system is rule based and performs
a kind of reasoning over the temporal information. It uses the temporal relations previously
extracted, verb tenses, dependency trees and temporal signals.
February 1, 2016
5
89/148
From TimeLines to StoryLines
In the previous section, we discussed the detection of temporal expressions, temporal relations and causal relations within a single document. In this section, we describe large
structures that go beyond the document level, such as TimeLines of events for entities
across documents and StoryLines. Stories are seen as the most natural representation of
changes over time that also provide explanatory information to these changes. Not all
changes make a story. Repetitive changes without further impact, e.g. the rising and
dawning of the sun, do not provide a story. We expect that news is typically focused on
those changes that have a certain impact. We further assume that the news tries to explain
these events (How did it get so far? Who is responsible? ) and describethe consequences
of the event. Our starting point is therefore a key concept from narrative studies, namely
that of plot structure (Ryan (1991); Herman et al. (2010)). A plot structure is a representational framework which underlies the way a narrative is presented to a receiver. Figure
37 shows a a graphical representation of a plot structure.
Figure 37: General structure of a plot building up to a climax point
We therefore seek to create structures of events selected from all extracted events that
approximate such abstract plot structure. Contrary to what can be done in narrative
studies, where the documents themselves normally provide a linear development of the
plot structure, we aim at obtaining the plot structure from collections of news articles
on the same topic and spanning over a period of time. We aim at identifying first the
“climax”, which in our perspective will correspond to the most salient event in a news
article. After the most salient event and its participants have been identified we will use
event relations to identify the rising actions (i.e. how and why did the most salient event
occur? ), if any28 , and the falling actions and consequences (i.e. what happened after the
climax? what are the speculations linked to the climax event? . . . ). The first step towards
the creation of StoryLines is to to establish the temporal ordering of events. In subsection
5.1.2 we report on a TimeLine extraction system. In subsection 5.2.1 we describe two
computational models for StoryLines and their implementation, which to our knowledge
are unique in their kind. Finally, in subsection 5.2.3, we report on the results and insights
28
Notice that (unforeseeable) natural events, like earthquakes, are to be considered as self-contained
climax events.
February 1, 2016
90/148
of the first workshop on “Computing News Storylines (CNewsStory 2015)” organised as a
satellite event of the ACL-IJCNLP 2015 conference.
5.1
TimeLine extraction
This section reports on the advancements in the development of the TimeLine extraction.
TimeLine extraction aims at reconstructing the chronological temporal order of events
form (large) collections of news spanning over different years. In the following subsection,
we describe the task, the benchmark data which had been developed for the SemEval
2015 evaluation exercise Task 4: TimeLine: Cross-Document Event Ordering, and the new
version of the TimeLine system.
5.1.1
TimeLines: task description
The Task 4: TimeLine: Cross-Document Event Ordering was proposed as a new pilot task
for the SemEval 2015 Evaluation Exercise. The task builds on previous temporal processing
tasks organised in previous SemEval editions (TempEval-129 , TempEval-230 and TempEval331 ). The task aimed at advancing temporal processing by tackling for the first time with
a common and public dataset the following issues:
• cross-document and cross-temporal event detection and ordering.
• entity-based temporal processing.
Following the task guidelines, a TimeLine can be defined as a set of chronologically anchored and ordered events related to an entity of interest (i.e. a person, a commercial
or financial product, an organization, and similar) obtained from a document collection
spanning over a (large) period of time. Furthermore, not all events are eligible to enter
a TimeLine. The task organisers have restricted the event mentions to specific parts-ofspeech and classes, as defined in the task Annotation Guidelines32 , in particular, an event
can enter into a TimeLine only if the following conditions apply:
• it is realised by a verb, a noun or a pronoun (anaphoric reference);
• it semantically denotes the happening of something (e.g. the launch of a new product)
or it describes the action of declaring something, narrating an event, informing about
an event;
• it is a factual or certain event, i.e. something which happened in the past or in the
present, or for which there is evidence that will happen in the future.
29
http://www.timeml.org/tempeval/
http://timeml.org/tempeval2/
31
http://www.cs.york.ac.uk/semeval-2013/task1/
32
http://alt.qcri.org/semeval2015/task4/data/uploads/documentation/
manualannotationguidelines-task4.pdf
30
February 1, 2016
91/148
No training data was provided. Only a trial dataset of 30 articles manually annotated from
WikiNews, and associated TimeLines for six entities, were provided to the task participants.
Each event in the TimeLine is associated with a time anchor of type “DATE” following the
TimeML Annotation Guidelines (Saurı́ et al. (2006)). In case an event cannot be associated
with a specific time anchor, an underspecified time anchor of the type “XXXX-XX-XX” is
provided.
The final TimeLine representation is composed by a tab field file containing three fields.
The first field (ordering) contains a cardinal number which indicates the position of the
event in the TimeLine. Simultaneous, but not coreferential, events are associated with the
same ordering number. Events which cannot be reliably ordered, either because of a missing
time anchor or underspecified temporal relations (e.g. an event which is associated with
a generic date with value “PAST REF”) are to be put at the beginning of the TimeLine
and associated with cardinal number 0. The second field (time anchor) contains the time
anchor. The third column (event) consists of one event or a list of coreferential events.
Each event must be represented by the file id, the sentence id and the extent of the event
mention (i.e. the token). To clarify the representation of a TimeLine, in Figure 38 we
report the output of SPINOZA VU 1 system for the target entity “Airbus”33 .
Figure 38: Example of timeline output generated by the SPINOZA VU 1 system
Two tracks were proposed. Track A aimed at extracting TimeLines for target entities from
raw text. Track B aimed at extracting TimeLines for target entities by providing manually
annotated data for the gold events. Both tracks have a subtrack, Subtrack A and Subtrack
33
The entity ”Airbus” was one of the entity provided by the SemEval task organisers
February 1, 2016
92/148
B, whose goals are to evaluate only the ordering of events, without taking into account the
time anchoring.
The test data consisted of three different corpora, each containing 30 articles, and an
overall 37 target entities (12 for the first corpus, 12 for the second corpus and 13 for the
third corpus). The evaluation is based on the TempEval-3 evaluation (UzZaman et al.
(2012)) tool. All events associated with cardinal number 0 in a TimeLine are excluded
from the evaluation. Results and ranking reports the micro average F1 score for temporal
awareness.
5.1.2
System Description and Evaluation
A detailed description of the first version of the TimeLine extraction system can be found in
Rospocher et al. (2015) and Caselli et al. (2015a). Two different versions were developed,
called SPINOZA VU 1 and SPINOZA VU 2 respectively. The systems took part only
to the Track A (both main and subtask) of the SemEval 2015 Task 4. In Table 27 we
report the results of both versions of the system for the Task A - Main, including the best
performing system. Table 28 reports the results of both versions of the system for the Task
A - Subtask. In Table 28 no other result is reported because only our system participated.
The F1-scores ranges from 0 to 100.
System Version
SPINOZA VU 1
SPINOZA VU 2
WHUNLP 1
Corpus 1
4.07
2.67
8.31
Corpus 2
5.31
0.62
6.01
Corpus 3
0.42
0.00
6.86
Overall
3.15
1.05
7.28
Table 27: System Results (micro F1 score) for the SemEval 2015 Task 4 Task A - Main
System Version
SPINOZA VU 1
SPINOZA VU 2
Corpus 1
1.20
0.00
Corpus 2
1.70
0.92
Corpus 3
2.08
0.00
Overall
1.69
0.27
Table 28: System Results (micro F1 score) for the SemEval 2015 Task 4 Task A - Subtask
Overall, the results are not satisfying. Out of 37 entity based TimeLines, we obtained
results only for 31 of them. An error analysis showed three main sources of errors which
affected both versions of our system: event detection, temporal relations, semantic role
labelling and the connections between these three. This means that: i.) we may be able
to identify the correct event with respect to the target entity, but we are lacking the
temporal relation information for that event, thus failing to put it into the TimeLine, ii.)
we fail in the identification of the target entity as an argument of an event. The temporal
relation of an event is the main source of error from these three. A detailed report on the
error analysis of the system can be found in Caselli et al. (2015c). Future work is directed
towards detecting more temporal relations between events and expressions that are explicit
in the text but also to use knowledge on temporal ordering of events that is implicit and
February 1, 2016
93/148
not expressed in the text. The latter can be learned from large text corpora. In the next
subsection, we describe the first results for resolving implicit relations exploiting the whole
document.
5.1.3
Document level time-anchoring for TimeLine extraction
As seen previously, TimeLine extraction requires a quite complete time anchoring. We
have shown that the temporal relations that explicitly connect events and time expressions are not enough to obtain a full time-anchor annotation and, consequently, produce
incomplete TimeLines. For this reason, we propose that for a complete time-anchoring
the temporal analysis must be performed at a document level in order to discover implicit
temporal relations. We have developed a preliminary module based on other research lines
involving the extraction of implicit information (Palmer et al., 1986; Whittemore et al.,
1991; Tetreault, 2002). Particularly, we are inspired by recent works on Implicit Semantic
Role Labelling (ISRL) (Gerber and Chai, 2012) and very specially on the work by Blanco
and Moldovan (2014) who adapted the ideas about ISRL to focus on modifiers, including
arguments of time, instead of core arguments or roles. We have developed a deterministic
algorithm of the type of (Laparra and Rigau, 2013) for ISRL.
Similarly to the module presented in 5.1.2, we implemented a system that builds TimeLines from events with explicit time-anchors. We defined a three step process to build
TimeLines. Given a set of documents and a target entity, the system first obtains the
events in which the entity is involved. Second, it obtains the time-anchors for each of these
events. Finally, it sorts the events according to their time-anchors. For steps 1 and 2 we
apply the NewsReader pipeline to obtain annotations at different levels. Specifically, we are
interested in Named-Entity Recognition (NER) and Disambiguation (NED), Co-reference
Resolution (CR), Semantic Role Labelling (SRL), Time Expressions Identification (TEI)
and Normalization (TEN), and Temporal Relation Extraction (TRE).
Named-Entity Recognition (NER) and Disambiguation (NED): We perform
NER using the ixa-pipe-nerc that is part of IXA pipes (Agerri et al., 2014). The module
provides very fast models with high performances, obtaining 84.53 in F1 on CoNLL tasks.
Our NED module is based on DBpedia Spotlight (Daiber et al., 2013). We have created a
NED client to query the DBpedia Spotlight server for the Named entities detected by the
ixa-pipe-nerc module. Using the best parameter combination, the best results obtained
by this module on the TAC 2011 dataset were 79.77 precision and 60.67 recall. The best
performance on the AIDA dataset is 79.67 precision and 76.94 recall.
Coreference Resolution (CR): In this case, we use a coreference module that is
loosely based on the Stanford Multi Sieve Pass sytem (Lee et al., 2011). The system consists of a number of rule-based sieves that are applied in a deterministic manner. The
system scores 56.4 F1 on CoNLL 2011 task, around 3 points worse than the system by Lee
et al. (2011).
February 1, 2016
94/148
Semantic Role Labelling (SRL): SRL is performed using the system included in the
MATE-tools (Björkelund et al., 2009). This system reported on the CoNLL 2009 Shared
Task a labelled semantic F1 of 85.63 for English.
Time Expression Identification (TEI) and Normalization (TEN): We use the
time module from TextPro suite (Pianta et al., 2008) to capture the tokens corresponding
to temporal expressions and to normalize them following TIDES specification. This module is trained on TempEval3 data. The average results for English is: 83.81% precision,
75.94% recall and 79.61% F1 values.
Time Relation Extraction (TRE): We apply the temporal relation extractor module from TextPro to extract and classify temporal relations between an event and a time
expression. This module is trained using yamcha tool on the TempEval3 data. The result
for relation classification on the corpus of TempEval3 is: 58.8% precision, 58.2% recall and
58.5% F1.
Our TimeLine extraction system uses the linguistic information provided by the pipeline.
The process to extract the target entities, the events and time-anchors can be described
as follows:
(1) Target entity identification: The target entities are identified by the NED
module. As they can be expressed in several forms, we use the redirect links contained in
DBpedia to extend the search of the events involving those target entities. For example,
if the target entity is Toyota the system would also include events involving the entities
Toyota Motor Company or Toyota Motor Corp. In addition, as the NED does not always
provide a link to DBpedia, we also consider the matching of the wordform of the head of
the argument with the head of the target entity.
(2) Event selection: We use the output of the SRL module to extract the events that
occur in a document. Given a target entity, we combine the output of the NER, NED,
CR and SRL to obtain those events that have the target entity as filler of their ARG0 or
ARG1. We also set some constraints to select certain events according to the specification
of the SemEval task. That is, we only return those events that are not negated and are
not accompanied by modal verbs except will.
(3) Time-anchoring: We extract the time-anchors from the output of the TRE and
SRL. From the TRE, we extract as time-anchors those relations between events and timeexpressions identified as SIMULTANEOUS. From the SRL, we extract as time-anchors
those ARG-TMP related to time expressions. In both cases we use the time-expression
returned by the TEI module. The tests performed on the trial data show that the best
choice for time-anchoring is combining both options. For each time anchor we normalize
the time expression using the output of the TEN module.
February 1, 2016
95/148
The TimeLine extraction process described following this approach builds TimeLines
for events with explicit time-anchors. We call this system BTE and it can be seen as a
baseline since we believe that the temporal analysis should be carried out at document
level. The explicit time anchors provided by the NLP tools do not cover the full set of
events involving a particular entity. That is, most of the events do not have an explicit time
anchor and therefore are not captured as part of the TimeLine of that entity. Thus, we need
to recover the time-anchors that appear implicitly in the text. In this preliminary work,
we propose a simple strategy that tries to capture implicit time-anchors while maintaining
the coherence of the temporal information in the document. This strategy follows previous
works on Implicit Semantic Role Labelling.
Figure 39: Example of document-level time-anchoring.
The rationale behind algorithm 1 is that by default the events of an entity that appear
in a document tend to occur at the same time as previous events involving the same entity,
except stated explicitly. For example, in Figure 39 all the events involving Steve Jobs, like
gave and announced, are anchored to the same time-expression Monday although this only
happens explicitly for the first event gave. The example also shows how for other events
that occur in different times the time-anchor is also mentioned explicitly, like for those
events that involve the entities Tiger and Mac OS X Leopard.
Algorithm 1 starts from the annotation obtained by the tools described above. For a
particular entity a list of events (eventList) is created sorted by its occurrence in the text.
Then, for each event in this list the system checks if that event has already a time-anchor
(eAnchor). If this is the case, the time-anchor is included in the list of default timeanchors (def aultAnchor) for the following events of the entity with the same verb tense
(eT ense). If the event does not have an explicit time-anchor but the system has found a
time-anchor for a previous event belonging to the same tense (def aultAnchor[eT ense]),
this time-anchor is also assigned to the current event (eAnchor). If none of the previous
conditions satisfy, the algorithm anchors the event to the Document Creation Time
(DCT) and sets this time-expression as the default time-anchor for the following events
with the same tense.
Note that algorithm 1 strongly depends on the tense of the events. As this information
can be only recovered from verbal predicates, this strategy cannot be applied to events
described by nominal predicates. For these cases just explicit time-anchors are taken into
February 1, 2016
96/148
Algorithm 1 Implicit Time-anchoring
1: eventList = sorted list of events of an entity
2: for event in eventList do
3:
eAnchor = time anchor of event
4:
eT ense = verb tense of event
5:
if eAnchor not N U LL then
6:
def aultAnchor[eT ense] = eAnchor
7:
else if def aultAnchor[eT ense] not N U LL then
8:
eAnchor = def aultAnchor[eT ense]
9:
else
10:
eAnchor = DCT
11:
def aultAnchor[eT ense] = DCT
12:
end if
13: end for
account. The TimeLine is built ordering the events according to the time-anchors obtained
both explicitly and implicitly. We call this system DLT.
We evaluated our two TimeLine extractors on the main track of the SemEval 2015
task 4. Two systems participated in this track, WHUNLP and the module explained
in 5.1.2, with three runs in total. Their performances in terms of Precision (P), Recall
(R) and F1-score (F1) are presented in Table 29. We also present in italics additional
results of both systems. On the one hand, the results of a corrected run of the WHUNLP
system provided by the SemEval organizers. On the other hand, the results of an out-ofcompetition version of the SPINOZAVU module The best run is obtained by the corrected
version of WHUNLP 1 with an F1 of 7.85%. The low figures obtained show the intrinsic
difficulty of the task, especially in terms of Recall.
Table 29 also contains the results obtained by our systems. We present two different
runs. On the one hand, we present the results obtained using just the explicit time-anchors
provided by BTE. As it can be seen, the results obtained by this run are similar to those
obtained by WHUNLP 1. On the other hand, the results of the implicit time-anchoring
approach (DLT) outperforms by far our baseline and all previous systems applied to the
task. To check that these results are not biased by the time-relation extractor we use in
our pipeline (TimePro), we reproduce the performances of BTE and DLT using another
system to obtain the time-relations. For this purpose we used CAEVO by Chambers et
al. (2014). The results obtained in this case show that the improvement obtained by our
approach is quite similar, regardless of the time-relation extractor chosen.
The figures in Table 29 seem to prove our hypothesis. In order to obtain a full timeanchoring annotation, the temporal analysis must be carried out at a document level. The
TimeLine extractor almost doubles the performance by just including a straightforward
strategy as the one described in this section. As expected, Table 29 shows that this
improvement is much more significant in terms of Recall.
February 1, 2016
System
SPINOZAVU-RUN-1
SPINOZAVU-RUN-2
WHUNLP 1
OC SPINOZA VU
WHUNLP 1
BTE
DLT
BTE caevo
DLT caevo
97/148
P
R
F1
7.95
1.96
3.15
8.16
0.56
1.05
14.10
4.90
7.28
7.12
14.59
5.37
7.85
26.42
4.44
7.60
20.67 10.95 14.31
17.56
4.86
7.61
17.02 12.09 14.13
Table 29: Results on the SemEval-2015 task
5.2
Storylines
The TimeLines discussed in the previous section form the basis for StoryLines. Stories are
a pervasive phenomenon in human life. They are explanatory models of the world and of its
happenings (Bruner, 1990). We make reference to the narratology framework of Bal (Bal,
1997) to identify the basic concepts which inform our model. Every story is a mention of a
fabula, i.e., a sequence of chronologically ordered and logically connected events involving
one or more actors. Actors are the agents, not necessarily humans, of a story that perform
actions. In Bal’s framework “acting” refers both to performing and experiencing an event.
Events are defined as transitions from one state to another. Furthermore, every story has a
focalizer, a special actor from whom’s point of view the story is told. Under this framework,
the term “story” is further defined as the particular way or style in which something is
told. A story, thus, does not necessarily follow the chronological order of the events and
may contain more than one fabula.
Extending the basic framework and focusing on the internal components of the fabula,
a kind of universal grammar can be identified which involves the following elements:
• Exposition: the introduction of the actors and the settings (e.g. the location);
• Predicament: it refers to the set of problems or struggles that the actors have to go
through. It is composed by three elements: rising action, the event(s) that increases
the tension created by the predicament, climax, the event(s) which creates the maximal level of tension , and, finally, falling action, the event(s) which resolve the climax
and lower the tension;
• Extrication: it refers to the “end” of the predicament and indicates the ending.
The model allows to focus on each its the components, highlighting different, though
connected, aspects: the internal components of the fabula are event-centered; the actors
and the focalizer allows access to opinions, sentiments, emotions and world views; and,
the medium to the specific genres and styles. We developed two different approaches to
February 1, 2016
98/148
Figure 40: Example of a StoryLine merging the TimeLines of the entities Steve Jobs and
Iphone 4.
create StoryLine structures that focus on different aspects of the fabula. The first approach
aggregates stories from separate TimeLines for different actors through co-participation.
The second approach aggregates stories from climax events and bridging relations with
other events that precede and follow the climax.
5.2.1
StoryLines aggregated from entity-centered TimeLines
Timelines as described in the previous section are built for single entities. However, stories
usually involve more than one entity. In this section, we present a proposal to create
StoryLines by merging the individual TimeLines of two or more different entities, provided
that they are co-participants of at least one relevant event.
In general, given a set of related documents, any entity appearing in the corpus is
a candidate to take part in a StoryLine. Thus, a TimeLine for every entity should be
extracted following the requirements described by the SemEval-2015 task. Then, those
TimeLines that share at least one relevant event must be merged. Those entities that do
not co-participate in any event with other entities are not considered participants of any
StoryLine. The expected StoryLines should include both the events where the entities
interact and the events where the entities selected for the StoryLines participate individNewsReader: ICT-316404
February 1, 2016
99/148
ually. The events must be ordered and anchored in time in the same way as individual
TimeLines, but it is also mandatory to include the entities that take part in each event.
Figure 40 presents the task idea graphically. In the example, two TimeLines are extracted using 5 sentences from 3 different documents, one for the entity Steve Jobs and
another one for the entity Iphone 4. As these two entities are co-participants of the events
introducing and introduced, the TimeLines are merged in a single StoryLine. As a result,
the StoryLine contains the events of both entities. The events are represented by the ID of
the file, the ID of the sentence, the extent of the event mention and the participants (i.e.
entities) of the event.
timelines from SemEval
storylines
events
events / storyline
interacting-events
interacting-events / storyline
entities
entities / storyline
Apple Inc. Airbus GM Stock Total
6
13
11
13
43
1
2
1
3
7
129
135
97
188
549
129
67.5
97
62.7
78.4
5
12
2
11
30
5
6
2
3.7
4.3
4
9
4
9
26
4
4.5
4
3
3.7
Table 30: Figures of the StoryLine gold dataset.
Dataset
As a proof-of-concept, we start from the dataset provided in SemEval-2015. It is composed
of 120 Wikinews articles grouped in four different corpora about Apple Inc.; Airbus and
Boeing; General Motors, Chrysler and Ford; and Stock Market. The Apple Inc. set of 30
documents serve as trial data and the remaining 90 documents as the test set. We consider
each corpus a topic to extract StoryLines. Thus, for each corpus, we have merged the
interacting individual TimeLines to create a gold standard for StoryLines. As a result of
this process, from a total of 43 TimeLines we obtained 7 gold-standard StoryLines spread
over 4 topics. Table 30 shows the distribution of the StoryLines and some additional figures
about them. Airbus, GM and Stock corpora are similar in terms of size but the number of
gold StoryLines varies from 1 to 3. We also obtain 1 StoryLine from the Apple Inc. corpus,
but in this case the number of TimeLines is lower. The number of events per StoryLine
is quite high in every corpus, but the number of interacting events is very low. Finally,
26 out of 43 target entities in SemEval-2015 belong to a gold StoryLine. Note that in
real StoryLines all interacting entities should be annotated whereas now we only use those
already selected by the TimeLines task.
Evaluation
The evaluation methodology proposed in SemEval-2015 is based on the evaluation metric
used for TempEval-3 (UzZaman et al., 2013a) which captures the temporal awareness of
February 1, 2016
100/148
an annotation (UzZaman and Allen, 2011). For that, they first transform the TimeLines
into a set of temporal relations. More specifically, each time anchor is represented as a
TIMEX3 so that each event is related to the corresponding TIMEX3 by means of the
SIMULTANEOUS relation. In addition, SIMULTANEOUS and BEFORE relation types
are used to connect the events. As a result, the TimeLine is represented as a graph and
evaluated in terms of recall, precision and F1-score.
As a first approach, the same graph representation can be used to characterize the
StoryLines. Thus, for this trial we reuse the same evaluation metric as the one proposed
in SemEval-2015. However, we already foresee some issues that need to be addressed for
a proper StoryLines evaluation. For example, when evaluating TimeLines, given a set of
target entities, the gold standard and the output of the systems are compared based on the
F1 micro average scores. In contrast, when evaluating StoryLines, any entity appearing
in the corpus is a candidate to take part in a StoryLine, and several StoryLines can be
built given a set of related documents. Thus, we cannot compute the micro-average of
the individual F1-scores for each StoryLine because the number of StoryLines is not set in
advance. In addition, we also consider necessary to capture the cases in which having one
gold standard StoryLine a system obtains more than one StoryLine. This could happen
when a system is not able to detect all the entities interacting in events but only some
of them. We consider necessary to offer a metric which takes into account this type of
outputs and also scores partial StoryLines. Obviously, a deeper study of the StoryLines
casuistry will lead to a more complete and detailed evaluation metric.
Example of a system-run
In order to show that the dataset and evaluation strategy proposed are ready to be used
on StoryLines, we follow the strategy described to build the gold annotations to implement
an automatic system. This way, we create a simple system which merges automatically
extracted TimeLines. To build the TimeLines, we use the system explained in section 5.1.3.
For each target entity, we first obtain the corresponding Timeline. Then, we check which
TimeLines share the same events. In other words, which entities are co-participants of the
same event and we build StoryLines from the TimeLines sharing events. This implies that
more than two TimeLines can be merged into one single StoryLine.
The system builds 2 StoryLines in the Airbus corpus. One StoryLine is derived from
the merging of the TimeLines of 2 target entities and the other one from the merging of 4
TimeLines. In the case of the GM corpus, the system extracts 1 StoryLine where 2 target
entities participate. For the Stock corpus, one StoryLine is built merging 3 TimeLines. In
contrast, in the Apple corpus, the system does not obtain any StoryLine. We evaluated
our StoryLine extractor system in the cases where it builts StoryLines. The evaluation
results are presented in Table 31.
Based on the corpus, the results of our strategy vary. The system is able to create
StoryLines which share data with the gold-standard in the Airbus corpus, but it fails to
create comparable StoryLines in the GM and Stock corpora. Finding the interacting events
is crucial for the extraction of the StoryLines. If these events are not detected for all their
participant entities, their corresponding TimeLines cannot be merged. For that reason,
February 1, 2016
Corpus
Airbus
GM
Stock
101/148
Precision
6.92
0.00
0.00
Recall Micro-F
14.29
4.56
0.00
0.00
0.00
0.00
Table 31: Results of the StoryLine extraction process.
our dummy system obtains null results for the GM and Stock corpus.
However, this is an example of a system capable of creating StoryLines. Of course,
more sophisticated approaches or approaches that do not follow the TimeLine extraction
approach could obtain better results.
5.2.2
Storylines aggregated from climax events
The above method aggregates StoryLines across TimeLines. However, it still does not
provide an explanatory notion for the sequences of events. In this section, we present a
model that starts from a climax event that motivates the selection of events. In our model
we use the term StoryLine to refer to an abstract structured index of connected events
which provides a representation matching the internal components of the fabula (rising
action(s), climax, falling action(s) and resolution). On the other hand, we reserve the term
Story for the textual expression of such an abstract structure34 . Our model, thus, does
not represent texts but event data from which different textual representations could be
generated. The basic elements of a StoryLine are:
• A definition of events, participants (actors), locations and time-points (settings)
• Anchoring of events to time
• A TimeLine (or basic fabula): a set of events ordered for time (chronological order)
• Bridging relations: a set of relations between events with explanatory and predictive
value(s) (rising action, climax and falling action)
Storylines are built on top of the instance level of representation, as illustrated in 2,
and TimeLines. Given a TimeLine for a specific period of time, we define a StoryLine S
as n-tuples T, E, R such that:
Timepoints = (t1 , t2 , , ..., tn )
Events = (e1 , e2 , ..., en )
Relations = (r1 , r1 , ..., rn )
34
Note that a StoryLine can be used to generate a textual summary as a story, comparable to (cross)document text summarization.
February 1, 2016
102/148
T consists of an ordered set of points in time, E is a set of events and R is a set of bridging
relations between these events. Each e in E is related to a t in T. Furthermore, for any
pair of events ei and ej , where ei precedes ej there holds a bridging relation [r, ei , ej ] in R.
We assume that there is a set of TimeLines L for every E, which is any possible sequence
of events temporally ordered. Not every temporal sequence l of events out of L makes a
good StoryLine. We want to approximate a StoryLine that people value by defining a
function that maximizes the set of bridging relations across different sequences of events l
in L. We therefore assume that there is one sequence l that maximizes the values for R and
that people will appreciate this sequence as a story. For each l in L, we therefore assume
that there is a bridging function B over l that sums the strength of the relations and that
the news StoryLine S is the sequence l with the highest score for B :
S(E) = M AX(B(l))
B(l)) =
n
X
C(r, ei , ej )
i,j=1
Our bridging function B sums the connectivity strength C of the bridging relations
between all time-ordered pairs of events from the set of temporally ordered events l. The
kind of bridging relation r and the calculation of the connectivity strength C can be filled
in many ways: co-participation, expectation, causality, enablement, and entailment, among
others. In our model, we leave open what type of bridging relations people value. This
needs to be determined empirically in future research.
The set L for E can be very large. However, narratology models state that stories are
structured around climax events. The climax event makes the story worthwhile to tell.
Other preceding and following events are grouped around the climax to explain it. It thus
makes sense to consider only those sequences l that include at least one salient event as a
climax and relate other events to this climax event. Instead of calculating the score B for
all l in L, we thus only need to build event sequences around events that are most salient
as a climax event and select the other events on the basis of the strength of their bridging
relation with that climax or with each other. For any climax event ec , we can therefore
define:
n
M AX(B(ec E)) = max C(r, ei , ec )
i=1
The climax value for an event can be defined on the basis of salience features, such as:
• prominent position in a source;
• number of mentions;
• strength of sentiment or opinion;
• salience of the involved actors with respect to the source.
February 1, 2016
103/148
An implementation should thus start from the event with the highest climax score.
Next, it can select the preceding event el with the strongest value for r. Note that this
is not necessarily the event that is most close in time. After that, the event el with the
strongest connectivity is taken as a new starting point to find any event ek preceding this
event with the highest value for r. This is repeated until there are no preceding events in
the TimeLine l. The result is a sequence of events up to ec with the strongest values for
r. The same process is repeated forward in time starting from ec and adding em with the
strongest connectivity value for r, followed by en with the strongest connectivity score r
to em . The result is a sequence of events with local maxima spreading from ec :
...ek , rmax , el , rmax , ec , rmax , em , rmax , en ...
This schema models the optimized StoryLine starting from a climax event. By ranking the
events also for their climax score, the climax events will occupy the highest position and
the preceding and following events the lower positions approximating the fabula or plot
graph shown in Figure 37.
Storyline Extraction System
The StoryLine extraction system is composed by three components: a.) TimeLine extraction; b.) climax event identification; c.) rising and falling actions identification. The
TimeLine structures are obtained from the system described in Caselli et al. (2015a).
Although, all events may enter in a TimeLine, including speech-acts such as say, not
every sequence of ordered events makes a StoryLine. Within the total set of events in a
TimeLine, we compute for each event its prominence on the basis of the mention sentence
number and the number of mentions in the source documents. We currently sum the
inverse sentence
P number of each mention of an event in the source documents:
P (e) =
(1/S(em )).
em =1→N
This formula combines the number of references made to an event with the position
in the text that a word is mentioned: early mentions counts more than late mentions and
more mentions make it more prominent. All event instances are then ranked according to
the degree of prominence P.
We implemented a greedy algorithm in which the most prominent event will become
the climax event.35 Next, we determine the events with the strongest bridging relation
preceding and following the climax event in an iterative way until there are no preceding
and following events with a bridging relation. Once an event is claimed for a StoryLine,
we prevent it from being re-used for another StoryLine. For all remaining events (not
connected to the event with the highest climax score), we again select the event with the
highest climax score of the remaining events and repeat the above process. Remaining
events thus can create parallel StoryLines although with a lower score. When descending
the climax scores, we ultimately are left with events with low climax score that are not
added to any StoryLine and do not constitute StoryLines themselves.
35
Future versions of the system can include other properties such as emotion or salience of actors
February 1, 2016
104/148
For determining the value of the bridging relations we use various features and resources,
where we make a distinction between structural and implicit relations:
• Structural relations:
– co-participation;
– explicit causal relations;
– explicit temporal relations;
• Implicit relations:
– expectation based on corpus co-occurrence data;
– causal WordNet relation;
– frame relatedness in FrameNet;
– proximity of mentions;
– entailment;
– enablement.
Our system can in principle use any of the above relations and resources. However, in
the current version, we have limited ourselves to co-participation and FrameNet frame
relations. Co-participation is the case when two events share at least one participant URI
which has a PropBank relation A0, A1 or A2. The participant does not need to have
the same relation in the two events. Events are related to FrameNet frames if there is any
relation between their frames in FrameNet up to a distance of 3. Below we show an example
of a larger StoryLine extracted from the corpus used in the SemEval 2015 TimeLine task.
:Airbus
29
3
[C]61
23
6
1
15
22
39
20040101
20041001
20040301
20050613
20050613
20050613
20061005
20070228
20070319
21
12
3
21
13
4
20
20070319
20070609
20070708
20080201
20090730
20041124
20141213
["manufacturer","factory","manufacture"] :Boeing:European_aircraft_manufacturer_Airbus:Airbus
["charge","level","kill"] :United_States_Department_of_Defense:the_deal
["purchase"] :People_s_Republic_of_China:Airbus_aircraft
["win"] :European_aircraft_manufacturer_Airbus:Boeing
["aid","assistance","assist"] :Airbus:Boeing:for_the_new_aircraft
["spark"] :Airbus
["compensate"] :Airbus:of_its_new_superjumbo_A380s
["cut","have","reduction","make"] :Airbus:the_company
["supply","provide","resource","supplier","fund","tube"] :European_Aeronautic_Defence_and_Space_Company_EADS_N.V.
:Airbus:United_States_Department_of_Defense
["carry","carrier"]:the_airplane:Airbus_will
["jet"]:Airbus:Airbus_A320_family
["write","letter"]:Airbus:Boeing
["ink","contract","sign"]:Royal_Air_Force:Airbus
["lead","give","offer"] :France:Airbus
["personnel","employee"] :Airbus:Former_military_personnel
["carry","flight","fly"] :The_plane:Airbus
Figure 41: Storyline for Airbus and Boeing from the SemEval 2015 Task 4 dataset.
The StoryLine is created from a climax event ["purchase"] involving Airbus with
a score of 61. The climax event is marked with C at the beginning of the line. After
connecting the other events, they are sorted according to their time anchor. Each line
February 1, 2016
Figure 42: Airbus StoryLines order
per climax event
105/148
Figure 43: Airbus StoryLine for climax event [61] “purchase”
is a unique event instance (between square brackets) anchored in time, preceded by the
climax score and followed by major actors involved.36 We can see that all events reflect the
commercial struggle between Airbus and Boeing and some role played by governments.
In Figure 42, we visualise the extracted StoryLines ordered per climax event. Every
row in the visualisation is a StoryLine grouped per climax event, ordered by the climax
score. The label and weight of the climax event are reported in the vertical axis together
with the label of the first participant with an A1 Propbank role, which is considered to
be most informative. Within a single row each dot presents an event in time. The size
of the dot represents the climax score. Currently, the bridging relations are not scored.
A bridging relation is either present or absent. If there is no bridging relation, the event
is not included in the StoryLine. When clicking on a target StoryLine a pop up window
opens showing the StoryLine events ordered in time (see Figure 43). Since we present
events at the instance level across different mentions, we provide a semantic class grouping
these mentions based on WordNet which is shown on the first line. Thus the climax
event “purchase” is represented with the label more general label “buy” that represents a
hypernym synset. If a StoryLine is well structured, the temporal order and climax weights
mimic the fabula internal structure, as in this case. We expect that events close to the
climax have larger dots than more distant events in time.37 Stories can be selected per
target entity through the drop-down menu on top of the graph. In the Figure 42, all stories
concerning Airbus are marked in red. An online version of this visualisation can be found
on the project website at http://ic.vupr.nl/timeline. You can upload NAF files from
which StoryLines are extracted or JSON files extracted from a collection of NAF files.
Comparing the StoryLine representation with the TimeLine (see Figure 38) some differences can be easily observed. In a StoryLine, events are ordered in time and per climax
weight. The selection of events in the StoryLine is motivated by the bridging relations
which exclude non-relevant events, such as say. We used the visualisation to inspect
the results. We observed that some events were missed because of metonymic relations
between participants, e.g. Airbus and Airbus 380 are not considered as standing in a coparticipation relation by our system because they have different URIs. In other cases, we
36
37
We manually cleaned and reduced the actors for reasons of space.
In future work, we will combine prominence with a score for the strength of the bridging.
February 1, 2016
106/148
see more or less the opposite: a StoryLine reporting on journeys by Boeing is interrupted
by a plane crash from Airbus due to overgenerated bridging relations. What is the optimal
combination of features still needs to be determined empirically.
Storyline Evaluation: Unsolved Issues
At this stage phase we are not able to provide an extensive evaluation of the system yet.
Evaluation methods for StoryLines are not trivial. Most importantly, they cannot be evaluated with respect to standard measures such as Precision and Recall. The value of a story
depends a lot on the interest of a user. Evaluation of StoryLines thus should be based on
relevance rather than precision and recall. In this section, we describe and propose a set
of evaluation methods to be used as a standard reference method for this kind of tasks.
The evaluation of a StoryLine must be based, at least, on two aspects: informativeness
and interest. A good StoryLine is a StoryLine which interests the user, provides all relevant
and necessary information with respect to a target entity, and is coherent. We envisage
two types of evaluation: direct and indirect. Direct evaluation necessarily needs human
interaction. This can be achieved in two ways: using experts and using crowdsourcing
techniques.
Experts can evaluate the data provided with the StoryLines with respect to a set of
reference documents and check the informativeness and coherence parameters. Following
Xu et al. (2013), two types of questions can be addressed at the micro-level and at the
macro-level of knowledge. Both evaluation types address the quality of the generated StoryLines. The former addresses the efficiency of the StoryLines in retrieving the information
while the latter addresses the quality of the StoryLines with respect to a certain topic (e.g.
the commercial “war” between Boeing and Airbus). Concerning metrics, micro-knowledge
can be measured by the time the users need to gather the information, while the macroknowledge can be measured as text proportion, i.e. how many sentences of the source
documents composing the StoryLine are used to write a short summary.
Crowdsourcing can be used to evaluate the StoryLines by means of simplified tasks.
One task can ask the crowd to identify salient events in a corpus and then validate if the
identified events correlate with the climax events of the StoryLines.
Indirect evaluation can be based on a cross-document Summarization tasks. The ideal
situation is the one in which the StoryLine contains the most salient and related events and
nothing else. These data sets can be used either to recover the sentences in a collection of
documents and generate an extractive summary (story) or used to produce an abstractive
summary. Summarization measures such as ROUGE can then be used to evaluate the
quality of summaries and, indirectly, of the StoryLines (Nguyen et al., 2014; Huang and
Huang, 2013; Erkan and Radev, 2004).
February 1, 2016
5.2.3
107/148
Workshop on Computing News Storylines
The notion of computational StoryLines for streams of news articles is new. We organised
a workshop (Caselli et al., 2015b)38 at ACL in 2015 to discuss this as a new paradigm for
research. The workshop brought together researchers from different communities working
on representing and extracting narrative structures in news, a text genre which is highly
used in NLP but which received little attention with respect to narrative structure, representation and analysis. Currently, advances in NLP technology have made it feasible
to look beyond scenario-driven, atomic extraction of events from single documents and
work towards extracting story structures from multiple documents, while these documents
are published over time as news streams. Policy makers, NGOs, information specialists
(such as journalists and librarians) and others are increasingly in need of tools that support them in finding salient stories in large amounts of information to more effectively
implement policies, monitor actions of ”big players” in the society and check facts. Their
tasks often revolve around reconstructing cases either with respect to specific entities (e.g.
persons or organizations) or events (e.g. hurricane Katrina). We received 12 submissions
and accepted 9. Overall, we had 20 participants to the workshop. The two approaches developed in NewsReader were also presented at the workshop. In final discussion additional
lines of research around questions such as:
• which properties do make a sequence of events a story?
• how can we identify the salience of events and differentiate it from more subjective
notions such as interestingness and importance?
• is there a “pattern of type of events” which guides the writing of stories in news?
• how can we move from an entity-based approach to StoryLines to coarser-grained
representations?
• what is the best granularity of representation of a StoryLine?
• is a gold standard dataset feasible and useful for evaluating StoryLines?
We submitted a follow-up workshop proposal for ACL2016 in Berlin.
38
https://sites.google.com/site/computingnewsstorylines2015/
February 1, 2016
6
108/148
Perspectives
In this section, we describe the design and first steps towards the implementation of a
perspective and attribution module. The implementation so far is preliminary, and will
be updated in the coming period. The attribution module described in 6.2 is currently
restricted to factuality. Other attribution values will be derived in further updates.
6.1
Basic Perspective Module
Events are distinguished into contextual events about the domain and source events;
speech-acts and cognitive events that relate sources to the statements about the contextual
events. In NewsReader, we represent both types of events as instances. They are both
events involving participants and bound by time and possibly a place. The perspective
layer is generated on top of this initial event representation. Perspectives are complex: they
consist of what someone says (the information they choose to provide) and how he/she sais
it (choices in expression, opinions and sentiment). A complete representation of perspectives thus includes the basic information provided, who it is provided by (the author or a
quoted source), the certainty with which the source makes a statement, whether the source
is speculating about the future, confirming or denying something, uses plain language or
expressions that carry sentiment, expresses an explicit opinion, etc. Within NewsReader,
we focus on identifying the source and establishing attribution values relating to factuality
(how certain, future or not, and confirmed or denied) and the sentiment. This information
is obtained using information from various layers of the NAF representation.
Perspectives are expressed through mentions in text. We find both cases where various
sources confirm a perspective and other cases where a different perspective is expressed on
the same contextual statement: for example, this would be the case if one source denies
a statement while another source confirms it. Moreover, the same source can express
different perspectives over time. Because we (1) only aim to represent the perspective a
source expresses and do not wish to make a claim about the perspective (s)he has and (2)
for this reason feel that perspectives are linked to mentions, we also model perspectives at
a mention level.
In our current setup and model, a perspective consists of:
• The instance representation of the source of a statement, e.g. the CEO of Porsche
Wiedeking.
• The statement made by the source in the form of triples representing an instance of
a contextual event, e.g. Porsche increasing sales in 2004.
• A mention of the statement (which is linked to the original statement)
• The attribution values that define the relation between the source and the contextual
statement: denial/confirmation, positive/negative, future/current, certain/probable/possible, etc.
February 1, 2016
109/148
• A link to the source of the mention which can either be the document it is situated
in (which in turn is linked to an author or publisher) or a quoted source.
According to this specification, we abstract from how the source made the statement, e.g.
saying, shouting, thinking, which is represented by the source event itself that is the basis
for deriving the perspective. However, we do represent implications that can be derived
from the way in which the source is made, e.g. promise to implies future, claim implies
certainty, guess implies that the source thinks something is probable and hope that it is
possible, therefore is positive.
To derive the perspective relations between sources and the contextual statement, we
implemented a perspective module that takes several layers as input:
• source events with entities that have roles like prop:A0, fn:Statement@Speaker,
fn:Expectation@Cognizer, fn:Opinion@Cognizer.
• the contextual events that can be related to the source event through a message or
topic role as defined in FrameNet or PropBank.
• the attribution layer in NAF that indicates the future/non-future tense, the confirmation or denial and/or the certainty of the contextual event according to the
source.
• the opinion layer in NAF that indicates whether the source has a positive or negative
opinion with respect to some target.
• the NAF header which provides information about the magazine, publisher and author (if known).
In order to combine these pieces of information, we need to intersect the layers. In this
first version of the module, we use a very basic and simple algorithm to do this:
1. For all source events in the triples we check if it has a valid relation to an object
of the proper semantic type such as person or organization. Valid relations are
e.g. pb:A0, fn:Statement@Speaker, fn:Expectation@Cognizer, fn:Opinion@Cognizer.
If not we ignore it.
2. We access the SRL layer in NAF to check if the event mention also has message
role, e.g. fn:Statement@Message or fn:Statement@Topic. If no such role is found we
ignore the source event.
3. We take the span of the message role and we check for any triples in the contextual
data to see if they have mentions equal to or embedded within the span of the
message. For all these triples, we create a perspective relation between the source
and contextual triples.
February 1, 2016
110/148
4. We check the attribution layer of each NAF file from which the source events and
selected contextual events originate to see of there are attribution properties assigned
to the event mention. If so, we copy them to the perspective relation.
5. We check the opinion layer in NAF to find opinion-targets. Next, we check if the
spans of the above contextual triples also match or are embedded in the span of the
opinion-target. If there is a match, we take over the opinion value (positive/negative)
as a value for the perspective relation.
The module can be adapted by defining the role constraints in the algorithm.
After applying the above algorithms there will be event mentionsz with an explicit
perspective value assigned to an explicit source and another set of event mentions for
which no perspective is represented. The latter mentions will be assigned to the document
as a source. We use the document URI as the source URI. If the event mention is within
an opinion-target we represent the sentiment as a value for the attribution relation. If
there is a factuality value associated with the mention, we also assign it as a value of
the attribution relation. The author, publisher and magazine are meta-properties of the
document URI. Finally, all event mentions are either assigned to the document using
prov:wasAttributedToor to an explicit source in the document using gaf:wasAttributedTo.
Let us consider the following example to illustrate the algorithm:
Chrysler expects to sell 5,000 diesel Liberty SUVs,
President Dieter Zetsche says at a DaimlerChrysler Innovation Symposium in New York.
There are two perspectives expressed with respect to the selling of 5,000 diesel Liberty
SUVs. First, the fact that this is an expectation of Chrysler and, secondly, that this is
a statement made by the President Dieter Zetsche. Our text processing generates the
following 3 predicate-role structures for this sentence as shown in Figure 44 starting from
line 1, 26 and 50 respectively, where we abbreviated some of the lists of span elements.
Next, the NAF2SEM modules generates two source events (line 1 and line 30) from this
data involving the entities Chrysler (line 6) and Zetsche (line 26), as shown in Figure 45.
The relations between the events and the entities are expressed starting from line 15 for
expect and line 40 for say.
Both source events expect and say meet the first constraints that they have an entity of the
proper type with a role of the type source: fn:Expectation@Cognizer and fn:Statement@Speaker.
Within the set of contextual triples, we find the event sell and its corresponding triples as
shown in Figure 46.
Next, we intersect the mentions of the contextual event (line 4) with the role layer to
see if they can be connected to the source events in the proper way. The SRL has roles for
Expectation@Topic, Statement@Topic and Statement@Message. Their spans are defined
as a list of term identifiers that need to be matched with tokens that can be matched
with their offsets. In this case, we can conclude that the offsets for sell, Chrysler and
Livery SUVs match with these roles. Therefore, the software decides that these triples fall
within the scope of the perspective.
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
111/148
<!−−t 3 6 e x p e c t s : A0 [ t 3 5 C h r y s l e r ] A1 [ t 3 7 t o]−−>
 <!−−e x p e c t s −−>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =” E x p e c t a t i o n ”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”O p i n i o n ”/>
<e x t e r n a l R e f r e s o u r c e =”EventType ” r e f e r e n c e =” c o g n i t i o n ”/>
< r o l e i d =” r l 1 6 ” semRole=”A0”> <!−−C h r y s l e r −−>
<t a r g e t i d =”t 3 5 ” head=” y e s ”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”E x p e c t a t i o n @ C o g n i z e r ”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”O p i n i o n @ C o g n i z e r ”/>
</ r o l e >
< r o l e i d =” r l 1 7 ” semRole=”A1”> <!−−t o s e l l 5 , 0 0 0 d i e s e l L i b e r t y SUVs−−>
<t a r g e t i d =”t 3 7 ” head=” y e s ”/ >... < t a r g e t i d =”t 4 2 ”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”Expectation@Phenomenon”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”E x p e c t a t i o n @ T o p i c ”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”Opinion@Topic”/>
</ r o l e >
<!−−t 3 8 s e l l : A0 [ t 3 5 C h r y s l e r ] A1 [ t 3 9 5,000]−−>
 <!−− s e l l −−>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”C o m m e r c e s e l l ”/>
<e x t e r n a l R e f r e s o u r c e =”ESO” r e f e r e n c e =” S e l l i n g ”/>
< r o l e i d =” r l 1 8 ” semRole=”A0”> <!−−C h r y s l e r −−>
<t a r g e t i d =”t 3 5 ” head=” y e s ”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =” C o m m e r c e s e l l @ S e l l e r ”/>
<e x t e r n a l R e f r e s o u r c e =”ESO” r e f e r e n c e =” S e l l i n g @ p o s s e s s i o n −o w n e r 1 ”/>
</ r o l e >
< r o l e i d =” r l 1 9 ” semRole=”A1”> <!−−5,000 d i e s e l L i b e r t y SUVs−−>
<t a r g e t i d =”t 3 9 ”/ >... < t a r g e t i d =”t 4 2 ” head=” y e s ”/>
<e x t e r n a l R e f r e s o u r c e =”VerbNet ” r e f e r e n c e =” g i v e −13.1@Theme”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”Commerce sell@Goods”/>
<e x t e r n a l R e f r e s o u r c e =”PropBank ” r e f e r e n c e =” s e l l . 0 1 @1”/>
</ r o l e >
<!−−t 4 7 s a y s : A1 [ t 3 5 C h r y s l e r ] A0 [ t 4 4 P r e s i d e n t ] AM−LOC [ t 4 8 a t]−−>
 <!−−s a y s −−>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”S t a t e m e n t ”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =” T e x t c r e a t i o n ”/>
< r o l e i d =” r l 2 0 ” semRole=”A1”>
<!−− C h r y s l e r e x p e c t s t o s e l l 5 , 0 0 0 d i e s e l L i b e r t y SUVs−−>
<t a r g e t i d =”t 3 5 ”/> <t a r g e t i d =”t 3 6 ” head=” y e s ”/ >... < t a r g e t i d =”t 4 2 ”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”Statement@Message”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”Statement@Topic”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =” T e x t c r e a t i o n @ T e x t ”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”Choosing@Chosen”/>
</ r o l e >
< r o l e i d =” r l 2 1 ” semRole=”A0”> <!−− P r e s i d e n t D i e t e r Z e t s c h e −−>
<t a r g e t i d =”t 4 4 ”/ >... < t a r g e t i d =”t 4 6 ” head=” y e s ”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”S t a t e m e n t @ S p e a k e r ”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”T e x t c r e a t i o n @ A u t h o r ”/>
<e x t e r n a l R e f r e s o u r c e =”FrameNet ” r e f e r e n c e =”C h o o s i n g @ C o g n i z e r ”/>
</ r o l e >
< r o l e i d =” r l 2 2 ” semRole=”AM−LOC”>
<!−−a t a D a i m l e r C h r y s l e r I n n o v a t i o n Symposium i n New York−−>
<t a r g e t i d =”t 4 8 ” head=” y e s ”/ >... < t a r g e t i d =”t 5 5 ”/>
</ r o l e >
Figure 44: Semantic Role elements in NAF for expect, say and sell
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
112/148
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . xml#e v 1 6 E x p e c t
a
sem : Event , f n : E x p e c t a t i o n , f n : O p i n i o n , n w r o n t o l o g y : SPEECH COGNITIVE ;
rdfs : label
” expect ” ;
g a f : denotedBy
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . xml#c h a r = 2 0 8 , 2 1 5 .
dbp : r e s o u r c e / C h r y s l e r
rdfs : label
” C h r y s l e r ” , ” C h r y s l e r Group ” ;
g a f : denotedBy
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . xml#c h a r = 3 6 , 5 0 ,
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . xml#c h a r =740 ,748 ,
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . xml#c h a r =199 ,207 ,
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . xml#c h a r =1114 ,1122 >
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . xml#c h a r =130 ,132 .
,
nwr : / d a t a / c a r s / e n t i t i e s / L i b e r t y S U V s
a
n w r o n t o l o g y : MISC ;
rdfs : label
” Liberty suvs ” ;
g a f : denotedBy
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . xml#c h a r =237 ,249
fn : Expectation@Cognizer
dbp : r e s o u r c e / C h r y s l e r ;
f n : Expectation@Phenomenon
nwr : / d a t a / c a r s / e n t i t i e s / L i b e r t y S U V s ;
fn : Expectation@Topic
nwr : / d a t a / c a r s / e n t i t i e s / L i b e r t y S U V s ;
f n : Opinion@Topic
.
.
dbp : r e s o u r c e / D i e t e r Z e t s c h e
rdfs : label
” Dieter Zetsche ” ;
g a f : denotedBy
.
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . xml#ev11Say
a
sem : Event , n w r o n t o l o g y : SPEECH COGNITIVE , f n : S t a t e m e n t ;
rdfs : label
” say ” ;
g a f : denotedBy
.
nwr : / d a t a / c a r s / e n t i t i e s / D a i m l e r C h r y s l e r I n n o v a t i o n S y m p o s i u m
a
n w r o n t o l o g y : ORGANIZATION ;
rdfs : label
” D a i m l e r C h r y s l e r I n n o v a t i o n Symposium ” ;
g a f : denotedBy
.
f n : Statement@Speaker
dbp : r e s o u r c e / D i e t e r Z e t s c h e ;
sem : h a s P l a c e
nwr : / d a t a / c a r s / e n t i t i e s / D a i m l e r C h r y s l e r I n n o v a t i o n S y m p o s i u m .
Figure 45: SEM-RDF extracted from in NAF for expect, say and sell
1
2
3
4
5
6
7
8
9
10
11
12
13
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . xml#e v 1 7 S e l l
a
sem : Event , f n : C o m m e r c e s e l l , e s o : S e l l i n g ;
rdfs : label
”sell” ;
g a f : denotedBy
.
rdfs : label
” Liberty suvs ” ;
g a f : denotedBy
fn : Commerce sell@Seller
f n : Commerce sell@Goods nwr : d a t a / c a r s / e n t i t i e s / L i b e r t y S U V s
.
Figure 46: SEM-RDF extracted from in NAF for expect, say and sell
February 1, 2016
1
2
3
4
5
6
113/148
<o p i n i o n i d =”o1”>
<o p i n i o n e x p r e s s i o n p o l a r i t y =” p o s i t i v e ” s t r e n g t h =”1”>
<!−− P r e s i d e n t D i e t e r Z e t s c h e s a y s a t a D a i m l e r C h r y s l e r I n n o v a t i o n Symposium i n New York .−−>
 <t a r g e t i d =”t 4 4 ” / > . . . . <t a r g e t i d =”t 5 6 ”/> 
</ o p i n i o n e x p r e s s i o n >
</ o p i n i o n >
Figure 47: Opinion element in NAF
In a similar way, we check for opinions and attribution values to fill in further details
of the perspective relation. For this example, we find the following opinion information as
shown in Figure 47.
The span of the opinion expression matches event say. From this we derive a positive
value for the attitude of the source towards the triples in its scope.
6.2
Factuality module
The attribution values between a source and the target events are derived from the opinion
layer and the factuality layer. In this section, we describe the factuality module that was
developed for NewsReader. The description of the opinion module can be found in Agerri
et al. (2015). We first describe how factuality needs to be modeled within the attribution
module, then describe the current implementation of the module that identifies these values
and conclude this section with an outline of future work.
6.2.1
Event factuality
Event factuality is a property of the events expressed in a (written or oral) text. We follow
Saurı́ (2008) conception of event factuality, which is understood as the level of information
expressing the commitment of relevant sources towards the factual nature of eventualities
in text. That is, it is in charge of conveying whether eventualities are characterised as
corresponding to a fact, to a possibility, or to a situation that does not hold in the world.
The term eventualities is used here to refer to events, which can be processes or states.
The main characteristics of events are that they have a temporal structure and a set of
participants.
Factuality is not an absolute property, but it is always relative to a source, since events
are always presented from the point of view of someone. The source does not need to
be the author of a text, several sources can be reporting about the same event and the
same source can assign different factuality values to an event along different points in time.
Additionally, we assume that factuality value assignments are made at a specific point in
time.
We also follow Saurı́ in considering three factuality components: source, time and
factuality value. Source refers to the entity that commits to the factuality value of a
certain event. Time is relevant because the factuality values of an event can change not
only depending on the source but also along time. Furthermore, we assume that any
statement made about the future is speculation to a certain extent. It should be noted that,
February 1, 2016
114/148
even though time of the statement and source are considered inherent parts of factuality
components, the current implementation only focuses on factuality values and relative tense
(i.e. did the source talk about the future or not). Time expressions and source identification
are handled by separate components. The previous section explained how we determine
the source of given information.
The factuality values will be characterized across three dimensions: polarity, certainty
and tense. The certainty dimension measures to which extent the source commits to
the correlation of an event with a situation in the world, whereas the polarity dimension
encodes whether the source is making a positive or a negative statement about an event
happening in the world. The certainty dimension can be described as a continuum ranging
from absolute certain to absolute uncertain. For the sake of simplicity, here we will consider
it as a discrete category with three values, certain, probable and possible, following Saurı́.
Polarity is a discrete category which can have two values, positive or negative. Additionally
both categories have also an underspecified value, for cases in which there is not enough
information to assign a value. Events will be assigned one value per dimension. Finally,
our “tense” dimension simply indicates whether a statement is about the future or not.
Certainty
certain (CERT)
probably (PROB)
possible (POSS)
unknown (U)
Polarity
positive (POS)
negative (NEG)
unknown (U)
Tense
non-future (NONFUT)
future (FUT)
unknown (U)
Table 32: Certainty, polarity and tense values
6.2.2
Identifying factualities
Factuality is often expressed by multiple elements in an expression which do not necessarily stand right next to the event the factuality values apply to. A (highly) summarized
overview of potentially relevant information is provided below:
• Tense, aspect
• Lexical markers
– Polarity markers: no, nobody, never, fail, ... . They can act at different structural levels. At the clausal level they scope over the event-referring expression;
at the subclausal level they affect one of the arguments of the event; at the
lexical level by means of affixes. Polarity markers can negate the predicate
expressing the event, the subject, the direct or indirect object.
– Modality markers (epistemic or deontic) include verbal auxiliaries, adverbials
and adjectives: may, might, .perhaps, possible, likely, hopefully, hopeful, ...
February 1, 2016
115/148
– Commissive and volitional predicates (offered, decided) assign the value underspecified to the subordinated event.
– Event selecting predicates (ESP) are predicates that select for an argument
denoting an event. Syntactically they subcategorise for a that-, gerundive or
infinitival clause: claim, suggest, promise, request, manage, finish, decide, offer,
want, etc. ESP project factuality information on the event denoted by its argument. Depending on the type of ESP they project different factuality values.
For example, prevent projects a counterfactual value, while manage projects a
factual value. Some ESP such as claim, are special in that they assign a factuality value to the subcategorised event and at the same time they express who
is the source that commits to that factuality value, without the author of the
statement committing to it. ESP can be source (SIP) or non-source introducing
predicates (NSIP). SIP such as suspect or know are ESP that contribute an
additional source relative to which the factuality of the subcategorised event is
assessed.
• Some syntactic constructions can introduce a factuality value.
– In participial adverbial clauses, the event in the subordinated clause is presupposed as true (e.g. Having won Slovenia’s elections, political newcomer Miro
Cerar will have to make tough decisions if he is to bring stability to a new
government).
– In purpose clauses, the main event is presented as underspecifed (e.g. Government mobilizes public and private powers to solve unemployment in the country).
– In conditional constructions, the factuality value of the main event in the consequent clause is dependent on the factuality of the main event in the antecedent
clause (e.g. If this sentence is not true then it is true).
– In temporal clauses, the event is presupposed to be certain in most cases (e.g.
While the main building was closed for renovation, the architects completed the
Asian Pavilion).
Additionally, some syntactic constructions act as scope islands, which means that
the events in that construction cannot be affected by markers which are outside the
constructions and at the same time the markers in the construction cannot scope
over the events which are outside the construction.
– Non-restrictive relative clauses (e.g. The new law might affect the house owners
who bought their house before 2002 ).
– Cleft sentences (e.g. It could have been the owner who replaced the main entrance
door ).
February 1, 2016
116/148
An event can be within the scope of one or more lexical markers, which means that
in order to compute its factuality, the values of all the markers that scope over it have to
be considered, as well as the ways in which markers interact with each other. It is also
necessary to take into account that some syntactic contexts establish scope boundaries.
The next subsection describes our current factuality module and explains how relevant
factors are integrated in the model.
6.2.3
Factuality module
The factuality module consists of two main components. The first component is a machine
learning component that establishes the certainty and polarity of the event based on a
model trained on FactBank. The second component provides a rule-based interpretation
of the tense of the event indicating whether it is future, past or present or unknown. The
rule based component simply checks the explicit tense marking on the verb or, in case of a
nominal event, the tense marking on the verb that governs it. If no tense values are found
this way, the value is set to ‘unknown’.
The machine learning component improves on the previously existing module in three
ways. FactBank contains several layers of annotation for factuality values. The factuality
values assigned by the direct source making the statement and all factuality values assigned
by any other source. This means that when someone is quoted in an article, the factuality
values from the quoted source are provided as well as the factuality values of the author
of the article. Because the author typically does not provide explicit indication of whether
he or she agrees with their source, the values associated with the author tend to be underspecified for both polarity and certainty. As a result, factuality values at the highest level
(those attributed to the author) are almost exclusively certain-positive and underspecifiedunderspecified. The first difference between the new module and the old one is that we
train on the most embedded layer. This provides more variety in factuality values. The
second difference is that the values are translated from the joint certainty-polarity values
found in FactBank to the individual dimensions used in NewsReader annotations. This
allows us to experiment with training each dimension separately. The third difference is
that we use a much more elaborate set of features based on the relevant elements outlined
above.
The following features are used in the current system:
• lemma and surface form of event and words in direct context
• lemma of the event’s head
• dependency relation
• POS and morphological information about the event and the head word
• dependency chain
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
117/148
< f a c t u a l i t y i d =” f 1 ”>
<f a c t V a l r e s o u r c e =”nwr : a t t r i b u t i o n T e n s e ” v a l u e =”UNDERSPECIFIED”/>
<f a c t V a l r e s o u r c e =” f a c t b a n k ” v a l u e =”NONE”/>
<f a c t V a l r e s o u r c e =”nwr : a t t r i b u t i o n C e r t a i n t y ” v a l u e =”CERTAIN”/>
<f a c t V a l r e s o u r c e =”nwr : a t t r i b u t i o n P o l a r i t y ” v a l u e =”POS”/>
</ f a c t u a l i t y >
< f a c t u a l i t y i d =” f 2 ”>
<f a c t V a l r e s o u r c e =”nwr : a t t r i b u t i o n T e n s e ” v a l u e =”NON FUTURE”/>
<f a c t V a l r e s o u r c e =” f a c t b a n k ” v a l u e =”CT+”/>
< f a c t u a l i t y i d =” f 2 0 ”>
 <t a r g e t i d =”t 8 1 ”/>
<f a c t V a l r e s o u r c e =”nwr : a t t r i b u t i o n T e n s e ” v a l u e =”FUTURE”/>
<f a c t V a l r e s o u r c e =” f a c t b a n k ” v a l u e =”CT+”/>
Figure 48: NAF examples for factuality
It should furthermore be noted that the system was trained on the events that the
NewsReader pipeline identified in the FactBank data. Events that are not marked as
such in FactBank receive ’NONE’ as a value, since the gold values for these are unknown.
These events form a significant part of the data and the ’NONE’ value is regularly found
by the classifier. We add the interpretation certain and positive to these events, because
this combination forms a strong majority class. Our pipeline identifies more nominal
event references than found in FactBank and the majority class is even stronger for these
nominal references, which further justifies this decision. The output does still indicate that
the original value found by the classifier was ‘NONE’, so that one can distinguish between
these default interpretations and cases where the value was assigned by the classifier.
Figure 48 shows some output examples. The span elements need to be matched with the
event spans to decide on the events to which the factuality applies.
6.2.4
Future work
The module described above covers all main factuality values we are interested in and
takes the most relevant information that can influence factuality into account. It thus
forms a solid basis for investigating factuality detection in text. It is, however, only the
first version of this new approach. In future work, we will work in two directions to improve
the model. First, we will experiment with various forms of training data. This involves
not considering events that are not in FactBank and thus avoiding the ‘NONE’ class as
well as extending the set so that the machine learner can also identify tense features.39
Second, we will experiment with a variety of features aiming for information that is more
specifically linked to factuality markers.
Currently, the factuality module only handles English. In order to adapt the module
to other languages new factuality lexicons will be needed and the syntactic rules to find
39
Note that excluding the NONE-class needs to be taken into account when evaluating on FactBank:
for events that are identified by the pipeline, but that are not in FactBank, we have no idea how well the
classification behaves.
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
118/148
rdfs : label
” expect ” ;
g a f : denotedBy
fn : Expectation@Cognizer
dbp : r e s o u r c e / C h r y s l e r .
rdfs : label
” say ” ;
g a f : denotedBy
f n : Statement@Speaker
dbp : r e s o u r c e / D i e t e r Z e t s c h e ;
sem : h a s P l a c e
nwr : / d a t a / c a r s / e n t i t i e s / D a i m l e r C h r y s l e r I n n o v a t i o n S y m p o s i u m .
rdfs : label
”sell” ;
g a f : denotedBy
fn : Commerce sell@Seller
f n : Commerce sell@Goods nwr : d a t a / c a r s / e n t i t i e s / L i b e r t y S U V s .
.
.
Figure 49: Simplified SEM-RDF triples for events as named-graphs
the scopes of factuality markers will have to be adapted. The lexicons can be probably
translated from the English lexicons and manually revised by a linguistics expert in order
to check whether the same factuality behaviour applies. More costly might be to adapt the
syntactic rules, though part of the cost can be reduced by preprocessing the documents
with parsers that have models for several languages (such as MaltParser). We will make a
predevelopment analysis of the cost of adapting the processor to other languages in order
to design it in such a way that the cost can be maximally reduced.
6.3
A perspective model
The previous sections described how we extract the source of events and how we identify
factuality attributes of events. In this subsection, we explain how we combine this information to represent perspectives in RDF. As mentioned, we associate perspective information
with mentions. After all, people can change their perspective so that a single source may
make incompatible statements about an event. To capture this fact, we represent all information related to perspectives (for now: source, factuality and sentiment) in the mention
layer.
The triples below illustrate how this looks like for the example from Section 6.1, repeated here for convenience:
Chrysler expects to sell 5,000 diesel Liberty SUVs,
President Dieter Zetsche says at a DaimlerChrysler Innovation Symposium in New York.
Consider the following triples associated with these statements in Figure 49 for the
events expect, say and sell, which is a reduced representation according to the SEM-RDF
format. They represent the labels, the SEM relations and the mentions of the three events.
We know from our perspective interpretation algorithm that the source of the statement about selling SUVs (line 13) is Chrysler. Chrysler introduced the statement through
its expectation (line 7, 8 and 9). The source of this expectation is Dieter Zetsche (lines
1 till 4). The factuality module should tell us that Zetsche’s statement (saying) is CERTAIN, POSITIVE and NON-FUTURE according to the article. Zetsche assigns the same
factuality values to Chrysler’s expecting SUV sales. On the other hand, the selling is
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
119/148
# meta d a t a p r o p e r t i e s on t h e document : a u t h o r , magazine , p u b l i s h e r
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . xml
p r o v : w a s A t t r i b u t e d T o <h t t p : / /www. n e w s r e a d e r −p r o j e c t . eu / p r o v e n a n c e / m a g a z i n e / autoweek . com> .
#a t t r i b u t i o n
: ev11Say
of
Dieter Zetsche
g a f : denotedBy
says
gaf : hasAttribution
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . x m l d A t t r 1
.
.
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . x m l d A t t r 1
rdf : value
g a f : CERTAIN NON−FUTURE POS
;
prov : wasAttributedTo
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . xml .
#a t t r i b u t i o n o f C h r y s l e r e x p e c t s
: e v 1 6 E x p e c t g a f : denotedBy
.
gaf : generatedBy
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . x m l s A t t r 2
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . x m l s A t t r 2
rdf : value
g a f : CERTAIN NON−FUTURE POS
g a f : wasAttributedTo
dbp : r e s o u r c e / D i e t e r Z e t s c h e .
;
.
;
#a t t r i b u t i o n o f s e l l
: e v S e l l g a f : denotedBy nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . xml#c h a r = 2 1 9 , 2 2 3 .
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . x m l s A t t 3 ;
gaf : generatedBy
nwr : / d a t a / c a r s / 2 0 0 3 / 0 1 / 0 1 / 4 7VH−FG30−010D−Y3YG . x m l s A t t 3
rdf : value
g a f : PROBABLE FUTURE POS,
g a f : wasAttributedTo
dbp : r e s o u r c e / C h r y s l e r

.
.
;
Figure 50: Perspective RDF triples for event mentions
PROBABLE, POSITIVE and FUTURE (according to Chrysler). In principle, each statement comes from a source and has factuality values.40 In some cases, we have multiple
pieces of information that constitute the source (e.g. author, magazine and publisher of
an article). In order to avoid massive multiplications of triples, we attribute statements
and factuality values that come directly from the article to the article itself. In turn, the
article can be attributed to an author, publisher or magazine. This way, information is
not repeated for each mention.
Figure 50 shows the output triples for the provenance and factuality of the statements
about Zetsche and Chrysler from our example, attached to the mentions of the events
given in Figure 49. Starting with line 1, we first give the document source properties
through the prov:wasAttributed predicate. The sentence originates from the autoweek.com
website which is given as the magazine of the source text. The source text is represented
through its NewsReader URL. Next, we find attribution relations for each event mention.
For convenience, we repeat the gaf:denotedBy triple that links the mention to the event
URI: lines 6, 16, 27. Each mention is then associated with specific factuality values.
These factuality values also come from the source of the mention. We therefore use an
intermediate node related to rdf:value to model factuality and sentiment values and
provenance of the statement. This means that each event mention is linked to its own
attribution node. This attribution node is linked to the factuality/sentiment values of the
40
Note that unknown factuality values are also factuality values. For instance, the sentence I do not
know whether Zetsche said that forms a case where polarity and certainty are ’unknown’ by the source.
February 1, 2016
120/148
statement and to the source of the mention.
Specifically, line 9 shows the attribution triple for say, line 20 for expect and line 30
for sell. The attribution itself is a URI for which we can define any set of properties. For
example, we can see in lines 11, 12, 13 that :dAttr1 has rdf:value gaf:CERTAIN NONFUTURE POS and that it is attributed (gaf:hasAttribution) to the document. The
document properties then point to the magazine autoweek.com. We can also see (lines 18,
19, 20) that :sAttr2 has the same rdf:value but is assigned to the source Dieter Zetsche.
Finally, :sAtt3 is attributed to the source Chrysler, has factuality values
gaf:PROBABLE FUTURE POS and the sentiment value positive.
Note that we distinguish between attribution to the source that printed the statement
from attribution to a source mentioned in text, where we use prov:wasAttributedTo to
indicate the provenance of the article and gaf:wasAttributedTo to indicate the provenance of a quoted source. The attribution relation defined in the grounded annotation
framework establishes the same relation between subject and object as the original relation from the PROV-O (Moreau et al., 2012). The difference is that in the case of
prov:wasAttributedTo, we are modelling the fact that we pulled the information from a
specific source. In the case of gaf:wasAttributedTo we are modelling the fact that the
information is attributed to a specific source by someone else.
The factuality component indicates which factuality value is associated with the event
or statement. We use composed values that contain the three factuality elements described above: certainty, polarity and tense. The ontology defines each of these values
separately providing the components of the complex values used in the example above
(e.g. CERTAIN NON FUTURE POS has properties certainty CERTAIN, polarity POSITIVE and tense NON-FUTURE). As the information on perspectives grows by adding
information such as specific emotions, the ontology will be extended to contain not only
these new values, but also more combined values so that we can continue using the compact
perspective representation that is presented here. However, the model is flexible enough
to also allow separated values for different perspective values, so that we can separate
sentiment and emotion from factuality in our representation.
Finally, the current implementation creates a perspective RDF file in addition to the
SEM RDF output. To connect the mentions of events in the perspective RDF with the
event instances in the SEM RDF, a query needs to be formulated that matches the perspective mentions with the gaf:denotedBy triples in SEM RDF.
February 1, 2016
7
121/148
Cross-lingual extraction
The processing of text in NewsReader takes place in two steps. First, language-specific
processors interpret text in terms of mentions of entities, events, time-expressions and
relations between them. Secondly, we create a language-independent representation of the
instances of entities, events, time-expressions and their relations in RDF-TRiG, as shown
schematically in Figure 51:
Figure 51: Semantic interoperability across NAF representations
Although the first step is specific for each language, the representation of the output in
NAF is the same for all the 4 NewsReader languages. Differences are restricted to the tokens
that make up the mentions of things. In the case of entities, we normalize these tokens by
using the DBpedia URIs, where Dutch, Spanish and Italian DBpedia URIs are mapped to
English URIs. In the case of time-expressions, such as Thursday and yesterday, we interpret
them to ISO dates. Events are however still represented by the tokens of the predicates.
To make the events interoperable across languages, we use the GlobalWordnetGrid to
map each predicate to concepts in the InterLingualIndex (ILI, (Vossen et al., 2016; Bond
et al., 2016)). As explained in section 2, coreference and therefore identity of events is
defined as a function of the identity of its components: the similarity of the action, the
participants, the place and time. The latter 3 are semantically interoperable across the
NAF representations in the four languages. The events can be compared through their
ILI mapping based on the wordnets in their language, as we have been doing so far for
English as well. In Figures 52 and 53, we show two fragments in English and Spanish
respectively for the same Wikinews article with the representation of an entity, a predicate
with roles and a time-expression. The English representation of the entity Boeing 787 has
an external reference to the English DBpedia, where the Spanish entity maps to both the
Spanish and English DBpedia entries. By taking the English reference for Spanish, we can
map both entity references to each other across the documents. We can see the same for the
time-expressions in both languages that are normalized to the same value: 2005-01-29. In
February 1, 2016
122/148
case of the predicates, both examples show external references to various ontologies among
which FrameNet and ESO and also to WordNet ILI-concepts. The predicates of these two
examples can therefore be mapped into each other through the reference ili-30-02207206-v.
A similar example is shown in Figure 54 for Dutch NAF where the verb kopen is mapped
to the same ILI record and in 55 for the Italian NAF in which acquisterá is mapped to the
same concept.
To compare the capacity of the different language pipelines to extract the same information, we use the NewsReader MEANTIME corpus (van Erp et al. (2015)). The
MEANTIME corpus consists of 120 English news articles taken from Wikinews on 4 topics:
Airbus, Apple, automotive industry (GM, Chrysler and Ford), stock market news. Each
topic has 30 articles. We translated the 120 English articles to Spanish, Dutch and Italian.
The texts in all languages have been annotated according to the same NewsReader annotation scheme. We processed the English, Spanish and Dutch Wikinews articles through the
respective pipelines, as described in Agerri et al. (2015). After generating the NAF files, we
applied the NAF2SEM process to each language data set. We made a small modification
to the original algorithm that compares events on the basis of their lemmas. To be able
to compare events across languages, we used the ILI-concept references of the predicates
to represent events. Spanish, Italian and Dutch events can thus be mapped to English
events provided they were linked to the same concept.41 We generated RDF-TRiG files
for the 4 Wikinews corpora for English, Spanish, Italian and Dutch. We implemented a
triple counter to compare the data created for each language data set. The triple counter
generates statistics for the following information:
1. Entities with a DBpedia URI with the number of mentions
2. Entities without a DBpedia URI with the number of mentions
3. Events represented by their ILI-reference with the number of mentions
4. Frequency count of the roles relating events and entities
5. Frequency counts of the triples relating events with ILI-references and entities through
these roles
We compared the Spanish, Italian and Dutch data against the English data, calculating the
coverage of the English mentions by the other languages. Note that we cannot calculate
recall through this method. If the software detects a DBpedia entity E in English in
sentence A and not in the translated Spanish sentence T(A) but the same entity E is
detected in another Spanish sentence T(B) while it is not detected in the English sentence
B, then our current method counts 1 mention in English and 1 mention in Spanish with
100% coverage. This may still count as coverage but not as recall because we do not know
41
This also means that we lump together event-instances that are normally kept separate because they
do not share the same time and participant. The comparison therefore does not tell us anything about
the precise cross-lingual event-coreference but merely gives a rough indication
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
123/148
<e n t i t y i d =”e 1 ” t y p e=”MISC”>
<!−−B o e i n g 787−−>
<t a r g e t i d =”t 8”/>< t a r g e t i d =”t 9 ”/>
<e x t e r n a l R e f c o n f i d e n c e =”1.0” r e f e r e n c e =”h t t p : / DBpedia . o r g / r e s o u r c e / B o e i n g 7 8 7 D r e a m l i n e r ”
r e f t y p e =”en ” r e s o u r c e =” s p o t l i g h t v 1 ””/>
</ e x t e r n a l R e f >
</ e n t i t y >

<!−−p u r c h a s e −−>
<e x t e r n a l R e f r e f e r e n c e =” p u r c h a s e . 0 1 ” r e s o u r c e =”PropBank”/>
<e x t e r n a l R e f r e f e r e n c e =”o b t a i n − 1 3 . 5 . 2 ” r e s o u r c e =”VerbNet”/>
<e x t e r n a l R e f r e f e r e n c e =”o b t a i n −13.5.2 −1” r e s o u r c e =”VerbNet”/>
<e x t e r n a l R e f r e f e r e n c e =”Commerce buy ” r e s o u r c e =”FrameNet”/>
<e x t e r n a l R e f r e f e r e n c e =”Buying ” r e s o u r c e =”ESO”/>
<e x t e r n a l R e f r e f e r e n c e =” c o n t e x t u a l ” r e s o u r c e =”EventType”/>
<e x t e r n a l R e f r e f e r e n c e =” i l i −30−02207206−v ” r e s o u r c e =”WordNet”/>
< r o l e i d =” r l 9 ” semRole=”A0”>
<!−− O f f i c i a l s from t h e P e o p l e ’ s R e p u b l i c o f China−−>
<e x t e r n a l R e f r e f e r e n c e =”o b t a i n − 1 3 . 5 . 2 @Agent” r e s o u r c e =”VerbNet”/>
<e x t e r n a l R e f r e f e r e n c e =”Commerce buy@Buyer ” r e s o u r c e =”FrameNet”/>
<e x t e r n a l R e f r e f e r e n c e =” p u r c h a s e . 0 1 @0” r e s o u r c e =”PropBank”/>
<e x t e r n a l R e f r e f e r e n c e =”B u y i n g @ p o s s e s s i o n −o w n e r 2 ” r e s o u r c e =”ESO”/>
 <t a r g e t head=” y e s ” i d =”t 1 7 ”/ >... < t a r g e t i d =”t 2 4 ”/>
</ r o l e >
< r o l e i d =” r l 1 0 ” semRole=”A1”>
<!−−60 B o e i n g 787 D r e a m l i n e r a i r c r a f t −−>
<e x t e r n a l R e f r e f e r e n c e =”o b t a i n − 1 3 . 5 . 2 @Theme” r e s o u r c e =”VerbNet”/>
<e x t e r n a l R e f r e f e r e n c e =”Commerce buy@Goods ” r e s o u r c e =”FrameNet”/>
<t a r g e t i d =”t 2 9 ” / > . . . <t a r g e t head=” y e s ” i d =”t 3 3 ”/>
</ r o l e >
< r o l e i d =” r l 1 1 ” semRole=”AM−LOC”>
<!−− i n a d e a l worth US$ 7 . 2 bn−−>

<t a r g e t head=” y e s ” i d =”t 3 4 ”/ >... < t a r g e t i d =”t 4 0 ”/>

</ r o l e >
<!−−today−−>
<t a r g e t i d =”w183”/> 
</timex3>
Figure 52: Example of representation of entities, events and roles from an English Wikinews
fragment
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
124/148

<!−−B o e i n g 787−−>
<t a r g e t i d =”t 8 ”/>
<t a r g e t i d =”t 9 ”/>

<e x t e r n a l R e f c o n f i d e n c e =”1.0” r e f e r e n c e =”h t t p : / e s . DBpedia . o r g / r e s o u r c e / B o e i n g 7 8 7 ”
r e f t y p e =” e s ” r e s o u r c e =” s p o t l i g h t v 1 ”>
<e x t e r n a l R e f c o n f i d e n c e =”1.0” r e f e r e n c e =”h t t p : / DBpedia . o r g / r e s o u r c e / B o e i n g 7 8 7 D r e a m l i n e r ”
r e f t y p e =”en ” r e s o u r c e =” w i k i p e d i a −db−esEn”/>
</ e n t i t y >
<!−−comprar−−>
<e x t e r n a l R e f r e f e r e n c e =”comprar . 1 . b e n e f a c t i v e ” r e s o u r c e =”AnCora”/>
<e x t e r n a l R e f r e f e r e n c e =”g e t − 1 3 . 5 . 1 ” r e s o u r c e =”VerbNet”/>
<e x t e r n a l R e f r e f e r e n c e =”o b t a i n − 1 3 . 5 . 2 ” r e s o u r c e =”VerbNet”/>
<e x t e r n a l R e f r e f e r e n c e =”o b t a i n −13.5.2 −1” r e s o u r c e =”VerbNet”/>
<e x t e r n a l R e f r e f e r e n c e =”Commerce buy ” r e s o u r c e =”FrameNet”/>
<e x t e r n a l R e f r e f e r e n c e =”buy . 0 1 ” r e s o u r c e =”PropBank”/>
<e x t e r n a l R e f r e f e r e n c e =”Buying ” r e s o u r c e =”ESO”/>
<e x t e r n a l R e f r e f e r e n c e =” c o n t e x t u a l ” r e s o u r c e =”EventType”/>

<t a r g e t i d =”t 3 4 ”/>

< r o l e i d =” r l 5 ” semRole=”a r g 1”>
<!−−60 a v i o n e s B o e i n g 787 D r e a m l i n e r −−>
<e x t e r n a l R e f r e f e r e n c e =”g e t − 1 3 . 5 . 1 @Theme” r e s o u r c e =”VerbNet”/>
<e x t e r n a l R e f r e f e r e n c e =”o b t a i n − 1 3 . 5 . 2 @Theme” r e s o u r c e =”VerbNet”/>
<e x t e r n a l R e f r e f e r e n c e =”Commerce buy@Goods ” r e s o u r c e =”FrameNet”/>
<e x t e r n a l R e f r e f e r e n c e =”buy . 0 1 @1” r e s o u r c e =”PropBank”/>

<t a r g e t i d =”t 3 5 ”/>

</ r o l e >
< r o l e i d =” r l 6 ” semRole=”argM”>
<!−−en un a c u e r d o p o r v a l o r de 7 . 2 0 0 m i l l o n e s de US\$−−>


</ r o l e >
<t i m e x 3 i d =”t x 2 ” t y p e=”DATE” v a l u e =”2005−01−29”>
<!−−29 de e n e r o d e l 2005−−>

<t a r g e t i d =”w20 ” / > . . . <t a r g e t i d =”w24”/>

</timex3>
Figure 53: Example of representation of entities, events and roles from a Spanish Wikinews
fragment
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
125/148

<!−−A i r b u s A320−−>
<t a r g e t i d =” t 1 3 ”/>
<t a r g e t i d =” t 1 4 ”/>

<e x t e r n a l R e f c o n f i d e n c e =”1.0” r e f e r e n c e =”h t t p : / n l . DBpedia . o r g / r e s o u r c e / A i r b u s A 3 2 0 ”
r e f t y p e =” n l ” r e s o u r c e =” s p o t l i g h t v 1 ”>
<e x t e r n a l R e f c o n f i d e n c e =”1.0” r e f e r e n c e =”h t t p : / DBpedia . o r g / r e s o u r c e / A i r b u s A 3 2 0 f a m i l y ”
r e f t y p e =”en ” r e s o u r c e =” w i k i p e d i a −db−nlEn”/>
</ e n t i t y >
<!−−kopen−−>
<e x t e r n a l R e f r e f e r e n c e =”r v −4101” r e s o u r c e =”C o r n e t t o ”/>

<t a r g e t i d =” t 1 7 ”/>

< r o l e i d =”r 8 ” semRole=”Arg1”>
<!−− t w i n t i g A i r b u s A320 p a s s a g i e r s v l i e g t u i g e n −−>

<t a r g e t i d =” t 1 2 ” / > . . . <t a r g e t head=” y e s ” i d =” t 1 5 ”/>

</ r o l e >
< r o l e i d =”r 1 0 ” semRole=”ArgM−PNC”>
<!−−v o o r een−−>

<t a r g e t head=” y e s ” i d =” t 1 8 ”/>
<t a r g e t i d =” t 1 9 ”/>

</ r o l e >
<!−−donderdag−−>

<t a r g e t i d =”w5”/>

</timex3>
Figure 54: Example of representation of entities, events and roles from a Dutch Wikinews
fragment
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
126/148
<e n t i t y i d =”e 4 ” t y p e=”ORGANIZATION”>
<!−−A320−−>

<t a r g e t i d =”t 9 ” />

<e x t e r n a l R e f r e s o u r c e =” s p o t l i g h t v 1 ”
r e f e r e n c e =”h t t p : / i t . d b p e d i a . o r g / r e s o u r c e / A i r b u s A 3 2 0 f a m i l y ”
c o n f i d e n c e =”1.0” r e f t y p e =” i t ” s o u r c e =” i t ”>
<e x t e r n a l R e f r e s o u r c e =” w i k i p e d i a −db−i t E n ”
r e f e r e n c e =”h t t p : / d b p e d i a . o r g / r e s o u r c e / A i r b u s A 3 2 0 f a m i l y ”
c o n f i d e n c e =”1.0” r e f t y p e =”en ” s o u r c e =” i t ” />
</ e n t i t y >
<!−−t 4 a c q u i s t e r \ ’ { a } : A0 [ t 1 China ] A1 [ t 5 v e n t i ]−−>
<!−− a c q u i s t e r \ ’ { a}−−>

<t a r g e t i d =”t 4 ” />

<e x t e r n a l R e f r e s o u r c e =”EventType ” r e f e r e n c e =”OCCURRENCE” />
<e x t e r n a l R e f r e s o u r c e =”PropBank ” r e f e r e n c e =” a c q u i s t a r e . 0 1 ” />
<e x t e r n a l R e f r e s o u r c e =”WordNet” r e f e r e n c e =” i l i −30−00079018−n” />
<e x t e r n a l R e f r e s o u r c e =”WordNet” r e f e r e n c e =” i l i −30−02207206−v ” />
<!−−China E a s t e r n A i r l i n e s −−>

<t a r g e t i d =”t 1 ” head=” y e s ” />
<t a r g e t i d =”t 2 ” />
<t a r g e t i d =”t 3 ” />

</ r o l e >
<!−− v e n t i n u o v i j e t A i r b u s A320−−>

<t a r g e t i d =”t 5 ” />
<t a r g e t i d =”t 6 ” />
<t a r g e t i d =”t 7 ” head=” y e s ” />
<t a r g e t i d =”t 8 ” />
<t a r g e t i d =”t 9 ” />

</ r o l e >
<!−−18 g i u g n o 2009−−>

<t a r g e t i d =”w10” />
<t a r g e t i d =”w11” />
<t a r g e t i d =”w12” />

</timex3>
Figure 55: Example of representation of entities, events and roles from an Italian Wikinews
fragment
February 1, 2016
127/148
whether the Spanish mention in T(B) corresponds with the English mention in A.42 The
coverage scores we give below are thus just a rough approximation. A true comparison
requires a cross-lingual annotation that also aligns the mentions of entities and events with
respect to the same instances.
The NAF2SEM module is ignorant of the language of the NAF file. This means that
we can process English, Spanish, Italian and Dutch NAF files as if they are different
sources, just as we processed multiple English NAF files. The module will merge all
entities and events according to the implemented heuristics and generate a single RDFTRiG for all the sources. In the ideal case, the same data should be extracted across
the languages as would be extracted for any of the languages separately due to their
translation relation. Therefore, merging the NAF files across languages should result in
exactly the same numbers of entities, events and triples. Merging the data set can be seen
as an extreme test for the cross-lingual extraction and compatibility of the different NLPpipelines. Not all extracted information can be compared. Events that are not mapped to
WordNet concepts (ILI-records) are represented by their labels, which are different across
the languages. The same holds for so-called dark entities that are not mapped to DBpedia.
They are represented by their linguistic form which is usually different. For non-entities,
i.e. expressions not detected as entities but that play an important role in the event, string
matches across languages are very unlikely.
In Figures 56, we show the RDF-TRiG result of merging English, Spanish, Italian and
Dutch NAF files for some entities. The entity Airbus was found in NAF files for 3 languages
with 6 mentions in the English source, 7 mentions in the Spanish source and 4 mentions
in the Dutch source. The entity Airbus A380 on the other hand was only detected by the
Italian (7) and Dutch (21 mentions) pipelines. In Figure 57, we see events detected across
the languages. The first event, represented through the ILI ili-30-00634472-v, is matched
across all 4 languages, the other events across different subsets of languages. In the next
subsections, we show the statistics for the 4 corpora and 4 languages. We also provide
some statistics for the merging of data against the English results.
7.1
Crosslingual extraction of entities
In Table 33, we give the totals of DBpedia entities extracted for English and the other
languages: Spanish, Italian and Dutch. For English, we give the unique instances (I)
for each corpus and the mentions (M). For the other languages, we give the same but in
addition the overlapping mentions (O) and the macro (per document) and micro average
of coverage.
The English figures show that there is some variation across the 4 corpora. The
stock market corpus contains only few instances and mentions and the airbus and gm
corpus contain most instances and mentions. Ratios between instances and mentions differ
slightly across the corpora. Spanish has a similar amount and ratio for airbus but different
42
In some cases a single English sentence has been translated in more than one sentence in another
language
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
128/148
<h t t p : / DBpedia . o r g / r e s o u r c e / A i r b u s>
rdfs : label
” Airbus ” , ” Airbus ,” ;
g a f : denotedBy
nwr : dutch−w i k i n e w s / 1 8 1 6 A i r b u s w i n s Q a t a r A i r w a y s o r d e r w o r t h 1 5 b n#c h a r =31 ,37 ,
nwr : dutch−w i k i n e w s /816 A i r b u s w i n s Q a t a r A i r w a y s o r d e r w o r t h 1 5 b n#c h a r =564 ,570 ,
nwr : s p a n i s h −w i k i n e w s / 1 8 1 6 A i r b u s w i n s#c h a r =0 ,6 ,
nwr : s p a n i s h −w i k i e n e w s / 1 8 1 6 A i r b u s w i n s#c h a r =945 ,951 ,
nwr : e n g l i s h −w i k i e n e w s / A i r b u s w i n s Q a t a r A i r w a y s o r d e r w o r t h $ 1 5 b n#c h a r =1 ,7 ,
nwr : e n g l i s h −w i k i e n e w s / A i r b u s w i n s Q a t a r A i r w a y s o r d e r w o r t h $ 1 5 b n#c h a r =872 ,878 .
<h t t p : / d b p e d i a . o r g / r e s o u r c e / A i r b u s A 3 8 0>
rdfs : label
” A380 ” , ” A i r b u s A380 ” , ” A i r b u s 380” ;
g a f : denotedBy
nwr : i t a l i a n −w i k i n e w s /20583 − t e x t p r o . t x t . txp#c h a r =88 ,99 ,
nwr : i t a l i a n −w i k i n e w s /3828− t e x t p r o . t x t . txp#c h a r =112 ,123 ,
nwr : i t a l i a n −w i k i n e w s /3828− t e x t p r o . t x t . txp#c h a r =205 ,216 ,
nwr : i t a l i a n −w i k i n e w s /31769 − t e x t p r o . t x t . txp#c h a r =109 ,120 >
,
nwr : i t a l i a n −w i k i n e w s /23242 − t e x t p r o . t x t . txp#c h a r =17,28>
,
,
,
,
nwr : dutch−w i k i n e w s / 1 0 0 2 6 F i r s t A 3 8 0 e n t e r s c o m m e r c i a l s e r v i c e#c h a r =117 ,128 >
nwr : dutch−w i k i n e w s / 1 0 0 2 6 F i r s t A 3 8 0 e n t e r s c o m m e r c i a l s e r v i c e#c h a r =840 ,844 ,
nwr : dutch−w i k i n e w s / 1 0 0 2 6 F i r s t A 3 8 0 e n t e r s c o m m e r c i a l s e r v i c e#c h a r =954 ,958 ,
nwr : dutch−w i k i n e w s / 6 4 7 5 S i n g a p o r e A i r l i n e s t o b e c o m p e n s a t e d f o r A 3 8 0 d e l a y s#c h a r =53,57>
,
,
nwr : dutch−w i k i n e w s / 6 4 7 5 S i n g a p o r e A i r l i n e s t o b e c o m p e n s a t e d f o r A 3 8 0 d e l a y s#c h a r =190 ,194 >
nwr : dutch−w i k i n e w s / 6 4 7 5 S i n g a p o r e A i r l i n e s t o b e c o m p e n s a t e d f o r A 3 8 0 d e l a y s#c h a r =1193 ,1197 >
,
nwr : dutch−w i k i n e w s /555 B o e i n g u n v e i l s l o n g −r a n g e 7 7 7#c h a r =1742 ,1746 > ;
nwr : dutch−w i k i n e w s / 7 9 2 4 A 3 8 0 m a k e s m a i d e n f l i g h t t o U S#c h a r =48 ,59 ,
nwr : dutch−w i k i n e w s /260 A i r b u s l a u n c h e s w o r l d l a r g e s t p a s s e n g e r p l a n e#c h a r =785 ,789 ,
nwr : dutch−w i k i n e w s / 1 3 8 0 W o r l d l a r g e s t p a s s e n g e r a i r l i n e r m a k e s f i r s t f l i g h t#c h a r =84 ,88 ,
nwr : dutch−w i k i n e w s / 1 3 8 0 W o r l d l a r g e s t p a s s e n g e r a i r l i n e r m a k e s f i r s t f l i g h t#c h a r =166 ,177 ,
,
nwr : dutch−w i k i n e w s / 1 3 8 0 W o r l d l a r g e s t p a s s e n g e r a i r l i n e r m a k e s f i r s t f l i g h t#c h a r =841 ,845 >
nwr : dutch−w i k i n e w s / 1 3 8 0 W o r l d l a r g e s t p a s s e n g e r a i r l i n e r m a k e s f i r s t f l i g h t#c h a r =1037 ,1041 >
,
,
nwr : dutch−w i k i n e w s / 8 9 3 5 B o e i n g u n v e i l s n e w 7 8 7 D r e a m l i n e r#c h a r =1709 ,1713 >
nwr : dutch−w i k i n e w s / 2 0 0 7 / 7 / 9 / 8 9 3 5 B o e i n g u n v e i l s n e w 7 8 7 D r e a m l i n e r#c h a r =1871 ,1882 >
,
,
nwr : dutch−w i k i n e w s / 1 0 0 2 1 F i r s t A i r b u s A 3 8 0 d e l i v e r e d#c h a r =52,56>
nwr : dutch−w i k i n e w s / 1 0 0 2 1 F i r s t A i r b u s A 3 8 0 d e l i v e r e d#c h a r =617 ,621 >
,
nwr : dutch−w i k i n e w s / 3 2 3 5 E n g i n e t r o u b l e s d e l a y A i r b u s s u p e r j u m b o t o u r#c h a r =624 ,634 >
,
nwr : dutch−w i k i n e w s / 7 7 4 2 A i r b u s a n n o u n c e s j o b c u t s o f 1 0 ,000# c h a r =1403 ,1407 >
,
skos : prefLabel
” A380 ” , ” A i r b u s A380 ” , ” A i r b u s 380” .
<h t t p : / d b p e d i a . o r g / r e s o u r c e / White House>
rdfs : label
” Witte H u i s ” , ” Casa B i a n c a ” ;
g a f : denotedBy
,
,
,
,
nwr : dutch−w i k i n e w s / 1 4 0 8 3 B a r a c k O b a m a p r e s e n t s r e s c u e p l a n a f t e r G M d e c l a r a t i o n o f b a n k r u p t c y#
c h a r =383 ,393 > ,
nwr : dutch−w i k i n e w s / 1 3 1 4 3 W h i t e H o u s e c o n s i d e r i n g a u t o r e s c u e p l a n#c h a r =0,10>
,
nwr : dutch−w i k i n e w s / 1 3 1 4 3 W h i t e H o u s e c o n s i d e r i n g a u t o r e s c u e p l a n#c h a r =175 ,185 >
,
,
;
skos : prefLabel
” Witte H u i s ” , ” Casa B i a n c a ” .
Figure 56: RDF-TRiG representation of entities merged from English, Spanish, Italian and
Dutch Wikinews
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
129/148
i l i : i l i −30−00634472−v
a
sem : Event , n w r o n t o l o g y : s o u r c e E v e n t , f n : C o m i n g t o b e l i e v e , f n : R e a s o n i n g ;
rdfs : label
” c o n c l u i r ” , ” concluderen ” , ” concludere ” , ” reason ” , ” conclude ” ;
g a f : denotedBy
nwr : s p a n i s h −w i k i n e w s / 1 2 0 4 7 G o v e r n m e n t A c c o u n t a b i l i t y . t x t#c h a r =875 ,883 >
,
,
,
,
nwr : dutch−w i k i n e w s / 1 2 0 4 7 G o v e r n m e n t A c c o u n t a b i l i t y O f f i c e r e q u e s t s r e r u n o f U S A i r F o r c e t a n k e r b i d#
c h a r =850 ,863 >
,
nwr : e n g l i s h −w i k i n e w s / I n d o n e s i a ’ s t r a n s p o r t m i n i s t e r t e l l s a i r l i n e s n o t t o b u y E u r o p e a n a i r c r a f t d u e t o E U
b a n#c h a r =1801 ,1807 >
,
nwr : e n g l i s h −w i k i n e w s / G o v e r n m e n t A c c o u n t a b i l i t y O f f i c e r e q u e s t s r e r u n o f U S A i r F o r c e t a n k e r b i d#
c h a r =724 ,732 > .
i l i : i l i −30−01656788−v
a
sem : Event , f n : B u i l d i n g , f n : C r e a t i n g , n w r o n t o l o g y : c o n t e x t u a l E v e n t
;
rdfs : label
” ensamblar ” , ” assemblare ” , ” s a m e n s t e l l e n ” ;
g a f : denotedBy
nwr : s p a n i s h −w i k i n e w s / 1 2 0 4 7 G o v e r n m e n t A c c o u n t a b i l i t y . t x t#c h a r =481 ,490 >
,
nwr : s p a n i s h −w i k i n e w s / 1 1 1 6 9 Northrop Grumman . t x t#c h a r =1845 ,1856 > ,
,
,
nwr : dutch−w i k i n e w s / 5 1 3 5 B o e i n g d e l i v e r s f i n a l 7 1 7 t o A i r T r a n , e n d i n g D o u g l a s e r a#c h a r =487 ,494 >
.
i l i : i l i −30−02680814−v
a
sem : Event , n w r o n t o l o g y : c o n t e x t u a l E v e n t , f n : A c t i v i t y s t o p , e s o : S t o p p i n g A n A c t i v i t y ,
fn : P r o c e s s s t o p , fn : Quitting , eso : LeavingAnOrganization , fn : Halt ;
rdfs : label
” c e a s e ” , ” d i s c o n t i n u e ” , ” o p h e f f e n ” , ” c e s a r ” , ” op doeken ” ;
g a f : denotedBy
nwr : s p a n i s h −w i k i n e w s / 1 4 0 8 4 C E O o f G M o u t l i n e s p l a n . t x t#c h a r =3049 ,3054 ,
nwr : e n g l i s h −w i k i n e w s / C E O o f G M o u t l i n e s p l a n f o r \%22New GM\%22 a f t e r a u t o c o m p a n y d e c l a r e d
b a n k r u p t c y#c h a r =2795 ,2800 ,
nwr : e n g l i s h −w i k i n e w s / F o r d T a u r u s t o b e r e v i v e d#c h a r =930 ,942 ,
nwr : e n g l i s h −w i k i n e w s / P e n s k e A u t o s e l e c t e d t o b u y G e n e r a l M o t o r s ’ S a t u r n u n i t#c h a r =444 ,456 ,
nwr : dutch−w i k i n e w s / 3 9 7 1 F o r d M o t o r C o m p a n y c u t t i n g 3 0 , 0 0 0 j o b s b y 2 0 1 2#c h a r =1309 ,1318 ,
nwr : dutch−w i k i n e w s / 1 3 7 7 4 G M a n d C h r y s l e r r e c e i v e C a n a d i a n l o a n s a m i d U S r e s t r u c t u r i n g u l t i m a t a#
char =1784 ,1793.
i l i : i l i −30−00156601−v−and− i l i −30−00153263−v
a
sem : Event , n w r o n t o l o g y : c o n t e x t u a l E v e n t , f n : C h a n g e p o s i t i o n o n a s c a l e ,
f n : C a u s e c h a n g e o f p o s i t i o n o n a s c a l e , e s o : I n c r e a s i n g , e s o : QuantityChange ;
rdfs : label
” aumentare ” , ” i n c r e a s e ” , ” i n c r e m e n t a r ” ;
g a f : denotedBy
nwr : i t a l i a n −w i k i n e w s /12718 − t e x t p r o . t x t . txp#c h a r =1095 ,1104 ,
nwr : s p a n i s h −w i k i n e w s / 1 2 6 6 7 M a r k e t s r a l l y a s w o r l d c e n t r a l b a n k s i n f u s e c a s h . t x t#c h a r =912 ,923 ,
nwr : e n g l i s h −w i k i n e w s / S h a r e s w o r l d w i d e s u r g e d u e t o U S g o v e r n m e n t p l a n#c h a r =129 ,138 ,
nwr : e n g l i s h −w i k i n e w s / S h a r e s w o r l d w i d e s u r g e d u e t o U S g o v e r n m e n t p l a n#c h a r =503 ,511 ,
nwr : e n g l i s h −w i k i n e w s / M a r k e t s d o w n a c r o s s t h e w o r l d ; D o w J o n e s f a l l s b e l o w 9 ,000# c h a r =1356 ,1364
nwr : e n g l i s h −w i k i n e w s / B a n k o f A m e r i c a r e p o r t s l o s s e s o f o v e r U S $ 2 . 2 b i l l i o n#c h a r =244 ,253 ,
nwr : e n g l i s h −w i k i n e w s / B a n k o f A m e r i c a r e p o r t s l o s s e s o f o v e r U S $ 2 . 2 b i l l i o n#c h a r =816 ,824 ,
nwr : e n g l i s h −w i k i n e w s / S t o c k m a r k e t s w o r l d w i d e f a l l d r a m a t i c a l l y#c h a r =478 ,486 ,
nwr : e n g l i s h −w i k i n e w s / S t o c k m a r k e t s w o r l d w i d e f a l l d r a m a t i c a l l y#c h a r =1574 ,1582 ,
nwr : e n g l i s h −w i k i n e w s / U S s t o c k m a r k e t s h a v e t h e i r b e s t w e e k s i n c e N o v e m b e r#c h a r =446 ,454 ,
nwr : e n g l i s h −w i k i n e w s / U S s t o c k m a r k e t s h a v e t h e i r b e s t w e e k s i n c e N o v e m b e r#c h a r =1266 ,1275 ,
nwr : e n g l i s h −w i k i n e w s / R u s s i a n s t o c k m a r k e t s s u s p e n d e d a m i d m a r k e t t u r m o i l#c h a r =688 ,696 ,
nwr : e n g l i s h −w i k i n e w s / R u s s i a n s t o c k m a r k e t s s u s p e n d e d a m i d m a r k e t t u r m o i l#c h a r =1046 ,1054 ,
nwr : e n g l i s h −w i k i n e w s / W o r l d w i d e m a r k e t s f a l l p r e c i p i t o u s l y#c h a r =461 ,469 ,
nwr : e n g l i s h −w i k i n e w s / W o r l d w i d e m a r k e t s f a l l p r e c i p i t o u s l y#c h a r =800 ,809 ,
nwr : e n g l i s h −w i k i n e w s / M a r k e t s r a l l y a s w o r l d ’ s c e n t r a l b a n k s i n f u s e c a s h#c h a r = 7 1 7 , 7 2 5 .
,
i l i : i l i −30−01128193−v
a
sem : Event , n w r o n t o l o g y : c o n t e x t u a l E v e n t , i l i : i 3 9 7 0 2 , f n : P r o t e c t i n g
;
rdfs : label
” p r o t e c t i o n ” , ” p r o t e c t ” , ” p r o t e g e r ” , ” beschermen ” ;
g a f : denotedBy
nwr : s p a n i s h −w i k i n e w s / 1 3 7 7 4 G M a n d C h r y s l e r r e c e i v e C a n a d i a n l o a n s a m i d U S r e s t r u c t u r i n g u l t i m a t a . t x t#
c h a r =411 ,419 ,
nwr : e n g l i s h −w i k i n e w s / C E O o f G M o u t l i n e s p l a n f o r \%22New GM\%22 a f t e r a u t o c o m p a n y d e c l a r e d
b a n k r u p t c y#c h a r =252 ,262 ,
nwr : e n g l i s h −w i k i n e w s / U S a u t o m a k e r G M r e p o r t s l o s s e s o f $ 6 b i l l i o n#c h a r =841 ,851 ,
nwr : e n g l i s h −w i k i n e w s / G M a n d C h r y s l e r r e c e i v e C a n a d i a n l o a n s a m i d U S r e s t r u c t u r i n g u l t i m a t a#c h a r =384 ,391 ,
nwr : e n g l i s h −w i k i n e w s / G M a n d C h r y s l e r r e c e i v e C a n a d i a n l o a n s a m i d U S r e s t r u c t u r i n g u l t i m a t a#c h a r =2545 ,2555
nwr : e n g l i s h −w i k i n e w s / B a r a c k O b a m a p r e s e n t s r e s c u e p l a n a f t e r G M d e c l a r a t i o n o f b a n k r u p t c y#c h a r =326 ,336 ,
nwr : e n g l i s h −w i k i n e w s /U . S . m a n u f a c t u r e r G e n e r a l M o t o r s s e e k s b a n k r u p t c y p r o t e c t i o n#c h a r =50 ,60 ,
nwr : e n g l i s h −w i k i n e w s /U . S . m a n u f a c t u r e r G e n e r a l M o t o r s s e e k s b a n k r u p t c y p r o t e c t i o n#c h a r =171 ,181 ,
nwr : e n g l i s h −w i k i n e w s / P e n s k e A u t o s e l e c t e d t o b u y G e n e r a l M o t o r s ’ S a t u r n u n i t#c h a r =157 ,167 ,
nwr : dutch−w i k i n e w s / 1 3 7 7 4 G M a n d C h r y s l e r r e c e i v e C a n a d i a n l o a n s a m i d U S r e s t r u c t u r i n g u l t i m a t a#
c h a r =416 ,426 >
.
Figure 57: RDF-TRiG representation of events merged from English, Spanish, Italian and
Dutch Wikinews
February 1, 2016
,
130/148
numbers for the other data sets, although also here the stock market has least instances
and mentions. Overlap is highest for airbus and apple, up to 60%, 10 points less for gm
and very low for the stock market. If we look at the Italian and Dutch results, we see that
they perform very similar but for all data sets lower than Spanish, in terms of instances,
mentions and coverage, except for the stock market. When averaged over the data sets, the
coverage across the languages is very close. This means that the pipelines are reasonably
compatible and interoperable across the languages for entity detection and linking.
Table 33: DBpedia entities extracted for English, Spanish, Italian and Dutch
Wikinews with proportion of coverage, measured as macro and micro coverage.
I=instances, M=mentions, O=overlap, maC=macro-average over all document results,
miC=microAverage over all mentions
airbus
apple
gm
stock
Total
English
I
M
157
795
96
680
118
757
5
61
376
2293
I
142
124
93
12
371
M
756
644
627
42
2069
Spanish
O
maC
489
51.6
424
52.1
393
35.7
2
2.8
1308
35.5
miC
61.5
62.4
51.9
3.3
44.8
I
110
91
76
77
354
M
446
490
369
202
1507
Italian
O
352
344
244
23
963
maC
32.8
31.7
24.1
60.0
37.1
miC
44.3
50.6
32.2
37.7
41.2
I
121
91
82
100
394
M
557
445
540
380
1922
Dutch
O
360
321
337
23
1041
maC
35.6
34.6
31.0
60.0
40.3
miC
45.3
47.2
44.5
37.7
43.7
The Tables 34, 35, 36, 37 show the top 15 entities most frequent in English with the
number in Spanish, Italian and Dutch for all 4 corpora. For each entity, we show the
number of mentions and the proportion of English mentions covered. If there are more
mentions in Spanish, Italian or Dutch, the coverage is maximized to 100%. There are a
two interesting observations to be made. First of all, United States dollar with 35, 16, 59
and 36 mentions in English across the data sets, turned out to be a systematic error in
the English pipeline that is not mirrored by the other languages. The English pipeline
erroneously linked mentions of the US to the dollar instead of the country. The second
observation relates to the granularity of the mapping. For example in the case of the
airbus data, Boeing is the most frequent entity in all 4 languages. The more specific entity
Boeing Commercial Airplanes is however only detected in English and not in any of the
other languages. This is due to the fact that the mappings across Wikipedia from the
other language to English are at a more coarse-grained level. The example in Figure 58
shows that this is partly due to the absence of the specific link in the DBpedias of the
specific languages (Italian and Spanish) or to the absence of a link from the specific page
in a language to English (the Dutch example).
7.2
Crosslingual extraction of events
As explained above, we represent events through the ILI-concepts that are associated with
their lemmas. This approximates the representation of the concept. Furthermore in some
cases, more than one concept is assigned to a single lemma. To compare such lists of
concepts, we checked if there was at least one intersecting ILI-concept across events to
decide on a match.
We can see in Table 38 that there is a broader set of instances (I) than for entities.
The proportions of matched event mentions from Spanish, Italian and Dutch to English
February 1, 2016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
131/148
<e n t i t y i d =”e 1 4 ” t y p e=”ORGANIZATION”>
<!−−B o e i n g Commercial A i r p l a n e s −−>

<t a r g e t i d =”t 2 3 3 ” />
<t a r g e t i d =”t 2 3 4 ” />
<t a r g e t i d =”t 2 3 5 ” />

<e x t e r n a l R e f r e s o u r c e =” s p o t l i g h t v 1 ” r e f e r e n c e =”h t t p : / / i t . d b p e d i a . o r g / r e s o u r c e / B o e i n g ”
c o n f i d e n c e =”1.0” r e f t y p e =” i t ” s o u r c e =” i t ”>
<e x t e r n a l R e f r e s o u r c e =” w i k i p e d i a −db−i t E n ” r e f e r e n c e =”h t t p : / / d b p e d i a . o r g / r e s o u r c e / B o e i n g ”
c o n f i d e n c e =”1.0” r e f t y p e =”en ” s o u r c e =” i t ” />
</ e n t i t y >
<e n t i t y i d =”e 2 1 ” t y p e=”ORG”>

<t a r g e t i d =”t 2 2 9 ”/>
<t a r g e t i d =”t 2 3 0 ”/>
<t a r g e t i d =”t 2 3 1 ”/>

<e x t e r n a l R e f c o n f i d e n c e =”1.0” r e f e r e n c e =”h t t p : / / e s . d b p e d i a . o r g / r e s o u r c e / B o e i n g ” r e f t y p e =” e s ”
r e s o u r c e =” s p o t l i g h t v 1 ” s o u r c e =” e s ”>
<e x t e r n a l R e f c o n f i d e n c e =”1.0” r e f e r e n c e =”h t t p : / / d b p e d i a . o r g / r e s o u r c e / B o e i n g ” r e f t y p e =”en ”
r e s o u r c e =” w i k i p e d i a −db−esEn ” s o u r c e =” e s ”/>
</ e n t i t y >
<e n t i t y i d =”e 2 2 ” t y p e=”ORG”>

<t a r g e t i d =” t 2 1 0 ”/>
<t a r g e t i d =” t 2 1 1 ”/>
<t a r g e t i d =” t 2 1 2 ”/>

<e x t e r n a l R e f c o n f i d e n c e = ” 0 . 9 9 9 9 9 9 6 4 ” r e f e r e n c e =”h t t p : / / n l . d b p e d i a . o r g / r e s o u r c e / B o e i n g C o m m e r c i a l A i r p l a n e s ”
r e f t y p e =” n l ” r e s o u r c e =” s p o t l i g h t v 1 ” s o u r c e =” n l ”/>
</ e n t i t y >
Figure 58: Cross-lingual entity linking
February 1, 2016
132/148
Table 34: DBpedia entities in the Wikinews Airbus corpus most frequent in English with
Spanish, Italian and Dutch frequencies
Airbus
Boeing
Airbus
United States dollar
European Union
Boeing Commercial Airplanes
Boeing 787 Dreamliner
United States Air Force
Singapore
France
Airbus A320 family
Ryanair
Aeroflot
Government Accountability Office
Aer Lingus
Boeing 747
English
131
85
35
34
33
29
29
18
16
15
14
13
12
12
11
Spanish
136 100.00
116 100.00
0
0.00
17
50.00
0
0.00
12
41.38
20
68.97
8
44.44
14
87.50
3
20.00
12
85.71
15 100.00
0
0.00
9
75.00
0
0.00
Italian
103 78.63
75 88.24
0
0.00
12 35.29
0
0.00
3 10.34
4 13.79
9 50.00
10 62.50
3 20.00
10 71.43
9 69.23
1
8.33
11 91.67
2 18.18
Dutch
78 59.54
74 87.06
0
0.00
20 58.82
0
0.00
10 34.48
18 62.07
8 44.44
10 62.50
4 26.67
10 71.43
8 61.54
0
0.00
9 75.00
0
0.00
Table 35: DBpedia entities in the Wikinews Apple corpus most frequent in English with
Apple
Apple Inc.
Steve Jobs
Steve Waugh
The Beatles
Intel
Cisco Systems
Microsoft
James Cook
Mac OS X Lion
United Kingdom
Motorola
Software development kit
IBM
Apple Worldwide Developers Conference
English
312
49
27
22
16
16
14
12
10
10
8
8
8
8
7
Spanish
240
76.92
21
42.86
0
0.00
9
40.91
0
0.00
11
68.75
9
64.29
3
25.00
4
40.00
9
90.00
6
75.00
4
50.00
4
50.00
10 100.00
5
71.43
Italian
218 69.87
35 71.43
0
0.00
2
9.09
0
0.00
10 62.50
7 50.00
0
0.00
3 30.00
2 20.00
3 37.50
1 12.50
0
0.00
6 75.00
0
0.00
Dutch
179
57.37
31
63.27
0
0.00
7
31.82
1
6.25
4
25.00
8
57.14
4
33.33
1
10.00
0
0.00
8 100.00
1
12.50
5
62.50
5
62.50
2
28.57
February 1, 2016
133/148
Table 36: DBpedia entities in the Wikinews GM, Chrysler, Ford corpus most frequent in
English with Spanish, Italian and Dutch frequencies
GM
General Motors
Ford Motor Company
Chrysler
Fiat
Ford Motor Company of Australia
United Auto Workers
Daimler AG
Barack Obama
United States
Henderson, Nevada
Federal government of the United States
Canada
Toyota
Clarence Thomas
English
155
81
76
59
30
22
21
21
16
15
14
13
12
11
9
Spanish
143
92.26
71
87.65
31
40.79
0
0.00
21
70.00
0
0.00
0
0.00
3
14.29
8
50.00
70 100.00
0
0.00
0
0.00
10
83.33
11 100.00
0
0.00
Italian
107
69.03
49
60.49
0
0.00
0
0.00
0
0.00
0
0.00
0
0.00
3
14.29
3
18.75
54 100.00
0
0.00
0
0.00
6
50.00
6
54.55
0
0.00
Dutch
119
76.77
50
61.73
43
56.58
0
0.00
0
0.00
0
0.00
0
0.00
13
61.90
10
62.50
107 100.00
0
0.00
0
0.00
21 100.00
8
72.73
0
0.00
Table 37: DBpedia entities in the Wikinews stock market corpus most frequent in English
with Spanish, Italian and Dutch frequencies
Stock
United States
United Kingdom
FTSE 100 Index
Andy Kaufman
Washington, D.C.
Buenos Aires
United States House of Representatives
Dow Jones Industrial Average
Reuters
JPMorgan Chase
Afghanistan
Ben Bernanke
State (polity)
France
English
36
14
5
4
2
0
0
0
0
0
0
0
0
0
0
Spanish
0
0.00
2 14.29
0
0.00
0
0.00
0
0.00
0
0.00
0
0.00
1
0.00
0
0.00
0
0.00
0
0.00
0
0.00
0
0.00
0
0.00
0
0.00
0
29
11
4
0
2
0
0
9
2
1
1
0
1
0
Italian
0.00
100.00
100.00
100.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0
90
11
12
0
1
1
1
29
1
0
1
1
0
3
Dutch
0.00
100.00
100.00
100.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
February 1, 2016
134/148
are just a little bit lower than for DBpedia entities but not much. This is promising since
matching events is more difficult than matching entities. We see again that Dutch scores
a bit lower than Spanish and Italian. This is due to the fact that the Spanish and Italian
wordnets are a direct extension of the English wordnet and have been developed for many
years, whereas the Open Dutch wordnet is recently built and partly built independently.
Across the different data sets, the results are very similar.
Table 38: ILI-based events extracted for English, Spanish, Italian and Dutch Wikinews with
proportion of coverage, measured as macro and micro coverage. I=instances, M=mentions,
O=overlap, maC=macro-average over all document results, miC=microAverage over all
mentions
airbus
apple
gm
stock
Total
English
I
M
365
848
342
1007
319
1140
283
673
1309
3668
I
166
152
142
140
600
M
484
476
387
325
1672
Spanish
O
217
242
257
163
879
maC
26.4
25.8
24.7
28.6
26.4
miC
25.6
24.0
22.5
24.2
24.1
I
535
500
504
450
1989
M
984
1090
1055
895
4024
Italian
O
248
202
209
163
822
maC
33.3
29.8
33.1
34.0
32.6
miC
29.3
20.1
18.3
24.2
23.0
I
199
170
170
147
686
M
483
498
622
362
1965
Dutch
O
164
170
192
92
618
maC
20.0
19.1
20.4
17.9
19.3
miC
19.3
16.9
16.8
13.7
16.7
In the Tables 39, 40, 41, 42 we show the ILI-based events that are most frequent in
English for the 4 corpora. If there is more than one ILI-record assigned, we only list
the synonyms for the first synset. For individual events, the results vary a lot across the
different languages. There does not appear to be any pattern in this. A typical case is
represented by ili-30-02207206-v[buy] in Table 42, which has a good match in Spanish,
only 1 in Italian and 0 in Dutch. The Dutch equivalent kopen is linked to a hypernym of
buy and the Italian equivalent acquistare is linked to another meaning.
Table 39: ILI-based events in the Wikinews Airbus corpus most frequent in English with
Airbus
ili-30-01438304-v[deliver]
ili-30-02204692-v[have]
ili-30-00764222-v;ili-30-02657219-v;ili-30-00805376-v[agree]
ili-30-00974367-v;ili-30-00975427-v[announce]
ili-30-02207206-v[buy]
ili-30-01653442-v[construct]
ili-30-00755745-v;ili-30-00719734-v[ask;expect]
ili-30-02413480-v;ili-30-02410855-v[work]
ili-30-00705227-v[be after]
ili-30-02257767-v;ili-30-00162688-v[interchange;replace]
ili-30-02244956-v;ili-30-02242464-v[deal;sell]
ili-30-00998399-v[record]
ili-30-02641957-v;ili-30-00459776-v[delay]
ili-30-01955984-v;ili-30-01957529-v;ili-30-02102398-v;ili-30-01847676-v[ride]
ili-30-01583142-v;ili-30-01654628-v[construct;build]
7.3
English
17
17
16
15
14
14
13
12
12
10
9
8
8
8
8
3
0
0
2
16
2
0
0
3
7
5
0
7
0
4
Spanish
17.65
0.00
0.00
13.33
100.00
14.29
0.00
0.00
25.00
70.00
55.56
0.00
87.50
0.00
50.00
7
3
1
3
3
0
1
1
0
0
1
1
5
0
6
Italian
41.18
17.65
6.25
20.00
21.43
0.00
7.69
8.33
0.00
0.00
11.11
12.50
62.50
0.00
75.00
9
0
0
14
4
0
0
2
0
0
5
0
0
0
0
Dutch
52.94
0.00
0.00
93.33
28.57
0.00
0.00
16.67
0.00
0.00
55.56
0.00
0.00
0.00
0.00
Crosslingual extraction of relations
Finally, we compared the actual triples extracted across the languages. The triples represent the actual statements, where we only consider triples where the ILI-based event is
the subject. Table 43 gives the predicates that are most frequent in the English data. We
limited ourselves here to the generic SEM predicates (hasActor, hasTime and hasPlace) as
well as the more specific temporal relations added in NewsReader and the most -frequent
February 1, 2016
135/148
Table 40: ILI-based events in the Wikinews Apple corpus most frequent in English with
Apple
ili-30-01224744-v;ili-30-01525666-v[control;function]
ili-30-00674607-v[choose]
ili-30-02204692-v[have]
ili-30-02421374-v[free]
ili-30-00721889-v;ili-30-02351010-v[price]
ili-30-02630189-v[feature]
ili-30-00933821-v[break]
ili-30-01642437-v[innovate]
ili-30-02735282-v;ili-30-02501278-v;ili-30-01486312-v[suit;adjudicate;case]
ili-30-00341917-v;ili-30-02743921-v;ili-30-01849221-v[come;come up]
ili-30-00802318-v[allow]
ili-30-00756338-v[claim]
ili-30-00515154-v[process]
English
64
48
35
34
34
20
19
16
15
15
15
14
12
12
12
0
1
2
0
30
22
0
0
0
0
0
6
10
0
0
Spanish
0.00
2.08
5.71
0.00
88.24
100.00
0.00
0.00
0.00
0.00
0.00
42.86
83.33
0.00
0.00
2
0
2
3
10
3
0
45
1
0
0
1
0
0
0
Italian
3.13
0.00
5.71
8.82
29.41
15.00
0.00
100.00
6.67
0.00
0.00
7.14
0.00
0.00
0.00
2
0
0
0
17
17
0
0
14
2
0
0
0
0
1
Dutch
3.13
0.00
0.00
0.00
50.00
85.00
0.00
0.00
93.33
13.33
0.00
0.00
0.00
0.00
8.33
Table 41: ILI-based events in the Wikinews GM, Chrysler, Ford corpus most frequent in
English with Spanish, Italian and Dutch frequencies
GM
ili-30-00674607-v;ili-30-00679389-v[choose]
ili-30-01621555-v;ili-30-01640207-v;ili-30-01753788-v;ili-30-01617192-v[create]
ili-30-00705227-v[be after]
ili-30-02204692-v[have]
ili-30-02511551-v[order]
ili-30-00561090-v[cut]
ili-30-02410175-v[keep on]
ili-30-02324182-v[lend]
ili-30-02547586-v[aid]
ili-30-00358431-v;ili-30-00354845-v[buy the farm;die]
ili-30-01182709-v;ili-30-02327200-v[provide;furnish]
ili-30-02613487-v;ili-30-02297142-v[offer up;proffer]
ili-30-02207206-v[buy]
English
153
60
36
32
26
25
24
24
23
18
16
15
12
12
11
2
13
1
8
0
0
45
19
0
0
4
0
4
7
8
Spanish
1.31
21.67
2.78
25.00
0.00
0.00
100.00
79.17
0.00
0.00
25.00
0.00
33.33
58.33
72.73
2
3
1
0
7
0
0
4
0
0
4
0
6
2
1
Italian
1.31
5.00
2.78
0.00
26.92
0.00
0.00
16.67
0.00
0.00
25.00
0.00
50.00
16.67
9.09
2
29
2
0
0
0
0
22
0
0
3
0
5
0
0
Dutch
1.31
48.33
5.56
0.00
0.00
0.00
0.00
91.67
0.00
0.00
18.75
0.00
41.67
0.00
0.00
Table 42: ILI-based events in the Wikinews stock market corpus most frequent in English
with Spanish and Dutch frequencies
Stock
ili-30-01307142-v;ili-30-00356649-v[even out;level off]
ili-30-02204692-v[have]
ili-30-00658052-v;ili-30-00660971-v[grade;rate]
ili-30-00153263-v;ili-30-00156601-v[increase]
ili-30-02324182-v[lend]
ili-30-02000868-v;ili-30-00589738-v;ili-30-02445925-v;ili-30-01998432-v[follow;be]
ili-30-01645601-v[cause]
ili-30-00721889-v;ili-30-02351010-v[price]
ili-30-00998399-v[record]
ili-30-02678438-v[concern]
ili-30-02259005-v;ili-30-02260085-v[swap;trade in]
ili-30-01778568-v;ili-30-01780434-v;ili-30-01780202-v[fear;dread]
ili-30-00352826-v]ili-30-01620854-v[end]
ili-30-02421374-v[free]
English
40
26
21
18
14
13
12
10
10
10
9
9
9
8
8
7
4
0
1
0
5
0
12
0
2
0
12
0
1
4
2
1
Spanish
10.00
0.00
4.76
0.00
35.71
0.00
100.00
0.00
20.00
0.00
100.00
0.00
11.11
50.00
25.00
14.29
1
0
0
0
3
0
3
1
1
0
6
0
0
6
2
0
Italian
2.50
0.00
0.00
0.00
21.43
0.00
25.00
10.00
10.00
0.00
66.67
0.00
0.00
75.00
25.00
0.00
5
0
0
0
2
0
13
0
0
0
0
0
0
3
1
0
Dutch
12.50
0.00
0.00
0.00
14.29
0.00
100.00
0.00
0.00
0.00
0.00
0.00
0.00
37.50
12.50
0.00
February 1, 2016
136/148
PropBank relations. The hasActor, hasTime and hasPlace generalize over the others. We
can see that Spanish scores a little better for hasActor than Italian and Dutch except for
the stock market data set where the Italian system has a coverage of almost 84 which
is twice as high as for the other data sets. In the case of airbus and apple, the Italian
pipeline scores high for A2 in comparison to the others but for stock market it is A0 and
A1 that score high. For Spanish, we see that results are more consistent but with a high
score for A1 in the apple data set. Dutch score lower on the actors overall except for A0
in the stock market data set. The Dutch pipeline is apparently very successful in recovering locations compared to the others, where the Italian pipeline is successful in recovering
temporal relations.
Table 43: Triple predicates that are most frequent in the English Wikinews corpus with
coverage in Spanish, Italian and Dutch
Role
Airbus
Apple
GM
Stock
A0
A1
A2
AM-LOC
hasActor
hasAtTime
hasFutureTime
hasPlace
hasTime
A0
A1
A2
AM-LOC
hasActor
hasAtTime
hasFutureTime
hasPlace
hasTime
A0
A1
A2
AM-LOC
hasActor
hasAtTime
hasFutureTime
hasPlace
hasTime
A0
A1
A2
AM-LOC
hasActor
hasAtTime
hasPlace
hasTime
English
343
388
97
37
857
1494
74
51
1568
282
248
68
21
608
1809
51
22
1860
307
330
87
29
734
1580
101
29
1681
337
1043
283
40
1714
1700
42
1732
Spanish
112
32.65
216
55.67
64
65.98
0
0.00
528
61.61
852
57.03
0
0.00
0
0.00
852
54.34
113
40.07
207
83.47
48
70.59
0
0.00
471
77.47
1021
56.44
0
0.00
0
0.00
1021
54.89
105
34.20
182
55.15
51
58.62
0
0.00
421
57.36
768
48.61
0
0.00
0
0.00
768
45.69
162
48.07
575
55.13
89
31.45
0
0.00
950
55.43
977
57.47
0
0.00
977
56.41
Italian
140
40.82
178
45.88
91
93.81
2
5.41
410
47.84
1326
88.76
170
100.00
2
3.92
1496
95.41
122
43.26
165
66.53
55
80.88
7
33.33
342
56.25
1321
73.02
144
100.00
7
31.82
1465
78.76
101
32.90
140
42.42
46
52.87
3
10.34
287
39.10
1371
86.77
196
100.00
3
10.34
1567
93.22
435
100.00
851
81.59
153
54.06
4
10.00
1439
83.96
1335
78.53
4
9.52
1443
83.31
151
193
40
38
403
921
0
42
921
90
162
32
29
299
968
0
32
968
119
145
22
33
303
916
0
34
916
371
470
63
52
967
875
56
875
Dutch
44.02
49.74
41.24
100.00
47.02
61.65
0.00
82.35
58.74
31.91
65.32
47.06
100.00
49.18
53.51
0.00
100.00
52.04
38.76
43.94
25.29
100.00
41.28
57.97
0.00
100.00
54.49
100.00
45.06
22.26
100.00
56.42
51.47
100.00
50.52
Table 44 shows the coverage results for the actual triples that relate ILI-based events
with entities through the above predicates. We only considered the hasActor and hasPlace
relations. Obviously, the coverage is low since this is a very difficult task: all 3 elements
need to match exactly. Spanish results are on average 3%, and Italian and Dutch less than
1%.
Since triples usually are mentioned only once and at most a few times in the corpora
(30 articles only), it makes no sense to show frequency tables of triples. In Figure 59, we
give some examples of triples shared by all 4 languages.
February 1, 2016
137/148
Table 44: ILI-based Triples extracted for English, Spanish, Italian and Dutch
Wikinews with proportion of coverage, measured as macro and micro coverage.
I=instances, M=mentions, O=overlap, maC=macro-average over all document results,
miC=microAverage over all mentions
airbus
apple
gm
stock
Total
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
English
I
M
775
775
525
525
647
647
1463
1473
853
3410
I
369
292
262
497
355
M
369
292
262
500
1420
Spanish
O
maC
24
3.1
32
6.1
20
3.1
0
0
1423
3.1
miC
3.1
6.1
3.1
0
3.1
I
390
312
273
1312
572
M
390
312
273
1326
2287
Italian
O
7
7
1
0
2301
maC
1.0
1.3
0.2
0
0.6
miC
0.9
1.3
0.2
0
0.6
I
381
251
273
773
420
M
381
251
273
801
1678
Dutch
O
6
8
2
0
1706
maC
0.8
1.5
0.3
0
0.7
Triples in a l l 4 languages
i l i −30−00975427−v ; i l i −30−00974367−v [ announce ] : h a s A c t o r : B o e i n g
i l i −30−00975427−v ; i l i −30−00974367−v [ announce ] : h a s A c t o r : A i r b u s
i l i −30−02646757−v ; i l i −30−02207206−v [ buy ] : h a s A c t o r : E u r o p e a n U n i o n
i l i −30−00761713−v [ n e g o c i a t e ] : h a s A c t o r : A e r o f l o t
i l i −30−02244956−v ; i l i −30−02242464−v [ d e a l ; s e l l ] : h a s A c t o r : A i r b u s
i l i −30−00634472−v [ c o n c l u d e ] : h a s A c t o r : B o e i n g
i l i −30−00882948−v ; i l i −30−00875141−v [ commend ; a d v o c a t e ] : h a s A c t o r : A i r b u s
i l i −30−00354845−v ; i l i −30−00358431−v [ d i e ; b u y t h e f a r m ] : h a s A c t o r : S t e v e J o b s
i l i −30−01734502−v ; i l i −30−00246217−v [ d u p l i c a t e ; d o u b l e ] : h a s A c t o r : A p p l e I n c .
i l i −30−00975427−v ; i l i −30−00974367−v [ announce ] : h a s A c t o r : S t a r b u c k s
i l i −30−00975427−v ; i l i −30−00974367−v ; i l i −30−00820801−v ; i l i −30−01010118−v [ announce ; d e c l a r e ] :
hasActor : U n i t e d S t a t e s
i l i −30−01182709−v ; i l i −30−02327200−v ; i l i −30−02479323−v [ p r o v i d e ; f u r n i s h ; i s s u e ] :
hasActor : General Motors
h a s A c t o r : Ford Motor Company
i l i −30−02244956−v ; i l i −30−02242464−v [ d e a l ; s e l l ] : h a s A c t o r : Opel
hasActor : General Motors
Figure 59: Identical triples across different languages
7.4
Conclusions
We described the results of cross-lingual semantic processing of text. To our knowledge,
there is no other system that can perform such a task. Being able to merge the interpretation of text across language is a big achievement and it shows the opportunities for
interoperability of the NewsReader system. We have also seen that for most data types
coverage still leaves room for improvement. We have also seen that differences in implementation have an impact on the comparability. Spanish results are more closer to English
because most of the NLP modules for English and Spanish are developed by the same
group, whereas the Dutch and Italian pipelines are mostly based on different software.
That does not mean that the output of the Spanish software is better than the Dutch software. It only means that it is more compatible. For a qualitative evaluation, we need to
use the cross-lingual annotation of the Wikinews. This is reported in Agerri et al. (2015).
February 1, 2016
miC
0.8
1.5
0.3
0
0.7
8
138/148
Conclusions
In this deliverable, we described the final project results on event modelling, as part of WP5
activities. We explained in detail the conversion process from NAF to SEM representation
according to a batch and streaming architecture. This process explains how we get from
text to RDF specifications of textual content. The core problem here is event-coreference.
We describe the different approaches implemented and the evaluations on cross-document
coreference on the ECB+ data set. The NAF2SEM process resolving event-coreference has
been applied to over 3 million car documents generating more than half a billion triples.
To move beyond event structures, we need to relate events to time and to each other.
We describe our modules for extracting event relations, one for temporal relations and one
for causal relations. The output of these modules can be used to create timelines, for which
we organised a SemEval task in 2015 with evaluation results. Timelines form the basis for
creating Storylines. Our approach has been presented at the ACL workshop on this topic
that we organized as well in 2015.
Not all events are real-world events. Many expressions in news reflect perspectives on
real-world events. We explained our perspective module that takes various NAF layers as
input to model the attribution relation of sources with respect to their beliefs and opinions.
Finally, we reported the results on the cross-lingual processing of news document, obtained by comparing generated RDF-TRiG files for the Wikinews corpora for English,
Spanish, Italian and Dutch.
February 1, 2016
9
139/148
Appendix
Table 45: FrameNet frames for contextualEvents
Absorb heat
Abundance
Abusing
Adding up
Adjusting
Adorning
Aging
Amalgamation
Amounting to
Apply heat
Arranging
Arriving
Assemble
Assistance
Attaching
Attack
Avoiding
Becoming a member
Becoming detached
Behind the scenes
Being attached
Being employed
Being in category
Being in operation
Being located
Body movement
Breathing
Bringing
Building
Bungling
Catastrophe
Cause change
Cause change of consistency
Cause change of phase
Cause change of position on a scale
Cause change of strength
Cause expansion
Cause fluidic motion
Cause harm
Cause impact
Cause motion
Cause temperature change
Cause to amalgamate
Cause to be dry
Cause to be sharp
Cause to be wet
Cause to experience
Cause to fragment
Cause to make noise
Cause to make progress
Cause to move in place
Cause to start
Cause to wake
Change direction
Change event duration
Change event time
Change of consistency
Change of leadership
Change of phase
Change operational state
Change position on a scale
Change posture
Change tool
Closure
Collaboration
Colonization
Come together
Coming up with
Commerce buy
Commerce collect
Commerce pay
Commerce sell
Compatibility
Competition
Compliance
Conquering
Cooking creation
Corroding
Corroding caused
Cotheme
Create physical artwork
Create representation
Creating
Cure
Cutting
Damaging
Daring
Death
Defend
Delivery
Departing
Destroying
Detaching
Dimension
Dispersal
Dodging
Dressing
Duplication
Earnings and losses
Eclipse
Education teaching
Elusive goal
Emitting
Employing
Emptying
Escaping
Evading
Examination
Exchange
Exchange currency
Exclude member
Excreting
Expansion
Expensiveness
Experience bodily harm
Experiencer obj
Filling
Fining
Firing
Fleeing
Fluidic motion
Forging
Forming relationships
Friction
Frugality
Gathering up
Getting
Getting up
Giving
Grinding
Grooming
Hiding objects
Hiring
Hit target
Holding off on
Hostile encounter
Imitating
Immobilization
Impact
Imprisonment
Inchoative attaching
Inchoative change of temperature
Ingest substance
Ingestion
Inspecting
Installing
Institutionalization
Intentional traversing
Intentionally create
Kidnapping
Killing
Knot creation
Leadership
Light movement
Limiting
Location of light
Locative relation
Make acquaintance
Make noise
Manipulate into doing
Manipulation
Manufacturing
Mass motion
Motion
Motion directional
Motion noise
Moving in place
Operate vehicle
Operational testing
Path shape
Perception
Personal relationship
Piracy
Placing
Posture
Precipitation
Preserving
Processing materials
Prohibiting
Provide lodging
Quarreling
Quitting
Quitting a place
Reading
Receiving
Recording
Recovery
Rejuvenation
Releasing
Removing
Render nonfunctional
Renting
Renting out
Replacing
Reshaping
Residence
Resolve problem
Resurrection
Revenge
Rewards and punishments
Ride vehicle
Robbery
Rope manipulation
Rotting
Scouring
Scrutiny
Seeking
Self motion
Sending
Separating
Setting fire
Shoot projectiles
Shopping
Sign agreement
Similarity
Sleep
Smuggling
Soaking
Social event
Sound movement
Storing
Supply
Surpassing
Surviving
Take place of
Taking
Text creation
Theft
Translating
Travel
Traversing
Undergo change
Undressing
Use firearm
Visiting
Waiting
Waking up
Wearing
Weather
Execution
Inhibit movement
Proliferating in number
February 1, 2016
140/148
Table 46: FrameNet from for sourceEvents
Achieving first
Adding up
Adducing
Agree or refuse to act
Appointing
Attempt suasion
Bail decision
Be in agreement on assessment
Be translation equivalent
Become silent
Behind the scenes
Being named
Body movement
Bragging
Categorization
Chatting
Choosing
Claim ownership
Coming up with
Commitment
Communicate categorization
Communication
Communication manner
Communication means
Communication noise
Communication response
Compatibility
Complaining
Compliance
Confronting problem
Contacting
Criminal investigation
Deny permission
Deserving
Discussion
Distinctiveness
Encoding
Eventive cognizer affecting
Evidence
Experiencer obj
Expressing publicly
Forgiveness
Gesture
Grant permission
Have as translation equivalent
Heralding
Imposing obligation
Judgment
Judgment communication
Judgment direct address
Justifying
Labeling
Linguistic meaning
Make agreement on action
Make noise
Making faces
Motion noise
Name conferral
Notification of charges
Omen
Pardon
Predicting
Prevarication
Prohibiting
Questioning
Referring by name
Regard
Reporting
Request
Respond to proposal
Reveal secret
Rite
Seeking
Sign
Silencing
Simple naming
Speak on topic
Spelling and pronouncing
Statement
Suasion
Subjective influence
Successfully communicate message
Talking into
Telling
Text creation
Verdict
Appearance
Categorization
Chemical-sense description
Locating
Perception active
Perception body
Perception experience
Seeking
Trust
Adopt selection
Assessing
Awareness
Becoming aware
Categorization
Cause emotion
Certainty
Choosing
Cogitation
Coming to believe
Daring
Desiring
Differentiation
Emotion active
Estimating
Expectation
Experiencer focus
Experiencer obj
Familiarity
Feeling
Feigning
Grasp
Importance
Judgment
Occupy rank
Opinion
Partiality
Place weight on
Preference
Purpose
Reliance
Scrutiny
Seeking
Taking sides
Topic
Table 47: FrameNet frames for grammaticalEvents
Accomplishment
Achieving first
Activity finish
Activity ongoing
Activity prepare
Activity start
Activity stop
Amassing
Arriving
Assistance
Attempt
Avoiding
Becoming
Birth
Causation
Cause change
Cause to continue
Cause to end
Coming to be
Containing
Cooking creation
Cotheme
Creating
Departing
Detaining
Dough rising
Emanating
Event
Evidence
Execute plan
Existence
Experiencer obj
Grant permission
Halt
Have as requirement
Hindering
Holding off on
Inclusion
Influence of event on cognizer
Intentionally act
Intentionally affect
Launch process
Left to do
Manufacturing
Motion
Operating a system
Permitting
Possession
Preventing
Process continue
Process end
Process resume
Process start
Process stop
Reasoning
Relative time
Remainder
Ride vehicle
Self motion
Setting fire
Setting out
Sidereal appearance
State continue
Storing
Success or failure
Successful action
Taking
Taking time
Thriving
Thwarting
Topic
Undergo change
Using
February 1, 2016
141/148
References
Rodrigo Agerri, Josu Bermudez, and German Rigau. IXA pipeline: Efficient and Ready
to Use Multilingual NLP tools. In Proceedings of the Ninth International Conference on
Language Resources and Evaluation (LREC-2014), 2014. 00013.
Rodrigo Agerri, Itziar Aldabe, Zuhaitz Beloki, Egoitz Laparra, German Rigau, Aitor Soroa,
Marieke van Erp, Antske Fokkens, Filip Ilievski, Ruben Izquierdo, Roser Morante, and
Piek Vossen. Event detection, version 2. NewsReader Deliverable 4.2.3, 2015.
Amit Bagga and Breck Baldwin. Algorithms for scoring coreference chains. In Proceedings
of the International Conference on Language Resources and Evaluation (LREC), 1998.
Collin F. Baker, Charles J. Fillmore, and John B. Lowe. The Berkeley FrameNet project.
In COLING-ACL ’98: Proceedings of the Conference, pages 86–90, Montreal, Canada,
1998.
Mieke Bal. Narratology: Introduction to the theory of narrative. University of Toronto
Press, 1997.
Cosmin Adrian Bejan and Sanda Harabagiu. Unsupervised event coreference resolution
with rich linguistic features. In Proceedings of the 48th Annual Meeting of the Association
for Computational Linguistics, Uppsala, Sweden, 2010.
Cosmin Adrian Bejan and Sanda Harabagiu. Unsupervised event coreference resolution
with rich linguistic features. In Proceedings of the 48th Annual Meeting of the Association
for Computational Linguistics, Uppsala, Sweden, 2010.
Steven Bird, Ewan Klein, and Edward Loper. Natural Language Processing with Python:
http://nltk.org/book. O’Reilly Media Inc., 2009.
Anders Björkelund, Love Hafdell, and Pierre Nugues. Multilingual semantic role labeling. In Proceedings of the Thirteenth Conference on Computational Natural Language
Learning: Shared Task, CoNLL ’09, pages 43–48, Boulder, Colorado, USA, 2009.
Eduardo Blanco and Dan Moldovan. Leveraging verb-argument structures to infer semantic relations. In Proceedings of the 14th Conference of the European Chapter of the
Association for Computational Linguistics, pages 145–154, Gothenburg, Sweden, 2014.
David M Blei and Peter I Frazier. Distance dependent chinese restaurant processes. The
Journal of Machine Learning Research, 12:2461–2488, 2011.
Francis Bond, Piek Vossen, John P. McCrae, and Christiane Fellbaum. Cili: the collaborative interlingual index. Proceedings of the Eighth meeting of the Global WordNet
Conference (GWC 2016), Bucharest, 2016.
Jerome S Bruner. Acts of meaning. Harvard University Press, 1990.
February 1, 2016
142/148
Tommaso Caselli, Antske Fokkens, Roser Morante, and Piek Vossen. Spinoza vu: An nlp
pipeline for cross document timelines. In Proceedings of the 9th International Workshop
on Semantic Evaluation (SemEval 2015), pages 787–791, Denver, Colorado, June 2015.
Association for Computational Linguistics.
Tommaso Caselli, Marieke van Erp, Anne-Lyse Minard, Mark Finlayson, Ben Miller, Jordi
Atserias, Alexandra Balahur, and Piek Vossen, editors. Proceedings of the First Workshop on Computing News Storylines. Association for Computational Linguistics, Beijing,
China, July 2015.
Tommaso Caselli, Piek Vossen, Marieke van Erp, Antske Fokkens, Filip Ilievski,
Ruben Izquierdo Bevia, Minh Le, Roser Morante, and Marten Postma. When it’s all
piling up: investigating error propagation in an nlp pipeline. In WNACP2015, 2015.
Taylor Cassidy, Bill McDowell, Nathanael Chambers, and Steven Bethard. An annotation
framework for dense event ordering. In Proceedings of the 52nd Annual Meeting of the
Association for Computational Linguistics (Volume 2: Short Papers), pages 501–506,
Baltimore, Maryland, June 2014. Association for Computational Linguistics.
Nathanael Chambers, Taylor Cassidy, Bill McDowell, and Steven Bethard. Dense event ordering with a multi-pass architecture. Transactions of the Association for Computational
Linguistics, 2:273–284, 2014.
Nate Chambers. Navytime: Event and time ordering from raw text. In Proceedings of
the Seventh International Workshop on Semantic Evaluation, SemEval ’13, pages 73–77,
Atlanta, Georgia, USA, 2013.
Zheng Chen and Heng Ji. Event coreference resolution: Feature impact and evaluation. In
Proceedings of Events in Emerging Text Types (eETTs) Workshop, 2009.
Zheng Chen and Heng Ji. Graph-based event coreference resolution. In TextGraphs-4 Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing,
pages 54–57, 2009.
Bin Chen, Jian Su, Sinno Jialin Pan, and Chew Lim Tan. A unified event coreference
resolution by integrating multiple resolvers. In Proceedings of the 5th International Joint
Conference on Natural Language Processing, Chiang Mai, Thailand, 2011.
Francesco Corcoglioniti, Marco Rospocher, Roldano Cattoni, Bernardo Magnini, and Luciano Serafini. Interlinking unstructured and structured knowledge in an integrated
framework. In Proc. of 7th IEEE International Conference on Semantic Computing
(ICSC), Irvine, CA, USA, 2013. (to appear).
Agata Cybulska and Piek Vossen. Semantic relations between events and their time, locations and participants for event coreference resolution. In Proceedings of Recent Advances
in Natural Language Processing (RANLP-2013), pages 156–163, 2013.
February 1, 2016
143/148
Agata Cybulska and Piek Vossen. Semantic relations between events and their time, locations and participants for event coreference resolution. In Proceedings of recent advances
in natural language processing, 2013.
Agata Cybulska and Piek Vossen. Guidelines for ecb+ annotation of events and their
coreference, 2014.
Agata Cybulska and Piek Vossen. Using a sledgehammer to crack a nut? lexical diversity
and event coreference resolution. In Proceedings of the International Conference on
Language Resources and Evaluation (LREC 2014), 2014b.
Agata Cybulska and Piek Vossen. ”bag of events” approach to event coreference resolution.
supervised classification of event templates. In proceedings of the 16th Cicling 2015 (colocated: 1st International Arabic Computational Linguistics Conference), Cairo, Egypt,
April 14–20 2015.
Joachim Daiber, Max Jakob, Chris Hokamp, and Pablo N. Mendes. Improving efficiency
and accuracy in multilingual entity extraction. In Proceedings of the 9th International
Conference on Semantic Systems (I-Semantics), 2013.
Günes Erkan and Dragomir R Radev. Lexrank: graph-based lexical centrality as salience
in text summarization. Journal of Artificial Intelligence Research, pages 457–479, 2004.
Antske Fokkens, Marieke van Erp, Piek Vossen, Sara Tonelli, Willem Robert van Hage,
Luciano Serafini, Rachele Sprugnoli, and Jesper Hoeksema. GAF: A grounded annotation framework for events. In Proceedings of the first Workshop on Events: Definition,
Dectection, Coreference and Representation, Atlanta, USA, 2013.
Antske Fokkens, Aitor Soroa, Zuhaitz Beloki, Niels Ockeloen, German Rigau,
Willem Robert van Hage, and Piek Vossen. Naf and gaf: Linking linguistic annotations. In Proceedings 10th Joint ISO-ACL SIGSEM Workshop on Interoperable Semantic
Annotation, page 9, 2014.
Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. PPDB: The paraphrase database. In Proceedings of NAACL-HLT, pages 758–764, Atlanta, Georgia,
June 2013. Association for Computational Linguistics.
Matthew Gerber and Joyce Chai. Semantic role labeling of implicit arguments for nominal
predicates. Computational Linguistics, 38(4):755–798, December 2012.
Paul Grice. Logic and conversation. Syntax and semantics. 3: Speech acts, pages 41–58,
1975.
David Herman, Manfred Jahn, and Marie-Laure Ryan, editors. Routlege Encyclopedia of
Narrative Theory. Routledge, 2010.
February 1, 2016
144/148
Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, and Gerhard Weikum. YAGO2:
A Spatially and Temporally Enhanced Knowledge Base from Wikipedia. Artif. Intell.,
194:28–61, 2013.
Lifu Huang and Lian’en Huang. Optimized event storyline generation based on mixtureevent-aspect model. In Proceedings of the 2013 Conference on Empirical Methods in
Natural Language Processing, pages 726–735, Seattle, Washington, USA, October 2013.
Association for Computational Linguistics.
Kevin Humphreys, Robert Gaizauskas, and Saliha Azzam. Event coreference for information extraction. In Proceedings of a Workshop on Operational Factors in Practical,
Robust Anaphora Resolution for Unrestricted Texts, 1997.
Taku Kudo and Yuji Matsumoto. Fast Methods for Kernel-based Text Analysis. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume
1, ACL ’03, pages 24–31, Stroudsburg, PA, USA, 2003.
Egoitz Laparra and German Rigau. Impar: A deterministic algorithm for implicit semantic role labelling. In Proceedings of the 51st Annual Meeting of the Association for
Computational Linguistics (ACL 2013), pages 33–41, 2013.
LDC. Ldc. ace (automatic content extraction) english annotation guidelines for events ver.
5.4.3 2005.07.01., 2005.
Claudia Leacock and Martin Chodorow. Combining local context with wordnet similarity
for word sense identification, 1998.
Heeyoung Lee, Yves Peirsman, Angel Chang, Nathanael Chambers, Mihai Surdeanu, and
Dan Jurafsky. Stanford’s multi-pass sieve coreference resolution system at the conll2011 shared task. In Proceedings of the Fifteenth Conference on Computational Natural
Language Learning: Shared Task, CONLL Shared Task ’11, Portland, Oregon, 2011.
Heeyoung Lee, Marta Recasens, Angel Chang, Mihai Surdeanu, and Dan Jurafsky. Joint
entity and event coreference resolution across documents. In Proceedings of the 2012
Conference on Empirical Methods in Natural Language Processing and Natural Language
Learning (EMNLPCoNLL), 2012.
Zhengzhong Liu, Jun Araki, Eduard Hovy, and Teruko Mitamura. Supervised withindocument event coreference using information propagation. In Proceedings of the International Conference on Language Resources and Evaluation, 2014.
Hector Llorens, Estela Saquete, and Borja Navarro. Tipsem (english and spanish): Evaluating crfs and semantic roles in tempeval-2. In Proceedings of the 5th International
Workshop on Semantic Evaluation, pages 284–291. Association for Computational Linguistics, 2010.
February 1, 2016
145/148
A. Xiaoqiang Luo, Sameer Pradhan, Marta Recasens, and Eduard Hovy. Scoring coreference partitions of predicted mentions: A reference implementation. In Proceedings of
the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore,
MD, June 2014.
Xiaoqiang Luo. On coreference resolution performance metrics. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural
Language Processing (EMNLP-2005), 2005.
Inderjeet Mani, Marc Verhagen, Ben Wellner, Chong Min Lee, and James Pustejovsky.
Machine learning of temporal relations. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for
Computational Linguistics, ACL-44, pages 753–760, Stroudsburg, PA, USA, 2006.
Anne-Lyse Minard, Manuela Speranza, Eneko Agirre, Itziar Aldabe, Marieke van Erp,
Bernardo Magnini, German Rigau, Rubén Urizar, and Fondazione Bruno Kessler.
SemEval-2015 Task 4: TimeLine: Cross-Document Event Ordering. In Proceedings
of the 9th International Workshop on Semantic Evaluation (SemEval 2015). Association
for Computational Linguistics, 2015.
Paramita Mirza and Sara Tonelli. An analysis of causality between events and its relation to
temporal information. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 2097–2106, Dublin, Ireland,
August 2014. Dublin City University and Association for Computational Linguistics.
Paramita Mirza and Sara Tonelli. Classifying Temporal Relations with Simple Features.
In Proceedings of the 14th Conference of the European Chapter of the Association for
Computational Linguistics, pages 308–317, Gothenburg, Sweden, 2014.
Paramita Mirza, Rachele Sprugnoli, Sara Tonelli, and Manuela Speranza. Annotating
causality in the tempeval-3 corpus. In Proceedings of the EACL 2014 Workshop on
Computational Approaches to Causality in Language (CAtoCL), pages 10–19, Gothenburg, Sweden, April 2014. Association for Computational Linguistics.
Luc Moreau, Paolo Missier, Khalid Belhajjame, Reza B’Far, James Cheney, Sam Coppens,
Stephen Cresswell, Yolanda Gil, Paul Groth, Graham Klyne, Timothy Lebo, Jim McCusker, Simon Miles, James Myers, Satya Sahoo, and Curt Tilmes. PROV-DM: The
PROV Data Model. Technical report, W3C, 2012.
Kiem-Hieu Nguyen, Xavier Tannier, and Veronique Moriceau. Ranking multidocument
event descriptions for building thematic timelines. In Proceedings of COLING‘14, pages
1208–1217, 2014.
Martha S. Palmer, Deborah A. Dahl, Rebecca J. Schiffman, Lynette Hirschman, Marcia
Linebarger, and John Dowding. Recovering implicit information. In Proceedings of
February 1, 2016
146/148
the 24th annual meeting on Association for Computational Linguistics, ACL ’86, pages
10–19, New York, New York, USA, 1986.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau,
M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in python. In
Journal of Machine Learning Research, 12: 2825–2830, 2011.
Emanuele Pianta, Christian Girardi, and Roberto Zanoli. The textpro tool suite. In
Proceedings of the Sixth International Conference on Language Resources and Evaluation
(LREC’08), Marrakech, Morocco, may 2008.
Emily Pitler and Ani Nenkova. Using syntax to disambiguate explicit discourse connectives
in text. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, ACLShort
’09, pages 13–16, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.
Sameer Pradhan, Lance Ramshaw, Mitchell Marcus, Martha Palmer, Ralph Weischedel,
and Nianwen Xue. Conll-2011 shared task: Modeling unrestricted coreference in
ontonotes. In Proceedings of CoNLL 2011: Shared Task, 2011.
Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind Joshi,
and Bonnie Webber. The penn discourse treebank 2.0. In In Proceedings of LREC, 2008.
James Pustejovsky, Bob Ingria, Roser Sauri, Jose Castano, Jessica Littman, Rob
Gaizauskas, Andrea Setzer, Graham Katz, and Inderjeet Mani. The specification language timeml. The language of time: A reader, pages 545–557, 2005.
Willard V. Quine. Events and reification. In Actions and Events: Perspectives on the
Philosophy of Davidson, pages 162–71. Blackwell, 1985.
Marta Recasens and Eduard Hovy. Blanc: Implementing the rand index for coreference
evaluation. In Natural Language Engineering,17, (4), pages 485–510, 2011.
Marco Rospocher, Anne-Lyse Minard, Paramita Mirza, Piek Vossen, Tommaso Caselli,
Agata Cybulska, Roser Morante, and Itziar Aldabe. Event narrative module, version 2.
NewsReader Deliverable 5.1.1, 2015.
Marco Rospocher, Marieke van Erp, Piek Vossen, Antske Fokkens, Itziar Aldabe, German Rigau, Aitor Soroa, Thomas Ploeger, and Tessel Bogaard. Building event-centric
knowledge graphs from news. Journal of Web Semantics, 2016.
Marie-Laure Ryan. Possible Worlds, Artificial Intelligence and Narrative Theory. Bloomington: Indian University Press, 1991.
Roser Saurı́, Jessica Littman, Robert Gaizauskas, Andrea Setzer, and James Pustejovsky.
TimeML Annotation Guidelines, Version 1.2.1, 2006.
February 1, 2016
147/148
Roser Saurı́. A factuality profiler for eventualities in text. PhD thesis, Brandeis University,
Waltham, MA, USA, 2008.
William F Styler IV, Steven Bethard, Sean Finan, Martha Palmer, Sameer Pradhan, Piet C
de Groen, Brad Erickson, Timothy Miller, Chen Lin, Guergana Savova, et al. Temporal
annotation in the clinical domain. Transactions of the Association for Computational
Linguistics, 2:143–154, 2014.
Joel R. Tetreault. Implicit role reference. In International Symposium on Reference Resolution for Natural Language Processing, pages 109–115, Alicante, Spain, 2002.
The PDTB Research Group. The PDTB 2.0. Annotation Manual. Technical Report IRCS08-01, Institute for Research in Cognitive Science, University of Pennsylvania, 2008.
Sara Tonelli, Rachele Sprugnoli, Manuela Speranza, and Anne-Lyse Minard. NewsReader
Guidelines for Annotation at Document Level. Technical Report NWR2014-2-2, Fondazione Bruno Kessler, 2014. http://www.newsreader-project.eu/files/2014/12/
NWR-2014-2-2.pdf.
Naushad UzZaman and James Allen. Temporal evaluation. In Proceedings of the 49th
Annual Meeting of the Association for Computational Linguistics: Human Language
Technologies, pages 351–356, Portland, Oregon, USA, 2011.
Naushad UzZaman, Hector Llorens, James Allen, Leon Derczynski, Marc Verhagen, and
James Pustejovsky. Tempeval-3: Evaluating events, time expressions, and temporal
relations. arXiv preprint arXiv:1206.5333, 2012.
Naushad UzZaman, Hector Llorens, Leon Derczynski, James Allen, Marc Verhagen, and
James Pustejovsky. Semeval-2013 task 1: Tempeval-3: Evaluating time expressions,
events, and temporal relations. In Proceedings of the Seventh International Workshop
on Semantic Evaluation, SemEval ’13, pages 1–9, Atlanta, Georgia, USA, 2013.
Naushad UzZaman, Hector Llorens, Leon Derczynski, Marc Verhagen, James Allen, and
James Pustejovsky. Semeval-2013 task 1: Tempeval-3: Evaluating time expressions,
events, and temporal relations, 2013.
Teun A. van Dijk. News As Discourse. Routledge, 1988.
Marieke van Erp, Piek Vossen, Rodrigo Agerri, Anne-Lyse Minard, Manuela Speranza,
Ruben Urizar, and Egoitz Laparra. Annotated data, version 2. NewsReader Deliverable
3.3.2, 2015.
Willem Robert van Hage, Véronique Malaisé, Roxane Segers, Laura Hollink, and Guus
Schreiber. Design and use of the Simple Event Model (SEM). J. Web Sem., 9(2):128–
136, 2011. http://dx.doi.org/10.1016/j.websem.2011.03.003.
February 1, 2016
148/148
Marc Verhagen, Roser Sauri, Tommaso Caselli, and James Pustejovsky. Semeval-2010
task 13: Tempeval-2. In Proceedings of the 5th international workshop on semantic
evaluation, pages 57–62. Association for Computational Linguistics, 2010.
Marc Vilain, John Burger, John Aberdeen, Dennis Connolly, and Lynette Hirschman. A
model theoretic coreference scoring scheme. In Proceedings of MUC-6, 1995.
Piek Vossen, Francis Bond, and John P. McCrae. Toward a truly multilingual global
wordnet grid. Proceedings of the Eighth meeting of the Global WordNet Conference
(GWC 2016), Bucharest, 2016.
Greg Whittemore, Melissa Macpherson, and Greg Carlson. Event-building through rolefilling and anaphora resolution. In Proceedings of the 29th annual meeting on Association
for Computational Linguistics, ACL ’91, pages 17–24, Berkeley, California, USA, 1991.
Phillip Wolff and Grace Song. Models of causation and the semantics of causal verbs.
Cognitive Psychology, 47(3):276–332, 2003.
Phillip Wolff. Representing causation. Journal of experimental psychology: General,
136(1):82, 2007.
Shize Xu, Shanshan Wang, and Yan Zhang. Summarizing complex events: a cross-modal
solution of storylines extraction and reconstruction. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1281–1291, Seattle,
Washington, USA, October 2013. Association for Computational Linguistics.
Bishan Yang, Claire Cardie, and Peter I. Frazier. A hierarchical distance-dependent
bayesian model for event coreference resolution. CoRR, abs/1504.05929, 2015.
February 1, 2016

Event Narrative Module, version 3 Deliverable D5.1.3

Transcription

Similar documents

888 Casino Telephone Number

product brochure

An initiative of the Fannie E. Rippel Foundation

The NWR Packet

Diapozitivul 1

A Narrative-based Virtual Environment as a Research Tool

a PDF of this presentation