Constructing Narrative Event Evolutionary Graph for Script Event Prediction


Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18)

Constructing Narrative Event Evolutionary Graph for Script Event Prediction

Zhongyang Li, Xiao Ding and Ting Liu (corresponding author: tliu@ir.hit.edu.cn)
Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
{zyli, xding, tliu}@ir.hit.edu.cn

Abstract

Script event prediction requires a model to predict the subsequent event given an existing event context. Previous models based on event pairs or event chains cannot make full use of dense event connections, which may limit their capability of event prediction. To remedy this, we propose constructing an event graph to better utilize the event network information for script event prediction. In particular, we first extract narrative event chains from a large news corpus, and then construct a narrative event evolutionary graph (NEEG) based on the extracted chains. NEEG can be seen as a knowledge base that describes event evolutionary principles and patterns. To solve the inference problem on NEEG, we present a scaled graph neural network (SGNN) to model event interactions and learn better event representations. Instead of computing the representations on the whole graph, SGNN processes only the concerned nodes each time, which makes our model feasible for large-scale graphs. By comparing the similarity between input context event representations and candidate event representations, we can choose the most reasonable subsequent event. Experimental results on the widely used New York Times corpus demonstrate that our model significantly outperforms state-of-the-art baseline methods under the standard multiple choice narrative cloze evaluation.

[Figure 1: An example of script event prediction. Characters: X (customer), Y (waiter). Event context: walk(X, restaurant), seat(X), read(X, menu), order(X, food), serve(Y, X, food), eat(X, food), make(X, payment). Candidate subsequent events: c1: signed(X, scholarship); c2: drive(X, mile); c3: spent(X, time); c4: recalled(X, designer); c5: leave(X, restaurant). The correct subsequent event (c5) is marked in bold in the original figure, and the other four choices are randomly selected.]

1 Introduction

Understanding events described in text is crucial to many artificial intelligence (AI) applications, such as discourse understanding, intention recognition and dialog generation. Script event prediction is the most challenging task in this line of work. The task was first proposed by Chambers and Jurafsky [2008], who defined it as follows: given an existing event context, choose the most reasonable subsequent event from a candidate list (as shown in Figure 1).

Previous studies built prediction models either on event pairs [Chambers and Jurafsky, 2008; Granroth-Wilding and Clark, 2016] or on event chains [Wang et al., 2017]. Despite the success of event pairs and chains, the rich connections among events are not fully exploited. To better model event connections, we propose to solve script event prediction based on an event graph structure and to infer the correct subsequent event based on network embedding.

Figure 2(a) gives an example that motivates our idea of using broader event connections (i.e., a graph structure). Given an event context A(enter), B(order), C(serve), we need to choose the most reasonable subsequent event from the candidate list D(eat) and E(talk), where D(eat) is the correct answer and E(talk) is a randomly selected candidate event that occurs frequently in various scenarios.
Pair-based and chain-based models trained on event chain datasets (as shown in Figure 2(b)) are very likely to choose the wrong answer E, as the training data show that C and E have a stronger relation than C and D. As shown in Figure 2(c), by constructing an event graph based on the training event chains, the context events B, C and the candidate event D compose a strongly connected component, which indicates that D is a more reasonable subsequent event given the context events A, B, C.

Abstract event evolutionary principles and patterns are valuable commonsense knowledge, which is crucial for understanding narrative text, human behavior and social development. We use the notion of an event evolutionary graph (EEG) to denote the knowledge base that stores this kind of knowledge. Structurally, an EEG is a directed cyclic graph, whose nodes are events and whose edges stand for the relations between events, e.g., temporal and causal relations.

[Figure 2: (a) Given an event context (A, B, C), choose the subsequent event from D (eat, the correct answer) and E (talk, the randomly selected wrong answer). (b) Training event chains. (c) Event graph based on the event chains in (b). In (b), the training event chains show that C and E have a stronger relation than C and D, therefore event pair-based and chain-based models are very likely to choose the wrong, random candidate E. In (c), events B, C, and D compose a strongly connected component. This special graph structure contains dense connection information, which can help learn better event representations for choosing the correct subsequent event D.]

In this paper, we construct an event evolutionary graph based on narrative event chains, which we call a narrative event evolutionary graph (NEEG). Having a NEEG in hand, another challenging problem is how to infer the subsequent event on the graph. A possible solution is to learn event representations based on network embedding.

Duvenaud et al. [2015] introduced a convolutional neural network that operates directly on graphs and can be used for end-to-end learning of prediction tasks whose inputs are graphs of arbitrary size and shape. Kipf and Welling [2016] presented a scalable approach for semi-supervised learning on graphs based on an efficient variant of convolutional neural networks. They chose a localized first-order approximation of spectral graph convolutions as the convolutional architecture, so as to scale linearly in the number of graph edges and to learn hidden layer representations that encode both local graph structure and node features. However, their models require the adjacency matrix to be symmetric and can only operate on undirected graphs. Gori et al. [2005] proposed the graph neural network (GNN), which extended recursive neural networks and can be applied to most practically useful kinds of graphs, including directed, undirected, labeled and cyclic graphs. However, the learning algorithm in their model required running the propagation to convergence, which can have trouble propagating information across a long range in a graph. To remedy this, Li et al. [2015] introduced modern optimization techniques with gated recurrent units to GNN. Nevertheless, their model can only operate on small graphs. In this paper, we further extend the work of Li et al. [2015] by proposing a scaled graph neural network (SGNN), which is feasible for large-scale graphs. We borrow the idea of divide and conquer in the training process: instead of computing the representations on the whole graph, SGNN processes only the concerned nodes each time. By comparing the context event representations and candidate event representations learned by SGNN, we can choose the correct subsequent event.

This paper makes the following two key contributions:

- We are among the first to propose constructing an event graph, instead of event pairs and event chains, for the task of script event prediction.
- We present a scaled graph neural network, which can model event interactions on large-scale dense directed graphs and learn better event representations for prediction.

Empirical results on the widely used New York Times corpus show that our model achieves the best performance compared to state-of-the-art baseline methods, using the standard multiple choice narrative cloze (MCNC) evaluation. The data and code are released at https://github.com/eecrazy/ConstructingNEEG_IJCAI_2018.

2 Model

As shown in Figure 3, our model consists of two steps. The first step is to construct an event evolutionary graph based on narrative event chains. Second, we present a scaled graph neural network to solve the inference problem on the constructed event graph.

2.1 Narrative Event Evolutionary Graph Construction

NEEG construction consists of two steps: (1) we extract narrative event chains from a newswire corpus; (2) we construct the NEEG based on the extracted event chains.

In order to compare with previous work, we adopt the same news corpus and event chain extraction methods as [Granroth-Wilding and Clark, 2016]. We extract a set of narrative event chains S = {s_1, s_2, s_3, ..., s_N}, where s_i = {T, e_1, e_2, e_3, ..., e_m}. For example, s_i can be {T = customer, walk(T, restaurant, -), seat(T, -, -), read(T, menu, -), order(T, food, -), serve(waiter, food, T), eat(T, food, fork)}. T is the protagonist entity shared by all the events in this chain. e_i is an event that consists of four components {p(a_0, a_1, a_2)}, where p is the predicate verb, and a_0, a_1, a_2 are the subject, object and indirect object of the verb, respectively.

The NEEG can be formally denoted as G = {V, E}, where V = {v_1, v_2, v_3, ..., v_P} is the node set and E = {l_1, l_2, l_3, ..., l_Q} is the edge set. In order to overcome the sparsity problem of events, we represent an event e_i by its abstract form (v_i, r_i), where v_i is a non-lemmatized predicate verb, and r_i is the grammatical dependency relation of v_i to the chain entity T, for example e_i = (eats, subj). This kind of event representation is called predicate-GR [Granroth-Wilding and Clark, 2016]. We count all the predicate-GR bigrams in the training event chains, and regard each predicate-GR bigram as an edge l_i in E. Each l_i is a directed edge v_i -> v_j along with a weight w, which is computed by:

    w(v_j \mid v_i) = \frac{count(v_i, v_j)}{\sum_k count(v_i, v_k)}    (1)

where count(v_i, v_j) is the frequency with which the bigram (v_i, v_j) appears in the training event chains.

The constructed NEEG G has 104,940 predicate-GR nodes and 6,187,046 directed and weighted edges.
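As a concrete illustration of Eq. (1), the following minimal sketch builds NEEG edge weights from extracted predicate-GR chains. It is not the released implementation; the function and variable names (build_neeg, chains, the edge dictionary keyed by node pairs) are assumptions made for this example.

```python
from collections import Counter, defaultdict

def build_neeg(chains):
    """Illustrative sketch of NEEG edge-weight construction (Eq. 1).

    `chains` is a list of event chains, each a list of predicate-GR
    tuples such as ("eats", "subj"). Returns a dict mapping a directed
    edge (v_i, v_j) to its weight w(v_j | v_i).
    """
    bigram_counts = Counter()
    out_counts = defaultdict(int)
    for chain in chains:
        # adjacent predicate-GR pairs in a chain are counted as bigrams
        for v_i, v_j in zip(chain, chain[1:]):
            bigram_counts[(v_i, v_j)] += 1
            out_counts[v_i] += 1

    # w(v_j | v_i) = count(v_i, v_j) / sum_k count(v_i, v_k)
    return {(v_i, v_j): c / out_counts[v_i]
            for (v_i, v_j), c in bigram_counts.items()}

# Toy usage with two short chains of predicate-GR nodes.
chains = [[("walk", "subj"), ("order", "subj"), ("eat", "subj")],
          [("walk", "subj"), ("order", "subj"), ("leave", "subj")]]
edges = build_neeg(chains)
print(edges[(("order", "subj"), ("eat", "subj"))])  # 0.5
```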

[Figure 3: Framework of our proposed SGNN model. (a) Our constructed NEEG: a local subgraph of predicate-GR nodes such as (enter, subj), (order, subj), (serve, obj), (eat, subj) connected by weighted edges. (b) Each time, a subgraph that contains all the context events (nodes 1, 2, 3) and all the corresponding candidate events (here only one candidate event, node 4, is drawn) is chosen. (c) The initial hidden representations h^{(0)}, obtained by composing word embeddings with f(v_p, v_{a_0}, v_{a_1}, v_{a_2}), and the adjacency matrix A are fed into GGNN to get the final event representations h^{(t)}, which are used to calculate the relatedness scores g(v_i, v_c) between the context and candidate events.]

Figure 3(a) illustrates a local subgraph in G, which describes the possible events involved in the restaurant scenario. Unlike event pairs or event chains, an event graph has dense connections among events and contains more abundant event interaction information.

2.2 Scaled Graph Neural Network

GNN was first proposed by Gori et al. [2005]. Li et al. [2015] further introduced the modern optimization techniques of backpropagation through time and gated recurrent units to GNN, yielding the gated graph neural network (GGNN). Nevertheless, GGNN needs to take the whole graph as input, so it cannot effectively handle a large-scale graph with hundreds of thousands of nodes. For the purpose of scaling to large-scale graphs, we borrow the idea of divide and conquer in the training process: we do not feed the whole graph into GGNN. Instead, only a subgraph (as shown in Figure 3(b)) with the context and candidate event nodes is fed into it for each training instance. Finally, the learned node representations can be used to solve the inference problem on the graph.

As shown in Figure 3(c), the overall framework of SGNN has three main components. The first part is a representation layer, which is used to learn the initial event representations. The second part is a gated graph neural network, which is used to model the interactions among events and to update the initial event representations. The third part computes the relatedness scores between context and candidate events, according to which we choose the correct subsequent event.

Learning Initial Event Representations

We learn the initial event representation by composing pretrained word embeddings of its verb and arguments. For arguments that consist of more than one word, we follow [Granroth-Wilding and Clark, 2016] and only use the head word identified by the parser. Out-of-vocabulary words and absent arguments are represented by zero vectors.

Given an event e_i = {p(a_0, a_1, a_2)} and the word embeddings of its verb and arguments v_p, v_{a_0}, v_{a_1}, v_{a_2} \in R^d (d is the dimension of the embeddings), there are several ways to obtain the representation v_{e_i} of the whole event through a mapping function v_{e_i} = f(v_p, v_{a_0}, v_{a_1}, v_{a_2}). Here, we introduce three widely used semantic composition methods:

- Average: use the mean of the verb and all argument vectors as the representation of the whole event.

- Nonlinear Transformation [Wang et al., 2017]:

    v_e = \tanh(W_p \cdot v_p + W_0 \cdot v_{a_0} + W_1 \cdot v_{a_1} + W_2 \cdot v_{a_2} + b)    (2)

  where W_p, W_0, W_1, W_2 and b are model parameters.

- Concatenation [Granroth-Wilding and Clark, 2016]: concatenate the verb and all argument vectors as the representation of the whole event.
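To make the three composition functions concrete, here is a minimal sketch of them in NumPy. The parameter initialization and names (compose_event, compose_average, compose_concat) are illustrative assumptions rather than the released code; in the actual model W_p, W_0, W_1, W_2 and b are learned jointly with the rest of the network.

```python
import numpy as np

d = 128  # embedding dimension used in the paper

# Illustrative randomly initialized parameters; in the model these are learned.
rng = np.random.default_rng(0)
W_p, W_0, W_1, W_2 = (rng.standard_normal((d, d)) * 0.01 for _ in range(4))
b = np.zeros(d)

def compose_event(v_p, v_a0, v_a1, v_a2):
    """Nonlinear composition of Eq. (2); absent arguments are zero vectors."""
    return np.tanh(W_p @ v_p + W_0 @ v_a0 + W_1 @ v_a1 + W_2 @ v_a2 + b)

def compose_average(v_p, v_a0, v_a1, v_a2):
    """Average composition: mean of the verb and argument vectors."""
    return (v_p + v_a0 + v_a1 + v_a2) / 4.0

def compose_concat(v_p, v_a0, v_a1, v_a2):
    """Concatenation composition: stack the verb and argument vectors."""
    return np.concatenate([v_p, v_a0, v_a1, v_a2])
```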
Updating Event Representations Based on GGNN

As introduced above, GGNN is used to model the interactions among all context and candidate events. The main challenge is how to train it on a large-scale graph. To train the GGNN model on a NEEG with more than one hundred thousand event nodes, each time we feed it a small subgraph instead of the whole graph, which makes it feasible for large-scale graphs.

The inputs to GGNN are two matrices h^{(0)} and A, where h^{(0)} = [v_{e_1}, v_{e_2}, ..., v_{e_n}, v_{e_{c_1}}, v_{e_{c_2}}, ..., v_{e_{c_k}}] (n is 8 and k is 5, the same as [Granroth-Wilding and Clark, 2016]) contains the initial context and subsequent candidate event vectors, and A \in R^{(n+k) \times (n+k)} is the corresponding subgraph adjacency matrix, with:

    A[i, j] = \begin{cases} w(v_j \mid v_i) & \text{if } v_i \to v_j \in E \\ 0 & \text{otherwise} \end{cases}    (3)
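The sketch below shows how the per-instance inputs h^{(0)} and A of Eq. (3) could be assembled from the NEEG edge weights. build_subgraph, event_nodes and edge_weights are hypothetical names introduced for this example, assuming the edge dictionary produced in the earlier sketch.

```python
import numpy as np

def build_subgraph(event_nodes, edge_weights, event_vectors):
    """Sketch of the per-instance SGNN inputs (Eq. 3).

    event_nodes  : list of the n context + k candidate predicate-GR nodes.
    edge_weights : dict mapping (v_i, v_j) to w(v_j | v_i) from the NEEG.
    event_vectors: list of the corresponding initial event vectors v_e.
    Returns h0 of shape (n+k, d) and the (n+k) x (n+k) adjacency matrix A.
    """
    m = len(event_nodes)
    h0 = np.stack(event_vectors)  # initial event representations h^(0)
    A = np.zeros((m, m))
    for i, v_i in enumerate(event_nodes):
        for j, v_j in enumerate(event_nodes):
            # A[i, j] = w(v_j | v_i) if the NEEG contains edge v_i -> v_j, else 0
            A[i, j] = edge_weights.get((v_i, v_j), 0.0)
    return h0, A
```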

The adjacency matrix A determines how nodes in the subgraph interact with each other. The basic recurrence of GGNN is:

    a^{(t)} = A^{\top} h^{(t-1)} + b    (4)
    z^{(t)} = \sigma(W^z a^{(t)} + U^z h^{(t-1)})    (5)
    r^{(t)} = \sigma(W^r a^{(t)} + U^r h^{(t-1)})    (6)
    c^{(t)} = \tanh(W a^{(t)} + U (r^{(t)} \odot h^{(t-1)}))    (7)
    h^{(t)} = (1 - z^{(t)}) \odot h^{(t-1)} + z^{(t)} \odot c^{(t)}    (8)

GGNN behaves like the widely used gated recurrent unit (GRU) [Cho et al., 2014]. Eq. (4) is the step that passes information between different nodes of the graph via the directed adjacency matrix A; a^{(t)} contains the activations from edges. The remaining equations are GRU-like updates that incorporate information from the other nodes and from the previous time step to update each node's hidden state. z^{(t)} and r^{(t)} are the update and reset gates, \sigma is the logistic sigmoid function, and \odot is element-wise multiplication. We unroll the above recurrent propagation for a fixed number of steps K. The output h^{(t)} of GGNN is used as the updated representations of the context and candidate events.
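A minimal NumPy sketch of the propagation in Eqs. (4)-(8) follows, assuming h0 is the (n+k) x d matrix of initial event vectors and A the subgraph adjacency matrix from Eq. (3). ggnn_propagate and params are illustrative names, the parameter shapes are assumptions, and the real model is of course trained end-to-end with backpropagation rather than run with fixed arrays.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_propagate(h0, A, params, K=2):
    """Sketch of the GRU-like propagation of Eqs. (4)-(8).

    h0: (m, d) initial event representations; A: (m, m) subgraph adjacency.
    params: learned matrices W_z, U_z, W_r, U_r, W, U (each d x d) and bias b (d,).
    K: number of unrolled recurrence steps (K = 2 in the paper).
    """
    W_z, U_z, W_r, U_r, W, U, b = params
    h = h0
    for _ in range(K):
        a = A.T @ h + b                        # Eq. (4): pass messages along directed edges
        z = sigmoid(a @ W_z.T + h @ U_z.T)     # Eq. (5): update gate
        r = sigmoid(a @ W_r.T + h @ U_r.T)     # Eq. (6): reset gate
        c = np.tanh(a @ W.T + (r * h) @ U.T)   # Eq. (7): candidate state
        h = (1.0 - z) * h + z * c              # Eq. (8): gated update
    return h
```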
Choosing the Correct Subsequent Event

After obtaining the hidden state of each event, we model event pair relations using these hidden state vectors. A straightforward approach to modeling the relation between two events is a Siamese network [Granroth-Wilding and Clark, 2016]. The outputs of GGNN for the context events are h_1^{(t)}, h_2^{(t)}, ..., h_n^{(t)}, and for the candidate events h_{c_1}^{(t)}, h_{c_2}^{(t)}, ..., h_{c_k}^{(t)}. Given a pair of events h_i^{(t)} (i \in [1, n]) and h_{c_j}^{(t)} (j \in [1, k]), the relatedness score is calculated by s_{ij} = g(h_i^{(t)}, h_{c_j}^{(t)}), where g is the score function.

There are multiple choices for the score function g in our model. Here, we introduce four commonly used similarity metrics that can serve as g:

- Manhattan Similarity: the Manhattan distance of the two vectors, manhattan(h_i^{(t)}, h_{c_j}^{(t)}) = \|h_i^{(t)} - h_{c_j}^{(t)}\|_1.

- Cosine Similarity: the cosine of the two vectors, cosine(h_i^{(t)}, h_{c_j}^{(t)}) = \frac{h_i^{(t)} \cdot h_{c_j}^{(t)}}{\|h_i^{(t)}\| \|h_{c_j}^{(t)}\|}.

- Dot Similarity: the inner product of the two vectors, dot(h_i^{(t)}, h_{c_j}^{(t)}) = h_i^{(t)} \cdot h_{c_j}^{(t)}.

- Euclidean Similarity: the Euclidean distance of the two vectors, euclid(h_i^{(t)}, h_{c_j}^{(t)}) = \|h_i^{(t)} - h_{c_j}^{(t)}\|_2.

Given the relatedness score s_{ij} between each context event h_i^{(t)} and each subsequent candidate event h_{c_j}^{(t)}, the likelihood of e_{c_j} given e_1, e_2, ..., e_n is calculated as s_j = \frac{1}{n} \sum_{i=1}^{n} s_{ij}, and we choose the correct subsequent event as c = \arg\max_j s_j.

We also apply the attention mechanism [Bahdanau et al., 2014] to the context events, as we believe that different context events may have different weights for choosing the correct subsequent event. We use an attentional neural network to calculate the relative importance of each context event with respect to the subsequent event candidates:

    u_{ij} = \tanh(W_h h_i^{(t)} + W_c h_{c_j}^{(t)} + b_u)    (9)
    \alpha_{ij} = \frac{\exp(u_{ij})}{\sum_k \exp(u_{kj})}    (10)

Then the relatedness score is calculated by:

    s_{ij} = \alpha_{ij} \, g(h_i^{(t)}, h_{c_j}^{(t)})    (11)

Training Details

All the hyperparameters are tuned on the development set, and we use a margin loss as the objective function:

    L(\Theta) = \sum_{I=1}^{N} \sum_{j=1}^{k} \max(0, margin - s_{Iy} + s_{Ij}) + \lambda \|\Theta\|_2^2

where s_{Ij} is the relatedness score between the I-th event context and its j-th candidate subsequent event, and y is the index of the correct subsequent event. The margin parameter of the loss is set to 0.015. \Theta is the set of model parameters, and \lambda is the L2 regularization coefficient, which is set to 0.00001. The learning rate is 0.0001, the batch size is 1000, and the number of recurrent steps K is 2.

We use the DeepWalk algorithm [Perozzi et al., 2014] to train embeddings for the predicate-GR nodes on the constructed NEEG (we find that embeddings trained with DeepWalk on the graph are better than Word2vec embeddings trained on event chains), and the Skip-gram algorithm [Mikolov et al., 2013] to train embeddings for the arguments a_0, a_1, a_2 on the event chains. The embedding dimension d is 128. The model parameters are optimized with the RMSprop algorithm, and early stopping is used to decide when to stop training.
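Below is a minimal sketch of the attention-weighted scoring of Eqs. (9)-(11) and the margin loss above. It assumes W_h and W_c are weight vectors producing a scalar u_{ij} and b_u is a scalar bias; attention_scores, margin_loss and g are illustrative names rather than the released implementation.

```python
import numpy as np

def softmax_over_context(u):
    """Eq. (10): normalize u[i, j] over the context events i for each candidate j."""
    e = np.exp(u - u.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def attention_scores(h_ctx, h_cand, W_h, W_c, b_u, g):
    """Attention-weighted relatedness scores of Eqs. (9)-(11).

    h_ctx: (n, d) context event states; h_cand: (k, d) candidate states.
    W_h, W_c: (d,) weight vectors; b_u: scalar bias (assumed shapes).
    g: pairwise score function, e.g. dot or cosine similarity.
    Returns one aggregated score per candidate, shape (k,).
    """
    n, k = len(h_ctx), len(h_cand)
    u = np.zeros((n, k))
    for i in range(n):
        for j in range(k):
            u[i, j] = np.tanh(W_h @ h_ctx[i] + W_c @ h_cand[j] + b_u)   # Eq. (9)
    alpha = softmax_over_context(u)                                     # Eq. (10)
    s = np.zeros(k)
    for j in range(k):
        # Eq. (11), aggregated over the context events
        s[j] = sum(alpha[i, j] * g(h_ctx[i], h_cand[j]) for i in range(n))
    return s  # predicted candidate index: int(np.argmax(s))

def margin_loss(s, y, margin=0.015):
    """Margin loss for one training instance (the L2 term is omitted here)."""
    return sum(max(0.0, margin - s[y] + s[j]) for j in range(len(s)) if j != y)

# Example score function g: dot similarity.
dot = lambda a, b: float(a @ b)
```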

3 Evaluation

We evaluate the effectiveness of SGNN by comparing it with several state-of-the-art baseline methods. Accuracy (%) of choosing the correct subsequent event is used as the evaluation metric.

[Table 1: Statistics of our datasets. The row and column labels are not preserved in this transcription.]

3.1 Baselines

We compare our model with the following baseline methods.

- PMI [Chambers and Jurafsky, 2008] is a co-occurrence-based model that scores predicate-GR event pairs with pointwise mutual information.

- Bigram [Jans et al., 2012] is a counting-based skip-gram model that scores event pairs with bigram probabilities.

- Word2vec [Mikolov et al., 2013] is the widely used model that learns word embeddings from large-scale text corpora. The learned embeddings for verbs and arguments are used to compute pairwise event relatedness scores.

- DeepWalk [Perozzi et al., 2014] is an unsupervised model that extends the word2vec algorithm to learn embeddings for networks.

[Table: Accuracy (%) of Random, PMI [Chambers and Jurafsky, 2008], Bigram [Jans et al., 2012], Word2vec [Mikolov et al., 2013], DeepWalk [Perozzi et al., 2014], EventComp [Granroth-Wilding and Clark, 2016] and PairLSTM [Wang et al., 2017]; the accuracy values are not preserved in this transcription.]
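For reference, a minimal sketch of the accuracy metric used in the multiple choice narrative cloze evaluation; mcnc_accuracy and its arguments are hypothetical names introduced for this example.

```python
def mcnc_accuracy(predicted_indices, gold_indices):
    """Percentage of MCNC instances whose predicted candidate matches the gold one."""
    correct = sum(int(p == g) for p, g in zip(predicted_indices, gold_indices))
    return 100.0 * correct / len(gold_indices)

# Example: 3 of 4 instances predicted correctly -> 75.0
print(mcnc_accuracy([4, 0, 2, 1], [4, 0, 2, 3]))
```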
