A Dependency-Based Neural Network for Relation Classification


Yang Liu(1,2)  Furu Wei(3)  Sujian Li(1,2)  Heng Ji(4)  Ming Zhou(3)  Houfeng Wang(1,2)

(1) Key Laboratory of Computational Linguistics, Peking University, MOE, China
(2) Collaborative Innovation Center for Language Ability, Xuzhou, Jiangsu, China
(3) Microsoft Research, Beijing, China
(4) Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, USA

{cs-ly, lisujian, wanghf}@pku.edu.cn  {furu, mingzhou}@microsoft.com  jih@rpi.edu

(Contribution during an internship at Microsoft Research.)

Abstract

Previous research on relation classification has verified the effectiveness of using dependency shortest paths or subtrees. In this paper, we further explore how to make full use of the combination of these two kinds of dependency information. We first propose a new structure, termed the augmented dependency path (ADP), which is composed of the shortest dependency path between two entities and the subtrees attached to that path. To exploit the semantic representation behind the ADP structure, we develop dependency-based neural networks (DepNN): a recursive neural network designed to model the subtrees, and a convolutional neural network to capture the most important features on the shortest path. Experiments on the SemEval-2010 dataset show that our proposed method achieves state-of-the-art results.

1 Introduction

Relation classification aims to classify the semantic relation between two entities in a sentence. It plays a vital role in robust knowledge extraction from unstructured texts and serves as an intermediate step in a variety of natural language processing applications. Most existing approaches follow a machine learning based framework and focus on designing effective features to obtain better classification performance.

The effectiveness of using dependency relations between entities for relation classification has been reported in previous approaches (Bach and Badaskar, 2007). For example, Suchanek et al. (2006) carefully selected a set of features from tokenization and dependency parsing, and extended some of them to generate high-order features in different ways. Culotta and Sorensen (2004) designed a dependency tree kernel and attached more information, including the part-of-speech tag and chunking tag of each node in the tree. Interestingly, Bunescu and Mooney (2005) provided an important insight: the shortest path between two entities in a dependency graph concentrates most of the information for identifying the relation between them. Nguyen et al. (2007) developed these ideas by analyzing multiple subtrees with the guidance of pre-extracted keywords. Previous work thus showed that the most useful dependency information for relation classification includes the shortest dependency path and the dependency subtrees. These two kinds of information serve different functions, and their collaboration can boost the performance of relation classification (see Section 2 for detailed examples). However, how to uniformly and efficiently combine the two components is still an open problem. In this paper, we propose a novel structure named the augmented dependency path (ADP), which attaches dependency subtrees to the words on a shortest dependency path, and we focus on exploring the semantic representation behind the ADP structure.

Recently, deep learning techniques have been widely used in exploring semantic representations behind complex structures. This gives us an opportunity to model the ADP structure in a neural network framework. We therefore propose a dependency-based framework in which two neural networks are used to model shortest dependency paths and dependency subtrees separately. A convolutional neural network (CNN) is applied over the shortest dependency path, because a CNN is suitable for capturing the most useful features in a flat structure. A recursive neural network (RNN) is used for extracting semantic representations from the dependency subtrees, since an RNN is good at modeling hierarchical structures. To connect the two networks, each word on the shortest path is combined with a representation generated from its subtree, strengthening the semantic representation of the path. In this way, the augmented dependency path is represented as a continuous semantic vector which can be further used for relation classification.
2 Problem Definition and Motivation

The task of relation classification can be defined as follows. Given a sentence S with a pair of annotated entities e1 and e2, the task is to identify the semantic relation between e1 and e2 in accordance with a set of predefined relation classes (e.g., Content-Container, Cause-Effect). For example, in Figure 2, the relation between the entities e1 = "thief" and e2 = "screwdriver" is Instrument-Agency.

[Figure 1: Sentences and their dependency trees. S1: "A thief who tried to steal the truck broke the ignition with a screwdriver." S2: "On the Sabbath the priests broke the commandment with priestly work."]

Bunescu and Mooney (2005) first used shortest dependency paths between two entities to capture predicate-argument sequences (e.g., "thief broke screwdriver" in Figure 2), which provide strong evidence for relation classification. As we observe, however, the shortest path carries only part of the relevant information: the subtrees attached to each node on the shortest path are not exploited enough. For example, Figures 2a and 2b show two instances which have similar shortest dependency paths but belong to different relation classes. Methods using only the path will fail in this case, but the two paths can be distinguished by virtue of the attached subtrees, such as "dobj commandment" and "dobj ignition". Based on many observations like this, we propose to combine the subtrees and the shortest path into a more precise structure for classifying relations. This combined structure is called the "augmented dependency path (ADP)", as illustrated in Figure 2.

[Figure 2: Augmented dependency paths. (a) In S1: thief -nsubj-> broke -prep_with-> screwdriver, with attached subtrees such as "A" (det of thief) and "the ignition" (dobj of broke). (b) In S2: priests -nsubj-> broke -prep_with-> work, with attached subtrees such as "the commandment" (dobj of broke), "on the Sabbath" (prep_on of broke), and "priestly" (amod of work).]

Our goal is then to capture the semantic representation of the ADP structure between the two entities. We first adopt a recursive neural network to model each word according to its attached dependency subtree. Based on the semantic information of each word, we then design a convolutional neural network to obtain the salient semantic features along the shortest dependency path.
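To make the shortest-path idea concrete, the sketch below recovers the path of Figure 2a with networkx. This is our own toy illustration, not code from the paper: the hand-simplified edge list stands in for real parser output on S1.

```python
# A minimal sketch (not the authors' code) of the Bunescu & Mooney (2005)
# idea: the shortest path between two entities in the dependency graph.
# Real systems key nodes by token position, since words can repeat.
import networkx as nx

edges = [
    ("broke", "thief", "nsubj"),
    ("thief", "A", "det"),
    ("thief", "tried", "rcmod"),
    ("tried", "steal", "xcomp"),
    ("steal", "truck", "dobj"),
    ("broke", "ignition", "dobj"),
    ("ignition", "the", "det"),
    ("broke", "screwdriver", "prep_with"),  # collapsed preposition
]

graph = nx.Graph()  # undirected: a path may travel up and down the tree
for head, dependent, relation in edges:
    graph.add_edge(head, dependent, rel=relation)

print(nx.shortest_path(graph, source="thief", target="screwdriver"))
# -> ['thief', 'broke', 'screwdriver']
```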

3 Dependency-Based Neural Networks

In this section, we introduce how we use neural network techniques and dependency information to explore the semantic connection between two entities. We dub our architecture for modeling ADP structures dependency-based neural networks (DepNN). Figure 3 illustrates DepNN with a concrete example. First, we associate each word $w$ and each dependency relation $r$ with vector representations $x_w, x_r \in \mathbb{R}^{dim}$. For each word $w$ on the shortest dependency path, we develop an RNN from its leaf words up to the root to generate a subtree embedding $c_w$, and concatenate $c_w$ with $x_w$ to serve as the final representation of $w$. Next, a CNN is designed to model the shortest dependency path based on the representations of its words and relations. In this way, our framework can efficiently represent the semantic connection between two entities while taking more comprehensive dependency information into account.

[Figure 3: Illustration of dependency-based neural networks: a recursive network composes each attached subtree bottom-up from the word embeddings, and a convolutional network with max-over-time pooling runs over the augmented path.]
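Before describing the two networks, it may help to fix one possible in-memory form for the ADP structure they consume. The class and field names below (SubtreeNode, AugmentedDependencyPath, path_words, and so on) are our own inventions for illustration; they are not structures defined in the paper.

```python
# One possible in-memory form of an augmented dependency path (ADP).
from dataclasses import dataclass, field


@dataclass
class SubtreeNode:
    word: str
    # (dependency relation, child) pairs hanging off this word
    children: list = field(default_factory=list)


@dataclass
class AugmentedDependencyPath:
    path_words: list      # words on the shortest path, e1 ... e2
    path_relations: list  # relations between consecutive path words
    subtrees: list        # one attached subtree per path word


# The ADP of Figure 2a: thief -nsubj-> broke -prep_with-> screwdriver.
adp = AugmentedDependencyPath(
    path_words=["thief", "broke", "screwdriver"],
    path_relations=["nsubj", "prep_with"],
    subtrees=[
        SubtreeNode("thief", [("det", SubtreeNode("A"))]),
        SubtreeNode("broke", [
            ("dobj", SubtreeNode("ignition", [("det", SubtreeNode("the"))])),
        ]),
        SubtreeNode("screwdriver"),  # no attached subtree
    ],
)
```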
3.1 Modeling Dependency Subtree

The goal of modeling dependency subtrees is to find an appropriate representation for the words on the shortest path. We assume that each word $w$ can be interpreted by itself together with its children in the dependency subtree. Then, for each word $w$ in the subtree, its word embedding $x_w \in \mathbb{R}^{dim}$ and its subtree representation $c_w \in \mathbb{R}^{dim_c}$ are concatenated to form its final representation $p_w \in \mathbb{R}^{dim + dim_c}$. For a word that does not have a subtree, we set its subtree representation to $c_{LEAF}$. The subtree representation of a word is derived by transforming the representations of its child words.

During the bottom-up construction of the subtree, each word is associated with a dependency relation, such as dobj in Figure 3. For each dependency relation $r$, we set a transformation matrix $W_r \in \mathbb{R}^{dim_c \times (dim + dim_c)}$ which is learned during training. Then we have

$$c_w = f\Big(\sum_{q \in Children(w)} W_{R(w,q)} \cdot p_q + b\Big) \qquad (1)$$

$$p_q = [x_q, c_q] \qquad (2)$$

where $R(w,q)$ denotes the dependency relation between word $w$ and its child word $q$. This process continues recursively up to the root word on the shortest path.
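The following sketch computes Eqs. (1)-(2) bottom-up over the subtree attached to "broke" in Figure 2a. It is a toy illustration, not the trained model: dimensions and parameters are random placeholders, and we take f = tanh and a zero vector for c_LEAF, both of which are assumptions rather than choices stated in this text.

```python
# Toy bottom-up computation of Eqs. (1)-(2).
import numpy as np

rng = np.random.default_rng(0)
DIM, DIM_C = 4, 3  # toy sizes for word and subtree embeddings

word_emb = {w: rng.standard_normal(DIM) for w in ["broke", "ignition", "the"]}
W = {r: rng.standard_normal((DIM_C, DIM + DIM_C)) for r in ["dobj", "det"]}
b = rng.standard_normal(DIM_C)
c_leaf = np.zeros(DIM_C)  # representation of words without a subtree


def subtree_embedding(children):
    """Eq. (1): c_w = f(sum over children q of W_{R(w,q)} p_q + b)."""
    if not children:
        return c_leaf
    total = np.zeros(DIM_C)
    for rel, (child, grandchildren) in children:
        p_q = np.concatenate([word_emb[child],
                              subtree_embedding(grandchildren)])  # Eq. (2)
        total += W[rel] @ p_q
    return np.tanh(total + b)


# broke -dobj-> ignition -det-> the
c_broke = subtree_embedding([("dobj", ("ignition", [("det", ("the", []))]))])
p_broke = np.concatenate([word_emb["broke"], c_broke])  # final representation
print(p_broke.shape)  # (7,) = DIM + DIM_C
```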

3.2 Modeling Shortest Dependency Path

To classify the semantic relation between two entities, we further explore the semantic representation behind their shortest dependency path, which can be seen as a sequence of words and dependency relations, shown as the bold-font part of Figure 2. As a convolutional neural network (CNN) is good at capturing the salient features in a sequence of objects, we design a CNN to tackle the shortest dependency path.

A CNN contains a convolution operation over a window of object representations, followed by a pooling operation. As described above, a word $w$ on the shortest path is associated with the representation $p_w$ obtained by modeling its subtree. For a dependency relation $r$ on the shortest path, we set its representation as a vector $x_r \in \mathbb{R}^{dim}$. As a sliding window is applied over the sequence, we set the window size to $k$. For example, when $k = 3$, the sliding windows of a shortest dependency path with $n$ words are

$$\{[r_s\, w_1\, r_1],\ [r_1\, w_2\, r_2],\ \ldots,\ [r_{n-1}\, w_n\, r_e]\}$$

where $r_s$ and $r_e$ denote the beginning and end of the shortest dependency path between the two entities.

We concatenate the representations of the $k$ neighboring words (or dependency relations) into a new vector. Let $X_i \in \mathbb{R}^{dim \cdot k + dim_c \cdot n_w}$ be the concatenated representation of the $i$-th window, where $n_w$ is the number of words in one window. A convolution operation involves a filter $W_1 \in \mathbb{R}^{l \times (dim \cdot k + dim_c \cdot n_w)}$, which operates on $X_i$ to produce a new feature vector $L_i$ with $l$ dimensions,

$$L_i = W_1 X_i \qquad (3)$$

where the bias term is ignored for simplicity. Then $W_1$ is applied to each possible window in the shortest dependency path to produce a feature map $[L_0, L_1, L_2, \cdots]$. Next, we adopt the widely used max-over-time pooling operation (Collobert et al., 2011), which retains the most important features, to obtain the final representation $L$ from the feature map. That is, $L = \max(L_0, L_1, L_2, \ldots)$.

3.3 Learning

Like other relation classification systems, we also incorporate lexical-level features such as named entity tags and WordNet hypernyms, which have proved useful for this task. We concatenate them with the ADP representation $L$ to produce a combined vector $M$. We then pass $M$ to a fully connected softmax layer whose output is the probability distribution $y$ over relation labels:

$$M = [L, LEX] \qquad (4)$$

$$y = softmax(W_2 M + b_2) \qquad (5)$$

The optimization objective is then to minimize the cross-entropy error between the ground-truth label vector and the softmax output. Parameters are learned using back-propagation (Rumelhart et al., 1988).
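The convolution and pooling of Section 3.2, Eq. (3), can be spelled out with a short sketch over the path of Figure 2a with window size k = 3; again, all sizes and parameters below are toy placeholders rather than the trained model.

```python
# Toy sketch of Eq. (3) plus max-over-time pooling on thief -> screwdriver.
import numpy as np

rng = np.random.default_rng(1)
DIM, DIM_C, L_FILTERS = 4, 3, 5  # toy embedding sizes and filter count

words = ["thief", "broke", "screwdriver"]
rels = ["r_s", "nsubj", "prep_with", "r_e"]  # r_s / r_e mark path boundaries

p = {w: rng.standard_normal(DIM + DIM_C) for w in words}  # word + subtree
x = {r: rng.standard_normal(DIM) for r in rels}           # relation vectors

# Each window [r_{i-1}, w_i, r_i] holds one word (n_w = 1) and two relations,
# so X_i has dim*k + dim_c*n_w = 4*3 + 3*1 = 15 entries.
W1 = rng.standard_normal((L_FILTERS, DIM * 3 + DIM_C))

feature_map = [W1 @ np.concatenate([x[rels[i]], p[w], x[rels[i + 1]]])
               for i, w in enumerate(words)]              # Eq. (3), no bias

L_vec = np.max(np.stack(feature_map), axis=0)  # max-over-time pooling
print(L_vec.shape)  # (5,)
```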

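The classification step of Eqs. (4)-(5) then reduces to a softmax over the concatenation of L with a lexical feature vector. The sketch below spells this out with random placeholder values; the 19 classes correspond to the 9 relation types in both directions plus Other.

```python
# Toy sketch of Eqs. (4)-(5) and the cross-entropy objective.
import numpy as np

rng = np.random.default_rng(2)
L_DIM, LEX_DIM = 5, 4
N_CLASSES = 19  # 9 relation types x 2 directions + Other

L_vec = rng.standard_normal(L_DIM)    # output of max-over-time pooling
lex = rng.standard_normal(LEX_DIM)    # e.g. NER / WordNet feature embedding
M = np.concatenate([L_vec, lex])      # Eq. (4): M = [L, LEX]

W2 = rng.standard_normal((N_CLASSES, L_DIM + LEX_DIM))
b2 = rng.standard_normal(N_CLASSES)

z = W2 @ M + b2
y = np.exp(z - z.max()) / np.exp(z - z.max()).sum()  # Eq. (5): softmax(z)

gold = 3                 # index of the ground-truth relation label
loss = -np.log(y[gold])  # cross-entropy, minimized by back-propagation
print(float(loss))
```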
4 Experiments

We compare DepNN against multiple baselines on the SemEval-2010 dataset (Hendrickx et al., 2010). The training set includes 8,000 sentences and the test set includes 2,717 sentences. There are 9 relation types, and each type has two directions. Instances which do not fall into any of these classes are labeled as Other. The official evaluation metric is the macro-averaged F1-score (excluding Other), with direction taken into account. We use dependency trees generated by the Stanford Parser (Klein and Manning, 2003) with the collapsed option.
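As a rough stand-in for the official scorer (which this is not), the metric can be approximated with scikit-learn by macro-averaging F1 over the directed relation labels while leaving Other out of the label set; the two-relation label space below is a toy example.

```python
# Macro-averaged F1 excluding Other (directionality encoded in the labels).
from sklearn.metrics import f1_score

labels = ["Cause-Effect(e1,e2)", "Cause-Effect(e2,e1)"]  # "Other" left out

y_true = ["Cause-Effect(e1,e2)", "Other", "Cause-Effect(e2,e1)", "Other"]
y_pred = ["Cause-Effect(e1,e2)", "Cause-Effect(e2,e1)",
          "Cause-Effect(e2,e1)", "Other"]

print(f1_score(y_true, y_pred, labels=labels, average="macro"))  # ~0.833
```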
4.1 Contributions of Different Components

We first show the contributions of the different components of DepNN. Two kinds of word embeddings are used for initialization in the experiments. One is the 50-d embeddings provided by SENNA (Collobert et al., 2011). The second is the 200-d embeddings used in Yu et al. (2014), trained on Gigaword with word2vec (https://code.google.com/p/word2vec/). All the hyperparameters are set with 5-fold cross-validation.

[Table 1: Performance of DepNN with different components: a baseline using only the path words, then adding dependency relations, attached subtrees, and lexical features in turn, under both embeddings.]

We start with a baseline model using a CNN with only the words on the shortest path. We then add dependency relations and attached subtrees. The results indicate that both parts are effective for relation classification. The rich linguistic information embedded in the dependency relations and subtrees can, on the one hand, help distinguish different functions of the same word and, on the other hand, help infer an unseen word's role in the sentence. Finally, the lexical features are added, and DepNN achieves state-of-the-art results.

4.2 Comparison with Baselines

In this subsection, we compare DepNN with several baseline relation classification approaches. Here, DepNN and the baselines are all based on the 200-d embeddings trained on Gigaword, due to the larger corpus and higher dimensionality.

SVM (Rink and Harabagiu, 2010): This is the top-performing system in SemEval-2010. It utilizes many external corpora to extract features from the sentence to build an SVM classifier.

MV-RNN (Socher et al., 2012): This model finds the path between the two entities in the constituent parse tree and then learns the distributed representation of its highest node, with a matrix for each word to make the compositions specific.

CNN (Zeng et al., 2014): This model builds a convolutional network over the tokens of a sentence to learn a sentence-level feature vector. It uses a special position vector that indicates the relative distances of the current input word to the two marked entities.

FCM (Yu et al., 2014): FCM decomposes the sentence into substructures and extracts features for each of them, forming substructure embeddings. These embeddings are combined by sum-pooling and input into a softmax classifier.

DT-RNN (Socher et al., 2014): This is an RNN for modeling dependency trees. It combines a node's word embedding with those of its children through a linear combination rather than through a subtree embedding. We adapt the augmented dependency path into a dependency subtree and apply the DT-RNN.

Table 2: Results on the SemEval-2010 dataset with Gigaword embeddings.

    Model    Additional Features                                   F1
    SVM      POS, PropBank, morphological, WordNet,              82.2
             TextRunner, FrameNet, dependency parse, etc.
    MV-RNN   POS, NER, WordNet                                   81.8*
    CNN      WordNet                                             82.7
    FCM      NER                                                 83.0
    DT-RNN   NER                                                 73.1
    DepNN    WordNet                                             83.0
    DepNN    NER                                                 83.6

    * MV-RNN achieves a higher F1-score (82.7) with the SENNA embeddings, as reported in the original paper.

As shown in Table 2, DepNN achieves the best result (83.6) using NER features. WordNet features also improve the performance of DepNN, though less markedly than NER; Yu et al. (2014) made a similar observation, since the larger number of WordNet tags may cause overfitting. SVM achieves a comparable result, though the quality of its feature engineering relies heavily on human experience and external NLP resources. MV-RNN models the constituent parse trees with a recursive procedure, and its F1-score is about 1.8 points lower than DepNN's. Meanwhile, MV-RNN is very slow to train, since each word is associated with a matrix. Both CNN and FCM use features from the whole sentence and achieve similar performance.

DT-RNN is the worst of all the baselines, even though it also considers information from shortest dependency paths and attached subtrees. As we analyzed, shortest dependency paths and subtrees play different roles in relation classification, yet DT-RNN does not distinguish between the processes that model them. A similar phenomenon is seen in a kernel-based method (Wang, 2008), where a tree kernel performs worse than a shortest-path kernel. We also look into the DepNN model and find that it can identify characteristic patterns of words and dependency relations; for example, for the Instrument-Agency relation, the word "using" and the dependency relation "prep_with" are found to play a major role.

5 Conclusion

In this paper, we propose to classify relations between entities by modeling the augmented dependency path in a neural network framework. We present a novel approach, DepNN, which takes advantage of both a convolutional neural network and a recursive neural network to model this structure. Experimental results demonstrate that DepNN achieves state-of-the-art performance.

Acknowledgments

We thank all the anonymous reviewers for their insightful comments. This work was partially supported by the National Key Basic Research Program of China (No. 2014CB340504), the National Natural Science Foundation of China (No. 61273278 and 61370117), and the National Social Science Fund of China (No. 12&ZD227). The corresponding author of this paper is Sujian Li.

References

Nguyen Bach and Sameer Badaskar. 2007. A survey on relation extraction. Language Technologies Institute, Carnegie Mellon University.

Razvan C. Bunescu and Raymond J. Mooney. 2005. A shortest path dependency kernel for relation extraction. In North American Chapter of the Association for Computational Linguistics.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493–2537.

Aron Culotta and Jeffrey Sorensen. 2004. Dependency tree kernels for relation extraction. In Meeting of the Association for Computational Linguistics, pages 423–429.

Iris Hendrickx, Zornitsa Kozareva, Preslav Nakov, Sebastian Padó, Marco Pennacchiotti, Lorenza Romano, and Stan Szpakowicz. 2010. SemEval-2010 Task 8: Multi-way classification of semantic relations between pairs of nominals.

Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Meeting of the Association for Computational Linguistics, pages 423–430.

Dat PT Nguyen, Yutaka Matsuo, and Mitsuru Ishizuka. 2007. Relation extraction from Wikipedia using subtree mining. In Proceedings of the National Conference on Artificial Intelligence, volume 22, page 1414. AAAI Press.

Bryan Rink and Sanda Harabagiu. 2010. UTD: Classifying semantic relations by combining lexical and semantic resources. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 256–259, Uppsala, Sweden. Association for Computational Linguistics.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1988. Learning representations by back-propagating errors. Cognitive modeling, 5.

Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1201–1211, Jeju Island, Korea. Association for Computational Linguistics.

Richard Socher, Andrej Karpathy, Quoc V. Le, Christopher D. Manning, and Andrew Y. Ng. 2014. Grounded compositional semantics for finding and describing images with sentences. Transactions of the Association for Computational Linguistics, 2:207–218.

Fabian M. Suchanek, Georgiana Ifrim, and Gerhard Weikum. 2006. Combining linguistic and statistical analysis to extract relations from web documents. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 712–717. ACM.
Mengqiu Wang. 2008. A re-examination of dependency path kernels for relation extraction. In IJCNLP, pages 841–846.

Mo Yu, Matthew Gormley, and Mark Dredze. 2014. Factor-based compositional embedding models. In NIPS Workshop on Learning Semantics.

Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. 2014. Relation classification via convolutional deep neural network. In Proceedings of COLING 2014.
