Semi-supervised Open Domain Information Extraction with Conditional VAE

Zhengbao Jiang (zhengbaj), Songwei Ge (songweig), Ruohong Zhang (ruohongz), Donghan Yu (dyu2)
Carnegie Mellon University, Pittsburgh, PA 15213, USA. 10708 Class Project, Spring 2019. Copyright 2019 by the author(s).

Abstract

Open information extraction (OpenIE) is the task of extracting open-domain assertions from natural language sentences. However, the lack of annotated data hurts the performance of current models, which remains barely satisfactory. In this paper, we aim to improve the OpenIE model with the help of semantic role labeling (SRL) data, which has a very similar goal of identifying predicate-argument structure from natural language sentences, but with more labeled instances available. We propose a semi-supervised OpenIE model, which jointly optimizes a supervised loss and an unsupervised loss by treating OpenIE labels as hidden variables used to reconstruct observed SRL labels. A conditional variational autoencoder (CVAE) is used to optimize the lower bound of the data log-likelihood. Different from traditional multitask or transfer learning, we apply a more direct way to exploit the correlation between OpenIE and SRL. We compare our model with transfer and multitask learning, and the results corroborate that our framework is able to better utilize such correlation information.

1. Introduction

Open information extraction (OpenIE) (Banko et al., 2007a; Fader et al., 2011; Mausam et al., 2012) aims to extract structured information from unstructured natural language. The target is usually in the form of an n-tuple, consisting of a predicate and several arguments. OpenIE is beneficial to many downstream tasks, such as question answering, text summarization, and knowledge base construction. Unlike traditional IE, where a small set of target relations is provided in advance, open information extraction aims at extracting as many potential relations as possible from a text based on its semantic information. In that way, it facilitates the domain-independent discovery of relations extracted from text and scales to large, heterogeneous corpora.

Figure 1. Illustration of correlation between OpenIE and SRL.

However, manually creating training data for this task is very expensive and time-consuming, because the same relation can be expressed in many ways in text. Therefore, distant supervision, hand-crafted rules, and bootstrapping (Mintz et al., 2009a; Pantel & Pennacchiotti, 2006) are heavily used in this area, due to their advantage of requiring no or only a small amount of annotated data. However, these methods usually make strong assumptions which yield low data quality. Furthermore, some of the manually defined rules and patterns generalize poorly to different datasets.

In this paper, we show another effective way to improve OpenIE with datasets that lack OpenIE annotations, by jointly learning with semantic role labeling (SRL). Both OpenIE and SRL can be formulated as sequence tagging problems. In addition, SRL produces output very similar to OpenIE, as shown in Figure 1. More importantly, the labels of SRL are relatively easy to obtain. To better exploit such correlation information between the outputs of the two tasks, we design a semi-supervised learning framework based on the conditional variational autoencoder. We corroborate that our model can take better advantage of the SRL information than traditional multitask learning and transfer learning on this task.
Our contributions are listed as follows:

- We propose a semi-supervised OpenIE model, which jointly optimizes a supervised loss and an unsupervised loss by treating OpenIE labels as hidden variables used to reconstruct observed SRL labels.
- We propose using a conditional variational autoencoder to optimize the lower bound of the data log-likelihood, together with several parameter-sharing techniques to enable better representation learning and stabilize training.
- Experiments on the benchmark dataset OIE2016 show that our model performs best compared to other transfer learning and multitask learning models.

2. Background

2.1. Relation Extraction

With the explosion of data on the internet and the need to extract useful and sophisticated information from it, Information Extraction (IE) and Information Retrieval (IR) have become more and more popular in NLP research. In particular, relation extraction is an important task in IE, where raw texts are taken as input and the relations between entities are identified from those sentences in an automatic manner.

Since labeled data is expensive to produce and thus limited in quantity, supervised relation extraction suffers from many problems, and various unsupervised and semi-supervised solutions have been proposed.

In (Shinyama & Sekine, 2006), a purely unsupervised learning algorithm was used to extract relations from text documents. The algorithm employs a clustering algorithm over documents in which similar entity names appear, and then uses "basic patterns" to group together entities that play the same role. In this way, the grouped entities entail the same relation, and all the relations are extracted automatically.

(Banko et al., 2007b) utilizes minimal data to extract relations from a large corpus. In their TextRunner architecture, a self-supervised learner is trained on a small corpus of samples to distinguish whether a relation tuple is trustworthy or not. Then, a single-pass extractor uses the learned classifier to extract potential relations from the large corpus. Relations are obtained based on a redundancy-based assessor which assigns a confidence level to each potential relation.

Alternatively, (Mintz et al., 2009b) uses the relation information in Freebase to provide distant supervision for relation extraction, avoiding the lack of labeled data. The distant supervision method assumes that if two entities participate in a relation, any sentence which contains both of the entities is likely to express that relation. In the system by (Mintz et al., 2009b), the relation entities are extracted from Freebase and then matched to Wikipedia sentences. If a sentence contains a pair of entities in a relation, the system extracts features from that sentence. For example, if (Virginia, Richmond) are both present in "Richmond, the capital of Virginia", then features from that sentence are extracted as a positive example for the location-contains relation. Since any single sentence can express an incorrect relation, a negative sampling technique is used to train a multi-class logistic regression classifier.
2.2. Semantic Role Labeling

In Natural Language Processing, Semantic Role Labeling is the process of assigning labels to words or phrases in order to discover the predicate-argument structure of a sentence, such as "who did what to whom", "when", and "where".

Early approaches to SRL utilize the full syntactic tree, and the task is usually divided into a two-phase procedure consisting of argument recognition and argument labeling. Various models have been applied to this two-phase SRL task, such as probabilistic models (Gildea & Jurafsky, 2002), maximum entropy models (Fleischman et al., 2003), and generative models (Thompson et al., 2003).

Later SRL systems (Carreras & Màrquez, 2005) try to reduce the dependency on syntactic parsing and use only partial syntactic information. This avoids the use of a full parser and external lexico-semantic knowledge bases. Most of these systems are based on SVM taggers making IOB decisions over sentence chunks, exploring various partial syntactic features, such as local information on word contexts, the internal structure of candidate arguments, properties of the target verb predicate, and the relation between the verb predicate and the constituent under consideration.

Recently, (He et al., 2017b) proposed a deep learning method that tags sentences with SRL labels without any syntactic parsing, which had been considered a prerequisite in all previous work. Their model uses 8 layers of BiLSTM with highway connections, orthogonal initialization, and locked dropout. They also use BIO, SRL, and syntactic constraint decoding to improve the quality of the final tagging. The deep learning model improves F1 and is found to excel at long-range dependencies compared with previous syntax-based labeling methods. In our project, we use a very similar LSTM-based encoder-decoder architecture for tagging the SRL data, but we extend the base model to fit a semi-supervised learning scheme.

2.3. Semi-supervised learning

Semi-supervised learning can be effective when labeled data is limited or hard to obtain while unlabeled data is much more abundant. With the recent advances of deep learning, modeling the distribution of unlabeled data at scale with neural generative models has become essential. The Variational Auto-Encoder (VAE) (Kingma & Welling, 2013) is very successful at modeling data distributions and data generation. However, the vanilla VAE cannot generate data based on a given context. Thus, the Conditional VAE (Sohn et al., 2015) was proposed to solve this problem, where the input observation modulates the prior on the Gaussian latent variables that generate the output. That is, for a given observation x, z is drawn from the prior distribution p_θ(z | x), and the output y is generated from the distribution p_θ(y | x, z).
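As a concrete reference point for this formulation, the following is a minimal PyTorch-style sketch of the CVAE generative path: the Gaussian prior on z is modulated by the observation x, and the output y is generated conditioned on both x and z. The module names and dimensions are illustrative assumptions and not the architecture used later in this paper.

```python
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    """Minimal sketch of the CVAE generative path p_theta(z|x) -> p_theta(y|x,z).
    All names and sizes are illustrative assumptions."""
    def __init__(self, x_dim=32, z_dim=8, y_dim=10, hidden=64):
        super().__init__()
        self.prior_net = nn.Linear(x_dim, 2 * z_dim)          # parameters of the Gaussian prior p(z|x)
        self.generator = nn.Sequential(                        # models p(y|x, z)
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim))

    def forward(self, x):
        mu, logvar = self.prior_net(x).chunk(2, dim=-1)        # prior mean and log-variance, conditioned on x
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # sample z ~ p_theta(z|x)
        return self.generator(torch.cat([x, z], dim=-1))       # logits parameterizing p_theta(y|x, z)

# generate outputs for a batch of 4 random observations
y_logits = ConditionalVAE()(torch.randn(4, 32))
```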

Figure 2. OpenIE as a sequence tagging problem.
Figure 3. SRL as a sequence tagging problem.

3. Methods

In this section, we explain our method for training a semi-supervised OpenIE model with the help of additional SRL data. As discussed in the previous section, the OpenIE and SRL tasks share a lot in common: they have similar tag spaces, and there are correspondences among the different tags. To improve the performance of the OpenIE model by utilizing large SRL datasets, we treat the OpenIE tag sequences as hidden variables and decode the SRL labels based on those hidden representations. Specifically, we use a conditional variational autoencoder (CVAE) as the OpenIE model. In the following parts, we first formulate the semi-supervised problem, then introduce our proposed models, and finally discuss the model implementation in practice.

3.1. Problem formulation

In our setting, the unlabeled data can also carry labels of another task different from OpenIE, such as SRL. We thus have two kinds of datasets: a target dataset ⟨X_t, Y_t⟩ and an auxiliary dataset ⟨X_a, Y_a⟩. Transfer Learning (TL) (Pan & Yang, 2010) or Multi-Task Learning (MTL) (Caruana, 1997) can be used to solve this kind of problem. In TL, we usually train a model on the auxiliary dataset, replace certain layers with new layers adapted to the target task, and then use only the target dataset to train the new layers while keeping the parameters of the other layers fixed. MTL jointly trains different models for different tasks and shares some parameters or latent features to constrain the models. MTL is popular in NLP (Collobert et al., 2011; Zhang & Weiss, 2016; Swayamdipta et al., 2017; Strubell et al., 2018; Yang et al., 2018). For example, LISA (Strubell et al., 2018) combines multi-head self-attention with multitask learning across dependency parsing, part-of-speech tagging, predicate detection, and SRL. However, the different tasks share the same X in their setting, while in our paper there is much less overlap between the SRL dataset and the OpenIE dataset. What's more, our model is more expressive due to the probabilistic latent variable, while LISA is fully deterministic.

Two data sources are available in our task: a small dataset with OpenIE annotations ⟨X^oie, Y^oie⟩, and a large dataset with SRL annotations ⟨X^srl, Y^srl⟩. X^oie and X^srl are two sets of sentences with minimal or no overlap. In our case, there is a small amount of parallel data between these two datasets, i.e., sentences with both SRL and OpenIE annotations. Each sentence X contains a sequence of words {w_1, w_2, ..., w_n}. For notational brevity, we omit the index and just use X to denote a sentence from either the OpenIE dataset or the SRL dataset. Y^oie contains the corresponding OIE labels for each sentence in X^oie, and Y^srl contains the corresponding SRL labels for each sentence in X^srl. Although the ultimate goal of both OpenIE and SRL is to extract predicate-argument structure, we can formulate both problems as sequence tagging problems (Stanovsky et al., 2018; Jia et al., 2018; He et al., 2017a).

Specifically, given a sentence X = (w_1, w_2, ..., w_n), the goal of OpenIE and SRL is to extract n-tuples r = (p, a_1, a_2, ..., a_m), composed of a single predicate p and several "arguments". We assume all components in r are contiguous spans of words and there is no overlap between them.
The major difference between OpenIE and SRL lies in the definition of "arguments". In OpenIE, arguments are simply components of the sentence that are related to the predicate. For example, in Figure 2, we have two arguments: ARG0, which specifies the subject of the predicate likes, and ARG1, which specifies the object of the predicate. In SRL, the situation is a little more complex. A predefined set of roles is used to explicitly represent the relation between each argument and the predicate. An SRL example is shown in Figure 3, where ARGM-TMP is a role indicating the temporal information of the predicate. As a result, we can interpret SRL as a more fine-grained predicate-argument structure identification task. However, it is worth mentioning that there is no trivial mapping between the two tag spaces. Instead, the correspondence usually also depends on other factors, such as the semantic information and the context of the sequence.

Within this framework, a tuple r can be mapped to a unique BIO (Stanovsky et al., 2018; He et al., 2017a) label sequence Y = (y_1, y_2, ..., y_n) by assigning O to the words not contained in r, B-V/I-V to the words in p, and B-ARGi/I-ARGi (or other roles) to the words in a_i, depending on whether the word is at the beginning or inside of the span.
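This tuple-to-BIO mapping can be written as a small helper. The sketch below is only illustrative: the example sentence, the encoding of spans as inclusive (start, end) word indices, and the generic ARGi role names are assumptions made for the sake of the example.

```python
def tuple_to_bio(n_words, predicate_span, argument_spans):
    """Map a tuple r = (p, a1, ..., am) to a BIO tag sequence.
    Spans are inclusive (start, end) word indices; generic OpenIE roles
    ARG0, ARG1, ... are used, as in Figure 2."""
    tags = ["O"] * n_words                                    # words outside r get O
    spans = [("V", predicate_span)] + [
        ("ARG%d" % i, span) for i, span in enumerate(argument_spans)]
    for role, (start, end) in spans:
        tags[start] = "B-" + role                             # beginning of the span
        for j in range(start + 1, end + 1):
            tags[j] = "I-" + role                             # inside of the span
    return tags

# hypothetical example: "John likes ice cream", predicate "likes"
print(tuple_to_bio(4, (1, 1), [(0, 0), (2, 3)]))
# -> ['B-ARG0', 'B-V', 'B-ARG1', 'I-ARG1']
```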

We use Y^oie to denote an OIE label sequence and Y^srl to denote an SRL label sequence. Note that the OpenIE and SRL datasets have different tag spaces, i.e., {y_i | y_i ∈ Y^oie, Y^oie ∈ 𝒴^oie} ≠ {y_i | y_i ∈ Y^srl, Y^srl ∈ 𝒴^srl}, where 𝒴^oie and 𝒴^srl denote the sets of label sequences in the two datasets. Given a sentence X, the ultimate goal is to improve the OpenIE model p(Y^oie | X) using both the OIE dataset ⟨X^oie, Y^oie⟩ and the SRL dataset ⟨X^srl, Y^srl⟩.

3.2. Semi-supervised learning with conditional VAE

Given a sentence, we want to predict the OpenIE tag sequence using p_θ(Y^oie | X), where θ represents the parameters of the model. In the supervised learning setting, one can directly optimize this model on the OpenIE dataset ⟨X^oie, Y^oie⟩ by minimizing the negative log-likelihood of the ground-truth OpenIE tags:

L_sup = −log p_θ(Y^oie | X).

However, this dataset is very limited. As a result, the model can easily overfit it and generalize poorly to other practical datasets. Therefore, we propose to combine this supervised learning objective with an unsupervised learning objective derived from the SRL dataset. Considering that the SRL task is very similar to the OpenIE task with respect to the resulting tag sequence, we can explicitly leverage SRL annotations to provide supervision for the OpenIE task, which can be achieved with a conditional variational autoencoder (CVAE) model.

Generative Story. In unsupervised learning, given an input sentence X, we treat the OpenIE tag sequence Y^oie as a hidden variable, which is then used to reconstruct the SRL labels Y^srl. The basic rationale is that only proper OpenIE tag sequences are useful for reconstructing the SRL tag sequences, due to the correspondence between them. The plate notation of our graphical model is shown in Figure 4. The generative model is:

p(Y^srl | X) = Σ_{Y^oie} p_θ(Y^oie | X) p_ω(Y^srl | X, Y^oie),    (1)

where θ is the parameter of the OpenIE model and ω is the parameter of the reconstruction model (i.e., the decoder), which predicts SRL tags conditioned on both the sentence X and the OpenIE tags Y^oie.

Figure 4. The plate notation of our conditional VAE. The solid lines represent the prior model p_θ(Y^oie | X) and the reconstruction model (decoder) p_ω(Y^srl | Y^oie, X), respectively, and the dashed line represents the variational approximation (encoder) q_φ(Y^oie | Y^srl, X) to the intractable posterior distribution.

Learning with conditional VAE. Due to the large space of the hidden variable Y^oie, it is intractable to exactly compute the marginal distribution in Equation 1. To mitigate this problem, we introduce a variational posterior distribution, i.e., the encoder q_φ(Y^oie | Y^srl, X), to approximate the true posterior distribution. Instead of directly maximizing the intractable marginal distribution, we maximize the evidence lower bound objective (ELBO). After sampling OpenIE tag sequences from the distribution implied by the encoder, the decoder aims to reconstruct the SRL tag sequences based on both the sentence and these OIE tag samples.
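To see why exact evaluation of Equation 1 is intractable, note that it sums over every possible OpenIE tag sequence, i.e. |tag set|^n terms for a sentence of n words. The brute-force sketch below, with hypothetical callables standing in for p_θ and p_ω, is feasible only for toy inputs, which is exactly why the sampled, ELBO-based training is used instead.

```python
from itertools import product

def exact_marginal(sentence, srl_tags, oie_tag_set, prior_prob, decoder_prob):
    """Brute-force evaluation of Equation 1:
    p(Y_srl | X) = sum over Y_oie of p_theta(Y_oie | X) * p_omega(Y_srl | X, Y_oie).
    prior_prob and decoder_prob are hypothetical stand-ins for the neural models;
    the sum has |oie_tag_set| ** len(sentence) terms, hence the intractability."""
    total = 0.0
    for oie_tags in product(oie_tag_set, repeat=len(sentence)):
        total += prior_prob(oie_tags, sentence) * decoder_prob(srl_tags, oie_tags, sentence)
    return total

# toy usage with uniform dummy distributions (4 tags, 4 words -> 256 terms)
toy_x = ["John", "likes", "ice", "cream"]
toy_tags = ("O", "B-V", "B-ARG1", "I-ARG1")
p_srl = exact_marginal(toy_x, ("B-ARG0", "B-V", "B-ARG1", "I-ARG1"), toy_tags,
                       prior_prob=lambda y, x: 1.0 / len(toy_tags) ** len(x),
                       decoder_prob=lambda y_srl, y, x: 1.0 / len(toy_tags) ** len(x))
```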
In fact, only using the OpenIE tags may not be sufficient to reconstruct the SRL tags, because SRL contains more information than OIE. The unsupervised loss is defined as the negative ELBO, where

ELBO = E_{Y^oie ∼ q_φ}[log p_ω(Y^srl | Y^oie, X)] − KL[q_φ(Y^oie | Y^srl, X) ‖ p_θ(Y^oie | X)],    (2)

which involves three components:

- the encoder (posterior model) q_φ(Y^oie | Y^srl, X), which approximates the true posterior distribution;
- the decoder (reconstruction model) p_ω(Y^srl | Y^oie, X), which reconstructs the SRL tags conditioned on both the sentence and the OpenIE tags;
- the prior (OpenIE model) p_θ(Y^oie | X), which is the target model that we are ultimately interested in.

Based on our assumption that only correctly predicted OpenIE tag sequences are useful for reconstructing the SRL tag sequences, maximizing the reconstruction term in Equation 2 allows the model to learn a better posterior distribution over the OpenIE tag sequences. Consequently, the posterior model is expected to be more powerful than the prior model, due to the extra guidance provided by the SRL labels. Simultaneously, by minimizing the KL distance between the posterior and the prior distributions, the prior model is optimized to follow the steps led by the posterior distribution.
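Concretely, a single-sample Monte Carlo estimate of the ELBO in Equation 2 can be computed as sketched below. The encoder, decoder, and prior are hypothetical objects exposing .sample() and .log_prob() over whole tag sequences; because Y^oie is a discrete sequence, training the encoder in practice additionally requires a gradient estimator such as REINFORCE or a Gumbel-softmax relaxation, which this excerpt does not specify.

```python
def one_sample_elbo(x, y_srl, encoder, decoder, prior):
    """Single-sample Monte Carlo estimate of Equation 2.
    encoder, decoder, prior are hypothetical models exposing .sample() and
    .log_prob() over complete tag sequences (log-probs summed over time steps)."""
    y_oie = encoder.sample(x, y_srl)                    # Y_oie ~ q_phi(Y_oie | Y_srl, X)
    reconstruction = decoder.log_prob(y_srl, y_oie, x)  # log p_omega(Y_srl | Y_oie, X)
    # one-sample estimate of KL[q_phi(Y_oie | Y_srl, X) || p_theta(Y_oie | X)]
    kl = encoder.log_prob(y_oie, x, y_srl) - prior.log_prob(y_oie, x)
    return reconstruction - kl                          # the unsupervised loss is -ELBO
```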

In addition, we do not want the prior distribution to move too far from the original prediction, so we also minimize the supervised loss in the meanwhile. The KL distance also constrains the solution space searched by the posterior model to the space of valid OpenIE tag sequences.

Semi-supervised Learning. The overall semi-supervised learning loss function is:

L = L_sup + λ · L_unsup = L_sup − λ · ELBO,

where λ controls the trade-off between the supervised loss and the unsupervised loss. During training, the model parameters θ, φ, and ω are optimized jointly.

Figure 5. Illustration of our conditional VAE model. The solid lines represent the forward direction and the double line represents the sampling operation. The dashed lines represent the loss computation. The parameters in the red blocks are shared across different modules while the parameters in the blue block are model-specific.

3.3. Model implementation

In this section, we elaborate on the implementation of our model with neural networks. The framework of our semi-CVAE model for semi-supervised learning is illustrated in Figure 5. As stated above, there are three components: the encoder q_φ(Y^oie | Y^srl, X), the decoder p_ω(Y^srl | Y^oie, X), and the prior model p_θ(Y^oie | X). Since all of t
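Putting the pieces together, the overall objective L = L_sup + λ · L_unsup can be sketched as below, reusing the hypothetical one_sample_elbo helper from the previous sketch. Batching and optimization details are assumptions for illustration, not the exact training loop of the paper.

```python
def semi_supervised_loss(oie_batch, srl_batch, prior, encoder, decoder, lam=1.0):
    """Schematic overall loss L = L_sup + lambda * L_unsup = L_sup - lambda * ELBO.
    oie_batch = (X_oie, Y_oie) comes from the OpenIE dataset; srl_batch = (X_srl, Y_srl)
    comes from the SRL dataset. All model objects are hypothetical stand-ins."""
    x_oie, y_oie = oie_batch
    x_srl, y_srl = srl_batch
    l_sup = -prior.log_prob(y_oie, x_oie)                              # -log p_theta(Y_oie | X)
    l_unsup = -one_sample_elbo(x_srl, y_srl, encoder, decoder, prior)  # negative ELBO on SRL data
    return l_sup + lam * l_unsup
```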

