Spectral Learning Algorithms For Natural Language Processing


Spectral Learning Algorithms for Natural Language Processing

Shay Cohen (Columbia University), Michael Collins (Columbia University), Dean Foster (University of Pennsylvania), Karl Stratos (Columbia University), and Lyle Ungar (University of Pennsylvania)

June 9, 2013

Latent-variable Models

Latent-variable models are used in many areas of NLP, speech, etc.:
- Latent-variable PCFGs (Matsuzaki et al.; Petrov et al.)
- Hidden Markov models
- Naive Bayes for clustering
- Lexical representations: Brown clustering, Saul and Pereira, etc.
- Alignments in statistical machine translation
- Topic modeling
- etc.

The expectation-maximization (EM) algorithm is generally used for estimation in these models (Dempster et al., 1977). Other relevant algorithms: co-training, clustering methods.

Example 1: Latent-Variable PCFGs (Matsuzaki et al., 2005; Petrov et al., 2006)

[Figure: two parse trees for "the dog saw him": an unannotated tree (S, NP, VP, D, N, V, P) and the same tree with latent-state annotations on each nonterminal, e.g. S1, NP3, VP2, D1, N2, V4, P1.]

Example 2: Hidden Markov Models

[Figure: an HMM with hidden states S1, S2, S3, S4 emitting "the dog saw him".]

- Parameterized by $\pi(s)$, $t(s' \mid s)$ and $o(w \mid s)$
- EM is used for learning the parameters

Example 3: Naive Bayes

[Figure: a Naive Bayes model with hidden variable H and observed variables X, Y.]

$p(h, x, y) = p(h)\, p(x \mid h)\, p(y \mid h)$

Example data points (word pairs): (the, dog), (I, saw), (ran, to), (John, was), ...

EM can be used to estimate the parameters.

Example 4: Brown Clustering and Related Models

Brown et al. (1992): each word w is assigned to a class C(w), and
$p(w_2 \mid w_1) = p(C(w_2) \mid C(w_1))\, p(w_2 \mid C(w_2))$

Saul and Pereira (1997): a latent variable h mediates between the two words,
$p(w_2 \mid w_1) = \sum_h p(h \mid w_1)\, p(w_2 \mid h)$

Example 5: IBM Translation Models

[Figure: a word alignment between "Por favor , desearia reservar una habitacion ." and "Please , I would like to book a room .", including a null token.]

- Hidden variables are alignments
- EM is used to estimate the parameters

Example 6: HMMs for Speech

- Phoneme boundaries are hidden variables

Co-training (Blum and Mitchell, 1998)

- Examples come in pairs of "views"
- Each view is assumed to be sufficient for classification
- E.g., Collins and Singer (1999): ". . . , says Mr. Cooper, a vice president of . . ."
  - View 1. Spelling features: "Mr.", "Cooper"
  - View 2. Contextual features: appositive = president

Spectral Methods

Basic idea: replace EM (or co-training) with methods based on matrix decompositions, in particular the singular value decomposition (SVD).

SVD: given a matrix A with m rows and n columns, approximate it as

$A_{jk} \approx \sum_{h=1}^{d} \sigma_h U_{jh} V_{kh}$

where the $\sigma_h$ are "singular values", and U and V are $m \times d$ and $n \times d$ matrices.

Remarkably, the optimal rank-d approximation can be found efficiently.

Similarity of SVD to Naive Bayes

Naive Bayes: $P(X = x, Y = y) = \sum_{h=1}^{d} p(h)\, p(x \mid h)\, p(y \mid h)$

SVD: $A_{jk} \approx \sum_{h=1}^{d} \sigma_h U_{jh} V_{kh}$

- The SVD approximation minimizes squared loss, not log-loss
- The $\sigma_h$ are not interpretable as probabilities
- $U_{jh}$ and $V_{kh}$ may be positive or negative, so they are not probabilities
- BUT we can still do a lot with SVD (and with higher-order, tensor-based decompositions)

CCA vs. Co-training

- Co-training assumption: 2 views, each sufficient for classification
- Several heuristic algorithms have been developed for this setting
- Canonical correlation analysis (CCA):
  - Take paired examples $x^{(i),1}, x^{(i),2}$
  - Transform them to $z^{(i),1}, z^{(i),2}$
  - The z's are linear projections of the x's
  - The projections are chosen to maximize the correlation between $z^1$ and $z^2$
  - Solvable using SVD!
  - Strong guarantees in several settings

One Example of CCA: Lexical Representations

- $x \in \mathbb{R}^d$ is a word, e.g.
  dog = $(0, 0, \ldots, 0, 1, 0, \ldots, 0, 0) \in \mathbb{R}^{200{,}000}$
- $y \in \mathbb{R}^{d'}$ is its context information, e.g.
  dog-context = $(11, 0, \ldots, 0, 917, 3, 0, \ldots, 0) \in \mathbb{R}^{400{,}000}$
- Use CCA on x and y to derive a low-dimensional representation, e.g.
  dog = $(0.03, 1.2, \ldots, 1.5) \in \mathbb{R}^{100}$

Spectral Learning of HMMs and L-PCFGs

- Simple algorithms: an SVD, followed by the method of moments in a low-dimensional space
- Close connection to CCA
- Guaranteed to learn (unlike EM) under assumptions on the singular values in the SVD

Spectral Methods in NLP

- Balle, Quattoni, Carreras, ECML 2011 (learning of finite-state transducers)
- Luque, Quattoni, Balle, Carreras, EACL 2012 (dependency parsing)
- Dhillon et al., 2012 (dependency parsing)
- Cohen et al., 2012, 2013 (latent-variable PCFGs)

Overview

- Basic concepts
  - Linear algebra refresher
  - Singular value decomposition
  - Canonical correlation analysis: algorithm
  - Canonical correlation analysis: justification
- Lexical representations
- Hidden Markov models
- Latent-variable PCFGs
- Conclusion

Matrices

$A \in \mathbb{R}^{m \times n}$: "matrix of dimensions m by n"

Example: $A = \begin{pmatrix} 3 & 1 & 4 \\ 0 & 2 & 5 \end{pmatrix}$, so $A \in \mathbb{R}^{2 \times 3}$

Vectors

$u \in \mathbb{R}^n$: "vector of dimension n"

Example: $u = \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix}$, so $u \in \mathbb{R}^3$

Matrix Transpose

$A^\top \in \mathbb{R}^{n \times m}$ is the transpose of $A \in \mathbb{R}^{m \times n}$

Example: $A = \begin{pmatrix} 3 & 1 & 4 \\ 0 & 2 & 5 \end{pmatrix}$, $A^\top = \begin{pmatrix} 3 & 0 \\ 1 & 2 \\ 4 & 5 \end{pmatrix}$

Matrix Multiplication

For matrices $B \in \mathbb{R}^{m \times d}$ and $C \in \mathbb{R}^{d \times n}$, the product $A = BC$ has dimensions $m \times n$.
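As a quick sanity check on these definitions, here is a minimal NumPy sketch using the toy matrix and vector from the refresher above (the code itself is not part of the tutorial):

```python
import numpy as np

# The 2x3 example matrix and 3-dimensional vector from the refresher slides.
A = np.array([[3.0, 1.0, 4.0],
              [0.0, 2.0, 5.0]])      # A in R^{2x3}
u = np.array([0.0, 2.0, 1.0])        # u in R^3

print(A.T.shape)      # transpose: (3, 2)
print(A @ u)          # matrix-vector product: a vector in R^2
B, C = A, A.T
print((B @ C).shape)  # (m x d)(d x n) -> (m x n), here (2, 2)
```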

Overview (outline repeated; next: Singular Value Decomposition)

Singular Value Decomposition (SVD)

$A = \sum_{i=1}^{d} \sigma^i\, u^i (v^i)^\top$

where A is $m \times n$, each $\sigma^i$ is a scalar, each $u^i$ is $m \times 1$, and each $(v^i)^\top$ is $1 \times n$.

- $d = \min(m, n)$
- $\sigma^1 \geq \ldots \geq \sigma^d \geq 0$
- $u^1 \ldots u^d \in \mathbb{R}^m$ are orthonormal: $\|u^i\|_2 = 1$ and $u^i \cdot u^j = 0$ for $i \neq j$
- $v^1 \ldots v^d \in \mathbb{R}^n$ are orthonormal: $\|v^i\|_2 = 1$ and $v^i \cdot v^j = 0$ for $i \neq j$

SVD in Matrix Form

$A = U \Sigma V^\top$

where U is $m \times d$, $\Sigma$ is $d \times d$, and $V^\top$ is $d \times n$:

- $U = [u^1 \ldots u^d] \in \mathbb{R}^{m \times d}$
- $V = [v^1 \ldots v^d] \in \mathbb{R}^{n \times d}$
- $\Sigma = \mathrm{diag}(\sigma^1, \ldots, \sigma^d) \in \mathbb{R}^{d \times d}$

Matrix Rank

For $A \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A) \leq \min(m, n)$.

- rank(A) = number of linearly independent columns in A

Example: $\begin{pmatrix} 1 & 1 & 2 \\ 1 & 2 & 2 \\ 1 & 1 & 2 \end{pmatrix}$ has rank 2, while $\begin{pmatrix} 1 & 1 & 2 \\ 1 & 2 & 2 \\ 1 & 1 & 3 \end{pmatrix}$ has rank 3 (full rank).

Matrix Rank: Alternative Definition

- rank(A) = number of positive singular values of A

Example: for the first matrix above, $\Sigma \approx \mathrm{diag}(4.53,\ 0.7,\ 0)$, so the rank is 2; for the second, $\Sigma \approx \mathrm{diag}(5,\ 0.98,\ 0.2)$, so the rank is 3 (full rank).

SVD and Low-Rank Matrix Approximation

- Suppose we want to find
  $B^* = \arg\min_{B:\ \mathrm{rank}(B) \leq r} \sum_{jk} (A_{jk} - B_{jk})^2$
- Solution:
  $B^* = \sum_{i=1}^{r} \sigma^i\, u^i (v^i)^\top$
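The following short NumPy sketch (not from the tutorial; the matrix A is an arbitrary random example) illustrates the statement above: keeping the top r singular triples gives the best rank-r approximation in squared error, and the residual equals the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))             # an arbitrary example matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

r = 2
B = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]   # B* = sum_{i<=r} sigma^i u^i (v^i)^T

print(np.linalg.matrix_rank(B))             # 2
print(np.sum((A - B) ** 2))                 # squared error of the best rank-2 approximation
print(np.sum(s[r:] ** 2))                   # equals the sum of the discarded squared singular values
```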

SVD in Practice

- Black box, e.g., in Matlab
  - Input: a matrix A; output: scalars $\sigma^1 \ldots \sigma^d$ and vectors $u^1 \ldots u^d$ and $v^1 \ldots v^d$
  - Efficient implementations
  - Approximate, randomized approaches also available
- Can be used to solve a variety of optimization problems
  - For instance, canonical correlation analysis (CCA)

Overview (outline repeated; next: Canonical Correlation Analysis: Algorithm)

Canonical Correlation Analysis (CCA)

- Data consists of paired samples $(x^{(i)}, y^{(i)})$ for $i = 1 \ldots n$
- As in co-training, $x^{(i)} \in \mathbb{R}^d$ and $y^{(i)} \in \mathbb{R}^{d'}$ are two "views" of a sample point

Example (view 1 / view 2):
  $x^{(1)} = (1, 0, 0, 0)$,  $y^{(1)} = (1, 0, 0, 1, 0, 1, 0)$
  $x^{(2)} = (0, 0, 1, 0)$,  $y^{(2)} = (0, 1, 0, 0, 0, 0, 1)$
  ...
  $x^{(100000)} = (0, 1, 0, 0)$,  $y^{(100000)} = (0, 0, 1, 0, 1, 1, 1)$

Example of Paired Data: Webpage Classification (Blum and Mitchell, 1998)

- Task: determine whether a webpage is a course home page

[Figure: a course home page containing "Announcements", "Lectures", "TAs", "Information", with incoming links from an instructor's home page and a TA's home page.]

- View 1. Words on the page: "Announcements", "Lectures", ...
- View 2. Identities of pages pointing to the page: the instructor's home page, related course home pages
- Each view is sufficient for the classification!

Example of Paired Data: Named Entity Recognition (Collins and Singer, 1999)

- Task: identify an entity's type as Organization, Person, or Location
  ". . . , says Mr. Cooper, a vice president of . . ."
- View 1. Spelling features: "Mr.", "Cooper"
- View 2. Contextual features: appositive = president
- Each view is sufficient to determine the entity's type!

Example of Paired Data: Bigram Model

Naive Bayes bigram model: $p(h, x, y) = p(h)\, p(x \mid h)\, p(y \mid h)$, with training pairs such as (the, dog), (I, saw), (ran, to), (John, was), ...

- EM can be used to estimate the parameters of the model
- Alternatively, CCA can be used to derive vectors that can be used in a predictor, e.g.
  the $\mapsto (0.3, \ldots, 1.5)$,  dog $\mapsto (1.1, \ldots, 0.4)$

Projection Matrices

- Project samples to a lower-dimensional space: $x \in \mathbb{R}^d \mapsto x' \in \mathbb{R}^p$
- If p is small, we can learn with far fewer samples!
- CCA finds projection matrices $A \in \mathbb{R}^{d \times p}$ and $B \in \mathbb{R}^{d' \times p}$
- The new data points are $a^{(i)} \in \mathbb{R}^p$ and $b^{(i)} \in \mathbb{R}^p$, where
  $a^{(i)} = A^\top x^{(i)}$  ($p \times 1$ = ($p \times d$)($d \times 1$))
  $b^{(i)} = B^\top y^{(i)}$  ($p \times 1$ = ($p \times d'$)($d' \times 1$))

Mechanics of CCA: Step 1

Compute $\hat{C}_{XY} \in \mathbb{R}^{d \times d'}$, $\hat{C}_{XX} \in \mathbb{R}^{d \times d}$, and $\hat{C}_{YY} \in \mathbb{R}^{d' \times d'}$:

$[\hat{C}_{XY}]_{jk} = \frac{1}{n} \sum_{i=1}^{n} (x^{(i)}_j - \bar{x}_j)(y^{(i)}_k - \bar{y}_k)$

$[\hat{C}_{XX}]_{jk} = \frac{1}{n} \sum_{i=1}^{n} (x^{(i)}_j - \bar{x}_j)(x^{(i)}_k - \bar{x}_k)$

$[\hat{C}_{YY}]_{jk} = \frac{1}{n} \sum_{i=1}^{n} (y^{(i)}_j - \bar{y}_j)(y^{(i)}_k - \bar{y}_k)$

where $\bar{x} = \sum_i x^{(i)}/n$ and $\bar{y} = \sum_i y^{(i)}/n$.

Mechanics of CCA: Step 2

Do an SVD of $\hat{C}_{XX}^{-1/2}\, \hat{C}_{XY}\, \hat{C}_{YY}^{-1/2} \in \mathbb{R}^{d \times d'}$:

$\hat{C}_{XX}^{-1/2}\, \hat{C}_{XY}\, \hat{C}_{YY}^{-1/2} = U \Sigma V^\top$

Let $U_p \in \mathbb{R}^{d \times p}$ be the top p left singular vectors, and $V_p \in \mathbb{R}^{d' \times p}$ the top p right singular vectors.

Mechanics of CCA: Step 3

- Define projection matrices $A \in \mathbb{R}^{d \times p}$ and $B \in \mathbb{R}^{d' \times p}$:
  $A = \hat{C}_{XX}^{-1/2} U_p$,   $B = \hat{C}_{YY}^{-1/2} V_p$
- Use A and B to project each $(x^{(i)}, y^{(i)})$ for $i = 1 \ldots n$:
  $x^{(i)} \in \mathbb{R}^d \mapsto A^\top x^{(i)} \in \mathbb{R}^p$,   $y^{(i)} \in \mathbb{R}^{d'} \mapsto B^\top y^{(i)} \in \mathbb{R}^p$
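A compact NumPy sketch of steps 1-3, assuming dense views; the small ridge term added to the covariance matrices for numerical stability and the toy data at the end are my own additions, not part of the slides:

```python
import numpy as np

def cca(X, Y, p, ridge=1e-6):
    """CCA via whitening + SVD, following steps 1-3 above.
    X: n x d matrix of view-1 samples, Y: n x d' matrix of view-2 samples."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Step 1: empirical covariance matrices (ridge added so the inverses exist).
    Cxx = Xc.T @ Xc / n + ridge * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + ridge * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n

    # Inverse square roots via eigendecomposition of the symmetric covariances.
    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Cxx_is, Cyy_is = inv_sqrt(Cxx), inv_sqrt(Cyy)

    # Step 2: SVD of the whitened cross-covariance.
    U, s, Vt = np.linalg.svd(Cxx_is @ Cxy @ Cyy_is)

    # Step 3: projection matrices A (d x p) and B (d' x p).
    A = Cxx_is @ U[:, :p]
    B = Cyy_is @ Vt[:p, :].T
    return A, B, s[:p]       # s[:p] are the top canonical correlations

# Toy usage: two random views that share a 3-dimensional latent signal.
rng = np.random.default_rng(0)
H = rng.standard_normal((500, 3))
X = H @ rng.standard_normal((3, 10)) + 0.1 * rng.standard_normal((500, 10))
Y = H @ rng.standard_normal((3, 8)) + 0.1 * rng.standard_normal((500, 8))
A, B, corr = cca(X, Y, p=3)
print(corr)                  # close to 1 for the shared directions
```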

Input and Output of CCA

$x^{(i)} = (0, 0, 0, 1, 0, 0, 0, 0, 0, \ldots, 0) \in \mathbb{R}^{50{,}000}$  $\mapsto$  $a^{(i)} = (0.3, \ldots, 0.1) \in \mathbb{R}^{100}$

$y^{(i)} = (497, 0, 1, 12, 0, 0, 0, 7, 0, 0, 0, 0, \ldots, 0, 58, 0) \in \mathbb{R}^{120{,}000}$  $\mapsto$  $b^{(i)} = (0.7, \ldots, 0.2) \in \mathbb{R}^{100}$

Overview (outline repeated; next: Canonical Correlation Analysis: Justification)

Justification of CCA: Correlation Coefficients

The sample correlation coefficient for $a_1 \ldots a_n \in \mathbb{R}$ and $b_1 \ldots b_n \in \mathbb{R}$ is

$\mathrm{Corr}(\{a_i\}_{i=1}^n, \{b_i\}_{i=1}^n) = \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2}\ \sqrt{\sum_{i=1}^{n} (b_i - \bar{b})^2}}$

where $\bar{a} = \sum_i a_i / n$ and $\bar{b} = \sum_i b_i / n$.

[Figure: a scatter plot of b against a illustrating correlation close to 1.]

Simple Case: p = 1

- The CCA projection matrices are vectors $u_1 \in \mathbb{R}^d$, $v_1 \in \mathbb{R}^{d'}$
- Project $x^{(i)}$ and $y^{(i)}$ to the scalars $u_1 \cdot x^{(i)}$ and $v_1 \cdot y^{(i)}$
- What vectors does CCA find? Answer:
  $(u_1, v_1) = \arg\max_{u, v} \mathrm{Corr}\big(\{u \cdot x^{(i)}\}_{i=1}^n, \{v \cdot y^{(i)}\}_{i=1}^n\big)$

Finding the Next Projections

After finding $u_1$ and $v_1$, what vectors $u_2$ and $v_2$ does CCA find? Answer:

$(u_2, v_2) = \arg\max_{u, v} \mathrm{Corr}\big(\{u \cdot x^{(i)}\}_{i=1}^n, \{v \cdot y^{(i)}\}_{i=1}^n\big)$

subject to the constraints

$\mathrm{Corr}\big(\{u_2 \cdot x^{(i)}\}_{i=1}^n, \{u_1 \cdot x^{(i)}\}_{i=1}^n\big) = 0$
$\mathrm{Corr}\big(\{v_2 \cdot y^{(i)}\}_{i=1}^n, \{v_1 \cdot y^{(i)}\}_{i=1}^n\big) = 0$

CCA as an Optimization Problem

CCA finds, for $j = 1 \ldots p$ (each column of A and B),

$(u_j, v_j) = \arg\max_{u, v} \mathrm{Corr}\big(\{u \cdot x^{(i)}\}_{i=1}^n, \{v \cdot y^{(i)}\}_{i=1}^n\big)$

subject to the constraints, for $k < j$,

$\mathrm{Corr}\big(\{u_j \cdot x^{(i)}\}_{i=1}^n, \{u_k \cdot x^{(i)}\}_{i=1}^n\big) = 0$
$\mathrm{Corr}\big(\{v_j \cdot y^{(i)}\}_{i=1}^n, \{v_k \cdot y^{(i)}\}_{i=1}^n\big) = 0$

Guarantees for CCA

- Assume the data is generated from a Naive Bayes model with hidden variable H and views X and Y
- The latent variable H is of dimension k; the variables X and Y are of dimension d and d' (typically $k \ll d$ and $k \ll d'$)
- Use CCA to project X and Y down to k dimensions (needs (x, y) pairs only!)
- Theorem: the projected samples are as good as the original samples for prediction of H (Foster, Johnson, Kakade, Zhang, 2009)
- Because $k \ll d$ and $k \ll d'$, we can learn to predict H with far fewer labeled examples

Guarantees for CCA (continued)

Kakade and Foster, 2007 - a co-training-style setting:
- Assume we have a regression problem: predict some value z given two "views" x and y
- Assumption: either view x or y is sufficient for prediction
- Use CCA to project x and y down to a low-dimensional space
- Theorem: if the correlation coefficients drop off to zero quickly, we need far fewer samples to learn when using the projected representation
- Very similar setting to co-training, but:
  - No assumption of independence between the two views
  - CCA is an exact algorithm - no need for heuristics

Summary of the Section

- SVD is an efficient optimization technique
  - Low-rank matrix approximation
- CCA derives a new representation of paired data that maximizes correlation
  - Uses SVD as a subroutine
- Next: use of CCA in deriving vector representations of words ("eigenwords")

Overview

- Basic concepts
- Lexical representations
  - Eigenwords, found using the thin SVD between words and context:
    - capture distributional similarity
    - contain POS and semantic information about words
    - are useful features for supervised learning
- Hidden Markov models
- Latent-variable PCFGs
- Conclusion

Uses of Spectral Methods in NLP

- Word sequence labeling
  - Part-of-speech (POS) tagging
  - Named entity recognition (NER)
  - Word sense disambiguation (WSD)
  - Chunking, prepositional phrase attachment, ...
- Language modeling
  - What is the most likely next word given a sequence of words (or of sounds)?
  - What is the most likely parse given a sequence of words?

Uses of Spectral Methods in NLP

- Word sequence labeling: semi-supervised learning
  - Use CCA to learn vector representations of words (eigenwords) on a large unlabeled corpus
  - Eigenwords map from words to vectors, which are used as features for supervised learning
- Language modeling: spectral estimation of probabilistic models
  - Use eigenwords to reduce the dimensionality of generative models (HMMs, ...)
  - Use those models to compute the probability of an observed word sequence

The Eigenword Matrix U

- U contains the singular vectors from the thin SVD of the bigram count matrix

Toy corpus: "I ate ham", "You ate cheese", "You ate". Bigram count matrix (rows = $w_{t-1}$, columns = $w_t$):

            ate   cheese   ham    I    You
  ate        0      1       1     0     0
  cheese     0      0       0     0     0
  ham        0      0       0     0     0
  I          1      0       0     0     0
  You        2      0       0     0     0

The Eigenword Matrix U

- U contains the singular vectors from the thin SVD of the bigram matrix over $(w_{t-1}, w_t)$; this is analogous to LSA, but uses context instead of documents
  - The context can be multiple neighboring words (we often use the words before and after the target)
  - The context can be neighbors in a parse tree
  - Eigenwords can also be computed using the CCA between words and their contexts
- Words close in the transformed space are distributionally, semantically and syntactically similar
- We will later use U in HMMs and parse trees to project words to low-dimensional vectors
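A minimal sketch of the construction, using the toy corpus from the slide above; in practice one would rescale the counts or use the CCA variant described below, and the variable names here are illustrative:

```python
import numpy as np

# Toy corpus from the slide above.
corpus = [["I", "ate", "ham"], ["You", "ate", "cheese"], ["You", "ate"]]
vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Bigram count matrix: rows = w_{t-1}, columns = w_t.
M = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for prev, cur in zip(sent, sent[1:]):
        M[index[prev], index[cur]] += 1

# Thin SVD; the left singular vectors give a k-dimensional "eigenword" per word type.
k = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
eigenwords = U[:, :k]

for w in vocab:
    print(w, np.round(eigenwords[index[w]], 3))
```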

Two Kinds of Spectral Models

- Context oblivious (eigenwords)
  - Learn a vector representation of each word type based on its average context
- Context sensitive (eigentokens, or state)
  - Estimate a vector representation of each word token based on its particular context, using an HMM or a parse tree

Eigenwords in Practice

- Work well with corpora of 100 million words
- We often use trigrams from the Google n-gram collection
- We generally use 30-50 dimensions
- Compute using fast randomized SVD methods

How Big Should Eigenwords Be?

- A 40-dimensional cube has $2^{40}$ (about a trillion) vertices
- More precisely, in a 40-dimensional space about $1.5^{40} \approx 11$ million vectors can all be approximately orthogonal
- So 40 dimensions gives plenty of space for a vocabulary of a million words

Fast SVD: Basic Method

Problem: find a low-rank approximation to an $n \times m$ matrix M.
Solution: find an $n \times k$ matrix A such that $M \approx A A^\top M$.

Construction. A is constructed by:
1. Create a random $m \times k$ matrix $\Omega$ (i.i.d. normals)
2. Compute $M \Omega$
3. Compute the thin SVD of the result: $U D V^\top = M \Omega$
4. Set $A = U$
Better: iterate a couple of times.

"Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions" by N. Halko, P. G. Martinsson, and J. A. Tropp.
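A minimal NumPy sketch of this randomized construction, with a couple of power-iteration passes standing in for "iterate a couple times"; the sizes, iteration count, and function name are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def randomized_range(M, k, n_iter=2, seed=0):
    """Return an n x k orthonormal A such that M ~= A A^T M (Halko et al. style)."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    Omega = rng.standard_normal((m, k))       # step 1: random test matrix
    Y = M @ Omega                             # step 2: sample the range of M
    for _ in range(n_iter):                   # "iterate a couple times"
        Y = M @ (M.T @ Y)
    U, _, _ = np.linalg.svd(Y, full_matrices=False)   # step 3: thin SVD
    return U                                  # step 4: A = U

# Usage: approximate a matrix that is exactly rank 5.
rng = np.random.default_rng(1)
M = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 300))
A = randomized_range(M, k=5)
print(np.linalg.norm(M - A @ (A.T @ M)) / np.linalg.norm(M))   # tiny relative error
```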

[Figure: "Eigenwords for 'Similar' Words are Close" - a 2-D projection onto principal components in which units of measure (miles, inches, acres, pounds, bytes, barrels), kinship terms (wife, sister, brother, daughter, uncle, son, father, husband, mother, girl), and physical quantities (temperature, permeability, density, viscosity) each form their own cluster.]

[Figure: "Eigenwords Capture Part of Speech" - verbs such as sleep, drink, carry, eat, listen, agree, disagree cluster together in the projected space.]

[Figure: "Eigenwords: Pronouns" - pronouns cluster together in the projected space.]

[Figure: "Eigenwords: Numbers" - spelled-out numbers (one through nine), digits (1-10), and years (2007, 2008, 2009) form separate nearby clusters.]

[Figure: "Eigenwords: Names" - first names (e.g., lisa, david, john, michael, karen, linda, nancy, susan, jennifer, charles, dorothy, william, joseph) cluster together in the projected space.]

CCA has Nice Properties for Computing Eigenwords

- When computing the SVD of a word-context matrix (as above), we need to decide how to scale the counts
- Using raw counts gives more emphasis to common words
- Better: rescale
  - Divide each row by the square root of the total count of the word in that row
  - Rescale the columns to account for their redundancy
- CCA between words and their contexts does this automatically and optimally
  - CCA "whitens" the word-context covariance matrix

Semi-supervised Learning Problems

- Sequence labeling (named entity recognition, POS, WSD, ...)
  - X = target word
  - Z = context of the target word
  - label = person / place / organization / ...
- Topic identification
  - X = words in title
  - Z = words in abstract
  - label = topic category
- Speaker identification
  - X = video
  - Z = audio
  - label = which character is speaking

Semi-supervised Learning using CCA

- Find the CCA between X and Z
- Recall: CCA finds projection matrices A and B such that
  $\hat{x} = A^\top x$  ($k \times 1$ = ($k \times d$)($d \times 1$)),   $\hat{z} = B^\top z$  ($k \times 1$ = ($k \times d'$)($d' \times 1$))
- Project X and Z to estimate the hidden state: $(x, z) \mapsto (\hat{x}, \hat{z})$
  - Note: if x is the word and z is its context, then A is the matrix of eigenwords, $\hat{x}$ is the (context-oblivious) eigenword corresponding to word x, and $\hat{z}$ gives a context-sensitive "eigentoken"
- Use supervised learning to predict the label from the hidden state
  - and from the hidden states of neighboring words
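A small illustrative sketch of the feature construction described above: look up the projected representation of the target word and its neighbors and concatenate them into a feature vector for a downstream classifier. The stand-in eigenwords and names here are placeholders, not the tutorial's actual pipeline:

```python
import numpy as np

# `eigenwords` and `index` are placeholders; in practice they would come from
# the CCA/SVD of a large word-context matrix as described above.
rng = np.random.default_rng(0)
vocab = ["the", "dog", "saw", "him", "You", "ate", "cheese"]
index = {w: i for i, w in enumerate(vocab)}
k = 4
eigenwords = rng.standard_normal((len(vocab), k))    # stand-in projections

def token_features(sentence, t, window=1):
    """Concatenate the eigenwords of the word at position t and its neighbors."""
    feats = []
    for pos in range(t - window, t + window + 1):
        if 0 <= pos < len(sentence) and sentence[pos] in index:
            feats.append(eigenwords[index[sentence[pos]]])
        else:
            feats.append(np.zeros(k))    # sentence boundary / unknown word
    return np.concatenate(feats)         # features for a CRF, perceptron, etc.

print(token_features(["You", "ate", "cheese"], t=1).shape)   # (3 * k,) = (12,)
```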

Theory: CCA has Nice Properties

- If one uses CCA to map from target word and context (two views, X and Z) to a reduced-dimension hidden state, and then uses that hidden state as features in a linear regression to predict y, then we provably get almost as good a fit in the reduced dimension (e.g., 40) as in the original dimension (e.g., a million-word vocabulary)
- In contrast, principal components regression (PCR: regression based on PCA, which does not "whiten" the covariance matrix) can miss all of the signal
  [Foster and Kakade, '06]

Semi-supervised Results

- Find spectral features on unlabeled data
  - RCV-1 corpus: newswire
  - 63 million tokens in 3.3 million sentences
  - Vocabulary size: 300k
  - Size of embeddings: k = 50
- Use them in a discriminative model
  - CRF for NER
  - Averaged perceptron for chunking
- Compare against state-of-the-art embeddings
  - C&W, HLBL, Brown, ASO and semi-supervised CRF
  - Baseline features based on the identity of the word and its neighbors
- Benefit
  - Named entity recognition (NER): 8% error reduction
  - Chunking: 29% error reduction
  - Adding spectral features to a discriminative parser: 2.6% error reduction

Section Summary

- Eigenwords, found using the thin SVD between words and context:
  - capture distributional similarity
  - contain POS and semantic information about words
  - perform competitively with a wide range of other embeddings
  - the CCA version provides provable guarantees when used as features in supervised learning
- Next: eigenwords form the basis for fast estimation of HMMs and parse trees

A Spectral Learning Algorithm for HMMs

- Algorithm due to Hsu, Kakade and Zhang (COLT 2009; JCSS 2012)
- The algorithm relies on a singular value decomposition followed by very simple matrix operations
- Close connections to CCA
- Under assumptions on the singular values arising from the model, it has PAC-learning-style guarantees (contrast with EM, which has problems with local optima)
- It is a very different algorithm from EM

Hidden Markov Models (HMMs)

[Figure: an HMM with hidden states H1, H2, H3, H4 emitting "the dog saw him".]

$p(\underbrace{\text{the dog saw him}}_{x_1 \ldots x_4},\ \underbrace{1\,2\,1\,3}_{h_1 \ldots h_4}) = \pi(1)\, t(2 \mid 1)\, t(1 \mid 2)\, t(3 \mid 1)\, o(\text{the} \mid 1)\, o(\text{dog} \mid 2)\, o(\text{saw} \mid 1)\, o(\text{him} \mid 3)$

- Initial parameters: $\pi(h)$ for each latent state h
- Transition parameters: $t(h' \mid h)$ for each pair of states $h', h$
- Observation parameters: $o(x \mid h)$ for each state h and observation x

Hidden Markov Models (HMMs)

Throughout this section:
- We use m to refer to the number of hidden states
- We use n to refer to the number of possible words (observations)
- Typically, $m \ll n$ (e.g., m = 20, n = 50,000)

HMMs: the Forward Algorithm

$p(\text{the dog saw him}) = \sum_{h_1, h_2, h_3, h_4} p(\text{the dog saw him},\ h_1 h_2 h_3 h_4)$

The forward algorithm:

$f^0_h = \pi(h)$
$f^1_h = \sum_{h'} t(h \mid h')\, o(\text{the} \mid h')\, f^0_{h'}$
$f^2_h = \sum_{h'} t(h \mid h')\, o(\text{dog} \mid h')\, f^1_{h'}$
$f^3_h = \sum_{h'} t(h \mid h')\, o(\text{saw} \mid h')\, f^2_{h'}$
$f^4_h = \sum_{h'} t(h \mid h')\, o(\text{him} \mid h')\, f^3_{h'}$
$p(\ldots) = \sum_h f^4_h$
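A small NumPy sketch of this recursion, following the slide's convention that step i folds in the emission of word i; the toy parameter values are made up:

```python
import numpy as np

m = 3                                   # number of hidden states
vocab = {"the": 0, "dog": 1, "saw": 2, "him": 3}

rng = np.random.default_rng(0)
pi = rng.dirichlet(np.ones(m))          # pi[h]     = initial probability of state h
T = rng.dirichlet(np.ones(m), size=m).T # T[h2, h1] = t(h2 | h1)
O = rng.dirichlet(np.ones(len(vocab)), size=m).T  # O[x, h] = o(x | h)

def forward(words):
    f = pi.copy()                       # f^0_h = pi(h)
    for w in words:
        x = vocab[w]
        # f^i_h = sum_{h'} t(h | h') o(word_i | h') f^{i-1}_{h'}
        f = T @ (O[x, :] * f)
    return f.sum()                      # p(x_1 ... x_N) = sum_h f^N_h

print(forward(["the", "dog", "saw", "him"]))
```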

HMMs: the Forward Algorithm in Matrix Form

- For each word x, define the matrix $A_x \in \mathbb{R}^{m \times m}$ as
  $[A_x]_{h',h} = t(h' \mid h)\, o(x \mid h)$
  e.g., $[A_{\text{the}}]_{h',h} = t(h' \mid h)\, o(\text{the} \mid h)$
- Define $\pi$ as the vector with elements $\pi_h$, and 1 as the vector of all ones
- Then
  $p(\text{the dog saw him}) = 1^\top A_{\text{him}}\, A_{\text{saw}}\, A_{\text{dog}}\, A_{\text{the}}\, \pi$
  The forward algorithm through matrix multiplication!
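The same toy HMM, now scored with the matrix form; this sketch constructs the operators $A_x$ and reproduces the value computed by the recursion above (the parameters are the same made-up ones):

```python
import numpy as np

# Same toy HMM setup as in the previous sketch (values are made up).
m = 3
vocab = {"the": 0, "dog": 1, "saw": 2, "him": 3}
rng = np.random.default_rng(0)
pi = rng.dirichlet(np.ones(m))
T = rng.dirichlet(np.ones(m), size=m).T
O = rng.dirichlet(np.ones(len(vocab)), size=m).T

# [A_x]_{h',h} = t(h' | h) o(x | h): scale column h of T by o(x | h).
A = {w: T * O[x, :] for w, x in vocab.items()}

ones = np.ones(m)
sentence = ["the", "dog", "saw", "him"]
p = pi
for w in sentence:                      # apply A_the, then A_dog, ...
    p = A[w] @ p
print(ones @ p)                         # 1^T A_him A_saw A_dog A_the pi
                                        # same value as forward() in the previous sketch
```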

The Spectral Algorithm: Definitions

Define the following matrix $P_{2,1} \in \mathbb{R}^{n \times n}$:

$[P_{2,1}]_{i,j} = P(X_2 = i, X_1 = j)$

It is easy to derive an estimate:

$[\hat{P}_{2,1}]_{i,j} = \frac{\mathrm{Count}(X_2 = i, X_1 = j)}{N}$

The Spectral Algorithm: Definitions

For each word x, define the following matrix $P_{3,x,1} \in \mathbb{R}^{n \times n}$:

$[P_{3,x,1}]_{i,j} = P(X_3 = i, X_2 = x, X_1 = j)$

It is easy to derive an estimate, e.g.,

$[\hat{P}_{3,\text{dog},1}]_{i,j} = \frac{\mathrm{Count}(X_3 = i, X_2 = \text{dog}, X_1 = j)}{N}$

Main Result Underlying the Spectral Algorithm

- Define the matrix $P_{2,1} \in \mathbb{R}^{n \times n}$: $[P_{2,1}]_{i,j} = P(X_2 = i, X_1 = j)$
- For each word x, define the matrix $P_{3,x,1} \in \mathbb{R}^{n \times n}$: $[P_{3,x,1}]_{i,j} = P(X_3 = i, X_2 = x, X_1 = j)$
- $\mathrm{SVD}(P_{2,1}) \Rightarrow U \in \mathbb{R}^{n \times m},\ \Sigma \in \mathbb{R}^{m \times m},\ V \in \mathbb{R}^{n \times m}$
- Definition:
  $B_x = \underbrace{U^\top P_{3,x,1} V}_{m \times m}\ \underbrace{\Sigma^{-1}}_{m \times m}$
- Theorem: if $P_{2,1}$ is of rank m, then $B_x = G A_x G^{-1}$, where $G \in \mathbb{R}^{m \times m}$ is invertible
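A sketch of how the estimates plug together, on a small simulated HMM: estimate $\hat{P}_{2,1}$ and $\hat{P}_{3,x,1}$ from sampled triples as in the definitions above, take the SVD, and form $\hat{B}_x$. The toy model and sample size are arbitrary choices:

```python
import numpy as np

m, n = 2, 4
rng = np.random.default_rng(0)
pi = rng.dirichlet(np.ones(m))
T = rng.dirichlet(np.ones(m), size=m).T      # T[h2, h1] = t(h2 | h1)
O = rng.dirichlet(np.ones(n), size=m).T      # O[x, h]   = o(x | h)

def sample_triple():
    h1 = rng.choice(m, p=pi); x1 = rng.choice(n, p=O[:, h1])
    h2 = rng.choice(m, p=T[:, h1]); x2 = rng.choice(n, p=O[:, h2])
    h3 = rng.choice(m, p=T[:, h2]); x3 = rng.choice(n, p=O[:, h3])
    return x1, x2, x3

N = 100_000
P21 = np.zeros((n, n))                       # [P21]_{i,j} ~ P(X2 = i, X1 = j)
P3x1 = np.zeros((n, n, n))                   # P3x1[x][i, j] ~ P(X3 = i, X2 = x, X1 = j)
for _ in range(N):
    x1, x2, x3 = sample_triple()
    P21[x2, x1] += 1.0 / N
    P3x1[x2, x3, x1] += 1.0 / N

U, s, Vt = np.linalg.svd(P21)
U, V, Sigma = U[:, :m], Vt[:m, :].T, np.diag(s[:m])

B = {x: U.T @ P3x1[x] @ V @ np.linalg.inv(Sigma) for x in range(n)}
# Each B[x] should be close to G A_x G^{-1} for a fixed invertible G,
# up to sampling error in the estimated P matrices.
```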

Why Does This Matter?

- Theorem: if $P_{2,1}$ is of rank m, then $B_x = G A_x G^{-1}$, where $G \in \mathbb{R}^{m \times m}$ is invertible
- Recall $p(\text{the dog saw him}) = 1^\top A_{\text{him}} A_{\text{saw}} A_{\text{dog}} A_{\text{the}}\, \pi$: the forward algorithm through matrix multiplication!
- Now note that
  $B_{\text{him}} B_{\text{saw}} B_{\text{dog}} B_{\text{the}} = (G A_{\text{him}} G^{-1})(G A_{\text{saw}} G^{-1})(G A_{\text{dog}} G^{-1})(G A_{\text{the}} G^{-1}) = G A_{\text{him}} A_{\text{saw}} A_{\text{dog}} A_{\text{the}}\, G^{-1}$
  The G's cancel!
- It follows that if we also have $b_\infty^\top = 1^\top G^{-1}$ and $b_0 = G\pi$, then
  $p(\text{the dog saw him}) = b_\infty^\top B_{\text{him}} B_{\text{saw}} B_{\text{dog}} B_{\text{the}}\, b_0$
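A self-contained numerical check of this cancellation argument, using exact population values of $P_{2,1}$ and $P_{3,x,1}$ computed from a made-up HMM (so the identity holds exactly rather than up to sampling error); taking $G = U^\top O$ is the choice of G that makes the identity hold for this construction (cf. Hsu, Kakade and Zhang):

```python
import numpy as np

m, n = 2, 4
rng = np.random.default_rng(0)
pi = rng.dirichlet(np.ones(m))
T = rng.dirichlet(np.ones(m), size=m).T          # t(h' | h)
O = rng.dirichlet(np.ones(n), size=m).T          # o(x | h)

A = {x: T * O[x, :] for x in range(n)}           # [A_x]_{h',h} = t(h'|h) o(x|h)
P21 = O @ T @ np.diag(pi) @ O.T                  # P(X2 = i, X1 = j)
P3x1 = {x: O @ A[x] @ T @ np.diag(pi) @ O.T for x in range(n)}   # P(X3 = i, X2 = x, X1 = j)

U, s, Vt = np.linalg.svd(P21)
U, V, Sigma = U[:, :m], Vt[:m, :].T, np.diag(s[:m])
B = {x: U.T @ P3x1[x] @ V @ np.linalg.inv(Sigma) for x in range(n)}

G = U.T @ O                                      # invertible under the rank-m condition
b0, b_inf = G @ pi, np.linalg.inv(G).T @ np.ones(m)   # b_0 = G pi, b_inf^T = 1^T G^{-1}

sentence = [0, 2, 1, 3]                          # an arbitrary word-id sequence
p_forward = np.ones(m) @ np.linalg.multi_dot([A[x] for x in reversed(sentence)]) @ pi
p_spectral = b_inf @ np.linalg.multi_dot([B[x] for x in reversed(sentence)]) @ b0
print(p_forward, p_spectral)                     # identical up to floating-point error
```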

