Statistical NLP For The Web - Columbia University


Statistical NLP for the Web
Introduction, Text Mining, Linear Methods of Regression
Sameer Maskey
Week 1, September 5, 2012

Outline
- Introduction
- Final project details
- NLP-ML topics
- Text mining, scoring chunks of text
- Simple linear regression
- Multiple linear regression
- Equations to implementation
- Reading assignment

Course Information
- Course website: http://www.cs.columbia.edu/~smaskey/CS6998-0412
- Discussions in Courseworks
- Office hours: Wed 2 to 4pm, 457 CS Building; individual appointments in person or by phone can be set by emailing the instructor: smaskey@cs.columbia.edu
- Instructor: Sameer Maskey, PhD; Adj. Assistant Professor, Columbia University; Research Scientist, IBM Research; NLP/speech processing research for the last 12 years
- TA: Morgan Ulinski, mulinski@cs.columbia.edu; office hours 2-4pm Tuesday, Speech Lab, CEPSR
- Prerequisites: probability, statistics, linear algebra, programming skill, CS account

Grading and Academic Integrity
- 3 homeworks (15% each); due dates are available on the class webpage
- You have 3 'no penalty' late days in total that can be used during the semester; each additional late day (without approval) will be penalized 20% per day
- No midterm exam
- Final project (55%): meant for you to explore and do research on an NLP/ML topic of your choice; project proposal due soon
- No final exam
- Collaboration allowed, but presenting someone else's work (including code) will result in an automatic zero

Textbooks
- For NLP topics we will use: Speech and Language Processing (2nd Edition) by Daniel Jurafsky and James H. Martin
- For statistical methods/ML topics we will partly use: Pattern Recognition and Machine Learning by Christopher Bishop

How Will We Approach Topics?
Example topic stack:
- NLP theory: text categorization
- ML theory: Naive Bayes
- Data: spam data
- Equations to implementation: Java/C++
- Web/mobile application

Final Project Details
- 1 to 2 person team
- Think about a cool NLP/ML-based web/mobile application you have been wanting to build forever, and share it with the world! (Build it and get course credit for it.)
- Example final project: sentiment analyzer (linear regression, Twitter data)
- Milestones: project proposal; 1/2-semester report; 3/4-semester demo; project DEMO day
- Deliverables: paper (8 pages), poster/slides, demo
- Languages: Java/C++/Python; PHP/Django/Ruby/Javascript/Java

Final Project DEMO/Mini-Conference Day
- December 12, 2012
- Each student/group will present his/her/their work: paper, poster, demo
- Judges: internal judges, external industry experts

Computing Environment
- cs699804.cs.columbia.edu
- Each student will get 15-20 GB of space for experiments
- You can also use the CS cluster
- You can also use your laptops/desktops

NLP, ML, Applications: this intersection is what the class is about.

Goal of the Class
By the end of the semester:
- You will have in-depth knowledge of several NLP and ML topics and will have explored the relationship between them
- You should be able to implement many of the NLP/ML methods on your own
- You will be able to frame many NLP problems in a statistical framework of your choice
- You will understand how to read NLP/ML papers analytically and know the kinds of questions to ask oneself when doing NLP/ML research
- You will have built one end-to-end NLP application that hopefully you will be proud of!

Topics in NLP (HLT, ACL) Conference
- Morphology (including word segmentation)
- Part-of-speech tagging
- Syntax and parsing
- Grammar engineering
- Word sense disambiguation
- Lexical semantics
- Mathematical linguistics
- Textual entailment and paraphrasing
- Discourse and pragmatics
- Knowledge acquisition and representation
- Noisy data analysis
- Machine translation
- Multilingual language processing
- Language generation
- Summarization
- Question answering
- Information retrieval
- Information extraction
- Topic classification and information filtering
- Non-topical classification (sentiment/genre analysis)
- Topic clustering
- Text and speech mining
- Text classification
- Evaluation (e.g., intrinsic, extrinsic, user studies)
- Development of language resources
- Rich transcription (automatic annotation)

Topics in ML (ICML, NIPS) Conference
- Reinforcement learning
- Online learning
- Ranking
- Graphs and embedding
- Gaussian processes
- Dynamical systems
- Kernels
- Codebooks and dictionaries
- Clustering algorithms
- Structured learning
- Topic models
- Transfer learning
- Weak supervision
- Learning structures
- Sequential stochastic models
- Active learning
- Support vector machines
- Boosting
- Learning kernels
- Information theory and estimation
- Bayesian analysis
- Regression methods
- Inference algorithms
- Analyzing networks and learning with graphs

NLP and ML: Many Related Topics
Tasks and solutions: combine relevant topics from the NLP conference list and the ML conference list above.

Topics We Will Cover in This Course (NLP -- ML)
- Text mining -- linear models of regression
- Text categorization -- linear methods of classification, generative classifiers
- Information extraction/tagging, syntax and parsing -- hidden Markov models, maximum entropy models, Viterbi search, beam search
- Topic and document clustering -- k-means, KNN, expectation maximization
- Machine translation -- language modeling
- Evaluation techniques
- Neural networks, deep belief networks, belief propagation

How about Data?

Text Mining
- Data mining: finding nontrivial patterns in corpora/databases that may be previously unknown and could be useful
- Text mining: find interesting patterns/information in unstructured text; discover new knowledge from these patterns/information
- Information extraction, summarization, opinion analysis, etc. can be thought of as some form of text mining
- Let us look at an example

Patterns in Unstructured Text
- Patterns may exist in unstructured text (e.g., a review of a camera on Amazon)
- Some of these patterns could be exploited to discover knowledge
- Not all Amazon reviewers may rate the product; some may just write reviews, and we may have to infer the rating from the review text

Text to Knowledge
- Text: words, reviews, news stories, sentences, corpora, text databases, real-time text, books
- Many methods to use for discovering knowledge from text
- Knowledge: ratings, significance, patterns, scores, relations

Unstructured Text → Score
Facebook's "Gross National Happiness Index"
- Facebook users update their status: "... is writing a paper", "... has flu", "... is happy, yankees won!"
- Facebook updates are unstructured text
- Scientists collected all updates and analyzed them to predict a "Gross National Happiness Index"

Facebook's "Gross National Happiness Index"
How do you think they extracted this SCORE from a TEXT collection of status updates?

Facebook Blog Explains
"The result was an index that measures how happy people on Facebook are from day-to-day by looking at the number of positive and negative words they're using when updating their status. When people in their status updates use more positive words - or fewer negative words - then that day as a whole is counted as happier than usual."
Looks like they are COUNTING! +ve and -ve words in status updates.

Mood Swings During a Day Based on Twitter Data
- Tweets → score
- 509 million tweets analyzed, 2.4 million users
- "Diurnal and Seasonal Mood Vary with Work, Sleep, and Daylength Across Diverse Cultures." By Scott A. Golder and Michael W. Macy. Science, Vol. 333, September 30, 2011.

Let's Build Our NLP/ML Model to Predict Happiness
Simple happiness score:
- A simpler version of the happiness index compared to Facebook's
- Score ranges from 0 to 10
There are a few things we need to consider:
- We are using status-update words
- We do not know which words are positive and negative
- We do not have any training data

Our Prediction Problem
Training data:
- Assume we have N = 100,000 status updates
- Assume we have a simple list of positive and negative words
- Let us also assume we asked a human annotator to read each of the 100,000 status updates and give a happiness score (Yi) between 0 and 10:
  "... is writing a paper" (Y1 = 4)
  "... has flu" (Y2 = 1.8)
  ...
  "... is happy, game was good!" (Y100,000 = 8.9)
Test data:
- "... likes the weather" (Y100,001 = ?)
Given a labeled set of 100K status updates, how do we build a statistical/ML model that will predict the score for a new status update?

Representing Text of Status Updates as a Vector
- What kind of feature can we come up with that would relate well to the happiness score?
- How about representing a status update as: count of +ve words in the sentence (not the ideal representation; we will see better representations later)
- For the 100,000th sentence in our previous example, "... is happy, game was good.", the count is 2
- The 100,000th status update is represented by (X100,000 = 2, Y100,000 = 8.9)
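The +ve-word count feature above can be sketched in a few lines of Python. The positive-word list here is a tiny hypothetical stand-in for the course's assumed lexicon, not an actual resource from the class:

```python
# Hypothetical positive-word list; in practice this would come from a
# sentiment lexicon, which the slides assume we are given.
POSITIVE_WORDS = {"happy", "good", "great", "love", "won"}

def positive_count(status_update):
    """Count how many tokens of a status update are +ve words."""
    # Crude tokenization: strip common punctuation, lowercase, split on spaces.
    cleaned = status_update.lower()
    for ch in ",.!?":
        cleaned = cleaned.replace(ch, " ")
    return sum(1 for token in cleaned.split() if token in POSITIVE_WORDS)

# "is happy, game was good." contains "happy" and "good" -> feature value 2
```

With this, each status update i becomes a pair (Xi, Yi) ready for regression.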

Modeling Technique
- We want to predict the happiness score (Yi) for a new status update
- If we can model our training data with a statistical/ML model, we can do such prediction
- (Xi, Yi): (1, 4), (0, 1.8), ..., (2, 8.9)
- What modeling technique can we use? Linear regression is one choice

Linear Regression
- We want to find a function that, given our x, would map it to y
- One such function: f(x) = θ0 + θ1 x
- Different values of the thetas give different functions
- What is the best theta such that we have a function that makes the least error on predictions when compared with y?

Predicted vs. True

Sum of Squared Errors
Plugging in f(x) and averaging the error across all N training data points, we get the empirical loss:
L(θ) = (1/N) Σi (yi − f(xi))² = (1/N) Σi (yi − θ0 − θ1 xi)²

Finding the Minimum
- We can (but not always) find a minimum of a function by setting the derivative or partial derivatives to zero
- Here we can take partials with respect to the thetas and set them to zero: ∂L/∂θ0 = 0, ∂L/∂θ1 = 0

Solving for Weights

Empirical Loss is Minimized With Given Values for the Parameters
Solving the previous equations, we get the following values for the thetas:
θ1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²
θ0 = ȳ − θ1 x̄
where x̄ and ȳ are the means of the xi and yi.
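The closed-form thetas above are exactly the "for loop over numerators and denominators" the slides describe. A minimal sketch in plain Python (no libraries assumed):

```python
def fit_simple_regression(xs, ys):
    """Fit f(x) = theta0 + theta1*x by the closed-form least-squares solution:
    theta1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2), theta0 = ybar - theta1*xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    numerator = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    denominator = sum((x - xbar) ** 2 for x in xs)
    theta1 = numerator / denominator
    theta0 = ybar - theta1 * xbar
    return theta0, theta1

def predict(theta0, theta1, x):
    """Plug a new feature value into the fitted line."""
    return theta0 + theta1 * x
```

For example, on perfectly linear data xs = [0, 1, 2], ys = [1, 3, 5], the fit recovers θ0 = 1 and θ1 = 2.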

Equations to Implementation
- Given our training data of status updates with happiness scores: (Xi, Yi) = (1, 4), (0, 1.8), ..., (2, 8.9)
- Training our regression model: just implement a for loop that computes the numerators and denominators in the equations, and we get the optimal thetas
- For prediction/testing: given the optimal thetas, plug the x value into our equation to get y

Simple Happiness Scoring Model too Simple?
- So far we have a regression model trained on Facebook status updates (text) labeled with happiness scores
- Status-update words were mapped to one feature: the count of +ve words
- Maybe too simple? How can we improve the model?
- Can we add more features? How about the count of -ve words as well?

Let Us Add One More Feature
- Adding one more feature Zi representing the count of -ve words, the training data will look like: (Xi, Zi, Yi) = (1, 3, 4), (0, 6, 1.8), ..., (2, 0, 8.9)
- What would our linear regression function look like? f(x, z) = θ0 + θ1 x + θ2 z [3]
- The estimate of y, i.e. f(x, z), is now a plane instead of a line

Regression Function in Matrix Form
- Remember, our regression function in 2D looked like f(x) = θ0 + θ1 x
- Representing it in matrix form (with x augmented with a leading 1) we get f(x) = θᵀx
- And the empirical loss will be L(θ) = (1/N)(Y − Xθ)ᵀ(Y − Xθ)

Adding Features
- In K dimensions, the regression function f(x) we estimate will look like f(x) = θ0 + Σj θj xj
- So the empirical loss would be L(θ) = (1/N) Σi (yi − θ0 − Σj θj xij)²
- Representing with matrices: L(θ) = (1/N)(Y − Xθ)ᵀ(Y − Xθ)

Empirical Loss with K Features and N Data Points in Matrix Representation
Representing the empirical loss in matrix form: L(θ) = (1/N)(Y − Xθ)ᵀ(Y − Xθ), where Y is N×1, X is N×(K+1), and θ is (K+1)×1.

Solve by Setting Partial Derivatives to Zero
- Remember, to find the minimum empirical loss we set the partial derivatives to zero
- We can still do the same in matrix form; setting the derivative to zero gives Xᵀ(Y − Xθ) = 0
- Solving the above equation, we get our best set of parameters: θ = (XᵀX)⁻¹XᵀY
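The matrix solution θ = (XᵀX)⁻¹XᵀY can be implemented with your own matrix code, matching the slide's "write your own matrix multiplication" option. A sketch in plain Python: build the normal equations XᵀXθ = XᵀY and solve them with Gaussian elimination:

```python
def fit_linear_regression(features, ys):
    """Fit theta = (X^T X)^{-1} X^T Y for rows of features and targets ys."""
    # Augment each feature row with a leading 1 for the intercept theta0.
    X = [[1.0] + list(row) for row in features]
    k = len(X[0])
    # Normal equations: (X^T X) theta = X^T Y.
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    XtY = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(k)]
    # Solve the k x k system by Gaussian elimination with partial pivoting.
    A = [row[:] + [b] for row, b in zip(XtX, XtY)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    theta = [0.0] * k
    for r in range(k - 1, -1, -1):
        theta[r] = (A[r][k] - sum(A[r][c] * theta[c] for c in range(r + 1, k))) / A[r][r]
    return theta  # [theta0, theta1, ..., thetaK]
```

In practice you would use MATLAB, as the slides suggest, or a library such as NumPy; the hand-rolled version is only meant to show that the equation maps directly to a few loops.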

Equations to Implementation
- Given our N training data points, we can build the X and Y matrices and perform the matrix operations
- Can use MATLAB, or write your own matrix multiplication implementation
- Get the theta matrix
- For any new test data, plug the x values (features) into our regression function with the best theta values we have

Back to Our Happiness Prediction Regression Model
- Xi1 represented the count of +ve words; the (Xi1, Yi) pairs were used to build a simple linear regression model
- We added one more feature Xi2, representing the count of -ve words; (Xi1, Xi2, Yi) can be used to build a multiple linear regression model
- Our training data would look like (Xi1, Xi2, Yi) = (1, 3, 4), (0, 6, 1.8), ..., (2, 0, 8.9)
- From this we can build the X and Y matrices and find the best theta values
- For N data points, we get an N×3 X matrix, an N×1 Y matrix, and a 3×1 θ matrix

More Features? Feature Engineering
- So far we have only two features; are they good enough? Should we add more features?
- What kinds of features can we add? Ratio of +ve/-ve words; normalized count of +ve words; is there a verb in the sentence?
- We need to think about what kinds of information may better estimate the Y values
- If we add the above 3 features, what is the value of K?

Polynomial Regression
For a single feature x, an order-M polynomial regression function is f(x) = θ0 + θ1 x + θ2 x² + ... + θM x^M. Note that it is still linear in the parameters θ, so the same least-squares machinery applies.
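Because a polynomial model is still linear in θ, polynomial regression reduces to ordinary linear regression on expanded features: map each scalar x to (x, x², ..., x^M) and fit the usual linear model on those columns. A minimal sketch of the expansion:

```python
def polynomial_features(x, order):
    """Map a scalar x to the feature row [x, x**2, ..., x**order]."""
    return [x ** m for m in range(1, order + 1)]

# Each scalar xi becomes a feature row; the model stays linear in theta.
rows = [polynomial_features(x, 3) for x in [1.0, 2.0]]
# rows == [[1.0, 1.0, 1.0], [2.0, 4.0, 8.0]]
```

Any multiple-regression fitter can then be run unchanged on these rows.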

K Features, M-Order Polynomial and N Data Points
- With K = 1 we get a regression line; with K = 2 we get a plane
- With M = 1 we get a straight line or plane
- With M = 2 we get a curved line or plane
- So what do we get with K = 2 and M = 2?

Trend Surface
Trend surfaces for different orders of polynomial [1]

Overfitting
- Higher orders of polynomial should be used with caution, though
- A higher-order polynomial can fit the training data too closely, especially with few training points, leaving the generalization error high
- Leave-one-out cross validation allows us to estimate generalization error better: with N data points, use N-1 data points to train and 1 to test
- Figure: higher-order polynomial overfitting with few data points [2]

Testing Our Model
- Our goal was to build the best statistical model that would automate the process of scoring a chunk of text (happiness score)
- How can we tell how good our model is?
- Remember, previously we assumed we have 100,000 status updates
- Instead of using all 100K sentences, let us use the first 90K to build the model
- Use the remaining 10K to test the model

10-fold Cross Validation
- We trained on the first 90K (1 to 90,000) and tested on 90,001 to 100,000
- But we can do this 10 times if we select a different 10K of test data points each time
- 10 experiments: build the model and test it 10 times with 10 different sets of training and test data
- Average the accuracy across the 10 experiments
- We can do any N-fold cross validation to test our model
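The 10-fold split described above can be sketched as: partition the N examples into 10 contiguous chunks, and in experiment i hold out chunk i for testing while training on the rest.

```python
def k_fold_splits(n_examples, k=10):
    """Return k (train_indices, test_indices) pairs covering all examples.
    Assumes n_examples is divisible by k for simplicity."""
    indices = list(range(n_examples))
    fold_size = n_examples // k
    splits = []
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        splits.append((train, test))
    return splits

# For N = 100,000 and k = 10: experiment 0 trains on 90,000 examples and
# tests on examples 0..9,999; experiment 1 tests on 10,000..19,999; and so on.
```

The model is then trained and evaluated once per split, and the 10 accuracies are averaged.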

Scores from Text: What Else Can They Represent?
- Given a Facebook status update, we can predict a happiness score
- But we can use the same modeling technique in many other problems:
  Summarization: score may represent importance
  Question answering: score may represent relevance
  Information extraction: score may represent a relation
- We need to engineer features according to the problem
- There are many uses of the statistical technique we learned today

Reviews to Automatic Ratings
TRAIN: features (X) and scores (Y) are used to fit a statistical model
PREDICT: features (X) of a new review go into the model, which outputs a rating

Tweets to Mood Score
"Diurnal and Seasonal Mood Vary with Work, Sleep, and Daylength Across Diverse Cultures." By Scott A. Golder and Michael W. Macy. Science, Vol. 333, September 30, 2011.
Can you now implement a regression model to predict a mood score for new tweets?

Interesting Research on Twitter Data
- Predict elections: Tumasjan et al., "Predicting elections with Twitter: What 140 characters reveal about political sentiment," AAAI 2010
- Understand mood variations: Golder & Macy, "Diurnal and Seasonal Mood Vary with Work, Sleep, and Daylength Across Diverse Cultures," Science, 30 September 2011, Vol. 333, no. 6051, pp. 1878-1881
- Find influential people: Weng et al., "TwitterRank: finding topic-sensitive influential twitterers," WSDM 2010
- Usage in disease outbreaks: Chew et al., "Pandemics in the age of Twitter: content analysis of Tweets during the 2009 H1N1 outbreak," PLoS One, 5(11), 2010
- Try predicting the stock market: Bollen et al., "Twitter Mood Predicts the Stock Market," Journal of Computational Science, Mar 2011

What Else Can We Do?
- Sentiment analysis, information extraction, question answering, search, text mining, story tracking, summary generation, event detection, entity extraction ... many more
- Think about the final project!

Finding a Partner for the Final Project
- You can do the project alone!
- You can team up with 1 more person; some students prefer this
- A 2-person team needs to present a project that requires a 2-person effort
- Go to Courseworks and start posting what you are interested in to find a partner

Example Final Project
Sentiment on various topics: automatically detect the sentiment of a person on a range of topics, and update the model whenever new information is provided

Readings
- Bishop book: 1.1, 3.1, 4.1
- Jurafsky & Martin book: 23.1.1, 23.1.2, 23.1.3

References
[1] al regression.pdf
[2] Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006
[3] Hastie, Tibshirani and Friedman, Elements of Statistical Learning, 2001

