Predictive Analytics II: Text, Web, and Social Media Analytics


Tamkang University
Practices of Business Intelligence
Predictive Analytics II: Text, Web, and Social Media Analytics
1071BI07MI4 (M2084) (2888) Wed, 7, 8 (14:10-16:00) (B217)
Min-Yuh Day, Assistant Professor
Dept. of Information Management, Tamkang University
http://mail.tku.edu.tw/myday/
2018-10-31

Syllabus
Week  Date        Subject/Topics
1     2018/09/12  Course Orientation for Practices of Business Intelligence
2     2018/09/19  Business Intelligence, Analytics, and Data Science
3     2018/09/26  ABC: AI, Big Data, and Cloud Computing
4     2018/10/03  Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization
5     2018/10/10  National Day (Day off)
6     2018/10/17  Descriptive Analytics II: Business Intelligence and Data Warehousing
7     2018/10/24  Predictive Analytics I: Data Mining Process, Methods, and Algorithms
8     2018/10/31  Predictive Analytics II: Text, Web, and Social Media Analytics
9     2018/11/07  Midterm Project Report
10    2018/11/14  Midterm Exam
11    2018/11/21  Prescriptive Analytics: Optimization and Simulation
12    2018/11/28  Social Network Analysis
13    2018/12/05  Machine Learning and Deep Learning
14    2018/12/12  Natural Language Processing
15    2018/12/19  AI Chatbots and Conversational Commerce
16    2018/12/26  Future Trends, Privacy and Managerial Considerations in Analytics
17    2019/01/02  Final Project Presentation
18    2019/01/09  Final Exam

Business Intelligence (BI)
1. Introduction to BI and Data Science
2. Descriptive Analytics
3. Predictive Analytics
4. Prescriptive Analytics
5. Big Data Analytics
6. Future Trends

Predictive Analytics II: Text, Web, and Social Media Analytics

Outline
Text Analytics and Text Mining Overview
– Natural Language Processing (NLP)
– Text Mining Applications
– Text Mining Process
– Sentiment Analysis
Web Mining Overview
– Search Engines
– Web Usage Mining (Web Analytics)
Social Analytics

A High-Level Depiction of DeepQA Architecture
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson

Text Analytics and Text Mining
Source: Sharda, Delen, and Turban (2017)

Text Analytics
Text Analytics = Information Retrieval + Information Extraction + Data Mining + Web Mining
Text Analytics = Information Retrieval + Text Mining
Source: Sharda, Delen, and Turban (2017)

Text Mining
Text mining is also known as text data mining or knowledge discovery in textual databases.
Source: Sharda, Delen, and Turban (2017)

Application Areas of Text Mining
– Information extraction
– Topic pt linking
– Question answering
Source: Sharda, Delen, and Turban (2017)

Natural Language Processing (NLP)
Natural language processing (NLP) is an important component of text mining and is a subfield of artificial intelligence and computational linguistics.
Source: Sharda, Delen, and Turban (2017)

Natural Language Processing (NLP)
– Part-of-speech tagging
– Text segmentation
– Word sense disambiguation
– Syntactic ambiguity
– Imperfect or irregular input
– Speech acts
Source: Sharda, Delen, and Turban (2017)

NLP Tasks
– Question answering
– Automatic summarization
– Natural language generation
– Natural language understanding
– Machine translation
– Foreign language reading
– Foreign language writing
– Speech recognition
– Text-to-speech
– Text proofing
– Optical character recognition
Source: Sharda, Delen, and Turban (2017)

Text-Based Deception-Detection Process
Source: Sharda, Delen, and Turban (2017)

Multilevel Analysis of Text for Gene/Protein Interaction Identification
Source: Sharda, Delen, and Turban (2017)

Context Diagram for the Text Mining Process
Source: Sharda, Delen, and Turban (2017)

The Three-Step/Task Text Mining Process
Source: Sharda, Delen, and Turban (2017)

Term–Document Matrix
Source: Sharda, Delen, and Turban (2017)
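A term-document matrix can be sketched in a few lines of plain Python. The toy corpus below is hypothetical (it is not taken from the slide or the textbook), and the layout with one row per document and one column per term is an assumption; some tools use the transposed layout.

```python
from collections import Counter

# Hypothetical toy corpus, for illustration only.
documents = {
    "doc1": "investment risk and project management",
    "doc2": "software engineering project",
    "doc3": "risk management in software development",
}

def build_term_document_matrix(docs):
    """Return (terms, doc_ids, matrix); matrix[i][j] is the number of
    times terms[j] occurs in the document doc_ids[i]."""
    counts = {doc_id: Counter(text.lower().split())
              for doc_id, text in docs.items()}
    # Vocabulary: every distinct term seen in any document, sorted.
    terms = sorted(set().union(*counts.values()))
    doc_ids = list(docs)
    matrix = [[counts[d][t] for t in terms] for d in doc_ids]
    return terms, doc_ids, matrix

terms, doc_ids, matrix = build_term_document_matrix(documents)
for doc_id, row in zip(doc_ids, matrix):
    print(doc_id, row)
```

Real text mining pipelines usually add stop-word removal, stemming, and a weighting scheme on top of these raw counts.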

A Multistep Process to Sentiment Analysis
Source: Sharda, Delen, and Turban (2017)

Sentiment Analysis
[Figure: tasks and approaches of sentiment analysis. Legible labels: Spam Detection; Multi- & Cross-Lingual SC; Cross-domain SC; Approaches: Machine Learning based, Lexicon based, Hybrid approaches; Lexicon Creation]
Source: Kumar Ravi and Vadlamani Ravi (2015), "A survey on opinion mining and sentiment analysis: tasks, approaches and applications," Knowledge-Based Systems, 89, pp. 14-46.

Sentiment Classification
[Figure: sentiment classification techniques. Legible labels: Supervised Learning; Linear Classifiers; Support Vector Machine (SVM); Deep Learning (DL); Naïve Bayes (NB); Bayesian Network (BN); Maximum Entropy (ME); Lexicon-based Approach; Corpus-based Approach; Statistical; Semantic]
Source: Jesus Serrano-Guerrero, Jose A. Olivas, Francisco P. Romero, and Enrique Herrera-Viedma (2015), "Sentiment analysis: A review and comparative analysis of web services," Information Sciences, 311, pp. 18-38.

Example of Opinion: review segment on iPhone
"I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. However, my mother was mad with me as I did not tell her before I bought it. She also thought the phone was too expensive, and wanted me to return it to the shop."
Source: Bing Liu (2011), "Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data," Springer, 2nd Edition

Example of Opinion: review segment on iPhone
(1) I bought an iPhone a few days ago.
(2) It was such a nice phone.
(3) The touch screen was really cool.
(4) The voice quality was clear too.
(5) However, my mother was mad with me as I did not tell her before I bought it.
(6) She also thought the phone was too expensive, and wanted me to return it to the shop.
Positive opinion (+): sentences (2) to (4)
Negative opinion (-): sentences (5) to (6)
Source: Bing Liu (2011), "Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data," Springer, 2nd Edition
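The sentence-level labels in this review can be reproduced with a very small lexicon-based classifier. This is only a sketch: the two word lists are hypothetical and hand-picked for this one review, whereas a real lexicon-based approach would rely on a curated sentiment lexicon and would handle negation and context.

```python
import re

# Hypothetical mini-lexicons chosen for this example only.
POSITIVE = {"nice", "cool", "clear"}
NEGATIVE = {"mad", "expensive"}

def sentence_polarity(sentence):
    """Label a sentence by counting positive and negative lexicon hits."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

review = [
    "I bought an iPhone a few days ago.",
    "It was such a nice phone.",
    "The touch screen was really cool.",
    "The voice quality was clear too.",
    "However, my mother was mad with me as I did not tell her before I bought it.",
    "She also thought the phone was too expensive, and wanted me to return it to the shop.",
]

labels = [sentence_polarity(s) for s in review]
for number, (sentence, label) in enumerate(zip(review, labels), start=1):
    print(number, label, sentence)
```

Note that the negative sentences express the mother's opinion, not the author's; word counting alone cannot identify the opinion holder.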

P–N Polarity and S–O Polarity Relationship
Source: Sharda, Delen, and Turban (2017)

Taxonomy of Web Mining
Source: Sharda, Delen, and Turban (2017)

Structure of a Typical Internet Search Engine
Source: Sharda, Delen, and Turban (2017)

Web Usage Mining (Web Analytics)
Web usage mining (Web analytics) is the extraction of useful information from data generated through Web page visits and transactions.
– Clickstream Analysis
Source: Sharda, Delen, and Turban (2017)
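As a rough illustration of clickstream analysis, the sketch below groups a click log by session and counts page views and page-to-page transitions. The log format, the session IDs, and the page paths are all hypothetical; real Web server logs need parsing and sessionization first.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (session_id, timestamp_in_seconds, page).
clicks = [
    ("s1", 0, "/home"), ("s1", 12, "/products"), ("s1", 40, "/cart"),
    ("s2", 3, "/home"), ("s2", 9, "/products"),
    ("s3", 5, "/home"), ("s3", 30, "/about"),
]

def page_transitions(click_log):
    """Count how often one page is followed by another within a session."""
    sessions = defaultdict(list)
    # Order clicks by session and then by time before grouping.
    for session_id, _, page in sorted(click_log, key=lambda c: (c[0], c[1])):
        sessions[session_id].append(page)
    transitions = Counter()
    for pages in sessions.values():
        transitions.update(zip(pages, pages[1:]))
    return transitions

page_views = Counter(page for _, _, page in clicks)
transitions = page_transitions(clicks)

print(page_views.most_common())
print(transitions.most_common())
```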

Extraction of Knowledge from Web Usage Data
Source: Sharda, Delen, and Turban (2017)

Web Analytics Dashboard
Source: Sharda, Delen, and Turban (2017)

Social Analytics
Social analytics is defined as monitoring, analyzing, measuring and interpreting digital interactions and relationships of people, topics, ideas and content.
Source: Sharda, Delen, and Turban (2017)

Branches of Social Analytics
– Social Network Analysis (SNA)
– Social Media Analytics
Source: Sharda, Delen, and Turban (2017)

Evolution of Social Media User Engagement
Source: Sharda, Delen, and Turban (2017)

Python in Google DnGvwfUbeo4zJ1zTunjMqf2RkCrT35

Text Classification
Source: des/text-classification/

Text Classification Workflow
Step 1: Gather Data
Step 2: Explore Your Data
Step 2.5: Choose a Model*
Step 3: Prepare Your Data
Step 4: Build, Train, and Evaluate Your Model
Step 5: Tune Hyperparameters
Step 6: Deploy Your Model
Source: des/text-classification/

Text Classification Flowchart
Source: des/text-classification/step-2-5

Text Classification S/W < 1500: N-gram
Source: des/text-classification/step-2-5

Text Classification S/W >= 1500: Sequence
Source: des/text-classification/step-2-5

Step 2.5: Choose a Model
Samples/Words < 1500
150,000/100 = 1500
IMDb review dataset: the samples/words-per-sample ratio is ~144
Source: des/text-classification/step-2-5

Step 2.5: Choose a Model
Samples/Words < 15,000
1,500,000/100 = 15,000
Source: des/text-classification/step-2-5
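The S/W rule of thumb can be written as a small helper. This sketch assumes the rule reads "ratio below 1500: n-gram model, otherwise: sequence model"; the function names are made up for illustration, and the source guide may measure words per sample differently (for example, as a median over the dataset).

```python
def samples_to_words_ratio(num_samples, words_per_sample):
    """S/W: number of samples divided by the number of words per sample."""
    return num_samples / words_per_sample

def choose_model_family(ratio, threshold=1500):
    """Pick a model family from the S/W ratio (assumed rule of thumb)."""
    return "n-gram" if ratio < threshold else "sequence"

# Arithmetic from the slide: 150,000 samples of 100 words each.
slide_ratio = samples_to_words_ratio(150_000, 100)
print(slide_ratio, choose_model_family(slide_ratio))

# The IMDb ratio quoted on the slide is about 144.
print(144, choose_model_family(144))
```

With a ratio of about 144, the IMDb review dataset falls on the n-gram side of the threshold.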

Step 3: Prepare Your Data
Texts:
T1: 'The mouse ran up the clock'
T2: 'The mouse ran down'
Token Index:
{'the': 1, 'mouse': 2, 'ran': 3, 'up': 4, 'clock': 5, 'down': 6}
NOTE: 'the' occurs most frequently, so the index value of 1 is assigned to it. Some libraries reserve index 0 for unknown tokens, as is the case here.
Sequence of token indexes:
T1: 'The mouse ran up the clock' = [1, 2, 3, 4, 1, 5]
T2: 'The mouse ran down' = [1, 2, 3, 6]
Source: des/text-classification/step-3

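One-hot encoding
'The mouse ran up the clock' = [1, 2, 3, 4, 1, 5]
Index:    [0, 1, 2, 3, 4, 5, 6]
The   1: [0, 1, 0, 0, 0, 0, 0]
mouse 2: [0, 0, 1, 0, 0, 0, 0]
ran   3: [0, 0, 0, 1, 0, 0, 0]
up    4: [0, 0, 0, 0, 1, 0, 0]
the   1: [0, 1, 0, 0, 0, 0, 0]
clock 5: [0, 0, 0, 0, 0, 1, 0]
Source: des/text-classification/step-3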
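A minimal sketch of this encoding in plain Python, reusing the token index from the "Prepare Your Data" slide. The function name is made up for illustration, and index 0 is assumed to be reserved, so each vector has one more slot than the vocabulary has words.

```python
# Token index as given on the "Step 3: Prepare Your Data" slide.
token_index = {'the': 1, 'mouse': 2, 'ran': 3, 'up': 4, 'clock': 5, 'down': 6}

def one_hot_encode(text, token_index):
    """Return one vector per token, with a single 1 at the token's index."""
    vocab_size = len(token_index) + 1  # slot 0 is reserved, never set here
    rows = []
    for word in text.lower().split():
        row = [0] * vocab_size
        row[token_index[word]] = 1
        rows.append(row)
    return rows

matrix = one_hot_encode('The mouse ran up the clock', token_index)
for row in matrix:
    print(row)
```

Each row is as long as the vocabulary, which is why one-hot vectors become impractical for large vocabularies and word embeddings are used instead.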

Word embeddings
Source: des/text-classification/step-3

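t1 = 'The mouse ran up the clock'
t2 = 'The mouse ran down'
# lowercase each text and split it into tokens
s1 = t1.lower().split(' ')
s2 = t2.lower().split(' ')
terms = s1 + s2
# unique terms in alphabetical order
sortedset = sorted(set(terms))
print('terms', terms)
print('sortedset', sortedset)
1FEG6DnGvwfUbeo4zJ1zTunjMqf2RkCrT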

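t1 = 'The mouse ran up the clock'
t2 = 'The mouse ran down'
s1 = t1.lower().split(' ')
s2 = t2.lower().split(' ')
terms = s1 + s2
print(terms)
# count how often each term occurs
tfdict = {}
for term in terms:
    if term not in tfdict:
        tfdict[term] = 1
    else:
        tfdict[term] += 1
a = []
for k, v in tfdict.items():
    # the original line is cut off after '{}, ; the format call is an assumed completion
    a.append('{}, {}'.format(k, v))
gle.com/drive/1FEG6DnGvwfUbeo4zJ1zTunjMqf2RkCrT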

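# sort terms by frequency, most frequent first (tfdict comes from the previous slide)
sorted_by_value_reverse = sorted(tfdict.items(), key=lambda kv: kv[1], reverse=True)
sorted_by_value_reverse_dict = dict(sorted_by_value_reverse)
# map integer ids to words and back
id2word = {id: word for id, word in enumerate(sorted_by_value_reverse_dict)}
word2id = dict([(v, k) for (k, v) in id2word.items()])
om/drive/1FEG6DnGvwfUbeo4zJ1zTunjMqf2RkCrT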

# tfdict and a come from the term-frequency slide above
sorted_by_value = sorted(tfdict.items(), key=lambda kv: kv[1])
print('sorted_by_value: ', sorted_by_value)
sorted_by_value2 = sorted(tfdict, key=tfdict.get, reverse=True)
print('sorted_by_value2: ', sorted_by_value2)
sorted_by_value_reverse = sorted(tfdict.items(), key=lambda kv: kv[1], reverse=True)
print('sorted_by_value_reverse: ', sorted_by_value_reverse)
sorted_by_value_reverse_dict = dict(sorted_by_value_reverse)
print('sorted_by_value_reverse_dict', sorted_by_value_reverse_dict)
id2word = {id: word for id, word in enumerate(sorted_by_value_reverse_dict)}
print('id2word', id2word)
word2id = dict([(v, k) for (k, v) in id2word.items()])
print('word2id', word2id)
print('len_words:', len(word2id))
sorted_by_key = sorted(tfdict.items(), key=lambda kv: kv[0])
print('sorted_by_key: ', sorted_by_key)
tfstring = '\n'.join(a)
print(tfstring)

from keras.preprocessing.text import Tokenizer
Source: ta-deep-learning-keras/

from keras.preprocessing.text import Tokenizer
# define 5 documents
docs = ['Well done!', 'Good work', 'Great effort', 'nice work', 'Excellent!']
# create the tokenizer
t = Tokenizer()
# fit the tokenizer on the documents
t.fit_on_texts(docs)
print('docs:', docs)
print('word_counts:', t.word_counts)
print('document_count:', t.document_count)
print('word_index:', t.word_index)
print('word_docs:', t.word_docs)
# encode the documents as a document-by-word count matrix
texts_to_matrix = t.texts_to_matrix(docs, mode='count')
print('texts_to_matrix:')
print(texts_to_matrix)
Source: ta-deep-learning-keras/

texts_to_matrix = t.texts_to_matrix(docs, mode='count')
docs: ['Well done!', 'Good work', 'Great effort', 'nice work', 'Excellent!']
word_counts: OrderedDict([('well', 1), ('done', 1), ('good', 1), ('work', 2), ('great', 1), ('effort', 1), ('nice', 1), ('excellent', 1)])
document_count: 5
word_index: {'work': 1, 'well': 2, 'done': 3, 'good': 4, 'great': 5, 'effort': 6, 'nice': 7, 'excellent': 8}
word_docs: {'done': 1, 'well': 1, 'work': 2, 'good': 1, 'great': 1, 'effort': 1, 'nice': 1, 'excellent': 1}
texts_to_matrix:
[[0. 0. 1. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 1. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1.]]
Source: ta-deep-learning-keras/

t.texts_to_matrix(docs, mode='tfidf')
from keras.preprocessing.text import Tokenizer
# define 5 documents
docs = ['Well done!', 'Good work', 'Great effort', 'nice work', 'Excellent!']
# create the tokenizer
t = Tokenizer()
# fit the tokenizer on the documents
t.fit_on_texts(docs)
print('docs:', docs)
print('word_counts:', t.word_counts)
print('document_count:', t.document_count)
print('word_index:', t.word_index)
print('word_docs:', t.word_docs)
# encode the documents as a document-by-word TF-IDF matrix
texts_to_matrix = t.texts_to_matrix(docs, mode='tfidf')
print('texts_to_matrix:')
print(texts_to_matrix)
texts_to_matrix:
[[0. 0.         1.25276297 1.25276297 0.         0.         0.         0.         0.        ]
 [0. 0.98082925 0.         0.         1.25276297 0.         0.         0.         0.        ]
 [0. 0.         0.         0.         0.         1.25276297 1.25276297 0.         0.        ]
 [0. 0.98082925 0.         0.         0.         0.         0.         1.25276297 0.        ]
 [0. 0.         0.         0.         0.         0.         0.         0.         1.25276297]]
Source: ta-deep-learning-keras/
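The two distinct values in the TF-IDF matrix can be reproduced by hand. The sketch below assumes the weighting is tf = 1 + ln(count) and idf = ln(1 + N / (1 + df)), which is how the Keras Tokenizer is understood to compute mode 'tfidf'; treat the formula as an assumption that happens to match the printed output, not as a statement of the library's documented contract. The function name is hypothetical.

```python
import math

def keras_style_tfidf(term_count, num_docs, doc_freq):
    """TF-IDF weight under the assumed formula:
    tf = 1 + ln(term_count), idf = ln(1 + num_docs / (1 + doc_freq))."""
    tf = 1 + math.log(term_count)
    idf = math.log(1 + num_docs / (1 + doc_freq))
    return tf * idf

# 5 documents; 'work' appears in 2 of them, every other word in 1.
rare_word = keras_style_tfidf(term_count=1, num_docs=5, doc_freq=1)
work = keras_style_tfidf(term_count=1, num_docs=5, doc_freq=2)

print(round(rare_word, 8))
print(round(work, 8))
```

Because 'work' occurs in two documents, it receives a lower weight than words that occur in only one, which is the point of the inverse document frequency term.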

Summary
Text Analytics and Text Mining Overview
– Natural Language Processing (NLP)
– Text Mining Applications
– Text Mining Process
– Sentiment Analysis
Web Mining Overview
– Search Engines
– Web Usage Mining (Web Analytics)
Social Analytics

References
Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson.
Jake VanderPlas (2016), Python Data Science Handbook: Essential Tools for Working with Data, O'Reilly Media.
