Tamkang University
Practices of Business Intelligence
Predictive Analytics II: Text, Web, and Social Media Analytics
1071BI07MI4 (M2084) (2888), Wed, 7, 8 (14:10-16:00) (B217)
Min-Yuh Day, Assistant Professor
Dept. of Information Management, Tamkang University
http://mail.tku.edu.tw/myday/
2018-10-31

Syllabus
Week  Date        Subject/Topics
1     2018/09/12  Course Orientation for Practices of Business Intelligence
2     2018/09/19  Business Intelligence, Analytics, and Data Science
3     2018/09/26  ABC: AI, Big Data, and Cloud Computing
4     2018/10/03  Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization
5     2018/10/10  National Day (Day off)
6     2018/10/17  Descriptive Analytics II: Business Intelligence and Data Warehousing
7     2018/10/24  Predictive Analytics I: Data Mining Process, Methods, and Algorithms
8     2018/10/31  Predictive Analytics II: Text, Web, and Social Media Analytics
9     2018/11/07  Midterm Project Report
10    2018/11/14  Midterm Exam
11    2018/11/21  Prescriptive Analytics: Optimization and Simulation
12    2018/11/28  Social Network Analysis
13    2018/12/05  Machine Learning and Deep Learning
14    2018/12/12  Natural Language Processing
15    2018/12/19  AI Chatbots and Conversational Commerce
16    2018/12/26  Future Trends, Privacy and Managerial Considerations in Analytics
17    2019/01/02  Final Project Presentation
18    2019/01/09  Final Exam

Business Intelligence (BI)
1. Introduction to BI and Data Science
2. Descriptive Analytics
3. Predictive Analytics
4. Prescriptive Analytics
5. Big Data Analytics
6. Future Trends

Predictive Analytics II: Text, Web, and Social Media Analytics

Outline
• Text Analytics and Text Mining Overview
  – Natural Language Processing (NLP)
  – Text Mining Applications
  – Text Mining Process
  – Sentiment Analysis
• Web Mining Overview
  – Search Engines
  – Web Usage Mining (Web Analytics)
• Social Analytics

A High-Level Depiction of DeepQA Architecture
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson

Text Analytics and Text Mining
Source: Sharda, Delen, and Turban (2017)

Text Analytics
• Text Analytics = Information Retrieval + Information Extraction + Data Mining + Web Mining
• Text Analytics = Information Retrieval + Text Mining
Source: Sharda, Delen, and Turban (2017)

Text Mining
• Text Data Mining
• Knowledge Discovery in Textual Databases
Source: Sharda, Delen, and Turban (2017)

Application Areas of Text Mining
• Information extraction
• Topic tracking
• Summarization
• Categorization
• Clustering
• Concept linking
• Question answering
Source: Sharda, Delen, and Turban (2017)

Natural Language Processing (NLP)
• Natural language processing (NLP) is an important component of text mining and is a subfield of artificial intelligence and computational linguistics.
Source: Sharda, Delen, and Turban (2017)

Natural Language Processing (NLP)
• Part-of-speech tagging
• Text segmentation
• Word sense disambiguation
• Syntactic ambiguity
• Imperfect or irregular input
• Speech acts
Source: Sharda, Delen, and Turban (2017)

NLP Tasks
• Question answering
• Automatic summarization
• Natural language generation
• Natural language understanding
• Machine translation
• Foreign language reading
• Foreign language writing
• Speech recognition
• Text-to-speech
• Text proofing
• Optical character recognition
Source: Sharda, Delen, and Turban (2017)

Text-Based Deception-Detection Process
Source: Sharda, Delen, and Turban (2017)

Multilevel Analysis of Text for Gene/Protein Interaction Identification
Source: Sharda, Delen, and Turban (2017)

Context Diagram for the Text Mining Process
Source: Sharda, Delen, and Turban (2017)

The Three-Step/Task Text Mining Process
Source: Sharda, Delen, and Turban (2017)

Term–Document Matrix
Source: Sharda, Delen, and Turban (2017)
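
As a rough illustration of the term–document matrix named on this slide, the sketch below counts how often each term occurs in each document. The three short documents and the helper name build_term_document_matrix are invented for illustration and do not come from the textbook figure. Rows are documents and columns are terms.

```python
from collections import Counter

# Hypothetical three-document corpus, invented for illustration only.
documents = {
    "doc1": "project risk management",
    "doc2": "software project development",
    "doc3": "risk of software development risk",
}


def build_term_document_matrix(docs):
    """Return (terms, matrix); matrix[doc_id][j] is the count of terms[j] in that document."""
    counts = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    # The column order is the alphabetically sorted vocabulary.
    terms = sorted(set().union(*counts.values()))
    matrix = {doc_id: [c[t] for t in terms] for doc_id, c in counts.items()}
    return terms, matrix


terms, tdm = build_term_document_matrix(documents)
print(terms)
for doc_id, row in tdm.items():
    print(doc_id, row)
```

Raw counts are the simplest cell value; the same layout also holds binary indicators or weighted frequencies.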

A Multistep Process to Sentiment Analysis
Source: Sharda, Delen, and Turban (2017)

[Figure: sentiment analysis tasks and approaches. Legible labels: Spam Detection; Multi- & Cross-Lingual SC; Cross-domain SC; Lexicon Creation; Approaches: Machine Learning based, Lexicon based, Hybrid approaches.]
Source: Kumar Ravi and Vadlamani Ravi (2015), "A survey on opinion mining and sentiment analysis: tasks, approaches and applications," Knowledge-Based Systems, 89, pp. 14-46.

Sentiment Classification
[Figure: sentiment classification techniques. Legible labels: Supervised Learning; Linear Classifiers; Support Vector Machine (SVM); Neural …; Decision …; Deep Learning (DL); Naïve Bayes (NB); Bayesian Network (BN); Maximum Entropy (ME); Lexicon-based Approach; Corpus-based Approach; Statistical; Semantic.]
Source: Jesus Serrano-Guerrero, Jose A. Olivas, Francisco P. Romero, and Enrique Herrera-Viedma (2015), "Sentiment analysis: A review and comparative analysis of web services," Information Sciences, 311, pp. 18-38.

Example of Opinion: review segment on iPhone
"I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. However, my mother was mad with me as I did not tell her before I bought it. She also thought the phone was too expensive, and wanted me to return it to the shop."
Source: Bing Liu (2011), "Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data," Springer, 2nd Edition

Example of Opinion: review segment on iPhone
(1) I bought an iPhone a few days ago.
(2) It was such a nice phone.
(3) The touch screen was really cool.
(4) The voice quality was clear too.
(5) However, my mother was mad with me as I did not tell her before I bought it.
(6) She also thought the phone was too expensive, and wanted me to return it to the shop.
Positive opinion (+): sentences (2) to (4)
Negative opinion (-): sentences (5) and (6)
Source: Bing Liu (2011), "Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data," Springer, 2nd Edition
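
One way to see how a lexicon-based approach could label the six sentences of this review is the minimal sketch below. The tiny POSITIVE and NEGATIVE word lists are hypothetical and were picked by hand for this one review. A real opinion lexicon would be far larger and would also need to handle negation and the opinion holder.

```python
import re

# Hypothetical toy lexicon, hand-picked for this single review.
POSITIVE = {"nice", "cool", "clear"}
NEGATIVE = {"mad", "expensive"}

review = [
    "I bought an iPhone a few days ago.",
    "It was such a nice phone.",
    "The touch screen was really cool.",
    "The voice quality was clear too.",
    "However, my mother was mad with me as I did not tell her before I bought it.",
    "She also thought the phone was too expensive, and wanted me to return it to the shop.",
]


def polarity(sentence):
    """Label a sentence by counting positive and negative lexicon hits."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


labels = [polarity(s) for s in review]
for number, (sentence, label) in enumerate(zip(review, labels), start=1):
    print(f"({number}) {label}: {sentence}")
```

With this lexicon, sentence (1) has no opinion words and comes out neutral, which matches the slide marking only sentences (2) to (6) as opinions.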

P–N Polarity and S–O Polarity Relationship
Source: Sharda, Delen, and Turban (2017)

Taxonomy of Web Mining
Source: Sharda, Delen, and Turban (2017)

Structure of a Typical Internet Search Engine
Source: Sharda, Delen, and Turban (2017)

Web Usage Mining (Web Analytics)
• Web usage mining (Web analytics) is the extraction of useful information from data generated through Web page visits and transactions.
• Clickstream Analysis
Source: Sharda, Delen, and Turban (2017)
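
A minimal sketch of clickstream analysis, assuming page visits arrive as (visitor, timestamp, page) records. The sample records, the 30-minute session timeout and the helper name sessionize are all assumptions made for illustration. The sketch groups each visitor's clicks into sessions and then counts page views.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream records: (visitor_id, timestamp_in_seconds, page).
clicks = [
    ("u1", 0, "/home"),
    ("u1", 40, "/products"),
    ("u1", 95, "/cart"),
    ("u2", 10, "/home"),
    ("u2", 30, "/products"),
    ("u1", 4000, "/home"),  # more than 30 minutes after u1's previous click
]

SESSION_TIMEOUT = 30 * 60  # assumed inactivity gap that ends a session


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Split each visitor's clicks into sessions separated by long gaps."""
    by_visitor = defaultdict(list)
    for visitor, ts, page in sorted(records, key=lambda r: (r[0], r[1])):
        by_visitor[visitor].append((ts, page))

    sessions = []
    for visitor, hits in by_visitor.items():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, page) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append((visitor, current))
                current = []
            current.append(page)
        sessions.append((visitor, current))
    return sessions


sessions = sessionize(clicks)
page_views = Counter(page for _, pages in sessions for page in pages)
print(sessions)
print(page_views.most_common())
```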

Extraction of Knowledge from Web Usage Data
Source: Sharda, Delen, and Turban (2017)

Web Analytics Dashboard
Source: Sharda, Delen, and Turban (2017)

Social Analytics
• Social analytics is defined as monitoring, analyzing, measuring and interpreting digital interactions and relationships of people, topics, ideas and content.
Source: Sharda, Delen, and Turban (2017)

Branches of Social Analytics
• Social Analytics
  – Social Network Analysis (SNA)
  – Social Media Analytics
Source: Sharda, Delen, and Turban (2017)

Evolution of Social Media User Engagement
Source: Sharda, Delen, and Turban (2017)

Python in Google …gle.com/drive/1FEG6DnGvwfUbeo4zJ1zTunjMqf2RkCrT

Text Classification
Source: …des/text-classification/

Text Classification Workflow
• Step 1: Gather Data
• Step 2: Explore Your Data
• Step 2.5: Choose a Model*
• Step 3: Prepare Your Data
• Step 4: Build, Train, and Evaluate Your Model
• Step 5: Tune Hyperparameters
• Step 6: Deploy Your Model
Source: …des/text-classification/

Text Classification Flowchart
Source: …des/text-classification/step-2-5

Text Classification, S/W < 1500: N-gram
Source: …des/text-classification/step-2-5

Text Classification, S/W >= 1500: Sequence
Source: …des/text-classification/step-2-5

Step 2.5: Choose a Model
• Samples/Words < 1500
• 150,000/100 = 1500
• IMDb review dataset: the samples/words-per-sample ratio is 144
Source: …des/text-classification/step-2-5

Step 2.5: Choose a Model
• Samples/Words < 15,000
• 1,500,000/100 = 15,000
Source: …des/text-classification/step-2-5
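
The ratio rule on these slides can be sketched as below. The function names are hypothetical, and taking the median as the "words per sample" figure is an assumption; the slides only give the ratio and the 1500 threshold.

```python
from statistics import median

NGRAM_THRESHOLD = 1500  # threshold quoted on the slides


def samples_to_words_ratio(num_samples, words_per_sample):
    """Number of samples divided by the number of words per sample."""
    return num_samples / words_per_sample


def ratio_from_texts(texts):
    """Same ratio computed from raw texts (assumption: median words per sample)."""
    words_per_sample = median(len(text.split()) for text in texts)
    return len(texts) / words_per_sample


def choose_model_family(ratio, threshold=NGRAM_THRESHOLD):
    """Below the threshold use an n-gram model, otherwise a sequence model."""
    return "n-gram" if ratio < threshold else "sequence"


print(samples_to_words_ratio(150_000, 100))  # the 150,000/100 example
print(choose_model_family(144))              # the IMDb ratio from the slide
print(choose_model_family(samples_to_words_ratio(1_500_000, 100)))
```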

Step 3: Prepare Your Data
Texts:
T1: 'The mouse ran up the clock'
T2: 'The mouse ran down'
Token index:
{'the': 1, 'mouse': 2, 'ran': 3, 'up': 4, 'clock': 5, 'down': 6}
NOTE: 'the' occurs most frequently, so the index value of 1 is assigned to it. Some libraries reserve index 0 for unknown tokens, as is the case here.
Sequence of token indexes:
T1: 'The mouse ran up the clock' = [1, 2, 3, 4, 1, 5]
T2: 'The mouse ran down' = [1, 2, 3, 6]
Source: …des/text-classification/step-3
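
The token index and the two sequences on this slide can be reproduced with a few lines of plain Python. This is a minimal sketch, not the guide's own code: build_token_index and encode_texts are hypothetical helper names, and whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter

texts = ["The mouse ran up the clock", "The mouse ran down"]


def build_token_index(samples):
    """Map each word to an index, most frequent word first; index 0 is left unused."""
    counts = Counter(word for text in samples for word in text.lower().split())
    # most_common() sorts by count; words with equal counts keep first-seen order.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def encode_texts(samples, token_index):
    """Replace every word by its index."""
    return [[token_index[word] for word in text.lower().split()] for text in samples]


token_index = build_token_index(texts)
sequences = encode_texts(texts, token_index)
print(token_index)
print(sequences)
```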
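
One-hot encoding
'The mouse ran up the clock' = [1, 2, 3, 4, 1, 5]
Column index: [0, 1, 2, 3, 4, 5, 6]
The   (1): [0, 1, 0, 0, 0, 0, 0]
mouse (2): [0, 0, 1, 0, 0, 0, 0]
ran   (3): [0, 0, 0, 1, 0, 0, 0]
up    (4): [0, 0, 0, 0, 1, 0, 0]
the   (1): [0, 1, 0, 0, 0, 0, 0]
clock (5): [0, 0, 0, 0, 0, 1, 0]
Source: …des/text-classification/step-3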
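
A minimal sketch of one-hot encoding for the sequence on this slide, assuming a vocabulary of 7 positions (indexes 0 to 6, with 0 unused). The helper name one_hot is hypothetical.

```python
def one_hot(sequence, vocab_size):
    """Turn each token index into a vector with a single 1 at that index."""
    rows = []
    for index in sequence:
        row = [0] * vocab_size
        row[index] = 1
        rows.append(row)
    return rows


# 'The mouse ran up the clock' as token indexes
encoded = one_hot([1, 2, 3, 4, 1, 5], vocab_size=7)
for row in encoded:
    print(row)
```

Both occurrences of 'the' map to the same vector, so this encoding carries word identity and position but no notion of similarity between words.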

Word embeddings
Source: …des/text-classification/step-3

    t1 = 'The mouse ran up the clock'
    t2 = 'The mouse ran down'
    # lowercase and split each text into words
    s1 = t1.lower().split(' ')
    s2 = t2.lower().split(' ')
    terms = s1 + s2
    sortedset = sorted(set(terms))
    print('terms =', terms)
    print('sortedset =', sortedset)
Source: …gle.com/drive/1FEG6DnGvwfUbeo4zJ1zTunjMqf2RkCrT

    t1 = 'The mouse ran up the clock'
    t2 = 'The mouse ran down'
    s1 = t1.lower().split(' ')
    s2 = t2.lower().split(' ')
    terms = s1 + s2
    print(terms)
    # count term frequencies
    tfdict = {}
    for term in terms:
        if term not in tfdict:
            tfdict[term] = 1
        else:
            tfdict[term] += 1
    a = []
    for k, v in tfdict.items():
        # the format string is cut off in the slide; '{}, {}' is assumed
        a.append('{}, {}'.format(k, v))
Source: …gle.com/drive/1FEG6DnGvwfUbeo4zJ1zTunjMqf2RkCrT

    # continues from the previous slide (uses tfdict)
    sorted_by_value_reverse = sorted(tfdict.items(), key=lambda kv: kv[1], reverse=True)
    sorted_by_value_reverse_dict = dict(sorted_by_value_reverse)
    id2word = {idx: word for idx, word in enumerate(sorted_by_value_reverse_dict)}
    word2id = dict([(v, k) for (k, v) in id2word.items()])
Source: …gle.com/drive/1FEG6DnGvwfUbeo4zJ1zTunjMqf2RkCrT

    # continues from the previous slides (uses tfdict and a)
    sorted_by_value = sorted(tfdict.items(), key=lambda kv: kv[1])
    print('sorted_by_value: ', sorted_by_value)
    sorted_by_value2 = sorted(tfdict, key=tfdict.get, reverse=True)
    print('sorted_by_value2: ', sorted_by_value2)
    sorted_by_value_reverse = sorted(tfdict.items(), key=lambda kv: kv[1], reverse=True)
    print('sorted_by_value_reverse: ', sorted_by_value_reverse)
    sorted_by_value_reverse_dict = dict(sorted_by_value_reverse)
    print('sorted_by_value_reverse_dict', sorted_by_value_reverse_dict)
    id2word = {idx: word for idx, word in enumerate(sorted_by_value_reverse_dict)}
    print('id2word', id2word)
    word2id = dict([(v, k) for (k, v) in id2word.items()])
    print('word2id', word2id)
    print('len words:', len(word2id))
    sorted_by_key = sorted(tfdict.items(), key=lambda kv: kv[0])
    print('sorted_by_key: ', sorted_by_key)
    tfstring = '\n'.join(a)
    print(tfstring)
Source: …gle.com/drive/1FEG6DnGvwfUbeo4zJ1zTunjMqf2RkCrT

from keras.preprocessing.text import Tokenizer
Source: …ta-deep-learning-keras/

from keras.preprocessing.text import Tokenizer
    # assumes a Keras release that still provides keras.preprocessing.text
    from keras.preprocessing.text import Tokenizer
    # define 5 documents
    docs = ['Well done!', 'Good work', 'Great effort', 'nice work', 'Excellent!']
    # create the tokenizer
    t = Tokenizer()
    # fit the tokenizer on the documents
    t.fit_on_texts(docs)
    print('docs:', docs)
    print('word_counts:', t.word_counts)
    print('document_count:', t.document_count)
    print('word_index:', t.word_index)
    print('word_docs:', t.word_docs)
    # encode the documents as a document-by-word count matrix
    texts_to_matrix = t.texts_to_matrix(docs, mode='count')
    print('texts_to_matrix:')
    print(texts_to_matrix)
Source: …ta-deep-learning-keras/

texts_to_matrix = t.texts_to_matrix(docs, mode='count')
    docs: ['Well done!', 'Good work', 'Great effort', 'nice work', 'Excellent!']
    word_counts: OrderedDict([('well', 1), ('done', 1), ('good', 1), ('work', 2), ('great', 1), ('effort', 1), ('nice', 1), ('excellent', 1)])
    document_count: 5
    word_index: {'work': 1, 'well': 2, 'done': 3, 'good': 4, 'great': 5, 'effort': 6, 'nice': 7, 'excellent': 8}
    word_docs: {'done': 1, 'well': 1, 'work': 2, 'good': 1, 'great': 1, 'effort': 1, 'nice': 1, 'excellent': 1}
    texts_to_matrix:
    [[0. 0. 1. 1. 0. 0. 0. 0. 0.]
     [0. 1. 0. 0. 1. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 1. 1. 0. 0.]
     [0. 1. 0. 0. 0. 0. 0. 1. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0. 1.]]
Source: …ta-deep-learning-keras/

t.texts_to_matrix(docs, mode='tfidf')
    from keras.preprocessing.text import Tokenizer
    # define 5 documents
    docs = ['Well done!', 'Good work', 'Great effort', 'nice work', 'Excellent!']
    # create the tokenizer
    t = Tokenizer()
    # fit the tokenizer on the documents
    t.fit_on_texts(docs)
    print('docs:', docs)
    print('word_counts:', t.word_counts)
    print('document_count:', t.document_count)
    print('word_index:', t.word_index)
    print('word_docs:', t.word_docs)
    # encode the documents as a document-by-word TF-IDF matrix
    texts_to_matrix = t.texts_to_matrix(docs, mode='tfidf')
    print('texts_to_matrix:')
    print(texts_to_matrix)
Output:
    texts_to_matrix:
    [[0. 0.         1.25276297 1.25276297 0.         0.         0.         0.         0.        ]
     [0. 0.98082925 0.         0.         1.25276297 0.         0.         0.         0.        ]
     [0. 0.         0.         0.         0.         1.25276297 1.25276297 0.         0.        ]
     [0. 0.98082925 0.         0.         0.         0.         0.         1.25276297 0.        ]
     [0. 0.         0.         0.         0.         0.         0.         0.         1.25276297]]
Source: …ta-deep-learning-keras/
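
The two distinct values in the 'tfidf' matrix (0.98082925 and 1.25276297) can be checked by hand. The sketch below assumes the weighting tf = 1 + ln(count) and idf = ln(1 + N / (1 + df)), which is my understanding of how this Tokenizer mode computes its weights. Treat the formula and the helper name tfidf_weight as assumptions to verify against the installed Keras version.

```python
import math


def tfidf_weight(count, doc_freq, n_docs):
    """Weight of a term that occurs `count` times (count >= 1) in one document.

    doc_freq is the number of documents containing the term,
    n_docs is the total number of documents.
    """
    tf = 1 + math.log(count)
    idf = math.log(1 + n_docs / (1 + doc_freq))
    return tf * idf


n_docs = 5
# 'work' appears in 2 of the 5 documents, once in each
work = tfidf_weight(count=1, doc_freq=2, n_docs=n_docs)
# 'well' (like every other word) appears in exactly 1 document
well = tfidf_weight(count=1, doc_freq=1, n_docs=n_docs)
print(round(work, 8), round(well, 8))
```

'work' is the only word shared by two documents, so it gets the lower weight, as in the matrix above.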

Summary
• Text Analytics and Text Mining Overview
  – Natural Language Processing (NLP)
  – Text Mining Applications
  – Text Mining Process
  – Sentiment Analysis
• Web Mining Overview
  – Search Engines
  – Web Usage Mining (Web Analytics)
• Social Analytics

References
• Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson.
• Jake VanderPlas (2016), Python Data Science Handbook: Essential Tools for Working with Data, O'Reilly Media.