Predictive Analytics With Social Media Data - INET Oxford

20 Predictive Analytics with Social Media Data

Niels Buus Lassen, Lisbeth la Cour, and Ravi Vatrapu

Chapter 20 of The SAGE Handbook of Social Media Research Methods

This chapter provides an overview of the extant literature on predictive analytics with social media data. First, we discuss the difference between predictive vs. explanatory models and the scientific purposes for and advantages of predictive models. Second, we present and discuss the foundational statistical issues in predictive modelling in general with an emphasis on social media data. Third, we present a selection of papers on predictive analytics with social media data and categorize them based on the application domain, social media platform (Facebook, Twitter, etc.), independent and dependent variables involved, and the statistical methods and techniques employed. Fourth and last, we offer some reflections on predictive analytics with social media data.

Introduction

Social media has evolved into a vital constituent of many human activities. We increasingly share several aspects of our private, interpersonal, social, and professional lives on Facebook, Twitter, Instagram, Tumblr, and many other social media platforms. The resulting social data is persistent, archived, and can be retrieved and analyzed by employing a variety of research methods as documented in this handbook (Quan-Haase & Sloan, Chapter 1, this volume). Social data analytics is not only informing, but also transforming existing practices in politics, marketing, investing, product development, entertainment, and news media. This chapter focuses on predictive analytics with social media data.
In other words, it examines how social media data has been used to predict processes and outcomes in the real world.

Recent research in the field of Computational Social Science (Cioffi-Revilla, 2013; Conte et al., 2012; Lazer et al., 2009) has shown how data resulting from the widespread adoption and use of social media channels such as Facebook and Twitter can be used to predict outcomes such as Hollywood movie revenues (Asur & Huberman, 2010), Apple iPhone sales (Lassen, Madsen, & Vatrapu, 2014), seasonal moods (Golder & Macy, 2011), and epidemic outbreaks (Chunara, Andrews, & Brownstein, 2012). Underlying assumptions for this research stream on predictive analytics with social media data (Evangelos et al., 2013) are that social media actions such as tweeting, liking, commenting and rating are proxies for a user’s or consumer’s attention to a particular object or product, and that the shared digital artefact, being persistent, can create social influence (Vatrapu et al., 2015).

Predictive Models vs. Explanatory Models

At the outset, we find that the difference between predictive and explanatory models needs to be emphasized. Predictive analytics entail the application of data mining, machine learning and statistical modelling to arrive at predictive models of future observations as well as suitable methods for ascertaining the predictive power of these models in practice (Shmueli & Koppius, 2011). Consequently, predictive analytics differ from explanatory models in that the latter aim to: (1) draw statistical inferences from validating causal hypotheses about relationships among variables of interest; and (2) assess the explanatory power of causal models underlying these relationships (Shmueli, 2010). This crucial distinction between explanatory and predictive models is best summarized by Shmueli & Koppius (2011) in the following statement: “whereas explanatory statistical models are based on underlying causal relationships between theoretical constructs, predictive models rely on associations between measurable variables” (p. 556).
For example, in political science, explanatory models have investigated the extent to which social media platforms such as Facebook can function as online public spheres (Robertson & Vatrapu, 2010; Vatrapu, Robertson, & Dissanayake, 2008) in terms of users’ interactions and sentiments (Hussain, Vatrapu, Hardt, & Jaffari, 2014; Robertson, Vatrapu, & Medina, 2010a,b). On the other hand, predictive models in political science have sought to predict election outcomes from social media data (Chung & Mustafaraj, 2011; Sang & Bos, 2012; Skoric, Poor, Achananuparp, Lim, & Jiang, 2012; Tsakalidis, Papadopoulos, Cristea, & Kompatsiaris, 2015).

Distinguishing between explanation and prediction as discrete modelling goals, Shmueli & Koppius (2011) argued that any model which strives to embrace both explanation and prediction will have to trade off explanatory power against predictive power. More specifically, Shmueli & Koppius (2011) claim that predictive analytics can advance scientific research in six scenarios: (1) generating new theory for fast-changing environments which yield rich datasets about difficult-to-hypothesize relationships and unmeasured-before concepts; (2) developing alternate measures for constructs; (3) comparing competing theories via tests of predictive accuracy; (4) augmenting contemporary explanatory models through capturing complex patterns which underlie relationships among key concepts; (5) establishing research relevance by evaluating the discrepancy between theory and practice; and (6) quantifying the predictability of measurable phenomena.

This chapter discusses predictive modelling of (big) social media data in social sciences. The focus will be entirely on what is often referred to as predictive models: models that use statistical and/or mathematical modelling to predict a phenomenon of interest.
Furthermore, the focus will be on prediction in the sense of forecasting a future outcome of the phenomenon of interest, as such predictions are the ones that have so far received most attention in the literature. To illustrate the concepts, models, methods and evaluation of results we use examples from economics and finance. The general principles are, however, easily applied to other social science fields as well, for example, marketing. The concepts and principles that this section discusses are of a general nature and are informed by Hyndman & Athanasopoulos (2014) and Chatfield (2002).

This chapter does not discuss applicable software solutions. However, it is worth mentioning that there exist quite a few software packages with more or less automatic search procedures when it comes to model specification. A few examples are SAS, SPSS and the Autometrics package of OxMetrics.

Predictive Modelling of Social Media Data

When performing predictive analysis on social media data researchers often have to make a lot of decisions along the way. Examples of the most important decisions or choices will be discussed in the sections below.

The phenomenon of interest and the type of forecasts

Quite often the focus will be on a single outcome (univariate modelling: one model equation) where the goal is to derive a prediction or forecast of, for example, sales in a company or the stock price of the company. In some cases, more than one outcome will be of interest, and then a multivariate approach, in which more than one relationship or model equation is specified, estimated, and used at the same time, is worth considering. From now on let us assume that the phenomenon of interest is sales of a company and that the social media data are among the factors that are considered as explanatory for the outcome. The discussion will then relate to the univariate case. At this stage, a decision is also necessary in relation to the data frequency. Is the predictive model supposed to be applied to forecast monthly sales, quarterly sales or sales of an even higher frequency like weekly or daily?

The data

Once the phenomenon of interest is identified, decisions concerning the data to be used have to be made. Data can be of different types: time series (e.g. sales per month or sales per day), cross sectional (e.g. individuals such as customers, for a given period in time) or longitudinal/panel (a combination of the former two such as a set of customers observed through several months). Predictive models can be relevant for all these types of data and many of the basic principles for analysis are quite similar. In the remaining parts of this section, for simplicity the focus will be on time series only.

As social media data have been growing in volume and importance during the last 10 years, in some cases the final number of observations for modelling may be rather limited, as the dependent variable may reflect accounting and book-keeping and be relatively low-frequency, like monthly or quarterly in nature. If this is the case, there may be a limit to how advanced the models can be. In other cases, daily data may be available and more complex models may be considered.

The frequency of the data is also important for model specification itself. With more high-frequency data, a researcher may discover more informative dynamic patterns compared to a case with less frequent data. Consider a case where sales of a company need to be forecasted. If the reaction time from increased activity on the Facebook page of the company to changes in sales is short (e.g. just a couple of days), then if sales are available only on a monthly basis the lag pattern between explanatory factors and outcome may be difficult to identify and use.

In many cases there will be a large set of potential explanatory factors that may be included in various tentative model specifications. Social media data may be just a part of such data and it will be important to also include other variables. The quality as well as the quantity of data is very important for building a successful predictive model.
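
As a minimal sketch of the frequency issue discussed above, the following Python snippet sums daily social media counts into monthly totals and pairs each month's sales with the activity of an earlier month. The function names, the one-month lag and all figures are hypothetical illustrations and are not taken from the chapter; the sketch also assumes that the sales periods are consecutive months.

```python
from collections import defaultdict
from datetime import date


def aggregate_monthly(daily_counts):
    """Sum daily activity counts into (year, month) totals, sorted by time."""
    totals = defaultdict(int)
    for day, count in daily_counts.items():
        totals[(day.year, day.month)] += count
    return dict(sorted(totals.items()))


def lagged_pairs(activity, sales, lag):
    """Pair each period's sales with the activity observed `lag` periods earlier.

    Assumes the keys of `sales` are consecutive periods.
    """
    periods = sorted(sales)
    pairs = []
    for i in range(lag, len(periods)):
        earlier, current = periods[i - lag], periods[i]
        if earlier in activity:
            pairs.append((activity[earlier], sales[current]))
    return pairs


# Hypothetical daily 'like' counts and monthly sales figures.
daily_likes = {
    date(2016, 1, 5): 120, date(2016, 1, 20): 80,
    date(2016, 2, 3): 150, date(2016, 2, 17): 50,
    date(2016, 3, 9): 300,
}
monthly_sales = {(2016, 1): 10.0, (2016, 2): 12.0, (2016, 3): 11.0}

monthly_likes = aggregate_monthly(daily_likes)
pairs = lagged_pairs(monthly_likes, monthly_sales, lag=1)
```

With only monthly sales, any reaction that plays out within a few days is absorbed into the same monthly total, which is one way to see why the lag pattern can be hard to identify at this frequency.
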

Social media data and pre-processing

When researchers consider using social media data for predictive purposes, at the outset the social media data will be collected at the level of the individual action (e.g. a Facebook ‘like’ or a tweet), and in order to prepare the data to enter a predictive model some pre-processing will be necessary. Often the data will need to be temporally aggregated to match the temporal aggregation level of the outcome, for example, monthly data. Also, as some of the inputs from social media are text variables, some filtering, interpretation, and classification may be necessary. An example of the latter would be the application of a supervised machine learning algorithm that classifies the posts and comments into positive, negative or neutral sentiments (Thelwall, Chapter 32, this volume).

At the current moment it is mainly the pre-processing of the social media data that is considered challenging from the computational aspects of big data analytics (Council, 2013). Once the individual actions (posts, likes, etc.) are temporally aggregated and classified, the set of potential explanatory factors is usually rather limited, and the outcome variables are of fairly low frequencies like monthly or quarterly (stock market data are actually sometimes used at a daily frequency), which means that the modelling process deviates less from more classical approaches within predictive modelling.

In search of a model equation: theory-based versus data-driven?

In very general terms a model equation will identify some relationship between the phenomenon of interest (y) and a set of explanatory factors. The relationship will never be perfect, due to unobservable factors, measurement errors or other types of errors. The general equation is:

y = f(explanatory factors) + error

where f describes some relationship between what is inside the parentheses and y.

In principle, linear, non-linear, parametric, non-parametric and semi-parametric models may be considered. In general, non-linear models will require more data points/observations than linear models as the structures they search for are more complex.

There is a range of possible starting points for the search process. At one end lies traditional econometrics, where the starting point is often an economic or behavioural theory that will guide the researcher in finding a set of potential explanatory factors. At the other end of the range, machine learning algorithms will help identify a relationship from a large set of social media data and other potential explanatory factors. The advantage of starting from a theory-based model specification is that the researcher may be more confident that the model is robust in the sense that the identified relationship is reliable at least for some period of time. Without a theory the identified structure may still work for predictions in the short run but may be less robust and in general will not add much to an understanding of the phenomenon at hand. In between purely theoretically inspired models and models based on data pattern discoveries are many models that include elements of both categories. While theoretical models are often more precise when it comes to the selection of explanatory factors for the more fundamental or long-run relationships, they may be less precise when it comes to a description of dynamics, so a combination built around a primarily theory-based long-run part may prove more useful.

To finalize the discussion of theory-based versus data-driven model selection the concept of causality is often useful. If a causal relationship exists, a change in an explanatory factor is known to imply a change in the outcome. A model that suffers from a lack of a causal relationship suffers from an endogeneity problem (a concept used in econometrics). A model that suffers from an endogeneity problem will not be useful for tests of a theory or for policy evaluations. If the only purpose of the model is forecasting, identification of a causal relationship is of less importance, as a strong association between the explanatory factors and the outcome may be sufficient. However, without causality the predictive model may be considered less robust (more risk of a model break-down) to general changes in structures and society and hence may be best at forecasting in the short run. If this is the case, some sort of monitoring on a continuous basis to identify a model break-down at an early stage is advisable.

Fitting of a predictive model

In this step the researcher will adapt the mathematical specification of the predictive model to the actual data. In the case of a linear regression model this is done by estimation using the ordinary least squares (OLS) method or maximum likelihood (ML). For non-linear models such as neural networks, a numerical optimization algorithm is used. In rare cases estimation of a model is not possible (e.g. in case of perfect multicollinearity in a linear regression model). In such a case the researcher has to re-think the model specification.

Estimation (the use of a formula or a procedure) may in itself sound simple, but already at this stage the researcher has to specify the set-up to be used for model evaluation in the following step, as the two are highly dependent.

Even though it may seem natural to use as many data points as possible for the model fitting, there are other considerations to take into account as well.
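
To make the estimation step concrete, the sketch below fits a linear regression with a single explanatory factor by ordinary least squares, using the closed-form solution. It is an illustration under simplifying assumptions (one regressor, hypothetical data, a hypothetical function name `fit_ols`), not the authors' implementation. The error raised when the regressor has no variation corresponds to the case where a constant regressor is perfectly collinear with the intercept, so the slope cannot be estimated.

```python
def fit_ols(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; the slope is not identified")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: aggregated tweet volume and sales for four periods.
tweets = [1.0, 2.0, 3.0, 4.0]
sales = [3.1, 4.9, 7.2, 8.8]
intercept, slope = fit_ols(tweets, sales)
```

In practice a model with several explanatory factors would be estimated with a statistics package rather than by hand; the closed form is used here only to show what "fitting" computes.
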
For the estimation step, it is stressed that in addition to the choice of estimation or fitting method, the decision on exactly which sample or part of the sample to use for estimation is important too.

Evaluation of a predictive model for forecasting purposes

The true test of a predictive model that is to be used for forecasting of future values of the outcome of interest is an investigation of the out-of-sample properties of the model. This calls for an estimation (or training) sample and an evaluation (or test) sample. Although a good in-sample model fit does not ensure good forecasting properties of a predictive model, the evaluation process naturally starts with an analysis of the in-sample properties of the model and extends to an out-of-sample analysis.

In-sample evaluation of the model

The first thing to note is that if the model has a theoretical foundation the signs of the estimated coefficients will be compared to the signs expected from the theory.

A second thing to be aware of is whether the model fulfils the underlying statistical assumptions (these may differ depending on the type of model in focus). In classical linear regression modelling, problems such as autocorrelation and heteroscedasticity will need attention and a study of potential outliers is of high importance. When forecasting is the final purpose of the model, multicollinearity is of less importance. Finally, indicators in relation to the functional form specification may provide useful information on how to improve the model.

The overall fit of the model may be captured by measures such as R², adjusted R², the family of measures based on absolute or squared errors (e.g. MSE, RMSE, MAE, MAPE), and information criteria such as AIC and BIC.
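
The error-based measures named above have standard definitions, which the following sketch computes for a pair of actual and predicted series. The function name `forecast_errors` and the numbers are hypothetical; note that MAPE is undefined when an actual value is zero, so the sketch rejects such input.

```python
import math


def forecast_errors(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for two equally long series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,                                   # mean squared error
        "RMSE": math.sqrt(mse),                       # root mean squared error
        "MAE": sum(abs(e) for e in errors) / n,       # mean absolute error
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Hypothetical actual and fitted sales figures.
measures = forecast_errors([100.0, 200.0, 400.0], [110.0, 190.0, 380.0])
```

The same function can be applied to in-sample fitted values or to out-of-sample forecasts; only the choice of observations passed in differs.
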
A small warning is justified here, as too much emphasis on obtaining a good fit may result in overfitting of the model, which is not necessarily desirable when the purpose of the model is forecasting.

Out-of-sample evaluation

For an out-of-sample evaluation study the model is used to forecast values for a time period that was not used for the estimation of the model. In the ‘pure’ case neither future values of the explanatory factors nor future values of the outcome are known, and the model that is used to obtain the forecast will need to rely on lagged values of the explanatory factors or to use predicted values of the explanatory factors. In the former case, the specification of the model equation in terms of lags will set a limit to how many periods into the future the model can predict. In many cases an out-of-sample forecast evaluation will rely on sets of one-step-ahead predictions, but predictions for a longer forecast horizon (e.g. six months ahead for a model specified with monthly data) are also sometimes considered.

Once the out-of-sample forecasts are obtained it is possible to calculate forecast errors and to study their patterns. Focus areas will be of a directional nature (whether the trend in the outcome is captured), as they may be related to the predictability of turning points, and summary measures for the errors will again prove useful (e.g. MSE, MAPE, etc.), but this time for the forecasted period only. The idea of splitting the sample into different parts for evaluation can be extended in various ways using cross-validation (Hyndman & Athanasopoulos, 2014).

Using a predictive model for forecasting purposes

Once a model has been chosen some considerations concerning its implementation are important. This topic is very much related to the overall phenomenon and problem; hence a general discussion is difficult to provide.

There is, however, one type of consideration that deserves mentioning: how often the model needs re-estimation or specification updating.
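
One common way to combine out-of-sample evaluation with regular re-estimation is a rolling-origin scheme: the model is re-fitted each period on all data observed so far and then used for a one-step-ahead forecast. The sketch below illustrates this for a single lagged explanatory factor. The function names and data are hypothetical, the fitting step is plain closed-form OLS chosen only for brevity, and the data are deliberately noise-free so that the forecasts reproduce the held-out values.

```python
def fit_line(x, y):
    """Closed-form OLS for one regressor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rolling_one_step(x_lagged, y, first_test):
    """Forecast y[t] for each t >= first_test using only observations before t.

    x_lagged[t] is the explanatory factor already lagged, so it is known
    at the time the forecast for period t is made.
    """
    forecasts = []
    for t in range(first_test, len(y)):
        intercept, slope = fit_line(x_lagged[:t], y[:t])  # re-estimate each period
        forecasts.append(intercept + slope * x_lagged[t])
    return forecasts


# Hypothetical, noise-free data: y = 1 + 2 * x_lagged.
x_lagged = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
first_test = 4  # the first four observations form the initial estimation sample

forecasts = rolling_one_step(x_lagged, y, first_test)
errors = [actual - f for actual, f in zip(y[first_test:], forecasts)]
out_of_sample_mse = sum(e * e for e in errors) / len(errors)
```

Re-estimating less often, for example once a year, would amount to calling the fitting step only at selected values of `t` and reusing the coefficients in between.
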
Given that the general data pattern is often quite robust, specification updating may only be needed when new variables become available or when a sufficiently large number of new data points has become available such that more complex structures could be allowed for.

Finally, from a practical perspective a combination of forecasts from different basic predictive models is also a possibility and quite popular in certain fields.

Categorized List of Predictive Models with Social Media Data

Table 20.1 below presents a selected list of research papers on predictive analytics with social media data categorized across different application domains in terms of social media platform (Facebook, Twitter, etc.) and the independent and dependent variables involved. For a conceptual exposition and literature review on the predictive power of social media data, see Gayo-Avello et al. (2013).

Application Domains

As can be seen from Table 20.1, there have been many predictive models of sales based on social media data. Such predictive models work for the brands that can command large amounts of human attention on social media, and therefore generate big data on social media. Examples are iPhone sales, H&M revenues, Nike sales, etc., which are all product categories around which there is a possibility to have large volumes and ranges of opinions on social media platforms. For brands and products that do not generate large volumes of social media data, for instance, insurance, banking, shipping, basic household supplies, etc., the predictive models tend not to work. One explanation for the successful performance of the predictive models is that social media actions can be categorized into the phases of the different domain-specific models from the application domains of marketing, finance, epidemiology, etc. For example, the actual stock price for Apple is in rough terms mainly based on discounted historical sales and expectations of future sales. If social media can model sales, then

Table 20.1 Categorization of Research Publications on Predictive Analytics with Social Media Data

Columns: Reference | Social Data | Dependent Variables | Independent Variables | Statistical Methods

Political election outcome | Political alignment | Movie Academy Award winners | Detecting influenza outbreaks | Many types of sales | Total number of citations
Google Trends | Twitter | Twitter | IMDB, Flixster, Yahoo Movies, HSX,
i & Varian (2012) | Chung & Mustafaraj (2011) | Conover, Gonçalves, Ratkiewicz, Flammini, & Menczer (2011) | Bothos, Apostolou, & Mentzas (2010) | Radosavljevic, Grbovic, Djuric, & Bhamidipati (2014) | Ritterman, Osborne, & Klein (2009) | Gruhl, Guha, Kumar, Novak, & Tomkins (2005) | Jansen, Zhang, Sobel, & Chowdury (2009) | Li & Cardie (2013) | Culotta (2010) | Dijkman, Ipeirotis, Aertsen, & van Helden (2015) | Eysenbach (2011)
Sales of cars, homes and travel | Twitter | Twitter | Google Trends | Google Trends
Lassen et al. (2014) | Bollen & Mao (2011) | Voortman (2015) | Vosen & Schmidt (2011)
Stock-Prices | Twitter | Early stage influenza detection | Twitter | Sport results and number of goals | Brand variables | Twitter | Tumblr | Sales | Blogs
iPhone sales | Dow Jones Industrial Average | Car sales | Consumer spending | Movie revenue | Twitter | Asur & Huberman (2010)
Historical prices, unigrams and bigrams, Twitter activity | Team and player mentions | Twitter texts about flu | Twitter sentiment variables | Twimpact variable (number of tweetations within n days after publication) | Product/brand mentions | Twitter keywords | Twitter activity and sentiment | Measures from IMDB, Flixster, Yahoo Movies, HSX, Twitter, RottenTomatoes.com | Twitter hashtags | Twitter collective sentiment | Twitter activity, sentiment and theatre distribution | Twitter activity and sentiment | Calm, Alert, Sure, Vital, Kind and Happy | Google trend data car names | Real personal income y, interest rates on 3-month Treasury Bills I and stock prices s (measured on S&P 500), Google Trend, and consumer spending t-1 | Historical sales and Google trend variable
Unsupervised Bayesian Model based on Markov Network | Poisson Regression Model using Maximum Likelihood Principle | SVR Regression using Unigrams and Bigrams | Time-Series Linear Regression Models | Time-Series using Cross-Correlation | Multi-Variate/Linear Regression | Time-Series Multiple Regression Model | Time-Series Multiple Regression Model | Multivariate Distribution Models | SVM trained on hashtag metadata | Simple Seasonal AR Models and Fixed Effects Models | Linear Regression | Time-Series Multiple Regression Model | Time-Series Multiple Regression Model | Time Series Linear Regression Model | ARIMA/Time Series Multiple Regression Model | Time-Series Multiple Regression Model

Gilbert & Karahalios (2009) | Won et al. (2013) | Mao, Counts, & Bollen (2014) | Bollen, Mao, & Zeng (2011) | Bollen, Mao, & Pepe (2011) | Eichstaedt et al. (2015) | De Choudhury, Gamon, Counts, & Horvitz (2013) | De Choudhury, Counts, & Horvitz (2013) | De Choudhury, Counts, Horvitz, & Hoff (2014) | Weeks & Holbert (2013) | Hughes, Rowe, Batey, & Lee (2012) | Krauss, Nann, Simon, Gloor, & Fischbach (2008) | Seiffertt & Wunsch (2008) | Tang & Liu (2010) | Karabulut (2013)
Facebook, Twitter, YouTube | Dissemination of News Content in Social Media | Facebook | Tie strength | Weblog social media data | Twitter
Gender, age, web engine news search, email news activity and cell phone activity | 15 communication variables | Suicide related words and mentions | Many types discussed | Social Dimension variables | Facebook GNH (General national happiness), positivity, negativity
UK, US, and Canadian stock markets | "Bullish" or "bearish" mentions on Twitter | DJIA | Twitter moods and feelings | Socio-economic events | Twitter moods and feelings | Heart attacks | Anger, stress and fatigue | Depression | Language, emotion, style, ego-network, and user engagement | Postpartum changes in emotion and behaviour | Engagement, emotion, ego-network and linguistic style | Postpartum depression | Social activity and interaction | Variables on financial markets | Online behaviors | Stock prices
Time-Series Multiple Regression Model | Time-Series Multiple Regression Model | Decision Tree Model | OLS Regression Model | Support Vector Machine | Time-Series Multiple Regression Model | Self-Organizing Fuzzy Neural Network | Extended version of: Profile of mood states | Time-Series Multiple Regression Model | Support Vector Machine | Time Series Multiple Regression Model | Different Model Types Discussed | SocioDim, several advanced models combined | Time-Series Multiple Regression Model | Twitter texts | Time Series Multiple Regression Model | Decision tree, KAURI/LINDEN method | Time Series Linear Regression Model | Linear Regression (LR), Gaussian Process (GP) and Sequential Minimal Optimization for Regression (SMO) | Probability Models
Several | Flickr and YouTube | Facebook | Election outcomes Germany | Twitter texts and sentiments | Twitter texts | Twitter activity | Twitter texts | Time-Series Multiple Regression Model | Time-Series Multiple Regression Model | Twitter
Tumasjan, Sprenger, Sandner, & Welpe (2010) | Yu, Duan, & Cao (2013)
Dutch election outcome | Entity belonging | Election outcome Singapore | Election outcomes EU
Google blogs, Boardreader and Twitter compared to Google News | Firm equity value | Variables for activity and sentiment | Twitter and Facebook | Socialising and info exchange | Big5 personality traits, NFC and sociability | Forums | Movie success and academy awards | Intensity, positivity and trendsetter variables
Twitter | Twitter | Twitter | Twitter
Sang & Bos (2012) | Shen, Wang, Luo, & Wang (2013) | Skoric et al. (2012) | Tsakalidis et al. (2015)

there is a high potential for the associated stock price to also be modelled with social media data. In the case of epidemiology, all social media texts on flu can also be categorized into the different domain-specific phases of spread, incubation, immunity, resistance, susceptibility, etc.

Social Media Data Types

For modelling stock prices, Twitter and Google Trends have proven to be the best platforms. Twitter and Google Trends beat Facebook for stock price modelling because of higher data volume and immediacy. On the other hand, Facebook data have been successfully used for modelling sales, human emotions, personalities and human relations to a brand. In general, picture- and video-based social media platforms such as Instagram, YouTube and Netflix are becoming more prevalent and we expect them to become more relevant for predictive models in the future.

Independent and Dependent Variables

As can be seen from Table 20.1, a wide range of dependent variables have been modelled: sales, stock prices, Net Promoter Score, happiness, feelings, personalities, interest areas, social groups, diseases, epidemics, suicide, crime, radicalization, and civil unrest. The independent variables used reflect the human social relations to the dependent variables and mainly consist of measures of social media activity, feelings, personalities and sentiment.

Statistical Methods Employed

We find that a wide range of statistical models for predictive analytics have been used, including Regression, Neural Network, SVM, Decision Trees, ARIMA, Dynamic Systems, Bayesian Networks, and combined models.

In the next section, we present an illustrative case study of predictive modelling with big social data.

An Illustrative Case Study of Predictive Modelling

In this section, we demonstrate how social media data from Twitter and Facebook can be used to predict the quarterly sales of iPhones and the revenues of the clothing retailer H&M, respectively. Based on a conceptual model of social data (Vatrapu, Mukkamala, & Hussain, 2014) consisting of Interactions (actors, actions, activities, and artifacts) and Conversations (topics, keywords, pronouns, and sentiments), and drawing from the domain-specific theories in advertising and sales from marketing (Belch, Belch, Kerr, & Powell, 2008), we developed and evaluated linear regression models that transform (a) iPhone tweets into a prediction of the quarterly iPhone sales with an average error close to that of the established prediction models from investment banks (Lassen et al., 2014) and (b) Facebook likes into a prediction of the global revenue of the fast fashion company H&M. Our basic premise is that social media actions can serve as proxies for users’ attention and as such have predictive power. The central research question for this demonstrative case study was: To what extent can Big Social Data predict real-world outcomes such as sales and revenues? Table 20.2 below presents the dataset collected for predictive analytics purposes of this case study.

We adhered to the methodological schematic recommended by Shmueli & Koppius (2011) for building empirical predictive models. We built on and extended the predictive analytics method of Asur & Huberman (2010) and examined if the principles for predicting movie revenue with Twitter data can also be used to predict iPhone sales and
