2017 Proceedings of the Conference on Information Systems Applied Research, Austin, Texas USA, ISSN: 2167-1508, v10 n4512

A Review of Big Data Predictive Analytics in Information Systems Research

Alhassan Ohiomah
aohio100@uOttawa.ca

Pavel Andreev
andreev@Telfer.uOttawa.ca

Morad Benyoucef
benyoucef@Telfer.uOttawa.ca

Telfer School of Management
University of Ottawa
Ottawa, ON K1N 6N5, Canada

Abstract

Big data, with its inherent complexity, introduces new challenges for traditional business intelligence and analytics tools, and offers opportunities for organizations to use advanced solutions to exploit their highly complex data. Moreover, the use of predictive analytics on big data has emerged as an important topic for researchers and practitioners from various disciplines. This study reviews the Information Systems (IS) literature on big data predictive analytics to identify the areas of big data predictive analytics that have been studied and those still in need of more research focus, and proposes specific research questions for future investigation. Overall, we found that the emergence of big data has changed the role of predictive analytics from activities such as theory generation and validation to more data-driven discovery of complex patterns and relationships between variables, and to assessing the likelihood of occurrence of relationships between a dataset’s variables. The outcomes of this research contribute to the IS literature by helping identify research gaps, approaches, and emerging directions in big data predictive analytics, and enable practitioners to understand the potential and applications of this new and important concept.

Keywords: Big data, Predictive analytics, Business Intelligence, Systematic Review

1. INTRODUCTION

Organizations are witnessing a rapid growth in the volume of data they generate daily (Watson, 2014). Recent reports indicate that 4 Zettabytes (4 Trillion Gigabytes) of digital data are created every day (Goes, 2015).
IBM reports that 90% of the data in the present day have been generated in the last two to three years (IBM, 2015). What sets these high volumes of data apart is that they are of different variety, have different veracity, originate with different velocity and offer different value, a concept now generally known as “Big data” (Goes, 2015; Power, 2013; Shim, French, Guo, & Jablonski, 2015). Usually, organizations turn to their data to explore the challenges and opportunities existing within their business. However, although the emergence of big data offers organizations ample opportunities (Phillips-Wren, Iyer, Kulkarni, & Ariyachandra, 2015), many organizations still lack an understanding of how to better utilize these growing amounts of data to their advantage (Bedeley, 2014; Koronios, Gao, & Selle, 2014; Power, 2013). This is because the business intelligence and analytics tools used by organizations are not usually sufficient to handle the complexity of big data (Chen, Chiang, & Storey, 2012; Watson, 2014).

2017 ISCAP (Information Systems & Computing Academic Professionals) http://iscap.info

Big data requires the application of advanced analytical techniques (Chen et al., 2012; Phillips-Wren et al., 2015; Watson, 2014; Wixom et al., 2014). Consequently, organizations are being compelled to exploit the potential of predictive analytics, as well as other advanced business intelligence and analytics tools, to help them unravel insights from their big data (Chen et al., 2012; Deka, 2014; Gualtieri, Rowan Curran, TaKeaways, & To, 2013; Kiron & Shockley, 2011; LaValle, Lesser, Shockley, Hopkins, & Kruschwitz, 2011; Watson, 2014).

Predictive analytics include methods that scan data for correlations, trends, and patterns to discover insights and make predictions of possible outcomes (Abbott, 2014; Delen & Demirkan, 2013; Kotu & Deshpande, 2014; Watson, 2014). With predictive analytics, informed decisions are made through a blend of data, analysis, and scientific reasoning rather than human instinct or belief alone (Nettleton, 2014). Admittedly, predictive analytics have been available for a while, predominantly as a method for validating empirical models with small datasets gathered mostly from surveys and interviews (Shmueli & Koppius, 2011). Nevertheless, the emergence of big data has increased the promise of predictive analytics, mainly because the latter are more effective on large and diverse datasets (Moeyersoms & Martens, 2015).
Predictive analytics are growing rapidly because of the shift from prevailing business intelligence tools to advanced analytics techniques and the massive surge of structured and unstructured data (Finlay, 2014; Kotu & Deshpande, 2014).

The use of predictive analytics on big data has emerged as an important area of study for both researchers and practitioners across various disciplines, including biosciences, medicine, computer science and engineering (Deka, 2014; Sun, Zou, & Strang, 2015). While several scholars have addressed specific research questions or built predictive models for specific applications, no far-reaching research agenda has been developed to understand what has been accomplished, how it has been accomplished and what remains to be accomplished when using predictive analytics on big data, particularly in the Information Systems (IS) literature. Shmueli and Koppius (2011) provide an understanding of the role of predictive analytics and the need to integrate it in IS research, but they did not investigate the concept from a big data perspective. Their review found that predictive analytics was mostly applied to small datasets usually gathered through surveys. Additionally, the papers reviewed by Shmueli and Koppius (2011) were published between 1990 and 2006. To the best of our knowledge, no study has since investigated the use of predictive analytics in IS research, particularly in the realm of big data.

In light of all this, we believe that the literature can benefit from an investigation of the current state and role of big data predictive analytics (BDPA) in IS research. There is now a sizable body of research, published since 2006, to be reviewed. Hence, there is a need to synthesize the literature to determine what has been done and what is missing in this area.
Also, it will beinteresting to uncover whether the era of big datahas changed the type and context of predictiveanalytics research conducted within the IS field,and whether the complexity of big data hasintroduced new predictive models, algorithms andapplication domains. This paper aims to do justthat. The current study makes three maincontributions. First, we review the interplay between bigdata, analytics and business intelligence andprovide a definition of the term “Big datapredictive analytics (BDPA)” (Backgroundsection). We believe that this is the first workthat combines unique concepts from theliterature to offer a cohesive definition of theterm.Second, we conduct a structured review ofthe academic literature on BDPA (Researchmethod section) and reveal insights onresearch contexts, topics and applications ofBDPA (Analysis section). For instance, wefound that the majority of the reviewedstudies used techniques that were notfrequently employed for predictive modellingbefore the era of big data.Third, our discussion will help researchersunderstand the current body of knowledge,identify key gaps in the literature on BDPA,and suggest several questions that can serveas a starting point for further research in thisarea (Discussion section).2. BACKGROUNDTo provide an understanding of big data, analyticsand business intelligence in the context ofdecision-making, this section describes the threeconcepts and how they relate to each other. We 2017 ISCAP (Information Systems & Computing Academic Professionals)http://iscap.infoPage 2

2017 Proceedings of the Conference on Information Systems Applied ResearchAustin, Texas USAISSN: 2167-1508v10 n4512also propose a definition of BDPA through asynthesis of definitions in the literature.In the beginning, the concept of big data denotedlarge volumes of data. The financial industry (i.e.,Stock markets, Credit institutions) has beendealing with such voluminous data since the nologiespromotedanenvironment where large amounts of data wereeasy to collect from different sources at differentspeeds. The sources of such data include sensorsof various kinds, social media posts, digitalpictures and videos, purchase transactionrecords, and cell phone GPS signals (IBM, 2015).Yet, scholars suggest that many data sourcestoday remain untapped or underutilized (Franks,2012; Watson, 2014). The size, diversity anddelivery speed of big data creates hugechallenges for organizations. Such challengesinvolve the viability of traditional businessintelligence and analytics tools, as well as theopportunities for organizations to employ cuttingedge tools to help them obtain optimum valuefrom their highly complex data. Befittingly,research on big data, analytics and businessintelligence has received growing attention fromthe academic community in the past few years(Chen et al., 2012; Phillips-Wren et al., 2015;Watson, 2014). Next, we discuss the relationshipbetween big data, analytics and businessintelligence.Big DataWatson (2014, p. 1249) defined big data as “datathat is high volume, high velocity and or highvariety which requires new technologies andtechniques to capture, store, and analyze it andis used to enhance decision making, provideinsight and discovery and support and optimizeprocesses”. Two other dimensions (i.e., Veracityand Value) have been used to characterize bigdata (Shim et al., 2015). The dimensions of bigdata offer opportunities for insight, but a realchallenge is how to turn big data into valuableinsights. 
Simply gathering big data does not, by itself, create business value; value is created only when big data is analyzed and utilized for decision making (Watson, 2014).

Analytics

Analytics involve the use of iterative and methodical techniques to discover, analyze and interpret meaningful patterns from data (Baltzan & Welsh, 2015). Analytics provide businesses with the technologies needed to analyze data, visualize it and create models to foresee future problems and opportunities, as well as tools to optimize business processes (Delen & Demirkan, 2013). Big Data Analytics is a concept used to describe the analytics of big data (Chen et al., 2012; Sun et al., 2015). Analytics build on principles from data mining, statistical analysis and operations research (Chen et al., 2012). There are currently three core categories of analytics, namely descriptive, predictive and prescriptive analytics (Deka, 2014; Delen & Demirkan, 2013; Watson, 2014). In this study, we focus only on predictive analytics.

Predictive analytics include methods that investigate historical and current data for hidden patterns and relationships to predict future trends and outcomes (Shim et al., 2015). Predictive analytics reveal insights on “what will happen” and “why it will happen” (Deka, 2014; Delen & Demirkan, 2013). By design, predictive analytics also include key aspects of descriptive and prescriptive analytics (Hair Jr, 2007). They uncover relationships and patterns within data […]

Business Intelligence

Watson (2009, p. 491) defined business intelligence as a “broad category of applications, technologies, and processes for gathering, storing, accessing and analyzing data to help business users make better decisions”. The concept includes technology, systems, practices and applications that analyze business data to help organizations understand their business and market (Lim, Chen, & Chen, 2013).
The termBusinessIntelligenceandAnalytics(Business Intelligence Analytics) gainedpopularity and was widely adopted in the early2000s because of the notion that businessintelligence was heavily dependent on analytics(Lim et al., 2013). Chen et al. (2012) ques, technologies, systems, practices,methodologies, and applications that analyzecritical business data to help an enterprise betterunderstand its business and market and maketimely business decisions”. To simplify, businessanalytics provide insights from business data tosupport intelligence for smart decisions making.Thus, business analytics is essential to gainbusiness intelligence. Together they provide toolsto convert business data into information intoknowledge for better wisdom, actions andunderstanding of a business. With a clearunderstating of the concepts and interplaybetween big data, analytics and businessintelligence, we can now investigate the focalpoint of our research “Big Data PredictiveAnalytics”. 2017 ISCAP (Information Systems & Computing Academic Professionals)http://iscap.infoPage 3

Big Data Predictive Analytics (BDPA)

The era of big data provides an avenue to produce highly accurate forecasts and therefore creates new application possibilities for predictive analytics (Gualtieri et al., 2013). Simply put, BDPA is predictive analytics for big data (Sun et al., 2015). Because BDPA is an emerging research area, little effort has been dedicated to explicitly defining it. To fill this gap, we identify distinctive descriptions of big data and of predictive analytics separately.

As discussed earlier, prior research has characterized big data as data with three key dimensions, namely volume, variety and velocity (Beyer & Laney, 2012; Chen et al., 2012; Watson, 2014). Additionally, veracity (Claverie-Berge, 2012; Lukoianova & Rubin, 2014) and value (Hashem et al., 2015; Lycett, 2013) were introduced as new dimensions. Hence, big data can be described as data with high volume, variety, velocity, veracity and value (Abbasi, Sarker, & Chiang, 2016; Gandomi & Haider, 2015; Shim et al., 2015).

Notwithstanding the differences in perceptions about the meaning of predictive analytics in the literature, there is near unanimity that, whatever definition is adopted, it involves the discovery of trends, relationships and patterns from data for decision making and the prediction of future events (Deka, 2014; Goul, Balkan, & Dolk, 2015; Hair Jr, 2007; Kridel & Dolk, 2013; Russell, 2015; Shim et al., 2015; Shmueli & Koppius, 2011; Watson, 2014). Russell (2015) featured the identification of risks and opportunities in describing predictive analytics. Similarly, Zeng (2015) featured “prediction of future events in a wide range of application contexts, as well as individual, group, societal behaviors and actions” in describing predictive analytics.
As highlighted previously, analytics involve the use of iterative and methodical techniques to discover, analyze and interpret meaningful patterns from data (Baltzan & Welsh, 2015). Hence, predictive analytics can be described as the use of iterative and methodical techniques that collect and analyze data to reveal trends, relationships and patterns within it to identify problems and opportunities, predict future events, and guide decision making in a wide range of application contexts, including individual, group, and social behaviors and actions. Based on the above discussion, we offer the following definition:

Big data predictive analytics is the use of iterative and methodical techniques that collect, analyze, and interpret high volume, variety, velocity, veracity and value data to reveal trends, relationships and patterns within data to identify problems and opportunities, predict future events, and guide decision making in a wide range of application contexts, including individual, group, and social behaviors and actions.

BDPA will have a profound impact in helping business organizations deal with high volumes of structured and unstructured data to generate insights that guide day-to-day operations, improve decision making and define future strategies (Deka, 2014). Next, we investigate the IS literature for published studies on BDPA to understand its current state, application […]

3. RESEARCH METHOD

To understand the present state of BDPA research and identify future research directions, we review the literature for relevant publications within the IS discipline. We adopt Levy and Ellis's (2006) guidelines for conducting a systematic literature review. The guidelines suggest that a literature review should follow input, processing and output phases. Accordingly, we identify BDPA studies from the top ranked IS […] over time. Second, we analyze and classify relevant studies (processing).
Third, wediscuss the applications and state of currentpractice of BDPA based on the identified studies(output).Review InputsA methodical search of the literature wasconducted for published studies with any of thekeywords "Predict*", "Forecast*", "Data driven","data mining", "machine Learning", "Analysis" or"Analytic*" within their title, abstract andkeywords. We also required that "Big data" or"Large data*" be mentioned somewhere in thecontent of the papers. We assume thesekeywords will be in the title, abstract andkeywords of any literature relevant for our study.However, it is possible that our search mightneglect other relevant studies that do not havethese keywords in their title, abstract andkeywords.This review covers related studies published fromJanuary 2006 to June 2017. The search wasconducted on top IS senior scholar basketjournals as recognized by the Association ofInformation Systems’ and Peffers and Ya (2003).We only focused on papers from top IS seniorscholar basket journals because of their profound 2017 ISCAP (Information Systems & Computing Academic Professionals)http://iscap.infoPage 4

2017 Proceedings of the Conference on Information Systems Applied ResearchAustin, Texas USAISSN: 2167-1508v10 n4512impact and quality publications. A total of 341studies were identified from the search and weresaved to a reference manager (Endnote).Selected5 YearImpactFactorDecision Support Systems474.29MIS Quarterly1012.22IS Research74.79Journal of IT56.95 (2016)Journal of MIS42.35 (2016)IS Journal32.82JournalofStrategicInformation Systems24.61European Journal of IS12.81 (2016)Journal of AIS12.01 (2016)JournalsTotal80Table 1: BDPA Studies Reviewed from DifferentIS JournalsApplicability of literature: We scrutinized thecontents of these 341 studies against thefollowing criteria to make sure they are applicableto our research; (1) Are the studies focused onprediction? (2) Are the studies big data oriented(i.e., the Data used in the study have Volume,Variety and or Velocity)? (3) Are the studiesmethodologically grounded (i.e., Analysis goal,Data collection, Modelling method, Validationmethod)? (4) Are the studies practically ortheoretically relevant? Only studies that met allfour criterial were selected, including a fewbecause of their conceptual significance. Here, weexcluded papers whose concepts of BDPA did notfall within the scope, such as adoption relatedpapers e.g., (Agrawal, 2015; Li, Wu, Liu, & Li,2015). Additionally, we left out predictiveanalytics studies that used a relatively smallsample of data and or non-complex data tovalidate their predictive models e.g., Zheng et al.(2015). Also, discussion notes and some noninfluential non-empirical papers were excluded.This resulted in a final list of 77 relevant studies.Additionally, a review of the references of these77 papers and a forward reference search (i.e.,articles that cite articles under review) throughgoogle scholar yielded a final list of 80 relevantstudies. Table 1 illustrates where the selectedstudies where published. 
Decision Support Systems (DSS) published the majority (47) of the relevant BDPA studies. DSS is a good fit for BDPA studies because of its aligned goal of supporting and optimizing the decision-making process. Another reason may be that, compared to other journals in the basket, DSS has a fast publication timeframe, which may explain the high number of publications on the topic.

Publication Trend

We further examine the longitudinal trends of BDPA studies, grouping all studies published prior to 2010 together as “Before 2010”. Figure 1 shows the publication trends of our search results, providing an understanding of the advancement of BDPA research over the years. From 2014, BDPA started receiving attention and the number of publications on the theme skyrocketed, with approximately 89% of the studies published between 2014 and 2017. Because the search was conducted in June 2017, no assumption should be made about the downward curve in the trend from 2016 to 2017. The overall upward trend may suggest that more studies on BDPA will appear in the coming years.

Figure 1: Publication Trend of BDPA in IS Research (studies per year, Before 2010 through 2017)

4. ANALYSIS AND RESULTS

Research Category            Count   %
Empirical Research           60      75%
General Overview             10      12.5%
Privacy Issues with BDPA     4       5%
Business Value of BDPA       3       3.75%
Literature Surveys           3       3.75%
Total                        80      100%
Table 2: BDPA Themes Grouped in IS Research

The results of our review suggest that BDPA was mainly used in IS research for a priori data-driven discovery of relationships between variables and an assessment of the likelihood of occurrence of the relationships between variables in the dataset. Table 2 shows that 75% of the studies are empirical in nature and the remaining 25% are non-empirical, […] of the concept under study. These themes include a general overview (10), privacy issues of BDPA (4), business value of BDPA (3) and literature surveys (3). Table 3 (Appendix A) summarizes the selected BDPA studies.

Empirical Research

The use of BDPA in IS research is summarized in Table 4 (Appendix B) in terms of big data characteristics, data size, data source, method of analysis and application domain. We noticed that 45 of the 60 empirical studies were published in DSS. The results are analyzed further below.

Big Data Characteristics: As illustrated earlier, big data is by nature of large volume and varied type, and it is generated at different frequencies. Our analysis reveals that 15 of the empirical studies use datasets that we identify as either high volume (e.g., Cresci, Di Pietro, Petrocchi, Spognardi, & Tesconi, 2015) or high variety (e.g., Tsai & Chen, 2014). The majority of studies (31) used data with volume and variety (e.g., Huang, Chen, & Chen, 2016; Martens & Provost, 2014), volume and velocity (e.g., Langseth & Nielsen, 2015; Moeyersoms & Martens, 2015) or velocity and variety (e.g., Dag, Topuz, Oztekin, Bulur, & Megahed, 2016; Sahoo, Singh, & Mukhopadhyay, 2012). The remaining 14 studies use datasets that have volume, variety and velocity (e.g., Wattal, Telang, Mukhopadhyay, & Boatwright, 2011; Wu, Huang, Song, & Liu, 2016). Interestingly, 9 of these 14 studies were published in 2016 and 2017 alone. This suggests that upcoming studies on BDPA are more likely to feature datasets with all three Vs.

Data Sources: User-generated content via reviews, ratings and social media has been the most exploited source of data for BDPA in IS research, with a total of 26 studies reporting its usage. Studies in this group rely on user-generated content to understand user sentiments (Stieglitz & Dang-Xuan, 2013) or user preferences for recommender systems (Chen, Shih, & Lee, 2016a), with the exception of Cresci et al. (2015), who used social media data to identify fraudulent Twitter followers. Another 9 studies used historical transactional data about customers (e.g., Carneiro, Figueira, & Costa, 2017). Another 7 studies report using health records for their investigation. A further 6 studies used datasets other than the popular sources outlined; these were collected from police theft reports (Camacho-Collados & Liberatore, 2015), lake data (Jiang, Liu, Zhang, & Yuan, 2016), or multiple sources (e.g., Bogaert, Ballings, & Van den Poel, 2016; Geva, Oestreicher-Singer, Efron, & Shimshoni, 2017; Pai, Wu, & Hsueh, 2014). Data used by 11 other studies were collected via text documents (3), email or text messages (2), census data (3) and website content (4). Our analysis indicates that IS studies increasingly rely on publicly available data. A reason for this might be the ethical constraints involved in collecting institutional data, data privacy considerations, or the fear that revealing data might erode competitive advantage.

Data Size: Of the empirical studies that we analyzed, only 9 reported using a dataset with fewer than 10,000 observations. Most studies in this category analyze either text documents, which are typically high-dimensional (Tsai & Chen, 2014), or health records (Wimmer, Yoon, & Sugumaran, 2016), where the number of observations is usually small because some conditions, such as cancer, are relatively uncommon. We found that 19 of the reviewed studies used datasets of between 10,000 and 100,000 observations. Another 16 studies used datasets of between 100,000 and 1 million observations, while 7 used datasets of between 1 million and 10 million observations. The remaining 9 studies used datasets that contain 10 million observations or more. Some studies used multiple datasets of different sizes (e.g., Langseth & Nielsen, 2015); for these, we report only the largest dataset used. Overall, the vast amount of data available today appears to be underutilized by, or unavailable to, IS researchers.

Analysis Techniques: Most of the studies we reviewed report using multiple modelling techniques for their analysis, so we documented only the techniques that yielded the best performance. Our analysis shows that the largest group (23) of the reviewed studies used techniques that were not frequently used for predictive modelling before the era of big data. For instance, Huang et al. (2016) used a Google similarity distance measure to build a recommender system. Another example is Khairul and Shahrul (2015), who introduced an identity matching model using Q-gram indexing. Nine studies report using regression models for their analysis (e.g., Bardhan, Oh, Zheng, & Kirksey, 2014). Five studies report using Bayesian models based on networks (Coussement, Benoit, & Antioco, 2015; Wattal et al., 2011) or on hidden Markov models (Jiang et al., 2016; Sahoo et al., 2012). Another 4 studies report using decision tree models. Interestingly, 3 of those studies were about evidence-based medicine (Dag et al., 2016; Gómez-Vallejo et al., 2016; Meyer et al., 2014). This is likely because decision tree models are suitable for problems with sequences of what-if scenarios that can lead to various outcomes. Medical decisions are an example of such problems, since health practitioners are continually faced with situations where they must make crucial decisions to determine the right diagnosis, the ideal treatment or the survival chances of patients. Only 3 studies report new algorithms designed to manage the complexity of the big data they investigated. For instance, Tsai and Chen (2014) introduced an efficient genetic algorithm for reducing the dimensionality of high-dimensional data. Also, 3 studies report using support vector machines. Finally, 2 studies each report using matrix factorization, naïve Bayes, neural networks, rough sets and time series techniques. This suggests that longstanding predictive analytics techniques are also still being applied to prediction with big data.

Application Domain: Among the IS studies analyzed, 11 were conducted to understand and predict the sentiment of users about subjects such as movies (Fersini, Messina, & Pozzi, 2014) and products (Salehan & Kim, 2016). Another 11 studies were conducted to develop recommender systems for movies, products and other items, or to predict uncertainty (Banerjee, Bhattacharyya, & Bose, 2017; Zhang, Guo, & Chen, 2016).
Also, 10other studies report using BDPA in fields such aspredicting event attendance (Bogaert et al.,2016), forecasting microsystem in biological anddisease control (Jiang et al., 2016) and genericfields (Tsai & Chen, 2014). Additional 9 studiesused BDPA to gather market intelligence forsegmenting e.g., (Wattal et al., 2011), sales leadqualification (D’Haen, Van den Poel, Thorleuchter,& Benoit, 2016), or better targeting consumerse.g., (De Cnudde & Martens, 2015; Moeyersoms& Martens, 2015; Pournarakis, Sotiropoulos, &Giaglis, 2017). Also, 6 other studies each whereapplied in health domain support medicaldiagnosis e.g., (Gómez-Vallejo et al., 2016) orindex personal health profiles e.g., (Bardhan etal., 2014). Extra 6 studies where applied toanomaly and fraud detection in issues such asidentifying fraud twitter accounts (Cresci et al.,2015) and identifying phishing for internet fraud(Abbasi et al., 2015). An additional 4 studieswhere applied to financials to predict firm valuefor stock boosting purposes (Luo & Zhang, 2013;Shynkevich, McGinnity, Coleman, & Belatreche,2016) or to determine crowdfunding outcomes(Yuan, Lau, & Xu, 2016). 2 studies applied BDPAto identify defective toys (Winkler, Abrahams,Gruss, & Ehsani, 2016) or predict crimeoccurrence (Camacho-Collados & Liberatore,2015). Only 1 study used text mining to classifysimilar documents (Martens & Provost, 2014).This review suggests that predictive analytics hasbeen widely recognized and utilized by severalindustries to unravel insights from their