Big Data & Data Science: Economic Applications


Big Data & Data Science: Economic applications
José García Montalvo
Master of Data Science
Course: “Economics for the Era of Big Data”
October 13, 2015

Summary
Introduction
Some preliminary comments on data science and big data
Big data and economics
My experience with big data and economics:
– Finding the real price of housing in Spain
– Breaking the world 100 times in small pieces
– Looking at infrastructures and companies
– Men, women and scoring
– Electoral predictions
– Big data for financial services and marketing
Danger of big data: confidentiality and correlation
Concluding remarks

Introduction
“In God we trust; all others must bring data.” W. Edwards Deming
“Without data you are just one more person with an opinion.” (anonymous)
“We are drowning in information but starved for knowledge.” John Naisbitt

Introduction
“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.” Dan Ariely

Introduction: Target’s legend
Excerpt from “Predictive Analytics” by Eric Siegel:
In 2010, I invited an expert at Target, Andrew Pole, to keynote at Predictive Analytics World, a conference for which I serve as program chair. Pole manages dozens of analytics professionals who run various predictive analytics (PA) projects at Target. In October of that year, Pole delivered a stellar keynote on a wide range of PA deployments at Target. He took the stage and dynamically engaged the audience, revealing detailed examples, interesting stories, and meaningful business results that left the audience clearly enthused.

Introduction: Target’s legend
Excerpt from “Predictive Analytics” by Eric Siegel:
Toward the end, Pole describes a project to predict customer pregnancy. Given that there's a tremendous sales opportunity when a family prepares for a newborn, you can see the marketing potential.
Featured in the New York Times (Charles Duhigg, “How Companies Learn Your Secrets”, February 16, 2012) and the FT

Introduction: Target’s legend
Andrew Pole had just started working as a statistician for Target in 2002, when two colleagues from the marketing department stopped by his desk to ask an odd question: “If we wanted to figure out if a customer is pregnant, even if she didn’t want us to know, can you do that?”
Pole has a master’s degree in statistics and another in economics.
“We knew that if we could identify them in their second trimester, there’s a good chance we could capture them for years,” Pole told me. “As soon as we get them buying diapers from us, they’re going to start buying everything else too.”

Introduction: Target’s legend
Pole was working for Target’s “Guest Marketing Analytics” department: he was a statistician and a mathematician.
Pole was asked to find unique moments in the lives of customers when consumption habits are flexible enough for the department store to win them over as loyal clients. Divorce, moving to a new house and the birth of a child are some of those moments.
However, when the child is born it is already too late: they needed to “attract” the customer before that moment.

Introduction: Target’s legend
In those particular situations customers are more vulnerable to marketing.
They identified 25 products (calcium supplements, magnesium, zinc, unscented soap, large cotton bags, etc.) that pregnant women buy during the first 20 weeks as predictors of pregnancy.
They used a “habit looping” algorithm.

Introduction: the case of Amazon
Until 2001 Amazon used dozens of literary critics to suggest titles to its customers. The “Amazon voice” was considered by the WSJ the most influential critic in the US.
At some point Jeff Bezos asked whether there was a better way to offer client-specific recommendations: using the history of products bought by a customer, plus similarities with other customers, plus Linden’s “item by item” algorithm, the automatic recommendation system beat the human critics by a wide margin.

Introduction: the case of Amazon
“Amazon voice” or “machine learning”? Human critics or algorithms? The algorithm won clearly and all the critics were fired.
Today 1/3 of Amazon’s sales come from the system of personalized recommendations.
Linden’s item by item algorithm has been adopted by many digital shops, including Netflix.
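The idea of recommending from co-purchase histories can be sketched as follows. This is a minimal illustration, not Amazon's production system: the purchase data, the product names and the helper names `item_similarity` and `recommend` are hypothetical, and cosine similarity over co-purchases is just one common choice assumed here.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical purchase histories: customer -> set of items bought.
purchases = {
    "ana": {"book_a", "book_b", "book_c"},
    "ben": {"book_a", "book_b"},
    "eva": {"book_b", "book_c"},
    "joe": {"book_a", "book_d"},
}

def item_similarity(purchases):
    """Cosine similarity between items, based on co-purchase counts."""
    count = defaultdict(int)      # item -> number of buyers
    together = defaultdict(int)   # (item, item) -> number of buyers of both
    for items in purchases.values():
        for i in items:
            count[i] += 1
        for i, j in combinations(sorted(items), 2):
            together[(i, j)] += 1
            together[(j, i)] += 1
    return {pair: n / sqrt(count[pair[0]] * count[pair[1]])
            for pair, n in together.items()}

def recommend(customer, purchases, sim, top=2):
    """Rank items the customer has not bought by summed similarity to owned items."""
    owned = purchases[customer]
    scores = defaultdict(float)
    for (i, j), s in sim.items():
        if i in owned and j not in owned:
            scores[j] += s
    return sorted(scores, key=lambda item: (-scores[item], item))[:top]

sim = item_similarity(purchases)
print(recommend("ben", purchases, sim))
```

The similarities depend only on items, so they can be precomputed offline and looked up when a customer visits; that separation is the design choice this sketch tries to show.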

Data Science
During the last millennium science has been an empirical endeavor (description of natural phenomena).
During the last centuries science opened up to models, formulations and generalizations.
In recent decades: computational science and simulation of complex phenomena.
Currently, eScience:
– Unifies theory, experiments and simulation
– Massive data capture using specific software, or data generated by simulation
– Knowledge and information are stored in computers

Big Data

Big data: the problem of storage

Big data
More basic facts:
– Need to move from a model centered around a computer to a model centered on the data, with massively parallel computation (many cores and accelerators), memory persistence and new architectures to mitigate the issue of heat and energy consumption (why does bitcoin mining take place in Iceland?)
– Need to move to distributed computing (scalable and parallel computing); new types of data generate the need for new tools, such as non-relational (non-SQL) databases and MapReduce
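The MapReduce programming model can be illustrated with a toy word count. This is a single-machine sketch under stated assumptions: the function names `map_phase`, `shuffle` and `reduce_phase` and the two documents are hypothetical, and a real framework would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

# Hypothetical input: each string stands for a document stored on a different node.
documents = ["big data and economics", "big data and banking"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

Because each map call sees only one document and each reduce call sees only one key, both steps can be distributed without the workers sharing state.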

Big data
More basic facts:
– New solutions in data science have significantly reduced the cost of complex processes like genome sequencing or micro-segmentation
– Demand for statisticians, mathematicians and computer science graduates who know economics/business has increased drastically

Big data and econometrics
Important differences between econometrics before and after big data:
– Supervised learning (inputs and outputs) uses classification methods, decision trees and neural networks for what we used to call regression
– Unsupervised learning (only inputs) works by shrinkage or dimensionality reduction, which we used to call nonparametric estimation
– Traditional econometrics stopped explaining ridge regression
– Stata/Mata/Gauss versus Hadoop/MapReduce/Pig/Mahout/OpenRefine/Hive/HBase/ZooKeeper and R
– Importance of over-fitting and cross-validation
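Why over-fitting and cross-validation matter can be seen in a small sketch. The data and the choice of a one-nearest-neighbor predictor are assumptions made for illustration only: the model memorizes the sample, so its in-sample error is zero, while k-fold cross-validation reveals that the memorized noise does not generalize.

```python
def one_nn_predict(train, x):
    """Predict y for x using the nearest training point (a very flexible model)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def mse(model_data, eval_data):
    """Mean squared error of 1-NN fitted on model_data, evaluated on eval_data."""
    errors = [(one_nn_predict(model_data, x) - y) ** 2 for x, y in eval_data]
    return sum(errors) / len(errors)

def k_fold_mse(data, k):
    """Average out-of-fold error over k folds (fold i holds out every k-th point)."""
    fold_errors = []
    for i in range(k):
        test = data[i::k]
        train = [p for j, p in enumerate(data) if j % k != i]
        fold_errors.append(mse(train, test))
    return sum(fold_errors) / k

# Hypothetical data: y = x plus an alternating "noise" term of +1 or -1.
data = [(x, x + (1 if x % 2 == 0 else -1)) for x in range(12)]

in_sample = mse(data, data)          # the model reproduces every training point
out_of_sample = k_fold_mse(data, 4)  # error on points the model has not seen
print(in_sample, out_of_sample)
```

Judging the model by `in_sample` alone would suggest a perfect fit; the cross-validated error is the figure that is comparable across models of different flexibility.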

Big data and econometrics
Important differences between econometrics before big data (classical econometrics) and after:
– Large dimensionality (the number of parameters k is large relative to the number of observations n): need to regularize and use methods that reduce the dimensionality of the models, like LASSO (least absolute shrinkage and selection operator) or any other that penalizes the number of parameters
– The path towards causality (using randomized experiments or natural experiments) has taken a detour towards machine learning techniques, where prediction is emphasized over causality
– Big data techniques are subject to the Lucas critique: social reputation and scoring; credit card fraud and testing with 99-cent operations
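In one common formulation, the LASSO can be written as the penalized least-squares problem below. The notation is assumed here rather than taken from the slides: y_i is the outcome, x_i the vector of k regressors, n the number of observations, and lambda >= 0 a tuning parameter that is usually chosen by cross-validation.

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta \in \mathbb{R}^{k}}
    \; \frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - x_i^{\top} \beta \bigr)^{2}
    + \lambda \sum_{j=1}^{k} \lvert \beta_j \rvert
```

Strictly speaking, the penalty acts on the absolute size of the coefficients rather than on their count, but it drives some coefficients exactly to zero, which is how it reduces the number of parameters effectively used. With lambda = 0 the problem reduces to ordinary least squares.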

Big data and economics
The availability of increasingly large databases, in many cases geocoded, that merge information of diverse origin makes economics a more and more scientific discipline:
– The Billion Prices Project: real-time estimation of the evolution of prices using millions of prices from online businesses. The project shows that official inflation is quite similar to the online calculations for countries like Brazil, Chile, Venezuela or Colombia, but not for Argentina (accumulated difference of 65% between 2007 and 2011)
– What can we do with information on 24 million credits?
– The STAR experiment and its current effects
– Choi and Varian (2014): use big data techniques to improve prediction models. Example: AR(1) model for weekly unemployment subsidies, using as transfer function Google Trends for words like jobs, welfare or unemployment
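The Choi and Varian example can be read as a baseline AR(1) model augmented with a search-based regressor. The specification below is a sketch under assumed notation, not a transcription of their paper: y_t is the weekly unemployment series, g_t is a Google Trends index for the relevant search terms, and epsilon_t is an error term.

```latex
\begin{aligned}
\text{Baseline AR(1):} \quad & y_t = \beta_0 + \beta_1 y_{t-1} + \varepsilon_t \\
\text{With Google Trends:} \quad & y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 g_t + \varepsilon_t
\end{aligned}
```

The comparison of interest is the out-of-sample prediction error of the two models: the search index is useful only if the second specification forecasts better than the first.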

Big data and economics
The Billion Prices Project (MIT)
– They use the stability or change in the HTML tags used to construct the web pages of online department stores to identify changes of prices over time
– Using this principle their software identifies the information relevant for the product and its price
– The URL of the page that indexes the products is used to classify them by categories
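Extracting prices from page markup can be sketched with Python's standard HTML parser. This is an illustration only, not the project's software: the class names `product-name` and `product-price`, the two page snapshots and the helper `scrape` are hypothetical, and the sketch assumes the tags are already known rather than discovered from their stability over time.

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Collect product prices from tags with hypothetical class names."""

    def __init__(self):
        super().__init__()
        self.field = None     # which field the current tag holds, if any
        self.name = None      # last product name seen
        self.prices = {}      # product -> price

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("product-name", "product-price"):
            self.field = css_class

    def handle_data(self, data):
        text = data.strip()
        if not text or self.field is None:
            return
        if self.field == "product-name":
            self.name = text
        elif self.name is not None:
            self.prices[self.name] = float(text.lstrip("$"))
        self.field = None

def scrape(page):
    """Return a product -> price dictionary for one page snapshot."""
    parser = PriceParser()
    parser.feed(page)
    return parser.prices

# Two hypothetical snapshots of the same page on different days.
day1 = '<div><span class="product-name">Milk 1L</span><span class="product-price">$1.00</span></div>'
day2 = '<div><span class="product-name">Milk 1L</span><span class="product-price">$1.10</span></div>'

before, after = scrape(day1), scrape(day2)
changes = {p: after[p] / before[p] - 1 for p in before if p in after}
print(changes)
```

Matching products across snapshots by name, as done here, is the simplest possible key; any stable product identifier in the markup would serve the same purpose.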

Big data and economics: house prices
House prices in Spain:
– Appraisals
– Ask prices (they can be found by crawling the Internet)
– Closing prices
– Registry prices
– Notary prices

Big data and economics: house prices
Montalvo and Raya (2012), “Imaginary prices”
Increasing the appraisal price to produce a mortgage:
– Finally I got data to show what was theoretically obvious
– Appraisal values are set according to the financial needs of the clients, not to represent the value of the house

Big data and economics: house prices
Montalvo and Raya (2012), “Imaginary prices”, merge four datasets from very different sources:
– Housing intermediary: market prices
– Financial institution: appraisal prices, amount of the mortgage
– Ask prices: robots on the Internet
– Official Registry of Real Estate Properties: amount of mortgage, registry price (reported in the official ownership document)
– General Directory of Real Estate Properties: unique id number

Big data and economics: house prices
The damaging role of appraisal firms and their conflict of interest with banks and savings and loans
Equity withdrawals are very limited in Spain: well, not really, you get it upfront
The condescending view of the regulator: countercyclical buffer, no securitization, strict regulation of off-balance-sheet operations, large consolidation perimeter, but huge incentive problems for appraisal companies and banks

Big data and economics: house prices
[Figure: Loan to official value (appraisal). Density plot; x-axis: loan over appraisal value.]

Big data and economics: house prices
[Figure: Loan to transaction price. Density plot; x-axis: loan over sale price.]

Big data and economics: house prices
[Figure: Over-appraisal on sale price. Density plot; x-axis: percentage increase of the appraisal value over the purchase price.]
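The three ratios in these house-price slides are simple to compute once loan, appraisal and transaction price are merged for the same dwelling. The sketch below uses a hypothetical mortgage (all amounts invented for illustration) and a hypothetical helper `ratios` to show how an inflated appraisal lets a loan look conservative relative to the appraisal while financing the whole purchase price.

```python
def ratios(loan, appraisal, price):
    """Return loan-to-appraisal, loan-to-price and over-appraisal for one mortgage."""
    return {
        "loan_to_appraisal": loan / appraisal,
        "loan_to_price": loan / price,
        "over_appraisal": appraisal / price - 1,
    }

# Hypothetical mortgage, amounts in euros.
r = ratios(loan=160_000, appraisal=200_000, price=160_000)
print(r)
```

In this invented case the loan is 80% of the appraisal value but 100% of the price actually paid, because the appraisal exceeds the transaction price by 25%.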

Big data and economics: geo-diversity
What is the effect of ethnic diversity on economic growth and development? (slides)
What’s the impact of infrastructures on firm location? (paper)

Big data and economics: millions of loans
Did banks lower their standards to originate credits? Is it better to have more capital or more women as risk officers? (paper)

Big data and economics: predicting elections
How to predict a difficult election: from simple time series baseline models to complex integrated models with census post-stratification and Bayesian logit models updated using thousands of polls. (paper)
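Census post-stratification, one ingredient of the integrated models mentioned here, reweights poll estimates by the population share of each demographic cell. The sketch below is a minimal illustration with invented numbers; the cell names, the estimates and the helper `post_stratify` are hypothetical and do not come from the paper.

```python
def post_stratify(cell_estimates, cell_population):
    """Population-weighted average of the estimates obtained in each cell."""
    total = sum(cell_population[c] for c in cell_estimates)
    weighted = sum(cell_estimates[c] * cell_population[c] for c in cell_estimates)
    return weighted / total

# Hypothetical poll: support for a party by age group.
support = {"young": 0.60, "old": 0.40}

# Hypothetical census counts (in thousands) for the same cells.
census = {"young": 30, "old": 70}

# A poll that sampled both groups equally would report the raw average.
raw = sum(support.values()) / len(support)
adjusted = post_stratify(support, census)
print(raw, adjusted)
```

The adjusted figure is lower than the raw one because the group that is over-represented in the invented poll is also the more supportive one.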

Big data and marketing
Changes in the buyer’s journey: need to understand what the buyer does during the 60 to 80% of the time spent before contacting a company representative
Digital body language: visits to the web, time spent, email interactions, interaction with social media, behavior after watching a video, banner or report; searching before and after; etc.
The Chief Marketing Officer (CMO) and the Chief Information Officer (CIO) have to be very close and also in constant interaction with the sales department

Big data and marketing
Technologies for marketing
– MAS: marketing automation software (ExactTarget, Marketo, Eloqua, etc.)
– Business Intelligence Databases (IBM, Oracle, SAP)
– CRM: Customer Relationship Management (SAP, NetSuite, Salesforce, Oracle, etc.)
– CMS: Content management system (Adobe, OpenText, Oracle, etc.)
– Blog platforms: WordPress, Movable Type, etc.
– DMP: Data management platforms (Adobe, BlueKai, CoreAudience, Krux, Lotame, etc.)
– Analytic tools: web analytics, Chartbeat, Google Analytics, Mint, etc.
– SMM: social media management software (Adobe Social, Buddy Media, Webtrends, HootSuite, etc.)

Big data and marketing
Technologies for marketing:
– Predictive lead scoring vendors: FlipTop, Infer, KXEN (SAP), Lattice Engines (all of them use big data techniques)
– Call center software
– SEM platforms (Search Engine Management Platforms) to manage, automate and optimize marketing in search pages and pay-per-click campaigns
– DSP: Demand Side Platforms allow the marketing department and its agencies to manage in real time the auctions for advertising (RTB: real time bidding) simultaneously in several online advertising markets

Big data and financial services
After what happened during the financial crisis, financial institutions need to regain their lost reputation
The EBA, IMF, ECB, etc. insist that the European banking sector has an important profitability problem (low ROEs), hence the need for a new business model
Increasing competition from new non-bank actors in the financial intermediation business is eroding large parts of the value chain of banking products

1. Recovering clients’ trust
Could the banking industry do like Amazon and recommend individualized products to its clients? Banks swim in a huge sea of very relevant data, which opens the door to adapting products to the needs of each client (instead of inventing products that they then promote across all types of clients)
Objective: improve the access of low-income families to financial products at a cost that is reasonable for the customers’ income profile, ability to pay and risk aversion

1. Recovering clients’ trust
In many countries, including the US, there is a high proportion of clients who, either for having no credit history or a short credit history, cannot access banking services
In the US these potential customers end up in payday loan services, paying very high interest rates for a short-maturity product

2. New business models
The reduced profitability of banking, the increase in regulation and the high level of leverage of the economy require efficiency improvements in the banking sector
The future of their business model can be based on big data (large databases created by banks): reducing inefficiencies and increasing product customization and customer satisfaction using analytics and big data
Big data is also critical to confront the increasing demand for information for regulatory purposes

3. How to confront new competitors
Financial disintermediation is affecting the profits of financial institutions
Until recently these competitors basically attacked the payment instruments link of the chain (cryptocurrencies, mobile payments, complementary currencies, etc.), but they are moving fast to other parts of the value chain (peer to peer loans, personal loans, etc.)

Big data and banks: applications
Basic applications
– Optimization of the relationship with clients
– Improvement in the financial functions
– Risk reduction
– Compliance with new regulations

1. Big data and banking: scoring
FICO (Fair Isaac Corporation) and internal models
Behavioral models for long-time customers and concessional models (based on demographics and a few more observations) for recent customers or even non-clients

FICO
Factors to calculate the credit score (the exact formula is a secret). Approximate weights:
– 35% Payment History: late payments on bills, such as a mortgage, credit card or automobile loan, can cause a […]
– 30% Credit Utilization: the ratio of current revolving debt (such as credit card balances) to the total available revolving credit (credit limits).
– 15% Length of Credit History: […] ages, assuming they pay their bills, it can have a positive impact on their FICO score.
– 10% Types of Credit Used (installment, revolving, consumer finance): consumers can benefit by having a history of managing different types of credit.
– 10% Recent search for credit and/or amount of credit obtained recently: multiple credit inquiries for a consumer seeking to open new credit, such as credit cards, retail store […]

1. Big data and banking: scoring
Testimony of J. P. Morgan before the House of Representatives (1912): “The most important factor to get credit is not wealth but reputation. A man who is not trustworthy would not obtain any money from me even if he owned all the wealth of Christendom.”
Reputation -> social reputation -> social credit score -> online social status
Basic sources to measure social reputation: Facebook, Twitter, LinkedIn, etc.

1. Big data and banking: scoring examples
Neo Finance (Palo Alto) specializes in loans targeted at young people who want to buy a car but do not have a lengthy credit history, which would otherwise imply paying very high interest rates:
– Neo Finance uses the number and quality of the loan applicant’s connections on LinkedIn. They look especially for links to workers in the same company to predict future job stability and income
– They also use contacts in other companies to estimate the probability of finding a job conditional on being fired, and the time to find a job after being dismissed. Objective: estimate job stability

1. Big data and banking: scoring examples
Kreditech
– They use the location of the residence of friends and their jobs to calculate a credit score. If friends have delinquent credits, this reduces the probability of getting a credit approved.
– The algorithm generates a decision in 8 seconds (on average) and produces a delinquency rate of less than 10%
Lenddo generates an online social capital score between 0 and 1000 using the number of followers on Facebook, their characteristics (demographics, residence, jobs, etc.), their education degrees, their employer and credit history, and the credit history of their friends. Therefore, if a friend stops paying bills or loans, my score is affected: a similar idea to microcredits in developing countries

1. Big data and banking: scoring examples
Experian (Extended View) and Equifax (Vantage Score) are open to using social reputation information coming from the Internet.
FICO has shown no interest in this information
Some analysts call these new techniques to calculate the score the reinvention of the wheel of credit scores: “it’s the Wild West like the early days of FICO” (Pete Fader)
Banks have high-quality customer data and experience that social reputation indices (very noisy) can hardly beat, at least currently

1. Big data and banking
BBVA and credit card terminals: http://mwcimpact.com/
Importance for new business and credit
MasterCard and Spending Pulse: real-time data on consumption in different commercial activities
VISA produces high-frequency predictions using economic surveys
Moody’s Analytics forecasts private sector employment each month using information on 500,000 companies that use their payroll software

1. Big data and banking: examples scoring
Khandani et al (2012) increase the number of variables, adding clients’ transactions to the credit scores generated by agencies (Experian, etc.). This improves the forecast, reaching 85% of correct predictions and a 6-25% saving on total losses

2. Big data and banking: credit cards
The reason for using big data to detect credit card fraud is simple: it saves a bank millions of euros, and it takes advantage of one of the V’s of big data (velocity)
The size of the data is huge: employers’ data, job applications, loans, etc., death lists, incarceration records, black lists, IRS, etc., as well as patterns that could be used to analyze the geographical location of payments, characteristics of the business and similar businesses, etc.

2. Big data and banking: credit cards
Four approaches:
– Based on rules (known patterns)
– Detection of anomalies or outliers (unknown patterns)
– Predictive analysis searching for complex patterns
– Hybrid models
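Two of these approaches, and a simple hybrid of them, can be sketched as follows. Everything specific here is an assumption made for illustration: the rules, the thresholds, the field names and the z-score test stand in for whatever a bank would actually use.

```python
from statistics import mean, stdev

def rule_flags(tx):
    """Rule-based check: known patterns (the thresholds here are hypothetical)."""
    return tx["amount"] < 1.0 or tx["country"] != tx["home_country"]

def anomaly_flags(tx, history, z_limit=3.0):
    """Outlier check: amount far from the cardholder's past amounts."""
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and abs(tx["amount"] - mu) / sigma > z_limit

def hybrid(tx, history):
    """Hybrid model: flag if either the rules or the outlier test fire."""
    return rule_flags(tx) or anomaly_flags(tx, history)

# Hypothetical past amounts for one card and three new transactions.
history = [20.0, 25.0, 22.0, 30.0, 27.0]
usual = {"amount": 24.0, "country": "ES", "home_country": "ES"}
card_test = {"amount": 0.99, "country": "ES", "home_country": "ES"}
large = {"amount": 500.0, "country": "ES", "home_country": "ES"}

for tx in (usual, card_test, large):
    print(tx["amount"], hybrid(tx, history))
```

The small-amount rule catches a pattern that is already known, while the outlier test catches the large payment without any rule having been written for it; the hybrid simply combines both signals.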

3. Other utilities
Peer to peer lending: RateSetter, Zopa, Lending Club, etc.
Platforms like Social Finance, CommonBond or Upstart for student debt: applicants accept that those platforms get any information needed to score them, using data from employers or online social networks
Car insurance and the rise of sensors

The dangers of big data
Big data provides very useful tools to manage a business in an uncertain environment with increasing regulation and mistrust from customers in the banking industry. However, it is not certain that every big data project will generate a successful strategy:
– Decreasing returns to the accumulation of information
– Data are not informative if they are not properly analyzed
– Need to calculate the cost-benefit of any big data project (return on investment, ROI)

The dangers of big data
A huge amount of data cannot overcome the foundations of statistics, the influence of measurement errors or the dangers of spurious correlations
You need technical knowledge, but you also need to be open to constantly evaluating the predictive ability of your model and adjusting it if there is a loss of precision: the experience of Google with Google Flu Trends is a good cautionary tale
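The danger of spurious correlations grows with the number of variables searched. The simulation below is a sketch with arbitrary, assumed settings (20 observations, 1,000 candidate predictors, a fixed seed): every series is independent random noise, yet the best of the candidates still appears strongly correlated with the target.

```python
import random
from math import sqrt

def pearson(x, y):
    """Sample Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

random.seed(0)  # fixed seed so the run is reproducible
n_obs, n_predictors = 20, 1000

# Target and predictors are all independent noise: no true relationship exists.
target = [random.gauss(0, 1) for _ in range(n_obs)]
predictors = [[random.gauss(0, 1) for _ in range(n_obs)]
              for _ in range(n_predictors)]

best = max(abs(pearson(p, target)) for p in predictors)
print(round(best, 2))
```

Selecting the best-correlated variable after such a search and then reporting its fit as if it had been chosen in advance is exactly the kind of mistake that more data does not cure.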

Careful with algorithms!
Small area flu forecast (using “Google Flu Trends”): big data and algorithms have limitations. In the last three years the models have over-estimated the flu by 50%
Princeton versus Facebook
– Princeton study: Facebook will lose 80% of its users (epidemic model on the number of times that the word Facebook […] MySpace […]
– Facebook: […] principles that correlation implies causality, and the same […]

The dangers of big data
Data privacy and reutilization: new consent clauses
The possibility that mistakes in the capture, merging or cleaning of data may generate negative effects for citizens when big data is applied to specific problems. For example, what happens if a company calculates a wrong credit score for a citizen using big data techniques, and that leads to the denial of a credit? See NCLC (2014) (National Consumer Law Center)

The dangers of big data
The issue of anonymized data: several papers have shown how to find out the name of the anonymous person to whom the data refer. This was a recurrent topic in the latest meeting of the American Statistical Association
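A quick way to see why removing names is not enough is to count how many records share the same combination of indirect identifiers. The sketch below uses invented records and a hypothetical helper `smallest_group`; a group of size one means that person can be singled out by anyone who can link those attributes to an external source such as a public register.

```python
from collections import Counter

def smallest_group(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return min(Counter(keys).values())

# Hypothetical "anonymized" records: names removed, other attributes kept.
records = [
    {"zip": "08001", "birth_year": 1980, "sex": "F", "diagnosis": "flu"},
    {"zip": "08001", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "08001", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]

print(smallest_group(records, ["zip"]))
print(smallest_group(records, ["zip", "birth_year", "sex"]))
```

With the postal code alone every record hides among three; adding birth year and sex leaves one record unique, even though no name appears anywhere in the data.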
