Statistical Learning And Data Mining Stat557

Statistical Learning and Data Mining
Stat557

Jia Li
Department of Statistics
The Pennsylvania State University
Email: jiali@stat.psu.edu
http://www.stat.psu.edu/~jiali

General Information
- Course homepage: http://www.stat.psu.edu/~jiali/stat557
- Prerequisites:
  - Elementary probability theory
  - Conditional distribution, expectation
  - C, Matlab, or S-plus programming

Textbooks
- Required: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, and J. Friedman (ElemStatLearn).
- Optional:
  1. Classification and Regression Trees, by L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone
  2. Pattern Recognition and Neural Networks, by B. Ripley
  3. Principles of Data Mining, by H. Mannila, P. Smyth, and D. J. Hand
  4. Data Mining: Concepts and Techniques, by J. Han and M. Kamber

What Is Data Mining?
Data mining: tools, methodologies, and theories for revealing patterns in data, a critical step in knowledge discovery.
Driving forces:
- Big data:
  - Explosive growth of data in a great variety of fields
    - Enormous volume
    - High complexity: dimension, structure
    - Dynamic
  - Cheaper storage devices with higher capacity
  - Faster communication
  - Better database management systems
- Rapidly increasing computing power: distributed and parallel platforms
- Make data work for us

Research fields
- Statistics
- Machine learning
- Pattern recognition
- Signal processing
- Database

Applications
- Business
  - Wal-Mart data warehouse
  - Credit card companies
- Genomics
  - Human genome project: DNA sequences
  - Microarray data
- Information retrieval
  - Terabytes of data on the internet
  - Multimedia information
- Communication systems
  - Speech recognition
  - Image analysis
- Many other scientific fields

Problems Focused: Prediction

Terminology
Notation
- Input $X$: $X$ is often multidimensional. Each dimension of $X$ is denoted by $X_j$ and is referred to as a feature, predictor, or independent variable/variable.
- Output $Y$: response, dependent variable.
Categorization
- Supervised learning vs. unsupervised learning
  - Is $Y$ available in the training data?
- Regression vs. classification
  - Is $Y$ quantitative or qualitative?
  - A qualitative $Y$ is also denoted by $G \in \mathcal{G} = \{1, 2, \ldots, K\}$.

Examples
Email spam: (ElemStatLearn)
- Goal: predict whether an email is a junk email, i.e., "spam".
- Raw data: text email messages.
- Input X: relative frequencies of 57 of the most commonly occurring words and punctuation marks in the email message.
- Training data set: 4601 email messages with email type known (supervised learning).

Examples
Handwritten digit recognition: (ElemStatLearn)
- Goal: identify single digits 0-9 based on images.
- Raw data: images that are scaled segments from five-digit ZIP codes.
  - 16 × 16 eight-bit grayscale maps
  - Pixel intensities range from 0 (black) to 255 (white).
- Input data: a 256-dimensional vector, or feature vectors with lower dimensions.
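A minimal sketch of how one 16 × 16 grayscale map becomes the 256-dimensional input vector mentioned above. The function name `flatten_image`, the row-major ordering, and the toy image are assumptions made for illustration; the slides do not say how the pixels are ordered or whether they are rescaled.

```python
def flatten_image(image):
    """Flatten a 16 x 16 grid of 8-bit intensities into a 256-long vector (row-major)."""
    if len(image) != 16 or any(len(row) != 16 for row in image):
        raise ValueError("expected a 16 x 16 image")
    vector = [pixel for row in image for pixel in row]
    if any(not 0 <= p <= 255 for p in vector):
        raise ValueError("pixel intensities must lie in 0..255")
    return vector


# Hypothetical image: all black (0) except one white (255) pixel at row 2, column 3.
image = [[0] * 16 for _ in range(16)]
image[2][3] = 255
x = flatten_image(image)
```

Each image then contributes one such vector `x`, paired with its digit label, to the training set.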

Examples
Image segmentation:
- Goal: segment images into regions of different types, e.g., man-made vs. natural in aerial images, graph and picture vs. text in document images.
- Raw data: grayscale images represented by matrices of size m × n, or color images represented by 3 such matrices.

Figure: Aerial images. Left: original image of size 512 × 512 with pixel intensity ranging from 0 to 255. Right: hand-labeled classified image. White: man-made; gray: natural.

- Input data:
  - Divide images into blocks of pixels or form a neighborhood around each pixel.
  - Compute statistics using pixel intensities in each block.
  - An image is converted to an array of input vectors.
- Methodologies:
  - Assume the feature vectors are independent.
  - Employ spatial models to capture dependence among the vectors.
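The input-data steps above can be sketched roughly as follows. This is only an illustration: the slides do not say which block statistics are used, so the mean and variance of each non-overlapping block, the function name `block_features`, and the toy image are all assumptions.

```python
from statistics import mean, pvariance


def block_features(image, block_size):
    """Split a grayscale image (list of rows) into non-overlapping square blocks
    and return one (mean, variance) feature vector per block, in row-major order."""
    n_rows, n_cols = len(image), len(image[0])
    features = []
    for top in range(0, n_rows - block_size + 1, block_size):
        for left in range(0, n_cols - block_size + 1, block_size):
            pixels = [
                image[r][c]
                for r in range(top, top + block_size)
                for c in range(left, left + block_size)
            ]
            features.append((mean(pixels), pvariance(pixels)))
    return features


# Hypothetical 4 x 4 image: left half dark (0), right half bright (255).
image = [[0, 0, 255, 255] for _ in range(4)]
features = block_features(image, block_size=2)
```

Treating each entry of `features` as an independent observation corresponds to the first methodology listed; a spatial model would additionally use the block positions.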

Examples
Speech recognition:
- Goal: identify words spoken according to speech signals
  - Automatic voice recognition systems used by airline companies
  - Automatic stock price reporting
- Raw data: voice amplitude sampled at discrete time points (a time sequence).

- Input data: speech feature vectors computed at the sampling times.
- Methodology:
  - Estimate a hidden Markov model (HMM) for each word, e.g., State College, San Francisco, Pittsburgh.
  - For a new word, find the HMM that yields the maximum likelihood.
  - Identify the word as the one associated with that HMM.
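The methodology above (one HMM per word, then pick the model with the highest likelihood) can be sketched as below. To keep the example short it assumes discrete observation symbols and already-estimated parameters, whereas real speech features are continuous vectors and the models would first have to be trained. The names `forward_log_likelihood`, `classify`, and the two toy models are hypothetical.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm.

    start[i]    : P(first state = i)
    trans[i][j] : P(next state = j | current state = i)
    emit[i][o]  : P(observation = o | state = i)
    """
    n_states = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def classify(obs, models):
    """Return the label whose HMM gives the observation sequence the highest likelihood."""
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))


# Two hypothetical 2-state word models over the symbol alphabet {0, 1}.
models = {
    "word_a": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "word_b": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
}
label = classify([0, 0, 1, 0, 0], models)
```

Here `word_a` emits mostly the symbol 0, so a sequence dominated by 0s is assigned to it.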

Examples
DNA Expression Microarray:
- Goal: identify disease or tissue types
- Raw data: for each sample taken from a tissue of a particular disease type, the expression levels of a large collection of genes are measured.
- Input data: cleaned-up gene expression data
  - Normalization
  - Denoising
  - Ample literature on the topic of cleaning microarray data
- Example data set: 4026 genes, 96 samples taken from 9 classes of tissues.
- Challenges:
  - Very high-dimensional data
  - Very limited number of samples

Examples
DNA sequence classification:
- Goal: distinguish "junk" segments from coding segments.
- Raw data: sequences of letters, e.g., A, C, G, T for DNA sequences.
- Input data: likelihood ratio statistics computed from stochastic models.
- Supervised learning: estimate stochastic models, select models.
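One way a likelihood ratio statistic of this kind could be computed is sketched below. The slides do not name the stochastic model, so a first-order Markov chain with add-one smoothing is assumed here, and the training segments are made-up toy strings rather than real genomic data. All function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def fit_markov_chain(sequences):
    """Estimate first-order transition probabilities with add-one smoothing."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    trans = {}
    for a in ALPHABET:
        total = sum(counts[(a, b)] for b in ALPHABET) + len(ALPHABET)
        trans[a] = {b: (counts[(a, b)] + 1) / total for b in ALPHABET}
    return trans


def log_likelihood(seq, trans):
    """Log-probability of the transitions in seq (the first letter is ignored)."""
    return sum(math.log(trans[a][b]) for a, b in zip(seq, seq[1:]))


def log_likelihood_ratio(seq, coding_model, junk_model):
    """Positive values favor the coding model, negative values the junk model."""
    return log_likelihood(seq, coding_model) - log_likelihood(seq, junk_model)


# Toy, made-up training segments (not real genomic data).
coding_model = fit_markov_chain(["ACGACGACGACG", "CGACGACGACGA"])
junk_model = fit_markov_chain(["AAAATTTTAAAA", "TTTTAAAATTTT"])
llr_coding_like = log_likelihood_ratio("ACGACG", coding_model, junk_model)
llr_junk_like = log_likelihood_ratio("AAATTT", coding_model, junk_model)
```

The resulting log-likelihood ratio can be thresholded directly or passed to a classifier as an input feature.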

Supervised Learning
Two types of learning:
- Regression: the response Y is quantitative.
- Classification: the response Y is qualitative, or categorical.
Two aspects in learning:
- Fit the data well.
- Be robust.
Equivalent concepts:
- Training error vs. testing error
- Bias vs. variance
- Fitting vs. overfitting
- Empirical risk vs. model complexity (capacity)
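The gap between training error and testing error can be illustrated with a small experiment. The sketch below uses k-nearest-neighbor regression, a technique chosen here only for illustration and not named on this slide, on simulated data. The data-generating rule (y = 2x plus Gaussian noise), the seed, and all function names are assumptions.

```python
import random


def knn_predict(x, train, k):
    """Average the responses of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_squared_error(data, train, k):
    """Mean squared prediction error on data for a k-NN fit to train."""
    return sum((y - knn_predict(x, train, k)) ** 2 for x, y in data) / len(data)


def make_data(xs, rng):
    """Hypothetical data: y = 2x + Gaussian noise with standard deviation 0.5."""
    return [(x, 2 * x + rng.gauss(0, 0.5)) for x in xs]


rng = random.Random(0)
train = make_data([i / 20 for i in range(20)], rng)
test = make_data([i / 20 + 0.025 for i in range(20)], rng)

for k in (1, 5):
    print(k, mean_squared_error(train, train, k), mean_squared_error(test, train, k))
```

With k = 1 every training point is its own nearest neighbor, so the training error is exactly zero even though the testing error is not: the model fits the data perfectly but is not robust. Larger k gives up some training fit in exchange for lower variance.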

Learning Spectrum

Regression
Overview:
- Linear models:
  - The mean response is a linear function of the independent variables.
- Generalized linear models
- Expand basis:
  - Splines (polynomials)
  - Reproducing Kernel Hilbert Spaces
  - Wavelet smoothing
  - Kernel methods

Classification: A Graphic View

Outline
- Linear regression
- Linear methods for classification
- Prototype methods
- Classification and regression trees (CART)
- Mixture discriminant analysis
- Hidden Markov models and their applications
