
MATLAB Machine Learning - WordPress

Upload by : Kelvin Chao

Introducing Machine Learning

What Is Machine Learning?

Machine learning teaches computers to do what comes naturally to humans and animals: learn from experience. Machine learning algorithms use computational methods to “learn” information directly from data without relying on a predetermined equation as a model. The algorithms adaptively improve their performance as the number of samples available for learning increases.

More Data, More Questions, Better Answers

Machine learning algorithms find natural patterns in data that generate insight and help you make better decisions and predictions. They are used every day to make critical decisions in medical diagnosis, stock trading, energy load forecasting, and more. Media sites rely on machine learning to sift through millions of options to give you song or movie recommendations. Retailers use it to gain insight into their customers’ purchasing behavior.

Real-World Applications

With the rise in big data, machine learning has become particularly important for solving problems in areas like these:

• Computational finance, for credit scoring and algorithmic trading
• Image processing and computer vision, for face recognition, motion detection, and object detection
• Computational biology, for tumor detection, drug discovery, and DNA sequencing
• Energy production, for price and load forecasting
• Automotive, aerospace, and manufacturing, for predictive maintenance
• Natural language processing

How Machine Learning Works

Machine learning uses two types of techniques: supervised learning, which trains a model on known input and output data so that it can predict future outputs, and unsupervised learning, which finds hidden patterns or intrinsic structures in input data.

[Figure: Machine Learning Techniques. Unsupervised learning: group and interpret data based only on input data. Supervised learning: develop predictive model based on both input and output data.]

Supervised Learning

The aim of supervised machine learning is to build a model that makes predictions based on evidence in the presence of uncertainty. A supervised learning algorithm takes a known set of input data and known responses to the data (output) and trains a model to generate reasonable predictions for the response to new data.

Supervised learning uses classification and regression techniques to develop predictive models.

• Classification techniques predict discrete responses: for example, whether an email is genuine or spam, or whether a tumor is cancerous or benign. Classification models classify input data into categories. Typical applications include medical imaging, speech recognition, and credit scoring.
• Regression techniques predict continuous responses: for example, changes in temperature or fluctuations in power demand. Typical applications include electricity load forecasting and algorithmic trading.

Using Supervised Learning to Predict Heart Attacks

Suppose clinicians want to predict whether someone will have a heart attack within a year. They have data on previous patients, including age, weight, height, and blood pressure. They know whether the previous patients had heart attacks within a year. So the problem is combining the existing data into a model that can predict whether a new person will have a heart attack within a year.

Unsupervised Learning

Unsupervised learning finds hidden patterns or intrinsic structures in data. It is used to draw inferences from datasets consisting of input data without labeled responses.

Clustering is the most common unsupervised learning technique. It is used for exploratory data analysis to find hidden patterns or groupings in data. Applications for clustering include gene sequence analysis, market research, and object recognition.

[Figure: Clustering: patterns in the data.]
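The difference between the two technique types can be sketched in a few lines. The ebook itself works in MATLAB; the Python below is only an illustrative sketch, and the data, the labels "low" and "high", and the helper names `train_threshold`, `predict`, and `two_groups` are all hypothetical. The supervised part uses the known responses to place a decision threshold, while the unsupervised part sees only the input values and groups them by proximity.

```python
# Hypothetical 1-D measurements; all values are invented for illustration.
labeled = [(1.0, "low"), (1.2, "low"), (0.8, "low"),
           (3.0, "high"), (3.3, "high"), (2.9, "high")]


def train_threshold(samples):
    """Supervised: use the known responses to place a decision threshold
    midway between the means of the two classes."""
    low = [x for x, y in samples if y == "low"]
    high = [x for x, y in samples if y == "high"]
    return (sum(low) / len(low) + sum(high) / len(high)) / 2


def predict(threshold, x):
    """Predict a discrete response for a new input value."""
    return "high" if x > threshold else "low"


def two_groups(values, iterations=10):
    """Unsupervised: split unlabeled values into two groups by repeatedly
    assigning each value to the nearer of two centers (1-D k-means, k = 2)."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the old center if a group happens to be empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


threshold = train_threshold(labeled)      # learned from inputs and responses
unlabeled = [x for x, _ in labeled]       # the same inputs, responses discarded
groups = two_groups(unlabeled)            # structure found from inputs alone
```

Calling `predict(threshold, 2.5)` classifies a new measurement, whereas `groups` only says which values belong together; giving those groups a meaning is left to the analyst.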

How Do You Decide Which Algorithm to Use?

Choosing the right algorithm can seem overwhelming: there are dozens of supervised and unsupervised machine learning algorithms, and each takes a different approach to learning.

There is no best method or one size fits all. Finding the right algorithm is partly just trial and error; even highly experienced data scientists can’t tell whether an algorithm will work without trying it out. But algorithm selection also depends on the size and type of data you’re working with, the insights you want to get from the data, and how those insights will be used.

[Figure: Selecting an Algorithm.]

• Supervised learning, classification: Support Vector Machines, Discriminant Analysis, Naive Bayes, Nearest Neighbor
• Supervised learning, regression: Linear Regression, GLM; SVR, GPR; Ensemble Methods; Decision Trees; Neural Networks
• Unsupervised learning, clustering: K-Means, K-Medoids, Fuzzy C-Means; Hierarchical; Gaussian Mixture; Neural Networks; Hidden Markov Model

When Should You Use Machine Learning?

Consider using machine learning when you have a complex task or problem involving a large amount of data and lots of variables, but no existing formula or equation. For example, machine learning is a good option if you need to handle situations like these:

• Hand-written rules and equations are too complex, as in face recognition and speech recognition.
• The rules of a task are constantly changing, as in fraud detection from transaction records.
• The nature of the data keeps changing, and the program needs to adapt, as in automated trading, energy demand forecasting, and predicting shopping trends.

Real-World Examples

Creating Algorithms that Can Analyze Works of Art

Researchers at the Art and Artificial Intelligence Laboratory at Rutgers University wanted to see whether a computer algorithm could classify paintings by style, genre, and artist as easily as a human. They began by identifying visual features for classifying a painting’s style. The algorithms they developed classified the styles of paintings in the database with 60% accuracy, outperforming typical non-expert humans.

The researchers hypothesized that visual features useful for style classification (a supervised learning problem) could also be used to determine artistic influences (an unsupervised problem).

They used classification algorithms trained on Google images to identify specific objects. They tested the algorithms on more than 1,700 paintings from 66 different artists working over a span of 550 years. The algorithm readily identified connected works, including the influence of Diego Velazquez’s “Portrait of Pope Innocent X” on Francis Bacon’s “Study After Velazquez’s Portrait of Pope Innocent X.”

Optimizing HVAC Energy Usage in Large Buildings

The heating, ventilation, and air-conditioning (HVAC) systems in office buildings, hospitals, and other large-scale commercial buildings are often inefficient because they do not take into account changing weather patterns, variable energy costs, or the building’s thermal properties.

BuildingIQ’s cloud-based software platform addresses this problem. The platform uses advanced algorithms and machine learning methods to continuously process gigabytes of information from power meters, thermometers, and HVAC pressure sensors, as well as weather and energy cost. In particular, machine learning is used to segment data and determine the relative contributions of gas, electric, steam, and solar power to heating and cooling processes. The BuildingIQ platform reduces HVAC energy consumption in large-scale commercial buildings by 10% to 25% during normal operation.

Detecting Low-Speed Car Crashes

With more than 8 million members, the RAC is one of the UK’s largest motoring organizations, providing roadside assistance, insurance, and other services to private and business motorists.

To enable rapid response to roadside incidents, reduce crashes, and mitigate insurance costs, the RAC developed an onboard crash sensing system that uses advanced machine learning algorithms to detect low-speed collisions and distinguish these events from more common driving events, such as driving over speed bumps or potholes. Independent tests showed the RAC system to be 92% accurate in detecting test crashes.

Learn More

Ready for a deeper dive? Explore these resources to learn more about machine learning methods, examples, and tools.

Watch
• Machine Learning Made Easy 34:34
• Signal Processing and Machine Learning Techniques for Sensor Data Analytics 42:45

Read
• Machine Learning Blog Posts: Social Network Analysis, Text Mining, Bayesian Reasoning, and more
• The Netflix Prize and Production Machine Learning Systems: An Insider Look
• Machine Learning Challenges: Choosing the Best Model and Avoiding Overfitting

Explore
• MATLAB Machine Learning Examples
• Machine Learning Solutions
• Classify Data with the Classification Learner App

© 2016 The MathWorks, Inc. MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.

Getting Started with Machine Learning

Rarely a Straight Line

With machine learning there’s rarely a straight line from start to finish. You’ll find yourself constantly iterating and trying different ideas and approaches. This chapter describes a systematic machine learning workflow, highlighting some key decision points along the way.

Machine Learning Challenges

Most machine learning challenges relate to handling your data and finding the right model.

Data comes in all shapes and sizes. Real-world datasets can be messy, incomplete, and in a variety of formats. You might just have simple numeric data. But sometimes you’re combining several different data types, such as sensor signals, text, and streaming images from a camera.

Preprocessing your data might require specialized knowledge and tools. For example, selecting features to train an object detection algorithm requires specialized knowledge of image processing. Different types of data require different approaches to preprocessing.

It takes time to find the best model to fit the data. Choosing the right model is a balancing act. Highly flexible models tend to overfit data by modeling minor variations that could be noise. On the other hand, simple models may assume too much. There are always tradeoffs between model speed, accuracy, and complexity.

Sounds daunting? Don’t be discouraged. Remember that trial and error is at the core of machine learning: if one approach or algorithm doesn’t work, you simply try another. But a systematic workflow will help you get off to a smooth start.

Questions to Consider Before You Start

Every machine learning workflow begins with three questions:

• What kind of data are you working with?
• What insights do you want to get from it?
• How and where will those insights be applied?

Your answers to these questions help you decide whether to use supervised or unsupervised learning.

Choose supervised learning if you need to train a model to make a prediction (for example, the future value of a continuous variable, such as temperature or a stock price) or a classification (for example, identifying makes of cars from webcam video).

Choose unsupervised learning if you need to explore your data and want to train a model to find a good internal representation, such as splitting data up into clusters.

Workflow at a Glance

1. ACCESS and load the data.
2. PREPROCESS the data.
3. DERIVE features using the preprocessed data.
4. TRAIN models using the features derived in step 3.
5. ITERATE to find the best model.
6. INTEGRATE the best-trained model into a production system.

In the next sections we’ll look at the steps in more detail, using a health monitoring app for illustration. The entire workflow will be completed in MATLAB.

Training a Model to Classify Physical Activities

This example is based on a cell phone health-monitoring app. The input consists of three-axial sensor data from the phone’s accelerometer and gyroscope. The responses (or output) are the activities performed: walking, standing, running, climbing stairs, or lying down.

We want to use the input data to train a classification model to identify these activities. Since our goal is classification, we’ll be applying supervised learning.

The trained model (or classifier) will be integrated into an app to help users track their activity levels throughout the day.

Step One: Load the Data

To load data from the accelerometer and gyroscope we do the following:

1. Sit down holding the phone, log data from the phone, and store it in a text file labeled “Sitting.”
2. Stand up holding the phone, log data from the phone, and store it in a second text file labeled “Standing.”
3. Repeat the steps until we have data for each activity we want to classify.

We store the labeled data sets in a text file. A flat file format such as text or CSV is easy to work with and makes it straightforward to import data.

Machine learning algorithms aren’t smart enough to tell the difference between noise and valuable information. Before using the data for training, we need to make sure it’s clean and complete.

Step Two: Preprocess the Data

We import the data into MATLAB and plot each labeled set. To preprocess the data we do the following:

1. Look for outliers: data points that lie outside the rest of the data. We must decide whether the outliers can be ignored or whether they indicate a phenomenon that the model should account for. In our example, they can safely be ignored (it turns out that we moved unintentionally while recording the data).

2. Check for missing values (perhaps we lost data because the connection dropped during recording). We could simply ignore the missing values, but this will reduce the size of the data set. Alternatively, we could substitute approximations for the missing values by interpolating or using comparable data from another sample.

3. Remove gravitational effects from the accelerometer data so that our algorithm will focus on the movement of the subject, not the movement of the phone. A simple high-pass filter such as a biquad filter is commonly used for this.

4. Divide the data into two sets. We save part of the data for testing (the test set) and use the rest (the training set) to build models. This is referred to as holdout, and is a useful cross-validation technique. By testing your model against data that wasn’t used in the modeling process, you see how it will perform with unknown data.

[Figure: Outliers in the activity-tracking data (raw data with outliers marked).]

In many applications, outliers provide crucial information. For example, in a credit card fraud detection app, they indicate purchases that fall outside a customer’s usual buying patterns.

Step Three: Derive Features

Deriving features (also known as feature engineering or feature extraction) is one of the most important parts of machine learning. It turns raw data into information that a machine learning algorithm can use.

For the activity tracker, we want to extract features that capture the frequency content of the accelerometer data. These features will help the algorithm distinguish between walking (low frequency) and running (high frequency). We create a new table that includes the selected features.

Use feature selection to:

• Improve the accuracy of a machine learning algorithm
• Boost model performance for high-dimensional data sets
• Improve model interpretability
• Prevent overfitting

The number of features that you could derive is limited only by your imagination. However, there are a lot of techniques commonly used for different types of data.

Data type: Sensor data
Feature selection task: Extract signal properties from raw sensor data to create higher-level information
Techniques:
• Peak analysis: perform an FFT and identify dominant frequencies
• Pulse and transition metrics: derive signal characteristics such as rise time, fall time, and settling time
• Spectral measurements: plot signal power, bandwidth, mean frequency, and median frequency

Data type: Image and video data
Feature selection task: Extract features such as edge locations, resolution, and color
Techniques:
• Bag of visual words: create a histogram of local image features, such as edges, corners, and blobs
• Histogram of oriented gradients (HOG): create a histogram of local gradient directions
• Minimum eigenvalue algorithm: detect corner locations in images
• Edge detection: identify points where the degree of brightness changes sharply

Data type: Transactional data
Feature selection task: Calculate derived values that enhance the information in the data
Techniques:
• Timestamp decomposition: break timestamps down into components such as day and month
• Aggregate value calculation: create higher-level features such as the total number of times a particular event occurred

Step Four: Build and Train the Model

When building a model, it’s a good idea to start with something simple; it will be faster to run and easier to interpret. We start with a basic decision tree.

[Figure: Decision tree that splits on features feat53, feat3, feat56, and feat11.]

To see how well it performs, we plot the confusion matrix, a table that compares the classifications made by the model with the actual class labels that we created in step 1.

[Figure: Confusion matrix for the decision tree, true class versus predicted class.]

The confusion matrix shows that our model is having trouble distinguishing between dancing and running. Maybe a decision tree doesn’t work for this type of data. We’ll try a few different algorithms.
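The holdout split and the confusion matrix described above can be sketched as follows. The ebook performs these steps in MATLAB; this Python version is only an illustration under stated assumptions. The function names `holdout_split` and `confusion_matrix`, the 30% test fraction, and every data value are hypothetical, not taken from the ebook.

```python
import random
from collections import Counter


def holdout_split(samples, test_fraction=0.3, seed=0):
    """Shuffle the samples and set aside a fraction of them for testing."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]   # (training set, test set)


def confusion_matrix(true_labels, predicted_labels):
    """Count how often each true class was predicted as each class."""
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical labeled samples: (feature value, activity label).
samples = [(i, "Running" if i % 2 else "Walking") for i in range(10)]
train, test = holdout_split(samples)

# Hypothetical true and predicted labels for a small test set.
true = ["Running", "Running", "Dancing", "Dancing", "Walking"]
pred = ["Running", "Dancing", "Dancing", "Running", "Walking"]
matrix = confusion_matrix(true, pred)

# Accuracy is the share of test samples on the matrix diagonal.
accuracy = sum(n for (t, p), n in matrix.items() if t == p) / len(true)
```

Off-diagonal entries such as `matrix[("Dancing", "Running")]` show exactly which pairs of activities the model confuses, which a single accuracy figure hides.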

Step Four: Build and Train the Model (continued)

We start with K-nearest neighbors (KNN), a simple algorithm that stores all the training data, compares new points to the training data, and returns the most frequent class of the “K” nearest points. That gives us 98% accuracy compared to 94.1% for the simple decision tree. The confusion matrix looks better, too:

[Figure: Confusion matrix for the KNN model, true class versus predicted class.]

However, KNNs take a considerable amount of memory to run, since they require all the training data to make a prediction.

We try a linear discriminant model, but that doesn’t improve the results. Finally, we try a multiclass support vector machine (SVM). The SVM does very well; we now get 99% accuracy:

[Figure: Confusion matrix for the SVM model, true class versus predicted class.]

We achieved our goal by iterating on the model and trying different algorithms. If our classifier still couldn’t reliably differentiate between dancing and running, we’d look into ways to improve the model.

Learn More

Ready for a deeper dive? Explore these resources to learn more about machine learning methods, examples, and tools.

Watch
• Machine Learning Made Easy 34:34
• Signal Processing and Machine Learning Techniques for Sensor Data Analytics 42:45

Read
• Supervised Learning Workflow and Algorithms
• Data-Driven Insights with MATLAB Analytics: An Energy Load Forecasting Case Study

Explore
• MATLAB Machine Learning Examples
• Classify Data with the Classification Learner App
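The K-nearest neighbors rule described in Step Four is short enough to sketch directly. The Python below is a minimal illustration, not the ebook's MATLAB workflow; the function name `knn_predict`, the choice of Euclidean distance, k = 3, and the two-feature training data are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(training, point, k=3):
    """Return the most frequent label among the k training points nearest
    to `point`. `training` is a list of (feature_vector, label) pairs."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], point))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature training data (values invented for illustration).
training = [
    ((0.1, 0.2), "Sitting"), ((0.2, 0.1), "Sitting"), ((0.15, 0.25), "Sitting"),
    ((2.0, 2.1), "Running"), ((2.2, 1.9), "Running"), ((1.9, 2.0), "Running"),
]

near_origin = knn_predict(training, (0.0, 0.0))
near_two = knn_predict(training, (2.1, 2.0))
```

There is no separate training step: every prediction scans the full `training` list, which is why the memory cost grows with the amount of training data.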
Step Five: Improve the Model

Improving a model can take two different directions: make the model simpler or add complexity.

Simplify

First, we look for opportunities to reduce the number of features. Popular feature reduction techniques include:

• Correlation matrix: shows the relationship between variables, so that variables (or features) that are not highly correlated can be removed.
• Principal component analysis (PCA): eliminates redundancy by finding a combination of features that captures key distinctions between the original features and brings out strong patterns in the dataset.
• Sequential feature reduction: reduces features iteratively on the model until there is no improvement in performance.

Next, we look at ways to reduce the model itself. We can do this by:

• Pruning branches from a decision tree
• Removing learners from an ensemble

A good model includes only the features with the most predictive power. A simple model that generalizes well is better than a complex model that may not generalize or train well to new data.

In machine learning, as in many other computational processes, simplifying the model makes it easier to understand, more robust, and more computationally efficient.

Add Complexity

If our model can’t differentiate dancing from running because it is over-generalizing, then we need to find ways to make it more fine-tuned. To do this we can either:

• Use model combination: merge multiple simpler models into a larger model that is better able to represent the trends in the data than any of the simpler models could on their own.
• Add more data sources: look at the gyroscope data as well as the accelerometer data. The gyroscope records the orientation of the cell phone during activity. This data might provide unique signatures for the different activities; for example, there might be a combination of acceleration and rotation that’s unique to running.

Once we’ve adjusted the model, we validate its performance on the test data that we set aside during preprocessing.

If the model can reliably classify activities on the test data, we’re ready to move it to the phone and start tracking.

Applying Unsupervised Learning

When to Consider Unsupervised Learning

Unsupervised learning is useful when you want to explore your data but don’t yet have a specific goal or are not sure what information the data contains. It’s also a good way to reduce the dimensions of your data.

Unsupervised Learning Techniques

As we saw in section 1, most unsupervised learning techniques are a form of cluster analysis.

In cluster analysis, data is partitioned into groups based on some measure of similarity or shared characteristic. Clusters are formed so that objects in the same cluster are very similar and objects in different clusters are very distinct.

Clustering algorithms fall into two broad groups:

• Hard clustering, where each data point belongs to only one cluster
• Soft clustering, where each data point can belong to more than one cluster

[Figure: Gaussian mixture model used to separate data into two clusters.]

You can use hard or soft clustering techniques if you already know the possible data groupings.

If you don’t yet know how the data might be grouped:

• Use self-organizing feature maps or hierarchical clustering to look for possible structures in the data.
• Use cluster evaluation to look for the “best” number of groups for a given clustering algorithm.

Common Hard Clustering Algorithms

k-Means
How it works: Partitions data into k number of mutually exclusive clusters. How well a point fits into a cluster is determined by the distance from that point to the cluster’s center.
Best used:
• When the number of clusters is known
• For fast clustering of large data sets
Result: Cluster centers

k-Medoids
How it works: Similar to k-means, but with the requirement that the cluster centers coincide with points in the data.
Best used:
• When the number of clusters is known
• For fast clustering of categorical data
• To scale to large data sets
Result: Cluster centers that coincide with data points

Hierarchical Clustering
How it works: Produces nested sets of clusters by analyzing similarities between pairs of points and grouping objects into a binary, hierarchical tree.
Best used:
• When you don’t know in advance how many clusters are in your data
• When you want visualization to guide your selection
Result: Dendrogram showing the hierarchical relationship between clusters

Self-Organizing Map
How it works: Neural-network based clustering that transforms a dataset into a topology-preserving 2D map.
Best used:
• To visualize high-dimensional data in 2D or 3D
• To deduce the dimensionality of data by preserving its topology (shape)
Result: Lower-dimensional (typically 2D) representation

Example: Using k-Means Clustering to Site Cell Phone Towers

A cell phone company wants to know the number and placement of cell phone towers that will provide the most reliable service. For optimal signal reception, the towers must be located within clusters of people.

The workflow begins with an initial guess at the number of clusters that will be needed. To evaluate this guess, the engineers compare service with three towers and four towers to see how well they’re able to cluster for each scenario (in other words, how well the towers provide service).

A phone can only talk to one tower at a time, so this is a hard clustering problem. The team uses k-means clustering because k-means treats each observation in the data as an object having a location in space. It finds a partition in which objects within each cluster are as close to each other as possible and as far from objects in other clusters as possible.

After running the algorithm, the team can accurately determine the results of partitioning the data into three and four clusters.

Common Soft Clustering Algorithms

Fuzzy c-Means
How it works: Partition-based clustering when data points may belong to more than one cluster.
Best used:
• When the number of clusters is known
• For pattern recognition
• When clusters overlap
Result: Cluster centers (similar to k-means) but with fuzziness so that points may belong to more than one cluster

Gaussian Mixture Model
How it works: Partition-based clustering where data points come from different multivariate normal distributions with certain probabilities.
Best used:
• When a data point might belong to more than one cluster
• When clusters have different sizes and correlation structures within them
Result: A model of Gaussian distributions that give probabilities of a point being in a cluster

Example: Using Fuzzy c-Means Clustering to Analyze Gene Expression Data

A team of biologists is analyzing gene expression data from microarrays to better understand the genes involved in normal and abnormal cell division. (A gene is said to be “expressed” if it is actively involved in a cellular function such as protein production.)

The microarray contains expression data from two tissue samples. The researchers want to compare the samples to determine whether certain patterns of gene expression are implicated in cancer proliferation.

After preprocessing the data to remove noise, they cluster the data. Because the same genes can be involved in several biological processes, no single gene is likely to belong to one cluster only. The researchers apply a fuzzy c-means algorithm to the data. They then visualize the clusters to identify groups of genes that behave in a similar way.
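For the hard-clustering case, the k-means procedure described above (assign each point to the nearest center, then move each center to the mean of its points) can be sketched as below. This is a minimal Python illustration rather than the ebook's MATLAB implementation; the function name `kmeans`, the fixed iteration count, the choice of two data points as starting centers, and the point coordinates are all assumptions. Practical implementations usually choose starting centers more carefully and stop when assignments no longer change.

```python
import math


def kmeans(points, centers, iterations=20):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the previous center if a cluster received no points.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical subscriber locations (x, y), invented for illustration.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
```

In the tower-siting example, each returned center would be a candidate tower location and each cluster the set of subscribers it serves; running the function with three and then four starting centers mirrors the comparison the engineers make.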

Improving Models with Dimensionality Reduction

Machine learning is an effective method for finding patterns in big datasets. But bigger data brings added complexity.

As datasets get bigger, you frequently need to reduce the number of features, or dimensionality.

Example: EEG Data Reduction

Suppose you have electroencephalogram (EEG) data that captures electrical activity of the brain, and you want to use this data to predict a future seizure. The data was captured using dozens of leads, each corresponding to a variable in your original dataset. Each of these variables contains noise. To make your prediction algorithm more robust, you use dimensionality reduction techniques to derive a smaller number of features. Because these features are calculated from multiple sensors, they will be less susceptible to noise in an individual sensor than would be the case if you used the raw data directly.

Using Factor Analysis

Your dataset might contain measured variables that overlap, meaning that they are dependent on one another. Factor analysis lets you fit a model to multivariate data to estimate this sort of interdependence.

In a factor analysis model, the measured variables depend on a smaller number of unobserved (latent) factors. Because each factor might affect several variables, it is known as a common factor. Each variable is assumed to be dependent on a linear combination of the common factors.

Example: Tracking Stock Price Variation

Over the course of 100 weeks, the percent change in stock prices has been recorded for ten companies. Of these ten, four are technology companies, three are financial, and a further three are retail. It seems reasonable to assume that the stock prices for companies in the same sector will vary together as economic conditions change. Factor analysis can provide quantitative evidence to support this premise.

Using Nonnegative Matrix Factorization

This dimension reduction technique is based on a low-rank approximation of the feature space. In addition to reducing the number of features, it guarantees that the features are ...

Common Dimensionality Reduction Techniques

The three most commonly used dimensionality reduction techniques are:

Principal component analysis (PCA): performs a linear transformation on the data so that most of the variance or information in your high-dimensional dataset is captured by the first few principal components. The first principal component will capture the
