Introduction to Pattern Recognition and Machine Learning



IISc Lecture Notes Series
ISSN: 2010-2402
Editor-in-Chief: Gadadhar Misra
Editors: Chandrashekar S Jog, Joy Kuri, K L Sebastian, Diptiman Sen, Sandhya Visweswariah

Published:
Vol. 1: Introduction to Algebraic Geometry and Commutative Algebra by Dilip P Patil & Uwe Storch
Vol. 2: Schwarz's Lemma from a Differential Geometric Viewpoint by Kang-Tae Kim & Hanjin Lee
Vol. 3: Noise and Vibration Control by M L Munjal
Vol. 4: Game Theory and Mechanism Design by Y Narahari
Vol. 5: Introduction to Pattern Recognition and Machine Learning by M. Narasimha Murty & V. Susheela Devi

World Scientific

Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

Library of Congress Cataloging-in-Publication Data
Murty, M. Narasimha.
Introduction to pattern recognition and machine learning / by M Narasimha Murty & V Susheela Devi (Indian Institute of Science, India).
pages cm. -- (IISc lecture notes series, ISSN 2010-2402 ; vol. 5)
ISBN 978-9814335454
1. Pattern recognition systems. 2. Machine learning. I. Devi, V. Susheela. II. Title.
TK7882.P3M87 2015
006.4--dc23
2014044796

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

Copyright 2015 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

In-house Editors: Chandra Nugraha/Dipasri Sardar
Typeset by Stallion Press
Email: enquiries@stallionpress.com
Printed in Singapore

Series Preface

World Scientific Publishing Company - Indian Institute of Science Collaboration

IISc Press and WSPC are co-publishing books authored by world renowned scientists and engineers. This collaboration, started in 2008 during IISc's centenary year under a Memorandum of Understanding between IISc and WSPC, has resulted in the establishment of three Series: IISc Centenary Lectures Series (ICLS), IISc Research Monographs Series (IRMS), and IISc Lecture Notes Series (ILNS). This pioneering collaboration will contribute significantly to disseminating current Indian scientific advancement worldwide.

The "IISc Centenary Lectures Series" will comprise lectures by designated Centenary Lecturers - eminent teachers and researchers from all over the world.

The "IISc Research Monographs Series" will comprise state-of-the-art monographs written by experts in specific areas. They will include, but will not be limited to, the authors' own research work.

The "IISc Lecture Notes Series" will consist of books that are reasonably self-contained and can be used either as textbooks or for self-study at the postgraduate level in science and engineering. The books will be based on material that has been class-tested for the most part.

Editorial Board for the IISc Lecture Notes Series (ILNS):
Gadadhar Misra, Editor-in-Chief (gm@math.iisc.ernet.in)
Chandrashekar S Jog (jogc@mecheng.iisc.ernet.in)
Joy Kuri (kuri@cedt.iisc.ernet.in)
K L Sebastian (kls@ipc.iisc.ernet.in)
Diptiman Sen (diptiman@cts.iisc.ernet.in)
Sandhya Visweswariah (sandhya@mrdg.iisc.ernet.in)


Table of Contents

About the Authors

1. Introduction
   1. Classifiers: An Introduction
   2. An Introduction to Clustering
   3. Machine Learning

2. Types of Data
   1. Features and Patterns
   2. Domain of a Variable
   3. Types of Features
      3.1. Nominal data
      3.2. Ordinal data
      3.3. Interval-valued variables
      3.4. Ratio variables
      3.5. Spatio-temporal data
   4. Proximity measures
      4.1. Fractional norms
      4.2. Are metrics essential?
      4.3. Similarity between vectors
      4.4. Proximity between spatial patterns
      4.5. Proximity between temporal patterns
      4.6. Mean dissimilarity
      4.7. Peak dissimilarity
      4.8. Correlation coefficient
      4.9. Dynamic Time Warping (DTW) distance

3. Feature Extraction and Feature Selection
   1. Types of Feature Selection
   2. Mutual Information (MI) for Feature Selection
   3. Chi-square Statistic
   4. Goodman–Kruskal Measure
   5. Laplacian Score
   6. Singular Value Decomposition (SVD)
   7. Non-negative Matrix Factorization (NMF)
   8. Random Projections (RPs) for Feature Extraction
      8.1. Advantages of random projections
   9. Locality Sensitive Hashing (LSH)
   10. Class Separability
   11. Genetic and Evolutionary Algorithms
      11.1. Hybrid GA for feature selection
   12. Ranking for Feature Selection
      12.1. Feature selection based on an optimization formulation
      12.2. Feature ranking using F-score
      12.3. Feature ranking using linear support vector machine (SVM) weight vector
      12.4. Ensemble feature ranking
      12.5. Feature ranking using number of label changes
   13. Feature Selection for Time Series Data
      13.1. Piecewise aggregate approximation
      13.2. Spectral decomposition
      13.3. Wavelet decomposition
      13.4. Singular Value Decomposition (SVD)
      13.5. Common principal component loading based variable subset selection (CLeVer)

4. Bayesian Learning
   1. Document Classification
   2. Naive Bayes Classifier
   3. Frequency-Based Estimation of Probabilities
   4. Posterior Probability
   5. Density Estimation
   6. Conjugate Priors

5. Classification
   1. Classification Without Learning
   2. Classification in High-Dimensional Spaces
      2.1. Fractional distance metrics
      2.2. Shrinkage–divergence proximity (SDP)
   3. Random Forests
      3.1. Fuzzy random forests
   4. Linear Support Vector Machine (SVM)
      4.1. SVM–kNN
      4.2. Adaptation of cutting plane algorithm
      4.3. Nystrom approximated SVM
   5. Logistic Regression
   6. Semi-supervised Classification
      6.1. Using clustering algorithms
      6.2. Using generative models
      6.3. Using low density separation
      6.4. Using graph-based methods
      6.5. Using co-training methods
      6.6. Using self-training methods
      6.7. SVM for semi-supervised classification
      6.8. Random forests for semi-supervised classification
   7. Classification of Time-Series Data
      7.1. Distance-based classification
      7.2. Feature-based classification
      7.3. Model-based classification

6. Classification using Soft Computing Techniques
   1. Introduction
   2. Fuzzy Classification
      2.1. Fuzzy k-nearest neighbor algorithm
   3. Rough Classification
      3.1. Rough set attribute reduction
      3.2. Generating decision rules
   4. GAs
      4.1. Weighting of attributes using GA
      4.2. Binary pattern classification using GA
      4.3. Rule-based classification using GAs
      4.4. Time series classification
      4.5. Using generalized Choquet integral with signed fuzzy measure for classification using GAs
      4.6. Decision tree induction using Evolutionary algorithms
   5. Neural Networks for Classification
      5.1. Multi-layer feed forward network with backpropagation
      5.2. Training a feedforward neural network using GAs
   6. Multi-label Classification
      6.1. Multi-label kNN (mL-kNN)
      6.2. Probabilistic classifier chains (PCC)
      6.3. Binary relevance (BR)
      6.4. Using label powersets (LP)
      6.5. Neural networks for Multi-label classification
      6.6. Evaluation of multi-label classification

7. Data Clustering
   1. Number of Partitions
   2. Clustering Algorithms
      2.1. K-means algorithm
      2.2. Leader algorithm
      2.3. BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies
      2.4. Clustering based on graphs
   3. Why Clustering?
      3.1. Data compression
      3.2. Outlier detection
      3.3. Pattern synthesis
   4. Clustering Labeled Data
      4.1. Clustering for classification
      4.2. Knowledge-based clustering
   5. Combination of Clusterings

8. Soft Clustering
   1. Soft Clustering Paradigms
   2. Fuzzy Clustering
      2.1. Fuzzy K-means algorithm
   3. Rough Clustering
      3.1. Rough K-means algorithm
   4. Clustering Based on Evolutionary Algorithms
   5. Clustering Based on Neural Networks
   6. Statistical Clustering
      6.1. OKM algorithm
      6.2. EM-based clustering
   7. Topic Models
      7.1. Matrix factorization-based methods
      7.2. Divide-and-conquer approach
      7.3. Latent Semantic Analysis (LSA)
      7.4. SVD and PCA
      7.5. Probabilistic Latent Semantic Analysis (PLSA)
      7.6. Non-negative Matrix Factorization (NMF)
      7.7. LDA
      7.8. Concept and topic

9. Application — Social and Information Networks
   1. Introduction
   2. Patterns in Graphs
   3. Identification of Communities in Networks
      3.1. Graph partitioning
      3.2. Spectral clustering
      3.3. Linkage-based clustering
      3.4. Hierarchical clustering
      3.5. Modularity optimization for partitioning graphs
   4. Link Prediction
      4.1. Proximity functions
   5. Information Diffusion
      5.1. Graph-based approaches
      5.2. Non-graph approaches
   6. Identifying Specific Nodes in a Social Network
   7. Topic Models
      7.1. Probabilistic latent semantic analysis (pLSA)
      7.2. Latent Dirichlet allocation (LDA)
      7.3. Author–topic model

Index

About the Authors

Professor M. Narasimha Murty completed his B.E., M.E., and Ph.D. at the Indian Institute of Science (IISc), Bangalore. He joined IISc as an Assistant Professor in 1984. He became a professor in 1996 and currently he is the Dean, Engineering Faculty at IISc. He has guided more than 20 doctoral students and several masters students over the past 30 years at IISc; most of these students have worked in the areas of Pattern Recognition, Machine Learning, and Data Mining. A paper co-authored by him on Pattern Clustering has around 9600 citations as reported by Google Scholar. A team led by him won the KDD Cup on the citation prediction task organized by Cornell University in 2003. He has been elected a Fellow of both the Indian National Academy of Engineering and the National Academy of Sciences.

Dr. V. Susheela Devi completed her PhD at the Indian Institute of Science in 2000. Since then she has worked as a faculty member in the Department of Computer Science and Automation at the Indian Institute of Science. She works in the areas of Pattern Recognition, Data Mining, Machine Learning, and Soft Computing. She has taught the courses Data Mining, Pattern Recognition, Data Structures and Algorithms, Computational Methods of Optimization, and Artificial Intelligence. She has a number of papers in international conferences and journals.


Preface

Pattern recognition (PR) is a classical area and some of the important topics covered in the books on PR include representation of patterns, classification, and clustering. There are different paradigms for pattern recognition including the statistical and structural paradigms. The structural or linguistic paradigm has been studied in the early days using formal language tools. Logic and automata have been used in this context. In linguistic PR, patterns could be represented as sentences in a logic; here, each pattern is represented using a set of primitives or sub-patterns and a set of operators. Further, a class of patterns is viewed as being generated using a grammar; in other words, a grammar is used to generate a collection of sentences or strings where each string corresponds to a pattern. So, the classification model is learnt using some grammatical inference procedure; the collection of sentences corresponding to the patterns in the class are used to learn the grammar. A major problem with the linguistic approach is that it is suited to dealing with structured patterns and the models learnt cannot tolerate noise.

On the contrary, the statistical paradigm has gained a lot of momentum in the past three to four decades. Here, patterns are viewed as vectors in a multi-dimensional space and some of the optimal classifiers are based on Bayes rule. Vectors corresponding to patterns in a class are viewed as being generated by the underlying probability density function; Bayes rule helps in converting the prior probabilities of the classes into posterior probabilities using the likelihood values of the patterns given each class. So, estimation schemes are used to obtain the probability density function of a class using the vectors corresponding to patterns in the class. There are several other classifiers that work with the vector representation of patterns. We deal with statistical pattern recognition in this book.

Some of the simplest classification and clustering algorithms are based on matching or similarity between vectors. Typically, two patterns are similar if the distance between the corresponding vectors is small; the Euclidean distance is popularly used. Well-known algorithms including the nearest neighbor classifier (NNC), the K-nearest neighbor classifier (KNNC), and the K-Means Clustering algorithm are based on such distance computations. However, it is well understood in the literature that the distance between two vectors may not be meaningful if the vectors lie in high-dimensional spaces, which is the case in several state-of-the-art application areas; this is because the distance between a vector and its nearest neighbor can tend to the distance between the vector and its farthest neighbor as the dimensionality increases. This prompts the need to reduce the dimensionality of the vectors. We deal with the representation of patterns, different types of components of vectors, and the associated similarity measures in Chapters 2 and 3.
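This concentration effect is easy to observe numerically. The following minimal sketch, in Python with NumPy, compares each point's nearest-neighbor and farthest-neighbor distances as the dimensionality grows; the uniformly distributed random data, the sample size, and the dimensions chosen here are illustrative assumptions rather than an experiment from this book:

    import numpy as np

    rng = np.random.default_rng(0)

    def contrast(d, n=500):
        """Mean nearest/farthest Euclidean distance ratio over n points drawn uniformly in d dimensions."""
        X = rng.random((n, d))
        sq = (X ** 2).sum(axis=1)
        dist2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared pairwise distances
        dist = np.sqrt(np.maximum(dist2, 0.0))                # guard against round-off
        np.fill_diagonal(dist, np.nan)                        # ignore the zero self-distances
        return float(np.mean(np.nanmin(dist, axis=1) / np.nanmax(dist, axis=1)))

    for d in (2, 10, 100, 1000):
        print(d, round(contrast(d), 3))   # the ratio creeps towards 1 as d grows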

Machine learning (ML) has also been around for a while; early efforts concentrated on logic or formal language-based approaches. Bayesian methods have gained prominence in ML in the recent decade; they have been applied in both classification and clustering. Some of the simple and effective classification schemes are based on simplification of the Bayes classifier using some acceptable assumptions. The Bayes classifier and its simplified version, called the Naive Bayes classifier, are discussed in Chapter 4. Traditionally there has been a contest between frequentist approaches, like the maximum-likelihood approach, and the Bayesian approach. In maximum-likelihood approaches the underlying density is estimated based on the assumption that the unknown parameters are deterministic; on the other hand, Bayesian schemes assume that the parameters characterizing the density are unknown random variables. In order to make the estimation schemes simpler, the notion of a conjugate pair is exploited in the Bayesian methods. If, for a given prior density, the density of a class of patterns is such that the posterior has the same form of density function as the prior, then the prior and the class density form a conjugate pair. One of the most exploited pairs in the context of clustering is the Dirichlet prior with the Multinomial class density; for example, a Dirichlet prior combined with multinomial counts yields a Dirichlet posterior whose parameters are the prior parameters incremented by the observed counts. For a variety of such conjugate pairs it is possible to show that when the datasets are large in size, there is no difference between the maximum-likelihood and the Bayesian estimates. So, it is important to examine the role of Bayesian methods in Big Data applications.

Some of the most popular classifiers are based on support vector machines (SVMs), boosting, and Random Forest. These are discussed in Chapter 5, which deals with classification. In large-scale applications like text classification, where the dimensionality is large, linear SVMs and Random Forest-based classifiers are popularly used. These classifiers are well understood in terms of their theoretical properties. There are several applications where each pattern belongs to more than one class; soft classification schemes are required to deal with such applications. We discuss soft classification schemes in Chapter 6. Chapter 7 deals with several classical clustering algorithms including the K-Means algorithm and Spectral clustering. The so-called topic models have become popular in the context of soft clustering. We deal with them in Chapter 8.

Social Networks is an important application area related to PR and ML. Most of the earlier work has dealt with the structural aspects of social networks, which are based on their link structure. Currently there is interest in also using the text associated with the nodes in the social networks along with the link information. We deal with this application in Chapter 9.

This book deals with the material at an early graduate level. Beginners are encouraged to read our introductory book Pattern Recognition: An Algorithmic Approach, published by Springer in 2011, before reading this book.

M. Narasimha Murty
V. Susheela Devi
Bangalore, India


Chapter 1

Introduction

This book deals with machine learning (ML) and pattern recognition (PR). Even though humans can deal with both physical objects and abstract notions in day-to-day activities while making decisions in various situations, it is not possible for the computer to handle them directly. For example, in order to discriminate between a chair and a pen using a machine, we cannot directly deal with the physical objects; we abstract these objects and store the corresponding representations on the machine. For example, we may represent these objects using features like height, weight, cost, and color. We will not be able to reproduce the physical objects from the respective representations. So, we deal with the representations of the patterns, not the patterns themselves. It is not uncommon to call both the patterns and their representations patterns in the literature.

So, the input to the machine learning or pattern recognition system is abstractions of the input patterns/data. The output of the system is also one or more abstractions. We explain this process using the tasks of pattern recognition and machine learning. In pattern recognition there are two primary tasks:

1. Classification: This problem may be defined as follows. There are C classes; these are Class1, Class2, ..., ClassC. We are given a set Di of patterns from Classi for i = 1, 2, ..., C; D = D1 ∪ D2 ∪ ... ∪ DC. D is called the training set and members of D are called labeled patterns because each pattern has a class label associated with it. If each pattern Xj ∈ D is d-dimensional, then we say that the patterns are d-dimensional, or that the set D is d-dimensional, or equivalently that the patterns lie in a d-dimensional space. A classification model Mc is learnt using the training patterns in D. Given an unlabeled pattern X, the task is to assign an appropriate class label to X with the help of Mc. It may be viewed as assigning a class label to an unlabeled pattern. For example, if there is a set of documents, Dp, from the politics class and another set of documents, Ds, from sports, then classification involves assigning an unlabeled document d a label; equivalently, we assign d to one of the two classes, politics or sports, using a classifier learnt from Dp ∪ Ds.
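To make the notation concrete, the following is a minimal sketch of this setup in Python, using the nearest neighbor classifier (NNC) mentioned in the Preface to play the role of the model Mc; the two small classes and the unlabeled pattern are hypothetical values chosen only for illustration:

    import numpy as np

    # Hypothetical training set D = D1 U D2: three 2-dimensional patterns per class.
    D = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],     # patterns from Class1
                  [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])    # patterns from Class2
    labels = np.array([1, 1, 1, 2, 2, 2])                 # class label of each training pattern

    def nnc(X, D, labels):
        """Assign to the unlabeled pattern X the label of its nearest training pattern."""
        return labels[np.argmin(np.linalg.norm(D - X, axis=1))]

    X = np.array([1.8, 1.2])    # an unlabeled pattern
    print(nnc(X, D, labels))    # prints 1: X is closest to a Class1 pattern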

There could be some more details associated with the definition given above. They are:

- A pattern Xj may belong to one or more classes. For example, a document could be dealing with both sports and politics. In such a case we have multiple labels associated with each pattern. In the rest of the book we assume that a pattern has only one class label associated with it.
- It is possible to view the training data as a matrix D of size n × d, where the number of training patterns is n and each pattern is d-dimensional. This view permits us to treat D both as a set and as a pattern matrix. In addition to the d features used to represent each pattern, we have the class label for each pattern, which could be viewed as the (d + 1)th feature. So, a labeled set of n patterns could be viewed as {(X1, C^1), (X2, C^2), ..., (Xn, C^n)} where C^i ∈ {Class1, Class2, ..., ClassC} for i = 1, 2, ..., n. Also, the data matrix could be viewed as an n × (d + 1) matrix with the (d + 1)th column holding the class labels.
- We evaluate the classifier learnt using a separate set of patterns, called the test set. Each of the m test patterns comes with a class label, called the target label, and is labeled using the classifier learnt; this assigned label is the obtained label. A test pattern is correctly classified if the obtained label matches the target label and is misclassified if they mismatch. If, out of m patterns, mc are correctly classified, then the % accuracy of the classifier is 100 × mc/m.
- In order to build the classifier we use a subset of the training set, called the validation set, which is kept aside. The classification model is learnt using the training set and the validation set is used as a test set to tune the model or obtain the parameters associated with the model. Even though there are a variety of schemes for validation, K-fold cross-validation is popularly used. Here, the training set is divided into K equal parts; one of them is used as the validation set and the remaining K − 1 parts form the training set. We repeat this process K times, considering a different part as the validation set each time, and compute the accuracy on the validation data. So, we get K accuracies; typically we present the sample mean of these K accuracies as the overall accuracy and also show the sample standard deviation along with the mean accuracy. An extreme case of validation is n-fold cross-validation, where the model is built using n − 1 patterns and is validated using the remaining pattern. A small illustrative sketch of this procedure is given after this list.
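A minimal sketch of the accuracy computation and K-fold cross-validation described above is given below; the two-blob data, the choice of K = 5, and the simple nearest neighbor rule standing in for the classifier are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    def nnc_label(x, train_X, train_y):
        """Label x with the class of its nearest training pattern (illustrative 1-NN rule)."""
        return train_y[np.argmin(np.linalg.norm(train_X - x, axis=1))]

    def k_fold_accuracy(D, labels, K=5):
        """Sample mean and standard deviation of the validation accuracy over K folds."""
        folds = np.array_split(rng.permutation(len(D)), K)    # shuffle, then split into K parts
        accuracies = []
        for k in range(K):
            val = folds[k]
            trn = np.concatenate([folds[j] for j in range(K) if j != k])
            m_c = sum(nnc_label(D[i], D[trn], labels[trn]) == labels[i] for i in val)
            accuracies.append(100.0 * m_c / len(val))         # accuracy = 100 * m_c / m
        return np.mean(accuracies), np.std(accuracies)

    # Hypothetical data: 25 patterns around (0, 0) labeled 1 and 25 around (5, 5) labeled 2.
    D = np.vstack([rng.normal(0.0, 1.0, (25, 2)), rng.normal(5.0, 1.0, (25, 2))])
    labels = np.array([1] * 25 + [2] * 25)
    print(k_fold_accuracy(D, labels, K=5))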

2. Clustering: Clustering is viewed as grouping a collection of patterns. Formally we may define the problem as follows:

- There is a set, D, of n patterns in a d-dimensional space. A generally projected view is that these patterns are unlabeled.
- Partition the set D into K blocks C1, C2, ..., CK; Ci is called the ith cluster. This means Ci ∩ Cj = φ for i ≠ j and Ci ≠ φ, for i, j ∈ {1, 2, ..., K}.
- In classification an unlabeled pattern X is assigned to one of C classes, and in clustering a pattern X is assigned to one of K clusters. A major difference is that classes have semantic class labels associated with them, whereas clusters have syntactic labels. For example, politics and sports are semantic labels; we cannot arbitrarily relabel them. However, in the case of clustering we can change the labels arbitrarily, but consistently. For example, suppose D is partitioned into two clusters C1 and C2, so that the clustering of D is πD = {C1, C2}. We can then relabel C1 as C2 and C2 as C1 consistently and have the same clustering (the set {C1, C2}) because elements in a set are not ordered.

Some of the possible variations are as follows:

- In a partition a pattern can belong to only one cluster. However, in soft clustering a pattern may belong to more than one cluster. There are applications that require soft clustering.
- Even though clustering is viewed conventionally as partitioning a set of unlabeled patterns, there are several applications where clustering of labeled patterns is useful. One application is in efficient classification.

We illustrate the pattern recognition tasks using the two-dimensional dataset shown in Figure 1.1. There are nine points from class X, labeled X1, X2, ..., X9, and 10 points from class O, labeled O1, O2, ..., O10. It is possible to cluster patterns in each class separately. One such grouping is shown in Figure 1.1. The Xs are clustered into two groups and the Os are also clustered into two groups; in general there is no requirement that there be an equal number of clusters in each class. Also, we can deal with more than two classes. Different algorithms might generate different clusterings of each class. Here, we are using the class labels to cluster the patterns, as we are clustering patterns in each class separately.

[Figure 1.1. Classification and clustering: a two-dimensional dataset with features F1 and F2, points X1-X9 from class X, points O1-O10 from class O, and two test points t1 and t2.]
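A minimal sketch of this per-class grouping, and of the centroid representatives discussed next, is given below; the coordinates only mimic the rough layout of Figure 1.1 rather than reproducing it, and a few iterations of the standard K-means updates stand in for whichever clustering algorithm one prefers:

    import numpy as np

    rng = np.random.default_rng(0)

    def k_means(points, K=2, iters=10):
        """A few Lloyd iterations: assign points to the nearest centroid, then recompute centroids."""
        centroids = points[rng.choice(len(points), K, replace=False)]
        for _ in range(iters):
            dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
            assign = dists.argmin(axis=1)
            centroids = np.array([points[assign == k].mean(axis=0) if np.any(assign == k)
                                  else centroids[k] for k in range(K)])
        return assign, centroids

    # Hypothetical coordinates mimicking the two groups of Xs and the two groups of Os.
    X_class = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0],
                        [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0], [5.5, 7.0]])
    O_class = np.array([[9.0, 1.0], [10.0, 1.0], [9.0, 2.0], [10.0, 2.0], [11.0, 1.0],
                        [13.0, 5.0], [14.0, 5.0], [13.0, 6.0], [14.0, 6.0], [13.5, 7.0]])

    for name, pts in (("class X", X_class), ("class O", O_class)):
        assign, centroids = k_means(pts, K=2)
        print(name, "cluster sizes:", np.bincount(assign, minlength=2))
        print("  centroids (possible compressed training data):", centroids.round(2).tolist())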

Further, we can represent each cluster by its centroid, medoid, or median, which helps in data compression; it is sometimes adequate to use the cluster representatives as training data so as to reduce the training effort in terms of both space and time. We discuss a variety of algorithms for clustering data in later chapters.

1. Classifiers: An Introduction

In order to get a feel for classification we use the same data points shown in Figure 1.1. We also considered two test points labeled t1 and t2. We briefly illustrate some of the p

