Weighted Model-Based Clustering for Remote Sensing Image Analysis


Joseph W. Richards
Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213
(jwrichar@stat.cmu.edu)

Johanna Hardin
Department of Mathematics, Pomona College, Claremont, CA 91711
(jo.hardin@pomona.edu)

Eric B. Grosfils
Department of Geology, Pomona College, Claremont, CA 91711
(egrosfils@pomona.edu)

Abstract

We introduce a weighted method of clustering the individual units of a segmented image. Specifically, we analyze geologic maps generated from quantitative analysis of remote sensing images, and provide geologists with a powerful method to numerically test the consistency of a mapping with the entire multidimensional dataset of that region. Our weighted model-based clustering method (WMBC) employs a weighted likelihood and assigns fixed weights to each unit corresponding to the number of pixels located within the unit. WMBC characterizes each unit by the means and standard deviations of the pixels within each unit, and uses the Expectation-Maximization (EM) algorithm with a weighted likelihood function to cluster the units. With both simulated and real data sets, we show that WMBC is more accurate than standard model-based clustering.

KEY WORDS: Weighted likelihood; Mixture model; EM algorithm; Geologic map.

1 INTRODUCTION

As advancements in technology increase our ability to collect massive data sets, statisticians are in constant pursuit of efficient and effective methods to analyze large amounts of information. There is no better example of this than in the study of multi- and hyperspectral images that commonly contain millions of pixels.

Powerful clustering methods that automatically classify pixels into groups are in high demand in the scientific community. Image analysis via clustering has been used successfully on problems in a variety of fields, including tissue classification in biomedical images, unsupervised texture image segmentation, analysis of images from molecular spectroscopy, and detection of surface defects in manufactured products (see Fraley and Raftery (1998) for more references).

Model-based clustering (Banfield and Raftery 1993; Fraley and Raftery 2002) has demonstrated very good performance in image analysis (Campbell, Fraley, Murtagh, and Raftery 1997; Wehrens, Buydens, Fraley, and Raftery 2004). Model-based clustering uses the Expectation-Maximization (EM) algorithm to fit a mixture of multivariate normal distributions to a data set by maximum likelihood estimation. A combination of initialization via model-based hierarchical clustering and iterative relocation using the EM algorithm has been shown to produce accurate and stable clusters in a variety of disciplines (Banfield and Raftery 1993).

In this paper, we examine the case where manual partitioning of the image has been performed prior to attempts to classify each resulting partition. This situation often arises in the analysis of remote sensing data, where geologic maps, divisions of regions of land into units, are created by geologists based on analysis of radar and physical property images (see USGS 2005).

In these examples, although the regions are already subdivided into disjoint material units, our goal as statisticians is to allocate the units into groups defined by the quantitative pixel measurements. Clustering the numeric pixel values permits us to quantitatively evaluate the (usually qualitative) work performed by the geologists, and gives geologists a powerful method to numerically validate their work, compare different geologic maps of the same region, and test the consistency of the defined material units with respect to the entire available multi-dimensional dataset.

A geologic map is meant to convey the mapmaker's interpretation of the region depicted. If multiple geologists map the same area and then compare their results, it is likely that some percentage of their boundaries and unit definitions will be very closely matched, while other areas will bear little resemblance from one map to the next. To improve the mapping process and enhance what can be learned from the maps that are generated, it is necessary to develop new approaches that can be used to evaluate whether material units, defined qualitatively on the basis of geological criteria within a given region, also have robust, self-similar quantitative properties that can be used to characterize the nature of the surface more completely. This is particularly critical for maps generated on the basis of radar data interpretation, as the quantitative properties recorded by the data depend strongly upon the sub-pixel scale physical characteristics of the planet's surface.

The thesis of our paper is that by using the means and standard deviations of the pixel values within each unit of a segmented image, one obtains accurate clustering results from a model-based clustering likelihood that weights each unit by the number of pixels contained within the unit. Using the means and standard deviations of the pixel values simultaneously reduces the size of our data set (from millions of pixels to a few hundred groups) and gives information about the central tendencies and variability of the pixels in a unit. Geologically, this combination can yield important quantitative insight into the properties of the surface. For instance, in topography data a smooth, flat plains unit and a highly deformed unit may lie at the same mean elevation, but the high standard deviation for the deformed unit provides a quantitative way to assess the amount and pervasiveness of the deformation that has occurred. Similarly, in backscatter data a uniform, flat plains unit formed by regional flooding by lavas may share a mean value with a heavily mottled plains unit formed by overlapping deposits erupted from thousands of small volcanoes, but the two will have distinct variances. In this paper, we show that our weighted clustering method substantially outperforms an analogous non-weighted method and generally yields better results than a technique that downweights outliers based on distances (Markatou, Basu, and Lindsay 1998).
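As a concrete illustration of these unit summaries, the sketch below computes, for each mapped unit, the per-band mean and standard deviation of its pixel values together with the unit's pixel count, which is the basis of its weight. This is a minimal Python sketch under our own naming (pixel_values, unit_labels); it is not the authors' code.

    import numpy as np

    def unit_summaries(pixel_values, unit_labels):
        """Summarize a segmented image for clustering.

        pixel_values : (n_pixels, n_bands) array of pixel measurements
        unit_labels  : (n_pixels,) array giving the unit each pixel belongs to
        Returns the unit identifiers, a feature matrix holding each unit's
        per-band means and standard deviations, and each unit's pixel count.
        """
        units = np.unique(unit_labels)
        features, counts = [], []
        for u in units:
            pix = pixel_values[unit_labels == u]
            # each unit is characterized by the means and standard deviations
            # of the pixels it contains
            features.append(np.concatenate([pix.mean(axis=0),
                                            pix.std(axis=0, ddof=1)]))
            counts.append(len(pix))
        return units, np.array(features), np.array(counts)

The counts can then be rescaled into (0, 1] to serve as the fixed weights w_i; dividing by the largest count is one natural choice (an assumption on our part, not a prescription from the method).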

In Section 2, we briefly describe model-based clustering and the weighted likelihood function and integrate the two into a weighted model-based clustering method. In Section 3, we design and perform simulations to compare our weighted model-based clustering technique to other model-based clustering techniques in a variety of situations. In Section 4, we apply our technique to a real remote sensing data set. Finally, we conclude with a few comments in Section 5.

2 WEIGHTED MODEL-BASED CLUSTERING (WMBC)

In standard model-based clustering, multivariate observations (x_1, ..., x_n) are assumed to come from a mixture of G multivariate normal distributions with density

    f(x) = \sum_{k=1}^{G} \tau_k \, \phi(x \mid \mu_k, \Sigma_k),    (1)

where the τ_k's are the strictly positive mixing proportions of the model that sum to unity and φ(x | µ, Σ) denotes the multivariate normal density with mean vector µ and covariance matrix Σ evaluated at x.

The general framework for the geometric constraints across clusters was proposed by Banfield and Raftery (1993) through the eigenvalue decomposition of the covariance matrix in the form

    \Sigma_k = \lambda_k D_k A_k D_k^{T},    (2)

where D_k is an orthogonal matrix of eigenvectors, A_k is a diagonal matrix whose entries are proportional to the eigenvalues, and λ_k is a constant that describes the volume of cluster k. These parameters are treated as independent and can either be constrained to be the same for each cluster or allowed to vary across clusters. For example, the model Σ_k = λ_k D_k A D_k^T (denoted VEV) assumes varying volumes, equal shapes, and varying orientations for each cluster. The completely unconstrained model is denoted VVV. For a thorough discussion of these and other models and the MLE derivation for Σ, see Celeux and Govaert (1995).
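The decomposition in (2) can be made concrete with a few lines of code. The sketch below, written in Python with our own function and variable names (an illustration, not code from the paper), assembles a covariance matrix from a volume λ_k, the diagonal of a shape matrix A_k, and an orientation D_k, and evaluates the mixture density (1).

    import numpy as np
    from scipy.stats import multivariate_normal

    def covariance_from_decomposition(volume, shape_diag, orientation):
        """Sigma_k = lambda_k * D_k * A_k * D_k^T, as in eq. (2).

        volume      : scalar lambda_k controlling the cluster volume
        shape_diag  : (d,) diagonal of A_k, proportional to the eigenvalues
        orientation : (d, d) orthogonal matrix D_k of eigenvectors
        """
        return volume * orientation @ np.diag(shape_diag) @ orientation.T

    def mixture_density(x, tau, means, covariances):
        """f(x) = sum_k tau_k * phi(x | mu_k, Sigma_k), as in eq. (1)."""
        return sum(t * multivariate_normal(mean=m, cov=S).pdf(x)
                   for t, m, S in zip(tau, means, covariances))

Constraining some of these ingredients to be shared across clusters (for example, a common shape matrix A under the VEV model) yields the constrained covariance structures described above.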

Starting with some initial partition of the n units into G groups, we use the Expectation-Maximization (EM) algorithm (Dempster, Laird, and Rubin 1977; McLachlan and Krishnan 1997) to update our partition such that the parameter estimates of the clusters maximize the mixture likelihood. Hierarchical agglomeration has been used successfully to obtain an initial partition (Banfield and Raftery 1993). The EM algorithm iterates between an M-step and an E-step. The M-step calculates the cluster parameters µ, Σ, and τ using the maximum likelihood estimates (MLEs) of the complete-data loglikelihood,

    \ell(\mu, \Sigma, \tau \mid x, z) = \sum_{i=1}^{n} \sum_{k=1}^{G} z_{ik} \left[ \log\left( \tau_k \, \phi(x_i \mid \mu_k, \Sigma_k) \right) \right],    (3)

based on the current allocation of the units into groups, z. These MLEs are

    \hat{\mu}_k = \frac{\sum_{i=1}^{n} \hat{z}_{ik} x_i}{\sum_{i=1}^{n} \hat{z}_{ik}},    (4)

    \hat{\tau}_k = \frac{\sum_{i=1}^{n} \hat{z}_{ik}}{n},    (5)

and a model-dependent estimate of Σ̂_k (Celeux and Govaert 1995). The E-step calculates the conditional probability that a unit x_i comes from the k-th group using the equation

    \hat{z}_{ik} = \frac{\hat{\tau}_k \, \phi(x_i \mid \hat{\mu}_k, \hat{\Sigma}_k)}{\sum_{j=1}^{G} \hat{\tau}_j \, \phi(x_i \mid \hat{\mu}_j, \hat{\Sigma}_j)},    (6)

based on the current cluster parameters. The M-E iteration continues until the value of the loglikelihood function converges.
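The two steps can be written compactly. The sketch below is a schematic Python rendering under our own naming, with the M-step estimates (4)-(5) and, in place of the model-dependent Σ̂_k, the per-cluster responsibility-weighted sample covariance of the fully unconstrained (VVV) model; the E-step computes the responsibilities (6).

    import numpy as np
    from scipy.stats import multivariate_normal

    def m_step(X, z):
        """MLEs (4)-(5), with unconstrained (VVV) covariance estimates."""
        n, G = z.shape
        tau = z.sum(axis=0) / n                              # eq. (5)
        mu = (z.T @ X) / z.sum(axis=0)[:, None]              # eq. (4)
        Sigma = []
        for k in range(G):
            d = X - mu[k]
            Sigma.append((z[:, k, None] * d).T @ d / z[:, k].sum())
        return tau, mu, np.array(Sigma)

    def e_step(X, tau, mu, Sigma):
        """Conditional membership probabilities, eq. (6)."""
        dens = np.column_stack(
            [tau[k] * multivariate_normal(mu[k], Sigma[k]).pdf(X)
             for k in range(len(tau))])
        return dens / dens.sum(axis=1, keepdims=True)

Alternating m_step and e_step from an initial allocation z, and stopping when the loglikelihood (3) stops changing, reproduces the iteration described above.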

In standard model-based clustering (SMBC), each data point is given equal importance in the model. However, there are situations in which some data points are more accurately measured than others, and therefore deserve higher weight in the model. For example, in segmented pixelated data, those units with more pixels will have means and standard deviations that better approximate the true parameters of the underlying distribution. In SMBC, the ability of data point x_i to determine the parameters of cluster k depends only on z_ik, the posterior probability that the unit belongs to that group. To give units unequal weights, we introduce the weighted likelihood (WL) (Newton and Raftery 1994; Markatou et al. 1998; Agostinelli and Markatou 2001), in which each data point receives a fixed weight w_i ∈ (0, 1] based on the number of pixels located inside the unit, with higher weights giving more influence in estimating the parameters. In general, the WL function for n independent data points is

    \tilde{L}(\theta) = \prod_{i=1}^{n} f_i(x_i \mid \theta)^{w_i},    (7)

where f_i is the density function for point x_i and θ is a set of parameters. The weighted maximum likelihood estimator (WLE) has been shown to be consistent and asymptotically normal under fixed weights (Wang, van Eeden, and Zidek 2004).
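In log form, (7) is simply a weighted sum of log densities. A one-function sketch (our own naming) makes the role of the fixed weights explicit:

    import numpy as np

    def log_weighted_likelihood(log_densities, weights):
        """log of eq. (7): sum_i w_i * log f_i(x_i | theta).

        log_densities : (n,) array of log f_i(x_i | theta)
        weights       : (n,) array of fixed weights w_i in (0, 1]
        """
        return float(np.sum(weights * log_densities))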

The weighted mixture model loglikelihood equation (Markatou 2000) is

    \tilde{\ell}(\mu, \Sigma, \tau \mid x, z) = \sum_{i=1}^{n} \sum_{k=1}^{G} w_i z_{ik} \left[ \log\left( \tau_k \, \phi(x_i \mid \mu_k, \Sigma_k) \right) \right],    (8)

whose only difference from (3) is the additional weights, w_i. As in SMBC, weighted model-based clustering (WMBC) begins with some partition of the data points and proceeds to the M-step, where the WLEs are computed. For each k = 1, ..., G, the WLE for µ_k is

    \hat{\mu}_k = \frac{\sum_{i=1}^{n} w_i \hat{z}_{ik} x_i}{\sum_{i=1}^{n} w_i \hat{z}_{ik}},    (9)

compared to the MLE for µ_k, (4). Similarly, the WLE for the mixing proportion τ_k is

    \hat{\tau}_k = \frac{\sum_{i=1}^{n} w_i \hat{z}_{ik}}{\sum_{i=1}^{n} w_i},    (10)

compared to the MLE for τ_k, (5), while the WLE of the covariance matrix depends on the model selected. The E-step uses these estimates exactly as in the standard E-step (6), and the algorithm continues until the weighted loglikelihood (8) converges.
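Relative to the standard M-step sketched earlier, only the estimators change: each responsibility ẑ_ik is multiplied by the fixed weight w_i. The following sketch (again our own schematic code, with an unconstrained covariance estimate standing in for the model-dependent one) implements the weighted updates (9) and (10) and the weighted loglikelihood (8) used to monitor convergence.

    import numpy as np
    from scipy.stats import multivariate_normal

    def weighted_m_step(X, z, w):
        """WLEs (9)-(10); the covariance shown is the unconstrained analogue."""
        wz = w[:, None] * z                                  # w_i * z_ik
        tau = wz.sum(axis=0) / w.sum()                       # eq. (10)
        mu = (wz.T @ X) / wz.sum(axis=0)[:, None]            # eq. (9)
        Sigma = []
        for k in range(z.shape[1]):
            d = X - mu[k]
            Sigma.append((wz[:, k, None] * d).T @ d / wz[:, k].sum())
        return tau, mu, np.array(Sigma)

    def weighted_complete_loglik(X, z, w, tau, mu, Sigma):
        """Weighted complete-data loglikelihood, eq. (8)."""
        ll = 0.0
        for k in range(len(tau)):
            logf = np.log(tau[k]) + multivariate_normal(mu[k], Sigma[k]).logpdf(X)
            ll += np.sum(w * z[:, k] * logf)
        return ll

The E-step is unchanged from (6), so the e_step function from the earlier sketch can be reused without modification.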

3 SIMULATED DATA

3.1 Simulation Design

Before using our WMBC technique to cluster real data sets, we first use simulated data to compare the accuracy of WMBC clusters to those of other model-based clustering techniques in a variety of situations. In each simulation, we generate several units, where each unit consists of a random number of pixels generated from a uniform [500, 50000] distribution and each pixel is assigned a value from a predefined bivariate normal distribution.

We are justified in simulating the pixel values with a normal distribution (when in actuality pixel values need not be distributed normally) because the data summaries we use in the mixture likelihood are the means and standard deviations of these pixels. Regardless of the distribution of the pixel values, their mean is asymptotically normally distributed by the Central Limit Theorem, and by a combination of Slutsky's Theorem, the Central Limit Theorem, and the Delta Method, their standard deviation is also asymptotically normally distributed. Therefore, no matter the distribution of the pixel values, a multivariate normal mixture model is appropriate for modeling the summary statistics used in clustering the units.

We simulate units from different bivariate normal distributions corresponding to different groups. Since we are simulating the data, we know from which distribution (population) each data point is generated. Therefore we can compare different clustering techniques by comparing the number of points that are correctly classified in each. Throughout this section we assume that the number of groups is known, and we initialize the clusters with unsupervised model-based hierarchical classification. We use the covariance model VEV described in Section 2.
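This design can be summarized in a few lines of code. The sketch below is our own rendering of the generation scheme just described; the function names and the choice to summarize each unit by its coordinate-wise pixel means are our own assumptions. Each unit's pixel count is retained as the basis for its weight.

    import numpy as np

    def simulate_unit(mean, cov, rng):
        """One unit: a uniform [500, 50000] pixel count and that many
        bivariate-normal pixel values, reduced to a summary and a weight."""
        n_pix = int(rng.integers(500, 50001))
        pixels = rng.multivariate_normal(mean, cov, size=n_pix)
        return pixels.mean(axis=0), n_pix

    def simulate_group(n_units, mean, cov, rng):
        """All units of one group, as stacked summaries and pixel counts."""
        summaries, counts = zip(*(simulate_unit(mean, cov, rng)
                                  for _ in range(n_units)))
        return np.array(summaries), np.array(counts)

For example, simulate_group(100, mean1, cov1, np.random.default_rng(1)) produces the summaries and weights for the 100 units of one group in a single trial.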

3.2 Two Cluster Simulations

In this section, we compare WMBC to SMBC for situations where there are two groups (i.e., unit types). In each trial we simulate 100 units from each of two bivariate normal distributions. These distributions have parameters

    \mu_1 = \begin{pmatrix} x \\ 5 \end{pmatrix}, \qquad
    \Sigma_1 = \begin{pmatrix} 180 & r_1 \sqrt{180 \cdot 170} \\ r_1 \sqrt{180 \cdot 170} & 170 \end{pmatrix},

    \mu_2 = \begin{pmatrix} 4 \\ 5 \end{pmatrix}, \qquad
    \Sigma_2 = \begin{pmatrix} 170 & r_2 \sqrt{170 \cdot 160} \\ r_2 \sqrt{170 \cdot 160} & 160 \end{pmatrix},

where r_1 and r_2 are independent, random (uniform on -1 to 1) correlations, and x takes on each of 21 values ranging from 2 to 4, in steps of 0.1.
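For reference, these group parameters can be assembled as follows (a small Python helper in our own notation); the off-diagonal covariance entries are the random correlations scaled by the product of the two standard deviations.

    import numpy as np

    def two_group_parameters(x, rng):
        """Means and covariances of the two groups at mean separation 4 - x."""
        mu1, mu2 = np.array([x, 5.0]), np.array([4.0, 5.0])
        r1, r2 = rng.uniform(-1.0, 1.0, size=2)     # random correlations
        cov1 = np.array([[180.0, r1 * np.sqrt(180.0 * 170.0)],
                         [r1 * np.sqrt(180.0 * 170.0), 170.0]])
        cov2 = np.array([[170.0, r2 * np.sqrt(170.0 * 160.0)],
                         [r2 * np.sqrt(170.0 * 160.0), 160.0]])
        return mu1, cov1, mu2, cov2

Combining this with simulate_group from the previous sketch generates one complete data set of 200 unit summaries for a given spacing of the means.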

For each of these 21 spacings of the means of the two groups, we generate 1000 data sets and cluster each one using both the weighted and standard model. Because we cluster each data set with both WMBC and SMBC, we can directly compare the two techniques for a variety of situations (ranging from widely spaced to heavily overlapping clusters).

Results show that WMBC is more accurate for each separation of the means of the two groups, and is far superior to SMBC when the groups are closer together. Table 1 reveals that for each separation in the two groups, the average number of correct classifications for WMBC is greater than the average number of correct classifications for SMBC, and each difference is significant at the 0.0001 level using both a paired t-test and a non-parametric paired Wilcoxon test. Figure 1 shows that for each of the 21 separations of the group means, WMBC produces a more accurate clustering than SMBC in a higher proportion of data sets than vice versa. When cluster means are close together, WMBC is highly superior, averaging more than 4.5 more correctly classified units per data set and better clusterings in over 75% of simulations. When clusters are widely spaced, WMBC is also significantly better but loses much of its superiority because the majority of simulations result in ties between WMBC and SMBC.

WMBC performs better than SMBC because it is not easily distracted by outlying data points. Outliers generally come from data generated from a small number of pixels, and thus are downweighted by WMBC and largely ignored by the clusters. In SMBC, however, clusters react more strongly to outliers, growing in volume and subsequently claiming points that belong to other groups. When clusters are close or overlapping, outliers can cause a cluster to grow to encompass a large part of another cluster, producing a highly erroneous classification. In WMBC this is avoided because points with large weights are generated from many pixels, and thus are extremely likely to be near the true cluster center. When clusters are widely spaced, the advantage enjoyed by WMBC is somewhat lost, as clusters are less likely to grow so much as to claim data points belonging to another cluster.

Next, using the same simulation model described above, we simulate clusters of several different sizes to show that WMBC is superior to SMBC under varied conditions. To simplify our results, instead of considering all 21 spacings of the clusters as we did above, we look at only three: widely spaced (separation of means of 1.5), intermediately spaced (separation of 0.7), and overlapping (separation of 0.1).

When there are an equal number of units in each group, WMBC produces more accurate classifications than SMBC for each of several group sizes (Table 2). For each separation in the centers of the groups, a much higher percentage of the simulations result in more accurate clusters by the WMBC method. The average number of correct classifications is higher for the weighted method in each simulation, and for all but the smallest group size (10) the difference is significant at the 0.0001 level using a paired Wilcoxon test. Again, WMBC performs best when the cluster centers are very close together.

When the groups have an unequal number of units, we again observe that WMBC outperforms SMBC (Table 3). In each simulation, we randomly assigned which group had more data points. The mean number of correct classifications was greater for the weighted method in every situation, with larger discrepancies when the clusters overlapped, and each difference was significant at the 0.0001 level.

3.3 Distance Weights

A weighted-likelihood model that downweights observations inconsistent with the model (outliers) was introduced by Markatou et al. (1998). They introduce weights based on the Pearson residual, δ, where the weights are defined as

    w(\delta) = 1 - \frac{\delta^2}{(\delta + 2)^2}.    (11)

The weights take on values on the interval [0, 1], with smaller weights corresponding to data points with high Pearson residuals. For a thorough discussion of the construction of the weight equation, see Markatou et al. (1998).

We compare a clustering method that weights based on Mahalanobis distance (DW) using (11) to our pixel-weighting technique (PW). Like the DW technique, PW downweights outliers, since any point that is an outlier is likely to come from a unit with a small number of pixels. Hence, we postulate that these two methods will produce similar results.

Results in Table 4 show that the relative performances of the two methods depend on the amount of separation in the clusters. When the clusters are widely spaced, DW tends to do better: in 5 of the 6 simulations DW had a higher average number of correct classifications than PW. However, only one of these simulations yielded a significant result at the 0.1 level (the simulation with 2 groups of 20 units each). Additionally, over 96% of the simulations resulted in ties in each widely-spaced comparison. When the clusters are intermediately spaced, PW outperformed DW in 5 of the 6 simulations, and produced significant differences at the 0.05 level in each of these five. When the clusters were closely spaced, PW outperformed DW in all six simulations, with significant differences in 5 of the 6 at the 0.0001 level.

Overall, PW outperformed DW: in 10 of our simulations PW yielded significantly better results (at the 0.05 level), compared to only 2 simulations in which DW significantly outperformed PW. The relative advantage of PW depends largely on the spacing of the clusters. Widely spaced clusters produce insignificant advantages for DW, while closer clusters give significant and highly significant advantages to PW. There was one anomalous situation, in which the two group sizes were 20 and 20, where DW consistently performed better than PW.

A critical drawback to DW is that it requires many more iterations to converge. In 100 simulations, it took PW an average of 7.49 iterations to converge and DW an average of 18.68 iterations. Also, because the weights in DW are based on the Mahalanobis distance from each data point to the center of its cluster, these values continually change as points are reallocated and covariance matrices change, and thus must be recalculated, causing each iteration to take longer. The changing weights also account for the algorithm's difficulty in converging. For example, if a point is reallocated, it will cause its new cluster to stretch somewhat in its direction, subsequently causing the point's Mahalanobis distance to decrease and its weight to rise. On the next iteration, the point's higher weight will cause the cluster to stretch even more and the pattern to continue, resulting in clusters that are more unstable and less accurate than those produced by the fixed-weight method.
