
Expert Systems with Applications 37 (2010) 3318–3325
Contents lists available at ScienceDirect
Expert Systems with Applications
Journal homepage: www.elsevier.com/locate/eswa

A non-parametric heuristic algorithm for convex and non-convex data clustering based on equipotential surfaces

Farhad Bayat a,*, Ehsan Adeli Mosabbeb b, Ali Akbar Jalali a, Farshad Bayat c
a Electrical Engineering Department, Iran University of Science and Technology, Tehran, Iran
b Computer Engineering Department, Iran University of Science and Technology, Tehran, Iran
c Computer Engineering Department, Azad University of Zanjan, Zanjan, Iran
* Corresponding author. Tel.: +98 2177240487; fax: +98 2177240486. E-mail addresses: fbayat@iust.ac.ir (F. Bayat), eadeli@iust.ac.ir (E.A. Mosabbeb), ajalali@iust.ac.ir (A.A. Jalali), farshad.bayat@gmail.com (F. Bayat).

Keywords: Potential functions; unsupervised clustering; point location

Abstract

In this paper, a sub-optimal non-parametric algorithm for clustering convex and non-convex data is proposed, based on the concepts of field theory and potential functions. For this purpose, equipotential surfaces created by the interaction of the potential functions are applied. An equipotential surface is the geometric locus of points in the space on which the potential is constant; that is, all points on such a surface are affected equally by the field. Using this concept and other characteristics of equipotential surfaces, the outcome of the method would be an optimal solution to the clustering problem. However, because several parameters in the algorithm must be set, finding the globally optimal solution leads to high computational complexity and is therefore impractical. Thus, by applying some considerations and approximations, the resulting outcome is a sub-optimal solution, while appropriate setting of the parameters brings the result closer to the global optimum. The advantage of this method is that it does not need any external parameter setting, such as the number of clusters. To this end, an automatic parameter-setting algorithm based on an optimal clustering index is suggested. Simulation results for a number of standard datasets illustrate the strong performance of this method, especially for non-convexly scattered data. These characteristics are widely demanded in different scientific areas; here, the method has also been utilized in the well-known Point Location Problem (PLP) to reduce computational complexity.
© 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2009.10.019

1. Introduction

Clustering, as an unsupervised pattern classification method, has an important role in data analysis and refinement. Classification in general has a wide domain of applications, ranging from biology and biomedical imaging to military and archeology applications. Owing to the sensitivity and limitations of some of these applications, such as medical and military ones, data with known class labels for use in supervised classification algorithms are not readily available. Therefore, obtaining powerful and reliable unsupervised classification algorithms is very important. In recent decades, the clustering problem, as a tool for pattern analysis, classification, decision making, and information extraction and retrieval, has attracted the attention of many researchers. Several approaches and points of view are presented in the literature. Each of these approaches is based on a certain criterion and has its own advantages and disadvantages.
In general, a comprehensive method and criterion for optimal clustering of any kind of data does not exist.

In Dubes et al. (1976), different clustering algorithms were compared using the criteria presented in Fisher et al. (1971). A review of the results of applying certain constraints on data sources to enhance clustering performance was given by Titterington et al. (1985). The constraints applied in that method are based on a mixture of an unknown number of probability density functions with multivariate Gaussian components, in which clustering tries to extract the probability density functions and their parameters. The development of clustering applications is presented for pattern recognition by Anderberg (1973), image processing by Jain et al. (1996), and information retrieval by Salton (1991) and Rasmussen (1992).

General requirements of a data clustering system are: scalability, the ability to recognize clusters of different shapes, sizes, and densities, robustness against noise and disturbance, a small number of input parameters, etc. Based on these criteria, the efficiency and performance of clustering algorithms are determined. By these measures, older hierarchical clustering algorithms have very high computational complexity and are qualitatively weak (George et al., 1999). For instance, the Complete-Link method is biased toward spherical clusters and the Single-Link method

suffers from chaining (Oyang et al., 2001), while newer clustering techniques that combine hierarchical and partitioning methods (Gan, 2003; Gan et al., 2003) yield higher quality with less computational complexity. A hybrid genetic fuzzy k-modes algorithm has been proposed by Gan, Wu, and Yang (2009); they optimized the fuzzy k-modes clustering algorithm using a genetic algorithm to avoid getting stuck in local optima of k-modes clustering. Hsu and Huang (2008) presented a learning-based algorithm to cluster data with either categorical or numeric values, using a modified version of an adaptive resonance theory (ART) unsupervised neural network. Wang et al. (2009) also presented a clustering algorithm based on extension theory and a genetic algorithm (EGA). One of the weak points of most clustering algorithms is the challenging case of non-convexly scattered data and the question of how each algorithm behaves in such a circumstance.

Different metrics for cluster analysis have been proposed in different works, such as entropy, purity, and mutual information. Park and Jun (2009) proposed a k-means-like algorithm, known as k-medoids, which is efficient from a complexity point of view; the authors proposed a Rand index to evaluate clustering performance. Wei, Lee, and Hsu (2003) carried out an empirical comparison of some partition-based clustering techniques. They introduced some characteristics of the data, such as data size, number of clusters, cluster distinctness, cluster asymmetry, and data randomness, and analyzed the effects of changes in each of these parameters on the clustering results. Wu et al. (2009) introduced cluster validation measures to evaluate k-means clusters: normalized variation of information (VI), the van Dongen criterion (VD), and the Mirkin metric (M) are used as cluster quality quantifiers. They showed that using these metrics can avoid bias in the clustering process. All the metrics and cluster quality measures used in these works are for convex datasets. Some studies (Mitra, Pal, & Siddiqi, 2003; Pal, Ghosh, & Uma Shankar, 2000) have introduced metrics that also apply to non-convex data clusters. A similar metric is used in this paper and is explained later.

In this paper, a clustering method for both convex and non-convex data is presented, using the idea of potential functions and the equipotential surfaces arising from the interaction of fields. In this method, a potential function is assigned to each data sample; then, through the interaction of the fields in feature space and the extraction of the equipotential surfaces, the clustering procedure can be carried out. It is important to note that most unsupervised classification techniques are based on the degree of similarity of the feature vectors, such that the members of a class are, in general, the most similar to each other. Given this, and the fundamental concepts of field theory in physics, applying equipotential surfaces as the discriminant boundaries between clusters leads toward the optimal solution. This is because an equipotential surface is the geometric locus of points in space that have the same average membership (similarity) to the class inside and outside the boundary. Thus, by proper setting of the reference potential level as the decision boundary, an optimal solution can be obtained.
Finally, using cluster quality measures, a simple algorithm is presented that can set the reference potential level automatically. This approach determines the number of clusters by itself, so no parameters are left to be set externally. If one wants to force a particular number of clusters, the reference potential can be changed by trial and error. Moreover, the proposed method makes it possible to construct a hierarchical non-parametric data clustering, which is widely demanded in different areas. As an application, this method has been used to reduce the computational complexity of the Point Location Problem (PLP), which is the most time-consuming part of explicit model predictive control (Bayat et al., 2009).

This paper is organized as follows: Section 2 introduces the concepts and definitions used in the subsequent sections. Section 3 presents the potential functions. Using the potential function concept, the clustering algorithm is described in Section 4. Finally, Section 5 illustrates some examples examining the performance of the proposed algorithm.

2. Mathematical background and definitions

As described above, the method presented in this paper is based on the concept of a field, which is one of the basic notions of physics and has a vast number of applications. Different kinds of fields in physics include magnetic, electric, gravitational, and nuclear force fields. Although each of these instances has its own definition, the common idea relating them is that instead of studying the mutual interaction between components (electric particles, for example), we can use the influence of the field on the components of the working set. In what follows, we utilize this simple concept to extract an optimal method for data clustering. The following definitions are standard and are used in defining the algorithm.

Definition 1. The pair $(D, \|\cdot\|)$, where $D \subseteq \mathbb{R}^n$ is a linear vector set and $\|\cdot\| : D \to \mathbb{R}^{+}$ is a real-valued function, is called a norm space if the following conditions hold:
(i) $\|X\| \ge 0, \quad \forall X \in D$,
(ii) $\|X\| = 0 \iff X = 0, \quad \forall X \in D$,
(iii) $\|aX\| = |a|\,\|X\|, \quad \forall X \in D,\ a \in \mathbb{R}$,
(iv) $\|X + Y\| \le \|X\| + \|Y\|, \quad \forall X, Y \in D$.

Definition 2. Assume $i = 1, \ldots, N$, $j = 1, \ldots, n$, and $D = \{X_i \in \mathbb{R}^n \mid X_i = (x_{i1}, \ldots, x_{in}),\ x_{ij} \in \mathbb{R}\}$. The norm space $(D, \|\cdot\|_2)$ is called a limited (bounded) norm space with respect to $\|\cdot\|_2$ if and only if there exists a scalar $0 \le h < \infty$ such that $\|X_i\|_2 \le h$ for all $X_i \in D$, where $\|X_i\|_2^2 = \sum_{j=1}^{n} x_{ij}^2$.

Definition 3. A scalar function $V(X) : \mathbb{R}^n \to \mathbb{R}$ is called a potential function if:
(i) $V(X)$ is a continuous, smooth function in the given limited norm space (later we will see that this space is in fact the feature-vector space).
(ii) $V(X)$ is isotropic, i.e., it behaves symmetrically in all dimensions.
(iii) If $V_i(X)$ is the potential function for component $X_i$, then increasing $\|X - X_i\|_2$ causes $V_i(X)$ to decrease, and $V_i(X) \to 0$ as $\|X - X_i\|_2 \to \infty$.

In what follows, assuming that the space of feature vectors in the clustering problem is a limited norm space, a clustering algorithm based on potential functions can be derived.

3. Establishing the proper potential function

Once more, assume a vector space with limited norm for the feature vectors:

$$D = \{X_i \in \mathbb{R}^n \mid X_i = (x_{i1}, \ldots, x_{in}),\ x_{ij} \in \mathbb{R}\}, \quad i = 1, \ldots, N,\ j = 1, \ldots, n \qquad (1)$$

where $X_i \in \mathbb{R}^n$ is the feature vector of the $i$th sample, and $x_{ij} \in \mathbb{R}$ is the $j$th feature of the $i$th feature vector. Also, $N$ is the number of patterns and $n$ is the number of features per pattern.
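To make the setting concrete, the following short sketch (Python/NumPy; the variable names are our own, not from the paper) represents the dataset of Eq. (1) as an $N \times n$ array and checks the bounded-norm condition of Definition 2 by computing a finite bound $h$; it is purely illustrative.

```python
import numpy as np

# Feature matrix: row i is the feature vector X_i = (x_i1, ..., x_in) of Eq. (1).
# The 14 two-dimensional points below are the artificial data used later in Example 1 (Fig. 4).
X = np.array([
    [2, 6], [3, 4], [4, 7], [5, 2], [7, 9], [8, 4], [1, 3],
    [3, 6], [1, 5], [9, 5], [8, 2], [9, 3], [7, 9], [7.5, 9],
])
N, n = X.shape  # N = number of patterns, n = number of features

# Definition 2: the space is a limited (bounded) norm space if some finite h
# bounds the Euclidean norm of every sample. Any finite dataset satisfies this.
h = np.linalg.norm(X, axis=1).max()
print(f"N = {N}, n = {n}, bound h = {h:.3f}")
```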

The potential function for the $i$th sample, satisfying the conditions of Definition 3, is taken as:

$$V_i(X) = e^{-B\|X - X_i\|_2^2}, \quad X = (x_1, \ldots, x_n) \in \text{feature space} \qquad (2)$$

where $B$ is a constant positive coefficient and $V_i(X)$ is the potential arising from sample $X_i$ at location $X$ in feature space. Therefore, $V_i(X)$ can be interpreted as the membership value of the point $X = (x_1, \ldots, x_n)$ of the feature space with respect to the feature vector $X_i$: it attains its maximum at $X = X_i$, and increasing the distance $\|X - X_i\|_2$ causes the membership value to decrease. Fig. 1 shows a potential function with $n = 2$, $X_i = (0, 0)$ and $B = 1$.

[Fig. 1. Diagram of a potential function with $n = 2$, $X_i = (0, 0)$ and $B = 1$.]

Since clustering basically means putting similar data in the same class, the membership value of a point in feature space can be used as the similarity measure. To this end, the average membership function is defined as:

$$\tilde{V}(X) = \frac{A}{N}\sum_{i=1}^{N} V_i(X) \qquad (3)$$

The constant coefficient $A$ is chosen such that $0 \le \tilde{V}(X) \le 1$. Finally, since the potential function in (2) is a monotonically decreasing function with a global maximum at $X = X_i$ (Fig. 1), $\tilde{V}(X)$ is a bounded function with several local maxima. The location and shape of these local maxima are determined by the density and distribution of the data in the feature space.

4. Clustering algorithm

In order to derive a clustering algorithm from the average membership function in Eq. (3), the concept of equipotential surfaces can be utilized. Equipotential surfaces are the geometric loci of points in the space that have the same average membership value; they form closed islands around the local maxima. Thus, depending on the reference potential level $V_{\mathrm{ref}}$, the number and shape of the islands differ. Points inside the area enclosed by an equipotential surface have an average potential value greater than $V_{\mathrm{ref}}$, the island boundary potential ($V(X) = V_{\mathrm{ref}}$). As a result, the points inside each island share the greatest similarity within the feature space, and therefore each island can be considered a cluster with its centroid at the location of maximum potential.

With this definition, as $V_{\mathrm{ref}}$ changes from $\tilde{V}_{\min}$ to $\tilde{V}_{\max}$, the number of islands (clusters) changes from 1 (all samples in one cluster) to $N$ (each sample a cluster of its own). In the following, the stages of the clustering algorithm are presented. First the algorithm is given for the special case $n = 2$ (each sample has a two-element feature vector), and then it is generalized.

4.1. Stages of the clustering algorithm

Since the case $n = 2$ is easier to grasp thanks to its graphical representation, and since it already covers many practical applications, the clustering algorithm is first described for $n = 2$ and then developed for the general case.

Stage 1. Choose $dx_1$ and $dx_2$, the search steps in feature space. These parameters should be chosen based on the expected resolution and accuracy. A good initial guess for $dx_i$ is half the minimum distance between any two points along the $i$th axis.

Stage 2. Divide the feature space according to $dx_1$, $dx_2$ within the boundaries $x_{1\min} \le x_1 \le x_{1\max}$ and $x_{2\min} \le x_2 \le x_{2\max}$, as shown in Fig. 2. Here $x_{i\min}$, $x_{i\max}$ are the variation boundaries of the $i$th feature in the space $D$; under the assumption that $D$ is a limited norm space, $x_{i\min}$ and $x_{i\max}$ certainly exist.

[Fig. 2. The feature space division within the boundaries $x_{2\min} \le x_2 \le x_{2\max}$, $x_{1\min} \le x_1 \le x_{1\max}$.]
The coefficient $B$ in Eq. (2) is a free tuning parameter. According to the authors' experience, an empirical rule that generally gives a good result is:

$$B \cong \frac{N}{\sqrt[n]{\prod_{i=1}^{n} (x_{i\max} - x_{i\min})}} \qquad (4)$$

Note that Eq. (4) is rooted in the geometric density of the space partitions.

Assuming $q = \left[\frac{x_{2\max} - x_{2\min}}{dx_2}\right]$ and $p = \left[\frac{x_{1\max} - x_{1\min}}{dx_1}\right]$, Stage 2 amounts to generating a $q \times p$ matrix in which element $(i, j)$ represents the vector $(x_{1i}, x_{2j})$ of the feature space and whose value equals the average potential at that point. This matrix is called the potential matrix ($PM_{q \times p}$) and is determined by:

$$PM(i, j) = \tilde{V}(X) \qquad (5)$$

where $i = 1, \ldots, p$, $j = 1, \ldots, q$, and $X = (i \cdot dx_1,\ j \cdot dx_2)$.

Stage 3. Using the PM matrix and choosing an appropriate value of $V_{\mathrm{ref}}$ in $[0, 1]$, the matrix NPM (Normalized Potential Matrix) is defined as:

$$NPM(i, j) = \mathrm{sign}(PM(i, j) - V_{\mathrm{ref}}) \qquad (6)$$
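The following sketch (Python/NumPy; the function and variable names are our own, not from the paper) illustrates Stages 1–3 for the $n = 2$ case: it evaluates the Gaussian potentials of Eq. (2), forms the normalized average membership of Eq. (3), picks $B$ with the empirical rule of Eq. (4) as reconstructed above, and thresholds the potential matrix at $V_{\mathrm{ref}}$ to obtain NPM as in Eqs. (5)–(6).

```python
import numpy as np

def average_potential(grid_points, X, B):
    """Eqs. (2)-(3): normalized average membership at each evaluation point.

    grid_points: (M, n) array of evaluation locations.
    X:           (N, n) array of data samples.
    B:           positive width coefficient of the Gaussian potential.
    """
    # Squared Euclidean distances between every grid point and every sample.
    d2 = ((grid_points[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    V = np.exp(-B * d2).mean(axis=1)   # (1/N) * sum_i V_i(X)
    return V / V.max()                 # A chosen so the maximum over the grid is 1

def build_npm(X, dx, v_ref, B=None):
    """Stages 1-3: grid the feature space, fill PM (Eq. 5), threshold it (Eq. 6)."""
    N, n = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    if B is None:
        # Eq. (4) as reconstructed above: B ~ N over the nth root of the product of ranges.
        B = N / np.prod(hi - lo) ** (1.0 / n)
    axes = [np.arange(lo[k], hi[k] + dx[k], dx[k]) for k in range(n)]
    mesh = np.meshgrid(*axes, indexing="ij")
    grid = np.stack([m.ravel() for m in mesh], axis=1)
    pm = average_potential(grid, X, B).reshape(mesh[0].shape)   # PM matrix, Eq. (5)
    npm = (pm >= v_ref).astype(int)                             # NPM matrix, Eq. (6)
    return pm, npm

# Example usage with the artificial data of Example 1 (dx1 = dx2 = 0.2, Vref = 0.04):
# pm, npm = build_npm(X, dx=[0.2, 0.2], v_ref=0.04)
```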

where $\mathrm{sign}(\cdot)$ is defined as:

$$\mathrm{sign}(s) = \begin{cases} 1, & s \ge 0 \\ 0, & s < 0 \end{cases}$$

Stage 4. As an example, assume that the matrix resulting from Stage 3 is as illustrated in Fig. 3.

[Fig. 3. NPM matrix obtained from Stage 3.]

In such a case, the elements of the NPM matrix with value 1 are the points lying inside the equipotential surfaces (for the chosen $V_{\mathrm{ref}}$). For the clustering purpose, the following rules are then applied:

(i) Two data points $a = (x_{1i}, x_{2j})$ and $b = (x_{1r}, x_{2t})$, with $i, r \in [1, \ldots, p]$ and $j, t \in [1, \ldots, q]$, are members of the same cluster iff there exists a path of non-zero elements from $a$ to $b$ in the matrix NPM.
(ii) Suppose there exist $N_c$ different clusters at the $L$th iteration. If a data point $c = (x_{1a}, x_{2b})$ is connected to any of the $N_c$ isolated clusters through a non-zero element of the matrix, $c$ is part of that cluster; otherwise, if $c$ is not connected to any of the $N_c$ clusters, $c$ is considered a new cluster.
(iii) Considering (i) and (ii), in order to reduce the amount of computation we use the fact that all elements inside a cluster are connected to each other. Therefore, it is sufficient to keep a single element of each cluster as its representative, and when checking whether to join data point $c$ to that cluster, a single path examination suffices.

Stage 5. After determining the location and number of clusters, if all the considered conditions and requirements are met, the optimal solution has been obtained. Otherwise, we return to Stage 3 and change $V_{\mathrm{ref}}$ to meet the required conditions. When setting $V_{\mathrm{ref}}$, it is important to keep in mind that increasing $V_{\mathrm{ref}}$ from 0 to 1 changes the number of clusters from 1 (all data samples in a single cluster) to $N$ (each data point a cluster of its own).

4.2. Clustering algorithm: general case

Extending the algorithm to the general case is straightforward. To this end, we only need to replace the following in the algorithm for the case $n = 2$:

Stage 1: Choose the search steps $dx_1, \ldots, dx_n$, as in the previous case.
Stage 2: Divide the feature space based on $dx_i$ ($i = 1, \ldots, n$) in the intervals $x_{i\min} \le x_i \le x_{i\max}$.
Stage 3: Assuming $p_i = (x_{i\max} - x_{i\min})/dx_i$, $i = 1, \ldots, n$, and choosing an appropriate $V_{\mathrm{ref}}$, the matrices PM and NPM, of dimension $p_1 \times p_2 \times \cdots \times p_n$, are determined by:

$$PM(r_1, r_2, \ldots, r_n) = \tilde{V}(X), \qquad NPM(r_1, r_2, \ldots, r_n) = \mathrm{sign}(PM(r_1, r_2, \ldots, r_n) - V_{\mathrm{ref}}) \qquad (7)$$

where $X = (r_1 \cdot dx_1, \ldots, r_n \cdot dx_n)$, $1 \le r_i \le p_i$, $i = 1, \ldots, n$.

Stage 4: For the purpose of data clustering, we first search the NPM matrix for an element with value 1 and choose it as the representative of the first cluster. Then, assuming there exist $N_c$ different clusters at the $L$th iteration, if a new data point $c = (x_1, \ldots, x_n)$ is connected to any of the $N_c$ existing clusters through a non-zero path in the NPM matrix, $c$ is joined to that cluster; otherwise, if $c$ is not connected to any of the existing clusters' representatives, it is taken as a new cluster's representative.
Stage 5: After determining the location and number of clusters, if the prescribed requirements and conditions are met, the optimal solution has been obtained; otherwise we return to Stage 3 and increase or decrease $V_{\mathrm{ref}}$ to meet the requirements.

Since the algorithm is straightforward, its computational complexity can be computed simply. The algorithm can be summarized as filling the NPM matrix. To do that, each dimension $i$ is divided according to its $dx_i$; this divides the feature space into relatively small cells. The number of these cells is $\prod_{i=1}^{n} p_i$, with $p_i$ as defined earlier. Each of the $N$ data points affects these cells: the interaction of the potential functions of the data points assigns a potential value to each cell, after which the NPM matrix can be extracted. As a result, the computational complexity of the proposed algorithm is $O\!\left(N \prod_{i=1}^{n} p_i\right)$. This complexity is reasonable and, since it is no more than polynomial in $N$, is of great interest.
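A minimal sketch of Stage 4 (Python/NumPy, our own function names): cells of the NPM array holding a 1 are grouped into clusters by flood-filling paths of non-zero neighbours, which mirrors the path-connectivity rules (i)–(ii) above; each data sample is then assigned to the cluster of the cell it falls into. Axis-aligned neighbour connectivity is assumed here, since the paper does not state which neighbourhood it uses.

```python
import numpy as np
from collections import deque

def label_npm(npm):
    """Stage 4: label connected islands of 1-cells in the NPM array (any dimension)."""
    labels = np.zeros(npm.shape, dtype=int)
    next_label = 0
    for start in zip(*np.nonzero(npm)):
        if labels[start]:
            continue                      # already reached from a representative
        next_label += 1                   # new cluster; 'start' is its representative
        labels[start] = next_label
        queue = deque([start])
        while queue:                      # flood fill along non-zero paths
            cell = queue.popleft()
            for axis in range(npm.ndim):
                for step in (-1, 1):
                    nb = list(cell)
                    nb[axis] += step
                    nb = tuple(nb)
                    if (0 <= nb[axis] < npm.shape[axis]
                            and npm[nb] and not labels[nb]):
                        labels[nb] = next_label
                        queue.append(nb)
    return labels, next_label

def assign_samples(X, labels, lo, dx):
    """Map each sample to the label of the grid cell containing it."""
    idx = np.clip(((X - lo) / dx).astype(int), 0, np.array(labels.shape) - 1)
    return labels[tuple(idx.T)]
```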
4.3. Setting Vref automatically

The only parameter of the algorithm that has to be set externally is the reference potential. Pal et al. (2000) used a cluster validation index $\beta$, defined as the ratio of the total variation to the within-cluster variation. This type of measure is widely used as a criterion for feature selection and cluster analysis (Mitra et al., 2003; Pal et al., 2000). $\beta$ is defined as:

$$\beta = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X})^{T}(X_{ij} - \bar{X})}{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^{T}(X_{ij} - \bar{X}_i)} \qquad (8)$$

where $n_i$ is the number of points in the $i$th cluster ($i = 1, 2, \ldots, k$), $X_{ij}$ is the feature vector of the $j$th pattern in cluster $i$ ($j = 1, 2, \ldots, n_i$), $\bar{X}_i$ is the mean of the $n_i$ patterns of the $i$th cluster, $n$ is the total number of patterns, and $\bar{X}$ is the mean of the entire set of patterns.

This cluster validation index can be used to determine an acceptable value for $V_{\mathrm{ref}}$. A simple algorithm can do so: set the reference potential to an initial value and then optimize it with a local heuristic search, such as a simple greedy search or hill climbing. The larger the value of $\beta$, the better the clustering. For each $V_{\mathrm{ref}}$, the number of clusters is known and the $\beta$ index can be calculated; the search algorithm changes the value of $V_{\mathrm{ref}}$ so as to maximize $\beta$.
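A sketch of the automatic $V_{\mathrm{ref}}$ selection (Python/NumPy). The coarse grid scan below stands in for the greedy/hill-climbing search mentioned above, and the helpers build_npm, label_npm and assign_samples refer to the earlier sketches, not to code from the paper; how samples falling below the threshold are handled is our own choice, as the paper does not specify it.

```python
import numpy as np

def beta_index(X, labels):
    """Eq. (8): total variation over within-cluster variation (larger is better)."""
    overall_mean = X.mean(axis=0)
    total = ((X - overall_mean) ** 2).sum()
    within = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                 for c in np.unique(labels))
    return total / within if within > 0 else np.inf

def choose_v_ref(X, dx, candidates=np.linspace(0.02, 0.5, 25)):
    """Scan candidate reference potentials and keep the one maximizing beta."""
    lo = X.min(axis=0)
    best = (None, -np.inf, None)
    for v_ref in candidates:
        _, npm = build_npm(X, dx, v_ref)          # Stages 1-3 (earlier sketch)
        cell_labels, n_clusters = label_npm(npm)  # Stage 4 (earlier sketch)
        if n_clusters < 2:
            continue                              # skip degenerate single-cluster solutions
        labels = assign_samples(X, cell_labels, lo, np.asarray(dx))
        # Samples outside every island are ignored here (a choice not specified in the paper).
        score = beta_index(X[labels > 0], labels[labels > 0])
        if score > best[1]:
            best = (v_ref, score, n_clusters)
    return best   # (v_ref, beta, number of clusters)
```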

5. Experimental results

In this section, some examples illustrate the effectiveness and performance of the proposed algorithm.

5.1. Example 1 (artificial data)

Assume that the sample data illustrated in Fig. 4 is given.

[Fig. 4. Diagram of the sample artificial data: $x_{i1} = [2, 3, 4, 5, 7, 8, 1, 3, 1, 9, 8, 9, 7, 7.5]$, $x_{i2} = [6, 4, 7, 2, 9, 4, 3, 6, 5, 5, 2, 3, 9, 9]$.]

The potential functions are defined as:

$$V_i(X) = \exp(-B\|X - X_i\|_2^2) = \exp\!\left(-B\sum_{j=1}^{2}(x_j - x_{ij})^2\right), \quad X = (x_1, x_2),\ X_i = (x_{i1}, x_{i2}),\ i = 1, 2, \ldots, N = 14$$

From Eq. (4) we obtain $B \cong 2$. Finally, the average potential function, for $A = 5.221$, is:

$$\tilde{V}(X) = \frac{5.221}{14}\sum_{i=1}^{14}\exp(-2\|X - X_i\|_2^2)$$

As shown in Fig. 5a, the average potential function has several different local maxima in the feature space $(x_1, x_2)$. Fig. 5b shows the reference potential surface $V_{\mathrm{ref}} = 0.04$ together with the potential function itself.

[Fig. 5. Graphical representation of (a) the normalized average potential function $\tilde{V}(X)$; (b) comparison of $\tilde{V}(X)$ and $V_{\mathrm{ref}} = 0.04$.]

[Fig. 6. NPM matrix for $V_{\mathrm{ref}} = 0.04$, and the resulting clusters.]

[Fig. 8. Comparison of the normalized average potential function with $V_{\mathrm{ref}} = 0.1$.]
[Fig. 9. Cluster areas for $V_{\mathrm{ref}} = 0.1$.]
[Fig. 10. (Arc cluster) Result of the clustering algorithm and the generated cluster area for $V_{\mathrm{ref}} = 0.1$.]
[Fig. 11. (Two complex arc clusters) Result of applying the clustering algorithm for $V_{\mathrm{ref}} = 0.1$.]

The shared boundaries between these two surfaces are, in fact, the equipotential levels, and the inner areas of the equipotential boundaries are the clusters generated for the considered reference potential. For the next stage, choosing $dx_1 = dx_2 = 0.2$ and comparing $V_{\mathrm{ref}} = 0.04$ with $\tilde{V}(X)$ yields the NPM matrix of Eq. (6), as illustrated in Fig. 6. Finally, the data is clustered as:

Cluster 1 = {(1, 3), (1, 5), (2, 6), (3, 4), (3, 6), (4, 7)}
Cluster 2 = {(5, 2)}
Cluster 3 = {(7, 9), (7.5, 9)}
Cluster 4 = {(8, 2), (8, 4), (9, 3), (9, 5)}

If the acquired results do not satisfy the problem requirements, we proceed as in Stage 5.
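As a usage illustration (relying on the hypothetical helpers sketched earlier, with $B$ and $V_{\mathrm{ref}}$ set to the values quoted in this example), the following snippet runs the pipeline on the 14 artificial points of Fig. 4; with $B = 2$, $V_{\mathrm{ref}} = 0.04$ and $dx_1 = dx_2 = 0.2$ one would expect to recover the four clusters listed above.

```python
import numpy as np

# The 14 artificial samples of Fig. 4 (columns: x1, x2).
X = np.array([[2, 6], [3, 4], [4, 7], [5, 2], [7, 9], [8, 4], [1, 3],
              [3, 6], [1, 5], [9, 5], [8, 2], [9, 3], [7, 9], [7.5, 9]])

dx = np.array([0.2, 0.2])                        # grid steps used in Example 1
pm, npm = build_npm(X, dx, v_ref=0.04, B=2)      # Stages 1-3 (sketch after Eq. (6))
cells, n_clusters = label_npm(npm)               # Stage 4 (sketch in Section 4.2)
labels = assign_samples(X, cells, X.min(axis=0), dx)

print(n_clusters)                                # expected: 4 clusters
for c in range(1, n_clusters + 1):
    print(c, X[labels == c])                     # compare with Clusters 1-4 above
```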

5.2. Example 2 (real Iris data with two features)

In this example, we use the Iris dataset, which is frequently used to illustrate the performance of clustering and classification algorithms. Fig. 7 shows the diagram of 150 Iris data samples with two features.

[Fig. 7. Iris data dispersion diagram.]

The potential function is considered as explained in Eqs. (2) and (3), and the coefficient $B$ is determined as $B \cong 41$ using Eq. (4). The average potential function, for $A = 18.423$, is:

$$\tilde{V}(X) = \frac{A}{N}\sum_{i=1}^{N}\exp(-41\|X - X_i\|_2^2) = \frac{18.423}{150}\sum_{i=1}^{150}\exp\!\left(-41\sum_{j=1}^{2}(x_j - x_{ij})^2\right)$$

After some iterations, $V_{\mathrm{ref}} = 0.1$ is assigned. The reference potential plane $V_{\mathrm{ref}} = 0.1$ and the average potential function are shown in Fig. 8. Then, choosing $dx_1 = 0.09$, $dx_2 = 0.06$ and comparing $V_{\mathrm{ref}} = 0.1$ with $\tilde{V}(X)$ according to Eq. (6), the cluster areas are extracted as illustrated in Fig. 9.

5.3. More non-convex cases

Now it is time to test the algorithm on more difficult, highly non-convex data. Such datasets are particularly good benchmarks for evaluating clustering algorithms, and applying different clustering algorithms to them usually yields different results. The results of applying the proposed clustering algorithm are illustrated in Figs. 10–13. The outcomes demonstrate the reliable, robust and near-optimal performance of this algorithm, especially in the non-convex cases.

[Fig. 12. (Geometric clusters) Result of applying the clustering algorithm and the generated cluster area for $V_{\mathrm{ref}} = 0.05$.]
[Fig. 13. (3D clusters) Result of applying the clustering algorithm and the generated cluster area for $V_{\mathrm{ref}} = 0.05$.]

6. Conclusion and future works

Unsupervised classification algorithms are generally based on a similarity measure over the feature vectors, so the elements forming a cluster are the ones most similar to each other. On this basis, and relying on potential functions and equipotential surfaces, a clustering algorithm is presented that can also be used for non-convex datasets or datasets with any kind of dispersion. In this method a potential function is assigned to each data sample, and the interaction of the fields in the feature space yields the equipotential surfaces. Using these equipotential surfaces and the proposed algorithm, the clustering process can be accomplished. Taking advantage of these equipotential boundaries, as discussed before, leads to finding the optimal clustering solution. The algorithm does not depend on any external parameter setting.

The experimental results illustrate the desirable, robust performance of the proposed method in clustering fragmented, complex, non-convex data with different shapes, sizes and densities. On the other hand, this method has a reasonable computational complexity compared with other clustering methods. The main reason for the reduction in computational complexity is that the total potential function is computed once at the beginning and used in the subsequent stages with no need for recalculation. This feature becomes more noticeable as the number of data samples and the dimension of the feature vector increase.

For future work, some heuristics could be adopted to decrease the execution time of the algorithm. Using a non-uniform division of the feature space could be one: instead of using the same dx for a dimension of the feature space, cells with different sizes could be used, with the distribution of the data points guiding the choice of cell sizes. Alternatively, a simple pre-refinement phase could be added to refine the data points, for example merging very close data points into a single representative with a correspondingly higher weight in the potential function.

References

Anderberg, M. R. (1973). Cluster analysis for applications. New York: Academic Press Inc.
Bayat, F., Jalali, A. A., & Jahed Motlag, M. R. (2009). Utilizing multi-parametric programming for optimal control of linear constrained systems. Iranian Journal of Control, 3(1), 12–22.
Dubes, R. C., et al. (1976). Clustering techniques: The user's dilemma. Pattern Recognition, 8, 247–260.
Fisher, L., et al. (1971). Admissible clustering procedures. Biometrika, 58, 91–104.
Gan, W. Y. (2003). Study on clustering problem for data mining foundations. PhD thesis, University of Science and Technology, Nanjing.
Gan, W. Y., et al. (2003). Optimal choice of parameters for a density-based clustering algorithm. In Proceedings of the 19th international conference on rough sets, fuzzy sets, data mining (pp. 603–606).
Gan, G., Wu, J., & Yang, Z. (2009). A genetic fuzzy k-modes algorithm for clustering categorical data. Expert Systems with Applications, 36, 1615–1620.
George, K., et al. (1999). Chameleon: A hierarchical clustering algorithm using dynamic modeling. IEEE Computer, 27(3), 229–241.
Hsu, C.-C., & Huang, Y.-P. (2008). Incremental clustering of mixed data based on distance hierarchy. Expert Systems with Applications, 35, 1177–1185.
Jain, A. K., et al. (1996). Image segmentation using clustering. In Advances in image understanding: A Festschrift for Azriel Rosenfeld (pp. 65–83). Piscataway, NJ: IEEE Press.
Mitra, P., Pal, S. K., & Siddiqi, M. A. (2003). Non-convex clustering using expectation maximization algorithm with rough set initialization. Pattern Recognition Letters, 24, 863–873.
Oyang, Y. J., et al. (2001). A study on hierarchical data clustering algorithm based on gravity theory. In Proceedings of the fifth European conference on principles and practice of knowledge discovery (pp. 350–361).
Pal, S. K., Ghosh, A., & Uma Shankar, B. (2000). Segmentation of remotely sensed images with fuzzy thresholding, and quantitative evaluation. International Journal of Remote Sensing.

