NN-Descent on High-Dimensional Data


Brankica Bratić, University of Novi Sad, Faculty of Sciences, Novi Sad, Serbia (brankica.bratic@dmi.uns.ac.rs)
Michael E. Houle, National Institute of Informatics, Tokyo, Japan (meh@nii.ac.jp)
Vladimir Kurbalija, University of Novi Sad, Faculty of Sciences, Novi Sad, Serbia (kurba@dmi.uns.ac.rs)
Vincent Oria, New Jersey Inst. of Technology, Dept. of Computer Science, Newark, New Jersey, USA (vincent.oria@njit.edu)
Miloš Radovanović, University of Novi Sad, Faculty of Sciences, Novi Sad, Serbia (radacha@dmi.uns.ac.rs)

ABSTRACT

K-nearest neighbor graphs (K-NNGs) are used in many data-mining and machine-learning algorithms. Naive construction of K-NNGs has a complexity of O(n^2), which could be a problem for large-scale data sets. In order to achieve higher efficiency, many exact and approximate algorithms have been developed, including the NN-Descent algorithm of Dong, Charikar and Li. Empirical evidence suggests that the practical complexity of this algorithm is in Õ(n^1.14), which is a significant improvement over brute-force construction. However, NN-Descent has a major drawback — it produces good results only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior, and investigates possible solutions. We link the quality of performance of NN-Descent with the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs — points with high in-degrees in the K-NNG. We propose two approaches to alleviate the observed negative influence of hubs on NN-Descent performance.

CCS CONCEPTS

• Information systems → Information retrieval; • Theory of computation → Design and analysis of algorithms;

KEYWORDS

NN-Descent, k-nearest neighbor graph, hubness

ACM Reference Format:
Brankica Bratić, Michael E. Houle, Vladimir Kurbalija, Vincent Oria, and Miloš Radovanović. 2018. NN-Descent on High-Dimensional Data. In WIMS ’18: 8th International Conference on Web Intelligence, Mining and Semantics, June 25–27, 2018, Novi Sad, Serbia. ACM, New York, NY, USA, 8 pages.

1 INTRODUCTION

A K-nearest neighbor graph (K-NNG) is a directed graph whose vertices represent elements upon which some distance function is defined. Vertex v is unidirectionally connected to vertex u only if vertex u is one of the K elements that are nearest to point v with respect to the defined distance function. K-NNGs are used in many machine-learning and data-mining algorithms [8], including classification [7], similarity search [11], clustering and outlier detection problems [2, 3, 12], and manifold learning [1, 20, 22]. They are also used in other areas, such as robot motion planning [5] and computer graphics [21].

Naive brute-force computation of a K-NNG entails n · (n − 1) distance computations, which leads to quadratic time complexity. Numerous methods for K-NNG construction have been developed in order to decrease the computational cost, but many of them introduce certain restrictions and therefore do not apply to the general case. For example, many efficient methods are developed for Euclidean or other Lp distance metrics [4, 6], whereas others can be applied to more general metric spaces [14]. In order to bypass all those restrictions while still preserving efficiency, Dong, Charikar and Li created the NN-Descent algorithm [10], which efficiently produces highly accurate K-NNG approximations independently of the underlying distance function. As reported by the authors, the empirical complexity of NN-Descent is in Õ(n^1.14). Although the results produced by this algorithm are mostly excellent, there is still one major drawback — NN-Descent often produces inaccurate results when it is used on data of high intrinsic dimensionality.
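
To make the quadratic baseline concrete, the following minimal Python sketch (our own illustration, not part of the original paper; names such as brute_force_knng are ours) builds the exact K-NNG by brute force.

    # Minimal brute-force K-NNG construction (illustrative sketch, not from the paper).
    # Every point is compared against all other points and the K closest are kept,
    # which is exactly the O(n^2) behaviour that NN-Descent tries to avoid.
    import heapq
    import math


    def euclidean(a, b):
        """Euclidean distance between two equal-length sequences."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


    def brute_force_knng(data, k, dist=euclidean):
        """Return the exact K-NNG as a list of K-NN lists of (distance, index) pairs."""
        graph = []
        for i, u in enumerate(data):
            # n - 1 distance computations per point, hence the quadratic cost overall.
            candidates = ((dist(u, v), j) for j, v in enumerate(data) if j != i)
            graph.append(heapq.nsmallest(k, candidates))
        return graph


    if __name__ == "__main__":
        points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
        print(brute_force_knng(points, k=2))

NN-Descent replaces this exhaustive comparison with local refinements of an initially random graph, as outlined in Section 2.1 below.
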
Park et al. [15] developed a modification of NN-Descent based on a greedy filtering method, with which they obtained improved results for high-dimensional data together with the cosine similarity measure. Houle et al. [13] introduced NNF-Descent (Nearest Neighbor Feature Descent), an NN-Descent-based feature sparsification method optimized for image databases. It uses the Laplacian score in order to identify noisy values, which are then replaced with zeros. Debatty et al. [9] presented a method for constructing a K-NNG for large text data sets. This method applies NN-Descent only to smaller buckets of data, leading to a lower execution time.

Although several improvements have been proposed for NN-Descent, so far none of them have given an explanation for the variability of its performance, nor have they given a universal solution to the problem. In this paper we will provide an explanation for this variation in terms of hubness, a measure of the variation in nearest neighbor relationships [16]. We also propose two approaches that improve the quality of NN-Descent approximations for high-dimensional data sets.

Section 2 describes the necessary background by providing an overview of the NN-Descent algorithm and the phenomenon of hubness. In Section 3 we show how a minor modification of the NN-Descent algorithm can produce estimates of hubness. Experimental results presented in Section 4 demonstrate the empirical effect of hubness on NN-Descent. Section 5 proposes methods that to some extent overcome the problems of NN-Descent on high-dimensional data. Finally, Section 6 gives an overview of the conclusions and results, and indicates directions for future work.

2 BACKGROUND

In this section we will review the NN-Descent algorithm (Section 2.1) and the hubness phenomenon (Section 2.2), which, as we will demonstrate later, has a significant impact on NN-Descent.

2.1 NN-Descent

The main purpose of the NN-Descent algorithm is to create a good approximation of the true K-NNG, and to create it as fast as possible. The basic assumption made by NN-Descent can be summarized as "my neighbor's neighbor is also likely to be my neighbor." The algorithm starts by creating a random K-NNG, and then iteratively improves the neighbor lists by using the current neighborhood relationships to suggest new neighbor candidates. During the execution of NN-Descent, for a data point u, we will refer to the current list of its K nearest neighbors as the NN list of u. A reverse neighbor of u is a data point which has u in its own NN list; we will refer to the current list of reverse neighbors of u as its RNN list.

NN-Descent is organized into stages, during each of which a single improvement iteration is applied to each of the data points. The algorithm iteration for u examines its NN and RNN lists to see whether any points in these lists should be added to the NN lists of any of the others. If this is the case, the algorithm makes the necessary changes, and increases the number of total updates. After every stage has completed, the algorithm tests a termination condition based on the number of updates that were applied during the stage. If the number of updates is smaller than a supplied threshold, the algorithm terminates; otherwise, it continues with the next stage. An outline of NN-Descent is shown as Algorithm 1.

Algorithm 1: Outline of the NN-Descent algorithm.
  input : dataset D, distance function dist, neighborhood size K
  output: K-NNG G
  foreach data point u ∈ D do
      Initialize G by randomly generating a tentative K-NN list for u with an assigned distance of ∞;
  end
  repeat
      foreach data point u ∈ D do
          Check different pairs of u's neighbors (v, w) in u's K-NN and R-NN (reverse nearest neighbor) lists, and compute dist(v, w);
          Use ⟨v, dist(v, w)⟩ to update w's K-NN list, and use ⟨w, dist(v, w)⟩ to update v's K-NN list;
      end
  until G converges;
  return G

The speed of NN-Descent is strongly influenced by the value of K. As K becomes larger, the algorithm becomes slower. To be more precise, the time complexity has a quadratic dependence on K, since during each iteration, points appearing in the NN or RNN lists of the same point are evaluated as candidates for each other's NN lists. As a way to reduce the number of combinations of points considered, the authors introduced sampling [10]. Sampling reduces the number of evaluations by taking a random selection of the NN and RNN lists, and then operating only on the points from those lists. Sampling is controlled by a parameter ρ that takes a value from 0 to 1. The algorithm takes ρ · K points from both the NN and RNN lists.
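
The compact Python sketch below (ours, not the authors' reference implementation) follows this outline, including the ρ-sampling of the NN and RNN lists. Parameter names such as rho, delta and max_iters are illustrative, and the termination test simply compares the number of updates made in a stage against delta · n · K.

    # Simplified NN-Descent sketch (illustrative; it follows the outline of Algorithm 1
    # but omits the incremental "new vs. old" bookkeeping of the full algorithm).
    import math
    import random


    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


    def nn_descent(data, k, dist=euclidean, rho=1.0, delta=0.001, max_iters=30):
        """Return an approximate K-NNG as a list of sorted (distance, index) K-NN lists."""
        n = len(data)  # assumes n > k
        # Random initial K-NNG; unlike Algorithm 1, real distances are assigned
        # immediately instead of placeholders, to keep the sketch short.
        graph = []
        for i in range(n):
            sample = random.sample([j for j in range(n) if j != i], k)
            graph.append(sorted((dist(data[i], data[j]), j) for j in sample))

        def try_insert(i, j, d):
            """Insert j into i's K-NN list if it improves the list; return 1 on update."""
            if j == i or any(j == idx for _, idx in graph[i]) or d >= graph[i][-1][0]:
                return 0
            graph[i][-1] = (d, j)   # evict the current worst neighbor
            graph[i].sort()
            return 1

        sample_size = max(1, int(rho * k))   # the rho*K sampling of NN and RNN lists
        for _ in range(max_iters):
            # Build the reverse-neighbor (RNN) lists of the current approximation.
            reverse = [[] for _ in range(n)]
            for i in range(n):
                for _, j in graph[i]:
                    reverse[j].append(i)
            updates = 0
            for u in range(n):
                nn = [j for _, j in graph[u]]
                rnn = reverse[u]
                candidates = (random.sample(nn, min(len(nn), sample_size)) +
                              random.sample(rnn, min(len(rnn), sample_size)))
                # Let every sampled pair of u's neighbors update each other's K-NN lists.
                for a in range(len(candidates)):
                    for b in range(a + 1, len(candidates)):
                        v, w = candidates[a], candidates[b]
                        if v == w:
                            continue
                        d = dist(data[v], data[w])
                        updates += try_insert(v, w, d)
                        updates += try_insert(w, v, d)
            if updates <= delta * n * k:   # stop when a stage makes too few updates
                break
        return graph

For example, graph = nn_descent(points, k=10, rho=0.5) returns an approximate 10-NN graph of points while comparing only half of each NN and RNN list per stage.
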
The second drawback of NN-Descent is that the quality of the approximation it produces is highly influenced by the intrinsic dimensionality of the data set, in that the quality of the approximation decreases as the intrinsic dimensionality increases. The reason for this has not been adequately explained. Although low-dimensional embeddings have often been used as a general method of reducing the complexity of data indexing, no solutions have yet been proposed that allow the NN-Descent strategy to work more effectively with high-dimensional data. In this paper we will explain the influence of high dimensionality on NN-Descent, and will also propose two approaches that are designed to overcome this challenge to some extent.

2.2 Hubness

Hubness is an aspect of the curse of dimensionality pertaining to nearest neighbors which has come to the attention of the research community only relatively recently [16]. Let D ⊂ R^d be a set of data points and let the k-occurrences of point x ∈ D be the number of times x occurs in the k-nearest-neighbor lists of other points from D. N_k(x) is then the number of k-occurrences of point x ∈ D — that is, the in-degree of node x in the K-NNG. As the dimensionality of the data increases, the distribution of N_k(x) becomes considerably skewed. As a consequence, some data points, which we will refer to as hubs, are included in many more k-nearest-neighbor lists than other points.

In the remainder of the discussion we will refer to the number of k-occurrences of point x ∈ D as its hubness value. If a data set contains many points that are hubs — that is, if the distribution of hubness values is highly skewed — it can be said that the hubness phenomenon is present therein. It has been shown that hubness, as a phenomenon, appears in high-dimensional data as an inherent property of intrinsically high dimensionality, and is not an artifact of finite samples nor a peculiarity of specific data sets [16]. It has also been shown that hubness influences various data-mining and machine-learning algorithms [16–19, 23], often in a negative way. Practical computation of (exact) hubness values entails the creation of the K-NNG, from which hubness values can be easily obtained by extracting the in-degrees of the nodes.
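
In the list-of-(distance, index)-lists representation used in the sketches above, this computation reduces to an in-degree count; a minimal sketch (ours):

    # Exact hubness values N_k(x): the in-degree of each point in a K-NNG given as a
    # list of K-NN lists of (distance, index) pairs (illustrative sketch).
    from collections import Counter


    def hubness_values(graph):
        counts = Counter(j for nn_list in graph for _, j in nn_list)
        # Points that never appear in any K-NN list have hubness 0.
        return [counts.get(i, 0) for i in range(len(graph))]
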

3 HUBNESS ESTIMATION USING NN-DESCENT

Approximate hubness values for each data point can be very easily calculated during NN-Descent, with minimal impact on algorithm performance. At the very beginning, we initialize the hubness value of each data set point to zero. Then, during algorithm execution, we increase the hubness value of a given point by one if that point is added to the NN list of some other point, and analogously, we decrease the hubness value by one if the point is removed from some NN list.

Some of the results that we obtained after this small upgrade of the NN-Descent algorithm are illustrated in Figure 1, which shows the correlation between estimated and exact hubness values of data points. For the purposes of the experimentation in this paper, we created several synthetic data sets drawn from a Gaussian distribution with standard deviation 1 and mean (0, 0, . . . , 0). The data sets each have n = 10000 points, but differ in their dimensionality d. During the analysis we also varied the size of the nearest neighbor lists, represented by the parameter K. We considered the values 10, 20 and 50 for K, and the values 2, 10, 20 and 100 for the parameter d. As a distance measure we used the Euclidean distance. In Figure 1 we show results only for the data set of highest dimensionality (d = 100), since the hubness phenomenon is known to strengthen as the dimensionality of the data increases. As can be seen, the correlation between real and estimated hubness values is very high. A shortcoming of NN-Descent is that estimated hubness values are lower than they should be for points with low true hubness values, and greater than they should be for points with high true hubness. The precision improves as K increases, but even for small K values, the algorithm produces strong correlations.

Figure 1: Correlation between estimated and exact hubness values of data set points. (a) K = 10, d = 100; (b) K = 20, d = 100; (c) K = 50, d = 100. Here, K is the size of the neighbor lists, and d is the dimensionality of the data set.
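
A minimal sketch of this bookkeeping (ours), written as a variant of the hypothetical try_insert helper from the NN-Descent sketch in Section 2.1, where est_hubness is a per-point list of counters initialized to zero before the iterations start:

    def try_insert_with_hubness(graph, est_hubness, i, j, d):
        """Variant of try_insert that also maintains estimated hubness counters."""
        if j == i or any(j == idx for _, idx in graph[i]) or d >= graph[i][-1][0]:
            return 0
        evicted = graph[i][-1][1]
        est_hubness[evicted] -= 1   # the evicted point leaves one NN list
        est_hubness[j] += 1         # the inserted point enters one NN list
        graph[i][-1] = (d, j)
        graph[i].sort()
        return 1
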
4 INFLUENCE OF HUBNESS ON NN-DESCENT

NN-Descent is known to produce a large number of incorrect K-nearest neighbors when applied to high-dimensional data. Since hubness is a phenomenon that appears in high-dimensional data, and at the same time has been shown to influence many other data-mining algorithms, in this section we investigate whether it also influences the performance of NN-Descent.

For a given data set point, we define the hit count to be the number of K-nearest neighbors produced by NN-Descent that are at the same time among the point's K-nearest neighbors in the exact K-NNG. In other words, the hit count tells us how many correct selections NN-Descent has made for the neighborhood list of a given data point. The first part of our analysis examines the correlation between hit count values and hubness values of data points. In order to do that, we first calculated the exact K-NNGs, as well as hubness values for each data set point. We then ran NN-Descent without sampling and early termination, and calculated hit count values based on the previously calculated K-NNG and the NN-Descent graph approximation. We did this for all generated data sets.

Some of the results can be seen in Figure 2. Let us now say that, for the points of a fixed hubness value, hit count values are distributed between hc_min and hc_max. The results show that both values are directly proportional to the hubness value, with this behavior being much more evident in the case of the hc_min value. In the case of hc_max, the phenomenon is less evident (Figure 2d). As a consequence, points with extremely high hubness values have hc_min ≈ hc_max ≈ K, which means that their hit counts usually attain the greatest possible value. For the lowest hubness values, it holds that hc_min is near zero, while hc_max depends on K and d; however, for all tested data sets, this value is K or very nearly K. In other words, some points with extremely low hubness values have a high hit count, and some have hit counts that are very low, which means that the probability of error when determining the K nearest neighbors of point x is inversely proportional to the hubness of x.

Figure 2: Hubness and hit count values of data set points. (a) K = 10, d = 2; (b) K = 10, d = 10; (c) K = 10, d = 20; (d) K = 10, d = 100. Here, K is the size of the neighbor lists, and d is the dimensionality of the data set.
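
Given the approximate graph returned by NN-Descent and the exact graph, hit counts reduce to set intersections of neighbor indices; a minimal sketch (ours) under the same (distance, index) list representation as above:

    def hit_counts(approx_graph, exact_graph):
        """Hit count per point: overlap between approximate and exact K-NN lists."""
        counts = []
        for approx_list, exact_list in zip(approx_graph, exact_graph):
            exact_ids = {j for _, j in exact_list}
            counts.append(sum(1 for _, j in approx_list if j in exact_ids))
        return counts

Averaging the hit counts over all points and dividing by K gives the overall recall of the approximation.
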

The described phenomenon is more evident in data sets with lower K values and greater dimensionality d. If K is sufficiently large, and d is sufficiently small, the phenomenon is less evident, as the hubness values of the points are more uniform. In these cases, NN-Descent produces a very good approximation of the K-NNG. In further research we will concentrate on settings for which the performance of NN-Descent is poor, as we saw for the case where d = 100.

Now that the influence of hubness on NN-Descent has been established, we examine the cause of this phenomenon. As was already described in Section 2.1, the NN-Descent algorithm improves the K-NN graph approximation in each iteration. If the hubness of the data set is high, then in the first few stages hubs will be placed among the K nearest neighbors of a large number of data points. This implies that the NN lists of a majority of points will contain hubs. Points with lower hubness values will quickly be expelled from the NN lists of other points. This expulsion of non-hubs from NN lists implies that the neighborhoods of those points, according to the nature of the algorithm, will not be updated as often — in order for the NN list of a given point to be updated, that point must be present in the NN or RNN lists of other data points. If a point is rapidly expelled from NN lists, then it will be updated only from the RNN lists of points that are very unlikely to be its true neighbors, since the initial approximation of the K-NN graph is essentially random. For some points of low hubness value, this produces relatively poor results.

Another issue to consider is that some points with low hubness values may still have high hit count values, while other low-hubness points may have very low hit counts. In order to quantify the extent to which the variation in hit count values depends on hubness, we calculated hit count values and their standard deviations for each data point, over 100 runs of NN-Descent, each time generating the random graph with a different random seed. The results are shown in Figure 3, where the intensity of the red color represents the mean of hit count values among different runs — points with greater intensity have greater average hit count values. As can be seen, points with low hubness values have hit counts with a relatively high standard deviation, while points with high hubness values have hit count standard deviations that tend toward zero. This implies that hit count values vary significantly across different runs, while the only difference between runs is the initial random graph.

All of this leads us to the conclusion that the main factor that affects the hit count of points with low hubness is the initial random K-NNG: if the point is in a high-quality neighborhood in the initial random graph, then NN-Descent is more likely to produce a high hit count. Otherwise, the point will be quickly expelled from NN lists, and will stay trapped in the RNN lists of points that are not its true neighbors, leading to a relatively low probability that the point's NN list will be updated with true neighbors.

Figure 3: Results with K = 10 and d = 100; the y-axis shows standard deviations of hit count values for 100 runs of NN-Descent, while the x-axis shows hubness values of the corresponding data points.

5 PROPOSED METHODS FOR IMPROVING NN-DESCENT ON HIGH-DIMENSIONAL DATA

In order to overcome the problem described in Section 4, we implemented two different approaches. The first one is based on the idea that NN lists should not be updated strictly according to the NN and RNN lists of other points. The goal is to integrate the information about hubness values into the choice of points to compare with the current approximation of the NN list. A high hubness value indicates that a certain point already has a reasonably stable NN list, and vice versa — a low hubness value suggests that the point has a greater probability of being assigned an incorrect NN list.

The second approach is based on the observation that for greater K values the algorithm tends to be more accurate. If we run the NN-Descent algorithm with a gr…
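
The transcription breaks off at this point, so neither method is fully specified here. Purely as an illustration of the first idea, and not as the authors' actual procedure, a hubness-aware candidate selection rule might grant extra random comparison candidates to points whose estimated hubness is low; the names extra_candidates, threshold and extra below are hypothetical:

    # Illustrative interpretation only: low-hubness points get additional random
    # candidates to compare against, beyond their NN and RNN lists.
    import random


    def extra_candidates(u, est_hubness, n, threshold=1, extra=5):
        """Return additional random candidate indices for a low-hubness point u."""
        if est_hubness[u] > threshold:
            return []   # the NN list of u is assumed to be reasonably stable already
        pool = [j for j in range(n) if j != u]
        return random.sample(pool, min(extra, len(pool)))
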
