Parallel k Nearest Neighbor Graph Construction Using Tree-Based Data Structures


Nazneen Rajani, Dept. of Computer Science, University of Texas at Austin, Austin, TX 78712-1188, USA
Kate McArdle, The Center for Advanced Research in Software Engineering, University of Texas at Austin, Austin, TX 78712-1188, USA
Inderjit S. Dhillon, Dept. of Computer Science, University of Texas at Austin, Austin, TX 78712-1188, USA

ABSTRACT

Construction of a nearest neighbor graph is often a necessary step in many machine learning applications. However, constructing such a graph is computationally expensive, especially when the data is high dimensional. Python's open-source machine learning library Scikit-learn uses k-d trees and ball trees to implement nearest neighbor graph construction. However, this implementation is inefficient for large datasets. In this work, we focus on exploiting these underlying tree-based data structures to optimize parallel execution of the nearest neighbor algorithm. We present parallel implementations of nearest neighbor graph construction using such tree structures, with parallelism provided by OpenMP and the Galois framework. We empirically show that our parallel and exact approach is efficient as well as scalable, compared to the Scikit-learn implementation. We present the first implementation of k-d trees and ball trees using Galois. Our results show that k-d trees are faster when the number of dimensions is small (2^d ≪ N); ball trees, on the other hand, scale well with the number of dimensions. Our implementation of ball trees in Galois has almost linear speedup on a number of datasets, irrespective of the size and dimensionality of the data.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. August 10, Sydney, Australia.

1. INTRODUCTION

Construction of a nearest neighbor graph is often a necessary step in data mining, computer vision, robotics, machine learning, and other research fields. The nearest neighbor, or in general, the k nearest neighbor (kNN) graph of a data set is obtained by connecting each instance in the data set to its k closest instances from the data set, where a distance metric defines closeness. However, this important step is often computationally expensive, especially when the data is of high dimensionality. To compute the nearest neighbor graph of a large data set with high-dimensional features, known exact nearest-neighbor search algorithms usually do not provide acceptable performance. To speed up performance, many applications must settle for approximate k nearest neighbor graphs.

In this paper, we focus on k-d trees and ball trees, popular tools used to construct both exact and approximate nearest neighbor graphs. K-d trees are data structures that organize points in a d-dimensional space by dividing the space into several partitions [3]. Each d-dimensional point in a data set is represented by a node in the k-d tree, and every level of the tree splits the space along one of the d dimensions. Thus every node that is not a leaf node implicitly generates a hyperplane perpendicular to the dimension on which its level splits: all nodes in the node's left subtree fall to the left of the hyperplane, while all nodes in the node's right subtree fall to the right of it.
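
To make the structure concrete, here is a minimal sketch of such a node in C++ (a sketch of our own; the field names are illustrative and not taken from any implementation in the paper):

    #include <vector>

    // A k-d tree node: one d-dimensional point plus the dimension along
    // which this level splits. Points with a smaller value at `dim` live
    // in the left subtree, larger values in the right (ties may go to
    // either side, as noted in Section 3.1).
    struct KDNode {
        std::vector<double> point;  // the stored data point
        int dim = 0;                // splitting dimension at this level
        KDNode* left = nullptr;     // subtree left of the hyperplane
        KDNode* right = nullptr;    // subtree right of the hyperplane
    };
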
A ball tree, like the k-d tree, is also a binary tree data structure that organizes points in multidimensional space. Each node owns the set of points in its subtree. Thus the root node owns the full set of points in the dataset, and each leaf node owns some maximum number of points, called the leaf size. A non-leaf node does not explicitly contain any points, but it points to two child nodes such that child1.points ∩ child2.points = φ and child1.points ∪ child2.points = node.points. Each node has a pivot and a radius that determine how the points are split among the child nodes [11].

To use a k-d tree or a ball tree for nearest neighbor search, the tree must first be built, and then it is searched. One popular implementation of nearest neighbor search with such trees is found in Scikit-learn [15], an open-source machine learning library in Python (http://scikit-learn.org/). K-d trees are commonly used for nearest neighbor search because for small D (< 20), approximately O(D log N) operations are needed in the case of N randomly distributed points in D dimensions. However, in high-dimensional spaces, the curse of dimensionality forces the algorithm to visit many more branches than in lower-dimensional spaces, and the number of operations increases to O(DN). In particular, when the number of points is only slightly higher than the number of dimensions, the algorithm is only slightly better than a brute-force linear search over all of the points. If exact results are not needed, k-d trees can be used for an approximate search by stopping the search early, for example after a given number of leaf nodes in the tree have been visited. The complexity of searching a ball tree, on the other hand, grows as O(D log N) even in high dimensions. The complexity for k-d trees and ball trees also depends on the leaf size. With a bigger leaf size, more time is spent doing linear search within the leaf nodes; with a smaller leaf size, building the tree takes more time than building the same tree with a bigger leaf size. Thus, there is a tradeoff between the tree building time and the linear search time over leaf nodes.
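
A matching sketch of a ball tree node (again our own illustrative code; the paper's implementations use Galois graph nodes rather than raw pointers) makes the pivot, radius, and leaf-size roles explicit:

    #include <vector>

    // A ball tree node: non-leaf nodes store only a pivot (here the
    // centroid of the points they own) and the radius of the smallest
    // ball around the pivot covering those points; leaves own at most
    // leafSize points and are scanned linearly during search.
    struct BallNode {
        std::vector<double> pivot;                // centroid of owned points
        double radius = 0.0;                      // pivot-to-farthest-point distance
        BallNode* child1 = nullptr;               // children partition the owned points
        BallNode* child2 = nullptr;
        std::vector<std::vector<double>> points;  // non-empty only at leaves
    };
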

We note that the construction of a k-d tree is not affected as much by the leaf size as the construction of a ball tree, due to their underlying structures [15].

In this work, we present implementations of exact nearest neighbor search using k-d trees and ball trees in parallel settings, namely OpenMP (http://openmp.org) and the Galois system (http://iss.ices.utexas.edu/?p=projects/galois), a graph-based framework that provides data-parallelism. Our principal contribution is the implementation of k-d trees and ball trees in Galois and the comparison of our approaches to Scikit-learn's. The principal bottleneck for parallelization is parallel tree construction while ensuring full resource utilization; Galois overcomes this through the way it activates nodes, as discussed in Section 3. We provide efficient implementations for both k-d trees and ball trees. Before describing our implementation in more detail in Section 4, we first provide an overview of related nearest neighbor algorithms in Section 2 and a more detailed background on k-d trees, ball trees, and the Galois framework in Section 3. In Section 5 we present our experimental results.

2. RELATED WORK

The k-d tree [3] is a well-known nearest neighbor search algorithm, as it is very effective in low-dimensional spaces. One popular implementation of exact kNN search using k-d trees is found in Scikit-learn; this Python library has been used for several machine learning applications [5]. However, the use of k-d trees for kNN search suffers a decrease in performance for high-dimensional data. As such, Scikit-learn's performance often drops off as data grows in dimension or in size. The current version of Scikit-learn does not natively support parallelism for its k-d tree and ball tree implementations, and there is not much literature on its use for large and high-dimensional datasets. Some methods have been proposed to remedy this performance loss while still using k-d trees. One way of approximating nearest neighbor search is to limit the time spent during search, or "time bound" approximate search, as proposed by [2]. In this approach, search through the k-d tree is stopped early after examining a fixed number of leaf nodes.

K-d trees do not scale well with dimension; hence ball trees, which are similar to k-d trees in the way they organize points spatially, have been widely studied. Several algorithms have been proposed for efficient construction of ball trees on large data. Moore et al. use the idea of anchors instead of balls and use the triangle inequality to efficiently build a ball tree, pruning nodes that would not belong to the current child [11]. Kumar et al. provide a comprehensive survey of tree-based algorithms for nearest neighbor search [8]. Multiple randomized k-d trees (a k-d forest) are proposed in [19] as a means to speed up approximate nearest neighbor search; this is one of the most effective methods for matching high-dimensional data [12]. This approach builds multiple k-d trees that are searched in parallel. While classic k-d tree construction splits data on the dimension with the highest variance, the randomized forest approach chooses the split dimension randomly from the top N_d dimensions with the highest variance, where N_d is selected by the user. When searching the randomized k-d forest in parallel, a single priority queue is maintained across all the randomized trees.

In [17], an implementation of kNN graph construction in a distributed environment is proposed. Message passing is used to communicate between processors in a cluster and to efficiently distribute the computation of kNN graphs. The authors show that nearly linear speedup can be obtained with over one hundred processors. Note the distinction between "multicore" and "distributed" approaches: distributed is across machines, while multicore is a shared-memory abstraction using multiple cores on the same machine. While [17] uses a distributed approach, the focus of this paper is on a multicore approach using Galois.

3. BACKGROUND

The focus of this work is to provide implementations of kNN graph construction using k-d trees and ball trees in OpenMP and the Galois framework.

3.1 k-d Trees

Multidimensional binary search trees, better known as k-d trees, were first proposed by Bentley in 1975 [3] as an efficient means to store and retrieve information. The k-d tree is an extension of the binary search tree to multidimensional data, in which each d-dimensional point in a data set is represented by a node in the k-d tree. Every node in the tree splits the data in its subtrees along one of the d dimensions, such that all descendants of the node whose value at the splitting dimension is less than that of the node are found in the node's left subtree, and all descendants of the node whose value at the splitting dimension is greater than that of the node are found in the node's right subtree. For descendants of the node whose value at the splitting dimension is equal to that of the node, the choice of left or right subtree is arbitrary.

Algorithm 1 KDTree(points, N, d)
1: find the dimension with the most spread, store as dim
2: sort in place all points in increasing order of each's value at dim
3: median ← N/2
4: tree.root.point ← points[median]
5: tree.root.dim ← dim
6: buildSubtree(root, "left", points, 0, median)
7: buildSubtree(root, "right", points, median + 1, N)

3.1.1 Tree Construction

There are several ways to construct a k-d tree, with the "best" approach depending on the application and the dataset. One may randomly insert nodes one at a time; this approach is useful when the set of data points varies over time. For our purposes of using k-d trees for nearest neighbor search, we focus instead on a standard top-down approach presented in [4], in which the set of data points remains constant and is known a priori, and the resulting k-d tree is balanced.

The top-down recursive algorithm that our approach uses is presented in Algorithms 1 and 2. Given an array of the entire set of data points, the algorithm finds the dimension with the maximum spread in the data. It then sorts the array in place, ordered by each point's value at that dimension. The point at the median of this now-sorted array is assigned as the tree's root. Using the recursive buildSubtree function, all points to the left of the median in the array are passed to the root's left subtree, while all points to the right of the median in the array are passed to the root's right subtree.

The buildSubtree function similarly sorts the given portion of the array on the dimension with the most spread, assigns the median within this portion to the parent node's left or right child accordingly, and passes the two halves of its portion of the array to the child's subtrees. It is important to note that for a given level of the tree, the portions of the array being sorted never overlap.

We note that an alternative approach to constructing a k-d tree is to simply begin with the first dimension as the splitting dimension for the root and increase the splitting dimension by one for each level down in the subtree. This is the approach we used for the OpenMP implementation described in the next section. While this implementation results in a faster time to build the k-d tree, it often results in a longer time to perform nearest neighbor search on the tree.

Algorithm 2 buildSubtree(parent, childType, points, startIndex, endIndex)
1: if endIndex ≤ startIndex then
2:   return
3: end if
4: if endIndex − startIndex = 1 then
5:   if childType = "left" then
6:     parent.leftChild.point ← points[startIndex]
7:   else if childType = "right" then
8:     parent.rightChild.point ← points[endIndex]
9:   end if
10:  return
11: end if
12: find the dimension with the most spread, store as dim
13: sort in place points between startIndex and endIndex in increasing order of each's value at dim
14: median ← startIndex + (endIndex − startIndex)/2
15: if childType = "left" then
16:   parent.leftChild.point ← points[median]
17:   parent.leftChild.dim ← dim
18:   buildSubtree(parent.leftChild, "left", points, startIndex, median)
19:   buildSubtree(parent.leftChild, "right", points, median + 1, endIndex)
20: else if childType = "right" then
21:   parent.rightChild.point ← points[median]
22:   parent.rightChild.dim ← dim
23:   buildSubtree(parent.rightChild, "left", points, startIndex, median)
24:   buildSubtree(parent.rightChild, "right", points, median + 1, endIndex)
25: end if
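
Condensing Algorithms 1 and 2, a sequential C++ sketch of this top-down build might look as follows (our own code, assuming dense double-valued points; the paper's Galois version instead adds edges to a graph, as described in Section 4.2):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    using Point = std::vector<double>;

    struct KDNode {
        Point point;                 // stored data point
        int dim = 0;                 // splitting dimension
        KDNode* left = nullptr;
        KDNode* right = nullptr;
    };

    // Dimension with the largest spread (max minus min) over pts[lo, hi).
    static int maxSpreadDim(const std::vector<Point>& pts, std::size_t lo, std::size_t hi) {
        int best = 0;
        double bestSpread = -1.0;
        for (std::size_t j = 0; j < pts[lo].size(); ++j) {
            double mn = pts[lo][j], mx = pts[lo][j];
            for (std::size_t i = lo + 1; i < hi; ++i) {
                mn = std::min(mn, pts[i][j]);
                mx = std::max(mx, pts[i][j]);
            }
            if (mx - mn > bestSpread) { bestSpread = mx - mn; best = (int)j; }
        }
        return best;
    }

    // Mirrors Algorithms 1 and 2: sort pts[lo, hi) on the max-spread
    // dimension, make the median the subtree root, recurse on the halves.
    static KDNode* build(std::vector<Point>& pts, std::size_t lo, std::size_t hi) {
        if (lo >= hi) return nullptr;
        int dim = maxSpreadDim(pts, lo, hi);
        std::sort(pts.begin() + lo, pts.begin() + hi,
                  [dim](const Point& a, const Point& b) { return a[dim] < b[dim]; });
        std::size_t median = lo + (hi - lo) / 2;
        KDNode* node = new KDNode;
        node->point = pts[median];
        node->dim = dim;
        node->left = build(pts, lo, median);
        node->right = build(pts, median + 1, hi);
        return node;
    }
    // Usage: KDNode* root = build(points, 0, points.size());
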
3.1.2 kNN Graph Construction

Given a k-d tree constructed for a set of data points, we now wish to construct a kNN graph, such that every data point in the dataset corresponds to a node in the graph, and there is an edge from node1 to node2 if node2 is among the k nearest neighbors of node1. To construct the kNN graph from the k-d tree, we use a standard approach such as the one presented in [20].

Algorithm 3 findKNN(keyPoint, kdtree, k)
1: pq ← new priority queue of potential neighbors, prioritized on their distance to keyPoint
2: searchKDSubtree(pq, root, keyPoint, k)
3: for each neighborPoint in pq do
4:   add edge from keyPoint to neighborPoint
5: end for

The kNN search algorithm we use is presented in Algorithms 3 and 4. Algorithm 3 is performed for every point in the dataset. We use a priority queue to store the k best neighbors of the point in question (which we call the keyPoint), prioritized on each neighbor's distance to the keyPoint. The algorithm calls the recursive searchKDSubtree function. When it returns, the points stored in the priority queue correspond to the k best neighbors of the keyPoint, and edges are added to the kNN graph from the keyPoint to each neighbor. The recursive searchKDSubtree function operates on a node in the k-d tree. If the priority queue has fewer than k elements in it, or if the distance from the keyPoint to the point corresponding to the given node (which we call currentPoint) is less than the distance to the farthest neighbor in the priority queue, then the algorithm adds the currentPoint to the priority queue, popping the farthest neighbor if necessary. It then searches the current node's left subtree (if it exists) if the keyPoint value at the dimension on which this node splits (in the k-d tree) is less than that of the current node; otherwise, it searches the current node's right subtree (if it exists). After this recursive subtree search returns, the algorithm considers whether the current node's other subtree must be searched: if fewer than k neighbors have been found, or if the distance between the keyPoint value at the current node's splitting dimension and that of the currentPoint is less than the overall distance to the farthest neighbor in the priority queue, the other subtree is searched.

Algorithm 4 searchKDSubtree(pq, currentNode, keyPoint, k)
1: currentPoint ← currentNode.point
2: dim ← currentNode.dim
3: dist ← distance from keyPoint to currentPoint
4: if number of points in pq < k then
5:   push currentPoint onto pq with priority dist
6: else if dist < priority of max-priority element in pq then
7:   pop max-priority element from pq
8:   push currentPoint onto pq with priority dist
9: end if
10: if keyPoint[dim] < currentPoint[dim] then
11:   if currentNode has a left child then
12:     searchKDSubtree(pq, currentNode.leftChild, keyPoint, k)
13:     otherChild ← currentNode.rightChild
14:   end if
15: else
16:   if currentNode has a right child then
17:     searchKDSubtree(pq, currentNode.rightChild, keyPoint, k)
18:     otherChild ← currentNode.leftChild
19:   end if
20: end if
21: if number of points in pq < k OR |keyPoint[dim] − currentPoint[dim]| < max priority in pq then
22:   searchKDSubtree(pq, otherChild, keyPoint, k)
23: end if
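
A sequential C++ sketch of Algorithms 3 and 4, using std::priority_queue as the max-heap (our own code; it folds the left/right cases of Algorithm 4 into near/far children and treats missing children uniformly as null):

    #include <cmath>
    #include <cstddef>
    #include <queue>
    #include <vector>

    using Point = std::vector<double>;
    struct KDNode { Point point; int dim; KDNode* left; KDNode* right; };

    static double dist(const Point& a, const Point& b) {
        double s = 0.0;
        for (std::size_t j = 0; j < a.size(); ++j) s += (a[j] - b[j]) * (a[j] - b[j]);
        return std::sqrt(s);
    }

    // Max-heap keyed on distance, so the worst current neighbor is on top.
    using Candidate = std::pair<double, const KDNode*>;
    using MaxHeap = std::priority_queue<Candidate>;

    // Mirrors Algorithm 4: offer the current point to the heap, descend
    // into the near side, then visit the far side only if it could still
    // hold a closer neighbor.
    static void searchKDSubtree(MaxHeap& pq, const KDNode* node, const Point& key, std::size_t k) {
        if (!node) return;
        double d = dist(key, node->point);
        if (pq.size() < k) pq.push({d, node});
        else if (d < pq.top().first) { pq.pop(); pq.push({d, node}); }

        bool goLeft = key[node->dim] < node->point[node->dim];
        searchKDSubtree(pq, goLeft ? node->left : node->right, key, k);
        // The far subtree matters only if the splitting hyperplane is
        // closer than the current k-th best distance (or pq is not full).
        if (pq.size() < k || std::fabs(key[node->dim] - node->point[node->dim]) < pq.top().first)
            searchKDSubtree(pq, goLeft ? node->right : node->left, key, k);
    }
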

3.2 Ball Trees

Ball trees are very similar to k-d trees, spatially organizing points in multiple dimensions. However, unlike k-d trees, which split the points parallel to the Cartesian axes, ball trees split the data with a series of hyperspheres, such that points closer to each other go to one child while the other set of nearby points goes to the other child. The structure of the ball tree overcomes the inefficiencies of the k-d tree in high dimensions. For a nearest neighbor algorithm, a search will only need to visit half the data points in a ball tree, but many more in a k-d tree [11]. The number of points visited during a nearest neighbor search is reduced through use of the triangle inequality.

3.2.1 Tree Construction

Many fast algorithms for ball tree construction have been proposed. We follow the approach described in [11]. Algorithm 5 describes the recursive method that builds the ball tree top-down, starting from the root node. As mentioned earlier, each node of the tree (called a ball node) owns a set of points. It has a pivot, which in our implementation is the centroid of the points owned by that node, and a radius, which is the distance between the pivot and the furthest point in the node. Depending on the implementation, the pivot can also be chosen to be one of the data points.

Algorithm 5 buildSubtree(parent, N, points, D, leafSize)
1: parent.data ← points
2: parent.pivot ← centroid(parent.data)
3: parent.radius ← max(distance(parent.pivot, parent.points))
4: parent.child1 ← point farthest from parent.pivot
5: parent.child2 ← point farthest from parent.child1
6: for point ∈ parent.points do
7:   if dist(point, child1) < dist(point, child2) then
8:     add point to child1.points
9:   else
10:    add point to child2.points
11:  end if
12: end for
13: if child1.points.size > leafSize then
14:   buildSubtree(parent.child1, N, points, D, leafSize)
15: end if
16: if child2.points.size > leafSize then
17:   buildSubtree(parent.child2, N, points, D, leafSize)
18: end if

3.2.2 kNN Graph Construction

The ball tree kNN graph construction is very similar to the k-d tree's. It examines nodes in depth-first order, starting from the root. During the search, the algorithm maintains a priority queue, which in our case is a max-heap. At each step the heap is maintained such that it contains the k nearest neighbors obtained so far. Algorithm 6 describes the recursive search in a ball tree for kNN graph construction. Figure 1 displays the recursive steps of ball tree construction by splitting on a set of points (figure taken from [10]).

[Figure 1: Ball tree construction on a set of points (from [10]).]

Algorithm 6 searchBalltree(pq, currentNode, keyPoint, k)
1: if distance(keyPoint, currentNode.pivot) ≥ distance(keyPoint, pq.first) then
2:   return pq
3: else if currentNode is a leaf node then
4:   for point ∈ currentNode.points do
5:     if distance(keyPoint, point) < distance(keyPoint, pq.first) then
6:       push point onto pq with priority distance(keyPoint, point)
7:       if size(pq) > k then
8:         pop max-priority element from pq
9:       end if
10:    end if
11:  end for
12: else
13:   let child1 be the child node closest to keyPoint
14:   let child2 be the child node furthest from keyPoint
15:   searchBalltree(pq, child1, keyPoint, k)
16:   searchBalltree(pq, child2, keyPoint, k)
17: end if
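
A C++ sketch of this search (our own code; note one deliberate deviation: Algorithm 6 prunes on the pivot distance alone, while the sketch subtracts the node's radius, the triangle-inequality bound mentioned above):

    #include <cmath>
    #include <cstddef>
    #include <queue>
    #include <vector>

    using Point = std::vector<double>;

    struct BallNode {
        Point pivot;
        double radius = 0.0;
        BallNode* child1 = nullptr;
        BallNode* child2 = nullptr;
        std::vector<Point> points;  // non-empty only at leaves
    };

    static double dist(const Point& a, const Point& b) {
        double s = 0.0;
        for (std::size_t j = 0; j < a.size(); ++j) s += (a[j] - b[j]) * (a[j] - b[j]);
        return std::sqrt(s);
    }

    using Candidate = std::pair<double, const Point*>;
    using MaxHeap = std::priority_queue<Candidate>;

    // Mirrors Algorithm 6: prune balls that cannot contain a closer point,
    // scan leaves linearly, and descend into the closer child first.
    static void searchBallTree(MaxHeap& pq, const BallNode* node, const Point& key, std::size_t k) {
        if (pq.size() == k && dist(key, node->pivot) - node->radius >= pq.top().first)
            return;  // even the nearest point this ball could hold is too far
        if (node->child1 == nullptr) {  // leaf
            for (const Point& p : node->points) {
                double d = dist(key, p);
                if (pq.size() < k) pq.push({d, &p});
                else if (d < pq.top().first) { pq.pop(); pq.push({d, &p}); }
            }
            return;
        }
        bool firstCloser = dist(key, node->child1->pivot) <= dist(key, node->child2->pivot);
        searchBallTree(pq, firstCloser ? node->child1 : node->child2, key, k);
        searchBallTree(pq, firstCloser ? node->child2 : node->child1, key, k);
    }
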
3.3 Galois Framework

In their work on parallelism in algorithms, Pingali et al. [16] propose a data-centric formulation of algorithms, called the operator formulation, to replace the traditional program-centric abstraction. They argue that a program-centric abstraction is not adequate for data mining algorithms such as k nearest neighbor graph construction. They go on to prove that a generalized form of data-parallelism exists in all algorithms and, depending on the algorithm's structure, this parallelism may be exploited by the program. They demonstrate their work with the Galois programming model [13].

The Galois framework is a parallel graph analytics infrastructure recently presented by Nguyen et al. [13]. It is a work-item-based parallelization framework and provides a rich programming model that can handle graph computations. Since Galois provides its own schedulers and scalable data structures with coordinated and autonomous scheduling, the user does not need to manage the program's concurrency directly. Written in C++, Galois does not impose a particular graph partitioning scheme; partitioning may be edge- or vertex-based, depending on how the computation is expressed. The user writes sequential code for applications to manipulate and maneuver around the graphs to accomplish their particular goal. Galois has recently been shown to perform very well for graph algorithms on large datasets [18]. Kulkarni et al. use Galois to study in depth the extent of data-parallelism in irregular algorithms such as clustering, which uses the nearest neighbor approach [7]. However, Galois does not currently contain an implementation of nearest neighbor search using tree-based structures, which motivates our decision to use this framework and develop such an implementation.

The authors of Galois emphasize that a parallel program comprises an operator, a schedule, and a parallel data structure, informally stated as: Parallel Program = Operator + Schedule + Parallel Data Structure [13]. A Galois operator is a user-defined function that operates on a graph node. Operators acting on different nodes can be executed in parallel, and Galois ensures that no two neighboring nodes are operated on simultaneously, thus avoiding conflicts and ensuring serializability. In addition, the order in which nodes are operated on is provided via a schedule or a worklist. Galois provides several implementations of schedules and parallel data structures, while the user provides the operator and picks a schedule based on the application.

For our implementation of nearest neighbor, we use two Galois operators, one for building the underlying tree and one for the actual nearest neighbor search. The tree construction operator builds the k-d tree or ball tree as a Galois graph. The first step is to split the data points on the dimension of maximum spread. Thereafter the operator adds the points in each split to a parallel worklist. Each node in the worklist can be operated on in parallel. The nearest neighbor graph construction operator has all the data points in its parallel worklist for execution, and it operates on a parallel graph data structure. For each data point it adds edges to the kNN graph, using the constructed tree to find nearest neighbors.

Pingali et al. state that exploiting the structure of an algorithm is the most important factor for efficient parallel implementation and thus encourage tao analysis of algorithms [16]. Tao analysis classifies algorithms based on their topology, node activation strategy, and operator type. Because both k-d trees and ball trees have an underlying tree structure in their algorithms, the topology is semi-structured, based on their definition [16]. Both the tree construction algorithm and the kNN graph construction algorithm, for both k-d trees and ball trees, are classified as data-driven, unordered, morph.

4. PARALLEL ALGORITHMS

We extended the general sequential approaches to kNN graph construction via k-d trees and ball trees described in Section 3 to two parallel settings, OpenMP and Galois. Here we discuss our specific parallelization strategies. It is worth noting that the underlying tree construction and the kNN graph construction can each be done in parallel, one after the other. However, for all implementations, we found that constructing the underlying tree took only a small fraction of the total time (underlying tree construction plus kNN search and graph construction), and in some cases the overhead of parallelizing tree construction caused the parallel construction to be slower than sequential construction. For these reasons, all implementations construct the underlying tree sequentially, and only the k nearest neighbor search is parallelized. We now discuss extensions of the sequential versions described in Section 3 for each of the two parallel settings separately. We note that all implementations in Section 4 are exact extensions of the sequential algorithms, as opposed to the approximate extensions discussed in Section 2.

4.1 OpenMP

Extending sequential nearest neighbor search to a parallel setting using OpenMP is straightforward. In order to construct a kNN graph, we must find the k nearest neighbors of every node in the dataset. However, these searches are completely independent of one another and do not modify the underlying tree structure. Additionally, the result of a single search is a list of neighbors for the given node, which is stored as one row of a two-dimensional array; each thread searches for the neighbors of a unique node, corresponding to a unique row in the shared array. Therefore, the searches can be done in parallel without any concern for data races. We use an OpenMP parallel for loop over the nodes to find each node's k nearest neighbors.
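
The resulting loop is essentially one OpenMP pragma. A sketch, where findKNN is a hypothetical wrapper around the search sketch from Section 3.1.2 and is not a function from the paper:

    #include <omp.h>
    #include <vector>

    using Point = std::vector<double>;
    struct KDNode;                                // tree node from the earlier sketch
    std::vector<int> findKNN(const KDNode* root,  // hypothetical wrapper returning the
                             const Point& key,    // indices of key's k nearest neighbors
                             int k);

    // One parallel for over the query points: the tree is read-only and
    // each iteration writes only its own row, so no locking is needed.
    // schedule(dynamic) helps because per-point search costs vary.
    void buildKNNGraph(const KDNode* root, const std::vector<Point>& pts,
                       std::vector<std::vector<int>>& neighbors, int k) {
        #pragma omp parallel for schedule(dynamic)
        for (long i = 0; i < (long)pts.size(); ++i)
            neighbors[i] = findKNN(root, pts[i], k);
    }
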
4.2 Galois

Extending sequential nearest neighbor search to the Galois framework required shifting the sequential data structures for both the underlying tree and the kNN graph to Galois graphs. We used the FirstGraph object from Galois, which allows us to mutate the graph structure (namely, the edges between nodes) as we progress through our algorithms. We now discuss the underlying tree construction and the kNN search implementation in Galois.

To construct the k-d tree as a Galois FirstGraph object, we define a custom node class KDNode that contains the data point associated with the node, the index of the data point in the original data set, the dimension on which the node splits its descendants, and a flag indicating whether it is a left child of its parent. We create a KDNode for each data point and insert all of these nodes (without any edges) into the graph, which we call kdtree. As in the sequential version, we find the dimension of maximum spread for all of the data points, and we find the median point on this dimension and call it the root. We note the index of the root for later use, as the graph itself does not explicitly contain this information. We sort the data points by the dimension of maximum spread. We then use the Galois concept of a worklist for iterating through the remaining data points. Each item in the worklist implicitly corresponds to a subtree in the kdtree: it contains the index of the subtree's root's parent node, the portion of the data set that the subtree contains, and a flag denoting whether the subtree's root is the left or right child of the parent. We add the left and right subtrees of the root to the worklist, and we use a Galois for_each iterator over the worklist, with an operator we define to recursively add the appropriate edges to the kdtree. This operator closely resembles the buildSubtree function from the sequential approach: it finds the dimension of maximum spread, sorts the subtree's data points along that dimension, and finds the corresponding median data point. The KDNode corresponding to this median data point will be the root of the subtree, so the KDNode's dimension and left-child flag fields are updated accordingly. An edge is added to the kdtree from the subtree's parent node to this KDNode. The operator then adds the left and right subtrees to the worklist to be processed. The tree is complete when the worklist is empty.

We then build the kNN graph as a Galois FirstGraph object, again using KDNodes as the underlying node type in the graph (although we can ignore the dimension and left-child fields for nodes in the kNN graph). We refer to our kNN graph as knngraph. We create a KDNode for each data point and insert all of these nodes (without any edges) into the knngraph. We then use a Galois do_all iterator over all nodes in the knngraph, with an operator we define to traverse the kdtree in search of the given node's k nearest neighbors. This operator closely resembles the findKNN and searchKDSubtree functions from the sequential approach. Once the kdtree has been searched for a given node's neighbors, edges are added to the knngraph from the node to each of its k nearest neighbors, and the value of each edge is equal to the distance from the node to the neighbor.
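
For flavor, here is a sketch of the kNN search phase in the style of a recent open-source Galois release. The paper used an earlier Galois API (FirstGraph, for_each), so the names galois::do_all, galois::iterate, and galois::loopname below reflect our assumption about the current library, and findKNN is the same hypothetical wrapper as above rather than the paper's operator:

    #include "galois/Galois.h"
    #include <cstddef>
    #include <vector>

    using Point = std::vector<double>;
    struct KDNode;
    std::vector<int> findKNN(const KDNode*, const Point&, int);  // hypothetical, as above

    int main() {
        galois::SharedMemSys G;       // initialize the Galois runtime
        galois::setActiveThreads(8);  // number of worker threads

        std::vector<Point> pts = /* load dataset */ {};
        KDNode* root = /* sequential tree build, per Section 4 */ nullptr;
        int k = 10;
        std::vector<std::vector<int>> neighbors(pts.size());

        // Every point is an independent work item; the tree is read-only
        // and each item writes a private row, so do_all (no conflict
        // detection) is sufficient.
        galois::do_all(
            galois::iterate(std::size_t{0}, pts.size()),
            [&](std::size_t i) { neighbors[i] = findKNN(root, pts[i], k); },
            galois::loopname("knn-search"));
        return 0;
    }
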

We follow a very similar approach for ball trees in Galois and skip the details due to space constraints. Both the k-d tree and the ball tree use the dimension of maximum spread for splitting and thus have a very similar approach to tree construction; the kNN graph construction follows the description in Section 3.2.2. Similar to the k-d tree implementation in Galois, the ball tree implementation also maintains a priority queue for searching the nearest neighbors of a given node and adds edges to the knngraph.

Table 1: Description of the data sets used in our experiments

    Dataset              # data points (N)    # dimensions (d)
    MNIST test set       10,000               784
    MNIST training set   60,000               784
    Covtype              581,012              54
    Poker                1,000,000            10
    RNA                  271,617              8
    Housing              20,640               8

[Figure 2: Runtime and scalability results on datasets that do not satisfy the 2^d ≪ N property. (a) MNIST test dataset: N = 10000, d = 784; (b) MNIST train dataset: N = 60000, d = 784; (c) Covtype dataset: N = 581012, d = 54 (brute force runs out of memory).]

