Here we use k-d tree to implement the nearest neighbor search algorithm. The details about k-d tree won't be discussed here. We will only briefly talk about the algorithm. In order to do nearest neighbor search, we first construct a k-d tree given a set of points. The basic operation in constructing a k-d tree is to partition the points (or coordinates) along a particular axis. First we need to find the median point when all the points are projected to the axis. With this median point, the set of points can be partitioned into two subsets. Then we choose another axis and do the partition recursively on the subsets. The axes are chosen in a round-robin way. The process is stopped when all the subsets can't be further partitioned. As seen here, the key step here is to find the median among the points in the set (or subset). Here we will use a select-based algorithm to find the median.
After the k-d tree has been constructed, the next step is to do the query. Here we only give the algorithm to find the the nearest neighbor. First, we need to locate the query point q. This process is similar to a binary search based on q's coordinates. We stop when we encounter a leaf node. Then calculate the distance between this leaf node and q. This is the current best (nearest neighbor). However, we need to move forward since the current best may not be optimal. We just rewind the recursion, go to the parent of the leaf node. Calculate the distance between this (parent) node and q. If necessary , update the current best. We also need to see if there remains some nearer neighbors in the other plane divided by this (parent) node. This can be done by checking if the circle centered at q with the radius current best across the splitting line. If yes, we need to check the other branch of this node. Basically, we repeat previous steps, searching q in that subtree. If we are done with the (parent) node, we just keep going up, check the parent of this (parent) node, until we encounter the root node. The empirical performance for a NNS in 2-d tree is O(logn).
The php code is as follow:
<?php class KD_TreeNode{ function __construct($x, $y) { $this->x = $x; $this->y = $y; $this->axis = -1; $this->left = null; $this->right = null; } public $x; public $y; public $axis; public $left; public $right; } class KD_Tree{ //constructor function __construct(&$param) { if(is_array($param)) { $end = count($param) - 1; $this->root = KD_Tree::createTree($param, 0, $end, 0); } elseif(is_object($param)) { $this->root = $param; } } //@list here is an array of KD_TreeNode //@axis specifies partition by X axis or Y axis private static function partition_KD(&$list, $begin, $end, $pivot, $axis) { $val = $list[$pivot]; $tmp = $list[$end]; $list[$end] = $val; $list[$pivot] = $tmp; $toSwap = $begin; for($i = $begin; $i <= $end-1; $i++) { $cmp1 = 0; $cmp2 = 0; //partition by X if($axis == 0){ $cmp1 = $list[$i]->x; $cmp2 = $val->x; } //partition by Y if($axis == 1){ $cmp1 = $list[$i]->y; $cmp2 = $val->y; } if($cmp1 <= $cmp2) { $tmp = $list[$toSwap]; $list[$toSwap] = $list[$i]; $list[$i] = $tmp; $toSwap++; } } $tmp = $list[$end]; $list[$end] = $list[$toSwap]; $list[$toSwap] = $tmp; return $toSwap; } private static function select_KD(&$list, $begin, $end, $k, $axis) { $idx = 0; $target = 0; srand(); while(1) { $pivot = rand(0, $end-$begin) + $begin; $idx = KD_Tree::partition_KD($list, $begin, $end, $pivot, $axis); $target = $idx - $begin + 1; if($target == $k){ return $idx; } else if($k < $target){ $end = $idx - 1; } else{ $k = $k - $target; $begin = $idx + 1; } } } // create a KD_Tree given a list of KD_Tree nodes private static function createTree(&$list, $begin, $end, $depth) { // Select axis based on depth so that axis cycles // through all valid values $axis = $depth % 2; if($begin == $end){ $list[$begin]->axis = $axis; return $list[$begin]; } // Sort point list and choose median as pivot element $k = floor(($end - $begin)/2 + $begin) + 1; $selected = KD_Tree::select_KD($list, $begin, $end, $k, $axis); //assign the axis $list[$selected]->axis = $axis; if($selected-1 >= $begin){ $list[$selected]->left = KD_Tree::createTree($list, $begin, $selected-1, $depth+1); } if($end >= $selected+1){ $list[$selected]->right = KD_Tree::createTree($list, $selected+1, $end, $depth+1); } return $list[$selected]; } public function queryTree($query_node) { $selected_reference_value = 0; $selected_coordinate = 0; $cur_distance = -1; $cur_node = null; $ret = null; $stack = array(); $node = $this->root; while($node != null) { array_push($stack, $node); if($node->axis == 0){ $selected_reference_value = $node->x; $selected_coordinate = $query_node->x; }else{ $selected_reference_value = $node->y; $selected_coordinate = $query_node->y; } if($selected_coordinate <= $selected_reference_value){ if($node->left != null){ $node = $node->left; }else{ break; } }else{ if($node->right != null){ $node = $node->right; }else{ break; } } } //pop the stack while(count($stack) > 0) { $node = array_pop($stack); //calculate distance as current distance $distance = ($node->x - $query_node->x) * ($node->x - $query_node->x) + ($node->y - $query_node->y) * ($node->y - $query_node->y); if($cur_distance < 0){ $cur_distance = $distance; $cur_node = $node; } else{ if($cur_distance > $distance){ $cur_distance = $distance; $cur_node = $node; } } //if node is a leaf, we can just continure if($node->left == null && $node->right == null) continue; if($node->axis == 0){ $selected_reference_value = $node->x; $selected_coordinate = $query_node->x; }else{ $selected_reference_value = $node->y; $selected_coordinate = $query_node->y; } // evaluate if intersect $distance = ($selected_reference_value - $selected_coordinate) * ($selected_reference_value - $selected_coordinate) - $cur_distance; //intersect, then check the other branch if($distance < 0){ if($selected_coordinate <= $selected_reference_value){ if($node->right != null){ //visit its right $right_tree = new KD_Tree($node->right); $ret = $right_tree->queryTree($query_node); } }else{ if($node->left != null){ //visit its left $left_tree = new KD_Tree($node->left); $ret = $left_tree->queryTree($query_node); } } if($ret != null && $cur_distance > $ret[0]){ $cur_distance = $ret[0]; $cur_node = $ret[1]; } } } return array($cur_distance, $cur_node); } // the root of the tree public $root; } ?>
congratulations, good job.
ReplyDeleteDo you have any examples of how to use it?