Tuesday, August 14, 2012

The PHP implementation of Nearest Neighbor Search (用PHP实现最邻近搜索)

If you are trying to build a Location Based Service (LBS), it is highly likely that you need to do Nearest Neighbor Search (NNS). For example, find the restaurants close to my current location. Since PHP is frequently used in web development,  this post will give the PHP implementation of such algorithm.

Here we use k-d tree to implement the nearest neighbor search algorithm. The details about k-d tree  won't be discussed here. We will only briefly talk about the algorithm. In order to do nearest neighbor search, we first construct a k-d tree given a set of points. The basic operation in constructing a k-d tree is to  partition the points (or coordinates) along a particular axis. First we need to find the median point when all the points are projected to the axis. With this median point, the set of points can be partitioned into two subsets. Then we choose another axis and do the partition recursively on the subsets. The axes are chosen in a round-robin way. The process is stopped when all the subsets can't be further partitioned. As seen here, the key step here is to find the median among the points in the set (or subset). Here we will use a select-based algorithm to  find the median.

After the k-d tree has been constructed, the next step is to do the query. Here we only give the algorithm to find the the nearest neighbor. First, we need to locate the query point q. This process is similar to a binary search based on  q's coordinates. We stop when we encounter a leaf node. Then calculate the distance between this leaf node and  q. This is the current best (nearest neighbor). However, we need to move forward since the current best may not be optimal. We just rewind the recursion, go to the parent of the leaf node.  Calculate the distance between this (parent) node and  q. If necessary , update the current best. We also need to see if there remains some nearer neighbors in the other plane divided by this (parent) node. This can be done by checking if the circle centered at  q with the radius current best across the splitting line. If yes, we need to check the other branch of this node. Basically, we repeat previous steps, searching q in that subtree. If we are done with the (parent) node, we just keep going up, check the parent of this (parent) node, until we encounter the root node. The empirical performance for a NNS in 2-d tree is O(logn).

The php code is as follow:
<?php 

class KD_TreeNode{
 function __construct($x, $y) {
       $this->x = $x;
    $this->y = $y;
    $this->axis = -1;
    $this->left = null;
    $this->right = null;    
        } 
 
 public $x;
 public $y;
 public $axis;
 public $left;
 public $right; 
}

class KD_Tree{
 //constructor
 function __construct(&$param) {
  if(is_array($param)) {
   $end = count($param) - 1;
   $this->root = KD_Tree::createTree($param, 0, $end, 0);
  } elseif(is_object($param)) {
   $this->root = $param;
  }               
    } 
 
 //@list here is an array of KD_TreeNode
 //@axis specifies partition by X axis or Y axis
 private static function partition_KD(&$list, $begin, 
                                      $end, $pivot, $axis)
 {
  $val = $list[$pivot];
  $tmp = $list[$end];
  $list[$end] = $val;
  $list[$pivot] = $tmp;
 
  $toSwap = $begin;
  for($i = $begin; $i <= $end-1; $i++)
  {
   $cmp1 = 0;
   $cmp2 = 0;
   //partition by X
   if($axis == 0){
    $cmp1 = $list[$i]->x;
    $cmp2 = $val->x;
   }
   //partition by Y
   if($axis == 1){
    $cmp1 = $list[$i]->y;
    $cmp2 = $val->y;
   }   
   
   if($cmp1 <= $cmp2)
   {
    $tmp = $list[$toSwap];
    $list[$toSwap] = $list[$i];
    $list[$i] = $tmp;
    $toSwap++;
   }   
  }
 
  $tmp = $list[$end];
  $list[$end] = $list[$toSwap];
  $list[$toSwap] = $tmp; 
   
  return $toSwap;
 }
 
 private static function select_KD(&$list, $begin, $end, $k, $axis)
 {
  $idx = 0;
  $target = 0;
  srand();
  while(1)
  {
   $pivot = rand(0, $end-$begin) + $begin;
   $idx = KD_Tree::partition_KD($list, $begin, $end, 
                                $pivot, $axis);
   $target = $idx - $begin + 1;
  
   if($target == $k){
    return $idx;
   }
   else if($k < $target){
    $end = $idx - 1;
   }
   else{
    $k = $k - $target;
    $begin = $idx + 1;
   }
  }
 }
 
 // create a KD_Tree given a list of KD_Tree nodes
 private static function createTree(&$list, $begin, $end, $depth)
 {
  // Select axis based on depth so that axis cycles 
                // through all valid values 
  $axis = $depth % 2; 
 
  if($begin == $end){
   $list[$begin]->axis = $axis;
   return $list[$begin];
  } 
        
  // Sort point list and choose median as pivot element
  $k = floor(($end - $begin)/2 + $begin) + 1;
  $selected = KD_Tree::select_KD($list, $begin, $end, $k, $axis);  
  //assign the axis
  $list[$selected]->axis = $axis; 
  if($selected-1 >= $begin){
   $list[$selected]->left = 
         KD_Tree::createTree($list, $begin, $selected-1, $depth+1);
  }
  if($end >= $selected+1){
   $list[$selected]->right = 
         KD_Tree::createTree($list, $selected+1, $end, $depth+1);
  }
  
  return $list[$selected];
 }
 
 public function queryTree($query_node)
 {
  $selected_reference_value = 0;
  $selected_coordinate = 0; 
  $cur_distance = -1;
  $cur_node = null;
  $ret = null; 
 
  $stack = array(); 
  $node = $this->root; 
  while($node != null)
  {
   array_push($stack, $node);
  
   if($node->axis == 0){
    $selected_reference_value = $node->x;
    $selected_coordinate = $query_node->x;
   }else{
    $selected_reference_value = $node->y;
    $selected_coordinate = $query_node->y;
   }
   if($selected_coordinate <= $selected_reference_value){
    if($node->left != null){
     $node = $node->left;
    }else{
     break;
    }
   }else{
    if($node->right != null){
     $node = $node->right;
    }else{
     break;
    }
   }
  }
 
  //pop the stack
  while(count($stack) > 0)
  {
   $node = array_pop($stack);
   //calculate distance as current distance 
   $distance = ($node->x - $query_node->x) 
               * ($node->x - $query_node->x) 
               + ($node->y - $query_node->y) 
               * ($node->y - $query_node->y);
  
   if($cur_distance < 0){
    $cur_distance = $distance;
    $cur_node = $node;
   }
   else{
    if($cur_distance > $distance){
     $cur_distance = $distance;
     $cur_node = $node;
    }
   }
  
   //if node is a leaf, we can just continure
   if($node->left == null && $node->right == null)
    continue;
  
   if($node->axis == 0){
    $selected_reference_value = $node->x;
    $selected_coordinate = $query_node->x;
   }else{
    $selected_reference_value = $node->y;
    $selected_coordinate = $query_node->y;
   }
   // evaluate if intersect
   $distance = ($selected_reference_value 
               - $selected_coordinate) 
               * ($selected_reference_value 
               - $selected_coordinate)
               - $cur_distance;
   //intersect, then check the other branch
   if($distance < 0){
    if($selected_coordinate <= $selected_reference_value){
     if($node->right != null){
      //visit its right   
      $right_tree = new KD_Tree($node->right);
      $ret = $right_tree->queryTree($query_node);
    }
    }else{
     if($node->left != null){
      //visit its left   
      $left_tree = new KD_Tree($node->left);
      $ret = $left_tree->queryTree($query_node); 
     }
    }
   
    if($ret != null && $cur_distance > $ret[0]){
     $cur_distance = $ret[0]; 
     $cur_node = $ret[1];
    }
   }  
  }
 
  return array($cur_distance, $cur_node); 
 }
 
 
 // the root of the tree
 public $root;

}

?>


1 comment:

  1. congratulations, good job.
    Do you have any examples of how to use it?

    ReplyDelete