Wednesday, September 5, 2012

Use Python to Access (Windows Azure) Bing Search API

As discussed in previous post, Microsoft had updated its Bing Search API and integrate it into the Windows Azure Marketplace.  But how to access the Search API in your web apps? In this doc, there are examples in Visual Basic and PHP. However, what if you like to use Python? There are a few guides on the Internet about Python wrapper on  Bing Search API, but many of them are out-dated (they are for old APIs). For the rest that work, many of them require you to install certain package, which is annoying sometimes. Actually, the python code to access Bing Search API is quite simple and the key is to do authentication.

The following shows an example:
import urllib2

# create credential for authentication
user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; FDM; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 1.1.4322)'
key= 'YOUR ACCOUNT KEY'
creds = (':%s' % key).encode('base64')[:-1]
auth = 'Basic %s' % creds

request = urllib2.Request('https://api.datamarket.azure.com/Data.ashx/Bing/Search/Web?Query=%27test%27&$top=5&$format=json')
request.add_header('Authorization', auth)
request.add_header('User-Agent', user_agent)

requestor = urllib2.build_opener()
result = requestor.open(request) 
print result.read()

Here, 'key' is your Account key in Windows Azure. You can find it here. You can replace the url used in urllib2.Request() to search what you want.

Tuesday, August 28, 2012

Using Bing Search API (使用 Bing API)

*updated: Bing had changed its api without updating the documentation. Now we need to use something like "https://api.datamarket.azure.com/Data.ashx/Bing/Search/v1/Web?Query=..."

Bing Search API had been moved to Windows Azure Marketplace. The way to use it also slightly changed (e.g., no ApplicationID any more). The perquisites to use Bing Search API is to have a account key in Windows Azure Marketplace. You may need to register in the Marketplace first.

After having the account key, using Bing Search API is straightforward. For example, if we want to search "iphone5", we can do
https://api.datamarket.azure.com/Bing/Search/Web/?Query=%27iphone5%27
 https://api.datamarket.azure.com/Data.ashx/Bing/Search/v1/Web?Query=%27iphone5%27
If you want top10 results, do
https://api.datamarket.azure.com/Bing/Search/Web/?Query=%27iphone5%27&$top=10 
https://api.datamarket.azure.com/Data.ashx/Bing/Search/v1/Web?
Query=%27iphone5%27&$top=10  
If you further want results in format in JSON, do
https://api.datamarket.azure.com/Bing/Search/Web/?Query=%27iphone5%27&$top=10&$format=JSON
https://api.datamarket.azure.com/Data.ashx/Bing/Search/v1/Web
?Query=%27iphone5%27&$top=10&$format=JSON 
You may wonder where the authentication happens since we no longer have an ApplicationID in the url. When you type in the search url, the browser will automatically pop up a dialog asking for user name and password. Leaving the user name blank and type your account key as password. Then you can see the search results right away.

What if you want to search for images? just do
https://api.datamarket.azure.com/Bing/Search/Image/?Query=%27iphone5%27
https://api.datamarket.azure.com/Data.ashx/Bing/Search/v1/Image?Query=%27iphone5%27 
 This post only introduces some basic use of Bing Search API. For more uses, please look at this link.

Tuesday, August 14, 2012

The PHP implementation of Nearest Neighbor Search (用PHP实现最邻近搜索)

If you are trying to build a Location Based Service (LBS), it is highly likely that you need to do Nearest Neighbor Search (NNS). For example, find the restaurants close to my current location. Since PHP is frequently used in web development,  this post will give the PHP implementation of such algorithm.

Here we use k-d tree to implement the nearest neighbor search algorithm. The details about k-d tree  won't be discussed here. We will only briefly talk about the algorithm. In order to do nearest neighbor search, we first construct a k-d tree given a set of points. The basic operation in constructing a k-d tree is to  partition the points (or coordinates) along a particular axis. First we need to find the median point when all the points are projected to the axis. With this median point, the set of points can be partitioned into two subsets. Then we choose another axis and do the partition recursively on the subsets. The axes are chosen in a round-robin way. The process is stopped when all the subsets can't be further partitioned. As seen here, the key step here is to find the median among the points in the set (or subset). Here we will use a select-based algorithm to  find the median.

After the k-d tree has been constructed, the next step is to do the query. Here we only give the algorithm to find the the nearest neighbor. First, we need to locate the query point q. This process is similar to a binary search based on  q's coordinates. We stop when we encounter a leaf node. Then calculate the distance between this leaf node and  q. This is the current best (nearest neighbor). However, we need to move forward since the current best may not be optimal. We just rewind the recursion, go to the parent of the leaf node.  Calculate the distance between this (parent) node and  q. If necessary , update the current best. We also need to see if there remains some nearer neighbors in the other plane divided by this (parent) node. This can be done by checking if the circle centered at  q with the radius current best across the splitting line. If yes, we need to check the other branch of this node. Basically, we repeat previous steps, searching q in that subtree. If we are done with the (parent) node, we just keep going up, check the parent of this (parent) node, until we encounter the root node. The empirical performance for a NNS in 2-d tree is O(logn).

The php code is as follow:
<?php 

class KD_TreeNode{
 function __construct($x, $y) {
       $this->x = $x;
    $this->y = $y;
    $this->axis = -1;
    $this->left = null;
    $this->right = null;    
        } 
 
 public $x;
 public $y;
 public $axis;
 public $left;
 public $right; 
}

class KD_Tree{
 //constructor
 function __construct(&$param) {
  if(is_array($param)) {
   $end = count($param) - 1;
   $this->root = KD_Tree::createTree($param, 0, $end, 0);
  } elseif(is_object($param)) {
   $this->root = $param;
  }               
    } 
 
 //@list here is an array of KD_TreeNode
 //@axis specifies partition by X axis or Y axis
 private static function partition_KD(&$list, $begin, 
                                      $end, $pivot, $axis)
 {
  $val = $list[$pivot];
  $tmp = $list[$end];
  $list[$end] = $val;
  $list[$pivot] = $tmp;
 
  $toSwap = $begin;
  for($i = $begin; $i <= $end-1; $i++)
  {
   $cmp1 = 0;
   $cmp2 = 0;
   //partition by X
   if($axis == 0){
    $cmp1 = $list[$i]->x;
    $cmp2 = $val->x;
   }
   //partition by Y
   if($axis == 1){
    $cmp1 = $list[$i]->y;
    $cmp2 = $val->y;
   }   
   
   if($cmp1 <= $cmp2)
   {
    $tmp = $list[$toSwap];
    $list[$toSwap] = $list[$i];
    $list[$i] = $tmp;
    $toSwap++;
   }   
  }
 
  $tmp = $list[$end];
  $list[$end] = $list[$toSwap];
  $list[$toSwap] = $tmp; 
   
  return $toSwap;
 }
 
 private static function select_KD(&$list, $begin, $end, $k, $axis)
 {
  $idx = 0;
  $target = 0;
  srand();
  while(1)
  {
   $pivot = rand(0, $end-$begin) + $begin;
   $idx = KD_Tree::partition_KD($list, $begin, $end, 
                                $pivot, $axis);
   $target = $idx - $begin + 1;
  
   if($target == $k){
    return $idx;
   }
   else if($k < $target){
    $end = $idx - 1;
   }
   else{
    $k = $k - $target;
    $begin = $idx + 1;
   }
  }
 }
 
 // create a KD_Tree given a list of KD_Tree nodes
 private static function createTree(&$list, $begin, $end, $depth)
 {
  // Select axis based on depth so that axis cycles 
                // through all valid values 
  $axis = $depth % 2; 
 
  if($begin == $end){
   $list[$begin]->axis = $axis;
   return $list[$begin];
  } 
        
  // Sort point list and choose median as pivot element
  $k = floor(($end - $begin)/2 + $begin) + 1;
  $selected = KD_Tree::select_KD($list, $begin, $end, $k, $axis);  
  //assign the axis
  $list[$selected]->axis = $axis; 
  if($selected-1 >= $begin){
   $list[$selected]->left = 
         KD_Tree::createTree($list, $begin, $selected-1, $depth+1);
  }
  if($end >= $selected+1){
   $list[$selected]->right = 
         KD_Tree::createTree($list, $selected+1, $end, $depth+1);
  }
  
  return $list[$selected];
 }
 
 public function queryTree($query_node)
 {
  $selected_reference_value = 0;
  $selected_coordinate = 0; 
  $cur_distance = -1;
  $cur_node = null;
  $ret = null; 
 
  $stack = array(); 
  $node = $this->root; 
  while($node != null)
  {
   array_push($stack, $node);
  
   if($node->axis == 0){
    $selected_reference_value = $node->x;
    $selected_coordinate = $query_node->x;
   }else{
    $selected_reference_value = $node->y;
    $selected_coordinate = $query_node->y;
   }
   if($selected_coordinate <= $selected_reference_value){
    if($node->left != null){
     $node = $node->left;
    }else{
     break;
    }
   }else{
    if($node->right != null){
     $node = $node->right;
    }else{
     break;
    }
   }
  }
 
  //pop the stack
  while(count($stack) > 0)
  {
   $node = array_pop($stack);
   //calculate distance as current distance 
   $distance = ($node->x - $query_node->x) 
               * ($node->x - $query_node->x) 
               + ($node->y - $query_node->y) 
               * ($node->y - $query_node->y);
  
   if($cur_distance < 0){
    $cur_distance = $distance;
    $cur_node = $node;
   }
   else{
    if($cur_distance > $distance){
     $cur_distance = $distance;
     $cur_node = $node;
    }
   }
  
   //if node is a leaf, we can just continure
   if($node->left == null && $node->right == null)
    continue;
  
   if($node->axis == 0){
    $selected_reference_value = $node->x;
    $selected_coordinate = $query_node->x;
   }else{
    $selected_reference_value = $node->y;
    $selected_coordinate = $query_node->y;
   }
   // evaluate if intersect
   $distance = ($selected_reference_value 
               - $selected_coordinate) 
               * ($selected_reference_value 
               - $selected_coordinate)
               - $cur_distance;
   //intersect, then check the other branch
   if($distance < 0){
    if($selected_coordinate <= $selected_reference_value){
     if($node->right != null){
      //visit its right   
      $right_tree = new KD_Tree($node->right);
      $ret = $right_tree->queryTree($query_node);
    }
    }else{
     if($node->left != null){
      //visit its left   
      $left_tree = new KD_Tree($node->left);
      $ret = $left_tree->queryTree($query_node); 
     }
    }
   
    if($ret != null && $cur_distance > $ret[0]){
     $cur_distance = $ret[0]; 
     $cur_node = $ret[1];
    }
   }  
  }
 
  return array($cur_distance, $cur_node); 
 }
 
 
 // the root of the tree
 public $root;

}

?>