Using a KD-Tree to navigate between nearby nodes

Posted: April 23, 2024

I have been working on the Greenstand Treetracker project for the last 4 months. In this project, we ran into a difficulty: in some cases the trees are scattered. For example, trees T₁, T₂, T₃, …, Tₖ may be scattered around a mountain. To find the other trees around T₁, we must zoom out and zoom in, which makes it easy to lose our place on the map.

To solve this problem, we first consider finding the nearest neighbors of each tree. However, we have more than 500K trees around the world, so computing the distance between every pair of trees has O(n²) time complexity, which is far too expensive.
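For intuition, here is what the brute-force approach looks like (a minimal sketch; the point data is made up for illustration, not real Treetracker data):

```javascript
// Brute-force nearest neighbor: scan every point for each query.
// Running this once per tree over n trees is O(n^2) overall.
function nearestBruteForce(points, q) {
    let best = null;
    let bestDist = Infinity;
    for (const p of points) {
        // Squared Euclidean distance is enough for comparisons.
        const d = (p.x - q.x) ** 2 + (p.y - q.y) ** 2;
        if (d < bestDist) {
            bestDist = d;
            best = p;
        }
    }
    return best;
}

const trees = [{x: 40, y: 45}, {x: 15, y: 70}, {x: 70, y: 10}, {x: 69, y: 50}];
console.log(nearestBruteForce(trees, {x: 65, y: 48})); // → {x: 69, y: 50}
```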

Review of BST

In an algorithms course, we learn about the Binary Search Tree (BST): searching a sorted sequence can be viewed as walking down a BST, and the height of that BST is the number of comparisons we need to find a node.

It is easy to prove that, if our BST is sufficiently balanced, each lookup takes only O(log n), so looking up all n nodes costs O(n log n) rather than O(n²).
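In code, searching a sorted sequence is exactly this halving process (a minimal sketch, not project code):

```javascript
// Binary search over a sorted array: each step halves the search range,
// so one lookup costs O(log n); n lookups cost O(n log n).
function binarySearch(sorted, target) {
    let lo = 0, hi = sorted.length - 1;
    while (lo <= hi) {
        const mid = (lo + hi) >> 1;
        if (sorted[mid] === target) return mid;
        if (sorted[mid] < target) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1; // not found
}

console.log(binarySearch([10, 15, 40, 45, 66, 70, 85], 45)); // → 3
```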


KD-Tree

Can we use the same idea to find nearest neighbors in 2-D space, or even in higher-dimensional spaces? Absolutely yes!

Let us dive into KD-Tree.

For example, suppose we have multiple nodes: A(40,45), B(15,70), C(70,10), D(69,50), E(66,85), F(85,90). Unlike a BST, a KD-Tree searching a 2-D plane must split in two alternating ways: by x-coordinate at even depths and by y-coordinate at odd depths.


The pseudo code for KD-Tree construction is as follows:

function build_kd_tree(points, depth)
    if points is empty
        return null

    axis = depth mod k                   // k is the number of dimensions
    sort points along axis               // order points by their coordinate on this axis

    median_index = floor(length(points) / 2)
    node = new Node()
    node.location = points[median_index]
    node.left = build_kd_tree(points[0:median_index], depth + 1)
    node.right = build_kd_tree(points[median_index+1:end], depth + 1)

    return node
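The pseudo code above translates directly to JavaScript (a sketch using the example points as [x, y] pairs; the node shape is an assumption, not project code):

```javascript
// Build a KD-tree by cycling through the axes and splitting at the median.
function buildKdTree(points, depth = 0) {
    if (points.length === 0) return null;

    const k = 2;                // number of dimensions
    const axis = depth % k;     // 0 → x, 1 → y
    // Copy before sorting so the caller's array is untouched.
    const sorted = [...points].sort((a, b) => a[axis] - b[axis]);
    const median = Math.floor(sorted.length / 2);

    return {
        location: sorted[median],
        left: buildKdTree(sorted.slice(0, median), depth + 1),
        right: buildKdTree(sorted.slice(median + 1), depth + 1),
    };
}

// The six points from the example.
const pts = [[40, 45], [15, 70], [70, 10], [69, 50], [66, 85], [85, 90]];
const root = buildKdTree(pts);
console.log(root.location);
```

Note that median splitting guarantees a balanced tree; inserting the points one by one in a fixed order (as some figures illustrate) can produce a different, possibly unbalanced, tree over the same data.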

Consider searching the kd tree for a record located at P = (69, 50). First compare P with the point stored at the root. If P matched the location of A, the search would be successful. In this example the positions do not match (A's location (40, 45) is not the same as (69, 50)), so the search must continue. The x value of A is compared with that of P to determine in which direction to branch, because the root is at depth 0 and 0 mod 2 = 0 corresponds to the x-coordinate. Because A's x value of 40 is less than P's x value of 69, we branch to the right subtree (all points with x value greater than or equal to 40 are in the right subtree). A's y value does not affect the decision on which way to branch at this level. At the second level, P does not match record C's position, so another branch must be taken. However, at this level we branch based on the relative y values of P and C (because 1 mod 2 = 1, which corresponds to the y-coordinate). Because C's y value of 10 is less than P's y value of 50, we branch to the right. At this point, P is compared against the position of D. A match is made and the search is successful.
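The comparison sequence above can be replayed in code (a sketch; the small tree is hand-built to match the figure, where A is the root and C, D lie on its right spine):

```javascript
// Exact-match search in a 2-D KD-tree: compare on x at even depths, y at odd depths.
function kdSearch(node, target, depth = 0) {
    if (node === null) return null;
    const [nx, ny] = node.location;
    if (nx === target[0] && ny === target[1]) return node; // found
    const axis = depth % 2;                                // 0 → x, 1 → y
    const goRight = target[axis] >= node.location[axis];
    return kdSearch(goRight ? node.right : node.left, target, depth + 1);
}

// Hand-built tree matching the worked example.
const D = { location: [69, 50], left: null, right: null };
const C = { location: [70, 10], left: null, right: D };
const B = { location: [15, 70], left: null, right: null };
const A = { location: [40, 45], left: B, right: C };

console.log(kdSearch(A, [69, 50]).location); // → [69, 50]
```

The search visits exactly A → C → D, matching the three comparisons described above.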


Application in Greenstand Treetracker

To speed up finding the nearest neighbors of a given tree, we need to build a KD-tree index before searching. Several libraries can build such an index and query the k nearest neighbors. For example, we can use kdbush:

// initialize KDBush for 1000 items
const index = new KDBush(1000);

// fill it with 1000 points
for (const {x, y} of items) {
    index.add(x, y);
}

// perform the indexing
index.finish();

// make a bounding box query
const foundIds = index.range(minX, minY, maxX, maxY);

// map ids to original items
const foundItems = foundIds.map(i => items[i]);

// make a radius query
const neighborIds = index.within(x, y, 5);

// instantly transfer the index from a worker to the main thread
postMessage(index.data, [index.data]);

// reconstruct the index from a raw array buffer
const restored = KDBush.from(e.data);

That covers the basic usage of this library; index.within is the radius query we need for finding neighbors. However, kdbush and the original KD-Tree cannot be used to