rtree

Rectangle tree for fast point cloud data query
import moraine as mr

source

HilbertRtree

 HilbertRtree (bounds_tree:numpy.ndarray, n_points:int, page_size=512)

*This class provides a read-only Hilbert R-tree for spatial indexing.

See Hilbert_R-tree for more info on the Hilbert R-tree.

This implementation stores the R-tree as an array representation of a binary tree. See Binary_tree for more info on the array representation of a binary tree.*

Type Default Details
bounds_tree ndarray an array representation of a binary tree. shape [n_pages, 4]. Every row is a bounding box of a node: [x0, y0, xm, ym]
n_points int number of total points
page_size int 512 number of points in every leaf node.

source

HilbertRtree.build

 HilbertRtree.build (x:numpy.ndarray, y:numpy.ndarray, page_size=512)

classmethod to build a HilbertRtree.

Type Default Details
x ndarray x coordinates, e.g., lon, shape [n_points,]
y ndarray y coordinates, e.g., lat, shape [n_points,]
page_size int 512 number of points in every leaf node. should be same as pc_chunk_size.

source

HilbertRtree.save

 HilbertRtree.save (path:str)

Save the HilbertRtree.

Type Details
path str zarr path

source

HilbertRtree.load

 HilbertRtree.load (zarr_path:str)

classmethod to load the saved HilbertRtree.

Type Details
zarr_path str zarr path

source

HilbertRtree.bbox_query

 HilbertRtree.bbox_query (bounds:tuple|list,
                          x:numpy.ndarray|zarr.core.Array,
                          y:numpy.ndarray|zarr.core.Array)

get the index of points that are in the bounding box.

Type Details
bounds tuple | list data query bounding box, [x0, y0, xm, ym]
x numpy.ndarray | zarr.core.Array x coordinate
y numpy.ndarray | zarr.core.Array y coordinate

Example: create the data:

gix = np.random.randint(0, 10000*10000-1, size=1000000)
gix.sort()
gix = np.stack(np.unravel_index(gix,shape=(10000,10000)),axis=-1).astype(np.int32)
hix = mr.pc_hix(gix,shape=(10000,10000))
key = mr.pc_sort(hix)

x = gix[:,1][key]/10000; y = gix[:,0][key]/10000 # any kind of coordinates
print(x.min(),y.min(),x.max(),y.max())
0.0 0.0 0.9999 0.9999

build r tree:

rtree = HilbertRtree.build(x,y,page_size=4096)

data query:

x0, y0, xm, ym = 0.9, 0.9, 1.0, 1.0
bounds = [x0, y0, xm, ym]
q_idx = rtree.bbox_query(bounds,x,y)

compare with the brute-force check:

q_idx_ = np.where((x>x0) & (x<xm) & (y>y0) & (y<ym))[0]
np.testing.assert_array_equal(q_idx, q_idx_)

speed compare:

q_idx = rtree.bbox_query(bounds,x,y)
426 µs ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
q_idx_ = np.where((x>x0) & (x<xm) & (y>y0) & (y<ym))[0]
1.22 ms ± 3.39 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

HilbertRtree is designed for query small amount of data from a big dataset. The bigger the dataset, the greater the improvement.

for data on the disk:

x_zarr = zarr.open('rtree/x.zarr','w',shape=x.shape,chunks=(4096,),dtype=x.dtype)
y_zarr = zarr.open('rtree/y.zarr','w',shape=x.shape,chunks=(4096,),dtype=x.dtype)
x_zarr[:] = x; y_zarr[:] = y
x_zarr = zarr.open('rtree/x.zarr','r')
y_zarr = zarr.open('rtree/y.zarr','r')
q_idx = rtree.bbox_query(bounds,x_zarr,y_zarr)
4.6 ms ± 46.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
x = x_zarr[:]; y = y_zarr[:]
idx = np.where((x>x0) & (x<xm) & (y>y0) & (y<ym))[0]
215 ms ± 1.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)