rtree

Rectangle tree for fast point cloud data query

import moraine as mr

source

HilbertRtree

 HilbertRtree (bounds_tree:numpy.ndarray, n_points:int, page_size=512)

*This class provides a read-only Hilbert R-tree for spatial indexing.

See Hilbert_R-tree for more info on the Hilbert R-tree.

This implementation stores the R-tree as an array representation of a binary tree. See Binary_tree for more info on the array representation of a binary tree.*

	Type	Default	Details
bounds_tree	ndarray		an array representation of a binary tree. shape [n_pages, 4]. Every row is a bounding box of a node: [x0, y0, xm, ym]
n_points	int		number of total points
page_size	int	512	number of points in every leaf node.

source

HilbertRtree.build

 HilbertRtree.build (x:numpy.ndarray, y:numpy.ndarray, page_size=512)

classmethod to build a HilbertRtree.

	Type	Default	Details
x	ndarray		x coordinates, e.g., lon, shape [n_points,]
y	ndarray		y coordinates, e.g., lat, shape [n_points,]
page_size	int	512	number of points in every leaf node. should be same as pc_chunk_size.

source

HilbertRtree.save

 HilbertRtree.save (path:str)

Save the HilbertRtree.

	Type	Details
path	str	zarr path

source

HilbertRtree.load

 HilbertRtree.load (zarr_path:str)

classmethod to load the saved HilbertRtree.

	Type	Details
zarr_path	str	zarr path

source

HilbertRtree.bbox_query

 HilbertRtree.bbox_query (bounds:tuple|list,
                          x:numpy.ndarray|zarr.core.Array,
                          y:numpy.ndarray|zarr.core.Array)

get the index of points that are in the bounding box.

	Type	Details
bounds	tuple \| list	data query bounding box, [x0, y0, xm, ym]
x	numpy.ndarray \| zarr.core.Array	x coordinate
y	numpy.ndarray \| zarr.core.Array	y coordinate

Example: create the data:

gix = np.random.randint(0, 10000*10000-1, size=1000000)
gix.sort()
gix = np.stack(np.unravel_index(gix,shape=(10000,10000)),axis=-1).astype(np.int32)
hix = mr.pc_hix(gix,shape=(10000,10000))
key = mr.pc_sort(hix)

x = gix[:,1][key]/10000; y = gix[:,0][key]/10000 # any kind of coordinates
print(x.min(),y.min(),x.max(),y.max())

0.0 0.0 0.9999 0.9999

build r tree:

rtree = HilbertRtree.build(x,y,page_size=4096)

data query:

x0, y0, xm, ym = 0.9, 0.9, 1.0, 1.0
bounds = [x0, y0, xm, ym]
q_idx = rtree.bbox_query(bounds,x,y)

compare with the brute-force check:

q_idx_ = np.where((x>x0) & (x<xm) & (y>y0) & (y<ym))[0]
np.testing.assert_array_equal(q_idx, q_idx_)

speed compare:

q_idx = rtree.bbox_query(bounds,x,y)

567 µs ± 3.24 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

q_idx_ = np.where((x>x0) & (x<xm) & (y>y0) & (y<ym))[0]

1.24 ms ± 32 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

HilbertRtree is designed for query small amount of data from a big dataset. The bigger the dataset, the greater the improvement.

for data on the disk:

x_zarr = zarr.open('rtree/x.zarr','w',shape=x.shape,chunks=(4096,),dtype=x.dtype)
y_zarr = zarr.open('rtree/y.zarr','w',shape=x.shape,chunks=(4096,),dtype=x.dtype)
x_zarr[:] = x; y_zarr[:] = y

x_zarr = zarr.open('rtree/x.zarr','r')
y_zarr = zarr.open('rtree/y.zarr','r')

q_idx = rtree.bbox_query(bounds,x_zarr,y_zarr)

5.03 ms ± 72.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

x = x_zarr[:]; y = y_zarr[:]
idx = np.where((x>x0) & (x<xm) & (y>y0) & (y<ym))[0]

234 ms ± 3.31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)