from scipy import stats
import numpy as np
from moraine.utils_ import is_cuda_available
if is_cuda_available():
import cupy as cp
import itertools
shp
Kolmogorov-Smirnov (KS) two-sample test
ks_test
ks_test (rmli:numpy.ndarray, az_half_win:int, r_half_win:int, block_size:int=128, return_dist:bool=False)
SHP identification based on Two-Sample Kolmogorov-Smirnov Test.
Type | Default | Details | |
---|---|---|---|
rmli | ndarray | the rmli stack, dtype: cupy.floating | |
az_half_win | int | SHP identification half search window size in azimuth direction | |
r_half_win | int | SHP identification half search window size in range direction | |
block_size | int | 128 | the CUDA block size, it only affects the calculation speed, only applied if input is cupy.ndarray |
return_dist | bool | False | if return the KS test statistics dist |
Returns | tuple | if return_dist == True, return dist and p value p . Otherwise, only p is returned. |
The ks_test
function apply the Two-Sample Kolmogorov-Smirnov Test on a stack of rmli images to identify SHPs candidate for further processing. This method is originally published in (Ferretti et al. 2011). This function is designed to run on GPU for high speed.
The rmli
is a three dimentional cupy/numpy ndarray
. The dtype
should be float
. From outerest to innerest, the three dimentions are azimuth, range and image. For each pixel P, a search window centered at P is defined by az_half_win
and r_half_win
. All pixels in this search window is compared with P by KS test. They are refered here as secondary pixels. The total number of secondary pixels (including P) is (2*az_half_win
+1)*(2*r_half_win
+1).
The returns are the ks test statistic which is the maximum value of the absolute difference between the emperical cumulative distribution functions of the two samples, and p value. Both of them are 4 dimentional cupy/numpy ndarrays. From outerest ot innerest, they are azimuth, range, secondary pixel relative azimuth, secondary pixel relative range. For P at the corner of the image where part of the search window is out of the image, the result is nan
.
Here is a simplest example. First simulate rmli time series of two pixels from two correlated normal distributions:
= 20
sample_size = np.random.default_rng()
rng = stats.uniform.rvs(size=sample_size, random_state=rng).astype(np.float32)
sample1 = stats.norm.rvs(size=sample_size, random_state=rng).astype(np.float32)
sample2
= np.stack((np.asarray(sample1), np.asarray(sample2))).reshape(1,2,sample_size)
rmli_stack rmli_stack.shape
(1, 2, 20)
The shape of rmli_stack
shows it contains 20 images. Each of the image has 1 pixel in azimuth dimention and 2 pixels in range dimention. Set the az_half_win
and r_half_win
to 1 and apply the ks_test
function:
= ks_test(rmli_stack,1,1,return_dist=True)
dist,p print(dist.shape)
print(dist)
(1, 2, 3, 3)
[[[[ nan nan nan]
[ nan 0. 0.55]
[ nan nan nan]]
[[ nan nan nan]
[0.55 0. nan]
[ nan nan nan]]]]
dist
is the ks test statistic. The shape of it shows for each pixel P in this 1*2
image, a 3*3
search window is defined and all pixels in this search window is test with P. The value 0
in dist
is the ks test result of pixel P and pixel P itself. The value nan
means the secondary pixel is out of the image and no ks test is applied.
print(p.shape)
print(p)
(1, 2, 3, 3)
[[[[ nan nan nan]
[ nan 0. 0.00257061]
[ nan nan nan]]
[[ nan nan nan]
[0.00257061 0. nan]
[ nan nan nan]]]]
p
is the ks test p value with same shape of dist
.
print(stats.ks_2samp(sample1, sample2,method='asymp'))
KstestResult(statistic=0.5499999999999999, pvalue=0.002280510321484379, statistic_location=0.1234892, statistic_sign=-1)
By comparing the result of ks_test
and ks_2samp
from scipy
, the statistics are same which prove the correctness of ks_test
. The difference in p value is because the approcimation method used are different but the orders of magnitudes are consistent.
ks_test
also accept cupy array:
if is_cuda_available():
= cp.asarray(rmli_stack)
rmli_stack_cp = ks_test(rmli_stack_cp,1,1,return_dist=True)
dist_cp,p_cp
np.testing.assert_array_equal(dist,cp.asnumpy(dist_cp)) np.testing.assert_array_almost_equal(p,cp.asnumpy(p_cp))
select_shp
select_shp (p, p_max)
Details | |
---|---|
p | 4D (n_az,n_r,az_win,r_win) |
p_max |
= np.random.rand(100*100*5*5).astype(np.float32)
p = np.random.choice(100*100*5*5,size=1000,replace=False)
gix = np.nan
p[gix] = p.reshape(100,100,5,5)
p = (p+np.transpose(p,axes=(0,1,3,2)))/2
p for i in range(5):
= 1 p[:,:,i,i]
= select_shp(p, 0.05)
is_shp, shp_num < 0.05)
np.testing.assert_array_equal(is_shp,p< 0.05, axis=(-2,-1)).astype(np.int32)) np.testing.assert_array_equal(shp_num, np.count_nonzero(p