shp

Spatially Homogenious Pixels Identification
from scipy import stats
import numpy as np
from moraine.utils_ import is_cuda_available
if is_cuda_available():
    import cupy as cp
import itertools

Kolmogorov-Smirnov (KS) two-sample test


source

ks_test

 ks_test (rmli:numpy.ndarray, az_half_win:int, r_half_win:int,
          block_size:int=128, return_dist:bool=False)

SHP identification based on Two-Sample Kolmogorov-Smirnov Test.

Type Default Details
rmli ndarray the rmli stack, dtype: cupy.floating
az_half_win int SHP identification half search window size in azimuth direction
r_half_win int SHP identification half search window size in range direction
block_size int 128 the CUDA block size, it only affects the calculation speed, only applied if input is cupy.ndarray
return_dist bool False if return the KS test statistics dist
Returns tuple if return_dist == True, return dist and p value p. Otherwise, only p is returned.

The ks_test function apply the Two-Sample Kolmogorov-Smirnov Test on a stack of rmli images to identify SHPs candidate for further processing. This method is originally published in (Ferretti et al. 2011). This function is designed to run on GPU for high speed.

Ferretti, Alessandro, Alfio Fumagalli, Fabrizio Novali, Claudio Prati, Fabio Rocca, and Alessio Rucci. 2011. “A New Algorithm for Processing Interferometric Data-Stacks: SqueeSAR.” IEEE Transactions on Geoscience and Remote Sensing 49 (9): 3460–70. https://doi.org/10.1109/TGRS.2011.2124465.

The rmli is a three dimentional cupy/numpy ndarray. The dtype should be float. From outerest to innerest, the three dimentions are azimuth, range and image. For each pixel P, a search window centered at P is defined by az_half_win and r_half_win. All pixels in this search window is compared with P by KS test. They are refered here as secondary pixels. The total number of secondary pixels (including P) is (2*az_half_win+1)*(2*r_half_win+1).

The returns are the ks test statistic which is the maximum value of the absolute difference between the emperical cumulative distribution functions of the two samples, and p value. Both of them are 4 dimentional cupy/numpy ndarrays. From outerest ot innerest, they are azimuth, range, secondary pixel relative azimuth, secondary pixel relative range. For P at the corner of the image where part of the search window is out of the image, the result is nan.

Here is a simplest example. First simulate rmli time series of two pixels from two correlated normal distributions:

sample_size = 20
rng = np.random.default_rng()
sample1 = stats.uniform.rvs(size=sample_size, random_state=rng).astype(np.float32)
sample2 = stats.norm.rvs(size=sample_size, random_state=rng).astype(np.float32)

rmli_stack = np.stack((np.asarray(sample1), np.asarray(sample2))).reshape(1,2,sample_size)
rmli_stack.shape
(1, 2, 20)

The shape of rmli_stack shows it contains 20 images. Each of the image has 1 pixel in azimuth dimention and 2 pixels in range dimention. Set the az_half_win and r_half_win to 1 and apply the ks_test function:

dist,p = ks_test(rmli_stack,1,1,return_dist=True)
print(dist.shape)
print(dist)
(1, 2, 3, 3)
[[[[ nan  nan  nan]
   [ nan 0.   0.55]
   [ nan  nan  nan]]

  [[ nan  nan  nan]
   [0.55 0.    nan]
   [ nan  nan  nan]]]]

dist is the ks test statistic. The shape of it shows for each pixel P in this 1*2 image, a 3*3 search window is defined and all pixels in this search window is test with P. The value 0 in dist is the ks test result of pixel P and pixel P itself. The value nan means the secondary pixel is out of the image and no ks test is applied.

print(p.shape)
print(p)
(1, 2, 3, 3)
[[[[       nan        nan        nan]
   [       nan 0.         0.00257061]
   [       nan        nan        nan]]

  [[       nan        nan        nan]
   [0.00257061 0.                nan]
   [       nan        nan        nan]]]]

p is the ks test p value with same shape of dist.

print(stats.ks_2samp(sample1, sample2,method='asymp'))
KstestResult(statistic=0.5499999999999999, pvalue=0.002280510321484379, statistic_location=0.1234892, statistic_sign=-1)

By comparing the result of ks_test and ks_2samp from scipy, the statistics are same which prove the correctness of ks_test. The difference in p value is because the approcimation method used are different but the orders of magnitudes are consistent.

ks_test also accept cupy array:

if is_cuda_available():
    rmli_stack_cp = cp.asarray(rmli_stack)
    dist_cp,p_cp = ks_test(rmli_stack_cp,1,1,return_dist=True)
    np.testing.assert_array_equal(dist,cp.asnumpy(dist_cp))
    np.testing.assert_array_almost_equal(p,cp.asnumpy(p_cp))

source

select_shp

 select_shp (p, p_max)
Details
p 4D (n_az,n_r,az_win,r_win)
p_max
p = np.random.rand(100*100*5*5).astype(np.float32)
gix = np.random.choice(100*100*5*5,size=1000,replace=False)
p[gix] = np.nan
p = p.reshape(100,100,5,5)
p = (p+np.transpose(p,axes=(0,1,3,2)))/2
for i in range(5):
    p[:,:,i,i] = 1
is_shp, shp_num = select_shp(p, 0.05)
np.testing.assert_array_equal(is_shp,p< 0.05)
np.testing.assert_array_equal(shp_num, np.count_nonzero(p < 0.05, axis=(-2,-1)).astype(np.int32))