classix.merging.distance_merging
- classix.merging.distance_merging(data, labels, splist, radius, minPts, scale, sort_vals, half_nrm2)
Implements CLASSIX’s merging stage with early stopping and BLAS routines.
Parameters
- data : numpy.ndarray
The input data, array-like of shape (n_samples,).
- labels : list
Aggregation labels.
- splist : numpy.ndarray
Information about the starting points formed during aggregation. Each entry is [index of the group's starting point, sorting value, number of elements in the group].
- radius : float
The tolerance to control the aggregation. If the distance between the starting point of a group and another data point is less than or equal to the tolerance, the point is allocated to that group. For details, we refer users to [1].
- minPts : int
The threshold, in the range [0, infinity), used to determine noise. When set to 0, the algorithm does not check for noise.
- scale : float
Used for distance-based merging: when the distance between the starting points of two distinct groups is smaller than scale*radius, the two groups are merged.
- sort_vals : numpy.ndarray
Sorting values.
- half_nrm2 : numpy.ndarray
Precomputed values for distance computation.
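The merging criterion and the role of the precomputed values can be illustrated with a minimal sketch. It is not the library's implementation. It assumes that half_nrm2 holds half the squared Euclidean norm of each starting point (suggested by the name, not stated above), that the starting points are ordered by sort_vals, and that the gap between two sorting values never exceeds the distance between the corresponding points, which is what makes early stopping valid. The helper close_starting_points and the array sp_data are hypothetical names introduced for illustration.

```python
import numpy as np

def close_starting_points(sp_data, sort_vals, half_nrm2, i, radius, scale):
    """Indices j > i of starting points closer than scale*radius to point i.

    Hypothetical helper for illustration; not part of classix.
    """
    threshold = scale * radius
    # Early stopping (assumption): points are ordered by sort_vals and
    # |sort_vals[j] - sort_vals[i]| <= ||x_j - x_i||, so any point whose
    # sorting value exceeds sort_vals[i] + threshold cannot qualify.
    last = np.searchsorted(sort_vals, sort_vals[i] + threshold, side="right")
    cand = np.arange(i + 1, last)
    # ||x_i - x_j||^2 / 2 = half_nrm2[i] + half_nrm2[j] - x_i . x_j,
    # so one matrix-vector product (a BLAS call) gives all distances.
    half_sq_dist = half_nrm2[i] + half_nrm2[cand] - sp_data[cand] @ sp_data[i]
    return cand[half_sq_dist < 0.5 * threshold**2]

# Toy starting points, sorted by their first coordinate.
sp_data = np.array([[0.0, 0.0], [0.5, 0.0], [0.6, 0.9], [2.0, 0.0]])
sort_vals = sp_data[:, 0]
half_nrm2 = 0.5 * np.einsum("ij,ij->i", sp_data, sp_data)

close = close_starting_points(sp_data, sort_vals, half_nrm2, 0, radius=0.5, scale=1.5)
print(close)
```

Here the last point is never examined because its sorting value is too far away, and the third point passes the sorting-value filter but fails the actual distance test.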
Returns
- labels : numpy.ndarray
The labels after merging.
- old_cluster_count : int
The number of clusters before outlier elimination.
- SIZE_NOISE_LABELS : int
The number of clusters marked as outliers.
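To make the two counts concrete, the sketch below computes comparable quantities from a label array. It assumes that a cluster is treated as an outlier when it has fewer than minPts members, which is an interpretation of the minPts description rather than something this page states. The function count_small_clusters is a hypothetical name, not part of classix.

```python
import numpy as np

def count_small_clusters(labels, minPts):
    """Return (number of clusters, number of clusters smaller than minPts).

    Hypothetical helper for illustration; not part of classix.
    """
    _, sizes = np.unique(labels, return_counts=True)
    n_clusters = int(sizes.size)
    # With minPts = 0 no cluster is checked for noise.
    n_small = int(np.count_nonzero(sizes < minPts)) if minPts > 0 else 0
    return n_clusters, n_small

labels = np.array([0, 0, 0, 1, 2, 2, 2, 2, 3])
print(count_small_clusters(labels, minPts=2))  # (4, 2)
```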
References
[1] X. Chen and S. Güttel. Fast and explainable clustering based on sorting, 2022.