classix.merging.distance_merging

classix.merging.distance_merging(data, labels, splist, radius, minPts, scale, sort_vals, half_nrm2)

Implements CLASSIX’s distance-based merging with early stopping and BLAS routines.

Parameters

data : numpy.ndarray

The input data, array-like of shape (n_samples,).

labels : list

The group labels assigned during aggregation.

splist : numpy.ndarray

Information on the starting points formed during aggregation. Each entry is a list of [index of the group's starting point, sorting value, number of elements in the group].

radius : float

The tolerance to control the aggregation. If the distance between the starting point of a group and another data point is less than or equal to the tolerance, the point is allocated to that group. For details, we refer users to [1].

minPts : int

The threshold, in the range [0, infinity), that determines the noise level. When set to 0, the algorithm does not check for noise.

scale : float

Used for distance-based clustering: when the distance between the starting points of two distinct groups is smaller than scale*radius, the two groups are merged.

sort_vals : numpy.ndarray

Sorting values.

half_nrm2 : numpy.ndarray

Precomputed values for distance computation.
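The merging criterion and the role of the precomputed values can be illustrated with a minimal sketch. This is not the library's implementation: `find_mergeable_pairs` is a hypothetical helper, and it assumes that the starting points are ordered by their sorting values, that the sorting values are projections onto a unit vector, and that the precomputed values are half squared norms (suggested by the name half_nrm2, but an assumption here).

```python
import numpy as np

def find_mergeable_pairs(starting_points, sort_vals, radius, scale):
    """Return pairs (i, j), i < j, of starting points closer than scale*radius.

    Hypothetical helper for illustration only. Assumes `starting_points` is
    ordered by `sort_vals` and `sort_vals` are projections onto a unit vector.
    """
    threshold = scale * radius
    # Assumed meaning of half_nrm2: 0.5 * ||x||^2 for each starting point.
    half_nrm2 = 0.5 * np.einsum("ij,ij->i", starting_points, starting_points)
    pairs = []
    n = starting_points.shape[0]
    for i in range(n - 1):
        # Early stopping: a point whose sorting value exceeds
        # sort_vals[i] + threshold cannot lie within the threshold distance.
        last = np.searchsorted(sort_vals, sort_vals[i] + threshold, side="right")
        if last <= i + 1:
            continue
        candidates = starting_points[i + 1:last]
        # 0.5*||a - b||^2 = 0.5*||a||^2 + 0.5*||b||^2 - a.b,
        # so one matrix-vector product gives all candidate distances.
        half_dist2 = half_nrm2[i] + half_nrm2[i + 1:last] - candidates @ starting_points[i]
        close = np.nonzero(half_dist2 < 0.5 * threshold**2)[0]
        pairs.extend((i, i + 1 + int(k)) for k in close)
    return pairs
```

The matrix-vector product is the kind of operation that NumPy delegates to BLAS, which is why precomputing the norms once can pay off when many candidates are checked.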

Returns

labels : numpy.ndarray

The cluster labels after merging.

old_cluster_count : int

The number of clusters before outlier elimination.

SIZE_NOISE_LABELS : int

The number of clusters marked as outliers.
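As a rough illustration of how the two counts relate to minPts, the sketch below tallies cluster sizes from a label array. `count_small_clusters` is a hypothetical helper, and treating a cluster as an outlier when it has strictly fewer than minPts points is an assumption, not something this page confirms.

```python
import numpy as np

def count_small_clusters(labels, minPts):
    """Hypothetical helper: return (number of clusters, number with < minPts points)."""
    labels = np.asarray(labels)
    # Size of each distinct cluster label.
    _, sizes = np.unique(labels, return_counts=True)
    cluster_count = int(sizes.size)
    # Assumed rule: clusters smaller than minPts count as outliers.
    # With minPts == 0 no cluster qualifies, matching "won't check for noise".
    noise_count = int(np.count_nonzero(sizes < minPts))
    return cluster_count, noise_count
```

For example, with labels `[0, 0, 0, 1, 2, 2]` and `minPts=2`, three clusters are found and one of them (label 1) falls below the threshold.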

References

[1] X. Chen and S. Güttel. Fast and explainable clustering based on sorting, 2022.