Clustor is a clustering toolkit implemented in Rust and exposed through a Python-first API. Built on PyO3 and maturin, it provides fast, minimal-dependency implementations of common clustering algorithms that fit into production data pipelines and scale to large datasets. It currently covers the following unsupervised learning primitives:
KMeans and MiniBatchKMeans with KMeans++ initialization, Euclidean/Cosine metrics, and streaming partial_fit.
BisectingKMeans for divisive (top-down) hierarchical clustering.
DBSCAN with noise labeling and core sample indices.
OPTICS for reachability ordering and core distances.
Affinity Propagation for exemplar-based clustering.
BIRCH for streaming clustering using CF-tree summaries.
GaussianMixture with model selection metrics like AIC and BIC.
Cluster validation metrics: Silhouette, Calinski–Harabasz, and Davies–Bouldin.
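To make the validation metrics above more concrete, here is a minimal NumPy sketch of the Calinski–Harabasz index, which divides between-cluster dispersion by within-cluster dispersion (higher means better separated clusters). It is an illustration only and does not use Clustor's API: the function name `calinski_harabasz`, its signature, and the choice to return infinity when the within-cluster dispersion is zero are all assumptions made for this example.

```python
import numpy as np


def calinski_harabasz(X: np.ndarray, labels: np.ndarray) -> float:
    """Between-cluster over within-cluster dispersion (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n_samples = X.shape[0]
    clusters = np.unique(labels)
    k = clusters.size
    if k < 2 or k >= n_samples:
        raise ValueError("need 2 <= number of clusters < number of samples")

    overall_mean = X.mean(axis=0)
    between = 0.0  # size-weighted squared distance of centroids to the overall mean
    within = 0.0   # squared distance of points to their own centroid
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        between += members.shape[0] * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((members - centroid) ** 2)

    if within == 0.0:
        # Every point coincides with its centroid; returning inf is a
        # choice made for this sketch, other implementations may differ.
        return float("inf")
    return float((between / (k - 1)) / (within / (n_samples - k)))


# Two tight, well-separated groups on a line.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
good_labels = np.array([0, 0, 1, 1])
bad_labels = np.array([0, 1, 0, 1])

score = calinski_harabasz(X, good_labels)
mixed_score = calinski_harabasz(X, bad_labels)
print(score, mixed_score)
```

A labeling that matches the two groups scores far higher than one that mixes them, which is the behavior to expect when comparing candidate clusterings or choosing the number of clusters.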
Clustor is under active development, and its APIs may change before the 1.0 release. It is open source, licensed under the Apache License, and available on GitHub.
Built with