Difference between k_means and KMeans in sklearn - CFANZ编程社区

1. Drawbacks of KMeans
2. sklearn.KMeans parameters
3. sklearn.KMeans attributes

KMeans

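Drawbacks:
1. Choosing the number of centers k: it is hard to know in advance how many clusters are most appropriate.
2. Choosing the initial k centers: they have to be specified beforehand and are hard to pick well, and different initial centers lead to different clustering results.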
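Both drawbacks can be observed directly. The following is a minimal sketch, assuming a small synthetic dataset of three Gaussian blobs (the data, the range of k, and the seeds are made up for illustration). It prints the inertia for several values of k, then the inertia of single runs started from different random centers.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three Gaussian blobs (invented for this illustration).
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Drawback 1: k has to be chosen by the user. Inertia generally keeps
# shrinking as k grows, so it cannot pick k on its own; several values
# of k have to be compared.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    print(k, round(model.inertia_, 2))

# Drawback 2: the result depends on the initial centers. With a single
# random initialization (n_init=1), different seeds may end in
# different local optima and therefore different inertia values.
single_runs = [
    KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X).inertia_
    for seed in range(5)
]
print([round(v, 2) for v in single_runs])
```

On well-separated data like this the single runs may all agree; on harder data the spread between seeds is usually larger, which is why n_init defaults to more than one run.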

sklearn.KMeans parameters:

KMeans(
    n_clusters=8,
    *,
    init='k-means++',
    n_init=10,
    max_iter=300,
    tol=0.0001,
    verbose=0,
    random_state=None,
    copy_x=True,
    algorithm='auto',
)

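Notes:

n_clusters: int. The number of clusters to form. Default 8.
init: one of three options: 'k-means++', 'random', or an ndarray.
1) 'k-means++' selects the initial centroids in a way designed to speed up convergence.
2) 'random' picks the initial centroids at random from the training data.
3) If an ndarray is passed, it should have shape (n_clusters, n_features) and gives the initial centroids.
The default is 'k-means++'.
n_init: int. Number of times the algorithm is run with different initial centroids; the final result is the best run in terms of inertia. Default 10.
max_iter: int. Maximum number of iterations of a single k-means run. Default 300.
tol: float, default 1e-4. Relative tolerance used to declare convergence. Older documentation defines it with respect to inertia; recent versions define it with respect to the change in cluster centers between two consecutive iterations.
n_jobs: int. Number of processes used for the computation; internally, the n_init runs are computed in parallel. This parameter is missing from the signature above because it was deprecated in scikit-learn 0.23 and removed in 1.0, so the description below applies to older versions only.
(1) If the value is -1, all CPUs are used. If the value is 1, no parallel computation is done, which is convenient for debugging.
(2) If the value is less than -1, the number of CPUs used is (n_cpus + 1 + n_jobs). So with n_jobs = -2, all CPUs but one are used.
random_state: int or numpy.RandomState instance, optional.
The generator used to initialize the centroids. If an integer is given, it fixes the seed. By default, numpy's global random number generator is used.
copy_x: bool, default True. If copy_x=True, the original data is left unmodified.
algorithm: {"auto", "full", "elkan"}, default "auto". The k-means algorithm to use.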
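The parameters above can be tried out as follows. This is only a sketch on a tiny hand-made dataset (the six points and the starting centroids are assumptions for illustration). It contrasts the default 'k-means++' initialization with passing an explicit ndarray of shape (n_clusters, n_features) as init.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six points forming two obvious groups (invented for this illustration).
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

# Default-style call: 'k-means++' initialization, 10 restarts,
# and the run with the lowest inertia is kept.
km_pp = KMeans(n_clusters=2, init="k-means++", n_init=10, max_iter=300,
               tol=1e-4, random_state=0).fit(X)

# Explicit initial centroids as an ndarray of shape (n_clusters, n_features).
# n_init=1 because there is only one starting configuration to try.
start = np.array([[0.0, 0.0], [12.0, 3.0]])
km_arr = KMeans(n_clusters=2, init=start, n_init=1).fit(X)

print(km_pp.cluster_centers_)
print(km_arr.cluster_centers_)
```

The algorithm argument is left out on purpose, since the set of accepted values differs between scikit-learn versions.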

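Attributes:

cluster_centers_ : coordinates of the cluster centers
labels_ : the cluster label of each sample
inertia_ : sum of squared distances of samples to their closest cluster center, weighted by the sample weights if provided.
n_iter_ : int, the number of iterations run
n_features_in_ : int, the number of features seen during 'fit'.
feature_names_in_ : names of the features seen during 'fit'; only defined when the input X has feature names that are all strings.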
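After fitting, these attributes can be read straight from the estimator. The sketch below uses a small invented dataset and also recomputes inertia_ by hand from cluster_centers_ and labels_, to show how the three relate (no sample weights are used here).

```python
import numpy as np
from sklearn.cluster import KMeans

# Six points forming two groups (invented for this illustration).
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_.shape)  # (n_clusters, n_features)
print(km.labels_)                 # one cluster index per sample
print(km.inertia_)
print(km.n_iter_)
print(km.n_features_in_)

# inertia_ by hand: squared distance from each sample to the center
# of the cluster it was assigned to, summed over all samples.
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = float(((X - assigned_centers) ** 2).sum())
print(manual_inertia)
```

feature_names_in_ is not shown because X is a plain ndarray without feature names.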

k_means()

k_means(
    X,
    n_clusters,
    *,
    sample_weight=None,
    init='k-means++',
    n_init=10,
    max_iter=300,
    verbose=False,
    tol=0.0001,
    random_state=None,
    copy_x=True,
    algorithm='auto',
    return_n_iter=False,
)

Return values:
1. centroid : ndarray of shape (n_clusters, n_features). Centroids found at the last iteration of k-means.
2. label : ndarray of shape (n_samples,). label[i] is the code or index of the centroid the i-th observation is closest to.
3. inertia : float. The final value of the inertia criterion (sum of squared distances to the closest centroid for all observations in the training set).
4. best_n_iter : int. Number of iterations corresponding to the best results. Returned only if return_n_iter is set to True.
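To make the difference between the two interfaces concrete, here is a minimal side-by-side sketch on an invented dataset. The function k_means returns its results as a tuple, while the class KMeans stores the same quantities as attributes on a fitted estimator. With the same random_state, the two are expected to agree.

```python
import numpy as np
from sklearn.cluster import KMeans, k_means

# Six points forming two groups (invented for this illustration).
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

# Function interface: results come back as a tuple.
# best_n_iter is included only because return_n_iter=True.
centroid, label, inertia, best_n_iter = k_means(
    X, n_clusters=2, n_init=10, random_state=0, return_n_iter=True
)

# Class interface: results are stored as attributes after fit().
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(np.allclose(centroid, km.cluster_centers_))
print(np.array_equal(label, km.labels_))
print(np.isclose(inertia, km.inertia_))

# Only the fitted estimator can assign new samples to clusters.
print(km.predict(np.array([[0.0, 1.0], [11.0, 3.0]])))
```

In practice the function is convenient for a one-off clustering, while the class fits better when the model needs to be reused, for example through predict or inside a pipeline.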