Clustering

Hierarchical Clustering
A hierarchical clustering algorithm derived from the R package 'amap' [Amap].
class mlpy.HCluster(method='euclidean', link='complete')

    Hierarchical Cluster.

    Initialize Hierarchical Cluster.

    Parameters:
        - method : string ('euclidean')
              the distance measure to be used
        - link : string ('single', 'complete', 'mcquitty', 'median')
              the agglomeration method to be used
Example:
>>> import numpy as np
>>> import mlpy
>>> x = np.array([[ 1. ,  1.5],
...               [ 1.1,  1.8],
...               [ 2. ,  2.8],
...               [ 3.2,  3.1],
...               [ 3.4,  3.2]])
>>> hc = mlpy.HCluster()
>>> hc.compute(x)
>>> hc.ia
array([-4, -1, -3,  2])
>>> hc.ib
array([-5, -2,  1,  3])
>>> hc.heights
array([ 0.2236068 ,  0.31622776,  1.4560219 ,  2.94108844])
>>> hc.cut(0.5)
array([0, 0, 1, 2, 2])
compute(x)

    Compute Hierarchical Cluster.

    Parameters:
        - x : ndarray
              a 2-dimensional array (samples x features)

    Returns:
        - self.ia : ndarray (1-dimensional vector)
              merge (see below)
        - self.ib : ndarray (1-dimensional vector)
              merge (see below)
        - self.heights : ndarray (1-dimensional vector)
              a set of n-1 non-decreasing real values: the clustering height, i.e. the value of the criterion associated with the clustering method for the particular agglomeration
Element i of merge (i.e. of self.ia and self.ib) describes the merging of clusters at step i of the clustering. If an element j is negative, then observation -j was merged at this stage. If j is positive, then the merge was with the cluster formed at the (earlier) stage j of the algorithm. Thus negative entries in merge indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons.
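For instance, the merge steps from the example above can be decoded by hand. A minimal NumPy sketch (the arrays are the ia, ib and heights values printed earlier; mapping a negative entry -j to 0-based sample -j - 1 is an interpretation of the R-style 1-based convention):

    import numpy as np

    # values produced by hc.compute(x) in the example above
    ia = np.array([-4, -1, -3, 2])
    ib = np.array([-5, -2, 1, 3])
    heights = np.array([0.2236068, 0.31622776, 1.4560219, 2.94108844])

    def describe(j):
        # negative entry -j: singleton observation (sample -j - 1, 0-based);
        # positive entry j: the cluster formed at the earlier step j
        if j < 0:
            return "sample %d" % (-j - 1)
        return "cluster from step %d" % j

    for step in range(len(heights)):
        print("step %d: merge %s with %s at height %.4f"
              % (step + 1, describe(ia[step]), describe(ib[step]), heights[step]))

For the example data this prints that samples 3 and 4 merge first (height 0.2236), then samples 0 and 1, then sample 2 joins the cluster from step 1, and finally the two remaining clusters merge.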
cut(ht)

    Cuts the tree into several groups by specifying the cut height.

    Parameters:
        - ht : float
              height at which the tree should be cut

    Returns:
        - cl : ndarray (1-dimensional vector)
              group memberships. Groups are in 0, ..., N-1
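Because self.heights is non-decreasing, the cut height controls the number of groups: a cut between heights[i-1] and heights[i] keeps only the first i merges, leaving n - i groups. A short, hedged continuation of the example above (the exact label values depend on mlpy's internal ordering and are not shown):

    # hc was computed in the example above; its merge heights were
    # approximately [0.224, 0.316, 1.456, 2.941]
    groups3 = hc.cut(1.0)  # two merges lie below 1.0  -> 5 - 2 = 3 groups
    groups2 = hc.cut(2.0)  # three merges lie below 2.0 -> 5 - 3 = 2 groups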
[Amap] amap: Another Multidimensional Analysis Package, http://cran.r-project.org/web/packages/amap/index.html
k-means
class mlpy.Kmeans(k, init='std', seed=0)

    k-means algorithm.

    Initialization.

    Parameters:
        - k : int (> 1)
              number of clusters
        - init : string ('std', 'plus')
              initialization algorithm:
              - 'std' : centers randomly selected
              - 'plus' : k-means++ algorithm
        - seed : int (>= 0)
              random seed
Example:
>>> import numpy as np
>>> import mlpy
>>> x = np.array([[ 1. ,  1.5],
...               [ 1.1,  1.8],
...               [ 2. ,  2.8],
...               [ 3.2,  3.1],
...               [ 3.4,  3.2]])
>>> kmeans = mlpy.Kmeans(k=3, init="plus", seed=0)
>>> kmeans.compute(x)
array([1, 1, 2, 0, 0], dtype=int32)
>>> kmeans.means
array([[ 3.3 ,  3.15],
       [ 1.05,  1.65],
       [ 2.  ,  2.8 ]])
>>> kmeans.steps
2
New in version 2.2.0.
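The 'plus' option refers to k-means++ seeding: the first center is drawn uniformly at random, and each further center is drawn with probability proportional to its squared distance from the nearest center already chosen. A minimal NumPy sketch of that seeding step (illustrative only, not mlpy's internal code):

    import numpy as np

    def kmeanspp_seeds(x, k, seed=0):
        """Pick k initial centers from x (n x dim) with k-means++ seeding."""
        rng = np.random.RandomState(seed)
        n = x.shape[0]
        centers = [x[rng.randint(n)]]  # first center: uniform at random
        for _ in range(k - 1):
            # squared distance of each point to its nearest chosen center
            d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
            # next center: drawn with probability proportional to d2
            centers.append(x[rng.choice(n, p=d2 / d2.sum())])
        return np.array(centers)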
compute(x)

    Compute Kmeans.

    Parameters:
        - x : ndarray
              a 2-dimensional array (number of points x dimensions)

    Returns:
        - cls : ndarray (1-dimensional vector)
              cluster membership. Clusters are in 0, ..., k-1

    Attributes:
        - Kmeans.means : 2d ndarray float (k x dim)
              cluster means
        - Kmeans.steps : int
              number of steps performed
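Once compute has run, the fitted means can be used to label new points by nearest-mean assignment. These docs list no predict method on Kmeans, so the following is a plain NumPy sketch using the means from the example above:

    import numpy as np

    means = np.array([[3.3, 3.15],    # kmeans.means from the example above
                      [1.05, 1.65],
                      [2.0, 2.8]])

    new_points = np.array([[1.0, 1.6],
                           [3.0, 3.0]])

    # squared distance of every new point to every mean, then argmin
    d2 = ((new_points[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    print(labels)  # [1 0]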
k-medoids
class mlpy.Kmedoids(k, dist, maxloops=100, rs=0)

    k-medoids algorithm.

    Initialize Kmedoids.

    Parameters:
        - k : int
              number of clusters/medoids
        - dist : class
              a class with a .compute(x, y) method that returns the distance between x and y
        - maxloops : int
              maximum number of loops
        - rs : int
              random seed
Example:
>>> import numpy as np
>>> import mlpy
>>> x = np.array([[ 1. ,  1.5],
...               [ 1.1,  1.8],
...               [ 2. ,  2.8],
...               [ 3.2,  3.1],
...               [ 3.4,  3.2]])
>>> dtw = mlpy.Dtw(onlydist=True)
>>> km = mlpy.Kmedoids(k=3, dist=dtw)
>>> km.compute(x)
(array([4, 0, 2]), array([3, 1]), array([0, 1]), 0.072499999999999981)
Samples 4, 0 and 2 are medoids and represent clusters 0, 1 and 2 respectively:
- cluster 0: samples 4 (medoid) and 3
- cluster 1: samples 0 (medoid) and 1
- cluster 2: sample 2 (medoid)
New in version 2.0.8.
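Any object exposing a compute(x, y) method that returns a scalar distance can be passed as dist. The sketch below is a hypothetical user-defined Euclidean distance class (not part of mlpy), assuming compute receives two samples as 1-dimensional arrays:

    import numpy as np

    class EuclideanDist(object):
        """Hypothetical distance class for Kmedoids' dist argument."""
        def compute(self, x, y):
            # x, y: two samples as 1-dimensional arrays
            return np.sqrt(((x - y) ** 2).sum())

    # usage: km = mlpy.Kmedoids(k=3, dist=EuclideanDist())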
compute(x)

    Compute Kmedoids.

    Parameters:
        - x : ndarray
              a 2-dimensional array (samples x features)

    Returns:
        - m : ndarray (1-dimensional vector)
              medoid indexes
        - n : ndarray (1-dimensional vector)
              non-medoid indexes
        - cl : ndarray (1-dimensional vector)
              cluster membership for non-medoids. Groups are in 0, ..., k-1
        - co : double
              total cost of the configuration
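The returned tuple splits the membership information: medoid m[i] represents cluster i, and cl gives the cluster of each entry of n. A small sketch assembling one full label array from the example output above:

    import numpy as np

    # output of km.compute(x) in the example above
    m = np.array([4, 0, 2])   # medoid indexes; m[i] represents cluster i
    n = np.array([3, 1])      # non-medoid indexes
    cl = np.array([0, 1])     # cluster membership of each non-medoid

    labels = np.empty(len(m) + len(n), dtype=int)
    labels[m] = np.arange(len(m))  # each medoid belongs to its own cluster
    labels[n] = cl                 # non-medoids take their assigned cluster
    print(labels)  # [1 1 2 0 0]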