Imputing
Purify
mlpy.purify(x, th0=0.1, th1=0.1)
Return the matrix x with every row containing more than th0 * x.shape[1] NaNs and every column containing more than th1 * x.shape[0] NaNs removed.
Returns: - (xout, v0, v1) : (2d ndarray, 1d ndarray int, 1d ndarray int)
xout is the purified matrix; v0 holds the indices of the retained rows (dimension 0) and v1 the indices of the retained columns (dimension 1).
Example:
>>> import numpy as np
>>> import mlpy
>>> x = np.array([[1,      4,      4],
...               [2,      9,      np.nan],
...               [2,      5,      8],
...               [8,      np.nan, np.nan],
...               [np.nan, 4,      4]])
>>> y = np.array([1, -1, 1, -1, -1])
>>> x, v0, v1 = mlpy.purify(x, 0.4, 0.4)
>>> x
array([[  1.,   4.,   4.],
       [  2.,   9.,  NaN],
       [  2.,   5.,   8.],
       [ NaN,   4.,   4.]])
>>> v0
array([0, 1, 2, 4])
>>> v1
array([0, 1, 2])
New in version 2.0.4.
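The threshold rule can be sketched in plain NumPy. purify_sketch is a hypothetical helper, not part of mlpy, and it assumes that rows and columns are screened independently on the input matrix (NaN counts taken before any removal); on the example above it returns the same v0 and v1.

```python
import numpy as np

def purify_sketch(x, th0=0.1, th1=0.1):
    """Hypothetical re-implementation of the rule described for mlpy.purify.

    Assumption: rows and columns are screened independently, using the
    NaN counts of the original matrix.
    """
    x = np.asarray(x, dtype=float)
    nan_mask = np.isnan(x)
    # Keep a row if it has at most th0 * n_columns NaNs
    v0 = np.where(nan_mask.sum(axis=1) <= th0 * x.shape[1])[0]
    # Keep a column if it has at most th1 * n_rows NaNs
    v1 = np.where(nan_mask.sum(axis=0) <= th1 * x.shape[0])[0]
    # np.ix_ builds the row/column grid of retained entries
    return x[np.ix_(v0, v1)], v0, v1

x = np.array([[1, 4, 4],
              [2, 9, np.nan],
              [2, 5, 8],
              [8, np.nan, np.nan],
              [np.nan, 4, 4]])
xp, v0, v1 = purify_sketch(x, 0.4, 0.4)
print(v0, v1)  # [0 1 2 4] [0 1 2]
```

With th0 = th1 = 0.4 the row limit is 0.4 * 3 = 1.2 NaNs and the column limit is 0.4 * 5 = 2 NaNs, so only row 3 (two NaNs) is dropped, while column 2 (exactly two NaNs) survives because the rule removes only counts strictly greater than the limit.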
KNN imputing
mlpy.knn_imputing(x, k, dist='e', method='mean', y=None, ldep=False)
Impute the missing values (NaNs) of x from the k nearest neighbors (KNN imputing).
Parameters: - x : 2d ndarray float (samples x feats)
data to impute, with missing values encoded as NaN
- k : integer
number of nearest neighbors
- dist : string ('se' = squared Euclidean, 'e' = Euclidean)
distance used to find the nearest neighbors
- method : string ('mean', 'median')
statistic of the neighbors' values used to fill each missing value
- y : 1d ndarray
sample labels
- ldep : bool
label-dependent imputing (used only when y is not None)
Returns: - xout : 2d ndarray float (samples x feats)
imputed data
Example:
>>> import numpy as np
>>> import mlpy
>>> x = np.array([[1,      4,      4],
...               [2,      9,      np.nan],
...               [2,      5,      8],
...               [8,      np.nan, np.nan],
...               [np.nan, 4,      4]])
>>> y = np.array([1, -1, 1, -1, -1])
>>> x, v0, v1 = mlpy.purify(x, 0.4, 0.4)
>>> x
array([[  1.,   4.,   4.],
       [  2.,   9.,  NaN],
       [  2.,   5.,   8.],
       [ NaN,   4.,   4.]])
>>> v0
array([0, 1, 2, 4])
>>> v1
array([0, 1, 2])
>>> y = y[v0]
>>> x = mlpy.knn_imputing(x, 2, dist='e', method='median')
>>> x
array([[ 1. ,  4. ,  4. ],
       [ 2. ,  9. ,  6. ],
       [ 2. ,  5. ,  8. ],
       [ 1.5,  4. ,  4. ]])
New in version 2.0.4.
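A minimal NumPy sketch of KNN imputing, under two assumptions that are not taken from mlpy: the distance between two samples uses only the features observed in both, and the neighbors for a missing entry (i, j) are chosen among the other samples whose feature j is observed. knn_impute_sketch is hypothetical, ignores y and ldep, and is not mlpy's implementation; under these assumptions it yields the same imputed matrix as the example above.

```python
import numpy as np

def knn_impute_sketch(x, k, dist="e", method="mean"):
    """Hypothetical KNN imputing sketch; labels (y, ldep) are not handled."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    xout = x.copy()  # distances always use x, so imputed values never feed back
    stat = np.median if method == "median" else np.mean
    for i, j in zip(*np.nonzero(missing)):
        # Assumption: candidate donors are the other samples that observe feature j
        donors = [r for r in range(x.shape[0]) if r != i and not missing[r, j]]
        dists = []
        for r in donors:
            # Assumption: compare only the features observed in both samples
            shared = ~missing[i] & ~missing[r]
            if not shared.any():
                dists.append(np.inf)
                continue
            d2 = np.sum((x[i, shared] - x[r, shared]) ** 2)
            dists.append(d2 if dist == "se" else np.sqrt(d2))
        # Keep the k closest donors (stable sort keeps ties in row order)
        nearest = [donors[n] for n in np.argsort(dists, kind="stable")[:k]]
        xout[i, j] = stat(x[nearest, j])
    return xout

# The purified matrix from the example above
x = np.array([[1, 4, 4],
              [2, 9, np.nan],
              [2, 5, 8],
              [np.nan, 4, 4]])
x_imputed = knn_impute_sketch(x, 2, dist="e", method="median")
print(x_imputed)
```

Because the squared Euclidean distance is a monotone function of the Euclidean distance, dist='se' and dist='e' select the same neighbors in this sketch (ties aside); the choice only matters if the distances themselves are used, for example as weights.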