Data Management
Importing and exporting data
mlpy.data_fromfile(file, ytype=<type 'int'>)
Read data file in the form:

    x11 [TAB] x12 [TAB] ... x1n [TAB] y1
    x21 [TAB] x22 [TAB] ... x2n [TAB] y2
     .         .        ...  .         .
     .         .        ...  .         .
     .         .        ...  .         .
    xm1 [TAB] xm2 [TAB] ... xmn [TAB] ym

where xij are float and yi are of type ‘ytype’ (numpy.int or numpy.float).
Input
- file - data file name
- ytype - numpy datatype for labels (numpy.int or numpy.float)
Output
- x - data [2D numpy array float]
- y - classes [1D numpy array int or float]
Example:
>>> from numpy import *
>>> from mlpy import *
>>> x, y = data_fromfile('data_example.dat')
>>> x
array([[ 1.1,  2. ,  5.3,  3.1],
       [ 3.7,  1.4,  2.3,  4.5],
       [ 1.4,  5.4,  3.1,  1.4]])
>>> y
array([ 1, -1,  1])
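The file layout described above can also be parsed with plain NumPy. The following is a minimal sketch, not mlpy's implementation: `read_labeled` is a hypothetical helper, and the example file is written to a temporary directory only so that the snippet is self-contained.

```python
import os
import tempfile

import numpy as np

# Write a small example file in the layout described above:
# tab-separated floats, with the label in the last column.
rows = [
    "1.1\t2.0\t5.3\t3.1\t1",
    "3.7\t1.4\t2.3\t4.5\t-1",
    "1.4\t5.4\t3.1\t1.4\t1",
]
path = os.path.join(tempfile.mkdtemp(), "data_example.dat")
with open(path, "w") as f:
    f.write("\n".join(rows) + "\n")


def read_labeled(path, ytype=int):
    """Hypothetical helper: split a tab-separated table into data and labels."""
    table = np.loadtxt(path, delimiter="\t", ndmin=2)
    x = table[:, :-1]               # every column except the last is data
    y = table[:, -1].astype(ytype)  # the last column holds the labels
    return x, y


x, y = read_labeled(path)
```

Passing `ytype=float` to the helper keeps real-valued labels, which mirrors the role of the `ytype` argument documented above.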
mlpy.data_fromfile_wl(file)
Read data file in the form:

    x11 [TAB] x12 [TAB] ... x1n [TAB]
    x21 [TAB] x22 [TAB] ... x2n [TAB]
     .         .        ...  .
     .         .        ...  .
     .         .        ...  .
    xm1 [TAB] xm2 [TAB] ... xmn [TAB]

where xij are float.
Input
- file - data file name
Output
- x - data [2D numpy array float]
Example:
>>> from numpy import *
>>> from mlpy import *
>>> x = data_fromfile_wl('data_example.dat')
>>> x
array([[ 1.1,  2. ,  5.3,  3.1],
       [ 3.7,  1.4,  2.3,  4.5],
       [ 1.4,  5.4,  3.1,  1.4]])
mlpy.data_tofile(file, x, y, sep='\t')
Write data file in the form:

    x11 [sep] x12 [sep] ... x1n [sep] y1
    x21 [sep] x22 [sep] ... x2n [sep] y2
     .         .        ...  .         .
     .         .        ...  .         .
     .         .        ...  .         .
    xm1 [sep] xm2 [sep] ... xmn [sep] ym

where xij are float and yi are integer.
Input
- file - data file name
- x - data [2D numpy array float]
- y - classes [1D numpy array integer]
- sep - separator
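The output layout can be illustrated without mlpy. This is a minimal sketch under stated assumptions: `write_labeled` is a hypothetical helper, it takes an open file object rather than a file name, and the exact number formatting used by mlpy.data_tofile is not specified here.

```python
import io

import numpy as np


def write_labeled(file, x, y, sep="\t"):
    """Hypothetical helper: write each row as x_i1 [sep] ... x_in [sep] y_i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    for row, label in zip(x, y):
        # Data values are written as floats, the label as an integer.
        fields = [repr(float(v)) for v in row] + [str(int(label))]
        file.write(sep.join(fields) + "\n")


buf = io.StringIO()
write_labeled(buf, [[1.1, 2.0], [3.7, 1.4]], [1, -1])
text = buf.getvalue()
```

Each line of `text` holds one sample, with the class label as the final field, so the result can be read back with a reader that follows the layout documented for data_fromfile.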
mlpy.data_tofile_wl(file, x, sep='\t')
Write data file in the form:

    x11 [sep] x12 [sep] ... x1n [sep]
    x21 [sep] x22 [sep] ... x2n [sep]
     .         .        ...  .
     .         .        ...  .
     .         .        ...  .
    xm1 [sep] xm2 [sep] ... xmn [sep]

where xij are float.
Input
- file - data file name
- x - data [2D numpy array float]
- sep - separator
Normalization
mlpy.data_normalize(x)
Normalize numpy array (2D) x.
Input
- x - data [2D numpy array float]
Output
- normalized data
Example:
>>> from numpy import *
>>> from mlpy import *
>>> x = array([[ 1.1, 2. , 5.3, 3.1],
...            [ 3.7, 1.4, 2.3, 4.5],
...            [ 1.4, 5.4, 3.1, 1.4]])
>>> data_normalize(x)
array([[-0.9797065 , -0.48295391,  1.33847226,  0.12418815],
       [ 0.52197912, -1.13395464, -0.48598056,  1.09795608],
       [-0.75217354,  1.35919078,  0.1451563 , -0.75217354]])
Warning
Deprecated in version 2.3
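The description of data_normalize does not say which operation is applied. The numbers in its example are consistent with shifting each row to mean 0 and dividing by that row's sample standard deviation (ddof=1). The sketch below reproduces them with plain NumPy; the row-wise reading is an inference from the example output, not a statement about mlpy's source.

```python
import numpy as np

x = np.array([[1.1, 2.0, 5.3, 3.1],
              [3.7, 1.4, 2.3, 4.5],
              [1.4, 5.4, 3.1, 1.4]])

# Assumption inferred from the documented output: statistics are taken
# per ROW (axis=1) and the standard deviation uses ddof=1.
row_mean = x.mean(axis=1, keepdims=True)
row_std = x.std(axis=1, ddof=1, keepdims=True)
normalized = (x - row_mean) / row_std
```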
mlpy.data_standardize(x, p=None)
Standardize numpy array (2D) x and optionally standardize p using mean and std of x.
Input
- x - data [2D numpy array float]
- p - optional data [2D numpy array float]
Output
- standardized data
Example:
>>> from numpy import *
>>> from mlpy import *
>>> x = array([[ 1.1, 2. , 5.3, 3.1],
...            [ 3.7, 1.4, 2.3, 4.5],
...            [ 1.4, 5.4, 3.1, 1.4]])
>>> data_standardize(x)
array([[-0.67958381, -0.43266792,  1.1157668 ,  0.06441566],
       [ 1.1482623 , -0.71081158, -0.81536804,  0.96623494],
       [-0.46867849,  1.1434795 , -0.30039875, -1.0306506 ]])
Warning
Deprecated in version 2.3. Use mlpy.standardize and mlpy.standardize_from instead.
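The example output of data_standardize matches a column-wise standardization with the sample standard deviation (ddof=1). The sketch below reproduces it with plain NumPy and shows how a second array can be scaled with the statistics of x, as the p argument is described as doing. The array `p` is a made-up sample used only for illustration, and the ddof=1 choice is inferred from the documented numbers.

```python
import numpy as np

x = np.array([[1.1, 2.0, 5.3, 3.1],
              [3.7, 1.4, 2.3, 4.5],
              [1.4, 5.4, 3.1, 1.4]])
p = np.array([[2.0, 3.0, 4.0, 3.0]])  # hypothetical additional sample

# Statistics are taken per COLUMN (axis=0) from x only.
col_mean = x.mean(axis=0)
col_std = x.std(axis=0, ddof=1)

x_std = (x - col_mean) / col_std
p_std = (p - col_mean) / col_std  # p reuses the mean and std of x
```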
mlpy.standardize(x)
Standardize x.
x is standardized to have mean 0 and unit length by columns. Return standardized x, the mean and the standard deviation.
mlpy.center(y)
Center y to have mean 0.
Return centered y.
mlpy.standardize_from(x, mean, std)
Standardize x using external mean and standard deviation.
Return standardized x.
mlpy.center_from(y, mean)
Center y using external mean.
Return centered y.
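The four functions above fit a common pattern: compute statistics on training data, then apply the same statistics to new data. The sketch below illustrates that pattern with plain NumPy. The `*_sketch` helpers are hypothetical stand-ins, not mlpy's code, and they assume that scaling divides by the column-wise sample standard deviation (ddof=1), which may differ from what mlpy means by "unit length".

```python
import numpy as np


def standardize_sketch(x):
    """Return column-standardized x plus the column means and stds."""
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)  # assumption: sample standard deviation
    return (x - mean) / std, mean, std


def standardize_from_sketch(x, mean, std):
    """Standardize x with a mean and std computed elsewhere."""
    return (x - mean) / std


def center_sketch(y):
    """Shift y so that its mean is 0."""
    return y - y.mean()


def center_from_sketch(y, mean):
    """Shift y by a mean computed elsewhere."""
    return y - mean


x_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
y_train = np.array([1.0, 2.0, 6.0])
x_test = np.array([[2.0, 40.0]])
y_test = np.array([4.0])

# Statistics come from the training data only and are reused for the test data.
x_train_s, mean, std = standardize_sketch(x_train)
x_test_s = standardize_from_sketch(x_test, mean, std)
y_train_c = center_sketch(y_train)
y_test_c = center_from_sketch(y_test, y_train.mean())
```

Reusing the training mean and standard deviation keeps the test data on the same scale as the training data, which is the purpose of the `_from` variants.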