I am trying to train my model:
from sklearn.linear_model import RidgeCV
alphas = [0.00001, 0.0001, 0.001, 0.01, 0.01, 0.5, 1, 3, 5]
clf = RidgeCV(alphas=alphas, normalize=True, gcv_mode = 'eigen').fit(x_train, y_train)
where:
print(x_train.shape, y_train.shape)
(62313, 100600) (62313,)
type(x_train)
scipy.sparse.csr.csr_matrix
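Before launching I have about 10 GB of free RAM. After launch the process takes about 2 GB, leaving about 8 GB free. My code runs for about half an hour with free memory staying at around 8 GB the whole time, then crashes with this error: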
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-9-e9d3d7210319> in <module>()
2 from sklearn.linear_model import RidgeCV
3 alphas = [0.00001, 0.0001, 0.001, 0.01, 0.01, 0.5, 1, 3, 5]
----> 4 clf = RidgeCV(alphas=alphas, normalize=True, gcv_mode = 'eigen').fit(x_train, y_train)
~/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/ridge.py in fit(self, X, y, sample_weight)
1112 gcv_mode=self.gcv_mode,
1113 store_cv_values=self.store_cv_values)
-> 1114 estimator.fit(X, y, sample_weight=sample_weight)
1115 self.alpha_ = estimator.alpha_
1116 if self.store_cv_values:
~/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/ridge.py in fit(self, X, y, sample_weight)
1027 centered_kernel = not sparse.issparse(X) and self.fit_intercept
1028
-> 1029 v, Q, QT_y = _pre_compute(X, y, centered_kernel)
1030 n_y = 1 if len(y.shape) == 1 else y.shape[1]
1031 cv_values = np.zeros((n_samples * n_y, len(self.alphas)))
~/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/ridge.py in _pre_compute(self, X, y, centered_kernel)
883 def _pre_compute(self, X, y, centered_kernel=True):
884 # even if X is very sparse, K is usually very dense
--> 885 K = safe_sparse_dot(X, X.T, dense_output=True)
886 # the following emulates an additional constant regressor
887 # corresponding to fit_intercept=True
~/anaconda3/lib/python3.6/site-packages/sklearn/utils/extmath.py in safe_sparse_dot(a, b, dense_output)
133 """
134 if issparse(a) or issparse(b):
--> 135 ret = a * b
136 if dense_output and hasattr(ret, "toarray"):
137 ret = ret.toarray()
~/anaconda3/lib/python3.6/site-packages/scipy/sparse/base.py in __mul__(self, other)
477 if self.shape[1] != other.shape[0]:
478 raise ValueError('dimension mismatch')
--> 479 return self._mul_sparse_matrix(other)
480
481 # If it's a list or whatever, treat it like a matrix
~/anaconda3/lib/python3.6/site-packages/scipy/sparse/compressed.py in _mul_sparse_matrix(self, other)
500 maxval=nnz)
501 indptr = np.asarray(indptr, dtype=idx_dtype)
--> 502 indices = np.empty(nnz, dtype=idx_dtype)
503 data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))
504
MemoryError:
Can you help me figure out what is happening and what can be done about it?
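Note the comment in the error traceback, in the _pre_compute block just above the line that fails: "even if X is very sparse, K is usually very dense".

In your case you are working with a sparse matrix of shape (62313, 100600) that has 66112691 non-zero elements (about 1% of the total number of elements). Storing such a matrix takes roughly 500 MiB of memory (66112691 * 8 bytes is about 504 MiB).

From the comment above it follows that, after executing K = safe_sparse_dot(X, X.T, dense_output=True), K will contain a large proportion of non-zero elements. To store the matrix K (assuming it contains no zero elements) you would need an additional 29 GiB of RAM (62313 * 62313 * 8 bytes is about 28.9 GiB).

Note: in all calculations I assumed that each non-zero matrix cell takes 8 bytes (for the int64 or float64 data type). If you have a different data type, replace 8 with the corresponding number (the number of bytes needed to store one non-zero element).

I think you should consider reducing the dimensions of the input matrix.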
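The memory estimates in this answer can be reproduced with a few lines of plain Python. This is only a rough sketch: the helper names `nonzero_mib` and `dense_gib` are made up for illustration, and it uses the same 8-bytes-per-element assumption as above, ignoring the index arrays that a CSR matrix also stores.

```python
# Back-of-the-envelope memory estimate, assuming 8 bytes per stored element.
# nonzero_mib and dense_gib are hypothetical helper names, not library functions.

def nonzero_mib(nnz, bytes_per_element=8):
    """Memory for the stored values of a sparse matrix, in MiB."""
    return nnz * bytes_per_element / 1024**2

def dense_gib(n_rows, n_cols, bytes_per_element=8):
    """Memory for a fully dense matrix, in GiB."""
    return n_rows * n_cols * bytes_per_element / 1024**3

n_samples, n_features = 62313, 100600   # x_train.shape from the question
nnz = 66112691                          # non-zero elements in x_train

density = nnz / (n_samples * n_features)   # fraction of non-zero cells
x_mib = nonzero_mib(nnz)                   # values of the sparse input X
k_gib = dense_gib(n_samples, n_samples)    # K = X @ X.T is n_samples x n_samples

print(f"density of X: {density:.2%}")
print(f"X values:     {x_mib:.1f} MiB")
print(f"dense K:      {k_gib:.1f} GiB")
```

Note that the size of K depends only on the number of rows (samples), not on the number of columns (features).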
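One possible way to reduce the number of features, shown here only as a sketch and not as the method the answer had in mind, is `TruncatedSVD`, which accepts sparse input directly. The data below is a small random stand-in for `x_train` and `y_train`, and `n_components=20` and the `alphas` values are arbitrary illustrative choices. The example leaves `gcv_mode` at its default, on the assumption that forcing `'eigen'` would still build the samples-by-samples matrix K regardless of how many features remain.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import RidgeCV

# Small random stand-in for the real data (assumption: real X is CSR, y is 1-D).
rng = np.random.RandomState(0)
X = sparse.random(200, 1000, density=0.01, format="csr", random_state=0)
y = rng.rand(200)

# Project the sparse features onto a small number of dense components.
svd = TruncatedSVD(n_components=20, random_state=0)
X_reduced = svd.fit_transform(X)   # dense array of shape (200, 20)

# Fit RidgeCV on the reduced data; gcv_mode is left at its default.
clf = RidgeCV(alphas=(0.001, 0.01, 0.5, 1, 3, 5)).fit(X_reduced, y)

print(X_reduced.shape, clf.alpha_)
```

With the real data, the reduced matrix would have 62313 rows and `n_components` columns, so its dense size can be checked with the same 8-bytes-per-element arithmetic before fitting.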