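I have a working project.
It is slow because it reads CSV files of 1+ GB and runs various groupby() + apply() operations.
Modin's advertised promise that only one line needs to be replaced won me over.
I installed all the necessary packages from scratch in Anaconda / Python 3.9.12. Of the engines, I installed only Dask, just in case.
I run it and get: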
UserWarning: Dask execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:
from distributed import Client
client = Client()
2022-06-06 13:00:20,340 - distributed.diskutils - INFO - Found stale lock file and directory 'D:\\OD\\OneDrive\\Projects\\Chud_Amaz\\Soft_in_dev\\moduled_way_OOP\\dask-worker-space\\worker-75ktabxn', purging
2022-06-06 13:00:20,365 - distributed.diskutils - INFO - Found stale lock file and directory 'D:\\OD\\OneDrive\\Projects\\Chud_Amaz\\Soft_in_dev\\moduled_way_OOP\\dask-worker-space\\worker-97o0hmer', purging
2022-06-06 13:00:20,370 - distributed.diskutils - INFO - Found stale lock file and directory 'D:\\OD\\OneDrive\\Projects\\Chud_Amaz\\Soft_in_dev\\moduled_way_OOP\\dask-worker-space\\worker-ck5pxauy', purging
2022-06-06 13:00:20,377 - distributed.diskutils - INFO - Found stale lock file and directory 'D:\\OD\\OneDrive\\Projects\\Chud_Amaz\\Soft_in_dev\\moduled_way_OOP\\dask-worker-space\\worker-qswh4ton', purging
2022-06-06 13:00:20,386 - distributed.diskutils - INFO - Found stale lock file and directory 'D:\\OD\\OneDrive\\Projects\\Chud_Amaz\\Soft_in_dev\\moduled_way_OOP\\dask-worker-space\\worker-vmsc68w6', purging
2022-06-06 13:00:20,390 - distributed.diskutils - INFO - Found stale lock file and directory 'D:\\OD\\OneDrive\\Projects\\Chud_Amaz\\Soft_in_dev\\moduled_way_OOP\\dask-worker-space\\worker-ys_mgy6k', purging
OD ---> Cant read csv D:\_\OD\AD__OD_04.06.22\AD__RET.csv. Error is index 0 is out of bounds for axis 0 with size 0
Traceback (most recent call last):
File D:\OD\OneDrive\Projects\Chud_Amaz\Soft_in_dev\moduled_way_OOP\izi_report_main_foldered.py:108 in <module>
main_foldered()
File D:\OD\OneDrive\Projects\Chud_Amaz\Soft_in_dev\moduled_way_OOP\izi_report_main_foldered.py:81 in main_foldered
assert False, "Bad input files"
AssertionError: Bad input files
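Since I don't understand what Dask wants from me, I'd like to ask: what could this be? Before this change, that is, everything worked. All I did was add the following to my modules: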
    try:
        import modin.pandas as pd
    except ImportError:  # fall back to plain pandas only when Modin is missing
        import pandas as pd
And it let me down. Tell me, where should I look next?
And yes: I'm on Windows 10 with an AMD CPU.
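One thing the try/except import hides is which library actually ended up behind the name pd. A minimal sketch of a more explicit fallback follows; the helper name import_first_available is made up for illustration and is not part of Modin or pandas. It catches only import failures and reports which candidate was loaded.

```python
import importlib
from types import ModuleType
from typing import Sequence, Tuple


def import_first_available(candidates: Sequence[str]) -> Tuple[str, ModuleType]:
    """Return (name, module) for the first importable module in `candidates`.

    Hypothetical helper: it only makes the fallback visible, it does not
    change how Modin or pandas behave.
    """
    errors = []
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError as exc:  # swallow import failures only
            errors.append(f"{name}: {exc}")
    raise ImportError("no candidate could be imported: " + "; ".join(errors))
```

In the modules it could be used as backend, pd = import_first_available(["modin.pandas", "pandas"]), followed by printing backend, so the console shows whether a given run went through Modin or through plain pandas.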
So, in short, the question is:
after installing modin[dask],
running the code fails with this error:
Error is index 0 is out of bounds for axis 0 with size 0
Traceback (most recent call last):
The read_csv() call itself has these parameters:
    readed_into_df = pd.read_csv(
        str(file_path),
        skiprows=skiprows_list,
        sep=separator,
        encoding=en_cod_,
        thousands=",",  # TODO: check it
        on_bad_lines="skip",
        usecols=columns_dtype.keys(),
        dtype=columns_dtype,
    )
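To narrow down which of these parameters is involved, one option is to call the reader with a minimal set of arguments and then add the remaining ones one at a time. The sketch below assumes nothing about Modin internals; find_failing_kwargs is a hypothetical helper, and the reader is passed in as a plain callable so the same check can be run against modin.pandas.read_csv and pandas.read_csv.

```python
from typing import Any, Callable, Dict, List, Tuple


def find_failing_kwargs(
    reader: Callable[..., Any],
    path: str,
    base_kwargs: Dict[str, Any],
    optional_kwargs: Dict[str, Any],
) -> List[Tuple[str, str]]:
    """Call reader(path, **base_kwargs) plus one optional kwarg at a time.

    Returns (kwarg_name, error_text) for each optional kwarg that, added
    alone to the baseline call, makes the reader raise. The keys of
    base_kwargs and optional_kwargs are expected to be disjoint.
    """
    # The baseline call must succeed; if it raises, let the error propagate.
    reader(path, **base_kwargs)

    failures = []
    for name, value in optional_kwargs.items():
        try:
            reader(path, **base_kwargs, **{name: value})
        except Exception as exc:  # broad on purpose: failures are reported, not hidden
            failures.append((name, f"{type(exc).__name__}: {exc}"))
    return failures
```

With the names from the call above, base_kwargs could hold sep and encoding, and optional_kwargs could hold skiprows, thousands, on_bad_lines, usecols and dtype. A parameter that fails only in combination with another one will not show up this way, so an empty result does not rule the parameters out.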