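For example, if I have an archive of PNG images, /train.zip, and their annotation file, /metadata.csv, what file structure should I use on the Hugging Face Hub so that the parquet-converter bot automatically recognizes and correctly interprets the dataset?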
But no matter which file layout I use,
/train.zip
/metadata.csv
or
/train/train.zip
/metadata.csv
I get an exception:
Cannot load the dataset split (in streaming mode) to extract the first rows.
Error code: StreamingRowsError
Exception: ValueError
Message: One or several metadata.csv were found, but not in the same directory or in a parent directory of zip://1.png::hf://datasets/[user]/[repo-name]@[hash]/train/train.zip.
Traceback: Traceback (most recent call last):
File "/src/services/worker/src/worker/job_runners/split/first_rows.py", line 322, in compute
compute_first_rows_from_parquet_response(
File "/src/services/worker/src/worker/job_runners/split/first_rows.py", line 88, in compute_first_rows_from_parquet_response
rows_index = indexer.get_rows_index(
File "/src/libs/libcommon/src/libcommon/parquet_utils.py", line 640, in get_rows_index
return RowsIndex(
File "/src/libs/libcommon/src/libcommon/parquet_utils.py", line 521, in __init__
self.parquet_index = self._init_parquet_index(
File "/src/libs/libcommon/src/libcommon/parquet_utils.py", line 538, in _init_parquet_index
response = get_previous_step_or_raise(
File "/src/libs/libcommon/src/libcommon/simple_cache.py", line 590, in get_previous_step_or_raise
raise CachedArtifactError(
libcommon.simple_cache.CachedArtifactError: The previous step failed.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/src/services/worker/src/worker/utils.py", line 96, in get_rows_or_raise
return get_rows(
File "/src/libs/libcommon/src/libcommon/utils.py", line 197, in decorator
return func(*args, **kwargs)
File "/src/services/worker/src/worker/utils.py", line 73, in get_rows
rows_plus_one = list(itertools.islice(ds, rows_max_number + 1))
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 1389, in __iter__
for key, example in ex_iterable:
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 234, in __iter__
yield from self.generate_examples_fn(**self.kwargs)
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/packaged_modules/folder_based_builder/folder_based_builder.py", line 376, in _generate_examples
raise ValueError(
ValueError: One or several metadata.csv were found, but not in the same directory or in a parent directory of zip://1.png::hf://datasets/[user]/[repo-name]@[hash]/train/train.zip.
What exactly am I doing wrong?
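
One possible reading of the error message: the image path it reports (zip://1.png::...) is inside the archive, so a metadata.csv stored next to the zip in the repository is neither in the same directory nor in a parent directory of 1.png. Under that assumption, which is not confirmed here, the sketch below packs metadata.csv into the archive alongside the images and checks the resulting layout. The function names pack_with_metadata and metadata_is_beside_images are hypothetical helpers written for this illustration, not part of the datasets library.

```python
import zipfile
from pathlib import Path, PurePosixPath


def pack_with_metadata(image_dir, metadata_csv, out_zip):
    """Write the PNG files and metadata.csv side by side at the root of one zip.

    Assumption: the loader wants metadata.csv inside the archive, next to the images.
    """
    image_dir = Path(image_dir)
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_STORED) as zf:
        # metadata.csv goes at the archive root, beside the images it describes.
        zf.write(metadata_csv, arcname="metadata.csv")
        for png in sorted(image_dir.glob("*.png")):
            zf.write(png, arcname=png.name)
    return out_zip


def metadata_is_beside_images(zip_path):
    """Return True if every PNG in the archive has a metadata.csv in its own
    directory or in one of its parent directories (within the archive)."""
    with zipfile.ZipFile(zip_path) as zf:
        names = [PurePosixPath(n) for n in zf.namelist()]
    metadata_dirs = {n.parent for n in names if n.name == "metadata.csv"}
    image_dirs = [n.parent for n in names if n.suffix.lower() == ".png"]
    return all(
        any(d == m or m in d.parents for m in metadata_dirs)
        for d in image_dirs
    )
```

With this layout the repository would contain only /train.zip, and metadata.csv would travel inside it. Whether the parquet-converter bot accepts that arrangement still needs to be verified against the Hub.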