I have a DataFrame:
Project name
0 ABC-ND-SON-Project-a
1 ABC-ND-SON-Project-a
2 ABC-ND-SON-Project-a
3 ABC-WD-SON-Project-b
4 ABC-WD-SON-Project-b
5 ABC-LI-SON-Project-c
6 ABC-LI-SON-Project-c
7 ABC-KD-SON-Project-d
8 ABC-KD-SON-Project-d
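In it, I need to remove the first 4 characters to get rid of the "ABC-" part, and then, for the "LI-SON" and "KD-SON" projects, keep only the first two parts of the text. For all other project types, keep 3 parts.
This is the result I want to get: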
Project name
0 ND-SON-Project
1 ND-SON-Project
2 ND-SON-Project
3 WD-SON-Project
4 WD-SON-Project
5 LI-SON
6 LI-SON
7 KD-SON
8 KD-SON
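The rule described above can be sketched in a vectorized way. This is only one possible approach, not necessarily the one the asker or an answerer had in mind. It assumes the sample data is typed in by hand rather than read from `Table.html`, and it assumes that a "LI-SON" or "KD-SON" project can be recognized by its first part alone (`LI` or `KD`). The names `parts`, `is_short`, `two_parts` and `three_parts` are illustrative.

```python
import pandas as pd

# Sample data copied from the question (assumption: typed in by hand here).
df = pd.DataFrame({
    "Project name": [
        "ABC-ND-SON-Project-a", "ABC-ND-SON-Project-a", "ABC-ND-SON-Project-a",
        "ABC-WD-SON-Project-b", "ABC-WD-SON-Project-b",
        "ABC-LI-SON-Project-c", "ABC-LI-SON-Project-c",
        "ABC-KD-SON-Project-d", "ABC-KD-SON-Project-d",
    ]
})

# Drop the leading "ABC-" and split the rest into its dash-separated parts.
parts = df["Project name"].str[4:].str.split("-")

# True for rows that should keep only two parts
# (assumption: the first part alone identifies LI/KD projects).
is_short = parts.str[0].isin(["LI", "KD"])

two_parts = parts.str[:2].str.join("-")
three_parts = parts.str[:3].str.join("-")

# Use the two-part name where is_short is True, otherwise the three-part name.
df["Project name"] = two_parts.where(is_short, three_parts)
print(df)
```

`Series.where` keeps the values of `two_parts` where the condition holds and takes them from `three_parts` elsewhere, so no Python-level loop over rows is needed.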
Code:
import pandas as pd
df_list = pd.read_html('Table.html', match='Projects:')
df = pd.concat(df_list, axis=1)  # DataFrame with the original list
df['Project name'] = df['Project name'].str[4:]  # remove "ABC-"
df = df['Project name'].str.split('-', n=3, expand=True)
cols = [0, 1, 2]
# new column in which every project name contains only the first 3 parts of the text (without "ABC-")
df['New'] = df[cols].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
df = df.drop(columns=[0, 1, 2, 3])
print(df)
At this stage, I cannot work out how to remove the "-Project" part only for the LI and KD projects:
New
0 ND-SON-Project
1 ND-SON-Project
2 ND-SON-Project
3 WD-SON-Project
4 WD-SON-Project
5 LI-SON-Project
6 LI-SON-Project
7 KD-SON-Project
8 KD-SON-Project
I tried to remove it for "LI" only:
def row(df):
    for k in df['New']:
        if k.startswith('LI'):
            k.str.split('-', 1)[0]
df['New'] = df['New'].apply(row)
But it gives this error:
TypeError                                 Traceback (most recent call last)
Input In [66], in <module>
      3         if k.startswith('LI'):
      4             k.str.split('-', 1)[0]
----> 5 df['New'] = df['New'].apply(row)
Input In [66], in row(df)
      1 def row(df):
----> 2     for k in df['New']:
      3         if k.startswith('LI'):
      4             k.str.split('-', 1)[0]
TypeError: string indices must be integers
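A note on the error: `Series.apply` calls the function once per element, so inside `row` the argument `df` is a single string such as 'ND-SON-Project', and `df['New']` then indexes a string with a string, which raises this TypeError. A plain string also has no `.str` accessor, and the function returns nothing. A minimal sketch of a per-value function follows. The name `shorten` is hypothetical, and the sketch assumes the values already have the "ABC-" prefix removed:

```python
def shorten(name):
    """Keep the first two dash-separated parts for LI/KD names; return other names unchanged."""
    if name.startswith(("LI-", "KD-")):
        return "-".join(name.split("-")[:2])
    return name

# Illustrative values taken from the 'New' column shown above.
names = ["ND-SON-Project", "WD-SON-Project", "LI-SON-Project", "KD-SON-Project"]
shortened = [shorten(n) for n in names]
print(shortened)
```

With pandas, the same function could be applied to each value with `df['New'] = df['New'].apply(shorten)`.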
I did what I think you wanted.
DataFrame: