In [38]: df = pd.DataFrame({"col":["ONE","TWO","THREE"]})
In [39]: df["col"] = df["col"].str[:-1] + df["col"].str[-1].str.lower()
In [40]: df
Out[40]:
     col
0    ONe
1    TWo
2  THREe
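The transformation above (lowercasing only the last character) can be written several equivalent ways. A minimal runnable sketch, assuming pandas is available; the variable names are just for illustration:

```python
import pandas as pd

df = pd.DataFrame({"col": ["ONE", "TWO", "THREE"]})

# Vectorized .str accessor: slice off the last char, append it lowercased
via_str = df["col"].str[:-1] + df["col"].str[-1].str.lower()

# Plain list comprehension over the column's Python strings
via_listcomp = [s[:-1] + s[-1].lower() for s in df["col"]]

# Both yield the same values
print(via_str.tolist())   # e.g. ['ONe', 'TWo', 'THREe']
```

Either result can be assigned back with `df["col"] = ...`, exactly as in the session above.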
In [44]: big = pd.concat([df] * 10000, ignore_index=True)
In [45]: big.shape
Out[45]: (30000, 1)
Execution speed comparison for a 30,000-row DataFrame:
In [49]: %timeit big["res"] = [s[:-1] + s[-1].lower() for s in big["col"]]
9.19 ms ± 95.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [50]: %timeit big['res'] = big['col'].apply(lambda x: x[:-1] + x[-1].lower())
9.41 ms ± 97.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [51]: %timeit big["res"] = big["col"].str[:-1] + big["col"].str[-1].str.lower()
26.9 ms ± 753 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Execution speed comparison for a 300,000-row DataFrame:
In [52]: big = pd.concat([df] * 100000, ignore_index=True)
In [53]: big.shape
Out[53]: (300000, 1)
In [54]: %timeit big["res"] = big["col"].str[:-1] + big["col"].str[-1].str.lower()
272 ms ± 7.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [55]: %timeit big['res'] = big['col'].apply(lambda x: x[:-1] + x[-1].lower())
106 ms ± 3.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [56]: %timeit big["res"] = [s[:-1] + s[-1].lower() for s in big["col"]]
99.5 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
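To reproduce this comparison outside IPython (where `%timeit` is not available), the stdlib `timeit` module can be used directly. A sketch, assuming pandas; absolute numbers will of course differ by machine, and here the 30,000-row frame is used to keep it quick:

```python
import timeit

import pandas as pd

df = pd.DataFrame({"col": ["ONE", "TWO", "THREE"]})
big = pd.concat([df] * 10000, ignore_index=True)

# The three approaches benchmarked above, as zero-argument callables
candidates = {
    "list comprehension": lambda: [s[:-1] + s[-1].lower() for s in big["col"]],
    "apply + lambda": lambda: big["col"].apply(lambda x: x[:-1] + x[-1].lower()),
    ".str accessor": lambda: big["col"].str[:-1] + big["col"].str[-1].str.lower(),
}

for name, fn in candidates.items():
    # Best of 3 single runs, like %timeit's "best of" reporting
    t = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{name:>20}: {t * 1000:.1f} ms")
```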
Conclusion:
The fastest option turns out to be the list comprehension:
df["col"] = [s[:-1] + s[-1].lower() for s in df["col"]]
P.S. You have already been asked several times to provide sample data in your questions in a reproducible form (as text / CSV / Python code, or a link to a file). Read up on why this matters... ;)
As an option, you can use essentially the same lambda solution via apply, which in my opinion looks less cumbersome than repeating df['col'].str; although with a lot of data, a vectorized solution via .str would of course be more efficient.
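That apply variant can be sketched as follows, assuming pandas; it produces exactly the same result as chaining the .str accessor twice:

```python
import pandas as pd

s = pd.Series(["ONE", "TWO", "THREE"])

# Same logic via apply with a lambda, one plain-Python string at a time
via_apply = s.apply(lambda x: x[:-1] + x[-1].lower())

# ...versus chaining the .str accessor twice
via_str = s.str[:-1] + s.str[-1].str.lower()

print(via_apply.equals(via_str))  # the two Series match
```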