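I recently started learning Python and I'm stuck on a problem. I'm trying to parse a website using selenium: I download the page and save it to a file, and the file opens fine in the browser. I want to get a list of titles. All of them are extracted fine except one that contains the Greek letter beta. When I try to print that title, I get the error "'charmap' codec can't encode character '\u03b2' in position 13: character maps to <undefined>". I seem to have set the encoding to 'utf-8' as well, but it still doesn't work. The page file is also written with 'utf-8'.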
Full text of the error:
Traceback (most recent call last):
  File "D:\scrap\test1.py", line 18, in <module>
    print(aaa.text)
  File "C:\Users\serge\AppData\Local\Programs\Python\Python310\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03b2' in position 13: character maps to <undefined>
The page is saved like this:
with open("index.html", 'w', encoding='utf-8') as file:
    file.write(driver.page_source)
It opens fine in the browser, and the letter beta is displayed as "β".
from bs4 import BeautifulSoup

# Read the saved page back as UTF-8
with open("index.html", 'r', encoding='utf-8') as file:
    src = file.read()

soup = BeautifulSoup(src, 'lxml')
research = soup.find_all(class_="analyzes__row")
for res in research:
    res1 = res.find_all('a')
    for aaa in res1:
        try:
            # This print is where the UnicodeEncodeError is raised
            print(aaa.text)
        except Exception as ex:
            print(ex)
            continue
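
A minimal sketch of what seems to be going on, judging only from the traceback: the error is raised inside `encodings\cp1251.py` during `print`, so the file reading and parsing are probably fine, and it is the output stream (`sys.stdout`) that is using cp1251, which has no mapping for "β". The `title` string and the helper names `encodable` and `make_utf8_stdout` below are hypothetical and used only for illustration; `reconfigure` is assumed to be available (Python 3.7+, and the traceback shows Python 3.10).

```python
import sys


def encodable(text, encoding):
    """Return True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def make_utf8_stdout():
    """Switch sys.stdout to UTF-8 if the stream supports reconfigure()."""
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")


# Hypothetical title containing the Greek letter beta (U+03B2)
title = "IL-1\u03b2"

print(encodable(title, "cp1251"))  # False: cp1251 cannot represent the letter
print(encodable(title, "utf-8"))   # True

# After this call, print() encodes output as UTF-8 instead of cp1251
make_utf8_stdout()
print(title)
```

If this assumption holds, calling `sys.stdout.reconfigure(encoding="utf-8")` once at the top of `test1.py` should let `print(aaa.text)` succeed. Setting the `PYTHONIOENCODING=utf-8` environment variable before starting the script is another option. Whether "β" then displays correctly still depends on the terminal or IDE that shows the output.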