import html
import json


def print_json(limit, soup):
    feed = html.unescape(soup.title.text)
    dict_json = {feed: {}}
    items = soup.find_all('item')
    for i in items:
        # Note: each iteration overwrites dict_json[feed], so only the last item is kept; `limit` is not used
        dict_json[feed] = {'Title': html.unescape(i.title.text),
                           'Date': html.unescape(i.pubDate.text),
                           'Link': html.unescape(i.link.text)}
    print(json.dumps(dict_json, indent=4))
When I run the program, I get:
{
"TUT.BY: \u041d\u043e\u0432\u043e\u0441\u0442\u0438 \u0422\u0423\u0422 - \u0413\u043b\u0430\u0432\u043d\u044b\u0435 \u043d\u043e\u0432\u043e\u0441\u0442\u0438": {
"Title": "\u0421 \u041b\u0435\u043d\u0438\u043d\u044b\u043c \u0432 \u0441\u0435\u0440\u0434\u0446\u0435 \u0438 \u0433\u0432\u043e\u0437\u0434\u0438\u043a\u0430\u043c\u0438 \u0432 \u0440\u0443\u043a\u0430\u0445. \u041a\u0430\u043a \u0432 \u041c\u0438\u043d\u0441\u043a\u0435 \u043e\u0442\u043c\u0435\u0442\u0438\u043b\u0438 \u0433\u043e\u0434\u043e\u0432\u0449\u0438\u043d\u0443 \u041e\u043a\u0442\u044f\u0431\u0440\u044c\u0441\u043a\u043e\u0439 \u0440\u0435\u0432\u043e\u043b\u044e\u0446\u0438\u0438",
"Date": "Thu, 07 Nov 2019 15:36:00 +0300",
"Link": "https://news.tut.by/economics/660484.html?utm_campaign=news-feed&utm_medium=rss&utm_source=rss-news"
}
}
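But the title, for example, should read "TUT.BY: Новости ТУТ - Главные новости" ("TUT.BY: TUT News - Main News"). Why do some lines come out fine and others don't?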
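By default, json.dumps(obj, ..., ensure_ascii=True, ...) escapes every non-ASCII character, replacing it with a \uXXXX sequence that encodes its Unicode code point. Cyrillic characters lie outside the ASCII range, so they are replaced with this "\u<number>" form. The date and the link contain only ASCII characters, which is why those lines print unchanged. To disable the substitution, pass the argument explicitly:
ensure_ascii=False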
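A minimal sketch of the difference, using only the standard json module. The sample dictionary is hypothetical, modeled on the feed title and date from the question; it shows that the escaped output is still valid JSON that decodes back to the original text, so ensure_ascii only changes how the characters are written, not the data.

```python
import json

# Hypothetical sample data modeled on the feed title and date from the question
data = {"TUT.BY: Новости ТУТ - Главные новости": {"Date": "Thu, 07 Nov 2019 15:36:00 +0300"}}

# Default (ensure_ascii=True): every non-ASCII character becomes a \uXXXX escape
escaped = json.dumps(data, indent=4)

# ensure_ascii=False: non-ASCII characters are emitted as-is
readable = json.dumps(data, indent=4, ensure_ascii=False)

print(escaped)   # Cyrillic shown as \u041d\u043e..., the ASCII date is unchanged
print(readable)  # Cyrillic shown as normal text

# Both strings are valid JSON and decode to the same object
assert json.loads(escaped) == json.loads(readable) == data
```

If you write the result to a file instead of printing it, open the file with encoding="utf-8" (for example, json.dump(data, f, ensure_ascii=False) on a file opened that way), so the platform's default encoding does not get in the way.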