I need the parser to collect all the links from one page, then follow those links and collect information from each of them. But when the code runs, only the last element of the svarka list is printed.
Here is the code:
import requests
from bs4 import BeautifulSoup
import csv
import os
URL = 'https://ptk-svarka.ru/catalog/apparaty-poluavtomaticheskoy-svarki-mig'
HEADERS = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36', 'accept': '*/*'}
FILE = 'svarka.csv'
def get_html(url, params=None):
    r = requests.get(url, headers=HEADERS, params=params)
    return r

def get_Stranic(html):
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('div', class_='b-grid__item js-product-item')
    stranica_svarka = []
    for item in items:
        stranica_svarka.append(
            item.find('a', class_='b-products__text').get('href'),
        )
    return stranica_svarka

def get_html_vivod(stranic):
    for url in stranic:
        r = requests.get(url, headers=HEADERS)
        r = r.content
        soup = BeautifulSoup(r, 'html.parser')
        items = soup.find_all('article', class_='b-product__wrapper')
        svarka = []
        for item in items:
            svarka.append({
                'title': item.find('h1', class_='b-product__title').get_text(strip=True)
            })
    return svarka

def parse():
    URL = input('Введите URL: ')
    URL = URL.strip()
    html = get_html(URL)
    if html.status_code == 200:
        stranic = get_Stranic(html.text)
        svarka = get_html_vivod(stranic)
        print(svarka)
    else:
        print('Error')

parse()
This is a variable scoping error. In your case, svarka exists only inside the `for url in stranic` loop, and on every iteration it is reassigned to a fresh empty list `[]`, wiping out the results accumulated from the previous URLs. That is why only the items from the last page survive. You need to move the `svarka = []` initialization out of the loop, to the same indentation level as the `return` — that is, initialize the list once before the loop starts.
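To illustrate the fix without the network calls, here is a minimal sketch of the corrected accumulation pattern. The per-page requests/BeautifulSoup work of the original `get_html_vivod` is replaced by a hypothetical `fetch_titles` helper backed by fake data, so only the scoping fix is shown:

```python
def fetch_titles(url):
    # Hypothetical stand-in for the requests + BeautifulSoup work done
    # per page in the original get_html_vivod: returns the product
    # titles found on one URL. The data below is fake, for illustration.
    fake_pages = {
        'page1': ['MIG-200', 'MIG-250'],
        'page2': ['TIG-180'],
    }
    return fake_pages.get(url, [])

def get_html_vivod(stranic):
    svarka = []                      # initialized ONCE, before the loop
    for url in stranic:
        for title in fetch_titles(url):
            svarka.append({'title': title})
    return svarka                    # returned after ALL urls are processed

print(get_html_vivod(['page1', 'page2']))
# [{'title': 'MIG-200'}, {'title': 'MIG-250'}, {'title': 'TIG-180'}]
```

With the initialization before the loop, each iteration appends to the same list instead of replacing it, so the results from every page are kept.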