Question

Ildar

Asked: 2022-02-24 05:04:46 +0000 UTC

Selenium does not return part of the page's HTML

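I am collecting information from an institution's website at the url below, and I want to get the employees' names and the links to their pages. The driver loads this information and it is visible in the browser, but the HTML I get back, i.e. page_source, does not contain it. What could be the problem?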

from selenium import webdriver
from bs4 import BeautifulSoup

url = 'https://kpfu.ru/computing-technology/struktura/kafedry/kafedra-prikladnoj-matematiki/kafedra-prikladnoj-matematiki-sotrudniki'

driver = webdriver.Firefox(executable_path=constants.gecko_path)
driver.get(url)
html = driver.page_source

def gather_employees_links(html):
    soup = BeautifulSoup(html, 'lxml')

    spans = soup.find_all('span', class_='fio')
    a_tags = [span.find('a') for span in spans]
    employees = {a.text: a.get('href') for a in a_tags}

    return employees

print(gather_employees_links(html))

1 Answer

Rolles · Answer 1 · 2022-02-24T15:08:38Z

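Here is a solution using webdriver, although the same can be done with requests. Both options request the employee list directly from the shelly.kpfu.ru address used in the code rather than from the kpfu.ru page in the question.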

Imports

import time
from selenium import webdriver

Code

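    from selenium.webdriver.common.by import By

    # `driver` must already exist, e.g. driver = webdriver.Firefox()
    driver.get('https://shelly.kpfu.ru/e-ksu/portal_employee.searchscript?p_'
               'search=1.1.2.09.2.01.2.1&p_noofficename=1&p_order=1&')
    time.sleep(3)  # crude pause so the page can finish loading

    # Employee names: text of the first link inside each span.fio
    # (find_elements_by_xpath was removed in Selenium 4.3; use By.XPATH)
    xpath = "//td/span[@class='fio']/a[1]"
    elements = driver.find_elements(By.XPATH, xpath)
    names = [element.text for element in elements]

    # Profile links, row by row (rows 1..24); rows without a link are skipped
    links = []
    for i in range(1, 25):
        xpath = f"//tr[{i}]/td[@class='li_spec']/span[@class='fio']/a[1]"
        matches = driver.find_elements(By.XPATH, xpath)
        if matches:
            links.append(matches[0].get_attribute('href'))

    print(names)
    print(links)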

Don't forget to create the driver object.

The second option uses requests.

Imports

import requests
from lxml import html

Code

url = 'https://shelly.kpfu.ru/e-ksu/portal_employee.searchscript?p_search=1.1.2.09.2.01.2.1&p_noofficename=1&p_order=1&'
response = requests.get(url)
tree = html.fromstring(response.content)

# Employee names: text of the first link inside each span.fio
elements = tree.xpath("//td/span[@class='fio']/a[1]")
names = [element.text for element in elements]

# Profile links, row by row (rows 1..24); rows without a link are skipped
links = []
for i in range(1, 25):
    xpath = f"//tr[{i}]/td[@class='li_spec']/span[@class='fio']/a[1]/@href"
    hrefs = tree.xpath(xpath)
    if hrefs:
        links.append(hrefs[0])

print(names)
print(links)
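If lxml is not available, the same extraction can be sketched with the standard library's `html.parser`. This swaps in a different technique from the one the answer uses. The class name `FioLinkParser` and the sample HTML are invented for illustration and only mirror the structure the XPath expressions imply (a link inside `span.fio`); the real page may differ.

```python
from html.parser import HTMLParser


class FioLinkParser(HTMLParser):
    """Collect {name: href} for <a> tags nested inside <span class="fio">."""

    def __init__(self):
        super().__init__()
        self.employees = {}
        self._span_depth = 0    # > 0 while inside a <span class="fio">
        self._in_link = False   # True while reading an <a> inside that span
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and (self._span_depth or "fio" in classes):
            # Count nested spans too, so the matching </span> closes the scope
            self._span_depth += 1
        elif tag == "a" and self._span_depth:
            self._in_link = True
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.employees["".join(self._text).strip()] = self._href
            self._in_link = False
        elif tag == "span" and self._span_depth:
            self._span_depth -= 1


# Invented sample; in practice feed it response.text from requests.get(url)
SAMPLE_HTML = """
<table>
  <tr><td class="li_spec"><span class="fio"><a href="/employee/1">Ivanov I. I.</a></span></td></tr>
  <tr><td class="li_spec"><span class="fio"><a href="/employee/2">Petrova A. A.</a></span></td></tr>
  <tr><td><a href="/other">Not an employee</a></td></tr>
</table>
"""

parser = FioLinkParser()
parser.feed(SAMPLE_HTML)
employees = parser.employees
print(employees)
```

Unlike the XPath `a[1]`, this sketch records every link inside a `span.fio`, not only the first one, and it does not check the `li_spec` class on the enclosing cell.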
