RError.com

Михаил Ширшов
Asked: 2022-12-09 14:47:01 +0000 UTC

Problem parsing XML with BS4

Please help me solve this problem. I have an XML file with this structure:

<?xml version="1.0"?>
<extract_cadastral_plan_territory>
  <details_statement>
    <group_top_requisites>
      <organ_registr_rights></organ_registr_rights>
      <date_formation>2022-12-02</date_formation>
      <registration_number>****-***/****-*********</registration_number>
    </group_top_requisites>
  </details_statement>
  <details_request>
    <date_received_request>2022-12-02</date_received_request>
    <date_receipt_request_reg_authority_rights>2022-12-02</date_receipt_request_reg_authority_rights>
  </details_request>
  <cadastral_blocks>
    <cadastral_block>
      <cadastral_number>52:18:0010180</cadastral_number>
      <area_quarter>
        <area>1.1</area>
        <unit>059</unit>
      </area_quarter>
      <land_record>
        <object>
          <common_data>
            <type>
              <code>002001001000</code>
              <value>Земельный участок</value>
            </type>
            <cad_number>52:18:0010180:1</cad_number>
          </common_data>
          <subtype>
            <code>01</code>
            <value>Землепользование</value>
          </subtype>
        </object>
        <params>
          <category>
            <type>
              <code>003002000000</code>
              <value>Земли населенных пунктов</value>
            </type>
          </category>
          <permitted_use>
            <permitted_use_established>
              <by_document>Жилой частный сектор</by_document>
            </permitted_use_established>
          </permitted_use>
          <permittes_uses_grad_reg>
            <reg_numb_border>52:00-7.1</reg_numb_border>
          </permittes_uses_grad_reg>
          <area>
            <value>438</value>
            <inaccuracy>7</inaccuracy>
          </area>
        </params>
        <address_location>
          <address_type>
            <code>01</code>
            <value>Установленный</value>
          </address_type>
          <address>
            <address_fias>
              <level_settlement>
                <fias>6a7510ae-0e9d-4e74-a36f-57fe370a5149</fias>
                <okato>22401382000</okato>
                <kladr>52000001000103800</kladr>
                <oktmo>22701000001</oktmo>
                <postal_code>603040</postal_code>
                <region>
                  <code>52</code>
                  <value>Нижегородская область</value>
                </region>
                <city>
                  <type_city>г</type_city>
                  <name_city>Нижний Новгород</name_city>
                </city>
              </level_settlement>
              <detailed_level>
                <street>
                  <type_street>пер</type_street>
                  <name_street>Сочинский</name_street>
                </street>
                <level1>
                  <type_level1>уч</type_level1>
                  <name_level1>1</name_level1>
                </level1>
              </detailed_level>
            </address_fias>
            <readable_address>Российская Федерация , Нижегородская обл, городской округ город Нижний Новгород , г Нижний Новгород, пер Сочинский, земельный участок 1</readable_address>
          </address>
        </address_location>
        <cost>
          <value>1010106.84</value>
        </cost>
      </land_record>
    </cadastral_block>
  </cadastral_blocks>
</extract_cadastral_plan_territory>

I can't get the area of the object when reading it from the actual file. I'm using this code:

from bs4 import BeautifulSoup
from pathlib import Path

file_path = r"C:\Users\shirshov\Desktop\2"


fg = '''<?xml version="1.0"?>
<extract_cadastral_plan_territory>
  <details_statement>
    <group_top_requisites>
      <organ_registr_rights></organ_registr_rights>
      <date_formation>2022-12-02</date_formation>
      <registration_number>****-***/****-*********</registration_number>
    </group_top_requisites>
  </details_statement>
  <details_request>
    <date_received_request>2022-12-02</date_received_request>
    <date_receipt_request_reg_authority_rights>2022-12-02</date_receipt_request_reg_authority_rights>
  </details_request>
  <cadastral_blocks>
    <cadastral_block>
      <cadastral_number>52:18:0010180</cadastral_number>
      <area_quarter>
        <area>1.1</area>
        <unit>059</unit>
      </area_quarter>
      <land_record>
        <object>
          <common_data>
            <type>
              <code>002001001000</code>
              <value>Земельный участок</value>
            </type>
            <cad_number>52:18:0010180:1</cad_number>
          </common_data>
          <subtype>
            <code>01</code>
            <value>Землепользование</value>
          </subtype>
        </object>
        <params>
          <category>
            <type>
              <code>003002000000</code>
              <value>Земли населенных пунктов</value>
            </type>
          </category>
          <permitted_use>
            <permitted_use_established>
              <by_document>Жилой частный сектор</by_document>
            </permitted_use_established>
          </permitted_use>
          <permittes_uses_grad_reg>
            <reg_numb_border>52:00-7.1</reg_numb_border>
          </permittes_uses_grad_reg>
          <area>
            <value>438</value>
            <inaccuracy>7</inaccuracy>
          </area>
        </params>
        <address_location>
          <address_type>
            <code>01</code>
            <value>Установленный</value>
          </address_type>
          <address>
            <address_fias>
              <level_settlement>
                <fias>6a7510ae-0e9d-4e74-a36f-57fe370a5149</fias>
                <okato>22401382000</okato>
                <kladr>52000001000103800</kladr>
                <oktmo>22701000001</oktmo>
                <postal_code>603040</postal_code>
                <region>
                  <code>52</code>
                  <value>Нижегородская область</value>
                </region>
                <city>
                  <type_city>г</type_city>
                  <name_city>Нижний Новгород</name_city>
                </city>
              </level_settlement>
              <detailed_level>
                <street>
                  <type_street>пер</type_street>
                  <name_street>Сочинский</name_street>
                </street>
                <level1>
                  <type_level1>уч</type_level1>
                  <name_level1>1</name_level1>
                </level1>
              </detailed_level>
            </address_fias>
            <readable_address>Российская Федерация , Нижегородская обл, городской округ город Нижний Новгород , г Нижний Новгород, пер Сочинский, земельный участок 1</readable_address>
          </address>
        </address_location>
        <cost>
          <value>1010106.84</value>
        </cost>
      </land_record>
    </cadastral_block>
  </cadastral_blocks>
</extract_cadastral_plan_territory>'''



for path in Path(file_path).rglob('*.XML'):
    with open(path, 'r',encoding='utf-8') as parse:
        xml = parse.read()
        soup = BeautifulSoup(xml, features="xml")
        print(soup.select_one('permitted_use_established by_document'))
        print(soup.select_one('area value'))

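Here I tried putting the file's structure into a variable, and that worked, but it does not work with the XML file itself. I also tried replacing “soup = BeautifulSoup(xml, features="xml")” with “soup = BeautifulSoup(xml, features="lxml")”; that returned “<by_document>Жилой частный сектор</by_document>” but still did not show “438”. I suspect the problem is the encoding of the XML file: when I copy the text from the variable (written in Visual Studio Code), paste it into Notepad, and save it as XML, the search works without problems, but when I re-save the original XML file through Notepad, it returns nothing. I tried reinstalling Python, Visual Studio Code, lxml, and bs4, to no avail.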

python

2 Answers

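  1. Best Answer
    Михаил Ширшов
    2022-12-09T16:27:24Z

    It turned out the file was saved in a different encoding. Opening it with the 'utf-8-sig' codec, which skips a leading UTF-8 byte order mark, solved it: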

    for path in Path(file_path).rglob('*.XML'):
        # 'utf-8-sig' drops a leading byte order mark (BOM) before parsing
        with open(path, 'r', encoding='utf-8-sig') as parse:
            xml = parse.read()
            soup = BeautifulSoup(xml, features="xml")
            print(soup.select_one('permitted_use_established by_document'))
            print(soup.select_one('area value'))
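
To illustrate the difference between the two codecs, here is a minimal sketch using only the standard library. It assumes the problem file starts with a UTF-8 byte order mark; the temporary file and its contents are made up for the demonstration and are not the original cadastral extract.

```python
import codecs
import os
import tempfile

# Hypothetical sample: a tiny XML document saved with a UTF-8 byte order mark.
xml_bytes = codecs.BOM_UTF8 + "<area><value>438</value></area>".encode("utf-8")

with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(xml_bytes)
    sample_path = tmp.name

try:
    # 'utf-8' keeps the BOM as an invisible U+FEFF character at the start.
    with open(sample_path, "r", encoding="utf-8") as f:
        text_plain = f.read()

    # 'utf-8-sig' strips the BOM if present and otherwise behaves like 'utf-8'.
    with open(sample_path, "r", encoding="utf-8-sig") as f:
        text_sig = f.read()
finally:
    os.remove(sample_path)

print(repr(text_plain[:6]))  # begins with the escaped BOM character
print(repr(text_sig[:6]))    # begins directly with the root tag
```

Because 'utf-8-sig' also reads files without a BOM, it can be used for every file in the loop, not only the ones that fail.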
    
  2. gfd2
    2022-12-09T16:02:02Z

    Try this, adding .text:

    with open(file_path, 'r', encoding='utf-8') as file:
        xml = file.read()
    soup = BeautifulSoup(xml, features="xml")
    print(soup.select_one("by_document").text)
    print(soup.select_one("area value").text)
    

    Output:

    Жилой частный сектор

    438
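
Note that .text only changes what is printed for a match. When select_one() finds nothing it returns None, and calling .text on None raises AttributeError. A minimal sketch of a guard follows; the helper name text_or_none is hypothetical and not part of bs4.

```python
def text_or_none(tag):
    """Return the stripped text of a matched tag, or None when nothing matched.

    `tag` is whatever select_one() returned: a bs4 Tag or None.
    text_or_none is a hypothetical helper, not a bs4 function.
    """
    if tag is None:
        return None
    # Tag.get_text(strip=True) returns the tag's text with whitespace stripped.
    return tag.get_text(strip=True)
```

It would be used as text_or_none(soup.select_one("area value")), which yields None instead of an exception when the selector matches nothing.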

