Please help me solve this problem. I have an XML file with the following structure:
<?xml version="1.0">
<extract_cadastral_plan_territory>
<details_statement>
<group_top_requisites>
<organ_registr_rights></organ_registr_rights>
<date_formation>2022-12-02</date_formation>
<registration_number>****-***/****-*********</registration_number>
</group_top_requisites>
</details_statement>
<details_request>
<date_received_request>2022-12-02</date_received_request>
<date_receipt_request_reg_authority_rights>2022-12-02</date_receipt_request_reg_authority_rights>
</details_request>
<cadastral_blocks>
<cadastral_block>
<cadastral_number>52:18:0010180</cadastral_number>
<area_quarter>
<area>1.1</area>
<unit>059</unit>
</area_quarter>
<land_record>
<object>
<common_data>
<type>
<code>002001001000</code>
<value>Земельный участок</value>
</type>
<cad_number>52:18:0010180:1</cad_number>
</common_data>
<subtype>
<code>01</code>
<value>Землепользование</value>
</subtype>
</object>
<params>
<category>
<type>
<code>003002000000</code>
<value>Земли населенных пунктов</value>
</type>
</category>
<permitted_use>
<permitted_use_established>
<by_document>Жилой частный сектор</by_document>
</permitted_use_established>
</permitted_use>
<permittes_uses_grad_reg>
<reg_numb_border>52:00-7.1</reg_numb_border>
</permittes_uses_grad_reg>
<area>
<value>438</value>
<inaccuracy>7</inaccuracy>
</area>
</params>
<address_location>
<address_type>
<code>01</code>
<value>Установленный</value>
</address_type>
<address>
<address_fias>
<level_settlement>
<fias>6a7510ae-0e9d-4e74-a36f-57fe370a5149</fias>
<okato>22401382000</okato>
<kladr>52000001000103800</kladr>
<oktmo>22701000001</oktmo>
<postal_code>603040</postal_code>
<region>
<code>52</code>
<value>Нижегородская область</value>
</region>
<city>
<type_city>г</type_city>
<name_city>Нижний Новгород</name_city>
</city>
</level_settlement>
<detailed_level>
<street>
<type_street>пер</type_street>
<name_street>Сочинский</name_street>
</street>
<level1>
<type_level1>уч</type_level1>
<name_level1>1</name_level1>
</level1>
</detailed_level>
</address_fias>
<readable_address>Российская Федерация , Нижегородская обл, городской округ город Нижний Новгород , г Нижний Новгород, пер Сочинский, земельный участок 1</readable_address>
</address>
</address_location>
<cost>
<value>1010106.84</value>
</cost>
</land_record>
</cadastral_block>
</cadastral_blocks>
</extract_cadastral_plan_territory>
I can't get the area of the object (specifically when reading it from the file). I'm using this code:
from bs4 import BeautifulSoup
from pathlib import Path
file_path = r"C:\Users\shirshov\Desktop\2"
fg = '''<?xml version="1.0">
<extract_cadastral_plan_territory>
<details_statement>
<group_top_requisites>
<organ_registr_rights></organ_registr_rights>
<date_formation>2022-12-02</date_formation>
<registration_number>****-***/****-*********</registration_number>
</group_top_requisites>
</details_statement>
<details_request>
<date_received_request>2022-12-02</date_received_request>
<date_receipt_request_reg_authority_rights>2022-12-02</date_receipt_request_reg_authority_rights>
</details_request>
<cadastral_blocks>
<cadastral_block>
<cadastral_number>52:18:0010180</cadastral_number>
<area_quarter>
<area>1.1</area>
<unit>059</unit>
</area_quarter>
<land_record>
<object>
<common_data>
<type>
<code>002001001000</code>
<value>Земельный участок</value>
</type>
<cad_number>52:18:0010180:1</cad_number>
</common_data>
<subtype>
<code>01</code>
<value>Землепользование</value>
</subtype>
</object>
<params>
<category>
<type>
<code>003002000000</code>
<value>Земли населенных пунктов</value>
</type>
</category>
<permitted_use>
<permitted_use_established>
<by_document>Жилой частный сектор</by_document>
</permitted_use_established>
</permitted_use>
<permittes_uses_grad_reg>
<reg_numb_border>52:00-7.1</reg_numb_border>
</permittes_uses_grad_reg>
<area>
<value>438</value>
<inaccuracy>7</inaccuracy>
</area>
</params>
<address_location>
<address_type>
<code>01</code>
<value>Установленный</value>
</address_type>
<address>
<address_fias>
<level_settlement>
<fias>6a7510ae-0e9d-4e74-a36f-57fe370a5149</fias>
<okato>22401382000</okato>
<kladr>52000001000103800</kladr>
<oktmo>22701000001</oktmo>
<postal_code>603040</postal_code>
<region>
<code>52</code>
<value>Нижегородская область</value>
</region>
<city>
<type_city>г</type_city>
<name_city>Нижний Новгород</name_city>
</city>
</level_settlement>
<detailed_level>
<street>
<type_street>пер</type_street>
<name_street>Сочинский</name_street>
</street>
<level1>
<type_level1>уч</type_level1>
<name_level1>1</name_level1>
</level1>
</detailed_level>
</address_fias>
<readable_address>Российская Федерация , Нижегородская обл, городской округ город Нижний Новгород , г Нижний Новгород, пер Сочинский, земельный участок 1</readable_address>
</address>
</address_location>
<cost>
<value>1010106.84</value>
</cost>
</land_record>
</cadastral_block>
</cadastral_blocks>
</extract_cadastral_plan_territory>'''
# Walk every *.XML file under file_path and parse each one.
for path in Path(file_path).rglob('*.XML'):
    with open(path, 'r', encoding='utf-8') as parse:
        xml = parse.read()
    soup = BeautifulSoup(xml, features="xml")
    # Permitted use as stated in the document, then the parcel area.
    print(soup.select_one('permitted_use_established by_document'))
    print(soup.select_one('area value'))
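Here I tried putting the file's structure into a variable, and that worked, but it does not work with the actual XML file. I also tried replacing “soup = BeautifulSoup(xml, features="xml")” with “soup = BeautifulSoup(xml, features="lxml")”; that returned “<by_document>Жилой частный сектор</by_document>”, but it still did not show “438”. I suspect the problem is the encoding of the XML file: when I copy the text from the variable (written in Visual Studio Code), paste it into Notepad and save it as XML, the search works without problems, but when I re-save the original XML file through Notepad, it returns nothing. Maybe that is the problem. I tried reinstalling Python, Visual Studio Code, lxml and bs4, to no avail.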
It turned out the file was in a different encoding.
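One way to check which encoding a file actually uses is to look at its first bytes for a byte order mark (BOM). Below is a minimal sketch using only the standard library. The names `sniff_bom` and `read_xml_text`, and the fallback to UTF-8 when no BOM is found, are assumptions made for illustration and are not taken from the question.

```python
import codecs

# BOMs to check. UTF-32 comes first because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw: bytes, default: str = "utf-8") -> str:
    """Return a codec name suggested by the BOM at the start of raw, else default."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


def read_xml_text(path) -> str:
    """Read a file as text, picking the codec from its BOM (hypothetical helper)."""
    with open(path, "rb") as fh:
        raw = fh.read()
    # The "utf-8-sig", "utf-16" and "utf-32" codecs consume the BOM while decoding.
    return raw.decode(sniff_bom(raw))
```

The decoded string can then be passed to BeautifulSoup as before. Another option may be to open the file in binary mode and hand the raw bytes to BeautifulSoup, which tries to detect the encoding on its own.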
Try this, adding .text:
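A minimal sketch of that suggestion, assuming the same selectors as in the question and a trimmed-down sample fragment in place of the real file. The helper `text_or_none` is hypothetical and only guards against a selector that matches nothing.

```python
from bs4 import BeautifulSoup

# Trimmed-down fragment of the structure from the question (illustrative only).
sample = """<land_record>
<params>
<permitted_use>
<permitted_use_established>
<by_document>Жилой частный сектор</by_document>
</permitted_use_established>
</permitted_use>
<area>
<value>438</value>
<inaccuracy>7</inaccuracy>
</area>
</params>
</land_record>"""

soup = BeautifulSoup(sample, features="xml")


def text_or_none(selector):
    """Return the text of the first match, or None if nothing matches."""
    tag = soup.select_one(selector)
    return tag.text if tag is not None else None


use = text_or_none('permitted_use_established by_document')
area = text_or_none('area value')
print(use)
print(area)
```

With features="lxml" the document is parsed as HTML, where area is a void element that cannot have children. That would likely explain why 'area value' found nothing in that case. Staying with features="xml" avoids this.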
Output: