RError.com

Василий Никпуп
Asked: 2022-04-16 16:01:49 +0000 UTC

How to change text in a docx without changing the formatting

I need to replace part of the text without changing the formatting. I wrote this function:

import docx
import os

def getText(from_filename, to_filename, old_value, new_value):
    doc = docx.Document(from_filename)
    for paragraph in doc.paragraphs:
        print(paragraph.text)
        new_text = paragraph.text.replace(old_value, new_value)
        print(new_text)
        paragraph.text = new_text
    doc.save(to_filename)

The problem is that the bold style and the font get lost. I don't know how to do the same thing while keeping the formatting.

python

1 Answer

  1. Best Answer
    gil9red
    2022-04-16T17:06:27Z

    To do this, you need to work with the paragraph's runs attribute.
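As a rough illustration of what runs are: each run is a stretch of text with its own character formatting, and python-docx exposes text, bold and italic on it. The helper below (describe_runs is a hypothetical name, not part of python-docx) relies only on those attributes, so it is shown here with stand-in objects instead of a real document.

```python
from types import SimpleNamespace


def describe_runs(paragraphs):
    """Return one list per paragraph with a (text, bold, italic) tuple per run."""
    return [
        [(run.text, run.bold, run.italic) for run in paragraph.runs]
        for paragraph in paragraphs
    ]


# Stand-in for a paragraph "Hello, world!" in which only "world" is bold.
# With python-docx you would pass docx.Document(path).paragraphs instead.
paragraph = SimpleNamespace(runs=[
    SimpleNamespace(text='Hello, ', bold=None, italic=None),
    SimpleNamespace(text='world', bold=True, italic=None),
    SimpleNamespace(text='!', bold=None, italic=None),
])

for text, bold, italic in describe_runs([paragraph])[0]:
    print(repr(text), 'bold' if bold else 'regular')
```

Assigning to paragraph.text replaces all of these runs with a single new one, which is why the character formatting disappears in the question's code.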

    Here is a simple option, but it does not work (I show it only as an example; a working solution follows further down):

    import datetime as DT
    
    # pip install python-docx
    import docx
    
    
    from_filename = 'template.docx'
    to_filename = 'simple.docx'
    
    
    REPLACING = {
        '${title}': 'My pretty title!',
        '${datetime}': DT.datetime.now().strftime('%Y/%m/%d %H:%M:%S'),
    }
    
    doc = docx.Document(from_filename)
    for p in doc.paragraphs:
        for k, v in REPLACING.items():
            for run in p.runs:
                if k in run.text:
                    new_text = run.text.replace(k, v)
                    run.text = new_text
    
    doc.save(to_filename)
    

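    The reason is that the text in runs can be fragmented. For a placeholder such as ${<key>}, in some cases run.text holds the entire template string, and in others it is split into ${, <key>, }.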
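A small sketch of that failure, using plain lists of strings in place of real runs. The run texts and the helper name replace_per_run are made up for illustration; how a given document is actually split depends on how it was edited.

```python
placeholder = '${title}'

# Hypothetical run texts for the paragraph "Title: ${title}".
whole = ['Title: ', '${title}']            # placeholder sits in one run
split = ['Title: ', '${', 'title', '}']    # same text, split over three runs


def replace_per_run(run_texts, key, value):
    """Mimic the simple loop: replace inside each run's text separately."""
    return [text.replace(key, value) for text in run_texts]


print(replace_per_run(whole, placeholder, 'My pretty title!'))
print(replace_per_run(split, placeholder, 'My pretty title!'))

# The joined text is identical in both cases, so a check against the whole
# paragraph text succeeds even when no single run contains the placeholder.
print(placeholder in ''.join(split))
```

In the split case no run is changed, so the placeholder stays in the saved document.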

    I looked for an algorithm that handles runs correctly and found this solution:

    import datetime as DT
    
    # pip install python-docx
    import docx
    
    
    # SOURCE: https://stackoverflow.com/a/55733040/5909792
    def docx_replace(doc, data):
        paragraphs = list(doc.paragraphs)
        for t in doc.tables:
            for row in t.rows:
                for cell in row.cells:
                    for paragraph in cell.paragraphs:
                        paragraphs.append(paragraph)
        for p in paragraphs:
            for key, val in data.items():
                key_name = '${{{}}}'.format(key) # I'm using placeholders in the form ${PlaceholderName}
                if key_name in p.text:
                    inline = p.runs
                    # Replace strings and retain the same style.
                    # The text to be replaced can be split over several runs so
                    # search through, identify which runs need to have text replaced
                    # then replace the text in those identified
                    started = False
                    key_index = 0
                    # found_runs is a list of (inline index, index of match, length of match)
                    found_runs = list()
                    found_all = False
                    replace_done = False
                    for i in range(len(inline)):
    
                        # case 1: found in single run so short circuit the replace
                        if key_name in inline[i].text and not started:
                            found_runs.append((i, inline[i].text.find(key_name), len(key_name)))
                            text = inline[i].text.replace(key_name, str(val))
                            inline[i].text = text
                            replace_done = True
                            found_all = True
                            break
    
                        if key_name[key_index] not in inline[i].text and not started:
                            # keep looking ...
                            continue
    
                        # case 2: search for partial text, find first run
                        if key_name[key_index] in inline[i].text and inline[i].text[-1] in key_name and not started:
                            # check sequence
                            start_index = inline[i].text.find(key_name[key_index])
                            check_length = len(inline[i].text)
                            # the tail of this run must be a prefix of key_name,
                            # otherwise it is a false positive
                            if not key_name.startswith(inline[i].text[start_index:]):
                                continue
                            if key_index == 0:
                                started = True
                            chars_found = check_length - start_index
                            key_index += chars_found
                            found_runs.append((i, start_index, chars_found))
                            if key_index != len(key_name):
                                continue
                            else:
                                # found all chars in key_name
                                found_all = True
                                break
    
                        # case 2: search for partial text, find subsequent run
                        if key_name[key_index] in inline[i].text and started and not found_all:
                            # check sequence
                            chars_found = 0
                            check_length = len(inline[i].text)
                            for text_index in range(0, check_length):
                                if inline[i].text[text_index] == key_name[key_index]:
                                    key_index += 1
                                    chars_found += 1
                                else:
                                    break
                            # no match so must be end
                            found_runs.append((i, 0, chars_found))
                            if key_index == len(key_name):
                                found_all = True
                                break
    
                    if found_all and not replace_done:
                        for i, (index, start, length) in enumerate(found_runs):
                            run_text = inline[index].text
                            # the first run receives the value, the others lose
                            # their matched fragment; slice by position so other
                            # occurrences of the fragment stay untouched
                            replacement = str(val) if i == 0 else ''
                            inline[index].text = run_text[:start] + replacement + run_text[start + length:]
                    # print(p.text)
    
    
    if __name__ == '__main__':
        from_filename = 'template.docx'
        to_filename = 'save_style.docx'
    
        REPLACING = {
            'title': 'My pretty title!',
            'date_time': DT.datetime.now().strftime('%Y/%m/%d %H:%M:%S'),
        }
    
        doc = docx.Document(from_filename)
        docx_replace(doc, REPLACING)
    
        doc.save(to_filename)
    

    The template file template.docx looks like this:

    [image]

    The resulting save_style.docx:

    [image]
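The idea behind the function above, locating the placeholder in the paragraph's joined text and then editing only the runs it touches, can also be sketched more compactly. This is an alternative sketch, not the code from the linked Stack Overflow answer; replace_across_runs is a hypothetical helper that works on a plain list of run texts rather than on python-docx objects.

```python
def replace_across_runs(run_texts, key, value):
    """Replace every occurrence of key, even when it spans several runs.

    run_texts is a list of strings, one per run. The replacement goes into
    the run where the match starts, and the matched characters are cut out
    of the following runs, so the number and order of runs stay the same.
    """
    texts = list(run_texts)
    if not key:
        return texts
    search_from = 0
    while True:
        joined = ''.join(texts)
        start = joined.find(key, search_from)
        if start == -1:
            return texts
        end = start + len(key)
        offset = 0          # position of the current run in the joined text
        inserted = False
        for i, text in enumerate(texts):
            run_start, run_end = offset, offset + len(text)
            offset = run_end
            # Part of this run that overlaps the match [start, end).
            lo, hi = max(start, run_start), min(end, run_end)
            if lo >= hi:
                continue
            replacement = '' if inserted else value
            inserted = True
            texts[i] = text[:lo - run_start] + replacement + text[hi - run_start:]
        # Continue after the inserted value so it is never matched again.
        search_from = start + len(value)


print(replace_across_runs(['Title: ', '${', 'title', '}'], '${title}', 'My pretty title!'))
```

With python-docx, one way to apply this would be to build the list from [run.text for run in paragraph.runs] and write each result back to the matching run.text; assigning run.text changes only that run's text and leaves its formatting alone.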
