Regular expression to find any English word for pywikibot's find-and-replace pattern

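I wrote the following program to delink English words in a ta.wikipedia page. Delinking means removing the square brackets before and after an English word. I am new to PAWS (pywikibot). It seems the removal could be done with a regular expression (A-Z, a-z). How?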

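    import pywikibot
    import re

    # `title` must hold the name of the page to edit; it is not defined in this snippet.
    site = pywikibot.Site('ta', 'wikipedia')
    page = pywikibot.Page(site, title)
    # Remove the link brackets around one specific English word.
    page.text = page.text.replace('[[Eudicots]]', 'Eudicots')
    page.save()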

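Sorry for my English; it is a bridge language for me. I am not asking for debugging, but for a way to avoid repetitive code like the following. For example, these 26 lines (one per letter) remove the [[ brackets.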

    page.text = page.text.replace('[[A', 'A')
    page.text = page.text.replace('[[B', 'B')
    page.text = page.text.replace('[[C', 'C')
    # likewise for every letter from A to Z
    page.text = page.text.replace('[[X', 'X')
    page.text = page.text.replace('[[Y', 'Y')
    page.text = page.text.replace('[[Z', 'Z')

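Then I have to remove the ]] brackets that follow the lowercase letter at the end of a word, because every word ends with a lowercase letter. To do that, I have to write the following code: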

    page.text = page.text.replace('a]]', 'a')
    page.text = page.text.replace('b]]', 'b')
    page.text = page.text.replace('c]]', 'c')
    page.text = page.text.replace('d]]', 'd')
    # likewise for all 26 English letters
    page.text = page.text.replace('x]]', 'x')
    page.text = page.text.replace('y]]', 'y')

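I think this is bad coding, so I would like to use a regular expression. I hope I have explained why this is needed for Wikimedia projects.

In other words, I want to remove the brackets around English words, not the English words themselves.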

Source

2016-12-19 info-farmer

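Some PCRE-compatible regular expression libraries can match character classes based on Unicode properties (for example, \p{Latin} matches any character of the Latin script), but Python's re module does not support this. There are other Python modules that can be used instead (this answer has details), but as long as you are only looking for ASCII characters, it is easier to build your own character class: [A-Za-z] matches a single character in those ranges, and re.sub('([A-Za-z])]]', '\\1', text) keeps that character and discards the brackets.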
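As a minimal sketch of the character-class approach, the snippet below applies the substitution to a made-up string rather than a real page. The helper name strip_ascii_brackets and the sample text are hypothetical, and the second substitution (for the opening [[) is an assumed mirror image of the one given above, not part of the original answer.

```python
import re

def strip_ascii_brackets(text):
    """Drop ']]' after an ASCII letter and '[[' before an ASCII letter."""
    # Closing brackets: keep the captured letter, discard the "]]".
    text = re.sub(r'([A-Za-z])\]\]', r'\1', text)
    # Opening brackets (assumed mirror of the pattern above).
    text = re.sub(r'\[\[([A-Za-z])', r'\1', text)
    return text

# Hypothetical sample wikitext: one English link and one Tamil link.
sample = "[[Eudicots]] and [[தமிழ்]]"
print(strip_ascii_brackets(sample))  # Eudicots and [[தமிழ்]]
```

The Tamil link is left alone because neither its first nor its last character falls in [A-Za-z]. Note that the two substitutions are independent of each other, so a link that only ends (or only starts) with an ASCII letter would lose just one pair of brackets.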

Source

2016-12-20 22:28:23 Tgr

But it only removes the ]] brackets. See https://ta.wikipedia.org/w/index.php?title=%E0%AE%AA%E0%AE%AF%E0%AE%A9%E0%AE%B0%E0%AF%8D%3AInfo-farmer%2FPAWS&type=revision&diff=2156707&oldid=2156706 The code was: page = pywikibot.Page(site, title); page.text = re.sub('([A-Za-z])]]', '\\1', page.text); page.save() –

Also, it should not remove the brackets of interwiki links, for example [[:en:Parkia speciosa]] –

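I'm sure it is not hard to merge this into a single call :) If you want it in one regular expression, '(\[\[|]])(?![A-Za-z])' would do. Exempting interwikis is not something regular expressions are a good tool for. You could try re.sub with a callback, but it is better to use [mwparserfromhell](https://github.com/earwig/mwparserfromhell). – Tgr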
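The two suggestions in this comment can be sketched with the standard re module alone. The pattern as quoted uses a negative lookahead, which appears to match brackets that are *not* followed by a letter, so the single-call version below swaps in positive lookarounds instead; that substitution is an assumption about what was meant. The callback version assumes that an "English word" link is one whose whole target consists of ASCII letters separated by single spaces. All function and variable names here are hypothetical.

```python
import re

def strip_brackets_single_call(text):
    """One re.sub call: '[[' before an ASCII letter or ']]' after one."""
    return re.sub(r'\[\[(?=[A-Za-z])|(?<=[A-Za-z])\]\]', '', text)

# A simple link [[target]] with no pipe and no nested brackets.
LINK = re.compile(r'\[\[([^\[\]|]+)\]\]')
# Assumed rule: ASCII letters only, words separated by single spaces.
ENGLISH = re.compile(r'[A-Za-z]+(?: [A-Za-z]+)*')

def delink_english(text):
    """Unwrap links with an all-English target; leave every other link as is."""
    def unwrap(match):
        target = match.group(1)
        if ENGLISH.fullmatch(target):
            return target          # drop the brackets, keep the word(s)
        return match.group(0)      # interwiki, Tamil, etc. stay untouched
    return LINK.sub(unwrap, text)

sample = "[[Eudicots]] [[:en:Parkia speciosa]] [[தமிழ்]]"
print(strip_brackets_single_call(sample))
print(delink_english(sample))
```

The single-call version still damages the interwiki link, because its closing ]] follows an ASCII letter while its opening [[ is followed by a colon. The callback version looks at the whole link target, so [[:en:Parkia speciosa]] is rejected by the colon and survives intact. With pywikibot, the result would presumably be assigned back via page.text = delink_english(page.text) before page.save(). Piped links such as [[target|label]] are deliberately not matched here; a real parser like mwparserfromhell is the more robust route for those.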
