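Efficient regex for multiple strings of characters and numbers

Hello, I'm new to Python and regex. I'm trying to use the two together to build a regular expression that extracts data from user input, but I expect the input to vary, with typos and so on. So in the code below I randomly pick one of several kinds of string that I expect a user might type in. I'm only interested in the number before or after USD. For example: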
import random
import re

# Pick one of the sample inputs at random.
ran = random.randint(1, 7)
print(ran)
if ran == 1:
    examplestring = "This item costs 20 USD contact 9999999"
elif ran == 2:
    examplestring = "This item costs USD 20"
elif ran == 3:
    examplestring = "This item costs 20 U.S.D"
elif ran == 4:
    examplestring = "This item costs 20 usd"
elif ran == 5:
    examplestring = "This item costs 20 Usd call to buy : 954545577"
elif ran == 6:
    examplestring = "This item costs 20USD"
elif ran == 7:
    examplestring = "This item costs usd20"

# Normalize the spellings of the currency tag (usd, Usd, U.S.D, U.S.D.) to "USD".
# The dots are escaped so that they match a literal "." and not any character.
regex = re.compile(r'\busd|\bu\.s\.d\b\.?', re.I)
examplestring = regex.sub("USD", examplestring)

# Find the amount directly before or after the tag, with optional whitespace between.
costs = re.findall(r'\d+\s*USD\b|\bUSD\s*\d+', examplestring, flags=re.I)

# Keep only the digits of the first match.
cost = ''.join(x for x in costs[0] if x.isdigit())
print(cost + " USD")
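With these regular expressions I get the detail I want, which is "20 USD". My question is whether I'm going about this the right way, and whether the code can be made better.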
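One possible tidy-up, offered as a sketch and not as the definitive answer: fold the normalization step and the search into a single case-insensitive pattern with one named group per position of the amount. The names `USD_AMOUNT` and `extract_usd_amount` are made up for this illustration, and the pattern only covers the seven sample strings from the question.

```python
import re

# Hypothetical helper names; the pattern is tuned to the question's samples only.
USD_AMOUNT = re.compile(
    r"""
    (?P<before>\d+)\s*u\.?s\.?d\b       # amount first: '20 USD', '20USD', '20 U.S.D'
    |
    \bu\.?s\.?d\.?\s*(?P<after>\d+)     # currency first: 'USD 20', 'usd20'
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_usd_amount(text):
    """Return the first amount next to a USD tag as an int, or None if absent."""
    match = USD_AMOUNT.search(text)
    if match is None:
        return None
    # Exactly one of the two named groups is set on a successful match.
    return int(match.group("before") or match.group("after"))


samples = [
    "This item costs 20 USD contact 9999999",
    "This item costs USD 20",
    "This item costs 20 U.S.D",
    "This item costs 20 usd",
    "This item costs 20 Usd call to buy : 954545577",
    "This item costs 20USD",
    "This item costs usd20",
]

for sample in samples:
    print(sample, "->", extract_usd_amount(sample))
```

Looping over all the samples instead of picking one at random makes it easier to see whether a change to the pattern breaks any of the cases. Returning `None` when nothing matches also avoids the `IndexError` that `costs[0]` would raise on input without a price.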
You can do all of this with a single regex: '(?:(?<=USD|usd)\s*(\d+))|(?:\d+\s*(?=USD|usd|Usd|USD))', but because of the regex's complexity this is sometimes not a good approach. See [here](https://regex101.com/r/mH0cC8/1) for an explanation of how it works. – RedX
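To make the lookaround idea from this comment concrete, here is a rough sketch. It is a simplified variant and not necessarily the exact pattern behind the regex101 link: it uses `re.IGNORECASE` instead of listing each capitalization, and it still normalizes the dotted spelling first, because a fixed-width lookbehind cannot cover both "USD" and "U.S.D". The names `DOTTED_USD`, `AMOUNT_NEAR_USD` and `amount_via_lookarounds` are assumptions made for the example.

```python
import re

# Hypothetical names; a simplified take on the commenter's lookaround pattern.
DOTTED_USD = re.compile(r"\bu\.s\.d\b\.?", re.IGNORECASE)
AMOUNT_NEAR_USD = re.compile(
    r"(?<=usd)\s*(\d+)"    # digits that directly follow the tag
    r"|(\d+)\s*(?=usd)",   # digits that directly precede the tag
    re.IGNORECASE,
)


def amount_via_lookarounds(text):
    """Return the digits next to a USD tag as a string, or None if absent."""
    # Collapse 'U.S.D' / 'U.S.D.' to 'USD' so the 3-character lookbehind applies.
    text = DOTTED_USD.sub("USD", text)
    match = AMOUNT_NEAR_USD.search(text)
    if match is None:
        return None
    # Only one of the two groups takes part in any given match.
    return match.group(1) or match.group(2)


print(amount_via_lookarounds("This item costs 20 U.S.D"))
print(amount_via_lookarounds("This item costs usd20"))
```

The lookarounds keep the currency tag out of the match itself, so no digit-filtering step is needed afterwards. The cost is a pattern that is harder to read, which is the trade-off the comment points out.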