Python regex for multiple delimiters, including double quotes


How can I do this in Python: split a string on multiple delimiters with a regex, while respecting double quotes?

Input to process:

> https://test.com, 2017-08-14, "This is the title with , and "anything" in it", "This is the paragraph also with , and "anything" in it"

Desired output:

['https://test.com', '2017-08-14', 'This is the title with , and "anything" in it', 'This is the paragraph also with , and "anything" in it']
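One possible regex-based sketch of this split, under a loud assumption: a quoted field is taken to end at the first double quote that is followed by optional whitespace and then a comma or the end of the string. That heuristic happens to fit this input, but it is not a general CSV parser. The names `FIELD_RE` and `split_fields` are hypothetical.

```python
import re

# Heuristic (an assumption, not a general CSV parser): a quoted field runs
# lazily up to the first double quote that is followed by optional
# whitespace and then a comma or the end of the string.
FIELD_RE = re.compile(
    r'"(.*?)"(?=\s*(?:,|$))'           # group 1: body of a quoted field
    r'|([^,\s][^,]*?)(?=\s*(?:,|$))'   # group 2: an unquoted field
)


def split_fields(line):
    """Return the fields of `line`, keeping commas and quotes inside quoted fields."""
    # findall yields (quoted, bare) tuples; one of the two groups holds the field text
    return [quoted or bare for quoted, bare in FIELD_RE.findall(line)]


foo = ('https://test.com, 2017-08-14, '
       '"This is the title with , and "anything" in it", '
       '"This is the paragraph also with , and "anything" in it"')
print(split_fields(foo))
```

The lazy `.*?` combined with the lookahead is what lets the inner `"anything"` quotes survive: a quote followed by a letter or a space-then-word cannot close the field.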


2017-08-16 Yin Yin

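Welcome to Stack Overflow. This is not a code- or regex-writing service. Once you have made an effort to solve the problem yourself and get stuck, we'll be happy to help. When you do, explain the problem you're having, include the *relevant* code, and ask a **specific question** about that code, and we can try to help. Good luck. –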

You're welcome. – dat3450

There are several ways you can split a string.

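The vanilla built-in split method accepts a delimiter as an argument and does exactly what it says on the tin: it splits the string at every occurrence of the delimiter you specified and returns the pieces as a list.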
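A minimal illustration of `str.split` with an explicit delimiter, using the date from the question as sample data; the optional `maxsplit` argument is shown as well.

```python
# str.split(sep) cuts at every occurrence of sep and returns a list.
date = '2017-08-14'
parts = date.split('-')
print(parts)  # ['2017', '08', '14']

# The optional maxsplit argument caps how many cuts are made.
print(date.split('-', 1))  # ['2017', '08-14']
```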

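In your case, you want the delimiter to be `,`, but only for commas that are not inside quotes. In general, this is the kind of thing you could do: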

foo = 'https://test.com, 2017-08-14, "This is the title with , and "anything" in it", "This is the paragraph also with , and "anything" in it"'

print(foo.split(','))
# but this has the caveat that you can't have any ','s within your fields, since those become split points as well, which you do not want.
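To make that caveat concrete, here is a small sketch that counts the pieces a plain comma split produces on the question's input: the commas inside the quoted title and paragraph are cut on too.

```python
foo = ('https://test.com, 2017-08-14, '
       '"This is the title with , and "anything" in it", '
       '"This is the paragraph also with , and "anything" in it"')

pieces = foo.split(',')
# The two commas inside the quoted title/paragraph are also cut on,
# so four intended fields become six pieces.
print(len(pieces))  # 6
for piece in pieces:
    print(repr(piece))
```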

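In this particular case you could also try matching on a different delimiter string, but that would fail as well, because your input has an element (`title with , and "any...`) that would be split incorrectly.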

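In this case, we can use shlex and its split method instead. This split method uses whitespace as the delimiter, while keeping quoted text together as one token.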
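A minimal sketch of that behavior on a toy string (not the question's input): whitespace separates tokens, a double-quoted run stays together, and the quote characters are dropped.

```python
import shlex

# shlex.split breaks on whitespace, but a double-quoted run stays one token
# and the quote characters themselves are removed.
tokens = shlex.split('one "two three" four')
print(tokens)  # ['one', 'two three', 'four']
```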

So doing this:

import shlex
print(shlex.split(foo))

gets us something closer to what we want, but not quite:

>>> ['https://test.com,', '2017-08-14,', 'This is the title with , and anything in it,', 'This is the paragraph also with , and anything in it']

As you can see, it leaves trailing commas on the elements, which we don't want.

Unfortunately, we can't just do

print([token[:-1] for token in shlex.split(foo)])

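because that would also cut the final 't' off 'it' in the last element, which has no trailing comma. Instead, we can use the built-in string method rstrip to strip any commas from the end of each element: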

print([token.rstrip(',') for token in shlex.split(foo)])

which gives the output:

>>> ['https://test.com', '2017-08-14', 'This is the title with , and anything in it', 'This is the paragraph also with , and anything in it']

This is very close to what we want, but not quite! (The quotes around "anything" are missing: shlex ate them!)
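A small sketch of why the quotes vanish, using just the title field from the question: in POSIX mode (the default for `shlex.split`), quotes only group text and are not kept, and quoted and unquoted pieces that touch are glued into a single token.

```python
import shlex

# In POSIX mode (the default for shlex.split), quotes only group text; they are
# not kept. Adjacent quoted and unquoted pieces are glued into one token,
# so the inner quotes around "anything" simply disappear.
title = '"This is the title with , and "anything" in it",'
result = shlex.split(title)
print(result)  # ['This is the title with , and anything in it,']
```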

But we're very close, so I'll leave that last tidbit as homework for you. As others have said, you should try a solution yourself first.

Resources:

https://www.tutorialspoint.com/python/string_split.htm

https://docs.python.org/2/library/shlex.html

P.S. Hint: also take a look at the csv module.
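A sketch of the csv route, under an explicit assumption: the inner quotes are escaped the standard CSV way, by doubling them (`""`). The raw string in the question does not do this, so `csv.reader` alone will not reproduce the desired output on it; the doubled-quote line below is illustrative data, not the original input.

```python
import csv

# Assumption: the inner quotes are escaped CSV-style by doubling them ("").
# The raw input in the question does not do this.
line = ('https://test.com, 2017-08-14, '
        '"This is the title with , and ""anything"" in it", '
        '"This is the paragraph also with , and ""anything"" in it"')

# skipinitialspace=True drops the blank after each comma, so the opening
# quote of a field is recognised as a quote rather than literal text.
row = next(csv.reader([line], skipinitialspace=True))
print(row)
```

With well-formed CSV input, the reader keeps commas inside quoted fields and turns each `""` back into a single `"`.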


2017-08-16 02:29:02 srath

This is incorrect. The OP doesn't want text inside the double quotes to be split on commas; see the desired output. – lincr

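@lincr Oopsies, you're right, it splits the string on every comma; I thought this was a very trivial problem and only glanced at the input =p. I've updated the code somewhat, but left part of the problem for him to actually try to finish, thanks. – srath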
