Python - Split a large string by number of delimiter occurrences

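I'm still learning Python, and I have a problem I haven't been able to solve. I have a very long string (millions of lines) that I would like to split into smaller strings based on a specified number of delimiter occurrences.

For instance: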

ABCDEF 
// 
GHIJKLMN 
// 
OPQ 
// 
RSTLN 
// 
OPQR 
// 
STUVW 
// 
XYZ 
//

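In this case, I would like to split on "//" and return a string of all the lines that come before the nth occurrence of the delimiter.

So an input of 1 for splitting the string by // would return: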

ABCDEF

An input of 2 for splitting the string by // would return:

ABCDEF 
// 
GHIJKLMN

An input of 3 for splitting the string by // would return:

ABCDEF 
// 
GHIJKLMN 
// 
OPQ

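And so on... However, the length of the original 2-million-line string appears to be a problem when I simply try to split the entire string on "//" and then use a single index. (I get a memory error.) Perhaps Python can't handle that many lines in one split? So I can't do it that way.

I'm looking for a way where I don't need to split the entire string into a hundred thousand pieces when I may only need 100, and instead just start from the beginning, go up to a certain point, stop, and return everything before it, which I assume may also be faster? I hope my question is as clear as possible.

Is there a simple or elegant way to achieve this? Thanks!

Source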

2015-06-04 Indie

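Why not use a generator to read the first n items until the required number of "//" delimiters has been read? That way you only read what you need. –

Thanks, I'll look into generators as well, since I'm not familiar with them. – Indie

Please show the code you have tried so far. –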
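
The generator idea from the comments could look roughly like the sketch below. The function name `lines_before_nth_delimiter` and the sample text are made up for illustration; `io.StringIO` is used here only so that an in-memory string can be read line by line, and the same generator should work on an open file object.

```python
import io

def lines_before_nth_delimiter(lines, delimiter="//", n=1):
    """Yield stripped lines until the n-th line equal to `delimiter` is reached."""
    seen = 0
    for line in lines:
        stripped = line.strip()
        if stripped == delimiter:
            seen += 1
            if seen >= n:
                return  # stop here; the rest of the input is never read
        yield stripped

# Hypothetical sample data in the same shape as the question's example.
text = "ABCDEF\n//\nGHIJKLMN\n//\nOPQ\n//\nRSTLN\n//\n"
first_two = list(lines_before_nth_delimiter(io.StringIO(text), "//", 2))
print(first_two)  # ['ABCDEF', '//', 'GHIJKLMN']
```

Because the generator returns as soon as the n-th delimiter is seen, only the lines up to that point are ever materialized.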

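If you want to work with a file rather than a string, here is another answer.

This version is written as a function that reads lines and prints them out immediately until the specified number of delimiters has been found (no extra memory is needed to store the whole string).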

def file_split(file_name, delimiter, n=1):
    """Print the lines of file_name that come before the n-th delimiter line."""
    with open(file_name) as fh:
        for line in fh:
            line = line.rstrip()  # use .rstrip("\n") to only strip newlines
            if line == delimiter:
                n -= 1
                if n <= 0:
                    return
            print(line)

file_split('data.txt', '//', 3)

You can use it to write the output to a new file like this:

python split.py > newfile.txt

With a little extra work, you could pass the arguments to the program on the command line.
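
One possible way to do that "extra work" is sketched below with the standard `argparse` module. The `main` function and the argument names `file_name`, `delimiter` and `n` are my own choices for illustration, not something the answer specifies; the loop is the same as in `file_split`.

```python
import argparse

def main(argv):
    """Parse [file_name, delimiter, n] and print the lines before the n-th delimiter."""
    parser = argparse.ArgumentParser(
        description="Print every line that comes before the n-th delimiter line.")
    parser.add_argument("file_name", help="path of the text file to read")
    parser.add_argument("delimiter", help='delimiter line, for example "//"')
    parser.add_argument("n", type=int, help="stop at this occurrence of the delimiter")
    args = parser.parse_args(argv)

    remaining = args.n
    with open(args.file_name) as fh:
        for line in fh:
            line = line.rstrip()
            if line == args.delimiter:
                remaining -= 1
                if remaining <= 0:
                    return
            print(line)

# In a script you would end with:  main(sys.argv[1:])  (after `import sys`)
```

Assuming the script is saved as split.py, it could then be called as `python split.py data.txt // 3 > newfile.txt`.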

Source

2015-06-04 17:55:28

This is actually perfect, and it had no problem handling the 2-million-line file. Thanks! – Indie

For example:

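delimiter = "//"  # example value
max_split = 3     # example value: stop at the 3rd delimiter
i = 0
s = ""
with open("...") as fd:
    for l in fd:
        if l.strip() == delimiter:  # ignore the trailing '\n' and spaces
            i += 1
        if i >= max_split:
            break
        s += l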

Source

2015-06-04 14:42:52 sheh

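As a more efficient approach, you can read just the first N lines separated by your delimiter. If you are sure that every line is separated from the next by a delimiter line, you can use itertools.islice to do the job: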

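from itertools import islice
N = 3  # example value: number of delimited lines to read
with open('filename') as f:
    # consume the iterator while the file is still open
    lines = list(islice(f, 2 * N - 1))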
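
To see where the 2*N-1 comes from, here is a small check on made-up in-memory data (the `sample` text is an assumption that mirrors the question's layout, with exactly one data line between delimiter lines). N data lines plus the N-1 delimiter lines between them gives 2*N-1 lines in total.

```python
import io
from itertools import islice

# Hypothetical sample: one data line, then one delimiter line, repeated.
sample = io.StringIO("ABCDEF\n//\nGHIJKLMN\n//\nOPQ\n//\nRSTLN\n//\n")
N = 3

# islice stops after 2*N-1 lines and never reads the remainder.
head = [line.rstrip("\n") for line in islice(sample, 2 * N - 1)]
print(head)  # ['ABCDEF', '//', 'GHIJKLMN', '//', 'OPQ']
```

If a record can span more than one line, this count no longer lines up with the delimiters, so the approach only fits strictly alternating input.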

Source

2015-06-04 14:43:45 Kasramvd

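When I saw your question, the approach that came to mind was a loop in which you chop the string into several pieces (for instance the 100 you mentioned) and iterate through each substring.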

thestring = ""  # your string
steps = 100     # length of the chunks you are going to use for iteration
log = 0         # offset of the current chunk

while log < len(thestring):
    substring = thestring[log:log + steps]  # the chunk you will split and iterate through
    thelist = substring.split("//")
    for element in thelist:
        if element:  # placeholder: replace with your own test for the element you want
            pass     # do your thing with the line
    log = log + steps  # go again, only with this offset

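Now you can iterate through all the elements of the whole 2 million (!) line string. The best thing to do here is actually to make a recursive function out of this (if that is what you want):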

thestring = ""  # your string
steps = 100     # length of the chunks you are going to use for iteration

def iterateThroughHugeString(beginning):
    if beginning >= len(thestring):
        return  # nothing left to scan
    substring = thestring[beginning:beginning + steps]  # the chunk you will split and iterate through
    thelist = substring.split("//")
    for element in thelist:
        if element:  # placeholder: replace with your own test for the element you want
            pass     # do your thing with the line
    # go again with this offset (note: Python's default recursion limit is about 1000 calls)
    iterateThroughHugeString(beginning + steps)
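
Fixed-size chunks can cut a "//" in half at a chunk boundary. As an alternative for a string that is already in memory, here is a minimal sketch that walks from delimiter to delimiter with `str.find` and stops at the n-th one. The function name `text_before_nth`, and the choice to return the whole text when fewer than n delimiters exist, are my own assumptions rather than part of this answer.

```python
def text_before_nth(text, delimiter="//", n=1):
    """Return the part of `text` that precedes the n-th occurrence of `delimiter`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    pos = -len(delimiter)
    for _ in range(n):
        # Resume the search just past the previous hit.
        pos = text.find(delimiter, pos + len(delimiter))
        if pos == -1:
            return text  # assumption: fewer than n delimiters means "return everything"
    return text[:pos]

# Hypothetical sample data in the same shape as the question's example.
sample = "ABCDEF\n//\nGHIJKLMN\n//\nOPQ\n//\n"
print(repr(text_before_nth(sample, "//", 2)))  # 'ABCDEF\n//\nGHIJKLMN\n'
```

Unlike splitting the whole string, this scans only as far as the n-th delimiter and builds a single slice at the end.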

Source

2015-06-04 14:48:07

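Since you are learning Python, building a complete, dynamic solution would be a challenge. Here is how you could model one idea.

Note: the following snippet only works for files in the given format (see "For instance" in the question). It is therefore a static solution.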

num = int(input("Enter the number of delimiters: ")) * 2
with open("./data.txt") as myfile:
    print([next(myfile) for x in range(num - 1)])

Now that you have the idea, you can use pattern matching and so on.
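
As one guess at what "pattern matching" could mean here, the sketch below uses `re.finditer` to find lines that consist only of "//" (optionally followed by spaces or tabs) and cuts the text at the n-th such line. The function name `text_before_nth_match` and the default pattern are assumptions for illustration, and n is expected to be 1 or greater.

```python
import re
from itertools import islice

def text_before_nth_match(text, n, pattern=r"^//[ \t]*$"):
    """Return the text that precedes the n-th line matching `pattern` (n >= 1)."""
    matches = re.finditer(pattern, text, flags=re.MULTILINE)
    # finditer is lazy, so matching stops once the n-th hit is found.
    nth = next(islice(matches, n - 1, None), None)
    return text if nth is None else text[:nth.start()]

# Hypothetical sample with trailing spaces, as in the question's example.
sample = "ABCDEF \n// \nGHIJKLMN \n// \nOPQ \n//\n"
print(repr(text_before_nth_match(sample, 2)))  # 'ABCDEF \n// \nGHIJKLMN \n'
```

Anchoring the pattern to whole lines means a "//" that appears in the middle of a data line is not counted as a delimiter.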

Source

2015-06-04 16:25:18
