Extract text using Python


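I am writing a script that extracts data from a file and splits it into multiple files, where the content for each file is delimited by five "@" characters.

Example: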

@@@@@ 

hello 

@@@@@ 

world 

@@@@@

In this case, "hello" should go in one file and "world" in another.

I am using Python.

Source

2016-09-30 nijeesh joshy

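Please show us your current code. – Ivaro18

Which part of the program are you having trouble with? – JohnnyWineShirt

If I understand your requirement correctly, you want to take input from a file that uses the delimiter @@@@@, such as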

@@@@@ 
hello 
@@@@@ 
world 
@@@@@

and produce one file for each block between the delimiters:

hello

and

world

You can use re.split to get the splits:

splits = re.split("[@]{5}\n", input_buffer)

which gives something like this (note: this assumes each delimiter is immediately followed by a newline character):

['', 'hello\n', 'world\n', '']

and, to keep only the splits that contain actual text (assuming you want the trailing newlines removed):

[i.strip() for i in splits if i]

The output file names were not specified either, so using

for index, val in enumerate([i.strip() for i in splits if i]):
    with open("output%d" % index, "w+") as f:
        f.write(val)

creates files named output0 through outputN.

import io
import re

# Each delimiter line must end with "\n" directly (no trailing spaces),
# otherwise the pattern below does not match it.
input_text = '''@@@@@
hello
@@@@@
world
@@@@@
'''
string_file = io.StringIO(input_text)  # stands in for a real file (Python 3)
input_buffer = string_file.read()

splits = re.split("[@]{5}\n", input_buffer)
for index, val in enumerate([i.strip() for i in splits if i]):
    with open("output%d" % index, "w+") as f:
        f.write(val)

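This is just a starting point; you can obviously split on a different regular expression, change the output names to something more suitable, and so on.

Also, if (as an earlier title of this question said) you want the text between [- and -], you can get it with re.findall instead: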
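As one example of a "more suitable" output name, here is a minimal sketch that zero-pads the index so the files sort in order in a directory listing. The helper name `output_name` and the `.txt` suffix are assumptions for illustration, not part of the answer above.

```python
def output_name(index, total, prefix="output", suffix=".txt"):
    """Build a zero-padded file name (hypothetical helper)."""
    # Pad to the number of digits in the largest index (total - 1).
    width = len(str(total - 1)) if total > 1 else 1
    return prefix + str(index).zfill(width) + suffix


names = [output_name(i, 12) for i in range(12)]
print(names[0], names[-1])  # output00.txt output11.txt
```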

import io
import re

input_text = '''[-hello-]
[-world-]
'''
string_file = io.StringIO(input_text)  # stands in for a real file (Python 3)

input_buffer = string_file.read()
# Raw string so the backslashes reach the regex engine unchanged
splits = re.findall(r"\[-(.*)-\]", input_buffer)
for index, val in enumerate(splits):
    with open("output%d" % index, "w+") as f:
        f.write(val)
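One caveat with the findall pattern: `.*` is greedy, so it only behaves as intended when each [-...-] marker sits on its own line. A small sketch of the difference, assuming input where two markers share a line (the sample string is made up for illustration):

```python
import re

same_line = "[-hello-] [-world-]"

# Greedy: .* runs to the last "-]" on the line.
greedy = re.findall(r"\[-(.*)-\]", same_line)
# Non-greedy: .*? stops at the first "-]" it can.
lazy = re.findall(r"\[-(.*?)-\]", same_line)

print(greedy)  # ['hello-] [-world']
print(lazy)    # ['hello', 'world']
```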

Source

2016-09-30 13:47:32

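Pretty sure '[@]{5}\n' doesn't match the last '@@@@@'. Maybe better: '[@]{5}\n?', or drop the newline from the pattern entirely and let 'strip()' do the work. – brianpck

@brianpck is right; I assumed the file was terminated with a newline. –

This might do the trick: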
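The point raised in the comments can be checked with a short sketch. It assumes an input whose final delimiter has no newline after it; the helper name `blocks` is made up for this illustration.

```python
import re

# Same sample data, but the file does not end with a newline.
no_final_newline = "@@@@@\nhello\n@@@@@\nworld\n@@@@@"


def blocks(pattern, text):
    """Split text on pattern and keep only the non-empty, stripped pieces."""
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


strict = blocks(r"[@]{5}\n", no_final_newline)    # newline required
relaxed = blocks(r"[@]{5}\n?", no_final_newline)  # newline optional

print(strict)   # ['hello', 'world\n@@@@@']
print(relaxed)  # ['hello', 'world']
```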

with open('a.txt') as r:  # open the source file
    parts = r.read().split('@@@@@')  # read the contents and break them into a list of pieces separated by '@@@@@'
    new = [item.strip() for item in parts if item.strip()]  # strip whitespace and drop pieces that are empty

for i, item in enumerate(new):  # iterate through the list; i starts at 0 (default)
    with open('a%s.txt' % (i + 1), 'w') as w:  # create a file named 'a' + (i + 1) + '.txt' for each element
        w.write(item)  # write the contents of the current element into the file

This will read your file, which I called "a.txt", and generate files named a1.txt, a2.txt ... an.txt

Source
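To try this approach without writing into your working directory, the same steps can be wrapped in a function and run against a temporary folder. This is only a sketch: the function name `split_file` and its parameters are assumptions, not part of the answer above.

```python
import tempfile
from pathlib import Path


def split_file(source, out_dir, delimiter="@@@@@"):
    """Write each non-empty block of source to its own numbered file."""
    source = Path(source)
    out_dir = Path(out_dir)
    pieces = [p.strip() for p in source.read_text().split(delimiter)]
    pieces = [p for p in pieces if p]  # drop whitespace-only pieces
    written = []
    for number, piece in enumerate(pieces, start=1):
        # a.txt -> a1.txt, a2.txt, ...
        target = out_dir / "{}{}{}".format(source.stem, number, source.suffix)
        target.write_text(piece)
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "a.txt"
    src.write_text("@@@@@\nhello\n@@@@@\nworld\n@@@@@\n")
    created = split_file(src, tmp)
    names = [p.name for p in created]
    contents = [p.read_text() for p in created]

print(names)     # ['a1.txt', 'a2.txt']
print(contents)  # ['hello', 'world']
```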

2016-09-30 13:27:37 zipa

Can you explain how it works? –

@nijeeshjoshy I added a comment to each line. Hope that clears things up. – zipa
