2014-11-03

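Extracting data and transposing in Python

I have a text file from which I extract the region between two strings. The extracted region looks like this: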

title "A" "B" "C" "D" "E" "F" 
number "G1" "G2" "G3" "G4" "G5" "G6" 
data "aaa,bbb" "sss,ddd" "fff,ggg" "rrr,eee" "aaa,ooo" "ggg,aaa" 

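I want to write this to a CSV file. However, even after specifying "\t" as the delimiter, the data is split at the commas into separate cells within a row, and the tabs push the data onto new lines, like this: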

title 
"A" 
"B" 
"C" 
"D" 
"E" 
"F" 
number 
"G1"  
"G2"  
"G3"  
"G4"  
"G5"  
"G6" 
data 
"aaa bbb"  
"sss ddd"  
"fff ggg"  
"rrr eee"  
"aaa ooo"  
"ggg aaa" 

I need it to look like this:

title A B C D E F 
number G1 G2 G3 G4 G5 G6 
data aaa,bbb sss,ddd fff,ggg rrr,eee aaa,ooo ggg,aaa 

with each value in its own cell, separated by tabs. I would appreciate any help.

'The extracted region looks like this' <<- do you have this extracted region in a list/string/file/...? – inspectorG4dget 2014-11-03 01:57:13

@inspectorG4dget It is currently in a file. I wrote it to the file using 'if line.startswith("!Sample_title"): copy = True outfile.write(line)'. – abn 2014-11-03 01:58:45

Answers

0

infile.csv:

title "A" "B" "C" "D" "E" "F" 
number "G1" "G2" "G3" "G4" "G5" "G6" 
data "aaa,bbb" "sss,ddd" "fff,ggg" "rrr,eee" "aaa,ooo" "ggg,aaa" 

outfile.csv:

title A B C D E F 
number G1 G2 G3 G4 G5 G6 
data aaa,bbb sss,ddd fff,ggg rrr,eee aaa,ooo ggg,aaa 

Code:

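import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('infile.csv', newline='') as infile, \
        open('outfile.csv', 'w', newline='') as outfile:
    # Tab is the delimiter on both sides, so commas inside a field are left alone
    writer = csv.writer(outfile, delimiter='\t')
    # quotechar='"' removes the double quotes around each field while reading
    for row in csv.reader(infile, delimiter='\t', quotechar='"'):
        writer.writerow(row)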
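
To check this behaviour without touching any files, here is a minimal sketch that runs the same reader/writer pair on an in-memory string. The helper name strip_quotes_tsv and the sample text are made up for illustration, and the sketch assumes the input fields really are separated by tab characters.

```python
import csv
import io

def strip_quotes_tsv(text):
    """Re-write tab-separated text, dropping the double quotes around fields."""
    out = io.StringIO()
    # lineterminator='\n' keeps the result easy to compare; the default is '\r\n'
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    for row in csv.reader(io.StringIO(text), delimiter='\t', quotechar='"'):
        writer.writerow(row)
    return out.getvalue()

# Hypothetical sample: two rows, with commas inside the quoted fields
sample = 'title\t"A"\t"B"\ndata\t"aaa,bbb"\t"sss,ddd"\n'
result = strip_quotes_tsv(sample)
print(result)
```

Because the writer's delimiter is a tab, a comma inside a field is not a special character, so values such as aaa,bbb should come out unquoted and in a single cell.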
0

Using a regular expression:

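import re

with open('your_file.txt') as f:
    lines = f.readlines()  # read the file as a list of lines
for line in lines:
    # a word, optionally followed by a comma and a second word (e.g. aaa,bbb)
    print(" ".join(re.findall(r'\w+,?\w*', line)))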

Output:

title A B C D E F
number G1 G2 G3 G4 G5 G6
data aaa,bbb sss,ddd fff,ggg rrr,eee aaa,ooo ggg,aaa

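readlines() reads your file as a list of lines, and I then iterate over it looking for the pattern. Once you have the matches, you can format them however you like.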
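
As one possible way to format the matches, the sketch below joins them with tabs instead of spaces, which is what the question asks for. The function name line_to_tsv and the sample lines are illustrative and not part of the original answer. Note that the pattern keeps only word characters and a single comma, so a field containing spaces, hyphens, or more than one comma would be split into several pieces.

```python
import re

# Same pattern as above: a word, optionally followed by a comma and another word
FIELD = re.compile(r'\w+,?\w*')

def line_to_tsv(line):
    """Return the fields found in one line, joined by tab characters."""
    return "\t".join(FIELD.findall(line))

# Hypothetical sample lines in the same shape as the question's data
lines = ['title "A" "B" "C"\n', 'data "aaa,bbb" "sss,ddd"\n']
converted = [line_to_tsv(line) for line in lines]
for row in converted:
    print(row)
```

Each converted line can then be written to the output file followed by a newline.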