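How do I write a CSV file containing sentence numbers and sentences, separated by '|'?

I want to read a list of files and extract the file ID and the abstract from each one. Every sentence of the abstract should be written to a CSV file, with the file ID, sentence number, and sentence separated by '|'.

I was told to use NLTK's tokenizer. I have installed NLTK, but I don't know how to make it work with my code. I am using Python 3.2.2. Here is my code: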

import csv
import os
import re

# Collect the paths of all matching files under the top-level directory.
topdir = r'E:\Grad\LIS\LIS590 Text mining\Part1\Part1'  # Raw string, so the backslashes are kept literally.
matches = []
for root, dirnames, filenames in os.walk(topdir):
    for filename in filenames:
        if filename.endswith(('.txt', '.pdf')):
            matches.append(os.path.join(root, filename))

# Collect the file IDs and abstracts. Every abstract is one string in its list.
capturedfiles = []
capturedabstracts = []
for filepath in matches[:10]:  # Testing with the first 10 files.
    with open(filepath, 'rt') as myfile:
        mytext = myfile.read()

    # Capture the file ID.
    matchFile = re.findall(r'File\s+\:\s+(\w\d{7})', mytext)[0]
    capturedfiles.append(matchFile)

    # Capture the abstract.
    matchAbs = re.findall(r'Abstract\s+\:\s+(\w.+)' + '\n', mytext)[0]
    capturedabstracts.append(matchAbs)
    print(capturedabstracts)

# Write one abstract per row.
with open('Abstract.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)
    for data in capturedabstracts:
        writer.writerow([data])

I am a Python beginner and may not understand your suggestions, so it would be great if you could provide the revised code with comments.

Source

2014-03-05 Q-ximi

Try using writerow

Try something like this:

with open('Abstract.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)
    for data in capturedabstracts:
        writer.writerow([data])

Source

2014-03-05 22:14:16 asdoylejr

"...should be written to a CSV file, with the file ID, sentence number, and sentence **separated by '|'**."
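What the comment asks for can be sketched roughly as follows. This is only a minimal illustration, assuming the sentences have already been split: the function name write_sentence_rows and the sample file ID and sentences are made up for the example, and an in-memory buffer stands in for the real output file.

```python
import csv
import io


def write_sentence_rows(outfile, file_id, sentences):
    """Write one 'file_id|sentence_number|sentence' row per sentence."""
    # delimiter='|' replaces the default comma between fields.
    writer = csv.writer(outfile, delimiter='|', lineterminator='\n')
    # enumerate(..., start=1) supplies the sentence number.
    for number, sentence in enumerate(sentences, start=1):
        writer.writerow([file_id, number, sentence])


# Hypothetical sample data. For a real file, use
# open('Abstract.csv', 'w', newline='') instead of the StringIO buffer.
buffer = io.StringIO()
write_sentence_rows(buffer, 'a9000006', ['First sentence.', 'Second sentence.'])
print(buffer.getvalue())
```

With the default quoting, a sentence that itself contains '|' is wrapped in double quotes, so the file stays parseable.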

As a first attempt, look at a sentence tokenizer to split the text into a list of sentences, then store them in the CSV:

import csv
import nltk  # The Punkt sentence models must be downloaded first.

# `text` is assumed to hold one abstract as a string.
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
list_of_sentences = sent_detector.tokenize(text.strip())

with open('Abstract.csv', 'w', newline='') as outfile:
    # QUOTE_NONE needs an escapechar in case a sentence contains '|'.
    writer = csv.DictWriter(outfile, fieldnames=['phrase'], delimiter='|', quoting=csv.QUOTE_NONE, escapechar='\\')
    for phrase in list_of_sentences:
        writer.writerow({'phrase': phrase})

Source

2014-03-05 22:12:49 hd1
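Putting the pieces together, the rows the question asks for (file ID, sentence number, sentence) could be built along these lines. This is a sketch under stated assumptions, not a definitive solution: the two regular expressions are taken from the question's code, the function name extract_rows and the sample text are hypothetical, and the sentence splitter is a naive regex that will mis-split abbreviations such as "e.g.". NLTK's sent_tokenize could be swapped in for the re.split call.

```python
import re


def extract_rows(text):
    """Return (file_id, sentence_number, sentence) tuples for one document."""
    # Same patterns as in the question: an ID is one word character
    # followed by seven digits; the abstract is captured up to the line end.
    file_id = re.search(r'File\s+:\s+(\w\d{7})', text).group(1)
    abstract = re.search(r'Abstract\s+:\s+(\w.+)', text).group(1)
    # Naive splitter (an assumption): break after '.', '!' or '?'
    # when followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', abstract.strip())
    return [(file_id, n, s) for n, s in enumerate(sentences, start=1)]


# Hypothetical sample in the layout the question's regexes expect.
sample = (
    "File        : a9000006\n"
    "Abstract    : This is one sentence. Here is another one.\n"
)
rows = extract_rows(sample)
for row in rows:
    print(row)
```

Each tuple can then be passed to writerow on a csv.writer created with delimiter='|'.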
