How do I save multiple outputs to multiple files in Python, with each file named after a title taken from an object?

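I am scraping an RSS feed from a website (http://www.gfrvitale.altervista.org/index.php/autismo-in?format=feed&type=rss). I have written a script that extracts and cleans the text of each feed item. My main problem is saving the text of each item to a separate file; I also need each file to be named after the title extracted from that item. My code is: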

for item in myFeed["items"]:
    time_structure = item["published_parsed"]
    dt = datetime.fromtimestamp(mktime(time_structure))

    if dt > t:

        link = item["link"]
        response = requests.get(link)
        doc = Document(response.text)
        doc.summary(html_partial=False)

        # extracting text
        h = html2text.HTML2Text()

        # converting
        h.ignore_links = True  # ignore links
        h.skip_internal_links = True  # skip internal links
        h.inline_links = True
        h.ignore_images = True  # ignore image links
        h.ignore_emphasis = True
        h.ignore_anchors = True
        h.ignore_tables = True

        testo = h.handle(doc.summary())  # extracted text

        s = doc.title() + "." + " " + testo  # content to write to the final file

        tit = item["title"]

        # save each file with its proper title
        with codecs.open("testo_%s", %tit "w", encoding="utf-8") as f:
            f.write(s)
            f.close()

The error is:

File "<ipython-input-57-cd683dec157f>", line 34
    with codecs.open("testo_%s", %tit "w", encoding="utf-8") as f:
                                 ^
SyntaxError: invalid syntax

Source

2016-10-02 CosimoCD

You need to put the comma after %tit, not before it.

It should be:

# save each file with its proper title
with codecs.open("testo_%s" %tit, "w", encoding="utf-8") as f: 
    f.write(s) 
    f.close()

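However, if the file name contains invalid characters, opening the file will raise an error. One option is to strip such characters from the title first (an nltk-based variant follows further down):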

... 
tit = item["title"] 
tit = tit.replace(' ', '').replace("'", "").replace('?', '') # Not the best way, but it could help for now (will be better to create a list of stop characters) 

with codecs.open("testo_%s" %tit, "w", encoding="utf-8") as f: 
    f.write(s) 
    f.close()

Another way :)

You can try this code:

from nltk.tokenize import RegexpTokenizer 
tokenizer = RegexpTokenizer(r'\w+') 
tit = item["title"] 
tit = tokenizer.tokenize(tit) 
tit = ''.join(tit) 
with codecs.open("testo_%s" %tit, "w", encoding="utf-8") as f: 
    f.write(s) 
    f.close()
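
Both snippets above remove a handful of characters from the title before building the file name. The same idea can be sketched with the standard library only. The helper name `safe_filename`, the `_` replacement, and the `untitled` fallback are assumptions made for illustration and are not part of the original answer; the character set is the one Windows rejects in file names (it includes `?`).

```python
import re

# Characters Windows does not allow in file names, plus control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def safe_filename(title, replacement="_"):
    """Hypothetical helper: make a feed title usable as a file name."""
    cleaned = _INVALID.sub(replacement, title)
    # Windows also rejects names that end with a dot or a space.
    cleaned = cleaned.rstrip(". ")
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or "untitled"

print(safe_filename(u"La Comunicazione Facilitata? Parliamone."))
# prints: La Comunicazione Facilitata_ Parliamone
```

Replacing rather than deleting characters keeps the words of the title separated, which makes the resulting file names easier to read.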

Source

2016-10-02 15:26:58 estebanpdl

I did that, but it doesn't work. I get this error:

C:\Anaconda2\lib\codecs.pyc in open(filename, mode, encoding, errors, buffering)
    894 # Force opening of the file in binary mode
    895 mode = mode + 'b'
--> 896 file = __builtin__.open(filename, mode, buffering)
    897 if encoding is None:
    898 return file

IOError: [Errno 22] invalid mode ('wb') or filename: u'testo_La Comunicazione Facilitata? Parliamone. – CosimoCD

The code is correct. The comma goes after '%tit', not before it. That is a different error. I'll check. – estebanpdl

What is the desired output format? (i.e. '.csv', '.txt') – estebanpdl

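First, you misplaced the comma: it should come after %tit, not before it.

Second, you don't need to close the file, because the with statement you are using does that for you automatically. Also, where does codecs come from? I don't see it anywhere else... Anyway, the correct with statement should be: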

with open("testo_%s" %tit, "w", encoding="utf-8") as f: 
    f.write(s)
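
One caveat: the built-in `open` accepts `encoding` only on Python 3. On Python 2, which the `Anaconda2` path in the traceback quoted earlier suggests the asker is running, `io.open` takes the same arguments. A minimal sketch of both points, including that the `with` block closes the file on its own; the title, text, and temporary directory here are placeholders, not values from the question.

```python
import io
import os
import tempfile

s = u"Titolo. Testo estratto"   # placeholder content
tit = u"esempio"                # placeholder title
path = os.path.join(tempfile.mkdtemp(), u"testo_%s.txt" % tit)

# io.open behaves the same on Python 2 and Python 3.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(s)

# The with block has already closed the file; no f.close() is needed.
print(f.closed)
```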

Source

2016-10-02 19:34:03 geo1230

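I ran the code above, but it gave an error. Now I'm working with: with io.open("testo_" + tit, "w", encoding="utf-8") as f: f.write(s) – CosimoCD

What error did it give? You should provide something to work with... As for the naming, you should stick with "testo_%s" % tit, because I don't think "testo_" + tit will work (but I may just be wrong) – geo1230

I ran the code above, but it gives me an error. It says the function does not accept %tit as an argument. Now I'm working with: with io.open("testo_" + tit, "w", encoding="utf-8") as f: f.write(s). It partly works: it saves the first item with its correct title, then it stops. I get this new error: IOError: [Errno 22] invalid argument: u'testo_La Comunicazione Facilitata? Parliamone......' – CosimoCD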
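
The `[Errno 22]` in the last comment is raised for a title that contains `?`, a character Windows does not accept in file names, which is the most likely reason the loop stops at that item. A sketch of the whole save step under stated assumptions: `items` stands in for the (title, text) pairs produced by the feed loop, and `save_items`, the `.txt` extension, and the numeric suffix for repeated titles are illustrative choices that do not come from the thread.

```python
import io
import os
import re
import tempfile

def save_items(items, out_dir):
    """Write each (title, text) pair to its own UTF-8 file; return the paths."""
    paths = []
    for title, text in items:
        # Replace characters that Windows rejects in file names.
        name = re.sub(r'[<>:"/\\|?*]', "_", title).rstrip(". ") or "untitled"
        path = os.path.join(out_dir, u"testo_%s.txt" % name)
        # Avoid overwriting when two items end up with the same name.
        n = 1
        while os.path.exists(path):
            n += 1
            path = os.path.join(out_dir, u"testo_%s_%d.txt" % (name, n))
        with io.open(path, "w", encoding="utf-8") as f:
            f.write(text)
        paths.append(path)
    return paths

# Placeholder data standing in for the scraped feed items.
items = [
    (u"La Comunicazione Facilitata? Parliamone.", u"primo testo"),
    (u"Secondo titolo", u"secondo testo"),
    (u"Secondo titolo", u"terzo testo"),
]
for p in save_items(items, tempfile.mkdtemp()):
    print(os.path.basename(p))
```

With this structure a problematic title no longer aborts the loop, and items that share a title are written to separate files instead of overwriting each other.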
