How do I iterate over a list until a match is found?

I have this piece of code:

class MySpider(Spider):
    name = 'smm'
    allowed_domains = ['*']
    start_urls = ['http://www.theguardian.com/media/social-media']

    def parse(self, response):
        items = []
        # Define keywords present in metadata to scrape the webpage
        keywords = ['social media','social business','social networking','social marketing','online marketing','social selling',
                    'social customer experience management','social cxm','social cem','social crm','google analytics','seo','sem',
                    'digital marketing','social media manager','community manager']
        for link in response.xpath("//a"):
            item = SocialMediaItem()
            # Extract webpage keywords
            metakeywords = link.xpath('//meta[@name="keywords"]').extract()

            # Compare keywords and extract if one of the defined keywords is present in the metadata
            for metaKW in metakeywords:
                if metaKW in keywords:
                    item['SourceTitle'] = link.xpath('/html/head/title').extract()
                    item['TargetTitle'] = link.xpath('text()').extract()
                    item['link'] = link.xpath('@href').extract()
                    outbound = str(link.xpath('@href').extract())
                    if 'http' in outbound:
                        items.append(item)
        return items

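Its purpose is to compare the variable "keywords" (a list) with the variable "metakeywords", which holds the web page's keywords as extracted with link.xpath('//meta[@name="keywords"]').extract(). If the comparison finds a single match, it should extract the items and append them as shown in the last if statement. However, it returns no results. I know it should return something, because I checked the web page's URL (http://www.socialmediaexaminer.com/). Can anyone help? Cheers!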

Dani

Source

2014-12-03 Dani Valverde

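It is currently impossible to answer your question, for two main reasons: (1) Because of how your "for" loop is structured, the "keywords" variable takes the value of each item in "metakeywords", one at a time; your "if" statement is therefore trivial, because it will always evaluate to true. Unless "metakeywords" is a list of lists/sets (and you haven't shown or told us what type of object "metakeywords" is), "keywords" is just a list. (2) Anyone reading this code has no idea what the "item" object is, or how or when the "items" list is initialized. – 2014-12-03 18:00:20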

Thanks for your comment, Magenta Nova. I have updated the code. – 2014-12-03 18:13:43

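Thanks for the update, but please read my first point carefully. If you haven't already, I think you need to read the Python documentation on for loops. – 2014-12-03 18:28:14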

See my comments between the lines of your code.

def parse(self, response):
    items = []
    # Define keywords present in metadata to scrape the webpage
    keywords = ['social media','social business','social networking','social marketing','online marketing','social selling','social customer experience management','social cxm','social cem','social crm','google analytics','seo','sem','digital marketing','social media manager','community manager']
    for link in response.xpath("//a"):
        item = SocialMediaItem()
        # Extract webpage keywords
        metakeywords = link.xpath('//meta[@name="keywords"]').extract()

What kind of data does .extract() return? Can you give an example? I'm not familiar with the library you are using.

        # Compare keywords and extract if one of the defined keywords is present in the metadata
        for keywords in metakeywords:

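This is the first major problem. The variable defined before "in" in a for loop should not share a name with any variable you have already defined. You should use a new name, such as "metaKW". The value of this variable is set dynamically, as each item in "metakeywords" is visited by your for loop.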

            if keywords in metakeywords:

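Because "keywords" takes the value of each item in "metakeywords", one after another, this statement will always be true, so it is trivial and unnecessary. Suppose, however, that you actually meant to refer to the keywords list you defined higher up in your code. In that case, as you iterate over the metakeywords list (I assume it is a list or some other kind of iterable), neither "keywords" nor "metakeywords" ever changes its value. You would therefore keep asking the same question without changing its terms, and keep getting the same result. Clean up some of these problems and let us know if you still don't get the expected result.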
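The shadowing problem described above can be reproduced without Scrapy. This is a minimal sketch using made-up sample lists (the values are hypothetical, not taken from the question): reusing the name "keywords" as the loop variable overwrites the original list, and the membership test then succeeds for every item.

```python
# Hypothetical sample data, for illustration only.
keywords = ['social media', 'seo', 'sem']
metakeywords = ['cooking', 'travel']   # deliberately no overlap with keywords

# Buggy pattern: the loop variable reuses the name `keywords`,
# so the original list is overwritten on the first iteration.
shadowed_hits = []
for keywords in metakeywords:
    if keywords in metakeywords:       # always True: the item came from this list
        shadowed_hits.append(keywords)

print(shadowed_hits)   # every item "matches", although nothing should
print(keywords)        # now a single string, no longer the original list
```

After the loop, the original keyword list is gone, so any later code that relies on it is working with a single string instead.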

                item['SourceTitle'] = link.xpath('/html/head/title').extract()
                item['TargetTitle'] = link.xpath('text()').extract()
                item['link'] = link.xpath('@href').extract()
                outbound = str(link.xpath('@href').extract())
                if 'http' in outbound:
                    items.append(item)
    return items

Edit, by way of being a little more constructive:

What you want is a loop like this...

for metaKW in metakeywords:
    if metaKW in keywords:
        # the rest of your code.
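To stop at the first match, as the question title asks, the loop can return as soon as one item is found. The sketch below assumes both inputs are plain lists of strings; the helper name first_match and the sample values are hypothetical and do not appear in the original code.

```python
def first_match(metakeywords, keywords):
    """Return the first item of metakeywords that is in keywords, or None."""
    wanted = set(keywords)            # set lookup avoids rescanning the list
    for meta_kw in metakeywords:      # distinct name, so nothing is shadowed
        if meta_kw in wanted:
            return meta_kw            # stop at the first match
    return None


# Hypothetical sample data, for illustration only.
keywords = ['social media', 'seo', 'sem']
print(first_match(['cooking', 'seo', 'sem'], keywords))
print(first_match(['cooking'], keywords))
```

Stopping early also matters inside the spider: without it, a page with several matching keywords would append the same item once per match. If only a yes/no answer is needed, any(kw in keywords for kw in metakeywords) expresses the same check in one line.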

Source

2014-12-03 18:38:06

Thanks for sharing! I will study the code. If I'm not mistaken, .extract() returns a list such as [u'']. It doesn't work; see my updated code. – 2014-12-05 16:47:16
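One possible reason the updated loop still finds nothing is that each extracted string holds the page's whole keyword list at once, not one keyword per list item, so it never equals a single entry in "keywords". The sketch below assumes the selector is changed to '//meta[@name="keywords"]/@content' so that .extract() yields the attribute text, and that this text is comma-separated; both are assumptions, since the thread does not show the actual page markup. The helper name split_meta_keywords and the sample value are hypothetical.

```python
def split_meta_keywords(extracted):
    """Flatten comma-separated keyword strings into clean lowercase terms."""
    terms = []
    for content in extracted:
        for part in content.split(','):
            term = part.strip().lower()
            if term:                  # skip empty strings such as ''
                terms.append(term)
    return terms


# Hypothetical value shaped like a list of attribute strings.
extracted = ['Social Media, SEO , online marketing', '']
page_terms = split_meta_keywords(extracted)

keywords = ['social media', 'seo', 'sem']
has_match = any(term in keywords for term in page_terms)

print(page_terms)
print(has_match)
```

Lowercasing both sides is a design choice that makes the comparison case-insensitive; it works here because the keyword list in the question is already lowercase.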
