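Checking an array against another array in Python and Scrapy
In Python, I want to set a variable to a string element from one array, chosen according to which string element of another array is in use. I'm struggling to work out how to do it.
Here are the two arrays: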
genre = ["Dance",
         "Festivals",
         "Rock/pop"
         ]
I'm trying to print the genre based on the three elements of the other array, i.e. when start_urls is at index 0, genre should also be at index 0:
start_urls = [
    "http://www.allgigs.co.uk/whats_on/London/clubbing-1.html",
    "http://www.allgigs.co.uk/whats_on/London/festivals-1.html",
    "http://www.allgigs.co.uk/whats_on/London/tours-1.html"
]
Full code:
genre = ["Dance",
         "Festivals",
         "Rock/pop"
         ]

class AllGigsSpider(CrawlSpider):
    name = "allGigs"  # Name of the Spider. In command prompt, when in the correct folder, enter "scrapy crawl allGigs".
    allowed_domains = ["www.allgigs.co.uk"]  # Allowed domains is a String NOT a URL.
    start_urls = [
        "http://www.allgigs.co.uk/whats_on/London/clubbing-1.html",
        "http://www.allgigs.co.uk/whats_on/London/festivals-1.html",
        "http://www.allgigs.co.uk/whats_on/London/tours-1.html"
    ]

    rules = [
        Rule(SgmlLinkExtractor(restrict_xpaths='//div[@class="more"]'),  # Search the start URLs for
             callback="parse_item",
             follow=True),
    ]

    def parse_start_url(self, response):
        return self.parse_item(response)

    def parse_item(self, response):  # http://stackoverflow.com/questions/15836062/scrapy-crawlspider-doesnt-crawl-the-first-landing-page
        for info in response.xpath('//div[@class="entry vevent"]'):
            item = TutorialItem()  # Extract items from the items folder.
            item['artist'] = info.xpath('.//span[@class="summary"]//text()').extract()  # Extract artist information.
            item['date'] = info.xpath('.//span[@class="dates"]//text()').extract()  # Extract date information.
            preview = ''.join(str(s) for s in item['artist'])
            # item['genre'] = i.xpath('.//li[@class="style"]//text()').extract()
            client = soundcloud.Client(client_id='401c04a7271e93baee8633483510e263', client_secret='b6a4c7ba613b157fe10e20735f5b58cc', callback='http://localhost:9000/#/callback.html')
            tracks = client.get('/tracks', q=preview, limit=1)
            for track in tracks:
                print track.id
                for i, val in enumerate(genre):
                    print '{} {}'.format(genre[i], start_urls[i])
                print genre
                # for i, val in enumerate(genre):
                #     print '{} {}'.format(genre[i], start_urls[i])
                item['trackz'] = track.id
                yield item
Any help is appreciated.
If you want to map the two arrays to each other, you could use a dict? – Zero
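To illustrate the dict suggestion, here is a minimal sketch that pairs the two lists from the question by position. The name `genre_by_url` is made up for this example, and it assumes both lists stay the same length and in the same order.

```python
# The two lists from the question, paired by position.
genre = ["Dance", "Festivals", "Rock/pop"]
start_urls = [
    "http://www.allgigs.co.uk/whats_on/London/clubbing-1.html",
    "http://www.allgigs.co.uk/whats_on/London/festivals-1.html",
    "http://www.allgigs.co.uk/whats_on/London/tours-1.html",
]

# zip() pairs the items that share an index; dict() turns the pairs
# into a lookup table keyed by URL. (genre_by_url is a hypothetical name.)
genre_by_url = dict(zip(start_urls, genre))

# Look up the genre that belongs to the first start URL.
print(genre_by_url[start_urls[0]])
```

A lookup like this only works for URLs that appear in `start_urls` exactly; any other URL raises a `KeyError` unless you use `genre_by_url.get(url)` instead.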
Please post your expected output. – itzMEonTV
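My expected output is simply for item['genre'] to be set to whatever corresponds to the link being scraped. So the first URL would send only the string "Dance" to my database –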
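One possible way to get that result is to derive the genre from the URL of the page being parsed. The sketch below is not from the thread: `GENRE_BY_SLUG` and `genre_for_url` are hypothetical names, and it assumes every listing URL ends in `<category>-<page>.html`, as the three start URLs do. Whether the pages reached through the "more" links follow the same pattern is also an assumption.

```python
import re

# Hypothetical mapping from the category slug in the URL to the genre label.
GENRE_BY_SLUG = {
    "clubbing": "Dance",
    "festivals": "Festivals",
    "tours": "Rock/pop",
}


def genre_for_url(url):
    """Return the genre for a listing URL, or None if the slug is unknown."""
    # Match the last path segment, e.g. "/clubbing-1.html", and capture
    # the part before "-<page number>.html".
    match = re.search(r"/([a-z_]+)-\d+\.html$", url)
    if match is None:
        return None
    return GENRE_BY_SLUG.get(match.group(1))


print(genre_for_url("http://www.allgigs.co.uk/whats_on/London/clubbing-1.html"))
```

Inside `parse_item` this could be used as `item['genre'] = genre_for_url(response.url)`, since Scrapy responses expose the page address as `response.url`. Matching on the slug rather than the full start URL means a later page in the same category would still get the same genre, provided the URL assumption above holds.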