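I am scraping the names of massage therapists, along with their addresses, from a directory. The addresses are all saved to a single column of the CSV, but each therapist's title/name is saved across 2 or 3 columns, one word per column. How can I save the string in a single column instead of one word per column in Python?
What do I need to do so that the extracted name is saved in a single column, the way the address string is? (The first two lines of code below are sample HTML from the page; the code after that is the script that extracts this element.)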
<span class="name">
<img src="/images/famt-placeholder-sm.jpg" class="thumb" alt="Tiffani D Abraham"> Tiffani D Abraham</span>
import mechanize
from lxml import html
import csv
import io
from time import sleep

def save_products (products, writer):
    # Writes three single-field rows per entry: name, contact, services.
    for product in products:
        for price in product['prices']:
            writer.writerow([ product["title"].encode('utf-8') ])
            writer.writerow([ price["contact"].encode('utf-8') ])
            writer.writerow([ price["services"].encode('utf-8') ])

# Python 2: the csv module expects the file to be opened in binary mode.
f_out = open('mtResult.csv', 'wb')
writer = csv.writer(f_out)
links = ["https://www.amtamassage.org/findamassage/results.html?match=exact&l=NY","https://www.amtamassage.org/findamassage/results.html?match=exact&l=NY&PageIndex=2&PageSize=10","https://www.amtamassage.org/findamassage/results.html?match=exact&l=NY&PageIndex=3&PageSize=10","https://www.amtamassage.org/findamassage/results.html?match=exact&l=NY&PageIndex=4&PageSize=10","https://www.amtamassage.org/findamassage/results.html?match=exact&l=NY&PageIndex=5&PageSize=10","https://www.amtamassage.org/findamassage/results.html?match=exact&l=NY&PageIndex=6&PageSize=10","https://www.amtamassage.org/findamassage/results.html?match=exact&l=NY&PageIndex=7&PageSize=10", "https://www.amtamassage.org/findamassage/results.html?match=exact&l=NY&PageIndex=8&PageSize=10", "https://www.amtamassage.org/findamassage/results.html?match=exact&l=NY&PageIndex=9&PageSize=10", "https://www.amtamassage.org/findamassage/results.html?match=exact&l=NY&PageIndex=10&PageSize=10" ]
br = mechanize.Browser()
for link in links:
    print(link)
    r = br.open(link)
    content = r.read()
    products = []
    tree = html.fromstring(content)
    # One <li> per therapist in the results list.
    product_nodes = tree.xpath('//ul[@class="famt-results"]/li')
    for product_node in product_nodes:
        product = {}
        price_nodes = product_node.xpath('.//a')
        product['prices'] = []
        for price_node in price_nodes:
            price = {}
            # xpath() returns a list; [0] raises IndexError when nothing matched.
            try:
                product['title'] = product_node.xpath('.//span[1]/text()')[0]
            except IndexError:
                product['title'] = ""
            try:
                price['services'] = price_node.xpath('./span[2]/text()')[0]
            except IndexError:
                price['services'] = ""
            try:
                price['contact'] = price_node.xpath('./span[3]/text()')[0]
            except IndexError:
                price['contact'] = ""
            product['prices'].append(price)
        products.append(product)
    save_products(products, writer)
f_out.close()
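As an aside, the ten hard-coded URLs in `links` follow a regular pattern, so they could be generated instead of typed out. The sketch below is only an illustration: `BASE_URL` and `build_links` are hypothetical names that are not part of the script, and it assumes the site keeps using the `PageIndex` and `PageSize` query parameters exactly as they appear in the list above.

```python
# Hypothetical helper: rebuilds the same URLs as the hard-coded `links` list.
BASE_URL = "https://www.amtamassage.org/findamassage/results.html?match=exact&l=NY"

def build_links(last_page, page_size=10):
    """Return the result-page URLs for pages 1..last_page."""
    links = [BASE_URL]  # page 1 carries no PageIndex/PageSize in the original list
    for page in range(2, last_page + 1):
        links.append("{0}&PageIndex={1}&PageSize={2}".format(BASE_URL, page, page_size))
    return links

links = build_links(10)
for link in links:
    print(link)
```

This keeps the page count in one place, which may be easier to adjust than a long literal list.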
Please add a sample of the data to your question; it will make it easier to understand what you mean. – LetzerWille
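@LetzerWille This is the page I am extracting from: 'https://www.amtamassage.org/findamassage/results.html?match=exact&l=NY'. What happens is that each therapist takes up 3 rows in the CSV, in the order name, address, specialty, going down. The address and specialty are saved only in column A, but the name is spread across columns B, C and D, with one word in each column. I have posted the whole script. – McLeodx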
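One thing that may be worth checking, going by the HTML sample in the question: the name sits after an `<img>` inside the span, so the extracted text can carry a line break or a leading space. The sketch below normalises such fragments with plain string methods. `clean_name` is a hypothetical helper, and `sample_nodes` is only an assumed stand-in for what `xpath('.//span[1]/text()')` might return for that sample, not captured output.

```python
def clean_name(text_nodes):
    """Join text fragments and collapse every run of whitespace to one space."""
    return " ".join(" ".join(text_nodes).split())

# Assumed stand-in for the text nodes of the sample <span class="name">:
# a line break before the <img>, then the name with a leading space.
sample_nodes = ["\n", " Tiffani D Abraham"]

cleaned = clean_name(sample_nodes)
print(repr(cleaned))
```

Passing the whole list rather than only element `[0]` means the name is still picked up if the first text node turns out to be just whitespace.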
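I have realized that the problem is that the data in product["title"] is a string rather than a list (unlike the data for 'services' and 'contact', which are both lists). I know I need to change whatever is causing it to expect a list rather than a string, but I am not sure which part of the code needs adjusting. – McLeodx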
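For reference on the string-versus-list point, here is a small sketch of how `csv.writer.writerow` treats each kind of argument. It is not a diagnosis of the script above. It assumes Python 3 and writes to an in-memory buffer, and `rows_as_text` is a hypothetical helper used only for this demonstration.

```python
import csv
import io

def rows_as_text(rows, **writer_options):
    """Write rows with csv.writer into memory and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n", **writer_options)
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()

name = "Tiffani D Abraham"

# A list holding one string is written as a single field.
one_cell = rows_as_text([[name]])

# A bare string is iterated character by character: one field per character.
per_char = rows_as_text([name])

# Splitting on whitespace gives one field per word.
per_word = rows_as_text([name.split()])

# QUOTE_ALL wraps every field in quotes, which some spreadsheet import
# settings rely on to keep a field containing spaces together.
quoted = rows_as_text([[name]], quoting=csv.QUOTE_ALL)

print(one_cell, per_word, quoted, sep="")
```

If the file itself already holds the name as one field, the split may be happening when the CSV is opened, for example if the spreadsheet program treats spaces as separators; that is a guess and would need checking against the actual file.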