Python: adding a string to each match in a list with multiple items

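My code retrieves a list from an HTML page, where each entry has two fields: a URL and a title.

The URLs all start with /URL...., and I need to prepend "http://website.com" to each one returned by re.findall.

The code so far looks like this: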

soup = bs(html)
tag = soup.find('div', {'class': 'item'})
reg = re.compile('<a href="(.+?)" rel=".+?" title="(.+?)"')
links = re.findall(reg, str(tag))
# (prepend "http://website.com" to the href "(.+?)" field here)
return links

Source

2015-12-25 Aenema

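http://stackoverflow.com/a/1732454/1459669 Please use Beautiful Soup to find the links! –

@CrazyPython Unless you want to summon Cthulhu. – timgeb

@timgeb You never know, he might want to summon him. Then we would need to migrate this to StackExchange Skeptics or Worldbuilding ... –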

Try:

for link in tag.find_all('a'): 
    link['href'] = 'http://website.com' + link['href']

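Then use one of these return values:

return str(soup) gives you the whole document with the changes applied.

return tag.find_all('a') gives you all the link elements.

return [str(i) for i in tag.find_all('a')] gives you all the link elements converted to strings.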
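Putting the loop and the return values together, here is a minimal end-to-end sketch. It assumes Beautiful Soup 4 (the bs4 package) with the built-in html.parser backend; the sample markup and the function name absolute_links are made up for illustration and are not from the question. It returns (url, title) pairs, which is the shape the original regex produced.

```python
from bs4 import BeautifulSoup

BASE_URL = "http://website.com"  # prefix taken from the question

# Hypothetical sample markup, invented for illustration.
SAMPLE_HTML = """
<div class="item">
  <a href="/first" rel="nofollow" title="First">one</a>
  <a href="/second" rel="nofollow" title="Second">two</a>
</div>
"""


def absolute_links(html):
    """Return (url, title) pairs for every link inside the div.item block."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("div", {"class": "item"})
    if tag is None:  # no matching block on the page
        return []
    pairs = []
    for link in tag.find_all("a", href=True):
        # Rewrite the attribute in place, as in the loop above.
        link["href"] = BASE_URL + link["href"]
        pairs.append((link["href"], link.get("title")))
    return pairs


links = absolute_links(SAMPLE_HTML)
print(links)
```

Plain concatenation only works when every href is a site-relative path starting with a slash, as the question states. If some links might already be absolute, urllib.parse.urljoin from the standard library is probably the safer choice.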

And don't try to parse HTML with regular expressions when you already have an XML parser up and running.
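For completeness, the literal question can also be answered on the regex result itself, if that list has to be kept for some reason. With two capture groups, re.findall returns a list of (href, title) tuples, and tuples cannot be modified in place, so the usual approach is to build a new list. The sketch below uses only the standard library; the sample markup is invented for illustration.

```python
import re

BASE_URL = "http://website.com"  # prefix taken from the question

# Hypothetical sample markup, invented for illustration.
SAMPLE_HTML = (
    '<a href="/first" rel="nofollow" title="First">one</a>\n'
    '<a href="/second" rel="nofollow" title="Second">two</a>'
)

reg = re.compile('<a href="(.+?)" rel=".+?" title="(.+?)"')

# With two capture groups, findall returns a list of (href, title) tuples.
matches = reg.findall(SAMPLE_HTML)

# Tuples are immutable, so build a new list rather than editing in place.
full_links = [(BASE_URL + href, title) for href, title in matches]
print(full_links)
```

This still depends on the attributes appearing in exactly the order href, rel, title, which is one reason the parser-based loop is the more robust option.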

Source

2015-12-26 00:11:00

Oops, my bad. I had the order of the URL concatenation reversed. –
