Regex to find a string in a list in Python 3

How can I get base.php?id=5314 from the list?

import urllib.parse 
import urllib.request 
from bs4 import BeautifulSoup 
url = 'http://www.fansubs.ru/search.php' 
values = {'Content-Type:' : 'application/x-www-form-urlencoded', 
     'query' : 'Boku dake ga Inai Machi' } 
d = {} 
data = urllib.parse.urlencode(values) 
data = data.encode('ascii') 
req = urllib.request.Request(url, data) 
with urllib.request.urlopen(req) as response: 
    the_page = response.read() 
soup = BeautifulSoup(the_page, 'html.parser') 
for link in soup.findAll('a'): 
    d[link] = (link.get('href')) 
x = (list(d.values()))

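Source

2016-03-06 Alex

What is your question? – Arman

As I understand it, he is going through all the 'a' tags on the page and wants to filter for specific 'href' values... (stored as a list in 'x') – urban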

You can combine the built-in function filter with a regex. For example:

import re 

# ... your code here ... 

x = (list(d.values())) 
test = re.compile(r"base\.php\?id=", re.IGNORECASE) 
results = filter(test.search, x)

Update based on the comments: you can convert the filter result into a list:

print(list(results))

Example result with the following hard-coded list:

x = ["asd/asd/asd.py", "asd/asd/base.php?id=5314", 
    "something/else/here/base.php?id=666"]

You get:

['asd/asd/base.php?id=5314', 'something/else/here/base.php?id=666']

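This answer is based on this page, which covers filtering lists. It shows several other ways to do the same thing, which may suit you better. Hope it helps.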
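One of those other ways is a list comprehension. The following is only a minimal sketch, not part of the original answer: the names pattern and matches are illustrative, and the None entry is an assumption standing in for an 'a' tag without an href, for which link.get('href') returns None.

```python
import re

# Sample list from the example above, plus a None entry (assumed here to
# represent an <a> tag that has no href attribute).
x = ["asd/asd/asd.py", "asd/asd/base.php?id=5314",
     "something/else/here/base.php?id=666", None]

# Raw string so the backslashes reach the regex engine unchanged.
pattern = re.compile(r"base\.php\?id=", re.IGNORECASE)

# Skip None entries first; pattern.search(None) would raise a TypeError.
matches = [href for href in x if href and pattern.search(href)]
print(matches)
```

The `href and` guard matters only if the scraped list can contain None. With a list of plain strings, the filter call shown earlier behaves the same way.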

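Source

2016-03-06 13:08:23 urban

Thanks a lot! It works! – Alex

If he is only looking for an exact match, using a regex is overkill. Just use: filter(lambda y: 'base.php?id=' in y.lower(), x). Also, when you use a regex to perform an exact match, you should use re.escape to escape the content instead of doing it yourself, so re.compile(re.escape('base.php?id='), re.IGNORECASE) and so on. This matters even more for user-provided input. – Bakuriu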
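The two suggestions in the comment above can be compared side by side. This is a rough sketch under assumptions: the sample list and the names needle, plain, with_regex and wrong are made up for illustration and do not come from the thread.

```python
import re

needle = "base.php?id="  # the literal text being looked for
x = ["asd/asd/asd.py", "asd/asd/BASE.php?id=5314", "basexphpid=1"]

# Plain substring test; lower() makes it case-insensitive.
plain = list(filter(lambda y: needle in y.lower(), x))

# re.escape() turns '.' and '?' into literals, so the regex looks for
# exactly the same text as the substring test.
escaped = re.compile(re.escape(needle), re.IGNORECASE)
with_regex = list(filter(escaped.search, x))

# Without escaping, '.' matches any character and '?' makes the preceding
# 'p' optional, so the pattern no longer means the literal text.
unescaped = re.compile(needle, re.IGNORECASE)
wrong = list(filter(unescaped.search, x))

print(plain)
print(with_regex)
print(wrong)
```

In this sketch the substring test and the escaped regex select the same entry, while the unescaped pattern misses the real link and picks up the look-alike string instead.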

You can pass a regex directly to find_all, which will do the filtering on href for you, using href=re.compile(...):

import re 

with urllib.request.urlopen(req) as response: 
    the_page = response.read() 
    soup = BeautifulSoup(the_page, 'html.parser') 
    d = {link: link["href"] for link in soup.find_all('a', href=re.compile(re.escape('base.php?id=')))}

find_all will only return 'a' tags whose href attribute matches the regex.

That gives you:

In [21]: d = {link: link["href"] for link in soup.findAll('a', href=re.compile(re.escape('base.php?id=')))} 

In [22]: d 
Out[22]: {<a href="base.php?id=5314">Boku dake ga Inai Machi <small>(ТВ)</small></a>: 'base.php?id=5314'}

Given that you seem to be looking for just one link, it would make more sense to simply use find:

In [36]: link = soup.find('a', href=re.compile(re.escape('base.php?id='))) 

In [37]: link 
Out[37]: <a href="base.php?id=5314">Boku dake ga Inai Machi <small>(ТВ)</small></a> 

In [38]: link["href"] 
Out[38]: 'base.php?id=5314'

Source

2016-03-06 17:26:15
