How to strip part of a string in Python?

I am using BeautifulSoup to append all the links to the list get_link. How can I strip part of each string in Python?

get_link = []
# soup is a BeautifulSoup object built from the page's HTML
for a in soup.find_all('a', href=True):
    if a.get_text(strip=True):  # skip links whose text is empty
        get_link.append(a['href'])

Output of get_link:

['index.html?country=2', 
'index.html?country=25', 
'index.html?country=1', 
'index.html?country=6', 
'index.html?country=2']

How can I get the following output?

['country=2',
'country=25',
'country=1',
'country=6',
'country=2']
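
One way to go from the get_link list above to the desired output is to keep everything after the first "?". A minimal sketch, assuming every href has a query string; the helper name query_part is hypothetical:

```python
def query_part(href):
    """Return the text after the first '?' in href ('' if there is none)."""
    # str.partition splits at the first '?' into (before, separator, after)
    _, _, after = href.partition('?')
    return after


get_link = ['index.html?country=2',
            'index.html?country=25',
            'index.html?country=1',
            'index.html?country=6',
            'index.html?country=2']

result = [query_part(href) for href in get_link]
print(result)  # ['country=2', 'country=25', 'country=1', 'country=6', 'country=2']
```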

2017-10-05 Raju Singh

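I don't understand what you're asking. Your title has little or nothing to do with the code you've shown. Are you just trying to figure out how to get the 'country=...' part of each of your 'index.html?country=...' strings? That seems easy with str.index and a slice, but I'm reluctant to write an answer saying so when I'm not sure that's actually what you're asking. – Blckknght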
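
The str.index-plus-slice idea could look like the sketch below. It assumes every href contains the literal text "country=" (str.index raises ValueError otherwise), and from_country is a hypothetical helper name:

```python
def from_country(href):
    """Return the slice of href that starts at 'country='."""
    # str.index gives the position of the first match (ValueError if absent)
    start = href.index('country=')
    return href[start:]


print(from_country('index.html?country=25'))  # country=25
```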

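@Blckknght My English is not good, which is why I can't explain it better. Is there any way to use right and left functions on the list, so that I keep only the necessary text in get_link? –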

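Sorry, I still don't know what you mean by "right, left function". If all your links have the same form (they always start with 'index.html?', and that is what you want to cut off), you can use get_link.append(a['href'][11:]). The [11:] is a slice that cuts off the first 11 characters. If your links may look different, you will probably need more complex logic. – Blckknght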
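
A slightly more defensive take on the [11:] slice: compute the length from the prefix itself and only cut when the prefix is actually there. For links that may look different, the standard library's urllib.parse.urlsplit can pull out the query part instead. The helper names and the sample hrefs are illustrative assumptions:

```python
from urllib.parse import urlsplit

PREFIX = 'index.html?'  # assumed common prefix; len(PREFIX) == 11


def strip_prefix(href, prefix=PREFIX):
    """Drop prefix from the start of href; leave other hrefs unchanged."""
    # Same effect as href[11:] for matching links, but links that do not
    # start with the prefix are not truncated.
    # On Python 3.9+ this is equivalent to href.removeprefix(prefix).
    if href.startswith(prefix):
        return href[len(prefix):]
    return href


def query_string(href):
    """Return the query component of href, whatever path comes before it."""
    # urlsplit separates path, query and fragment
    return urlsplit(href).query


print(strip_prefix('index.html?country=2'))    # country=2
print(strip_prefix('about.html'))              # about.html
print(query_string('list.php?country=6#top'))  # country=6
```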

A compact way to get all a tags (links) that have both a non-empty text value and an href attribute:

links = [l.get('href').replace('index.html?', '')
         for l in soup.find_all('a', href=True, string=True) if l.text.strip()]
print(links)
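
The string part of this answer can be checked without BeautifulSoup by running the same .replace over the hrefs from the question; the hard-coded list below stands in for what soup.find_all would return (an assumption for illustration):

```python
hrefs = ['index.html?country=2',
         'index.html?country=25',
         'index.html?country=1',
         'index.html?country=6',
         'index.html?country=2']

# The same expression the answer applies to each href
links = [h.replace('index.html?', '') for h in hrefs]
print(links)  # ['country=2', 'country=25', 'country=1', 'country=6', 'country=2']

# str.replace removes the substring wherever it occurs, not only at the start
print('a/index.html?x=1'.replace('index.html?', ''))  # a/x=1
```

For these links that makes no difference; if 'index.html?' could appear in the middle of an href, a prefix-only strip would be the safer choice.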

2017-10-05 09:02:18 RomanPerekhrest

Yes, this is another way to remove "index.html?". Thanks! –

@RajuSingh, you're welcome – RomanPerekhrest

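There are many ways to get only the "country=" part, and some have already been shown with BS4, but if you prefer, you can use a regular expression: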

import re

ui = ['index.html?country=2',
      'index.html?country=25',
      'index.html?country=1',
      'index.html?country=6',
      'index.html?country=2']

# "country=" followed by one or more digits
pattern = r'country=\d+'

print("\n".join([re.search(pattern, i).group() for i in ui]))

Result:

country=2 
country=25 
country=1 
country=6 
country=2
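
One caveat with the regex version: re.search returns None when a string has no match, so calling .group() would raise AttributeError on a link such as 'about.html'. A sketch that skips such links; the 'about.html' entry is a made-up example:

```python
import re

# "country=" followed by one or more digits
COUNTRY = re.compile(r'country=\d+')

hrefs = ['index.html?country=2',
         'index.html?country=25',
         'about.html',            # hypothetical link without a country
         'index.html?country=6']

matches = []
for href in hrefs:
    m = COUNTRY.search(href)
    if m:  # re.search returns None when nothing matches
        matches.append(m.group())

print(matches)  # ['country=2', 'country=25', 'country=6']
```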

2017-10-05 09:33:18
