2014-03-19 76 views

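Remove style from specific tags with BeautifulSoup/Python

Suppose I have a soup and I want to remove the style attribute from every paragraph. That is, throughout the whole soup I want to turn <p style='blah' id='bla' class=...> into <p id='bla' class=...>. But I don't want to touch, say, <img style='...'> tags. How would I do this?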


For those who need to remove a class from specific tags (python3): for x in soup.find_all("p", class_="MsoNormal"): del x['class'] – JinSnow
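Spelled out as a runnable snippet, the comment's one-liner might look like the sketch below. The sample markup is made up for illustration and is not from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only to demonstrate the loop
html = '<p class="MsoNormal" id="a">one</p><p class="other">two</p>'
soup = BeautifulSoup(html, "html.parser")

# Drop the class attribute only from <p class="MsoNormal"> tags
for x in soup.find_all("p", class_="MsoNormal"):
    del x["class"]

result = str(soup)
print(result)
```

The second paragraph keeps its class because the class_ filter only matches tags carrying MsoNormal.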

Answer


The idea is to iterate over all p tags with find_all('p') and delete the style attribute from each one:

from bs4 import BeautifulSoup 


data = """ 
<body> 
    <p style='blah' id='bla1'>paragraph1</p> 
    <p style='blah' id='bla2'>paragraph2</p> 
    <p style='blah' id='bla3'>paragraph3</p> 
    <img style="awesome_image"/> 
</body>""" 


soup = BeautifulSoup(data, 'html.parser') 
for p in soup.find_all('p'): 
    # Only delete the attribute when it is present, to avoid a KeyError
    if 'style' in p.attrs:
        del p.attrs['style']

print(soup.prettify())

Prints:

<body>
 <p id="bla1">
  paragraph1
 </p>
 <p id="bla2">
  paragraph2
 </p>
 <p id="bla3">
  paragraph3
 </p>
 <img style="awesome_image"/>
</body>
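If the same cleanup is needed for other tags or attributes, the loop above could be wrapped in a small helper. The sketch below is one possible shape, not part of the original answer: the name strip_attribute and its parameters are hypothetical, and it uses dict.pop with a default instead of the explicit membership check.

```python
from bs4 import BeautifulSoup


def strip_attribute(soup, tag_name, attribute):
    """Remove `attribute` from every `tag_name` tag in `soup`, in place.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    for tag in soup.find_all(tag_name):
        # pop with a default is a no-op when the attribute is absent
        tag.attrs.pop(attribute, None)
    return soup


html = "<p style='blah' id='bla1'>paragraph1</p><img style='awesome_image'/>"
soup = strip_attribute(BeautifulSoup(html, "html.parser"), "p", "style")
result = str(soup)
print(result)
```

Because find_all is given only the tag name 'p', the img tag is never visited and keeps its style attribute.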