2011-12-07
"outline-style: none; margin: 0px; padding: 2px; background-color: #eff0f8; color: #3b3a39; font-family: Georgia,'Times New Roman',Times,serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 18px; orphans: 2; text-align: center; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border: 1px solid #ebebeb; float: left;" 



import re
from BeautifulSoup import BeautifulSoup  # assumes BeautifulSoup 3 on Python 2

def html_remove_attrs(value):
    soup = BeautifulSoup(value)
    # visit every tag that has a style attribute
    for tag in soup.findAll(True, {'style': re.compile(r'')}):
        #tag.attrs = None
        #for attr in tag.attrs:
        #    if "class" in attr:
        #        tag.attrs.remove(attr)
        #    if "style" in attr:
        #        tag.attrs.remove(attr)
        for attr in tag.attrs:
            if "style" in attr:
                # remove the background and font properties
                pass  # not implemented yet

    return soup

Are you doing this before it goes live, or once it hits the client (JavaScript)? – Jakub


I have to parse it on the server side. –


You should reconsider using inline CSS in favor of reusable classes. – Jakub








import java.util.regex.Matcher;
import java.util.regex.Pattern;

// the class name is arbitrary; it only exists so the example compiles
public class StyleFilter {

    public static void main(String[] args) throws Exception {

        String[] lines = {
            "outline-style: none; margin: 0px; padding: 2px; background-color: #eff0f8; color: #3b3a39; font-family: Georgia,'Times New Roman',Times,serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 18px; orphans: 2; text-align: center; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border: 1px solid #ebebeb; float: left;",
            "outline-style: none; margin: 0px; padding: 2px; background-color: #eff0f8; color: #3b3a39; font-family: Georgia,'Times New Roman',Times,serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 18px; orphans: 2; text-align: center; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border: 1px solid #ebebeb; float: left",
            "background-color: #eff0f8;",
            "background-color: #eff0f8",
        };

        String regex = "((?:background|font)(?:[^:]+):(?:\\s*))([^;]+)";

        Pattern p = Pattern.compile(regex);

        for (String s : lines) {
            StringBuffer sb = new StringBuffer();
            Matcher m = p.matcher(s);
            while (m.find()) {

                // capturing group(2) for debug purpose only
                // just to get its length so we can fill that with '-'
                // to assist comparison of before and after
                String text = m.group(2);
                text = text.replaceAll(".", "-");
                m.appendReplacement(sb, "$1" + text);

                // for non-debug mode, just use this instead
                // m.appendReplacement(sb, "$1");
            }
            // copy whatever follows the last match into the result
            m.appendTail(sb);

            System.err.println("> " + s); // before
            System.err.println("< " + sb.toString()); // after
        }
    }
}

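That is a really nice expression. Thanks for your help. But when I split with this regex and join all the split pieces back together, I get this: http://pastebin.com/n43wUw8x. The values of "background*" and "font*" are not removed :( –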


I modified the expression and updated the answer, including an example. – sudocode
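Since the question is about server-side Python, here is a minimal sketch of the same idea in that language. It is not the regex from the answer above: it splits the style string on ";" and drops every declaration whose property name starts with "background" or "font". The function name `strip_background_and_font` is made up for illustration, and the sketch assumes that no value contains a ";" of its own (for example inside a `url(...)`).

```python
def strip_background_and_font(style):
    """Return `style` without its background* and font* declarations.

    Assumption: declarations are separated by ';' and no value
    contains a ';' itself.
    """
    kept = []
    for declaration in style.split(';'):
        name, sep, _value = declaration.partition(':')
        if not sep:
            # empty piece, e.g. the one after a trailing ';'
            continue
        if name.strip().lower().startswith(('background', 'font')):
            # drop background-color, font-family, font-size, ...
            continue
        kept.append(declaration.strip())
    return '; '.join(kept)


sample = "margin: 0px; background-color: #eff0f8; color: #3b3a39; font-size: 14px; float: left;"
print(strip_background_and_font(sample))
# prints: margin: 0px; color: #3b3a39; float: left
```

Inside `html_remove_attrs`, the returned string could replace the value of the tag's `style` attribute. How that assignment is written depends on the BeautifulSoup version in use, so it is left out here.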