2014-10-02 40 views

Answers

2

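After testing Mikko Ohtamaa's answer, here are some notes. It works for many tags and uses lxml, but it does not cover every case, for example background-image:url(xxx). So I simply used regular expressions instead. Here is the solution: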

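import re
# `content` is the HTML text, `url` is the address of the page it came from.
# First pass: quoted paths relative to the page's directory, e.g. "img/a.png".
content = re.sub(r'(?P<left>("|\'))\s*(?P<url>(\w|\.)+(/.+?)+)\s*(?P<right>("|\'))',
        r'\g<left>' + url[:url.rfind('/')] + r'/\g<url>\g<right>', content)
# Second pass: root-relative paths, e.g. "/img/a.png"; find('/', 8) skips the "http://" or "https://" prefix.
content = re.sub(r'(?P<left>("|\'))\s*(?P<url>(/.+?)+)\s*(?P<right>("|\'))',
        r'\g<left>' + url[:url.find('/', 8)] + r'\g<url>\g<right>', content)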
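The two substitutions above match any quoted string that looks like a path, so they can also rewrite values such as type="text/javascript". A narrower variant of the same regex idea might look like the sketch below. It is not the answer's code: the patterns, the name `absolutize`, and the use of `urljoin` (Python 3, `urllib.parse`) are assumptions made for illustration. It targets only `src`/`href` attribute values and CSS `url(...)` references.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: src="..." / href='...' attribute values.
ATTR_RE = re.compile(
    r"""(?P<prefix>\b(?:src|href)\s*=\s*)(?P<quote>["'])(?P<url>.*?)(?P=quote)""",
    re.IGNORECASE,
)
# Hypothetical pattern: CSS url(...) references, with or without quotes.
CSS_URL_RE = re.compile(
    r"""(?P<prefix>url\(\s*)(?P<quote>["']?)(?P<url>[^"')]+?)(?P=quote)(?P<suffix>\s*\))""",
    re.IGNORECASE,
)


def absolutize(content, base_url):
    """Resolve src/href values and CSS url(...) references against base_url."""

    def repl(match):
        groups = match.groupdict()
        # urljoin leaves URLs that already have a scheme unchanged.
        absolute = urljoin(base_url, groups["url"].strip())
        return (groups["prefix"] + groups["quote"] + absolute
                + groups["quote"] + (groups.get("suffix") or ""))

    content = ATTR_RE.sub(repl, content)
    return CSS_URL_RE.sub(repl, content)


if __name__ == "__main__":
    snippet = '<div style="background-image:url(img/bg.png)"><img src="a/b.png"></div>'
    print(absolutize(snippet, "http://example.com/dir/page.html"))
```

Regular expressions remain a heuristic here: they do not understand HTML structure, so commented-out markup or script contents that happen to match will be rewritten too.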
6

Here is example code that also covers <a href>:

from urllib.parse import urljoin

from lxml import etree, html


def fix_links(content, absolute_prefix):
    """
    Rewrite relative links to be absolute links based on a given URL.

    @param content: HTML snippet as a string
    @param absolute_prefix: base URL that relative links are resolved against
    """

    # Accept UTF-8 encoded bytes as well as text
    if isinstance(content, bytes):
        content = content.decode("utf-8")

    content = content.strip()

    # create_parent=True wraps the snippet in a <div>, so several top-level nodes are allowed
    tree = html.fragment_fromstring(content, create_parent=True)

    def join(base, url):
        """
        Join relative URL
        """
        if not (url.startswith("/") or "://" in url):
            return urljoin(base, url)
        else:
            # Root-relative ("/...") or already absolute: leave unchanged
            return url

    for node in tree.xpath('//*[@src]'):
        url = node.get('src')
        url = join(absolute_prefix, url)
        node.set('src', url)
    for node in tree.xpath('//*[@href]'):
        href = node.get('href')
        url = join(absolute_prefix, href)
        node.set('href', url)

    # Serialized as UTF-8 bytes, including the wrapper <div> added above
    data = etree.tostring(tree, pretty_print=False, encoding="utf-8")

    return data
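
The `join` helper above delegates to the standard library's `urljoin` for relative links and returns everything else untouched. As a rough illustration of what `urljoin` itself does with each kind of reference, here is a small sketch; it assumes Python 3's `urllib.parse`, and the base URL and the `resolve_all` name are made up for the example.

```python
from urllib.parse import urljoin


def resolve_all(base, references):
    """Return a dict mapping each reference to its URL resolved against base."""
    return {ref: urljoin(base, ref) for ref in references}


if __name__ == "__main__":
    base = "http://example.com/docs/page.html"  # hypothetical base URL
    refs = ["img/logo.png", "../up.html", "/root.css", "http://other.example/x"]
    for ref, resolved in resolve_all(base, refs).items():
        print(ref, "->", resolved)
```

Note that `urljoin` would also resolve a root-relative path such as `/root.css` against the host, whereas `join` in `fix_links` deliberately skips anything starting with `/` and leaves it as written.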

The full story is available in Plone developer documentation