2012-10-16 47 views
1
string content = @"
     <br /><br /><a href=""need to replace this url"">Cooking School</a><br /><br /><a href=""http://www.sdlm.com"">Feed your senses</a><br /><br /><a href=""http://www.sdl.com"">Take your cooking skills to the next level. Find a cooking school near you!</a><br /><br /><a href=""http:google.com""><img src=""http://www.sdlm1.com/autd3umrl_u_t.jpg"" /></a>
    ";

I need to replace the href URL of every anchor tag in this string with a different value. I am using the function below, but it throws an error.

public List<string> GetLinksFromHtml(string content)
{
    string regex = @"<(?<Tag_Name>(a)|img)\b[^>]*?\b(?<URL_Type>(?(1)href|src))\s*=\s*(?:""(?<URL>(?:\\""|[^""])*)""|'(?<URL>(?:\\'|[^'])*)'))";
    var matches = Regex.Matches(content, regex, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    var links = new List<string>();

    foreach (Match item in matches)
    {
        string link = item.Groups[1].Value;
        links.Add(link);
    }

    return links;
}

Thanks for your help.

+0

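I just want to get the href values of all the anchor tags so that I can replace them with other URLs. I found the function above while searching Stack Overflow and tried it, but the error is -> parsing "<(?<Tag_Name>(a)|img)\b[^>]*?\b(?<URL_Type>(?(1)href|src))\s*=\s*(?:"(?<URL>(?:\\"|[^"])*)"|'(?<URL>(?:\\'|[^'])*)'))" - Too many )'s. – user1622436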
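The "Too many )'s" message points at an unbalanced parenthesis: the pattern in the question ends with one `)` more than it opens. A minimal sketch of the same pattern with that trailing `)` removed is shown here. The names `LinkRegexSketch`, `ReplaceAnchorHrefs`, and the `rewrite` parameter are hypothetical and introduced only for illustration. The sketch also reads the named group `URL`, since `Groups[1]` in the original holds the captured tag name `a` rather than the link. It only addresses the parse error; a regex like this remains fragile on real-world HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkRegexSketch
{
    // Pattern from the question with the unmatched trailing ')' removed.
    private const string Pattern =
        @"<(?<Tag_Name>(a)|img)\b[^>]*?\b(?<URL_Type>(?(1)href|src))\s*=\s*(?:""(?<URL>(?:\\""|[^""])*)""|'(?<URL>(?:\\'|[^'])*)')";

    private const RegexOptions Options =
        RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Returns every <a href> and <img src> value, in document order.
    public static List<string> GetLinksFromHtml(string content)
    {
        var links = new List<string>();
        foreach (Match m in Regex.Matches(content, Pattern, Options))
        {
            // The named group holds the URL; group 1 is only the "a" tag name.
            links.Add(m.Groups["URL"].Value);
        }
        return links;
    }

    // Hypothetical helper: rewrites only <a href> values, leaving <img src> alone.
    public static string ReplaceAnchorHrefs(string content, Func<string, string> rewrite)
    {
        return Regex.Replace(content, Pattern, m =>
        {
            // Group 1 succeeds only when the matched tag is <a>.
            if (!m.Groups[1].Success) return m.Value;

            Group url = m.Groups["URL"];
            int start = url.Index - m.Index; // offset of the URL inside this match
            return m.Value.Substring(0, start)
                 + rewrite(url.Value)
                 + m.Value.Substring(start + url.Length);
        }, Options);
    }
}
```

A call such as `LinkRegexSketch.ReplaceAnchorHrefs(content, u => "http://a.com?url=" + Uri.EscapeDataString(u))` would wrap each anchor URL in a redirect-style link. Note that `GetLinksFromHtml` also returns image sources, because the pattern matches `img` tags as well.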

Answers

8

Trying to parse HTML with a regex is not a good idea. See this post. Use a real HTML parser such as HtmlAgilityPack.

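// Requires a reference to HtmlAgilityPack, and to System.Web for HttpUtility.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(content);
foreach (var a in doc.DocumentNode.Descendants("a"))
{
    var href = a.Attributes["href"];
    if (href == null) continue; // skip anchors that have no href attribute
    href.Value = "http://a.com?url=" + HttpUtility.UrlEncode(href.Value);
}

// Serialize the modified document back to a string.
var newContent = doc.DocumentNode.OuterHtml;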
+0

I am not getting the HtmlAgilityPack DLL. – user1622436

+1

@user1622436 What does that mean? "You can't" or "you don't want to"?

+0

I need the HtmlAgilityPack DLL. – user1622436