2016-08-05 117 views

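C#: extracting URLs using regex or string split

After reading the answers to this question: C# regex pattern to extract urls from given string - not full html urls but bare links as well, I am wondering which is the fastest way to extract URLs from a document: regex matching or a string split method.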

So you have a string containing an HTML document, and you want to extract the URLs from it.

The regex approach is:

Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase); 
string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue"; 
foreach(Match m in linkParser.Matches(rawString)) 
    MessageBox.Show(m.Value); 

And the string split approach:

string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue"; 
var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://")); 
foreach (string s in links) 
    MessageBox.Show(s); 
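
The two snippets return the same three tokens for the sample string above, but they are not equivalent in general. Here is a minimal sketch of the difference, assuming a console app instead of MessageBox. The names UrlExtraction, ByRegex and BySplit and the test sentence are made up for illustration, and StringComparison.Ordinal is an addition to keep the split version culture-independent:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class UrlExtraction
{
    // Same pattern and options as in the question.
    private static readonly Regex LinkParser = new Regex(
        @"\b(?:https?://|www\.)\S+\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    private static readonly char[] Separators = { '\t', '\n', ' ' };

    // Hypothetical helper: collects every regex match as a string.
    public static List<string> ByRegex(string text)
    {
        return LinkParser.Matches(text).Cast<Match>().Select(m => m.Value).ToList();
    }

    // Hypothetical helper: splits on whitespace and keeps tokens with a URL-like prefix.
    public static List<string> BySplit(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(s => s.StartsWith("http://", StringComparison.Ordinal)
                     || s.StartsWith("https://", StringComparison.Ordinal)
                     || s.StartsWith("www.", StringComparison.Ordinal))
            .ToList();
    }
}

public class Program
{
    public static void Main()
    {
        string text = "Visit HTTP://Example.com, or www.test.org.";

        Console.WriteLine(string.Join(" | ", UrlExtraction.ByRegex(text)));
        // prints: HTTP://Example.com | www.test.org

        Console.WriteLine(string.Join(" | ", UrlExtraction.BySplit(text)));
        // prints: www.test.org.
    }
}
```

The regex version ignores case and, because of the trailing \b, drops punctuation attached to the end of a link. The split version is case-sensitive and keeps whatever is attached to the token. Which behavior is wanted may matter as much as raw speed.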

Which one is the most efficient way to do this?

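You can try both using Stopwatch.

I'm ashamed to admit that my first thought was "is Stopwatch some kind of benchmarking program?"

I can't benchmark, because I won't have access to a PC for a few days.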

Answer


Split is faster. Here is some code you can use to test it (dotnetfiddle link):

using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";

        // Build the regex once: constructing a compiled Regex is expensive
        // and would otherwise dominate the measurement.
        Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
        char[] separators = "\t\n ".ToCharArray();
        int found = 0;

        Stopwatch sw = new Stopwatch();

        sw.Start();

        for (int i = 0; i < 500; i++)
        {
            // Matches is evaluated lazily; reading Count forces the matching to run.
            found += linkParser.Matches(rawString).Count;
        }

        sw.Stop();

        var test1Time = sw.ElapsedMilliseconds;

        sw.Reset();
        sw.Start();

        for (int i = 0; i < 500; i++)
        {
            // LINQ filters are lazy as well; Count(predicate) forces the filter to run.
            found += rawString.Split(separators, StringSplitOptions.RemoveEmptyEntries).Count(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
        }

        sw.Stop();

        var test2Time = sw.ElapsedMilliseconds;

        Console.WriteLine("Regex Test: " + test1Time.ToString());
        Console.WriteLine("Split Test: " + test2Time.ToString());
        Console.WriteLine("Links found in total: " + found.ToString());
    }
}
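
Timings this short are easily distorted by JIT compilation on the first call, and ElapsedMilliseconds can report 0 for a fast loop. A small helper along the following lines may give steadier numbers. The names Bench and TimeMs are made up here, and this is only a sketch under those assumptions, not a substitute for a dedicated benchmarking tool:

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Runs the action once untimed as a warm-up, so first-call JIT cost is
    // not measured, then returns the total milliseconds for the timed runs.
    public static double TimeMs(Action action, int iterations)
    {
        action(); // warm-up call, not timed

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();

        // Elapsed.TotalMilliseconds keeps the fractional part,
        // unlike ElapsedMilliseconds, which is a whole number.
        return sw.Elapsed.TotalMilliseconds;
    }
}

public class Program
{
    public static void Main()
    {
        int calls = 0;
        double ms = Bench.TimeMs(() => calls++, 500);

        // calls is 501 here: one warm-up call plus 500 timed calls.
        Console.WriteLine("calls: " + calls + ", elapsed ms: " + ms);
    }
}
```

Passing each approach as a lambda, with a larger input string and more iterations, should make the difference between the regex and the split version easier to see.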
Great. Thanks for your answer.

How do I mark it as the answer? – 2016-08-05 15:18:23