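C# .NET WebBrowser control on separate threads raising events on the main thread

I need to scrape many pages in parallel, and my UI thread must never be blocked. I create a thread for each page (URL) and instantiate a WebBrowser control on that thread so that it executes the page's JavaScript, after which I grab the HTML. When a WebBrowser has its HTML, I raise an event that is handled on the UI thread to register that this browser has finished its work. I want to know when all browsers have fetched their HTML so that I can merge all the data and display it.
1) The first problem is that some threads never raise the event, so I wait forever.
2) The second problem is that I cannot dispose of a browser without causing an external browser to launch. Disposing pulls the rug out from under the control, which then carries on by opening the page in the user's default browser. But if I do not dispose of the browsers, I run out of memory.
I have been searching around and found a lot of related material, but I have not managed to apply any of it to my use case. Here is my code: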
using System;
using System.Collections.Generic;
using System.Threading;
using System.Windows.Forms;

[System.Runtime.InteropServices.ComVisibleAttribute(true)]
public partial class Form1 : Form
{
    public delegate void ThreadFinishedEventHandler(object source, EventArgs e);
    public event ThreadFinishedEventHandler threadFinishedEvent;
    int threadCount = 0;
    int threadReturnedCount = 0;
    List<string> linksGlobal;

    public Form1()
    {
        InitializeComponent();
        threadFinishedEvent += new ThreadFinishedEventHandler(OnThreadFinished);
    }

    private void Form1_Load(object sender, EventArgs e)
    {
    }

    private void btnGO_Click(object sender, EventArgs e)
    {
        scrapeLinksWithBrowsersInSeparateThreads();
    }

    private void scrapeLinksWithBrowsersInSeparateThreads()
    {
        linksGlobal = getLinks(); // 11 URLs, all the same -> https://sports.betway.com
        threadCount = linksGlobal.Count;
        Random rand = new Random(123);
        int waitTime = 0; // trying not to be registered as a DoS attack or something
        foreach (string url in linksGlobal)
        {
            runBrowserThread(url, waitTime);
            waitTime += rand.Next(500, 3000) + 500; // each browser starts navigating about 1 to 3.5 seconds after the previous one
        }
    }

    public void runBrowserThread(string url, int waitTime)
    {
        var th = new Thread(() =>
        {
            try
            {
                WebBrowserDocumentCompletedEventHandler completed = null;
                WebBrowser wb = new WebBrowser();
                completed = (sndr, e) =>
                {
                    if (e.Url.AbsolutePath != (sndr as WebBrowser).Url.AbsolutePath)
                    {
                        wb.DocumentCompleted -= completed;
                        string html = (sndr as WebBrowser).Document.Body.InnerHtml;
                        threadFinishedEvent.Raise(this, EventArgs.Empty); // I have an event extension method that allows this
                        // wb.Dispose(); // whenever and wherever I put this, it causes an external browser to launch
                        // Application.ExitThread(); // this sometimes seems to cause the event never to fire, not sure
                    }
                };
                wb.DocumentCompleted += completed;
                wb.ScriptErrorsSuppressed = true;
                Thread.Sleep(waitTime); // stagger the start of each navigation, see waitTime above
                wb.Navigate(url);
                Application.Run();
            }
            catch (Exception ex)
            {
                throw ex;
            }
        });
        th.SetApartmentState(ApartmentState.STA);
        th.Start();
    }

    private void OnThreadFinished(object source, EventArgs e)
    {
        threadReturnedCount++; // I get this for something like 3 - 5 out of 11 threads, then the event stops being raised, I don't know why
        if (threadReturnedCount == threadCount)
        {
            // Do work
            // this never happens because a lot of threads never raise the event; some do
        }
    }

    private List<string> getLinks()
    {
        List<string> links = new List<string>();
        links.Add("https://sports.betway.com");
        links.Add("https://sports.betway.com");
        links.Add("https://sports.betway.com");
        links.Add("https://sports.betway.com");
        links.Add("https://sports.betway.com");
        links.Add("https://sports.betway.com");
        links.Add("https://sports.betway.com");
        links.Add("https://sports.betway.com");
        links.Add("https://sports.betway.com");
        links.Add("https://sports.betway.com");
        links.Add("https://sports.betway.com");
        return links;
    }
}
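P.S. Returning data from the threads is a separate problem that I have not implemented yet; I want to solve this one first. I plan to use an objectFactory called from each thread, e.g. Factory.createObject(html), and since it will live on the main thread, I will have to use some kind of locking on that Factory.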
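
To make the locking idea concrete, here is a minimal console sketch of what I have in mind. It is not my real code: `ResultFactory`, `CreateObject`, `Program.Run` and the stand-in HTML strings are all hypothetical names made up for illustration, and plain worker threads replace the WebBrowser threads. It only shows a shared factory guarded by a `lock` so that several threads can hand in their HTML safely.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the planned factory; not part of the code above.
public sealed class ResultFactory
{
    private readonly object _gate = new object();
    private readonly List<string> _results = new List<string>();

    // Called from worker threads; the lock serializes access to the shared list.
    public void CreateObject(string html)
    {
        lock (_gate)
        {
            _results.Add(html);
        }
    }

    // Number of results stored so far, read under the same lock.
    public int Count
    {
        get { lock (_gate) { return _results.Count; } }
    }
}

public static class Program
{
    // Starts threadCount workers that each hand one fake page to the factory,
    // waits for all of them, and returns how many results were stored.
    public static int Run(int threadCount)
    {
        var factory = new ResultFactory();
        var threads = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            int id = i; // copy the loop variable for the closure
            var th = new Thread(() => factory.CreateObject("<html>page " + id + "</html>"));
            threads.Add(th);
            th.Start();
        }
        foreach (var th in threads)
        {
            th.Join(); // fine in a console sketch; on a UI thread this would block
        }
        return factory.Count;
    }

    public static void Main()
    {
        Console.WriteLine(Run(11));
    }
}
```

In the real form I would not call `Join` on the UI thread, since that would block it; the sketch only demonstrates that the lock keeps the shared list consistent when several threads add to it at once.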