2013-01-19 274 views

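I am trying to extract some data from LinkedIn into a data grid using a WebBrowser control. I am only doing this as a learning exercise, but the result changes if I remove the line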

MessageBox.Show("asdfasdfasdf") 

With that line removed, the list "messages" ends up with only 1 item. If I include the line above, I get the 15 messages I expect.

Can someone explain this?

public void extract_messages_received(object sender, RoutedEventArgs e) 
{ 
    triggered = false; 
    System.Windows.Forms.WebBrowser browser = new System.Windows.Forms.WebBrowser(); 
    browser.Navigate(new Uri(@"http://www.linkedin.com/inbox/messages/received")); 
    browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted); 
} 

private void LoadMessages(string url) 
{ 
    txtOutput.Text = @"http://www.linkedin.com" + url.Substring(6, url.Length - 6); 
    if (!urls.Contains(url)) 
    { 
     urls.Add(url); 
     WebBrowser browser = new WebBrowser(); 
     browser.Navigate(new Uri(txtOutput.Text)); 

     loaded_message = false; 
     browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(ReadMessages); 
    } 
} 

private void ReadMessages(object sender, WebBrowserDocumentCompletedEventArgs e) 
{ 
    if (loaded_message == false) 
    {   
     string url = ((WebBrowser)sender).Url.ToString(); 
     int loc1 = url.IndexOf("itemID") + 7; 
     int loc2 = url.IndexOf("&", loc1); 
     IEnumerable<string> name = null; 
     IEnumerable<string> odate = null; 
     IEnumerable<string> photo = null; 
     IEnumerable<string> subject = null; 
     IEnumerable<string> headline = null; 
     string body = ""; 
     string id = url.Substring(loc1, loc2 - loc1); 
     //System.Windows.MessageBox.Show("READ"); 
     foreach (HtmlElement element in ((WebBrowser)sender).Document.GetElementsByTagName("div")) 
     { 
      if (element.GetAttribute("classname").Equals("inbox-item-body")) 
      { 
       body = element.InnerText; 
      } 
      if (element.GetAttribute("classname").Equals("inbox-item-header")) 
      { 
       var doc = new HtmlAgilityPack.HtmlDocument(); 
       doc.LoadHtml(element.InnerHtml); 
       name = from foo in doc.DocumentNode.SelectNodes("//a[@class='fn']") select foo.InnerText; 
       odate = from foo in doc.DocumentNode.SelectNodes("//p[@class='date']") select foo.InnerText; 
       photo = from foo in doc.DocumentNode.SelectNodes("//img[@class='photo']") select foo.Attributes["src"].Value; 
       subject = from foo in doc.DocumentNode.SelectNodes("//h3") select foo.InnerText; 
       headline = from foo in doc.DocumentNode.SelectNodes("//span[@class='headline']") select foo.InnerText; 
      } 
     } 

     // **** 
     MessageBox.Show("asdfasdfasdf"); 
     // **** 

     messages.Add(new Messages() 
     { 
      ID = id, 
      Subject = subject.First().ToString(), 
      Headline = headline.First().ToString(), 
      Sender = name.First().ToString(), 
      Photo = photo.First().ToString(), 
      SendDate = odate.First().ToString(), 
      Body = body 
     }); 

      // dataMessages.ItemsSource = messages; 
    } 
    loaded_message = true; 
} 

void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) 
{ 
    if (!triggered) 
    { 
     triggered = true; 
     System.Windows.Forms.WebBrowser web = sender as System.Windows.Forms.WebBrowser; 
     foreach (HtmlElement element in web.Document.GetElementsByTagName("ol")) 
     { 
      if (element.GetAttribute("classname").Contains("inbox-list ")) 
      { 
       WebBrowser browser = new WebBrowser(); 
       browser.Navigate("about:blank"); 
       browser.Document.Write(element.InnerHtml); 
       HtmlElementCollection hrefTags = null; 
       hrefTags = browser.Document.GetElementsByTagName("a"); 
       foreach (HtmlElement a in hrefTags) 
       { 
        if (a.OuterHtml.Contains("displayMBox")) 
        { 
         LoadMessages(a.GetAttribute("href")); 
        } 
       } 
      } 
     } 
    }  
} 

Answer


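This is a timing issue.

With the message box in there, loaded_message is not set to true until after you close the message box. While the box is open, the other DocumentCompleted events keep being processed. Each of those handlers still sees loaded_message == false, so each one runs up to its own message box, and none of them sets loaded_message to true until you close the first message box.

If you close the message box quickly enough, before all the pages have finished loading, you will probably see some number between 1 and 15.

Let's take a simpler example: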

private void Form1_Load(object sender, EventArgs e) 
    { 

     for (int i = 0; i < 5; i++) 
     { 
      WebBrowser wb = new WebBrowser(); 
      wb.DocumentCompleted += wb_DocumentCompleted; 
      wb.Navigate("http://www.stackoverflow.com"); 
     } 
    } 

    bool shown = false; 
    void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) 
    { 
     if (!shown) 
     { 
      Console.WriteLine(shown); 
      MessageBox.Show(shown.ToString()); 
      shown = true; 
     } 
    } 

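Now, if you watch the console, you will see several False lines before you have dismissed the first message box. When I close that message box, I see 4 more message boxes, because those handlers had already passed the check and were waiting before shown was set to true. If I comment out the message box, only a single False is printed to the console.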
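
The same re-entrancy can be reproduced without any UI. The following is only a rough sketch: the queue, ShowModal and Run are hypothetical stand-ins for the message loop and the modal message box, not WinForms APIs, and it assumes every page has finished loading while the first box is still open.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the UI message loop; none of these names come from WinForms.
public static class ReentrancyDemo
{
    static readonly Queue<Action> pending = new Queue<Action>();
    static bool shown;
    static int bodyRuns;

    // Simulates a modal dialog: while it is "open" it keeps dispatching queued events.
    static void ShowModal()
    {
        while (pending.Count > 0)
        {
            pending.Dequeue()();
        }
    }

    // Simulated DocumentCompleted handler guarded by a single shared flag.
    static void OnDocumentCompleted(bool useModal)
    {
        if (!shown)
        {
            bodyRuns++;
            if (useModal)
            {
                ShowModal(); // other handlers run here and still see shown == false
            }
            shown = true;
        }
    }

    // Queues `count` completion events, pumps them, and returns how many passed the guard.
    public static int Run(int count, bool useModal)
    {
        pending.Clear();
        shown = false;
        bodyRuns = 0;
        for (int i = 0; i < count; i++)
        {
            pending.Enqueue(() => OnDocumentCompleted(useModal));
        }
        while (pending.Count > 0)
        {
            pending.Dequeue()();
        }
        return bodyRuns;
    }

    public static void Main()
    {
        Console.WriteLine(Run(5, true));  // with the simulated modal box
        Console.WriteLine(Run(5, false)); // without it
    }
}
```

With the simulated modal box, all five handlers get past the guard before the flag is set. Without it, only the first one does, which matches the 15 versus 1 messages in the question.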

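Now the question becomes: why did you add the loaded_message boolean, and why do you need to check it?

My guess is that you want to load each message only once. If that is the case, keep a dictionary keyed by URL and maintain one boolean per URL: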

Dictionary<string, bool> loadedUrls = new Dictionary<string, bool>(); 
    private void Form1_Load(object sender, EventArgs e) 
    { 

     for (int i = 0; i < 5; i++) 
     { 
      WebBrowser wb = new WebBrowser(); 
      wb.DocumentCompleted += wb_DocumentCompleted; 
      string url = "http://stackoverflow.com/" + i; 

      loadedUrls.Add(url, false); 
      wb.Navigate(url); 
     } 
    } 

    bool shown = false; 
    void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) 
    { 

     if (loadedUrls.ContainsKey(e.Url.OriginalString) && loadedUrls[e.Url.OriginalString] == false) 
     { 
      loadedUrls[e.Url.OriginalString] = true; 
      Console.WriteLine(shown); 
      shown = true; 
     } 
    } 

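I left shown in there to demonstrate that this new approach now does its work on every pass through the DocumentCompleted event. Your output window should show False, followed by 4 True.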
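
As a possible variation, a set of already-handled URLs can play the same role as the dictionary of booleans. This is a minimal sketch, not part of the original answer: OncePerUrl, TryMarkHandled and the example.com URLs are made up for illustration. It relies only on HashSet<string>.Add returning false when the item is already present.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the class and method names are illustrative only.
public static class OncePerUrl
{
    // URLs whose completion event has already been handled.
    static readonly HashSet<string> handled = new HashSet<string>();

    // Returns true the first time a URL is seen, false on every repeat.
    public static bool TryMarkHandled(string url)
    {
        return handled.Add(url);
    }

    public static void Main()
    {
        // Placeholder URLs standing in for e.Url.OriginalString values.
        string[] completed =
        {
            "http://example.com/a",
            "http://example.com/b",
            "http://example.com/a"
        };
        foreach (string url in completed)
        {
            Console.WriteLine(url + " -> " + TryMarkHandled(url));
        }
    }
}
```

Inside a DocumentCompleted handler the guard would then read if (TryMarkHandled(e.Url.OriginalString)) { ... }. One difference from the dictionary version: a URL that was never registered up front is still handled once here, whereas the dictionary check skips it.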


Nice one, John! Spot on ;-) – user1320651