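I'm trying to get a list of images from a website and save them to disk, but it doesn't work

I'm using HtmlAgilityPack to get a list of images from a website and save them to my hard disk, but it isn't working.

In the function below, imageNodes has a count of 0 by the time the foreach runs, and I don't understand why.

The site contains many images. What I want is to get the list of images from the site, show that list in richTextBox1, and also save all of the site's images to my hard disk.

How can I fix this?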

public void GetAllImages() 
{ 
    // Bing Image Result for Cat, First Page 
    string url = "http://www.bing.com/images/search?q=cat&go=&form=QB&qs=n"; 

    // For speed of dev, I use a WebClient 
    WebClient client = new WebClient(); 
    string html = client.DownloadString(url); 

    // Load the Html into the agility pack 
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); 
    doc.LoadHtml(html); 

    // Now, using LINQ to get all Images 
    List<HtmlNode> imageNodes = null;
    imageNodes = (from HtmlNode node in doc.DocumentNode.SelectNodes("//img")
                  where node.Name == "img"
                      && node.Attributes["class"] != null
                      && node.Attributes["class"].Value.StartsWith("img_")
                  select node).ToList();

    foreach (HtmlNode node in imageNodes)
    {
        // Console.WriteLine(node.Attributes["src"].Value);
        richTextBox1.Text += node.Attributes["src"].Value + Environment.NewLine;
    }
}

2012-05-15 user1363119

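You appear to be selecting only images whose 'class' attribute starts with 'img_'. Does that match the document itself? I'm also not sure why there is a 'where node.Name == "img"' in addition to the XPath used to select the nodes; it looks redundant. – Oded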

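Have you inspected the downloaded HTML? The images may be loaded dynamically after the initial HTML has been downloaded, in which case they won't be in the string you are parsing. – JotaBe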
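One way to act on this suggestion is to dump the downloaded string to a file and count the img tags it actually contains before involving the parser at all. The following is only a minimal sketch: the class name HtmlInspector, its method names, and the output path are hypothetical and not part of the original code.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper for checking what the server really returned.
public static class HtmlInspector
{
    // Counts opening <img ...> tags in the raw markup (case-insensitive).
    public static int CountImgTags(string html)
    {
        return Regex.Matches(html, @"<img\b", RegexOptions.IgnoreCase).Count;
    }

    // Writes the raw markup to disk so it can be opened in an editor.
    public static void Dump(string html, string path)
    {
        File.WriteAllText(path, html);
    }
}
```

Called as HtmlInspector.CountImgTags(html) right after client.DownloadString(url), a result of 0 would suggest the images are added by script after the page loads, while a non-zero result would point at the class filter instead.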

As far as I can see, the correct class for the Bing images is sg_t. You can get those HtmlNodes with the following LINQ query:

List<HtmlNode> imageNodes = doc.DocumentNode.Descendants("img") 
    .Where(n=> n.Attributes["class"] != null && n.Attributes["class"].Value == "sg_t") 
    .ToList();

This list should then contain every img element with class = 'sg_t'.
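The query above compares the whole class attribute to "sg_t", so it would skip an element that carries more than one class. If that turns out to matter, a token-based check is one option. This is a sketch only; CssClass.HasClass is a hypothetical helper, not something HtmlAgilityPack provides.

```csharp
using System;
using System.Linq;

// Hypothetical helper: tests whether a class attribute contains a given class token.
public static class CssClass
{
    public static bool HasClass(string classAttribute, string className)
    {
        if (string.IsNullOrEmpty(classAttribute))
        {
            return false;
        }

        // A class attribute is a whitespace-separated list of tokens.
        return classAttribute
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Contains(className, StringComparer.Ordinal);
    }
}
```

It could be plugged into the query as .Where(n => CssClass.HasClass(n.GetAttributeValue("class", ""), "sg_t")), which also avoids the separate null check on the attribute.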

2012-05-15 10:23:03 vfportero

Looking at the example page/URL in your code, the images you are after do not have a class that starts with "img_".

<img class="sg_t" src="http://ts2.mm.bing.net/images/thumbnail.aspx?q=4588327016989297&amp;id=db87e23954c9a0360784c0546cd1919c&amp;url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg" style="height:133px;top:2px">

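I notice your code only targets the thumbnails. You will also want the full-size image URLs, which are found in the anchor surrounding each thumbnail. You need to pull the final URL out of an A HREF that looks like this: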

<a href="/images/search?q=cat&amp;view=detail&amp;id=89929E55C0136232A79DF760E3859B9952E22F69&amp;first=0&amp;FORM=IDFRIR" class="sg_tc" h="ID=API.images,18.1"><img class="sg_t" src="http://ts2.mm.bing.net/images/thumbnail.aspx?q=4588327016989297&amp;id=db87e23954c9a0360784c0546cd1919c&amp;url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg" style="height:133px;top:2px"></a>

and decode the bit that looks like: url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg

which decodes to: http://actnowtraining.files.wordpress.com/2012/02/cat.jpg
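The decoding step, and the saving to disk that the question asks for, could be sketched roughly as follows. It assumes the url= parameter is present in the string being parsed, as in the markup shown above; BingThumbnail and its method names are hypothetical, and the target folder is assumed to exist already.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper for the thumbnail markup shown above.
public static class BingThumbnail
{
    // Returns the decoded value of the "url" query parameter, or null if there is none.
    public static string ExtractFullSizeUrl(string src)
    {
        // Attribute values taken from raw HTML may still contain entities such as &amp;
        string decoded = WebUtility.HtmlDecode(src);

        int queryStart = decoded.IndexOf('?');
        if (queryStart < 0)
        {
            return null;
        }

        foreach (string pair in decoded.Substring(queryStart + 1).Split('&'))
        {
            if (pair.StartsWith("url=", StringComparison.OrdinalIgnoreCase))
            {
                // Turn %3a, %2f and so on back into ':' and '/'.
                return Uri.UnescapeDataString(pair.Substring(4));
            }
        }

        return null;
    }

    // Uses the last path segment of the image URL as the local file name.
    public static string FileNameFor(string imageUrl)
    {
        return Path.GetFileName(new Uri(imageUrl).LocalPath);
    }

    // Downloads one image into an existing folder.
    public static void Save(string imageUrl, string folder)
    {
        using (var client = new WebClient())
        {
            client.DownloadFile(imageUrl, Path.Combine(folder, FileNameFor(imageUrl)));
        }
    }
}
```

Inside the foreach over imageNodes, the src value could be passed to ExtractFullSizeUrl and a non-null result handed to Save. Two images with the same file name would overwrite each other in this sketch, so a real version would probably need unique names.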

2012-05-15 10:25:01
