2017-05-02

I tried crawling images from Google Image Search with Python 3.6:

1. Open the Chrome driver with Selenium

2. Scroll down to the end of the page

3. Use BeautifulSoup to get the image URLs and save the images

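The problem is that the images I get this way are too small.

I then found that the original image's URL (ending in ".jpg") is in the src of the img with class irc_mi, but I don't know how to pull it out.

I tried using find_all with the class name, but it failed.

What should I do?

Here is the source code: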

import os
import urllib.parse
import urllib.request
from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver


def Remainder_All_ImagesURLs_Google(searchText):

    def scroll_page():
        # Scroll to the bottom several times so that more thumbnails load.
        for i in range(7):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            sleep(3)

    def click_button():
        # Click the button with id 'smb' to load further results.
        more_imgs_button_xpath = "//*[@id='smb']"
        element = driver.find_element_by_xpath(more_imgs_button_xpath)
        element.click()
        sleep(3)

    def create_soup():
        html_source = driver.page_source
        soup = BeautifulSoup(html_source, 'html.parser')
        return soup

    def find_imgs():
        soup = create_soup()
        imgs_urls = []
        for img in soup.find_all('img'):
            # Some <img> tags have no src attribute; treat them as empty.
            src = img.get('src', '')
            if src.startswith('http'):
                imgs_urls.append(src)
        return imgs_urls

    driver = webdriver.Chrome('C:/chromedriver.exe')
    driver.maximize_window()
    sleep(2)

    # URL-encode the query so that search terms with spaces work.
    searchUrl = "https://www.google.com/search?q={}&site=webhp&tbm=isch".format(
        urllib.parse.quote_plus(searchText))
    driver.get(searchUrl)

    try:
        scroll_page()
        click_button()
        scroll_page()
    except Exception:
        click_button()
        scroll_page()

    imgs_urls = find_imgs()
    driver.close()
    return imgs_urls


def download_image(url, filename):
    full_name = str(filename) + ".jpg"
    # Join with a path separator so the file is saved inside C:/Python/Picture.
    urllib.request.urlretrieve(url, os.path.join('C:/Python/Picture', full_name))

Answer

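The problem is that Beautiful Soup will not find it, because the image's source (src) or href is filled in by a JavaScript function. My suggestion is therefore to use Selenium to click the image tag, wait for the image src, and extract it, using: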

element=driver.find_element_by_class_name("some_class") 
element.click() 

Then look up the image's src.
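The click-and-wait idea could be sketched roughly as follows. This is only a sketch under assumptions, not a tested solution for the current Google markup: `collect_full_size_urls`, `is_full_size_src`, `thumbnail_class`, `max_images` and `timeout` are names made up for this example, the thumbnail class is a placeholder just like "some_class" above, and "irc_mi" is simply the class the question reports for the full-size image. It uses `find_elements(By.CLASS_NAME, ...)` rather than `find_elements_by_class_name`, since the former also exists in newer Selenium releases.

```python
def is_full_size_src(src):
    """Return True for an absolute http(s) URL (not, e.g., an inline data: URI)."""
    return bool(src) and src.startswith(("http://", "https://"))


def collect_full_size_urls(driver, thumbnail_class, max_images=20, timeout=10):
    """Click each thumbnail and read the src of the enlarged image.

    thumbnail_class is a placeholder: look up the real class name of the
    thumbnails in the browser's developer tools. "irc_mi" is the class the
    question reports for the full-size <img>; it is an assumption here.
    """
    # Imported here so that is_full_size_src works without Selenium installed.
    from selenium.common.exceptions import (
        StaleElementReferenceException,
        TimeoutException,
    )
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    urls = []
    thumbnails = driver.find_elements(By.CLASS_NAME, thumbnail_class)
    for thumbnail in thumbnails[:max_images]:
        thumbnail.click()

        def find_new_src(drv):
            # Several irc_mi images may be in the DOM at once; return the
            # first loaded src that has not been collected yet.
            for img in drv.find_elements(By.CLASS_NAME, "irc_mi"):
                src = img.get_attribute("src")
                if is_full_size_src(src) and src not in urls:
                    return src
            return False

        wait = WebDriverWait(
            driver, timeout,
            ignored_exceptions=(StaleElementReferenceException,),
        )
        try:
            # until() polls find_new_src until it returns a truthy value.
            urls.append(wait.until(find_new_src))
        except TimeoutException:
            continue  # no new full-size image appeared in time; skip it
    return urls
```

With the `driver` from the question still open, something like `collect_full_size_urls(driver, "some_class")` would return a list of URLs that can then be passed to `download_image`. The explicit wait replaces a fixed `sleep`, so the loop only pauses as long as the enlarged image actually takes to load.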