I tried crawling images from a Google Images search with Python 3.6. My approach:
1. Open the Chrome driver with Selenium
2. Scroll down to the end of the page
3. Use BeautifulSoup to get the image URLs and save the images
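The problem is that the images I get this way are too small.
Looking at the page, I found that the URL of the original image is in the src attribute
of the img element with class irc_mi (the value ends with ".jpg"),
but I don't know how to pull it out.
I tried using find_all with the class name, but it failed.
What should I do?
Here is the source code: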
import urllib.request
from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver


def Remainder_All_ImagesURLs_Google(searchText):
    def scroll_page():
        # Scroll to the bottom several times so that more thumbnails load.
        for i in range(7):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            sleep(3)

    def click_button():
        # Click the button that loads more images.
        more_imgs_button_xpath = "//*[@id='smb']"
        element = driver.find_element_by_xpath(more_imgs_button_xpath)
        element.click()
        sleep(3)

    def create_soup():
        # Parse the HTML currently rendered in the browser.
        html_source = driver.page_source
        soup = BeautifulSoup(html_source, 'html.parser')
        return soup

    def find_imgs():
        # Collect the src of every <img> that points to an http(s) URL.
        soup = create_soup()
        imgs_urls = []
        for img in soup.find_all('img'):
            try:
                if img['src'].startswith('http'):
                    imgs_urls.append(img['src'])
            except KeyError:
                # Some <img> tags have no src attribute; skip them.
                pass
        return imgs_urls

    driver = webdriver.Chrome('C:/chromedriver.exe')
    driver.maximize_window()
    sleep(2)

    searchUrl = "https://www.google.com/search?q={}&site=webhp&tbm=isch".format(searchText)
    driver.get(searchUrl)

    try:
        scroll_page()
        click_button()
        scroll_page()
    except Exception:
        # If the first pass fails, try the button and one more scroll pass.
        click_button()
        scroll_page()

    imgs_urls = find_imgs()
    driver.close()
    return imgs_urls


def download_image(url, filename):
    # Saves into C:/Python/ under a name like "Picture<filename>.jpg".
    full_name = str(filename) + ".jpg"
    urllib.request.urlretrieve(url, 'C:/Python/Picture' + full_name)
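
For the irc_mi lookup described in the question, here is a minimal sketch of filtering img tags by class name. To keep it runnable without third-party packages it uses the standard library's html.parser instead of BeautifulSoup; with BeautifulSoup the equivalent query would be soup.find_all('img', class_='irc_mi'). The names ImgSrcByClass, full_size_urls and sample_html, and the example.com URL, are made up for illustration. It also assumes the irc_mi image is already present in the page source, which may not be the case until a thumbnail has been opened in the browser.

```python
from html.parser import HTMLParser


class ImgSrcByClass(HTMLParser):
    """Collect the src of every <img> whose class list contains class_name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attr.get("class") or "").split()
        src = attr.get("src")
        # Keep only real http(s) URLs, not inline data: thumbnails.
        if self.class_name in classes and src and src.startswith("http"):
            self.urls.append(src)


def full_size_urls(html_source, class_name="irc_mi"):
    """Return the src URLs of all <img> tags carrying class_name."""
    parser = ImgSrcByClass(class_name)
    parser.feed(html_source)
    return parser.urls


# Hypothetical page fragment: one inline thumbnail and one full-size image.
sample_html = """
<img class="rg_ic rg_i" src="data:image/jpeg;base64,AAAA">
<img class="irc_mi" src="https://example.com/photos/cat.jpg">
"""

print(full_size_urls(sample_html))
```

In the script above, the string passed in would be driver.page_source, taken after whatever interaction makes the full-size image appear.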