Can I download lazy-loaded images?

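I want to use urllib to download some images from a travel site (TripAdvisor), but all I get from the URL in the src field of the HTML is this.

I did some research and found that these are lazy-loaded images. Is there any way to download them?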

2016-06-07 WisdomPill

The link you gave doesn't work – BradTheBrutalitist

Sorry, try this one: https://www.tripadvisor.it/Restaurant_Review-g3174493-d3164947-Reviews-Le_Ciaspole-Tret_Fondo_Province_of_Trento_Trentino_Alto_Adige.html – WisdomPill

You can use the hyperlink, or you can right-click, choose Inspect, and then find the image in the Elements panel. – BradTheBrutalitist

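You can extract the list of images from the JavaScript using Beautiful Soup and the json module, then iterate over the list and retrieve the images you're interested in.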
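As a minimal offline sketch of that idea, the snippet below pulls a `var lazyImgs = [...]` array out of a page and parses it as JSON. It uses a plain regular expression instead of Beautiful Soup so that it runs with the standard library alone. The `SAMPLE_HTML` markup, the `example.com` URLs, and the `extract_lazy_imgs` helper are made up for illustration; the real page may be laid out differently.

```python
import json
import re

# Made-up sample imitating the page's inline script; real markup may differ.
SAMPLE_HTML = """
<html><body>
<script>
var lazyImgs = [
  {"id": "lazyload_1", "data": "https://example.com/media/photo-s/01/aa/pic.jpg"},
  {"id": "lazyload_2", "data": "https://example.com/img/sprite.png"}
];
var lazyHtml = [];
</script>
</body></html>
"""


def extract_lazy_imgs(html):
    """Return the list assigned to `var lazyImgs`, or [] if it is absent."""
    # Capture the array literal between "var lazyImgs =" and the next "];"
    match = re.search(r"var lazyImgs\s*=\s*(\[.*?\])\s*;", html, flags=re.DOTALL)
    if match is None:
        return []
    return json.loads(match.group(1))


# Keep only the entries whose URL looks like a photo thumbnail
photo_urls = [
    img["data"]
    for img in extract_lazy_imgs(SAMPLE_HTML)
    if "media/photo-s" in img["data"]
]
print(photo_urls)
```

The non-greedy match stops at the first `]` followed by `;`, so this sketch would break if a URL in the array contained that sequence. It also assumes the array is valid JSON, as it appears to be on this page.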

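Edit:

The problem was that the images have the same name, so they were being overwritten. Getting the first three images is simple, but references to the other images in the carousel aren't loaded until the carousel is opened, so that is trickier. For some images, you can find a higher-resolution version by replacing "photo-s" in the path with "photo-w", but figuring out which ones requires digging deeper into the JavaScript logic.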

# Python 3 version (urllib.request replaces the Python 2 urllib calls)
import json
import re
import urllib.request

from bs4 import BeautifulSoup as bs


def img_data_filter(tag):
    # Match the <script> tag whose body starts with the lazyImgs array
    return tag.name == "script" and tag.text.strip().startswith("var lazyImgs")


url = "https://www.tripadvisor.it/Restaurant_Review-g3174493-d3164947-Reviews-Le_Ciaspole-Tret_Fondo_Province_of_Trento_Trentino_Alto_Adige.html"
response = urllib.request.urlopen(url)
soup = bs(response.read(), "html.parser")
img_data = soup.find(img_data_filter)

# Strip the surrounding JavaScript so that only the array literal remains
js = img_data.text
js = js.replace("var lazyImgs = ", "")
js = re.sub(r";\s+var lazyHtml.+", "", js, flags=re.DOTALL)

imgs = json.loads(js)
suffix = 1

for img in imgs:
    img_url = img["data"]

    # Skip entries that are not photo thumbnails
    if "media/photo-s" not in img_url:
        continue

    # The images share a base name, so append a counter to avoid overwriting
    img_name = img_url[img_url.rfind("/") + 1:-4]
    img_name = "%s-%03d.jpg" % (img_name, suffix)
    suffix += 1

    urllib.request.urlretrieve(img_url, img_name)
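The two points from the edit note, unique file names and the "photo-s" to "photo-w" swap, can also be pulled out into small helpers. This is only a sketch: `high_res_candidate` and `unique_name` are hypothetical names, the `example.com` URLs are placeholders, and the swapped URL is a guess that has to be checked, since the trick only works for some images.

```python
import posixpath
from urllib.parse import urlsplit


def high_res_candidate(img_url):
    """Guess a higher-resolution URL by swapping "photo-s" for "photo-w".

    This only works for some images, so the result still has to be verified.
    """
    return img_url.replace("photo-s", "photo-w", 1)


def unique_name(img_url, index):
    """Build a local file name that stays unique even if base names repeat."""
    base = posixpath.basename(urlsplit(img_url).path)
    stem, ext = posixpath.splitext(base)
    # Fall back to ".jpg" when the URL carries no extension (an assumption)
    return "%s-%03d%s" % (stem, index, ext or ".jpg")


# Placeholder URLs with the same base name, as on the page in question
urls = [
    "https://example.com/media/photo-s/01/aa/restaurant.jpg",
    "https://example.com/media/photo-s/02/bb/restaurant.jpg",
]
for index, url in enumerate(urls, start=1):
    print(unique_name(url, index), high_res_candidate(url))
```

A download loop could try the candidate URL first and fall back to the original thumbnail URL if the request fails.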


2016-06-08 07:37:41 flesk

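Thanks, but I want to download several images for the restaurant. – WisdomPill

Your algorithm only gets one of them, the one with the link "Tutte le foto dei visitatori". Can you explain how to get the first 3 or 4 of them? Why doesn't your algorithm download them? Aren't they images too? – WisdomPill

Thank you very much. I had actually already tweaked it myself, but I think your edit works better. – WisdomPill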
