Mirror an entire website and save its links to a txt file
Is it possible to use wget's mirroring to collect all the links of an entire website and save them in a txt file?
If so, how is it done? If not, is there another way to accomplish this?
Edit:
I tried running this command:
wget -r --spider example.com
and got this output:
Spider mode enabled. Check if remote file exists.
--2015-10-03 21:11:54-- http://example.com/
Resolving example.com... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to example.com|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1270 (1.2K) [text/html]
Remote file exists and could contain links to other resources -- retrieving.
--2015-10-03 21:11:54-- http://example.com/
Reusing existing connection to example.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 1270 (1.2K) [text/html]
Saving to: 'example.com/index.html'
100%[=====================================================================================================>] 1,270 --.-K/s in 0s
2015-10-03 21:11:54 (93.2 MB/s) - 'example.com/index.html' saved [1270/1270]
Removing example.com/index.html.
Found no broken links.
FINISHED --2015-10-03 21:11:54--
Total wall clock time: 0.3s
Downloaded: 1 files, 1.2K in 0s (93.2 MB/s)
(Yes, I also tried using other websites with more internal links)
Yes, this is how it is supposed to work. The real site "example.com" has no internal links, so it only returns itself. Try a site that links to other pages within the same site and you should get more. Do you also want links to *external* sites? If so, the Python script from @Randomazer may be a better option. – seumasmac
Actually, there is a similar question at http://stackoverflow.com/questions/2804467/spider-a-website-and-return-urls-only which may be useful. – seumasmac
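For reference, a minimal sketch of that approach using wget alone (assuming GNU wget; the depth limit of 2 and the file name links.txt are only illustrative): wget prints each URL it visits on a line beginning with --, and it writes its log to stderr, so the visited URLs can be filtered out of that output:
# Spider the site recursively, then pull the visited URLs out of wget's log.
# 2>&1 redirects wget's log (stderr) into the pipeline; adjust -l or drop it as needed.
wget --spider -r -l 2 http://example.com 2>&1 \
  | grep '^--' | awk '{ print $3 }' | sort -u > links.txt
Note that --spider with -r still fetches pages temporarily in order to find further links, but deletes them afterwards, as the "Removing example.com/index.html." line in the log above shows.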
Thank you very much! That helped! – user1878980