This is the script I wrote to fetch Alexa ranks. I want to display the results as a well-formatted table, aligned in columns.
#!/usr/bin/env python
import sys
import requests
from lxml import html

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print 'usage: python %s <file-urls>' % (sys.argv[0])
        sys.exit(2)
    filename = sys.argv[1]
    urls = open(filename)
    # Each line read from the file still carries its trailing newline.
    for site in urls:
        try:
            url = "http://www.alexa.com/siteinfo/" + site
            content = requests.get(url).content
            tree = html.fromstring(content)
            RANK = tree.xpath('//strong[@class="metrics-data align-vmiddle"]/text()')
            print "Site:", site + "Global Rank:", RANK[0] + "\t" + "Country Rank:", RANK[1]
            # print 'Site:%s Global Rank:%2s Country Rank:%2s' % (site, RANK[0], RANK[1])
        except (KeyboardInterrupt, SystemExit):
            print "Keyboard Interruption!"
            sys.exit(0)
Result:
Site: google.com
Global Rank: 1 Country Rank: 1
Site: yahoo.com
Global Rank: 4 Country Rank: 4
Site: bing.com
Global Rank: 23 Country Rank: 14
This result is not satisfactory. Could you show how to lay out the results in better-aligned columns?
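I would like to know why the site appears on the line above the ranks, and how to correct it – MLSC 2014-10-22 12:19:59
Because there is a '\n' at the end of the site variable. Try stripping it. – 2014-10-22 12:23:09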
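The suggestion in the comment above, combined with fixed-width formatting, could look roughly like the sketch below. It is only an illustration: `format_row` and `format_table` are hypothetical helper names, the column widths are arbitrary, and the sample values are copied from the output shown in the question rather than fetched from Alexa. The code uses `str.format`, so it should run on both Python 2.7 and Python 3.

```python
def format_row(site, global_rank, country_rank, width=20):
    """Return one aligned row (hypothetical helper, widths are assumptions).

    strip() removes the trailing newline that comes with each line of the file.
    """
    return "{:<{w}}{:>12}{:>14}".format(
        site.strip(), global_rank, country_rank, w=width)


def format_table(rows):
    """Build a header, a separator line, and one aligned row per site."""
    header = "{:<20}{:>12}{:>14}".format("Site", "Global Rank", "Country Rank")
    lines = [header, "-" * len(header)]
    for site, global_rank, country_rank in rows:
        lines.append(format_row(site, global_rank, country_rank))
    return "\n".join(lines)


# Sample data taken from the output in the question; the "\n" mimics
# what iterating over the URL file yields.
sample = [
    ("google.com\n", "1", "1"),
    ("yahoo.com\n", "4", "4"),
    ("bing.com\n", "23", "14"),
]
print(format_table(sample))
```

In the original loop, the same idea would mean calling `site.strip()` before building the URL as well, since the newline otherwise ends up in the request address too.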