How can I scrape this particular jQuery website with Python?

I want to scrape this website: https://resultadoselecciones2016.onpe.gob.pe/PRP2V2016/Actas-por-Ubigeo.html

They are using jQuery, so the data is not in the "normal" HTML code. I saw this in the Chrome developer console:

So I did this in Python 2.7:

import urllib2

url = 'https://resultadoselecciones2016.onpe.gob.pe/PRP2V2016/Actas-por-Ubigeo.html'

# POST body copied from the request shown in the developer console
data = "pid=844399127479680.2&_clase=mesas&_accion=displayMesas&ubigeo=140107&nroMesa=034915&tipoElec=10&page=1&pornumero=1"

req = urllib2.Request(url, data)  # passing data makes this a POST request
response = urllib2.urlopen(req)
print response.read()
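The request body copied from the developer console is an ordinary URL-encoded form string. A minimal sketch (Python 3 standard library, not part of the original question) of splitting it into fields so a single parameter can be changed before re-sending; changing page to 2 is a hypothetical example:

```python
from urllib.parse import parse_qsl, urlencode

# Form body copied from the developer console (the one used in the question)
raw_body = ("pid=844399127479680.2&_clase=mesas&_accion=displayMesas"
            "&ubigeo=140107&nroMesa=034915&tipoElec=10&page=1&pornumero=1")

# parse_qsl keeps the field order; dict() makes the fields easy to edit
fields = dict(parse_qsl(raw_body, keep_blank_values=True))

# Hypothetical change: ask for page 2 instead of page 1
fields["page"] = "2"

# Re-encode the fields for use as a POST body
new_body = urlencode(fields)
print(new_body)
```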

But it doesn't work: it just prints the normal HTML instead of the response you see above.
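One likely reason, judging from the answer below, is that the POST goes to the .html page, which just returns its markup, while the page's jQuery sends its requests to a separate endpoint (ajax.php). A sketch in Python 3 (where urllib2 became urllib.request) that only builds such a request without sending it. The X-Requested-With header mirrors what jQuery adds to same-origin AJAX calls; whether this server checks it, or also needs the browser's session cookies, is an assumption:

```python
import urllib.parse
import urllib.request

# Assumption: the jQuery calls go to ajax.php (the endpoint used in the answer),
# not to the Actas-por-Ubigeo.html page itself.
AJAX_URL = "https://resultadoselecciones2016.onpe.gob.pe/PRP2V2016/ajax.php"

def build_ajax_request(fields):
    """Build (but do not send) a POST request that mimics a jQuery AJAX call."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    headers = {
        # jQuery's default content type for form data
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        # jQuery adds this to same-origin AJAX requests; some servers check it
        "X-Requested-With": "XMLHttpRequest",
    }
    return urllib.request.Request(AJAX_URL, data=body, headers=headers, method="POST")

req = build_ajax_request({
    "pid": "844399127479680.2", "_clase": "mesas", "_accion": "displayMesas",
    "ubigeo": "140107", "nroMesa": "034915", "tipoElec": "10",
    "page": "1", "pornumero": "1",
})
# To actually send it: urllib.request.urlopen(req).read().decode("utf-8")
```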

How can I get this data?


2016-06-08 Kevin Castro

You need to run a headless browser on your server – charlietfl

You could use Selenium or RoboBrowser for this kind of task. –

I just solved it. I used the requests module instead of urllib and simply copied/pasted the entire request header, like this:

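import requests
from bs4 import BeautifulSoup

# POST to the AJAX endpoint that the page's jQuery calls, not to the .html page
url2 = "https://resultadoselecciones2016.onpe.gob.pe/PRP2V2016/ajax.php"
# headers must be a dict (header name -> value), not a string;
# a str body is sent as-is, so the copied headers should include the Content-Type
head = {}  # [my entire header], copied from the browser's developer tools
data_get_departamentos = "pid=1037937475037058.5&_clase=ubigeo&_accion=getDepartamentos&dep_id=&tipoElec=&tipoC=acta&modElec=&ambito=E&pantalla="

r = requests.post(url2, data=data_get_departamentos, headers=head)
departamentos = r.text
# Parse the returned HTML fragment
soup = BeautifulSoup(departamentos, "html.parser")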
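requests expects headers as a dict, so the header block copied from the browser has to be split into name/value pairs first. A minimal sketch; the helper name parse_raw_headers and the sample headers are hypothetical, and skipping lines that start with ":" (HTTP/2 pseudo-headers such as :authority, which Chrome may display) is a design choice:

```python
def parse_raw_headers(raw):
    """Turn headers copied from the browser ("Name: value" per line) into a dict."""
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority: ..."
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so values like URLs stay intact
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers

# Hypothetical copied header block
copied = """
Accept: */*
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
"""
head = parse_raw_headers(copied)
# head can then be passed as requests.post(url2, data=..., headers=head)
```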

Then I parsed the HTML response with BeautifulSoup. That's it.
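The structure of the ajax.php response is not shown in the thread. Assuming it is an HTML fragment of option elements (for example, to fill a department drop-down), here is a sketch of extracting value/label pairs with the standard library's html.parser instead of BeautifulSoup; the sample markup is hypothetical:

```python
from html.parser import HTMLParser

class OptionParser(HTMLParser):
    """Collect (value, label) pairs from <option> elements."""

    def __init__(self):
        super().__init__()
        self.options = []   # list of (value, label) tuples
        self._value = None  # value of the <option> currently open, if any
        self._text = []     # text chunks seen inside the open <option>

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._value = dict(attrs).get("value", "")
            self._text = []

    def handle_data(self, data):
        if self._value is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._value is not None:
            self.options.append((self._value, "".join(self._text).strip()))
            self._value = None

def parse_options(html):
    parser = OptionParser()
    parser.feed(html)
    parser.close()
    return parser.options

# Hypothetical response fragment
sample = '<option value="010000">DEP A</option><option value="020000"> DEP B </option>'
print(parse_options(sample))
```

With BeautifulSoup, as in the answer, the equivalent is roughly [(o.get("value"), o.get_text(strip=True)) for o in soup.find_all("option")].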


2016-06-08 17:00:24
