Entering a value into a search bar and downloading the output from a web page

I am trying to search a web page (http://www.phillyhistory.org/historicstreets/). I think the relevant source HTML is this:

<input name="txtStreetName" type="text" id="txtStreetName">

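You can see the rest of the source HTML on the site. I want to enter a street name into that text box and download the output (i.e., type 'Jefferson' into the page's search box and see the historic street names for Jefferson). I have tried requests.post, and I also tried appending ?get=Jefferson to the URL as a test, with no luck. Does anyone have an idea how to get this page? Thanks,

Cameron

Here is what I am trying now (some imports are unused because I plan to parse the results later, etc.).

Code: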

import requests
from bs4 import BeautifulSoup
import csv
from string import ascii_lowercase
import codecs
import os.path
import time

arrayofstreets = ['Jefferson']

for each in arrayofstreets:
    url = 'http://www.phillyhistory.org/historicstreets/default.aspx'
    # Only the visible text box is posted here
    payload = {'txtStreetName': each}
    r = requests.post(url, data=payload).content
    outfile = "raw/" + each + ".html"
    # .content is bytes, so the file is opened in binary mode
    with open(outfile, "wb") as code:
        code.write(r)
    time.sleep(2)

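This did not work. It only gave me the default web page (i.e., 'Jefferson' was not entered into the search bar and no results were retrieved).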

Source

2016-06-20 www3

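I assume that by "requests.post" you mean the Python requests module.

Since you did not specify what you want to scrape from the search results, I will just give you a snippet that fetches the HTML for a given search query: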

import requests

query = 'Jefferson'

url = 'http://www.phillyhistory.org/historicstreets/default.aspx'
post_data = {'txtStreetName': query}

# .text gives the decoded response body as a string
html_result = requests.post(url, data=post_data).text

print(html_result)

If you need to process the HTML further to extract some data, I suggest using the Beautiful Soup module for that.

Updated version:

#!/usr/bin/python
import requests
from bs4 import BeautifulSoup
import csv
from string import ascii_lowercase
import codecs
import os.path
import time

def get_post_data(html_soup, query):
    # Hidden ASP.NET fields that must be echoed back with the form
    view_state = html_soup.find('input', {'name': '__VIEWSTATE'})['value']
    event_validation = html_soup.find('input', {'name': '__EVENTVALIDATION'})['value']
    return {'__VIEWSTATE': view_state,
            '__EVENTVALIDATION': event_validation,
            'Textbox1': '',
            'txtStreetName': query,
            'btnSearch': 'Find'}

arrayofstreets = ['Jefferson']

url = 'http://www.phillyhistory.org/historicstreets/default.aspx'
# Fetch the search page once to read its hidden form fields
html = requests.get(url).content
soup = BeautifulSoup(html, 'lxml')
for each in arrayofstreets:
    payload = get_post_data(soup, each)
    r = requests.post(url, data=payload).content
    outfile = "raw/" + each + ".html"
    # .content is bytes, so the file is opened in binary mode
    with open(outfile, "wb") as code:
        code.write(r)
    # Pause between requests, after the file has been closed
    time.sleep(2)

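The problem with my first version (and yours) was that we were not posting all of the required parameters. Besides txtStreetName, the form also sends the hidden __VIEWSTATE and __EVENTVALIDATION fields, Textbox1, and the btnSearch button value. To find out what needs to be sent, open the network monitor in your browser (Ctrl + Shift + Q in Firefox when this was written; newer versions use Ctrl + Shift + E) and perform a search as you normally would. If you select the POST request in the network log, you should see a "Params" tab on the right listing the POST parameters your browser sent.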
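If you would rather not copy each hidden field name out of the network log by hand, one option is to collect every hidden input from the page and merge in the fields you fill in yourself. The following is only a rough sketch of that idea using the standard library's html.parser. The names HiddenFieldCollector, build_payload and sample_html are made up for illustration, and the sample HTML is a shortened stand-in, not the real page.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect the name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            # A hidden input with no value attribute is stored as an empty string.
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_payload(page_html, visible_fields):
    """Merge the page's hidden fields with the fields a user would fill in."""
    collector = HiddenFieldCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)
    payload.update(visible_fields)
    return payload


# Hypothetical, shortened stand-in for the real search page (assumption).
sample_html = """
<form method="post" action="default.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input name="txtStreetName" type="text" id="txtStreetName">
  <input type="submit" name="btnSearch" value="Find" />
</form>
"""

payload = build_payload(
    sample_html, {"txtStreetName": "Jefferson", "btnSearch": "Find"}
)
print(payload)
```

Against the live site you would pass the text of requests.get(url) as page_html and then send the resulting dictionary with requests.post(url, data=payload). Whether the server needs any additional fields is something only the network log can confirm.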

Source

2016-06-20 15:59:56 Dziugas

Hi Dziugas, this is exactly what I tried. I am not getting the correct output. I have edited my question to show my attempt. – www3
