Scraping 'N' pages with BeautifulSoup and Requests (how to get the real page numbers)

I want to get all the titles from the following website:

http://www.shyan.gov.cn/zwhd/web/webindex.action

Right now my code can only scrape a single page successfully. However, I want to scrape all of the pages available on the site above.

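For example, at the URL above, when I click the link to "Page 2", the URL does not change at all. I looked at the page source and saw that the pager moves to the next page with JavaScript links such as javascript:gotopage(2) or javascript:void(0). Here is my code (it gets page 1):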

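from bs4 import BeautifulSoup
import requests

url = 'http://www.shyan.gov.cn/zwhd/web/webindex.action'
r = requests.get(url)
# Parse with the lxml parser (the lxml package must be installed)
soup = BeautifulSoup(r.content, 'lxml')
# Each title is a link directly inside a <td class="tit3"> cell
titles = soup.select('td.tit3 > a')
for title in titles:
    print(title.get_text())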

How can I change my code to scrape the titles from all of the listed pages? Thanks a lot!
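
The javascript:gotopage(N) links do carry the page numbers, even though the URL never changes. A minimal sketch, assuming the pager links appear in the page HTML in exactly that form, that pulls those numbers out with a regular expression (page_numbers is a hypothetical helper name). The largest number found is only the last page the pager links to; if the pager shows a sliding window of pages, it is not the total page count.

```python
import re

# Captures N from pager links such as href="javascript:gotopage(2)"
GOTOPAGE_RE = re.compile(r"gotopage\(\s*(\d+)\s*\)")


def page_numbers(html):
    """Return the sorted, de-duplicated page numbers found in gotopage(N) links."""
    return sorted({int(n) for n in GOTOPAGE_RE.findall(html)})
```

For example, page_numbers(r.text) on the response from the question's code lists the pages the first page links to; links like javascript:void(0) are ignored.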


2016-04-18 champion Ch

Thanks a lot! But I still can't get to the next page. My code is below; please help me fix it.

Try using the following URL format:

http://www.shiyan.gov.cn/zwhd/web/webindex.action?keyWord=&searchType=3&page.currentpage=2&page.pagesize=15&page.pagecount=2357&docStatus=&sendOrg=
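
A minimal sketch of looping over pages with that URL format: build the same query string for each page number and reuse the question's td.tit3 > a selector. It assumes only page.currentpage needs to change between requests and that the server may not care about page.pagecount (2357 is simply the value in the URL above). The host is copied from this URL (shiyan); the question's URL spells it shyan, so use whichever one responds. page_url and scrape_titles are hypothetical names.

```python
from urllib.parse import urlencode

# Host copied from the answer's URL; the question spells it "shyan"
BASE_URL = "http://www.shiyan.gov.cn/zwhd/web/webindex.action"


def page_url(page, page_size=15, page_count=2357):
    """Build the listing URL for one page, with the same parameters as the answer's URL."""
    params = {
        "keyWord": "",
        "searchType": "3",
        "page.currentpage": page,
        "page.pagesize": page_size,
        "page.pagecount": page_count,
        "docStatus": "",
        "sendOrg": "",
    }
    return BASE_URL + "?" + urlencode(params)


def scrape_titles(first_page, last_page):
    """Yield the title text from every page in [first_page, last_page] (network access)."""
    # Third-party imports live here so page_url() needs only the standard library
    import requests
    from bs4 import BeautifulSoup

    for page in range(first_page, last_page + 1):
        r = requests.get(page_url(page), timeout=10)
        r.raise_for_status()
        soup = BeautifulSoup(r.content, "lxml")
        for a in soup.select("td.tit3 > a"):
            yield a.get_text(strip=True)
```

Usage: for title in scrape_titles(1, 3): print(title). page_url(2) reproduces the URL above exactly. For thousands of pages, consider pausing between requests.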

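The site uses JavaScript to send the paging information to the server in hidden form fields when it requests the next page. If you look at the page source, you will find: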

<form action="/zwhd/web/webindex.action" id="searchForm" name="searchForm" method="post"> 
<div class="item"> 
    <div class="titlel"> 
     <span>留言查询</span> 
    <label class="dow"></label> 
    </div> 
    <input type="text" name="keyWord" id="keyword" value="" class="text"/> 
    <div class="key"> 
     <ul> 
      <li><span><input type="radio" checked="checked" value="3" name="searchType"/></span><p>编号</p></li> 
      <li><span><input type="radio" value="2" name="searchType"/></span><p>关键字</p></li> 
     </ul>  
    </div> 
    <input type="button" class="btn1" onclick="search();" value="查询"/> 
    </div> 
    <input type="hidden" id="pageIndex" name="page.currentpage" value="2"/> 
    <input type="hidden" id="pageSize" name="page.pagesize" value="15"/> 
    <input type="hidden" id="pageCount" name="page.pagecount" value="2357"/> 
    <input type="hidden" id="docStatus" name="docStatus" value=""/> 
    <input type="hidden" id="sendorg" name="sendOrg" value=""/> 
    </form>
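
Because the form's method is post, a browser submits these fields in a POST body rather than in the URL. A sketch, assuming the server accepts the same fields when they are posted directly, that collects the form's input fields with the standard library's html.parser (used instead of BeautifulSoup only to avoid a dependency) and re-posts them with a different page.currentpage. FormFieldParser, form_fields and post_page are hypothetical names, and the action host is taken from the answer's URL.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs from the <input> elements of one <form>."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and a.get("id") == self.form_id:
            self.in_form = True
        elif tag == "input" and self.in_form and a.get("name"):
            # Browsers skip radio buttons and checkboxes that are not checked
            if a.get("type") in ("radio", "checkbox") and "checked" not in a:
                return
            self.fields[a["name"]] = a.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def form_fields(html, form_id="searchForm"):
    """Return {name: value} for the named inputs of the form with the given id."""
    parser = FormFieldParser(form_id)
    parser.feed(html)
    parser.close()
    return parser.fields


def post_page(fields, page,
              action="http://www.shiyan.gov.cn/zwhd/web/webindex.action"):
    """POST the form fields with page.currentpage set to `page` (network access)."""
    import requests  # third-party; imported here so the parser above needs only the stdlib

    data = {**fields, "page.currentpage": str(page)}
    return requests.post(action, data=data, timeout=10)
```

On the form above, form_fields returns the same seven parameters that appear in the answer's URL, with searchType taken from the checked radio button and the unnamed button skipped.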


2016-04-18 17:01:15 vassilo

Thanks, this is a nice option. It's easier to understand than Selenium.

@vassilo How did you come up with this URL (turning the hidden form fields into a URL query string)? – Phillip

When I clicked the next-page link, I used Google Chrome's DevTools to inspect the requests the page made. Find the right request and you're good to go. – vassilo
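
Once DevTools has given you one working URL (for example, the page-2 URL in the answer), you can reuse it for every page instead of rebuilding the query string by hand. A minimal sketch with only the standard library, assuming page.currentpage is the only parameter that varies; with_page is a hypothetical helper name, and it only replaces the parameter if the captured URL already contains it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(captured_url, page, page_key="page.currentpage"):
    """Return captured_url with its page parameter replaced, keeping all other params in order."""
    parts = urlsplit(captured_url)
    # keep_blank_values=True preserves empty fields such as keyWord= and sendOrg=
    params = parse_qsl(parts.query, keep_blank_values=True)
    params = [(k, str(page) if k == page_key else v) for k, v in params]
    return urlunsplit(parts._replace(query=urlencode(params)))
```

Keeping the blank values matters here: dropping keyWord=, docStatus= and sendOrg= would send a different request from the one the browser made.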
