Python TypeError: 'NoneType' object is not callable


I am trying to scrape a website and write the data to a CSV file (this part works). I am facing two challenges:

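The data in the CSV file is saved in rows instead of columns.
The website is paginated (1, 2, 3, 4 ... Next), and I cannot walk through all the pages to scrape the data. Data is only scraped from the first page.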

Error:

if last_link.startswith('Next'): 
TypeError: 'NoneType' object is not callable

Code:

import requests 
import csv 
from bs4 import BeautifulSoup 

url = 'http://localhost:8088/wiki.html' 

response = requests.get(url) 
html = response.content 
soup = BeautifulSoup(html) 

table = soup.find('table', {'class' : 'tab_operator'}) 

list_of_rows = [] 
for rows in table.findAll('tr'): 
    list_of_cells = [] 
    for cell in rows.findAll('td'): 
     list_of_links = [] 
     for links in cell.findAll('a'): 
      text = links.text.replace('&nbsp;', '') 
      list_of_links.append(text) 
     list_of_rows.append(list_of_links) 

outfile = open('./outfile.csv', 'w') 
writer = csv.writer(outfile) 
writer.writerows(list_of_rows) 

try: 
    last_link = soup.find('table', {'id' : 'str_nav'}).find_all('a')[-1] 
    if last_link.startswith('Next'): 
     next_url_parts = urllib.parse.urlparse(last_link['href']) 
     url = urllib.parse.urlunparse((base_url_parts.scheme, base_url_parts.netloc, next_url_parts.path, next_url_parts.params, next_url_parts.query, next_url_parts.fragment)) 

except ValueError: 
    print("Oops! Try again...")

Sample HTML from the website:

### Numbers to scrape ### 

<table cellpadding="10" cellspacing="0" border="0" style="margin-top:20px;" class="tab_operator"> 
<tbody><tr> 
<td valign="top"> 
<a href="http://localhost:8088/wiki/9400000">9400000</a><br> 
<a href="http://localhost:8088/wiki/9400001">9400001</a><br> 
</td> 
</tr></tbody> 
</table> 

### Paging Sample Code: ### 

<div class="pstrnav" align="center"> 
<table cellpadding="0" cellspacing="2" border="0" id="str_nav"> 
<tbody> 
<tr> 
<td style="background-color:#f5f5f5;font-weight:bold;">1</td> 
<td><a href="http://localhost:8088/wiki/2">2</a></td> 
<td><a href="http://localhost:8088/wiki/3">3</a></td> 
<td><a href="http://localhost:8088/wiki/4">4</a></td> 
<td><a href="http://localhost:8088/wiki/2">Next &gt;&gt;</a></td> 
<td><a href="http://localhost:8088/wiki/100">Last</a></td> 
</tr> 
</tbody> 
</table> 
</div>


2016-04-23 Overflow

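last_link is a Tag object, not a string. BeautifulSoup treats any attribute name it does not recognise on a tag as a search for a child tag with that name, not as an existing attribute or method. Because the link contains no <startswith> tag, that search returns None, and None is the object you are trying to call: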

>>> last_link = soup.find('table', {'id' : 'str_nav'}).find_all('a')[-1] 
>>> last_link 
<a href="http://localhost:8088/wiki/100">Last</a> 
>>> last_link.startswith is None 
True 
>>> last_link.startswith('Next') 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
TypeError: 'NoneType' object is not callable
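
The same behaviour can be reproduced offline. The following is a minimal sketch, assuming a one-link HTML string modelled on the paging table in the question and the built-in "html.parser" backend (both are choices made for this illustration, not part of the original code):

```python
from bs4 import BeautifulSoup

# Hypothetical one-link snippet modelled on the paging table in the question.
html = '<a href="http://localhost:8088/wiki/100">Last</a>'
soup = BeautifulSoup(html, "html.parser")
last_link = soup.find("a")

# An unknown attribute name is looked up as a child tag.
# The <a> element has no <startswith> child, so the lookup yields None.
missing = last_link.startswith

# Calling that None is what raises the TypeError from the question.
try:
    last_link.startswith("Next")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(missing, "|", error_message)
```

Note that the except ValueError clause in the question would not catch this, since the exception raised is a TypeError.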

You want to test the text the link contains instead:

if last_link.get_text(strip=True).startswith('Next'):

This uses the Tag.get_text() method to access all the text in the link; it keeps working even if the link contains other tags (such as <b> or <i> markup).
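
As a small illustration of why get_text() is the more robust test, here is a sketch that assumes a hypothetical link whose label is wrapped in a <b> tag (the question's real links contain plain text only):

```python
from bs4 import BeautifulSoup

# Hypothetical link with nested markup and surrounding whitespace.
html = '<a href="http://localhost:8088/wiki/2"> <b>Next</b> &gt;&gt; </a>'
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")

# get_text() gathers the text of the tag and all of its descendants;
# strip=True strips whitespace from each piece before joining them.
label = link.get_text(strip=True)
is_next = label.startswith("Next")

# .string, by contrast, is None when a tag has more than one child.
direct_string = link.string

print(is_next, direct_string)
```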

You may want to search for the Next link directly instead:

import re 

table = soup.select_one('table#str_nav') 
last_link = table.find('a', text=re.compile('^Next'))

The regular expression ensures that only a tags whose direct text starts with Next are matched.
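
Building on that search, the pagination part of the question could be handled roughly as follows. This is only a sketch under stated assumptions: next_page_url, fetch and the in-memory PAGES dictionary are hypothetical names introduced here so the example runs without a server, and the two pages are a cut-down version of the paging table from the question. With a real site, fetch would wrap something like requests.get(url).text.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def next_page_url(html, current_url):
    """Return the absolute URL of the 'Next' link, or None on the last page."""
    soup = BeautifulSoup(html, "html.parser")
    nav = soup.find("table", id="str_nav")
    if nav is None:
        return None
    # `string=` is the newer name for the `text=` argument (bs4 4.4.0+).
    link = nav.find("a", string=re.compile(r"^Next"))
    if link is None or not link.get("href"):
        return None
    # urljoin copes with both absolute and relative href values.
    return urljoin(current_url, link["href"])


# Hypothetical two-page site held in memory (assumption for this sketch).
PAGES = {
    "http://localhost:8088/wiki/1": (
        '<table id="str_nav"><tr>'
        '<td><a href="/wiki/2">2</a></td>'
        '<td><a href="/wiki/2">Next &gt;&gt;</a></td>'
        '<td><a href="/wiki/2">Last</a></td>'
        '</tr></table>'
    ),
    "http://localhost:8088/wiki/2": (
        '<table id="str_nav"><tr>'
        '<td><a href="/wiki/1">1</a></td>'
        '<td><a href="/wiki/2">Last</a></td>'
        '</tr></table>'
    ),
}


def fetch(url):
    """Stand-in for an HTTP request; returns the stored HTML for a URL."""
    return PAGES[url]


visited = []
url = "http://localhost:8088/wiki/1"
# Follow 'Next' links until there are none left; the `visited` check
# guards against a page that links back to one already seen.
while url is not None and url not in visited:
    visited.append(url)
    url = next_page_url(fetch(url), url)

print(visited)
```

The per-page scraping from the question would go inside the loop, next to visited.append(url).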


2016-04-23 13:21:22

Made the suggested changes ... "table = soup.select_one('table#str_nav') last_link = table.find('a', text=re.compile('^Next')) if last_link.get_text(strip=True).startswith('Next'):" ... but I still get the error: "table = soup.select_one('table#str_nav') TypeError: 'NoneType' object is not callable" – Overflow

@Overflow: 'select_one' requires BeautifulSoup 4.4.0 or newer (released about 6 months ago, IIRC). –

@Overflow: If your bs4 installation is older, use "table = soup.select('table#str_nav')[0]" instead. –
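
On an older bs4, select_one is presumably looked up as a child tag just like startswith was, which would explain the identical error. A fallback along the lines of the comment above might look like this; first_match is a hypothetical helper name, and the HTML string is a cut-down assumption based on the question's paging table:

```python
from bs4 import BeautifulSoup


def first_match(soup, selector):
    """Return the first element matching a CSS selector, or None."""
    # select() returns a list; indexing [0] directly would raise
    # IndexError when nothing matches, so check for emptiness first.
    matches = soup.select(selector)
    return matches[0] if matches else None


html = '<table id="str_nav"><tr><td><a href="/wiki/2">Next &gt;&gt;</a></td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

table = first_match(soup, "table#str_nav")
absent = first_match(soup, "table#does_not_exist")

print(table["id"], absent)
```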
