TypeError when using regular expressions with BeautifulSoup

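I'm working on some code that reads HTML, parses it with BeautifulSoup, and then uses a regular expression to find some numbers (part of an assignment).

Until now I have used sockets rather than urllib. I know the error comes from the data type (expected string or bytes-like object), but I can't see where I'm missing the encode/decode step I would need when handling data from a socket. The error occurs at my re.findall call.

Beyond a fix, what is causing this problem? And, I guess more importantly, what is the difference between the data types? I seem to be missing something that should feel inherent.

Thanks in advance.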

#Py3 urllib is urllib.request
import urllib.request 

#BeautifulSoup stuff bs4 in Py3 
from bs4 import * 

#Raw Input now input in Py3 
#url = 'http://' + input('Enter - ') 
url = urllib.request.urlopen('http://python-data.dr-chuck.net/comments_42.html') 

html = url.read() 

#html.parser is the parser built into Python. Useful most of the time (according to the web)
soup = BeautifulSoup(html, 'html.parser') 
# Retrieve all of the tags specified 
tags = soup('span') 
for tag in tags:
    # tag is a bs4 Tag object, not a str, so this call raises the TypeError
    print(re.findall('[0-9]+', tag))

Source

2017-02-02 Devin Martin

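What exactly are you doing? `re` is not defined in your code, and your regex matches every run of digits. You need to import the re module for this to work.

It just wasn't copied over; yes, the import is there.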

I've been bitten by this one before: BeautifulSoup returns an object that merely looks like a string when you call print on it.

As a sanity check, try this:

import urllib.request 
from bs4 import * 

url = urllib.request.urlopen('http://python-data.dr-chuck.net/comments_42.html') 
soup = BeautifulSoup(url.read(), 'html.parser') 
single_tag = soup('span')[0] 
print("Type is: \"%s\"; prints as \"%s\"" % (type(single_tag), single_tag)) 
print("As a string: \"%s\"; prints as \"%s\"" % (type(str(single_tag)), str(single_tag)))

This should output:

Type is: "<class 'bs4.element.Tag'>"; prints as "<span class="comment">97</span>"
As a string: "<class 'str'>"; prints as "<span class="comment">97</span>"

So if you wrap `tag` in a str() call before passing it to the regex, that should take care of the problem.
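A minimal sketch of that fix follows. It uses an inline HTML snippet (an assumption standing in for the page fetched in the question) so that it runs without network access. `tag.get_text()` is shown alongside `str(tag)` as an alternative that hands the regex only the text inside the tag rather than the full markup.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
html = (
    '<tr><td>Modu</td><td><span class="comments">97</span></td></tr>'
    '<tr><td>Kenzie</td><td><span class="comments">90</span></td></tr>'
)

soup = BeautifulSoup(html, 'html.parser')

numbers_from_markup = []
numbers_from_text = []
for tag in soup('span'):
    # str(tag) turns the Tag object into its markup string, which re accepts.
    numbers_from_markup.extend(re.findall('[0-9]+', str(tag)))
    # get_text() returns only the text contained in the tag.
    numbers_from_text.extend(re.findall('[0-9]+', tag.get_text()))

print(numbers_from_markup)
print(numbers_from_text)
```

With `str(tag)` the regex also scans attribute values, so `get_text()` is probably the safer choice if the markup itself could contain digits.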

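I've always found that adding a sanity-check print(type(var)) as soon as something starts complaining about an unexpected variable type is a useful debugging technique!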
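The same check also helps with the str/bytes distinction raised in the question. The sketch below uses only the standard library. The `raw` value and the `NotAString` class are illustrative assumptions: `raw` stands in for what a socket or `url.read()` returns, and `NotAString` stands in for any object, such as a bs4 Tag, that prints like a string but is not one.

```python
import re

raw = b'<span class="comments">97</span>'  # bytes, as read from a socket
text = raw.decode('utf-8')                  # str, after decoding

for var in (raw, text):
    print(type(var))

# re accepts str or bytes, as long as the pattern has the same type.
str_matches = re.findall('[0-9]+', text)
bytes_matches = re.findall(b'[0-9]+', raw)
print(str_matches, bytes_matches)


class NotAString:
    """Hypothetical stand-in for an object that only prints like a string."""

    def __str__(self):
        return text


# An object that is neither str nor bytes-like is rejected by re.
try:
    re.findall('[0-9]+', NotAString())
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
    print(exc)

# Converting explicitly with str() makes the same call succeed.
converted_matches = re.findall('[0-9]+', str(NotAString()))
print(converted_matches)
```

So there are two separate questions here: bytes versus str is settled by decoding (or by using a bytes pattern), while an object that is neither has to be converted explicitly before re will accept it.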

Source

2017-02-02 04:22:22 icebooda

The sanity check helped big time. This is my first foray into bs4, and I couldn't tell that print was just throwing the object back at me.
