Unable to get table header elements
In Python, I have a variable holding an HTML table element, obtained like this:
import requests
from lxml import html

page = requests.get('http://www.myPage.com')
tree = html.fromstring(page.content)
table = tree.xpath('//table[@class="list"]')
The table variable has the following content:
<table class="list">
<tr>
<th>Date(s)</th>
<th>Sport</th>
<th>Event</th>
<th>Location</th>
</tr>
<tr>
<td>Jan 18-31</td>
<td>Tennis</td>
<td><a href="tennis-grand-slam/australian-open/index.htm">Australia Open</a></td>
<td>Melbourne, Australia</td>
</tr>
</table>
I want to extract the headers like this:
rows = iter(table)
headers = [col.text for col in next(rows)]
print("headers are:", headers)
However, when I print the headers variable, I get this:
headers are: ['\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ',
'\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ',
'\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ',
'\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ']
How can I extract the headers correctly?
Cannot reproduce the problem (gist.github.com/har07/c693eac57c79c2896881f9b6e2de2202). Can you post simple but complete code that reproduces the problem? – har07
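One possible explanation, offered as a guess rather than something confirmed in this thread: in lxml, xpath() returns a list of matching elements, not a single element. If that is what happens here, next(iter(table)) yields the table element itself, and iterating over it walks the tr rows, whose .text is only the whitespace before their first child. The sketch below uses an inline copy of the table from the question (wrapped in html/body tags so that it parses as a full document) and shows both the whitespace result and one way to reach the th cells. The names SAMPLE, tables, row_texts and header_row are made up for this example.

```python
from lxml import html

# Inline copy of the question's table so the sketch runs without a network
# request. The html/body wrapper is an assumption added for this example.
SAMPLE = """<html><body>
<table class="list">
<tr>
<th>Date(s)</th>
<th>Sport</th>
<th>Event</th>
<th>Location</th>
</tr>
<tr>
<td>Jan 18-31</td>
<td>Tennis</td>
<td><a href="tennis-grand-slam/australian-open/index.htm">Australia Open</a></td>
<td>Melbourne, Australia</td>
</tr>
</table>
</body></html>"""

tree = html.fromstring(SAMPLE)

# xpath() returns a list of matching elements, even when only one matches.
tables = tree.xpath('//table[@class="list"]')

# Same pattern as in the question: next() yields the table element, so the
# loop visits its rows, and each row's .text is whitespace only.
row_texts = [col.text for col in next(iter(tables))]
print("row texts are:", row_texts)

# Take the first matching table, then its first row, then that row's th cells.
table = tables[0]
header_row = table.xpath('.//tr')[0]
headers = [cell.text_content().strip() for cell in header_row.xpath('./th')]
print("headers are:", headers)
```

text_content() is used instead of .text so that a header cell containing nested markup, such as a link, would still yield its visible text.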