Python 3 BeautifulSoup4: selecting specific tags from each <tr>

I am scraping data from an HTML table in this format:

<table> 

    <tr> 
     <th>Name</th> 
     <th>Date</th> 
     <th>Number</th> 
     <th>Address</th> 

    </tr> 

    <tr> 1 

     <td> Name-1 </td> 
     <td> Date-1 </td> 
     <td> Number-1 </td> 
     <td> Address-1 </td> 

    </tr> 

    <tr> 2 

     <td> Name-2 </td> 
     <td> Date-2 </td> 
     <td> Number-2 </td> 
     <td> Address-2 </td> 

    </tr> 

</table>

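It is the only table on the page. I want to store the contents of each <td> tag together with its corresponding <th> header so that I can build a list and eventually save it as CSV. The real data is not numbered like this; the numbers are only for illustration. The table has hundreds of rows, and every row holds the same kind of data in this format.

Basically, I want 'Name' to be the first <td> cell in each <tr> row, 'Date' the second, and so on.

I can't seem to find a way to do this with Python 3 and BeautifulSoup4. I know there is one; I'm just too new to this.

Thanks for all the help, I'm learning a lot.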

2016-10-24 Clive

Assuming the data is uniform, the following basic example should work:

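# Assumes `soup` is a BeautifulSoup object already built from the page HTML
table_rows = soup.find_all("tr")  # list of all <tr> tags
for row in table_rows:
    cells = row.find_all("td")  # list of all <td> tags within a row
    if not cells:  # skip rows without <td> elements, such as the header row
        continue
    # unpack the list of <td> tags into separate variables;
    # this raises ValueError if a row does not have exactly four cells
    name, date, number, address = cells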
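
To get from the loop above to the CSV file the question asks for, one possible approach is to pair each cell with its header and pass the result to the standard `csv` module. The following is only a sketch under some assumptions: the sample `HTML` string, the helper names `table_to_records` and `records_to_csv`, and the use of the built-in `html.parser` are illustrative choices, not part of the original post.

```python
import csv
import io

from bs4 import BeautifulSoup

# Sample markup modelled on the table in the question (placeholder values).
HTML = """
<table>
  <tr><th>Name</th><th>Date</th><th>Number</th><th>Address</th></tr>
  <tr><td> Name-1 </td><td> Date-1 </td><td> Number-1 </td><td> Address-1 </td></tr>
  <tr><td> Name-2 </td><td> Date-2 </td><td> Number-2 </td><td> Address-2 </td></tr>
</table>
"""


def table_to_records(html):
    """Return (headers, records); each record maps a header to a cell's text."""
    soup = BeautifulSoup(html, "html.parser")
    headers = [th.get_text(strip=True) for th in soup.find_all("th")]
    records = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if not cells:  # the header row has no <td> cells
            continue
        values = [td.get_text(strip=True) for td in cells]
        # zip pairs by position; extra cells or headers are silently dropped
        records.append(dict(zip(headers, values)))
    return headers, records


def records_to_csv(headers, records, out):
    """Write the records to any file-like object as CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=headers)
    writer.writeheader()
    writer.writerows(records)


headers, records = table_to_records(HTML)
buffer = io.StringIO()  # in-memory stand-in for a real file
records_to_csv(headers, records, buffer)
print(buffer.getvalue())
```

To write to disk instead, the `StringIO` buffer could be replaced with something like `open("output.csv", "w", newline="")`, where the file name is again just a placeholder. Keying each row by header text rather than unpacking into four fixed variables means the same code keeps working if the table's columns change.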

2016-10-24 16:34:22 sytech
