2014-11-03 113 views

Selenium throws an error for td elements

I am trying to process the following HTML:

<tr style="background-color:LightCyan;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1409009.html" id="SubContent_GridViewPublicationList_A1_0">L & L Exxon: Interim Action Plan and SEPA DNS Available for Review and Comment</a> 
     </td><td>14-09-009</td><td>October 2014</td> 
     </tr><tr style="background-color:White;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1407028.html" id="SubContent_GridViewPublicationList_A1_1">Public Comment Notice: RockTenn Notice of Construction and SEPA Determination of Non-significance</a> 
     </td><td>14-07-028</td><td>October 2014</td> 
     </tr><tr style="background-color:LightCyan;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1406013.html" id="SubContent_GridViewPublicationList_A1_2">Rule Implementation Plan - Chapter 197-11 WAC, State Environmental Policy Act (SEPA) Rules</a> 
     </td><td>14-06-013</td><td>April 2014</td> 
     </tr><tr style="background-color:White;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1406012.html" id="SubContent_GridViewPublicationList_A1_3">Concise Explanatory Statement - Chapter 197-11 WAC, State Environmental Policy Act (SEPA) Rules</a> 
     </td><td>14-06-012</td><td>April 2014</td> 
     </tr><tr style="background-color:LightCyan;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1406011.html" id="SubContent_GridViewPublicationList_A1_4">Rule Adoption Notice</a> 
     </td><td>14-06-011</td><td>April 2014</td> 
     </tr><tr style="background-color:White;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1406010.html" id="SubContent_GridViewPublicationList_A1_5">Final Cost - Benefit and Least Burdensome Alternative Analyses</a> 
     </td><td>14-06-010</td><td>April 2014</td> 
     </tr><tr style="background-color:LightCyan;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1410050.html" id="SubContent_GridViewPublicationList_A1_6">Final Environmental Impact Statement: Management of <i>Zostera Japonica</i> on Commercial Clam Beds in Willapa Bay, Washington</a> 
     </td><td>14-10-050</td><td>March 2014</td> 
     </tr><tr style="background-color:White;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1307049.html" id="SubContent_GridViewPublicationList_A1_7">Public Comment Notice: Weyerhaeuser, Longview Notice of Construction Order SEPA Determination of Non-Significance</a> 
     </td><td>13-07-049</td><td>December 2013</td> 
     </tr><tr style="background-color:LightCyan;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1306004.html" id="SubContent_GridViewPublicationList_A1_8">Focus on SEPA Rulemaking - Updating the State Environmental Policy Act</a> 
     </td><td>13-06-004</td><td>March 2013</td> 
     </tr><tr style="background-color:White;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1309112.html" id="SubContent_GridViewPublicationList_A1_9">Port of Tacoma Kaiser: Interim Cleanup Plans and SEPA Forms Available for Public Comment</a> 
     </td><td>13-09-112</td><td>January 2013</td> 
     </tr><tr style="background-color:LightCyan;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1206021.html" id="SubContent_GridViewPublicationList_A1_10">Final Cost-Benefit and Least Burdensome Alternative Analyses Chapter 197-11 WAC</a> 
     </td><td>12-06-021</td><td>December 2012</td> 
     </tr><tr style="background-color:White;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1206020.html" id="SubContent_GridViewPublicationList_A1_11">SEPA Rule Adoption Notice</a> 
     </td><td>12-06-020</td><td>December 2012</td> 
     </tr><tr style="background-color:LightCyan;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1206017.html" id="SubContent_GridViewPublicationList_A1_12">SEPA Rule Implementation Plan</a> 
     </td><td>12-06-017</td><td>December 2012</td> 
     </tr><tr style="background-color:White;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1206016.html" id="SubContent_GridViewPublicationList_A1_13">SEPA Rule - Concise Explanatory Statement </a> 
     </td><td>12-06-016</td><td>December 2012</td> 
     </tr><tr style="background-color:LightCyan;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1206013.html" id="SubContent_GridViewPublicationList_A1_14">Preliminary Cost-Benefit and Least Burdensome Alternative Analyses, Chapter 197-11 WAC SEPA Rules</a> 
     </td><td>12-06-013</td><td>November 2012</td> 
     </tr><tr style="background-color:White;"> 
      <td> 
      <a href="/ecy/publications/SummaryPages/1206009.html" id="SubContent_GridViewPublicationList_A1_15">Rule Proposal Notice, State Environmental Protection Act (SEPA)</a> 
     </td><td>12-06-009</td><td>October 2012</td> 
     </tr> 

I am trying to extract the td elements using Selenium with Python. This is the code I wrote:

def parse(self, response):
    self.driver.get("https://fortress.wa.gov/ecy/publications/UIPages/PublicationList.aspx?IndexTypeName=Topic&NameValue=SEPA+(State+Environmental+Policy+Act)&DocumentTypeName=Publication")
    # dropdown=Select(self.driver.find_element_by_id("industrydrop"))
    # dropdown.select_by_index(4)
    # sleep(10)
    items = []
    sel = Selector(response)
    sHelper = StringHelper.getStrinHelperObject()
    dHelper = DateHelper.getDateHelperObject()
    # Every tr in the table, including any header row
    sites = self.driver.find_elements_by_css_selector("table#SubContent_GridViewPublicationList tr")
    count = 0
    for site in sites:
        item = EKSpiderItem()
        item['docNumber'] = sHelper.processMyString(site.find_element_by_css_selector("td:nth-child(2)").text)
        item['title'] = sHelper.processMyString(site.find_element_by_css_selector("td:nth-child(1)").text)
        item['publicationDate'] = sHelper.processMyString(site.find_element_by_css_selector("td:nth-child(3)").text)
        items.append(item)
    return items

But the program throws this error:

Message: u'Unable to locate element: {"method":"css selector","selector":"td:nth-child(2)"}' 

I have tried various approaches from "How to use find_element_by_link_text() properly to not raise NoSuchElementException?" and "Unable to locate using find element by link", but nothing works in this case.

Any help would be sincerely appreciated. Thank you.

Answer


The first element of sites does not contain what you are looking for; it is the table's header row:

(Pdb) sites[0].text 
u'Title (link to summary) Number Date (released or updated)' 

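Set the implicit wait timeout to zero, so that a failed lookup raises immediately instead of waiting: self.driver.implicitly_wait(0)

Then either skip the first element or handle the exception: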

from selenium.common.exceptions import NoSuchElementException

for site in sites:
    try:
        # The header row has no matching td, so this lookup raises for it
        results = site.find_element_by_css_selector("td:nth-child(2)").text
    except NoSuchElementException as e:
        print(e)
        continue
    print(results)
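
As an alternative to catching the exception, here is a minimal sketch (not part of the original answer) that relies on the plural find_elements, which returns an empty list when nothing matches instead of raising. The names rows_to_records and CSS are hypothetical. The string "css selector" is assumed to be the value of Selenium's By.CSS_SELECTOR, the dict keys mirror the item fields in the question, and strip() merely stands in for sHelper.processMyString.

```python
CSS = "css selector"  # assumption: same string value as selenium's By.CSS_SELECTOR


def rows_to_records(rows):
    """Build one dict per data row, skipping rows with fewer than three td cells."""
    records = []
    for row in rows:
        # find_elements (plural) returns an empty list when nothing matches,
        # so a header row without td cells is skipped without an exception.
        cells = row.find_elements(CSS, "td")
        if len(cells) < 3:
            continue
        records.append({
            "title": cells[0].text.strip(),
            "docNumber": cells[1].text.strip(),
            "publicationDate": cells[2].text.strip(),
        })
    return records
```

In the spider from the question this could be called as rows_to_records(sites), with each resulting dict copied into an EKSpiderItem. Checking the cell count rather than always dropping sites[0] should also tolerate other rows without data cells, such as a pager or footer row, if the table has any.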