2017-10-16 93 views

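Python: find the second td child of every tr

My goal is to grab all the court case numbers and put them into an Excel file. The case numbers are all in column 2.

My code: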

courtCases = driver.find_elements_by_css_selector('body > table:nth-child(3) > tbody > tr:nth-child* > td:nth-child(2)') 
for courtCase in courtCases: 
    print(courtCase.text) 

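This raises an error:

selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified

I am able to get a single case by giving the exact CSS path or XPath: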

courtCases = driver.find_elements_by_css_selector('body > table:nth-child(3) > tbody > tr:nth-child(7) > td:nth-child(2) > font') 

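I need to collect all the court cases in column 2, i.e. td:nth-child(2).

Anyway, my question is: can anyone help me write a good CSS selector or XPath to get all the court dates?

Some HTML: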

<html> 
<head> 
<title>Wejis - Dayton Municipal Court</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> 
</head> 

<body> 
<table width="750" border="0"> 
    <tr> 
    <td width="185"><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif">Run 
     Date: 10/16/2017 </font></td> 
    <td width="380"><div align="center"> 
     <p><strong><font color="#003399" size="4" face="Verdana, Arial, Helvetica, sans-serif">Housing 
      Docket Report</font></strong></p> 
     <p><strong><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif">Dayton 
      Municipal Court</font></strong></p> 
     </div></td> 
    <td width="185"><div align="right"><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif">Run 
     Time: 12:28 PM</font></div></td> 
    </tr> 
</table> 
<table width="750"> 
    <tr><td colspan="4">&nbsp;</td></tr> 
    <tr> 
     <td width="250"><strong><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif">Court Date: September 20, 2017</font></strong></td> 
     <td width="140"><strong><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif">All Sessions</font></strong></td> 
     <td width="130"><div align="center"><strong><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif">Courtroom 3A</font></strong></div></td> 
     <td width="220"><div align="right"><strong><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif">Judge Deirdre E Logan</font></strong></div></td> 
    </tr> 
</table> 
<table width="750" border="0"> 
    <tr> 
    <td colspan="5"><hr></td> 
    </tr> 

      <tr> 
       <td colspan="2"><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif"><strong>Housing Trial</strong></font></td> 
       <td colspan="3"><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif"><strong> 8:30AM </strong></font></td> 
      </tr> 
      <tr> 
       <td width="140"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Defendant Name</font></strong></td> 
       <td width="120"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Case Number</font></strong></td> 
       <td width="240"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Charges</font></strong></td> 
       <td width="115"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Attorney</font></strong></td> 
       <td width="115"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Location</font></strong></td> 
      </tr> 

       <tr> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Rosal, Jorge</font></td> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">2017-CRM-005695</font></td> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">MAINTAINING EXTERIOR<br></font></td> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif"></font></td> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">1347 Kingsley </font></td> 
       </tr> 

      <tr> 
       <td colspan="2"><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif"><strong>Criminal Court Trial In Jail</strong></font></td> 
       <td colspan="3"><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif"><strong> 9:30AM </strong></font></td> 
      </tr> 
      <tr> 
       <td width="140"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Defendant Name</font></strong></td> 
       <td width="120"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Case Number</font></strong></td> 
       <td width="240"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Charges</font></strong></td> 
       <td width="115"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Attorney</font></strong></td> 
       <td width="115"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Location</font></strong></td> 
      </tr> 

       <tr> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Joyner, Melissa</font></td> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">2017-CRB-000784</font></td> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">DRUG ABUSE INSTRUMENT<br>DRUG PARAPHERNALIA/USE OR POSS<br></font></td> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Jenn A. Cunningham-Minnick</font></td> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">1401 Harshman RD</font></td> 
       </tr> 

       <tr> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Joyner, Melissa</font></td> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">2017-CRM-000775</font></td> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">LITTERING IN PARK<br></font></td> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif"></font></td> 
        <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">1401 Harshman RD</font></td> 
       </tr> 
+0

Have you tried using BeautifulSoup? A related question is here: https://stackoverflow.com/questions/23377533/python-beautifulsoup-parsing-table. You can load the data into a pandas DataFrame and then write out a CSV with DataFrame.to_csv – skrubber

+0

Yes, but I think the OP is leaning toward an XPath expression, which I believe BeautifulSoup doesn't support – Mangohero1

+0

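I tried lxml and had a similar problem with this: import requests; url = raw_input("Enter a website to extract the URLs from: "); r = requests.get("http://" + url); data = r.text; soup = BeautifulSoup(data). I can't get r.text, because of: url = raw_input("Enter a website to extract the URLs from: "); r = requests.get("http://" + url). That needs a URL, and I can only reach the page by clicking a link. If I reload the page I get a 404 error: file or directory not found. I need to work from the page that is already open – John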

Answers

1

Found it with XPath:

courtCases = driver.find_elements_by_xpath('//td[2]/font[@size="1"]') 
for courtCase in courtCases: 
    print(courtCase.text) 

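Notice that all the court case numbers sit in a font element with size="1". If you leave out that attribute test, you will also get the session times, because those are in the second cell of their rows too, but with size="2".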
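The effect of the size test can be checked offline. The following is only a sketch: it runs the same path with the standard library's ElementTree (which supports a limited XPath subset) against a trimmed, well-formed stand-in for the third table, not against the real page. The SAMPLE markup and the helper name texts are assumptions made for illustration. It also writes the result to CSV text, which Excel can open.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the third table in the question
# (an assumption for illustration; most attributes are dropped).
SAMPLE = """
<table>
  <tr>
    <td colspan="2"><font size="2"><strong>Housing Trial</strong></font></td>
    <td colspan="3"><font size="2"><strong> 8:30AM </strong></font></td>
  </tr>
  <tr>
    <td><strong><font size="1">Defendant Name</font></strong></td>
    <td><strong><font size="1">Case Number</font></strong></td>
  </tr>
  <tr>
    <td><font size="1">Rosal, Jorge</font></td>
    <td><font size="1">2017-CRM-005695</font></td>
  </tr>
  <tr>
    <td><font size="1">Joyner, Melissa</font></td>
    <td><font size="1">2017-CRB-000784</font></td>
  </tr>
</table>
"""

root = ET.fromstring(SAMPLE.strip())


def texts(path):
    """Return the stripped text of every element matched by path."""
    return ["".join(el.itertext()).strip() for el in root.findall(path)]


# With the size test: only the case numbers.
with_size = texts(".//td[2]/font[@size='1']")
# Without it: the session time from the section row is matched as well.
without_size = texts(".//td[2]/font")

# Write the case numbers as one-column CSV text.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Case Number"])
writer.writerows([number] for number in with_size)
csv_text = buf.getvalue()
```

The header cells are skipped in both queries because their font element is wrapped in strong, so it is not a direct child of the td. Note that in Selenium 4 the find_elements_by_xpath helper was removed; the equivalent call there is driver.find_elements(By.XPATH, '//td[2]/font[@size="1"]'), with By imported from selenium.webdriver.common.by.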

+0

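Wow! Thanks. I need to work on my XPath skills. I tried a lot of different things. I could get the whole table containing the court cases with '/html/body/table[3]/*' and tried to narrow it down from there, but had no luck, so I tried switching to CSS selectors. – John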

+0

No worries. If you know how to use Chrome's Inspector tool, it's much easier to find. :-) – Mangohero1

1

Your selector has a bit more in it than you need. I found it can be narrowed down to the following.

td:nth-child(2) > font[size='1'] 

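CSS selectors are faster and better supported than XPath, but there are some things, such as locating an element by the text it contains, that only XPath can do.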
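As an illustration of the text-matching case, here is a rough sketch that finds a row by the case number it contains and returns the defendant name from the first cell. It uses the standard library's ElementTree on a trimmed stand-in for the table, so the SAMPLE markup and the function name defendant_for are assumptions, not part of the original page or of Selenium.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for two data rows (an assumption).
SAMPLE = """
<table>
  <tr>
    <td><font size="1">Rosal, Jorge</font></td>
    <td><font size="1">2017-CRM-005695</font></td>
  </tr>
  <tr>
    <td><font size="1">Joyner, Melissa</font></td>
    <td><font size="1">2017-CRB-000784</font></td>
  </tr>
</table>
"""

root = ET.fromstring(SAMPLE.strip())


def defendant_for(case_number):
    """Return the defendant name in the row whose td text equals case_number."""
    # [td='...'] keeps rows that have a td child whose full text matches.
    row = root.find(f".//tr[td='{case_number}']")
    if row is None:
        return None
    # The first td of the row holds the defendant name.
    return "".join(row.find("td").itertext()).strip()
```

With a live driver, a similar predicate such as //tr[td='2017-CRB-000784'] could presumably be passed to an XPath lookup, whereas standard CSS selectors have no way to test an element's text.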

+0

Interesting approach, and thanks for the insight. I figured CSS would be faster, I just wasn't sure how the selector was constructed. +1 – Mangohero1
