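Parsing specific data from HTML

I have already extracted the second table. From that table, I need to extract only the rows whose column[0] contains a file name.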
<TABLE WIDTH="100%" BORDER="1" >
<TR ><TD BGCOLOR="#FFFF99" ROWSPAN="1" COLSPAN="2" WIDTH="70%">Root</TD></TR>
<TR ><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="70%">Functions</TD><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="30%">    10.1% (1077/10647)</TD></TR>
<TR ><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="70%">Functions and exits</TD><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="30%">     9.5% (2142/22473)</TD></TR>
<TR ><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="70%">Statement blocks</TD><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="30%">     9.1% (2191/24167)</TD></TR>
<TR ><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="70%">Decisions</TD><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="30%">     8.8% (2648/29930)</TD></TR>
<TR ><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="70%">Loops</TD><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="30%">     8.4% (305/3628)</TD></TR>
<TR ><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="70%">Basic conditions</TD><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="30%">     8.3% (1759/21254)</TD></TR>
<TR ><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="70%">Modified conditions</TD><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="30%">     1.8% (35/1997)</TD></TR>
<TR ><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="70%">Multiple conditions</TD><TD BGCOLOR="#FFFFCC" ROWSPAN="1" COLSPAN="1" WIDTH="30%">     4.4% (137/3082)</TD></TR>
</TABLE>
</P>
<P ALIGN="LEFT"><BR>
2 - Files list</P>
<BR>
Display absolute values only.<BR>
<TABLE WIDTH="100%" BORDER="1" >
<TR BGCOLOR="#FFFF99"><TD BGCOLOR="#FFFF99" ROWSPAN="1" COLSPAN="1" WIDTH="27%"><b>Item<IMG SRC="cvi_sort_d.png" ALT="cvi_sort_d.xpm"></b></TD><TD BGCOLOR="#FFFF99" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><b>Functions</b></TD><TD BGCOLOR="#FFFF99" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><b>Functions and exits</b></TD><TD BGCOLOR="#FFFF99" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><b>Statement blocks</b></TD><TD BGCOLOR="#FFFF99" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><b>Decisions</b></TD><TD BGCOLOR="#FFFF99" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><b>Loops</b></TD><TD BGCOLOR="#FFFF99" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><b>Basic conditions</b></TD><TD BGCOLOR="#FFFF99" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><b>Modified conditions</b></TD><TD BGCOLOR="#FFFF99" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><b>Multiple conditions</b></TD></TR>
<TR ><TD BGCOLOR="#FF9999" ROWSPAN="1" COLSPAN="1" WIDTH="27%"><B><A NAME="175746848"></A><a href="LOADER.H.html">LOADER.H</a></B></TD><TD BGCOLOR="#FFDFDD" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">0/1</P>
</TD><TD BGCOLOR="#FFDFDD" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">0/2</P>
</TD><TD BGCOLOR="#FFDFDD" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">0/1</P>
</TD><TD BGCOLOR="#FFDFDD" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">0/1</P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD></TR>
<TR ><TD BGCOLOR="#9999FF" ROWSPAN="1" COLSPAN="1" WIDTH="27%"><A NAME="175746912"></A>    <a href="LOADER.H.html">LoaderState_struct</a></TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD></TR>
<TR ><TD BGCOLOR="#9999FF" ROWSPAN="1" COLSPAN="1" WIDTH="27%"><A NAME="175746976"></A>    <a href="LOADER.H.html">LoadParameters_struct</a></TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD></TR>
<TR ><TD BGCOLOR="#9999FF" ROWSPAN="1" COLSPAN="1" WIDTH="27%"><A NAME="175747104"></A>    <a href="LOADER.H.html">LoadOffsets_struct</a></TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD></TR>
<TR ><TD BGCOLOR="#9999FF" ROWSPAN="1" COLSPAN="1" WIDTH="27%"><A NAME="175747168"></A>    <a href="LOADER.H.html">LoadAppComponent_struct</a></TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD></TR>
<TR ><TD BGCOLOR="#FF9999" ROWSPAN="1" COLSPAN="1" WIDTH="27%"><B><A NAME="175746848"></A><a href="CORBA_FIXED.CC.html">CORBA_FIXED.CC</a></B></TD><TD BGCOLOR="#FFDFDD" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">0/1</P>
</TD><TD BGCOLOR="#FFDFDD" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">0/2</P>
</TD><TD BGCOLOR="#FFDFDD" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">0/1</P>
</TD><TD BGCOLOR="#FFDFDD" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">0/1</P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD></TR>
<TR ><TD BGCOLOR="#9999FF" ROWSPAN="1" COLSPAN="1" WIDTH="27%"><A NAME="175746912"></A>    <a href="LOADER.H.html">LoaderState_struct</a></TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD></TR>
<TR ><TD BGCOLOR="#9999FF" ROWSPAN="1" COLSPAN="1" WIDTH="27%"><A NAME="175746976"></A>    <a href="LOADER.H.html">LoadParameters_struct</a></TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD></TR>
<TR ><TD BGCOLOR="#9999FF" ROWSPAN="1" COLSPAN="1" WIDTH="27%"><A NAME="175747104"></A>    <a href="LOADER.H.html">LoadOffsets_struct</a></TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD></TR>
<TR ><TD BGCOLOR="#9999FF" ROWSPAN="1" COLSPAN="1" WIDTH="27%"><A NAME="175747168"></A>    <a href="LOADER.H.html">LoadAppComponent_struct</a></TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD><TD BGCOLOR="#CCCCFF" ROWSPAN="1" COLSPAN="1" WIDTH="9%"><P ALIGN="RIGHT">none </P>
</TD></TR>
</TABLE>
For this I wrote the following Python script:
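from bs4 import BeautifulSoup

# Parse the saved report; naming the parser avoids bs4's "no parser specified" warning
with open("/home/vignesh/Downloads/html/RateDoc.html", "r") as f:
    soup = BeautifulSoup(f, "html.parser")

fl = {'LOADER.H', 'CORBA_FIXED.H'}  # file names of interest (not used below yet)

t = soup.find_all('table')
for table in t[1:]:                 # every table after the first one
    rows = table.find_all('tr')
    for tr in rows[1:]:             # skip the header row
        cols = tr.find_all('td')
        for td in cols:
            text = td.find(string=True)  # first text node inside the cell
            print(text + "\t", end="")
        print()
    print()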
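A likely reason the struct rows come out as "none none ...": `td.find(text=True)` (or `string=True`) returns the first text node in the cell, and in the struct rows that node is the run of spaces between `</A>` and `<a ...>`, not the struct name. The minimal sketch below, using a hypothetical one-cell snippet shaped like the sample, compares it with `get_text(strip=True)`, which strips every text piece in the cell, drops the empty ones and joins the rest.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: first cell of a struct row, same shape as in the report
CELL_HTML = ('<td><A NAME="175746912"></A>    '
             '<a href="LOADER.H.html">LoaderState_struct</a></td>')

td = BeautifulSoup(CELL_HTML, "html.parser").td

# What the script prints for this cell: the first text node, which is only spaces
first_node = td.find(string=True)

# All text in the cell, with whitespace stripped from each piece
full_text = td.get_text(strip=True)

print(repr(str(first_node)))  # whitespace only
print(repr(full_text))        # the struct name
```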
The script above produces the following output:
LOADER.H 0/1 0/2 0/1 0/1 none none none none
none none none none none none none none
none none none none none none none none
none none none none none none none none
none none none none none none none none
CORBA_FIXED.CC 0/1 0/2 0/1 0/1 none none none none
none none none none none none none none
none none none none none none none none
none none none none none none none none
none none none none none none none none
However, that is not the result I expect. I want to extract only the rows for files with the extension *.cc or *.h. Required output:
LOADER.H 0/1 0/2 0/1 0/1 none none none none
CORBA_FIXED.CC 0/1 0/2 0/1 0/1 none none none none
Could someone help me modify the script above so that it extracts only the rows for the extensions *.cc and *.h?
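One possible approach, sketched under the assumption that the file rows are exactly the rows whose column[0] text ends in .cc or .h (compared case-insensitively, since the report shows upper-case names such as LOADER.H): read every cell with `get_text(strip=True)` and keep a row only when its first cell has one of those extensions. `file_rows`, `WANTED_EXTENSIONS` and `SAMPLE_HTML` are hypothetical names, and `SAMPLE_HTML` is a trimmed copy of the sample table, not the real report.

```python
from bs4 import BeautifulSoup

# Extensions to keep; checked case-insensitively (the report upper-cases file names)
WANTED_EXTENSIONS = (".cc", ".h")

# Hypothetical, trimmed copy of the report: summary table, then the "Files list" table
SAMPLE_HTML = """
<TABLE><TR><TD>Root</TD></TR></TABLE>
<TABLE>
<TR><TD><b>Item</b></TD><TD><b>Functions</b></TD><TD><b>Loops</b></TD></TR>
<TR><TD><B><a href="LOADER.H.html">LOADER.H</a></B></TD><TD><P>0/1</P></TD><TD><P>none </P></TD></TR>
<TR><TD>    <a href="LOADER.H.html">LoaderState_struct</a></TD><TD><P>none </P></TD><TD><P>none </P></TD></TR>
<TR><TD><B><a href="CORBA_FIXED.CC.html">CORBA_FIXED.CC</a></B></TD><TD><P>0/1</P></TD><TD><P>none </P></TD></TR>
</TABLE>
"""


def file_rows(html, extensions=WANTED_EXTENSIONS):
    """Yield the stripped cell texts of each row in the second table whose
    column[0] ends with one of the given extensions."""
    soup = BeautifulSoup(html, "html.parser")
    files_table = soup.find_all("table")[1]          # index 1 = second table
    for tr in files_table.find_all("tr")[1:]:        # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # str.endswith accepts a tuple, so one call covers every extension
        if cells and cells[0].lower().endswith(extensions):
            yield cells


for cells in file_rows(SAMPLE_HTML):
    print("\t".join(cells))
```

To run it against the actual report, pass the file contents instead of the sample, e.g. `file_rows(open("/home/vignesh/Downloads/html/RateDoc.html").read())`. In the sample, the file rows are also the only ones whose first cell wraps the link in `<B>`; matching on that would be an alternative, but it depends on the report generator's markup rather than on the file name itself.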