Parsing HTML with PHP

I have a problem: I have to write a parser for a web page. The structure is as follows:
<TABLE WIDTH=80%>
<tr><td colspan=7><BR><BR></td></tr>
<TR>
<Td colspan=7><FONT FACE="arial" align=left><B><A NAME="TEST">Anagrafica</B><br></TH>
</TR>
<tr><td colspan=7></td></tr>
<TR>
<TH ALIGN=LEFT ><FONT COLOR="#AA0000" FACE="arial" SIZE="2">Name</FONT></TH>
<TH></TH>
<TH ALIGN=LEFT ><FONT COLOR="#AA0000" FACE="arial" SIZE="2">Surname</FONT></TH>
<TH></TH>
<TH ALIGN=LEFT ><FONT COLOR="#AA0000" FACE="arial" SIZE="2">ID</FONT></TH>
<TH></TH>
<TH ALIGN=LEFT ><FONT COLOR="#AA0000" FACE="arial" SIZE="2">Code</FONT></TH>
</TR>
<tr>
<TD COLSPAN="7">
<HR SIZE="1" NOSHADE></TD>
<TR>
<TR>
<TD ALIGN="left" VALIGN="TOP" NOWRAP><FONT SIZE="1" FACE="arial">Mario</FONT> </TD>
<TD WIDTH="10"><VALIGN="TOP"><FONT SIZE="1" FACE="arial"> </FONT></TD>
<TD ALIGN="CENTER" VALIGN="TOP" NOWRAP><P ALIGN="CENTER"><FONT SIZE="1" FACE="arial"> Mario </FONT></TD>
<TD WIDTH="10"><VALIGN="TOP"><FONT SIZE="1" FACE="arial"> </FONT></TD>
<TD ALIGN="LEFT" VALIGN="TOP" NOWRAP><FONT SIZE="1" FACE="arial">1</FONT></TD>
<TD WIDTH="10"><VALIGN="TOP"><FONT SIZE="1" FACE="arial">a</FONT></TD>
<TD ALIGN="LEFT" VALIGN="TOP" NOWRAP><FONT SIZE="1" FACE="arial">132</FONT></TD>
<TR>
<TD ALIGN="left" VALIGN="TOP" NOWRAP><FONT SIZE="1" FACE="arial">Mario</FONT> </TD>
<TD WIDTH="10"><VALIGN="TOP"><FONT SIZE="1" FACE="arial"> </FONT></TD>
<TD ALIGN="CENTER" VALIGN="TOP" NOWRAP><P ALIGN="CENTER"><FONT SIZE="1" FACE="arial"> Mario </FONT></TD>
<TD WIDTH="10"><VALIGN="TOP"><FONT SIZE="1" FACE="arial"> </FONT></TD>
<TD ALIGN="LEFT" VALIGN="TOP" NOWRAP><FONT SIZE="1" FACE="arial">1</FONT></TD>
<TD WIDTH="10"><VALIGN="TOP"><FONT SIZE="1" FACE="arial">a</FONT></TD>
<TD ALIGN="LEFT" VALIGN="TOP" NOWRAP><FONT SIZE="1" FACE="arial">132</FONT></TD>
<TR>
I am trying to use this script:
$start = strpos($content,'<Td colspan=7><FONT FACE="arial" align=left><B><A NAME=');
if ($start !== false) {
$end = strpos($content,'</TABLE>',$start) + 8;
$table = substr($content,$start,$end-$start);
preg_match_all("|<TD(.*)</TD>|U",$table,$rows);
$x = 1;
$counter = 1;
echo "<table class=\"TFtable\">";
foreach ($rows[0] as $row){
if ((strpos($row,'<TR')===false)){
preg_match_all("|<TD(.*)</TD>|U",$row,$cells);
$status[$x] = strip_tags($cells[0][0]);
$x = $x+1;
$counter = $counter+1;
}
if ($counter % 7 == 1) {
echo "<tr><td>{$status[2]} - {$status[4]} <br> {$status[6]} - {$status[1]}</td></tr>\n";
$x = 1;
}
}
echo "</table>";
}
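This picks up the data for the four columns, but the last field, $status[1], shows up on the second row, as if it really were part of row 2.
For example,
Mario Rossi 1 213
Mario Bianchi 2 324
is displayed as
Mario Rossi 1
Mario Bianchi 2 213
Where am I going wrong?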
Simple: use [DOM](http://php.net/dom). You shouldn't parse HTML by hand. – 2014-11-25 14:53:02
Obligatory: http://stackoverflow.com/a/1732454/1902010 – ceejayoz 2014-11-25 14:53:54
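
To make the DOM suggestion concrete, here is a minimal sketch using `DOMDocument` and `DOMXPath`. A counter-based regex loop goes wrong as soon as one extra `<TD>` is matched before the data (for example the separator cell that holds the `<HR>`). That would shift every index by one and push the last field onto the next row. This is a plausible reading of the symptom, not something the post confirms. Reading the cells row by row avoids the counter. The function name `extractPeople`, the array keys and the tidied-up sample markup are assumptions made for illustration. The real page has unclosed `<TR>` tags and stray pseudo-tags, so libxml's recovery should be checked against the actual input.

```php
<?php
// Hypothetical helper: returns one associative array per data row.
function extractPeople(string $html): array
{
    $doc = new DOMDocument();
    // Old pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $people = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        // Assumption: data rows have 7 cells (4 fields separated by 3 spacer cells).
        // Header rows (<th>) and the <hr> separator row are skipped by this check.
        if (count($cells) !== 7) {
            continue;
        }
        $people[] = [
            'name'    => $cells[0],
            'surname' => $cells[2],
            'id'      => $cells[4],
            'code'    => $cells[6],
        ];
    }
    return $people;
}

// Simplified, well-formed stand-in for the table in the question.
$html = <<<'HTML'
<table>
<tr><th>Name</th><th></th><th>Surname</th><th></th><th>ID</th><th></th><th>Code</th></tr>
<tr><td colspan="7"><hr></td></tr>
<tr><td>Mario</td><td></td><td>Rossi</td><td></td><td>1</td><td></td><td>213</td></tr>
<tr><td>Mario</td><td></td><td>Bianchi</td><td></td><td>2</td><td></td><td>324</td></tr>
</table>
HTML;

echo "<table class=\"TFtable\">\n";
foreach (extractPeople($html) as $p) {
    // Escape the values before putting them back into HTML.
    $p = array_map('htmlspecialchars', $p);
    echo "<tr><td>{$p['name']} - {$p['surname']} <br> {$p['id']} - {$p['code']}</td></tr>\n";
}
echo "</table>\n";
```

Because each `<tr>` is handled on its own, a row with an unexpected number of cells is skipped rather than shifting the fields of every row after it.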