2012-09-12 68 views
2

Formatting/parsing unconventional XML

I have the following XML:

<XMLResults><ConfMess><RCode>0</RCode><MId>0</MId></ConfMess><COURSE_DATA><THEHEADING>Review Engagements: Inquiry and Analytical Review Procedures and Reporting</THEHEADING><ABSTRACT><!--this file has been generated by v.3.2.1 8/9/2012 8:50:14 AM by JHancock (and called from 'A G&Q Database')--><html><head><title>Course Abstract</title><link rel='stylesheet' href='https://www.thelearningcenter.org/cserver/case1/css/theabstract.css' type='text/css'></head><body><div style='text-align: center;' class=h2banner>Course Abstract</div><div id="tableContainer" class="tableContainer"><table class="abstract"><tbody class="scrollContent"><tr class="abstract"><td class="abstractCaptions">Main Title</td><td class="abstract" id=courseAbstractTitle>Initial Review: Find Out About Additional Reporting Procedures</td></tr><tr class="abstract"><td class="abstractCaptions">Writer(s)</td><td class="abstract" id=authorsAbstract>Karl Booker<br>Harriet Johnson</td></tr><tr class="abstract"><td class="abstractCaptions">Current Field(s) of Study<sup>1</sup></td><td class="abstract" id=fosAbstract>4.0 study hours in 'History'</td></tr><tr class="abstract"><td class="abstractCaptions">Area Of Study</td><td class="abstract" id=courseLevelAbstract>Medium</td></tr><tr class="abstract"><td class="abstractCaptions">Value (30 min.sec.)<sup>1</sup></td><td class="abstract" id=creditHoursAbstract>3.5</td></tr><tr class="abstract"><td class="abstractCaptions">Must Haves</td><td class="abstract" id=prerequisitesAbstract>None</td></tr><tr class="abstract"><td class="abstractCaptions">Description</td><td class="abstract" id=descriptionAbstract>This topic revolves around discussing important topics in the history field and how they relate to our current situation.</td></tr><tr class="abstract"><td class="abstractCaptions">TheObjective</td><td class="abstract" id=objectivesAbstract><ul><li>Learn more about history and how our modern times have been shaped by it.<li>Plan for the future<li>Help mankind to 
learn from the past<li>Provide valuable input to others<li>Be greatful for what we have<li>Gain credit for all the hard work we put in<li>Pass this course and move on with our lives.<li>Get a good job and raise a family.<li>Get a vacation home and relax on the beach<li>Soak up the sun and get a tan</ul></td></tr><tr class="abstract" id=idExpirationRow><td class="abstractCaptions">Expires</td><td class="abstract" id=expirationAbstract>This topic is reviewed monthly for value and modified where needed.</td></tr><tr class="abstract"><td class="abstractCaptions">Item ID</td><td class="abstract" id=courseIDabstract>odt</td></tr></tbody></table></div><div id=footnote1ID class="sylFNote"><sup>1</sup>Consult your instructor for infornation on this particular topic</div><div id="idCopyright" class="copyright">© 2004 THIS SCHOOL BOARD</div></body></html></ABSTRACT></COURSE_DATA><STUDY_AREA><SUBJECT>AuditField</SUBJECT><NUMBER_HOURS>3.0</NUMBER_HOURS></FIELD_OF_STUDY></XMLResults> 

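I can't seem to find a program that will parse out the "stuff" in the <ABSTRACT>stuff</ABSTRACT> portion of the XML. I think this may be due to special characters or something similar. Can someone help me parse this without it failing?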

Answers

0

This is probably because <!-- --> is a comment in XML. It is not a failure in itself.

Comments in XML 

The syntax for writing comments in XML is similar to that of HTML. 

<!-- This is a comment --> 

Here is the reference link.

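How you work around this will depend on the library you are using. Some libraries may support getting the raw text of the element. They may also return a comment element.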

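I would probably just grep the plain text for <ABSTRACT>(.*)</ABSTRACT>. This could be a problem if there are multiple records per document, so you would need to isolate it to each document first.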
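A minimal sketch of that plain-text approach in Python, treating the input as a string rather than as XML. The function name `extract_abstracts` and the shortened `sample` string are assumptions made for illustration. The pattern is also made non-greedy (`.*?`) so that several records in one document do not merge into a single match:

```python
import re

# Non-greedy match, so each <ABSTRACT>...</ABSTRACT> pair is captured on its own.
# re.DOTALL lets "." also match the line breaks inside the embedded HTML.
ABSTRACT_RE = re.compile(r"<ABSTRACT>(.*?)</ABSTRACT>", re.DOTALL)


def extract_abstracts(raw_text):
    """Return the raw text found between every <ABSTRACT> and </ABSTRACT> pair."""
    return ABSTRACT_RE.findall(raw_text)


# Shortened, hypothetical stand-in for the document in the question.
sample = (
    "<XMLResults><COURSE_DATA><ABSTRACT><!--generated--><html><body>"
    "Karl Booker<br>Harriet Johnson</body></html></ABSTRACT></COURSE_DATA>"
    "<COURSE_DATA><ABSTRACT>second\nrecord</ABSTRACT></COURSE_DATA></XMLResults>"
)

abstracts = extract_abstracts(sample)
print(len(abstracts))
for text in abstracts:
    print(repr(text))
```

The captured text is still the embedded HTML, comment included, so it would need an HTML-tolerant step afterwards if only the visible text is wanted.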

+0

Hmm... so how do I go about fixing this?

2

This is not XML. It is a string of text with angle brackets.

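Not only do you have problems inside the <ABSTRACT> element, you also have <STUDY_AREA></FIELD_OF_STUDY>, an opening tag closed by a different element name.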

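How do you fix it? You don't. You get whoever sent you this garbage to send you valid XML. It's not as if there aren't plenty of XML editors out there. They should be using such a tool to create and/or validate their "XML".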
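As a rough illustration of the validation step, here is a small well-formedness check using Python's standard `xml.etree.ElementTree`. The helper name `well_formedness_error` and the shortened test strings are assumptions for this sketch. It only checks that the text parses as XML and does not validate against any schema:

```python
import xml.etree.ElementTree as ET


def well_formedness_error(text):
    """Return None if text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Same mismatch as in the question: <STUDY_AREA> closed by </FIELD_OF_STUDY>.
bad = (
    "<XMLResults><STUDY_AREA><SUBJECT>AuditField</SUBJECT>"
    "</FIELD_OF_STUDY></XMLResults>"
)
# Hypothetical corrected version with matching tag names.
good = (
    "<XMLResults><STUDY_AREA><SUBJECT>AuditField</SUBJECT>"
    "</STUDY_AREA></XMLResults>"
)

print(well_formedness_error(bad))
print(well_formedness_error(good))
```

Running a check like this on the sender's side before the document goes out would catch the mismatched tags, as well as the unquoted attributes and unclosed `<br>` and `<li>` elements inside <ABSTRACT>.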