2014-04-29

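Parsing XML files in Java to extract specific text content

I am parsing XML files that represent research papers/articles and storing them in a MySQL database. The XML schema is as follows: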

<article>
    <article-meta></article-meta>
    <body>
        <p>
        Extensible Markup Language (XML) is a markup language that defines a set of
        rules for encoding documents in a format that is both human-readable and
        machine-readable <ref id="1"/>. It is defined in the XML 1.0 Specification
        produced by the W3C, and several other related specifications.
        </p>
        <p>
        Many application programming interfaces (APIs) have been developed to aid
        software developers with processing XML <ref id="2"/> data, and several schema
        systems exist to aid in the definition of XML-based languages.
        </p>
    </body>
    <back>
        <ref-list>
            <ref id="1">Details about this reference</ref>
            <ref id="2">Details about this reference</ref>
        </ref-list>
    </back>
</article>

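I am parsing the file with a DOM parser. One of the requirements is that, for each ref ID, I have to extract about 150 characters to the left and right of the position where it is cited inside the <body> tag. How can I do this? The desired output is: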

refId   leftText                     rightText
1       150 chars on the left side   150 chars on the right side
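One way to get this result with the DOM parser the question already uses is sketched below: walk <body> depth-first, concatenate its text (collapsing whitespace), record the character offset at which each `<ref>` occurs, then cut up to N characters on each side of that offset. This is a hedged sketch, not the asker's code. It assumes the citation markers in the body are self-closing `<ref id="1"/>` elements with quoted attribute values (unquoted values such as `id = 1` are not well-formed XML and the parser would reject them). The class name `RefContextExtractor` and record `RefContext` are hypothetical, and the record requires Java 16+.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RefContextExtractor {

    /** One output row: the cited id plus the text on either side of the citation. */
    public record RefContext(String refId, String leftText, String rightText) {}

    /** Returns up to `window` characters left and right of every ref inside body. */
    public static List<RefContext> extract(String xml, int window) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<RefContext> rows = new ArrayList<>();
        Node body = doc.getElementsByTagName("body").item(0);
        if (body == null) {
            return rows; // no body element, nothing to extract
        }

        StringBuilder text = new StringBuilder();  // flattened, whitespace-collapsed body text
        List<String> ids = new ArrayList<>();      // id of each ref, in document order
        List<Integer> offsets = new ArrayList<>(); // position of each ref within `text`
        collect(body, text, ids, offsets);

        for (int i = 0; i < ids.size(); i++) {
            int pos = offsets.get(i);
            int from = Math.max(0, pos - window);          // clamp so substring never throws
            int to = Math.min(text.length(), pos + window);
            rows.add(new RefContext(ids.get(i),
                    text.substring(from, pos).strip(),
                    text.substring(pos, to).strip()));
        }
        return rows;
    }

    // Depth-first walk: text nodes are appended, a ref element only records where it occurs.
    private static void collect(Node node, StringBuilder text,
                                List<String> ids, List<Integer> offsets) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                appendCollapsed(text, child.getNodeValue());
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element el = (Element) child;
                if (el.getTagName().equals("ref")) {
                    ids.add(el.getAttribute("id"));
                    offsets.add(text.length());
                } else {
                    collect(el, text, ids, offsets); // e.g. descend into p
                }
            }
        }
    }

    // Appends s, turning every whitespace run into a single space.
    private static void appendCollapsed(StringBuilder sb, String s) {
        for (char c : s.toCharArray()) {
            boolean ws = Character.isWhitespace(c);
            if (ws && (sb.length() == 0 || sb.charAt(sb.length() - 1) == ' ')) {
                continue;
            }
            sb.append(ws ? ' ' : c);
        }
    }
}
```

Calling `RefContextExtractor.extract(xmlString, 150)` should give one row per citation in the body, which could then be written to the database. Because all paragraphs are flattened into one string, the window can run across a paragraph boundary; if that is unwanted, the same walk could be restarted per p element.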

Use XPath. – MadProgrammer
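A minimal sketch of the XPath idea, using the standard javax.xml.xpath API: `//body//ref` selects only the citations in the body (the entries in the back matter are skipped), and `preceding-sibling::text()[1]` / `following-sibling::text()[1]` pick the text nodes directly before and after each marker. This assumes self-closing, quoted `<ref id="1"/>` markers, and the class name `RefXPathDemo` is hypothetical. It returns only the adjacent text nodes, so the context stops at the nearest tag and still has to be trimmed to 150 characters.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RefXPathDemo {

    /**
     * For every ref inside body, returns {id, text just before it, text just after it}.
     * Only the adjacent text nodes are used, so the context stops at tag boundaries.
     */
    public static List<String[]> refsInBody(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList refs = (NodeList) xpath.evaluate("//body//ref",
                    new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < refs.getLength(); i++) {
                Node ref = refs.item(i);
                rows.add(new String[] {
                    xpath.evaluate("@id", ref),
                    // [1] on the reverse preceding-sibling axis is the nearest text node
                    xpath.evaluate("normalize-space(preceding-sibling::text()[1])", ref),
                    xpath.evaluate("normalize-space(following-sibling::text()[1])", ref)
                });
            }
            return rows;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }
}
```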

Answer


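Assuming you have used DOM to get the <ref> element with id = 1 and its content value "Details about this reference", and stored that content in a String variable, you can then use the substring method to get the characters on the left and right side like this: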

String text = "Details about this reference";
String leftText = text.substring(0, 7);                // first 7 chars from the left (use 150 in your case)
String rightText = text.substring(text.length() - 2);  // last 2 chars from the right (use 150 in your case)

Result:

leftText: Details   rightText: ce

Note: check the string length before extracting. If the string is shorter than 150 characters, substring throws a StringIndexOutOfBoundsException.
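The length check from the note can be folded into two small helpers that clamp the index with Math.min / Math.max, so a short string is returned whole instead of throwing. This is a sketch; the class name `SafeSubstring` is hypothetical, and Apache Commons Lang's `StringUtils.left` / `StringUtils.right` offer the same behaviour if that library is already on the classpath.

```java
public class SafeSubstring {

    // Returns at most n characters from the start of s; n must be non-negative.
    public static String left(String s, int n) {
        return s.substring(0, Math.min(n, s.length()));
    }

    // Returns at most n characters from the end of s; n must be non-negative.
    public static String right(String s, int n) {
        return s.substring(Math.max(0, s.length() - n));
    }
}
```

For the question's requirement, the text before a citation would go through `right(before, 150)` (the 150 characters closest to the ref) and the text after it through `left(after, 150)`.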


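I want to extract it from the body. For example, for the <ref> element with id = 1, the left and right text would be "...human-readable and machine-readable" and "It is defined in the XML 1.0 Specification..." – Abhilash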


Have you already extracted the content value of the ref tag? –