Strange characters() behavior with SAX + Java
In my XML I have an element that spans multiple lines:
<tag id="sometag" ...>
| first line
| second line
| third line
| fourth line
<tag ...>
....
<tag id="someothertag" ...>
| ANOTHER FIRST LINE
| ANOTHER SECOND LINE
| ANOTHER THIRD LINE
| ANOTHER FORTH LINE
<tag ...>
Then in Java I have the necessary startElement, endElement, and characters methods, but I'm getting some strange behavior from characters:
public void characters(char[] ch, int start, int length){
    Log.d(TAG, "characters(\"" + (new String(ch)).replaceAll("[\r\n]", "\\n") + "\", " + start + ", " + length + ")");
}
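Other than that, I don't do anything with the characters. I basically create two instances of the parser. In the first instance I'm looking for sometag. When I find what I'm looking for, I throw an exception to return that element.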
D/MyProgram(1565): STARTING document parsing...
D/MyProgram(1565): characters("n ", 0, 1)
D/MyProgram(1565): characters(" | first line", 0, 20)
D/MyProgram(1565): characters("n | first line", 0, 1)
D/MyProgram(1565): characters(" | second line", 0, 23)
D/MyProgram(1565): characters("n | second line", 0, 1)
D/MyProgram(1565): characters(" | third line", 0, 26)
D/MyProgram(1565): characters("n | third line", 0, 1)
D/MyProgram(1565): characters(" | fourth lineline", 0, 22)
D/MyProgram(1565): characters("n | fourth lineline", 0, 1)
D/MyProgram(1565): characters(" | fourth lineline", 0, 4)
D/MyProgram(1565): Successfully found "sometag"!
...and in another, brand-new instance I'm looking for someothertag. I do exactly the same thing as before.
D/MyProgram(1565): STARTING document parsing...
D/MyProgram(1565): characters("n", 0, 1)
D/MyProgram(1565): characters(" ", 0, 4)
D/MyProgram(1565): characters("n ", 0, 1)
D/MyProgram(1565): characters(" | first line", 0, 20)
D/MyProgram(1565): characters("n | first line", 0, 1)
D/MyProgram(1565): characters(" | second line", 0, 23)
D/MyProgram(1565): characters("n | second line", 0, 1)
D/MyProgram(1565): characters(" | third line", 0, 26)
D/MyProgram(1565): characters("n | third line", 0, 1)
D/MyProgram(1565): characters(" | fourth lineline", 0, 22)
D/MyProgram(1565): characters("n | fourth lineline", 0, 1)
D/MyProgram(1565): characters(" | fourth lineline", 0, 4)
D/MyProgram(1565): Successfully found "someothertag"!
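I understand that XML parsing is stream-based (it parses chunks rather than the whole string), but this behavior is very strange. Here are a few things I noticed that really puzzle me: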
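- On each call to characters(), the parser neither starts where it left off nor cleans up the character array once it has finished with it: I even keep getting the very first character of the earlier array ('n', which is the newline character).
- ch contains extra characters that were not there originally: "line" is appended to "fourth line".
- When I create a brand-new parser instance, those same characters are "re-read". The second run should read something like this: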
D/MyProgram(1565): characters("n", 0, 1)
D/MyProgram(1565): characters(" ", 0, 4)
D/MyProgram(1565): characters("n ", 0, 1)
D/MyProgram(1565): characters(" | ANOTHER FIRST LINE", 0, 20)
D/MyProgram(1565): characters("n | ANOTHER SECOND LINE", 0, 1)
...and so on.
Any idea what I'm doing wrong? Thanks in advance.
It looks like you're not respecting start and length. – bmargulies
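To illustrate the comment above: SAX hands characters() a buffer that the parser may reuse between calls, so only the slice from start to start + length is valid; everything else in ch can be leftover data from earlier calls. Below is a minimal sketch of a handler that honors that slice. The names TagTextExample, TagTextHandler, and textOf are made up for illustration, it prints with System.out instead of Android's Log.d, and it assumes the target element does not contain nested tag elements.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TagTextExample {

    /** Hypothetical handler: collects the text of the first <tag> with a given id. */
    static class TagTextHandler extends DefaultHandler {
        private final String wantedId;
        private final StringBuilder text = new StringBuilder();
        private boolean inside = false;
        private boolean done = false;

        TagTextHandler(String wantedId) {
            this.wantedId = wantedId;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if (!done && "tag".equals(qName) && wantedId.equals(attrs.getValue("id"))) {
                inside = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Only ch[start .. start + length) belongs to this call; the rest of
            // the buffer may hold stale data. Text can also arrive in several
            // chunks, so append rather than overwrite.
            if (inside) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Assumes no nested <tag> inside the wanted element.
            if (inside && "tag".equals(qName)) {
                inside = false;
                done = true;
            }
        }

        String getText() {
            return text.toString();
        }
    }

    /** Parses the XML string and returns the text of the <tag> with the given id. */
    static String textOf(String xml, String id) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TagTextHandler handler = new TagTextHandler(id);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.getText();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>"
                + "<tag id=\"sometag\">\n| first line\n| second line\n</tag>"
                + "<tag id=\"someothertag\">\n| ANOTHER FIRST LINE\n</tag>"
                + "</root>";
        System.out.println(textOf(xml, "someothertag"));
    }
}
```

For logging a single call, new String(ch, start, length) gives the same slice as a string. This sketch lets the parse run to the end instead of throwing an exception once the element is found; that is a simplification, not a claim about how the original code should stop.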