Extracting Word Counts From an XML File

(This question is related to a previous question I posted a moment ago; here is the link:

Extracting Values From an XML File Either using XPath, SAX or DOM for this Specific Scenario)

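Keeping the scenario above in mind, the question is: instead of getting sentences, what if I want the words each participant wrote across all of their sentences? For example, if the word "budget" is used ten times in total, participant 'Dolske' used it seven times and the others used it three times. So I need a list of all words along with how many times each participant wrote each one. I would also like a list of the words in each turn.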

What is the best strategy to achieve this? Any sample code?

The XML is attached here (you can also check it in the question mentioned above):

"(495584) Firefox - search suggestions passes wrong previous result to form history"

<Turn> 
    <Date>'2009-06-14 18:55:25'</Date> 
    <From>'Justin Dolske'</From> 
    <Text> 
    <Sentence ID = "3.1"> Created an attachment (id=383211) [details] Patch v.2</Sentence> 
    <Sentence ID = "3.2"> Ah. So, there's a ._formHistoryResult in the....</Sentence> 
    <Sentence ID = "3.3"> The simple fix it to just discard the service's form history result.</Sentence> 
    <Sentence ID = "3.4"> Otherwise it's trying to use a old form history result that no longer applies for the search string.</Sentence> 
    </Text> 
</Turn> 

<Turn> 
    <Date>'2009-06-19 12:07:34'</Date> 
    <From>'Gavin Sharp'</From> 
    <Text> 
    <Sentence ID = "4.1"> (From update of attachment 383211 [details])</Sentence> 
    <Sentence ID = "4.2"> Perhaps we should rename one of them to _fhResult just to reduce confusion?</Sentence> 
    </Text> 
</Turn> 

<Turn> 
    <Date>'2009-06-19 13:17:56'</Date> 
    <From>'Justin Dolske'</From> 
    <Text> 
    <Sentence ID = "5.1"> (In reply to comment #3)</Sentence> 
    <Sentence ID = "5.2"> &amp;gt; (From update of attachment 383211 [details] [details])</Sentence> 
    <Sentence ID = "5.3"> &amp;gt; Perhaps we should rename one of them to _fhResult just to reduce confusion?</Sentence> 
    <Sentence ID = "5.4"> Good point.</Sentence> 
    <Sentence ID = "5.5"> I renamed the one in the wrapper to _formHistResult. </Sentence> 
    <Sentence ID = "5.6"> fhResult seemed maybe a bit too short.</Sentence> 
    </Text> 
</Turn>

... etc.

Any help would be appreciated...

2012-10-29 Skipper07

Did it help? – user

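To get all the values, it is better to use a StAX parser, which suits this kind of task well. Then split all the sentences into words and do whatever you need with them. For example, create a model class Turn that stores the author and the sentences, write a service for that class, and go on from there. :)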
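The StAX approach described above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class name `TurnReader`, the fields of `Turn`, and the `read` method are hypothetical names chosen for illustration, and it assumes the `<Turn>` elements are wrapped in a single root element (the sample in the question does not show one).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class TurnReader {

    /** Hypothetical model class: one <Turn> with its author and sentences. */
    public static class Turn {
        public String from = "";
        public final List<String> sentences = new ArrayList<>();
    }

    /** Reads every <Turn> from the given XML text with a StAX cursor. */
    public static List<Turn> read(String xml) {
        List<Turn> turns = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            Turn current = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("Turn")) {
                        current = new Turn();
                    } else if (name.equals("From") && current != null) {
                        // The sample data wraps names in single quotes; strip
                        // only the leading and trailing quote.
                        current.from = reader.getElementText().trim()
                                .replaceAll("^'|'$", "");
                    } else if (name.equals("Sentence") && current != null) {
                        current.sentences.add(reader.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("Turn")) {
                    turns.add(current);
                    current = null;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return turns;
    }
}
```

Each `Turn` then carries the author and the raw sentences, so the word splitting can happen in a separate step.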

To split a sentence into words, use split() or StringTokenizer, although the tokenizer is not recommended. With split, create a temporary array, like

String[] stringArray = sentence.toString().split(" ");

or something like "sentence.getValue()", and so on.

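The method parameter is a regular expression. In your case it is a single space, because that is what separates the words in a sentence. Then you can go through the resulting array as you need.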
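For the per-participant counts asked about in the question, one possible sketch built on split() is shown here. The class `WordCounter` and its methods are hypothetical names, and two choices are assumptions rather than requirements from the question: words are lower-cased, and non-word characters (as matched by Java's ASCII `\W`) are stripped from both ends of each token. It also splits on `\\s+` instead of a single space so that repeated spaces do not produce empty strings.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class WordCounter {

    /** participant -> (word -> number of occurrences) */
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    /** Splits one sentence into words and credits them to the participant. */
    public void add(String participant, String sentence) {
        Map<String, Integer> perWord =
                counts.computeIfAbsent(participant, k -> new TreeMap<>());
        for (String token : sentence.trim().split("\\s+")) {
            // Assumption: ignore case and leading/trailing punctuation.
            String word = token.toLowerCase(Locale.ROOT)
                    .replaceAll("^\\W+|\\W+$", "");
            if (!word.isEmpty()) {
                perWord.merge(word, 1, Integer::sum);
            }
        }
    }

    /** How often one participant used the given (lower-case) word. */
    public int count(String participant, String word) {
        Map<String, Integer> perWord = counts.get(participant);
        return perWord == null ? 0 : perWord.getOrDefault(word, 0);
    }

    /** How often the given (lower-case) word was used by all participants. */
    public int total(String word) {
        int sum = 0;
        for (Map<String, Integer> perWord : counts.values()) {
            sum += perWord.getOrDefault(word, 0);
        }
        return sum;
    }
}
```

Calling `add` once per sentence, with the turn's author as the participant, fills the nested map; `count` and `total` then answer the "seven of ten" kind of question.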

If you have an ArrayList, use List.toArray() to get your list as an array.
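As a small illustration of the list case: a list of sentences can be converted with toArray, but it can also be iterated directly, which avoids the temporary array. The sketch below assumes the list holds `String` elements; `SentenceWords` and `wordsOf` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SentenceWords {

    /** Returns the words of all sentences, in their original order. */
    public static List<String> wordsOf(List<String> sentenceList) {
        List<String> words = new ArrayList<>();
        for (String sentence : sentenceList) {
            for (String word : sentence.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> sentenceList = new ArrayList<>(
                Arrays.asList(" Good point.", " fhResult seemed maybe a bit too short."));

        // Array view of the list, as suggested above.
        String[] asArray = sentenceList.toArray(new String[0]);
        System.out.println(asArray.length + " sentences");

        // Same data, iterated without the intermediate array.
        System.out.println(wordsOf(sentenceList));
    }
}
```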

2012-10-29 08:23:28 user

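I already have a list of sentences for each participant, ArrayList sentenceList. Is there a way to get all the words from each sentence? Is that hard to do? I am just trying to avoid writing the code again... – Skipper07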

My friend, there are plenty of ways to split a string into words. I will edit my answer. – user
