R tm removeWords with stopwords is not removing stop words

I am using the R tm package and finding that almost none of the tm_map functions that remove elements of text are working for me.

By "working" I mean that, for example, I run:

d <- tm_map(d, removeWords, stopwords('english'))

But then later, when I run

ddtm <- DocumentTermMatrix(d, control = list(
    weighting = weightTfIdf, 
    minWordLength = 2)) 
findFreqTerms(ddtm, 10)

I still get:

[1] the  this

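...and so on, along with a bunch of other stop words.

I see no error indicating that something went wrong. Does anyone know what is going on, how to make stop word removal work correctly, or how to diagnose what I am doing wrong?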

UPDATE

There is an error further up that I didn't catch:

Refreshing GOE props... 
---Registering Weka Editors--- 
Trying to add database driver (JDBC): RmiJdbc.RJDriver - Warning, not in CLASSPATH? 
Trying to add database driver (JDBC): jdbc.idbDriver - Warning, not in CLASSPATH? 
Trying to add database driver (JDBC): org.gjt.mm.mysql.Driver - Warning, not in CLASSPATH? 
Trying to add database driver (JDBC): com.mckoi.JDBCDriver - Warning, not in CLASSPATH? 
Trying to add database driver (JDBC): org.hsqldb.jdbcDriver - Warning, not in CLASSPATH? 
[KnowledgeFlow] Loading properties and plugins... 
[KnowledgeFlow] Initializing KF...

Weka is what removes stop words in tm, right? So could this be my problem?

UPDATE 2

From this, the error appears to be unrelated. It is about databases, not stop words.

Source

2013-02-07 Mittenchops

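Have you tried what is suggested here: https://stat.ethz.ch/pipermail/r-help/2012-February/302479.html? – Ben

Thanks, but it looks like that would only suppress my error messages, not help Weka find the file, right? – Mittenchops

Never mind, it is working. I made the following minimal example: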

library(tm)
data("crude")
crude[[1]]   # first document before stop word removal
j <- Corpus(VectorSource(crude[[1]]))
jj <- tm_map(j, removeWords, stopwords('english'))
jj[[1]]      # same document after stop word removal

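I had been using several tm_map expressions in series. It turned out that the order in which I removed whitespace, punctuation, etc. had concatenated new stop words back in.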
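To illustrate how the order of these steps can matter, here is a minimal sketch on a plain character vector rather than a corpus. The sample sentence and the variable names are made up for illustration, and this is not necessarily the exact situation in the original corpus. It relies on two behaviours of tm: removeWords matches case-sensitively, and the English stop word list contains contractions such as "doesn't" with the apostrophe.

```r
library(tm)

sw  <- stopwords("english")
txt <- "The dog doesn't bark. This is the end."   # hypothetical sample text

# 1. Stop word removal on the raw text: matching is case-sensitive,
#    so the capitalised "The" and "This" are left in place.
as_is <- removeWords(txt, sw)

# 2. Lower-casing first lets "the" and "this" match the stop word list.
lowered <- removeWords(tolower(txt), sw)

# 3. Stripping punctuation before stop word removal turns "doesn't" into
#    "doesnt", which is no longer on the list and therefore survives.
punct_first <- removeWords(removePunctuation(tolower(txt)), sw)

print(as_is)
print(lowered)
print(punct_first)
```

Because DocumentTermMatrix lower-cases terms by default, any capitalised "The" or "This" that survives the first variant would later show up as "the" and "this" in findFreqTerms.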

Source

2013-02-07 18:57:46 Mittenchops

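Yes, it can be fiddly to get these 'tm_map' functions in the right order. I have struggled with stemming in the past and found that reordering helped. Glad you got it sorted out. – Ben

I have basically the same problem with custom words in removeWords. What should the order be? I am running stripWhitespace, removePunctuation, removeWords and stemDocument. I think I will figure it out, but it might be worth updating the solution with the correct order. –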
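The thread does not state a definitive order, so the following is only one ordering that tends to work, sketched under assumptions: the two sample documents are invented, content_transformer requires a reasonably recent tm version, and stemDocument needs the SnowballC package (it is left commented out here for that reason). The idea is to lower-case first, remove stop words while apostrophes are still present, then clean up punctuation and whitespace, and stem last so that stemmed forms are not compared against the unstemmed stop word list.

```r
library(tm)

# Hypothetical sample documents
docs <- c("The cat doesn't like this.",
          "This is the dog bowl, and the cat isn't here!")
corp <- VCorpus(VectorSource(docs))

corp <- tm_map(corp, content_transformer(tolower))       # so "The" matches "the"
corp <- tm_map(corp, removeWords, stopwords("english"))  # while "doesn't" still has its apostrophe
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, stripWhitespace)                    # collapse the gaps left behind
# corp <- tm_map(corp, stemDocument)                     # optional, last; requires SnowballC

dtm <- DocumentTermMatrix(corp)

# Diagnostic: which stop words, if any, are still present as terms?
leftover <- intersect(Terms(dtm), stopwords("english"))
print(leftover)
```

Custom words passed to removeWords can be checked the same way, by intersecting Terms(dtm) with the custom vector instead of stopwords("english").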
