Exporting a Corpus from tm in R

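I'm trying to export a Corpus object from R to static files. The corpus contains synopsis documents created by parsing existing preprocessed files on the file system. The tm package author describes a method for this in his "Introduction to Text Mining in R":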

> writeCorpus(file)

But so far my attempts only produce the following error:

Error in UseMethod("as.PlainTextDocument", x): 
    no applicable method for 'as.PlainTextDocument' applied to an object of class "character"

My script is very simple so far, so I hope this is just a simple oversight. Any suggestions are much appreciated; this seems to be a fringe issue.

# Turn off Java so it doesn't interfere with Weka interface 
Sys.setenv(NOAWT=1) 

# Load required text mining packages 
require(tm) 
require(rJava) 
require(RWeka) 
require(Snowball) 

# Populate a vector with the number of subdirectories in preprocessed dir 
preprocessed <- list.files(path="preprocessed_dir", include.dirs=TRUE, full.names=TRUE) 

# For each element in the vector 
for(i in 1:length(preprocessed)) { 
# Get the files in each subdirectory by appending a number to the absolute path 
    files <- list.files(sprintf("preprocessed_dir/%.0f", i)) 
    # Create a Corpus object of all the files in the subdirectory 
    corpora <- Corpus(VectorSource(files)) 
    # Stem the words in the Corpus object 
    corpora <- tm_map(corpora, SnowballStemmer) 
    # (Try to) write the object to the file system 
    writeCorpus(corpora) 
}

FWIW: calling class(corpora) returns [1] "VCorpus" "Corpus" "list", so the object is clearly not of type character.
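class(corpora) only describes the container, and the container stays a VCorpus. The error message points at the individual documents inside it, since as.PlainTextDocument is reported as being applied to a "character" object. The following is a minimal sketch, assuming a current tm release (0.6 or later, where content_transformer() exists). It checks the class of a single document instead of the whole corpus, and it wraps an ordinary function (toupper, used only as an example) in content_transformer() so that each document stays a PlainTextDocument:

```r
library(tm)

corpora <- VCorpus(VectorSource(c("First document.", "Second document.")))

# class() on the corpus only reports the container type ...
print(class(corpora))
# ... so inspect one document instead; it should be a PlainTextDocument.
print(class(corpora[[1]]))

# toupper() returns a bare character vector; content_transformer() puts the
# result back into each document instead of replacing the document with it.
corpora <- tm_map(corpora, content_transformer(toupper))
print(class(corpora[[1]]))
print(content(corpora[[1]]))
```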


2013-02-27 Ben Piché

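Actually, it looks like 'corpora' is a 'Corpus' object before stemming and becomes a 'character' object after it (which 'writeCorpus' can't write). Looking into ways to coerce it back into a 'Corpus' object! – 2013-02-27 20:25:23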

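Well, I'm a n00b, so I can't answer my own question yet, but here it is: 'tm' converts 'Corpus' objects passed through 'tm_map' calls into 'character' objects. They have to be coerced back into a 'Corpus' object before they can be written to the file system, by calling 'corpora <- Corpus(VectorSource(corpora))'. – 2013-02-27 20:34:30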
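The following sketches the same pipeline against a current tm release (0.7 or later is assumed). It is not the poster's exact 2013 setup. In these releases tm_map() keeps each element a PlainTextDocument when the mapped function is a tm transformation, so the re-wrapping step above should not be needed. The sketch swaps in tm's own stemDocument(), which requires the SnowballC package, in place of SnowballStemmer. It also reads file contents with DirSource(). In the original loop, VectorSource(files) receives the output of list.files(), so each document would hold a file name rather than the file's text. The directories preprocessed_dir and stemmed_dir and the file name synopsis are hypothetical stand-ins created under tempdir():

```r
library(tm)
library(SnowballC)  # stemDocument() uses SnowballC::wordStem() for stemming

# Hypothetical input: two numbered subdirectories with one small file each.
base <- file.path(tempdir(), "preprocessed_dir")
for (i in 1:2) {
  dir.create(file.path(base, i), recursive = TRUE, showWarnings = FALSE)
  writeLines("running dogs jumped", file.path(base, i, "synopsis"))
}

out_base <- file.path(tempdir(), "stemmed_dir")  # hypothetical output root

for (subdir in list.dirs(base, recursive = FALSE)) {
  # DirSource() reads the contents of every file in the subdirectory.
  corpora <- VCorpus(DirSource(subdir))
  # stemDocument is a tm transformation, so documents stay PlainTextDocuments.
  corpora <- tm_map(corpora, stemDocument)
  out_dir <- file.path(out_base, basename(subdir))
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  # writeCorpus() names each file after the document id plus ".txt".
  writeCorpus(corpora, path = out_dir)
}
```

Passing path explicitly also makes it obvious where the files end up. Without it, writeCorpus() writes into the working directory.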

Which directory is the corpus written to? – 2014-04-17 21:44:28
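writeCorpus() has a path argument that defaults to ".", which is the current working directory as reported by getwd(). The optional filenames argument overrides the default names, which are the document ids plus ".txt". The following is a small sketch, assuming tm is installed. The directory corpus_export and the names a.txt and b.txt are hypothetical:

```r
library(tm)

corp <- VCorpus(VectorSource(c("alpha beta", "gamma delta")))

# Hypothetical output directory under tempdir().
out_dir <- file.path(tempdir(), "corpus_export")
dir.create(out_dir, showWarnings = FALSE)

# Without a path argument, files land in the working directory.
old_wd <- setwd(out_dir)
writeCorpus(corp)
setwd(old_wd)
print(list.files(out_dir))

# An explicit path and filenames make the destination unambiguous.
writeCorpus(corp, path = out_dir, filenames = c("a.txt", "b.txt"))
print(readLines(file.path(out_dir, "b.txt")))
```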

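I'm wondering why you want to export the corpus. If you want to show the text to others, you can just use the original text.

If you want to export it and reuse it in R, my suggestion is to use the save() function to save the corpus to an .RData file.

Then, when you want to load it, just use the load() function.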
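The following is a minimal sketch of that suggestion, assuming tm is installed. The file names corpus.RData and corpus.rds are hypothetical. The sketch also shows saveRDS()/readRDS() as an alternative that was not part of the answer. These store a single object, and you choose its name when reading it back, whereas load() restores objects under their original names:

```r
library(tm)

corp <- VCorpus(VectorSource(c("alpha beta", "gamma delta")))

# save()/load(): load() recreates an object called `corp` in the calling environment.
rdata <- file.path(tempdir(), "corpus.RData")
save(corp, file = rdata)
rm(corp)
load(rdata)
print(content(corp[[2]]))

# saveRDS()/readRDS(): one object per file, assigned to any name you like.
rds <- file.path(tempdir(), "corpus.rds")
saveRDS(corp, rds)
corp2 <- readRDS(rds)
print(content(corp2[[1]]))
```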


2016-05-26 13:16:16
