Stem completion in R

I am working on text mining in R. Here are a few documents from my corpus after removing punctuation, numbers, URLs, and stopwords.

myStopwords <- setdiff(myStopwords, c("r", "big")) 
myCorpus <- tm_map(myCorpus, removeWords, myStopwords) 
myCorpus <- tm_map(myCorpus, stripWhitespace) 
myCorpusCopy <- myCorpus 
for (i in c(1:2, 320)) 
{ 
    cat(paste0("[", i, "] ")) 
    writeLines(strwrap(as.character(myCorpus[[i]]), 60)) 
} 

[1] examples calling java code r 
[2] simulating mapreduce r big data analysis using flights data 
rbloggers 
[320] r reference card data mining now cran lists many useful r 
functions packages data mining applications

After that, I applied stemming and stem completion as follows:

myCorpus <- tm_map(myCorpus, stemDocument) 
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=myCorpusCopy)

When I run the for loop again, it prints NA, as shown below:

for (i in c(1:2, 320)) 
{ 
cat(paste0("[", i, "] ")) 
writeLines(strwrap(as.character(myCorpus[[i]]), 60)) 
} 

[1] NA 
[2] NA 
[320] NA

Any idea what I am doing wrong here?

Source

2017-07-02 subro

I reproduced your problem with a built-in dataset:

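library(tm)      # provides the crude dataset, tm_map, stemDocument, stemCompletion
data("crude")

myCorpus <- as.VCorpus(crude)
myCorpusCopy <- myCorpus                      # unstemmed copy, used as the completion dictionary
myCorpus <- tm_map(myCorpus, stemDocument)    # stemming relies on the SnowballC package
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=myCorpusCopy)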

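After the last line, I found that the elements of the myCorpus object no longer have fields such as meta and content in their structure. In my case, each element is now a named character vector.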

You can still access the elements:

myCorpus[[1]]

Diamond Shamrock Corp said that\neffect today it had cut it contract price for crude oil by\n1.50 dlrs a barrel.\n The reduct bring it post price for West Texas\nIntermedi to 16.00 dlrs a barrel, the copani said.\n "The price reduct today was made in the light of falling\noil product price and a weak crude oil market," a company\nspokeswoman said.\n Diamond is the latest in a line of U.S. oil compani that\nhav cut it contract, or posted, price over the last two days\ncit weak oil markets.\n Reuter 
                                                                                                                                 "content" 
                                                                                                                                  <NA> 
                                                                                                                                 "meta"

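However, the as.character() method picks up the opposite part of the elements' new structure (inspect it with str()) from the one you want. The body text now appears to be stored in names.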
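To make the point about names concrete, here is a minimal base R sketch. The vector `doc_like` is a hypothetical stand-in that only mimics the shape described above (text in the names, NA as the value); it is not actual tm output.

```r
# Hypothetical stand-in for one corpus element after the stemCompletion step:
# the text sits in the names, while the value itself is NA.
doc_like <- c("exampl call java code r" = NA_character_)

# as.character() keeps the values and drops the names
values_only <- as.character(doc_like)

# names() returns the part that holds the text
text_only <- names(doc_like)

print(values_only)
print(text_only)
```

This is why a loop that prints as.character(myCorpus[[i]]) can show NA even though the text is still reachable through names().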

I was able to fix the loop like this:

for (i in c(1:2, length(myCorpus))) 
{ 
    cat(paste0("[", i, "] ")) 
    writeLines(strwrap(as.character(names(myCorpus[[i]])), 60)) 
}

[1] Diamond Shamrock Corp said that effect today it had cut it 
contract price for crude oil by 1.50 dlrs a barrel. The 
reduct bring it post price for West Texas Intermedi to 
16.00 dlrs a barrel, the copani said. "The price reduct 
today was made in the light of falling oil product price 
and a weak crude oil market," a company spokeswoman said. 
Diamond is the latest in a line of U.S. oil compani that 
hav cut it contract, or posted, price over the last two 
days cit weak oil markets. Reuter 

[2] OPEC may be forc to meet befor a schedul June session to 
readdress it product cutting agr if the organ want to halt 
the current slide in oil prices, oil industri analyst said. 
"The movement to higher oil price was never to be as easy a 
OPEC thought. They may need an emerg meet to sort out th 
problems," said Daniel Yergin, director of Cambridg Energy 
Research Associates, CERA. Analyst and oil industri sourc 
said the problem OPEC face is excess oil suppli in world 
oil markets. "OPEC problem is not a price problem but a 
production issu and must be address in that way," said Paul 
Mlotok, oil analyst with Salomon Brother Inc. He said the 
market earlier optim about OPEC and its abl to keep product 
under control have given way to a pessimist outlook that 
the organ must address soon if it wish to regain the initi 
in oil prices. But some other analyst were uncertain that 
even an emerg meet would address the problem of OPEC 
production abov the 15.8 mln bpd quota set last December. 
"OPEC has to learn that in a buyer market you cannot have 
deem quotas, fix price and set differentials," said the 
region manag for one of the major oil compani who spoke on 
condit that he not be named. "The market is now tri to 
teach them that lesson again," he added. David T. Mizrahi, 
editor of Mideast reports, expect OPEC to meet befor June, 
although not immediately. However, he is not optimist that 
OPEC can address it princip problems. "They will not meet 
now as they tri to take advantag of the wint demand to sell 
their oil, but in late March and April when demand 
slackens," Mizrahi said. But Mizrahi said that OPEC is 
unlik to do anyth more than reiter it agreement to keep 
output at 15.8 mln bpd." Analyst said that the next two 
month will be critic for OPEC abil to hold togeth price and 
output. "OPEC must hold to it pact for the next six to 
eight weeks sinc buyer will come back into the market 
then," said Dillard Sprigg of Petroleum Analysi Ltd in New 
York. But Bijan Moussavar-Rahmani of Harvard Univers 
Energy and Environ Polici Center said that the demand for 
OPEC oil ha been rise through the first quarter and this 
may have prompt excess in it production. "Demand for their 
(OPEC) oil is clear abov 15.8 mln bpd and is probabl closer 
to 17 mln bpd or higher now so what we ar see character as 
cheat is OPEC meet this demand through current production," 
he told Reuter in a telephon interview. Reuter 
[20] Argentin crude oil product was down 10.8 pct in Januari 
1987 to 12.32 mln barrels, from 13.81 mln barrel in Januari 
1986, Yacimiento Petrolifero Fiscales said. Januari 1987 
natur gas output total 1.15 billion cubic metrers, 3.6 pct 
higher than 1.11 billion cubic metr produced in Januari 
1986, Yacimiento Petrolifero Fiscal added. Reuter
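
Note that the text printed above still contains stems such as "reduct" and "compani", so the names hold the stemmed text rather than completed words. One possible workaround, sketched here under assumptions, is to call stemCompletion() on individual tokens and wrap that in content_transformer() so each document keeps its content and meta fields. The helper `complete_stems` and the two toy documents are hypothetical and only for illustration. The sketch assumes tm and SnowballC are installed and that the text contains no characters that are special in regular expressions.

```r
library(tm)

# Toy documents (hypothetical); the unstemmed corpus doubles as the dictionary
docs <- c("oil companies cut prices", "the company posted lower prices")
dict_corpus <- VCorpus(VectorSource(docs))
stemmed <- tm_map(dict_corpus, stemDocument)  # needs the SnowballC package

# Hypothetical helper: split the text into tokens, complete each stem,
# and fall back to the stem itself when no completion is found
complete_stems <- function(text, dictionary) {
  tokens <- unlist(strsplit(paste(text, collapse = " "), "[[:space:]]+"))
  tokens <- tokens[nzchar(tokens)]
  completed <- unname(stemCompletion(tokens, dictionary = dictionary))
  unmatched <- is.na(completed) | !nzchar(completed)
  completed[unmatched] <- tokens[unmatched]
  paste(completed, collapse = " ")
}

# content_transformer() applies the helper to the content only,
# so every element remains a text document
completed <- tm_map(stemmed, content_transformer(complete_stems),
                    dictionary = dict_corpus)

for (i in seq_along(completed)) {
  cat(paste0("[", i, "] "))
  writeLines(strwrap(as.character(completed[[i]]), 60))
}
```

With this approach the original printing loop based on as.character() should work unchanged, because the documents are not replaced by named character vectors.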

Source

2017-07-02 04:27:42
