2017-08-01 20 views
0

我想从我的系统上传文件夹Shiny App以得到Corpus的Document Term Matrix以应用K-means
我尝试了各种方法来做到这一点,但我无法建立所有上传文件之间建立连接以创建语料库。
我能够通过在全球环境中创建语料库来应用K-means,但是我想通过ShinyApp上传文件夹或选择多个文件来完成此操作。如何上传ShinyApp中的文本文件夹以获取R中文件语料库中的文档术语矩阵?

下面是我做了什么至今代码:我可以上传多个文件

library(shiny) 
library(shinydashboard) 
library(shinythemes) 
library(shinyFiles) 
library(tm) 

ui <- dashboardPage(
    dashboardHeader(title = "Document_Clustering"), 
    dashboardSidebar( 
    sidebarMenu(
     menuItem("Data Processing", tabName = "DP", icon = icon("info-circle")), 
     menuItem("K-Means", tabName = "KMeans", icon = icon("th")) 
)), 
    dashboardBody(
    tabItems(
     tabItem(tabName = "DP", 
     fluidRow(
      box(fileInput('file1', 'Choose Files', 
         accept=c('text/csv', 
           'text/comma-separated-values,text/plain', 
           '.csv'), multiple = TRUE) 
      , solidHeader = TRUE)) 
    ,fluidRow(
    box(title = "Pre-processing", width = 15 ,tableOutput('proc')) 
) 

), 


    tabItem(tabName = "KMeans", 
      fluidRow(
      box(
       title = "Enter Number of Clusters:", 
       selectInput("C", choices =c(seq(1 , 15, 1)),label = NULL ,selected = 1), solidHeader = TRUE 
      )), 
      fluidRow(box(title = "Cluster", width = 9, textOutput("cluster1"))), 
      fluidRow(box(title = "Cluster Size", width = 9, textOutput("size1"))), 
      fluidRow(box(title= "Between Cluster Hetrogeneity" , width=9, textOutput("hetro1"))) 

) 
))) 

server <- shinyServer(function(input, output, session){ 
    myData <- reactive({ 
    inFile <- input$file1 
    if (is.null(inFile)) return(NULL) 

con<- file(inFile$datapath, open="rt", encoding = "UTF-8") 
text<-readLines(con) 
msg<- paste(text, collapse = "\n") 
close(con) 
msg<- msg 


myCorpus <- Corpus(VectorSource(msg)) 
myCorpus <- tm_map(myCorpus, tolower) 
myCorpus <- tm_map(myCorpus, PlainTextDocument) 
myCorpus<- tm_map(myCorpus,removePunctuation) 
myCorpus <- tm_map(myCorpus, removeNumbers) 
myCorpus <- tm_map(myCorpus, removeWords,stopwords("english")) 
myCorpus <- tm_map(myCorpus, stripWhitespace) 
dtm <- DocumentTermMatrix(myCorpus,control = list(minWordLength = 1)) 
dtm_tfxidf <- weightTfIdf(dtm) 
m11 <- as.matrix(dtm_tfxidf) 
ri <- m11 


set.seed(1234) 
### Only kmeans 
n2 <- input$C 
clusk <- kmeans(as.data.frame(ri), n2) #, nstart = 9) 

T3<- list(Name= m11, Cluster_K=clusk$cluster, Size_K= clusk$size, Hetro_K=clusk$betweenss/clusk$totss*100) 
    }) 

    output$proc <- renderTable({ 
    myData()$Name 
    }) 

    output$cluster1 <- renderText({ 
    myData()$Cluster_K 

    }) 

    output$size1 <- renderText({ 
    myData()$Size_K 

    }) 

    output$hetro1 <- renderText({ 
    myData()$Hetro_K 
    }) 

    }) 

shinyApp(ui= ui, server = server) 

使用上面的代码,但我在它进一步加工得到错误。 错误:我无法解决无效的'description'参数
此外,当我只上传单个文件,然后一切似乎工作,但我没有得到为什么群集大小为2的kmeans为单个文件。

任何形式的帮助,非常感谢。
在此先感谢!

回答

0

我们无法使用某些功能连接所有文件,并且该代码中缺少该功能。

为了使ShinyApp做工精细,更改低于服务器部分

替换此

con<- file(inFile$datapath, open="rt", encoding = "UTF-8") 
text<-readLines(con) 
msg<- paste(text, collapse = "\n") 
close(con) 
msg<- msg 

myCorpus <- Corpus(VectorSource(msg)) 

有了这个

get.msg <- function(path) 
{ 
    con <- file(path, open = "rt", encoding = "latin1") 
    text <- readLines(con) 
    msg <- text[seq(which(text == "")[1] + 1, length(text), 1)] 
    close(con) 
    return(paste(msg, collapse = "\n")) 
} 

data.docs <- inFile$datapath 
data.docs <- data.docs[which(data.docs != "cmds")] 
all.data <- sapply(data.docs, 
        function(p) get.msg(file.path(p))) 

myCorpus <- Corpus(VectorSource(all.data)) 
相关问题