I want to upload a folder from my system into a Shiny app, build the document-term matrix of the resulting corpus, and apply k-means to it.
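I have tried several approaches, but I cannot connect all of the uploaded files into a single corpus.
I can apply k-means when I create the corpus in the global environment, but I want to do this by uploading a folder or selecting multiple files through the Shiny app. How can I upload a folder of text files in a Shiny app and get the document-term matrix of the corpus built from those files in R?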
Below is the code I have so far. With it I can select multiple files for upload:
library(shiny)
library(shinydashboard)
library(shinythemes)
library(shinyFiles)
library(tm)

ui <- dashboardPage(
  dashboardHeader(title = "Document_Clustering"),
  dashboardSidebar(
    sidebarMenu(
      menuItem("Data Processing", tabName = "DP", icon = icon("info-circle")),
      menuItem("K-Means", tabName = "KMeans", icon = icon("th"))
    )
  ),
  dashboardBody(
    tabItems(
      tabItem(
        tabName = "DP",
        fluidRow(
          box(
            fileInput("file1", "Choose Files",
                      accept = c("text/csv",
                                 "text/comma-separated-values,text/plain",
                                 ".csv"),
                      multiple = TRUE),
            solidHeader = TRUE
          )
        ),
        fluidRow(
          # Box widths follow the 12-column Bootstrap grid
          box(title = "Pre-processing", width = 12, tableOutput("proc"))
        )
      ),
      tabItem(
        tabName = "KMeans",
        fluidRow(
          box(
            title = "Enter Number of Clusters:",
            selectInput("C", choices = seq(1, 15, 1), label = NULL, selected = 1),
            solidHeader = TRUE
          )
        ),
        fluidRow(box(title = "Cluster", width = 9, textOutput("cluster1"))),
        fluidRow(box(title = "Cluster Size", width = 9, textOutput("size1"))),
        fluidRow(box(title = "Between Cluster Heterogeneity", width = 9, textOutput("hetro1")))
      )
    )
  )
)

server <- shinyServer(function(input, output, session) {
  myData <- reactive({
    inFile <- input$file1
    if (is.null(inFile)) return(NULL)

    # Read the uploaded text; inFile$datapath holds one path per selected file
    con <- file(inFile$datapath, open = "rt", encoding = "UTF-8")
    text <- readLines(con)
    msg <- paste(text, collapse = "\n")
    close(con)

    # Build the corpus and clean it
    myCorpus <- Corpus(VectorSource(msg))
    myCorpus <- tm_map(myCorpus, tolower)
    myCorpus <- tm_map(myCorpus, PlainTextDocument)
    myCorpus <- tm_map(myCorpus, removePunctuation)
    myCorpus <- tm_map(myCorpus, removeNumbers)
    myCorpus <- tm_map(myCorpus, removeWords, stopwords("english"))
    myCorpus <- tm_map(myCorpus, stripWhitespace)

    # Document-term matrix with TF-IDF weighting
    dtm <- DocumentTermMatrix(myCorpus, control = list(minWordLength = 1))
    dtm_tfxidf <- weightTfIdf(dtm)
    m11 <- as.matrix(dtm_tfxidf)
    ri <- m11

    set.seed(1234)
    ### Only kmeans
    n2 <- input$C
    clusk <- kmeans(as.data.frame(ri), n2) #, nstart = 9)

    # Results consumed by the outputs below
    T3 <- list(Name = m11,
               Cluster_K = clusk$cluster,
               Size_K = clusk$size,
               Hetro_K = clusk$betweenss / clusk$totss * 100)
  })

  output$proc <- renderTable({
    myData()$Name
  })
  output$cluster1 <- renderText({
    myData()$Cluster_K
  })
  output$size1 <- renderText({
    myData()$Size_K
  })
  output$hetro1 <- renderText({
    myData()$Hetro_K
  })
})

shinyApp(ui = ui, server = server)
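With the code above I can select multiple files, but the processing that follows fails with an error I cannot resolve: invalid 'description' argument.
Also, when I upload only a single file everything seems to work, but I do not understand why k-means reports a cluster size of 2 for a single file.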
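For reference, base R's file() accepts exactly one path, and passing it a vector of several paths is a known way to trigger the "invalid 'description' argument" error. The following is only a minimal sketch of that behaviour and of reading the files one at a time. It assumes that `paths` stands in for `input$file1$datapath`; the temporary files and their contents are made up for illustration and are not part of my app.

```r
# Hypothetical stand-ins for the temp files Shiny creates on upload;
# in the app, `paths` would be input$file1$datapath.
paths <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines(c("first document", "second line"), paths[1])
writeLines("another document", paths[2])

# file() takes a single path, so handing it the whole vector fails.
# The error message is captured instead of stopping the script.
err <- tryCatch(
  file(paths, open = "rt"),
  error = function(e) conditionMessage(e)
)

# Read each file separately and collapse its lines into one string,
# giving one character element per uploaded file.
docs <- vapply(
  paths,
  function(p) paste(readLines(p, encoding = "UTF-8"), collapse = "\n"),
  character(1),
  USE.NAMES = FALSE
)

length(docs)  # one element per file
```

If this is the right direction, a vector like `docs` could presumably be passed to VectorSource() so that the corpus holds one document per file, but I have not confirmed that this fits the rest of the pipeline.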
Any kind of help is much appreciated.
Thanks in advance!