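How do I get the "Categories" from a Wikipedia page using rvest in R?

I am using rvest in R to get the categories listed at the very bottom of a Wikipedia page. I used SelectorGadget to identify the HTML node that holds the categories. The code I am using is as follows: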
library(rvest)

# Download and parse the page
thepage <- read_html("https://en.wikipedia.org/wiki/San_Diego")
# Select the category box and extract all of its text
Categories <- thepage %>%
  html_nodes("#mw-normal-catlinks") %>%
  html_text()
Categories
The result is as follows:
"Categories: San Diego1769 establishments in California1850 establishments in CaliforniaCities in San Diego County, CaliforniaCounty seats in CaliforniaIncorporated cities and towns in CaliforniaPopulated coastal places in CaliforniaPopulated places established in 1769San Antonio-San Diego Mail LineSan Diego County, CaliforniaSan Diego metropolitan areaSpanish mission settlements in North AmericaSpecial economic zones of the United StatesStagecoach stops in the United States"
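As you can see, there is no delimiter separating the categories: the whole box comes back as a single string. The first category is "San Diego", the second is "1769 establishments in California", and so on. How can I get these categories as a list, or otherwise separate them?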
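
For what it's worth, one direction that might work is to select the individual links inside the category box instead of the box itself, so that html_text() returns one string per category. The sketch below is only an illustration: the inline HTML is a simplified, hypothetical stand-in for Wikipedia's markup (I am assuming each category is an <a> inside a <li> under #mw-normal-catlinks), and it is used here so the example runs without network access.

```r
library(rvest)

# Simplified, hypothetical stand-in for the category box markup (assumption)
html <- paste0(
  '<div id="mw-normal-catlinks">',
  '<a href="/wiki/Help:Category">Categories</a>: ',
  '<ul>',
  '<li><a href="#">San Diego</a></li>',
  '<li><a href="#">1769 establishments in California</a></li>',
  '</ul>',
  '</div>'
)

page <- read_html(html)

# Select each category link separately, so html_text() yields one element per category
categories <- page %>%
  html_nodes("#mw-normal-catlinks ul li a") %>%
  html_text()

categories
```

If the real page is structured the same way, replacing `page` with `thepage` from the code above should give a character vector with one entry per category, and the leading "Categories" label is skipped because it sits outside the <ul>.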