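Importing XML data into R with missing values
I am currently struggling to import data from an XML file into R. The XML file contains multiple records, and I want each record in a single row of a data frame. Example record: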
<rec resultID="5">
<header shortDbName="psyh" longDbName="PsycINFO" uiTerm="2015-99210-426">
<controlInfo>
<bkinfo>
<btl>The impact of zoo live animal presentations on students' propensity to engage in conservation behaviors.</btl>
<aug />
<isbn>9781321491562</isbn>
</bkinfo>
<chapinfo />
<revinfo />
<dissinfo>
<disstl>The impact of zoo live animal presentations on students' propensity to engage in conservation behaviors.</disstl>
</dissinfo>
<jinfo>
<jtl>Dissertation Abstracts International Section A: Humanities and Social Sciences</jtl>
<issn type="Print">04194209</issn>
</jinfo>
<pubinfo>
<dt year="2015" month="01" day="01">20150101</dt>
<vid>76</vid>
<iid>5-A(E)</iid>
</pubinfo>
<artinfo>
<ui type="umi">AAI3671924</ui>
<tig>
<atl>The impact of zoo live animal presentations on students' propensity to engage in conservation behaviors.</atl>
</tig>
<aug>
<au>Kirchgessner, Mandy L.</au>
</aug>
<sug>
<subj type="major">Animals</subj>
<subj type="major">Hope</subj>
<subj type="minor">Conservation (Ecological Behavior)</subj>
<subj type="minor">Outreach Programs</subj>
<subj type="minor">Psychological Development</subj>
</sug>
<ab>Zoos frequently deploy outreach programs, often called "Zoomobiles," to schools; these programs incorporate zoo resources, such as natural artifacts and live animals, in order to teach standardized content and in hopes of inspiring students to protect the environment. Educational research at zoos is relatively rare, and research on their outreach programs is non-existent. This leaves zoos vulnerable to criticisms as they have little to no evidence that their strategies support their missions, which target conservation outcomes. This study seeks to shed light on this gap by analyzing the impact that live animals have on offsite program participants' interests in animals and subsequent conservation outcomes. The theoretical lens is derived from the field of Conservation Psychology, which believes personal connections with nature serve as the motivational component to engagement with conservation efforts. Using pre, post, and delayed surveys combined with Zoomobile presentation observations, I analyzed the roles of sensory experiences in students' (N=197) development of animal interest and conservation behaviors. Results suggest that touching even one animal during presentations has a significant impact on conservation intents and sustainment of those intents. Although results on interest outcomes are conflicting, this study points to ways this kind of research can make significant contributions to zoo learning outcomes. Other significant variables, such as emotional predispositions and animal-related excitement, are discussed in light of future research directions. (PsycINFO Database Record (c) 2015 APA, all rights reserved)</ab>
<pubtype>Dissertation Abstract</pubtype>
<doctype>Dissertation</doctype>
</artinfo>
<language>English</language>
</controlInfo>
<displayInfo>
<pLink>
<url>http://search.ebscohost.com/login.aspx?direct=true&db=psyh&AN=2015-99210-426&site=ehost-live&scope=site</url>
</pLink>
</displayInfo>
</header>
</rec>
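I tried the following approach, but it becomes slow on large datasets. Also, when a node is missing data, I would like the function to return "NA" for the given row/record, but I don't think that can be done with this function?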
title <- xmlToDataFrame(nodes = getNodeSet(xmltop, "//atl"), stringsAsFactors = FALSE)
author <- xmlToDataFrame(nodes = getNodeSet(xmltop, "//artinfo/aug/au[1]"), stringsAsFactors = FALSE)
abstract <- xmlToDataFrame(nodes = getNodeSet(xmltop, "//artinfo/ab[1]"), stringsAsFactors = FALSE)
year <- xmlToDataFrame(nodes = getNodeSet(xmltop, "//pubinfo/dt"), stringsAsFactors = FALSE)
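To illustrate why the column-by-column queries cannot produce "NA" for a missing node, here is a minimal sketch. It assumes a hypothetical two-record sample wrapped in a made-up `<records>` element, where the second record has no `<ab>` node. Each document-wide query returns only the nodes that exist, so the counts no longer line up with the number of records.

```r
library(XML)

# Hypothetical two-record sample; the second record has no <ab> node
doc <- xmlParse('<records>
  <rec><atl>First title</atl><ab>First abstract.</ab></rec>
  <rec><atl>Second title</atl></rec>
</records>', asText = TRUE)

# Count the matches of each document-wide query
n_records   <- length(getNodeSet(doc, "//rec"))
n_titles    <- length(getNodeSet(doc, "//atl"))
n_abstracts <- length(getNodeSet(doc, "//ab"))

# The abstract query returns fewer nodes than there are records,
# and nothing indicates which record the gap belongs to
c(records = n_records, titles = n_titles, abstracts = n_abstracts)
```

Because the queries are independent of each other, the resulting columns cannot be safely bound together once any record is missing a field.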
I tried to follow the instructions in "R dataframe from XML when values are multiple or missing", without success:
doc = xmlParse(file.choose(), useInternalNodes = TRUE)
do.call(rbind, xpathApply(xmltop, "/rec", function(node) {
auth <- xmlValue(node[["artinfo/aug/au[1]"]])
if (is.null(auth)) auth <- NA
year <- xmlValue(node[["//pubinfo/dt"]])
if (is.null(year)) year <- NA
title <- xmlValue(node[["//atl"]])
if (is.null(title)) title <- NA
abstract <- xmlValue(node[["//artinfo/ab[1]"]])
if (is.null(abstract)) abstract <- NA
data.frame(auth, year, title, abstract, stringsAsFactors = FALSE)
}))
I am still not very well acquainted with XPath and R, but I suspect there is some problem with the "node" part above?
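One possible reading of the problem, offered as a sketch rather than a confirmed diagnosis: `[[` on a node looks up a direct child by name instead of evaluating an XPath expression, and `/rec` only matches when `rec` is the document root. The version below loops over the records and runs relative XPath queries (starting with `.`) against each one, returning `NA` when nothing matches. The sample XML, the `<records>` wrapper element, and the `first_value` helper are assumptions for illustration, not part of the original file.

```r
library(XML)

# Hypothetical sample: the second record has no <au> and no <ab>
xml_text <- '<records>
  <rec resultID="1"><header><controlInfo>
    <pubinfo><dt year="2015" month="01" day="01">20150101</dt></pubinfo>
    <artinfo><tig><atl>First title</atl></tig>
      <aug><au>Author, A.</au><au>Author, B.</au></aug>
      <ab>First abstract.</ab></artinfo>
  </controlInfo></header></rec>
  <rec resultID="2"><header><controlInfo>
    <pubinfo><dt year="2014" month="06" day="01">20140601</dt></pubinfo>
    <artinfo><tig><atl>Second title</atl></tig><aug /></artinfo>
  </controlInfo></header></rec>
</records>'

doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: text of the first node matching a relative XPath,
# or NA when the record has no such node
first_value <- function(node, path) {
  hits <- xpathSApply(node, path, xmlValue)
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# One data-frame row per <rec>; the leading "." keeps each query
# inside the current record instead of searching the whole document
rows <- lapply(getNodeSet(doc, "//rec"), function(rec) {
  data.frame(
    auth     = first_value(rec, ".//artinfo/aug/au[1]"),
    year     = first_value(rec, ".//pubinfo/dt"),
    title    = first_value(rec, ".//atl"),
    abstract = first_value(rec, ".//artinfo/ab[1]"),
    stringsAsFactors = FALSE
  )
})

result <- do.call(rbind, rows)
```

As in the original attempt, `year` holds the full text of `<dt>` (for example `20150101`), not just the year. For very large files, building one data frame per record and binding them at the end may still be slow; collecting each column with `vapply` over the records is one alternative worth trying.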
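Do you have a general-purpose language (C#, Java, Perl, PHP, Python, or even VBA with MS Excel/Ac) installed alongside R? These languages can run XSLT, which can restructure the XML into a simpler format for R to import with 'xmlToDataFrame()'. – Parfait
xmlToDataFrame works (I used it above). I have VBA/Python. I tried importing with Excel, but that uses multiple rows per /rec node, whereas I only want one row per node. – user3084100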