2013-05-16

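Importing Infopath .XML format into a data frame in R

What is the best way to import the Infopath .XML format into R and convert it to a data frame? If I open the Infopath .XML file in Excel, the rows and columns of the data frame display correctly.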

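Here is what I have tried in R using the XML package:

  1. I used xmlParse() to parse the XML file
  2. I used xmlToDataFrame() to try to convert the parsed XML file into a data frame

In step 2, however, I receive the following error: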

Error in `[<-.data.frame`(`*tmp*`, i, names(nodes[[i]]), value = c("touch your head13011000", : 
    duplicate subscripts for columns 

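However, when I open the XML file in Excel, there do not appear to be any duplicate columns. How can I convert this XML file from Infopath into a data frame in R? The expected columns should be (as they appear in Excel):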

TCID, DateCoded, tcAge, T1_B3, T1_B2, T1_B1, T1_B0, T1_A3, T1_A2, T1_A1, T1_A0, T1_DelayTotal, T2_A3, T2_A2, T2_A1, T2_A, T2_B3, T2_B2, T2_B1, T2_B0, T2_DelayTotal, Coder, notes_t1, note_t2, bachildpres30, baparpres30, bapassptgo, bapassptnogo, bamissgame, P1_B3, P1_B2, P1_B1, P1_B0, P1_A3, P1_A2, P1_A1, P1_A0, P1_DelayTotal, P1_action, P1_go-nogo, P1_score, P1_delay, P1_trial, P1_Ecommand, P1_imitation, P1_restraint, P1_ruleswitch, P1_trials, P1_gotrials, P1_nogotrials, T1_gotrials, T1_nogotrials, T1_trials, T2_gotrials, T2_nogotrials, T2_trials, P1_notplay, T1_trial, T1_go-nogo, T1_score, T1_delay, T1_action, T2_trial, T2_go-nogo, T2_score, T2_delay, T2_action 

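For variables that occur multiple times in the XML file, I would like them in long format (i.e., multiple data frame rows for the same variable). I don't have much experience with XML files, so I would greatly appreciate your help.

Here is what the parsed XML file looks like in R when I use xmlParse: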

<my:myFields lang="en-us" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:st="urn:schemas-microsoft-com:office:smarttags" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2009-07-01T18:12:59" xmlns:xd="http://schemas.microsoft.com/office/infopath/2003"> 
<my:SPSS> 
    <my:TCID>10</my:TCID> 
    <my:DateCoded>2013-04-01</my:DateCoded> 
    <my:tcAge>30</my:tcAge> 
    <my:T1_B3>6</my:T1_B3> 
    <my:T1_B2>0</my:T1_B2> 
    <my:T1_B1>0</my:T1_B1> 
    <my:T1_B0>0</my:T1_B0> 
    <my:T1_A3>0</my:T1_A3> 
    <my:T1_A2>0</my:T1_A2> 
    <my:T1_A1>1</my:T1_A1> 
    <my:T1_A0>5</my:T1_A0> 
    <my:T1_DelayTotal>1</my:T1_DelayTotal> 
    <my:T2_A3 nil="true"/> 
    <my:T2_A2 nil="true"/> 
    <my:T2_A1 nil="true"/> 
    <my:T2_A0 nil="true"/> 
    <my:T2_B3 nil="true"/> 
    <my:T2_B2 nil="true"/> 
    <my:T2_B1 nil="true"/> 
    <my:T2_B0 nil="true"/> 
    <my:T2_DelayTotal nil="true"/> 
    <my:Coder>Name</my:Coder> 
</my:SPSS> 
<my:notes_t1/> 
<my:note_t2/> 
<my:bachildpres30>0</my:bachildpres30> 
<my:baparpres30>0</my:baparpres30> 
<my:bapassptgo>1</my:bapassptgo> 
<my:bapassptnogo>0</my:bapassptnogo> 
<my:bamissgame>0</my:bamissgame> 
<my:P1_B3>4</my:P1_B3> 
<my:P1_B2>0</my:P1_B2> 
<my:P1_B1>0</my:P1_B1> 
<my:P1_B0>1</my:P1_B0> 
<my:P1_A3>0</my:P1_A3> 
<my:P1_A2>0</my:P1_A2> 
<my:P1_A1>1</my:P1_A1> 
<my:P1_A0>3</my:P1_A0> 
<my:P1_DelayTotal>0</my:P1_DelayTotal> 
<my:group2> 
    <my:group3> 
    <my:P1_action>touch your head</my:P1_action> 
    <my:P1_go-nogo>1</my:P1_go-nogo> 
    <my:P1_score>3</my:P1_score> 
    <my:P1_delay>0</my:P1_delay> 
    <my:P1_trial>1</my:P1_trial> 
    <my:P1_Ecommand>1</my:P1_Ecommand> 
    <my:P1_imitation>0</my:P1_imitation> 
    <my:P1_restraint>0</my:P1_restraint> 
    <my:P1_ruleswitch>0</my:P1_ruleswitch> 
    </my:group3> 
    <my:group3> 
    <my:P1_action>touch your nose</my:P1_action> 
    <my:P1_go-nogo>1</my:P1_go-nogo> 
    <my:P1_score>3</my:P1_score> 
    <my:P1_delay>0</my:P1_delay> 
    <my:P1_trial>2</my:P1_trial> 
    <my:P1_Ecommand>1</my:P1_Ecommand> 
    <my:P1_imitation>0</my:P1_imitation> 
    <my:P1_restraint>0</my:P1_restraint> 
    <my:P1_ruleswitch>0</my:P1_ruleswitch> 
    </my:group3> 
    <my:group3> 
    <my:P1_action>touch your tummy</my:P1_action> 
    <my:P1_go-nogo>1</my:P1_go-nogo> 
    <my:P1_score>3</my:P1_score> 
    <my:P1_delay>0</my:P1_delay> 
    <my:P1_trial>3</my:P1_trial> 
    <my:P1_Ecommand>1</my:P1_Ecommand> 
    <my:P1_imitation>0</my:P1_imitation> 
    <my:P1_restraint>0</my:P1_restraint> 
    <my:P1_ruleswitch>0</my:P1_ruleswitch> 
    </my:group3> 
    <my:group3> 
    <my:P1_action>touch your head</my:P1_action> 
    <my:P1_go-nogo>1</my:P1_go-nogo> 
    <my:P1_score>0</my:P1_score> 
    <my:P1_delay>0</my:P1_delay> 
    <my:P1_trial>4</my:P1_trial> 
    <my:P1_Ecommand>0</my:P1_Ecommand> 
    <my:P1_imitation>0</my:P1_imitation> 
    <my:P1_restraint>0</my:P1_restraint> 
    <my:P1_ruleswitch>0</my:P1_ruleswitch> 
    </my:group3> 
    <my:group3> 
    <my:P1_action>touch your head</my:P1_action> 
    <my:P1_go-nogo>1</my:P1_go-nogo> 
    <my:P1_score>3</my:P1_score> 
    <my:P1_delay>0</my:P1_delay> 
    <my:P1_trial>5</my:P1_trial> 
    <my:P1_Ecommand>0</my:P1_Ecommand> 
    <my:P1_imitation>0</my:P1_imitation> 
    <my:P1_restraint>0</my:P1_restraint> 
    <my:P1_ruleswitch>0</my:P1_ruleswitch> 
    </my:group3> 
    <my:group3> 
    <my:P1_action>touch your nose</my:P1_action> 
    <my:P1_go-nogo>1</my:P1_go-nogo> 
    <my:P1_score>3</my:P1_score> 
    <my:P1_delay>0</my:P1_delay> 
    <my:P1_trial>6</my:P1_trial> 
    <my:P1_Ecommand>0</my:P1_Ecommand> 
    <my:P1_imitation>0</my:P1_imitation> 
    <my:P1_restraint>0</my:P1_restraint> 
    <my:P1_ruleswitch>0</my:P1_ruleswitch> 
    </my:group3> 
    <my:group3> 
    <my:P1_action>clap your hands</my:P1_action> 
    <my:P1_go-nogo>1</my:P1_go-nogo> 
    <my:P1_score>3</my:P1_score> 
    <my:P1_delay>0</my:P1_delay> 
    <my:P1_trial>7</my:P1_trial> 
    <my:P1_Ecommand>0</my:P1_Ecommand> 
    <my:P1_imitation>0</my:P1_imitation> 
    <my:P1_restraint>0</my:P1_restraint> 
    <my:P1_ruleswitch>0</my:P1_ruleswitch> 
    </my:group3> 
    <my:group3> 
    <my:P1_action>touch your nose</my:P1_action> 
    <my:P1_go-nogo>0</my:P1_go-nogo> 
    <my:P1_score>0</my:P1_score> 
    <my:P1_delay>0</my:P1_delay> 
    <my:P1_trial>8</my:P1_trial> 
    <my:P1_Ecommand>0</my:P1_Ecommand> 
    <my:P1_imitation>0</my:P1_imitation> 
    <my:P1_restraint>0</my:P1_restraint> 
    <my:P1_ruleswitch>0</my:P1_ruleswitch> 
    </my:group3> 
    <my:group3> 
    <my:P1_action>touch your ears</my:P1_action> 
    <my:P1_go-nogo>0</my:P1_go-nogo> 
    <my:P1_score>0</my:P1_score> 
    <my:P1_delay>0</my:P1_delay> 
    <my:P1_trial>9</my:P1_trial> 
    <my:P1_Ecommand>0</my:P1_Ecommand> 
    <my:P1_imitation>0</my:P1_imitation> 
    <my:P1_restraint>0</my:P1_restraint> 
    <my:P1_ruleswitch>0</my:P1_ruleswitch> 
    </my:group3> 
    <my:group3> 
    <my:P1_action>touch your tummy</my:P1_action> 
    <my:P1_go-nogo>0</my:P1_go-nogo> 
    <my:P1_score>0</my:P1_score> 
    <my:P1_delay>0</my:P1_delay> 
    <my:P1_trial>10</my:P1_trial> 
    <my:P1_Ecommand>0</my:P1_Ecommand> 
    <my:P1_imitation>0</my:P1_imitation> 
    <my:P1_restraint>0</my:P1_restraint> 
    <my:P1_ruleswitch>0</my:P1_ruleswitch> 
    </my:group3> 
    <my:group3> 
    <my:P1_action>touch your eyes</my:P1_action> 
    <my:P1_go-nogo>0</my:P1_go-nogo> 
    <my:P1_score>1</my:P1_score> 
    <my:P1_delay>0</my:P1_delay> 
    <my:P1_trial>11</my:P1_trial> 
    <my:P1_Ecommand>0</my:P1_Ecommand> 
    <my:P1_imitation>0</my:P1_imitation> 
    <my:P1_restraint>1</my:P1_restraint> 
    <my:P1_ruleswitch>0</my:P1_ruleswitch> 
    </my:group3> 
    <my:group3> 
    <my:P1_action>touch your eyes</my:P1_action> 
    <my:P1_go-nogo>1</my:P1_go-nogo> 
    <my:P1_score>3</my:P1_score> 
    <my:P1_delay>0</my:P1_delay> 
    <my:P1_trial>12</my:P1_trial> 
    <my:P1_Ecommand>0</my:P1_Ecommand> 
    <my:P1_imitation>0</my:P1_imitation> 
    <my:P1_restraint>0</my:P1_restraint> 
    <my:P1_ruleswitch>0</my:P1_ruleswitch> 
    </my:group3> 
</my:group2> 
<my:P1_trials>9</my:P1_trials> 
<my:P1_gotrials>5</my:P1_gotrials> 
<my:P1_nogotrials>4</my:P1_nogotrials> 
<my:T1_gotrials>6</my:T1_gotrials> 
<my:T1_nogotrials>6</my:T1_nogotrials> 
<my:T1_trials>12</my:T1_trials> 
<my:T2_gotrials>0</my:T2_gotrials> 
<my:T2_nogotrials>0</my:T2_nogotrials> 
<my:T2_trials>0</my:T2_trials> 
<my:P1_notplay/> 
<my:group4> 
    <my:group5> 
    <my:T1_trial>1</my:T1_trial> 
    <my:T1_go-nogo>1</my:T1_go-nogo> 
    <my:T1_score>3</my:T1_score> 
    <my:T1_delay>1</my:T1_delay> 
    <my:T1_action>Touch your tongue</my:T1_action> 
    </my:group5> 
    <my:group5> 
    <my:T1_trial>2</my:T1_trial> 
    <my:T1_go-nogo>1</my:T1_go-nogo> 
    <my:T1_score>3</my:T1_score> 
    <my:T1_delay>0</my:T1_delay> 
    <my:T1_action>Touch your teeth</my:T1_action> 
    </my:group5> 
    <my:group5> 
    <my:T1_trial>3</my:T1_trial> 
    <my:T1_go-nogo>0</my:T1_go-nogo> 
    <my:T1_score>0</my:T1_score> 
    <my:T1_delay>0</my:T1_delay> 
    <my:T1_action>Touch your ear</my:T1_action> 
    </my:group5> 
    <my:group5> 
    <my:T1_trial>4</my:T1_trial> 
    <my:T1_go-nogo>1</my:T1_go-nogo> 
    <my:T1_score>3</my:T1_score> 
    <my:T1_delay>0</my:T1_delay> 
    <my:T1_action>Clap your hands</my:T1_action> 
    </my:group5> 
    <my:group5> 
    <my:T1_trial>5</my:T1_trial> 
    <my:T1_go-nogo>0</my:T1_go-nogo> 
    <my:T1_score>0</my:T1_score> 
    <my:T1_delay>0</my:T1_delay> 
    <my:T1_action>Clap your hands</my:T1_action> 
    </my:group5> 
    <my:group5> 
    <my:T1_trial>6</my:T1_trial> 
    <my:T1_go-nogo>0</my:T1_go-nogo> 
    <my:T1_score>0</my:T1_score> 
    <my:T1_delay>0</my:T1_delay> 
    <my:T1_action>Touch your eyes</my:T1_action> 
    </my:group5> 
    <my:group5> 
    <my:T1_trial>7</my:T1_trial> 
    <my:T1_go-nogo>0</my:T1_go-nogo> 
    <my:T1_score>0</my:T1_score> 
    <my:T1_delay>0</my:T1_delay> 
    <my:T1_action>Touch your feet</my:T1_action> 
    </my:group5> 
    <my:group5> 
    <my:T1_trial>8</my:T1_trial> 
    <my:T1_go-nogo>1</my:T1_go-nogo> 
    <my:T1_score>3</my:T1_score> 
    <my:T1_delay>0</my:T1_delay> 
    <my:T1_action>Touch your nose</my:T1_action> 
    </my:group5> 
    <my:group5> 
    <my:T1_trial>9</my:T1_trial> 
    <my:T1_go-nogo>0</my:T1_go-nogo> 
    <my:T1_score>1</my:T1_score> 
    <my:T1_delay>0</my:T1_delay> 
    <my:T1_action>Touch your nose</my:T1_action> 
    </my:group5> 
    <my:group5> 
    <my:T1_trial>10</my:T1_trial> 
    <my:T1_go-nogo>1</my:T1_go-nogo> 
    <my:T1_score>3</my:T1_score> 
    <my:T1_delay>0</my:T1_delay> 
    <my:T1_action>Touch your tummy</my:T1_action> 
    </my:group5> 
    <my:group5> 
    <my:T1_trial>11</my:T1_trial> 
    <my:T1_go-nogo>0</my:T1_go-nogo> 
    <my:T1_score>0</my:T1_score> 
    <my:T1_delay>0</my:T1_delay> 
    <my:T1_action>Wave your hand</my:T1_action> 
    </my:group5> 
    <my:group5> 
    <my:T1_trial>12</my:T1_trial> 
    <my:T1_go-nogo>1</my:T1_go-nogo> 
    <my:T1_score>3</my:T1_score> 
    <my:T1_delay>0</my:T1_delay> 
    <my:T1_action>Touch your head</my:T1_action> 
    </my:group5> 
</my:group4> 
<my:group6> 
    <my:group7> 
    <my:T2_trial>1</my:T2_trial> 
    <my:T2_go-nogo>0</my:T2_go-nogo> 
    <my:T2_score/> 
    <my:T2_delay>0</my:T2_delay> 
    <my:T2_action>Touch your tongue</my:T2_action> 
    </my:group7> 
    <my:group7> 
    <my:T2_trial>2</my:T2_trial> 
    <my:T2_go-nogo>0</my:T2_go-nogo> 
    <my:T2_score/> 
    <my:T2_delay>0</my:T2_delay> 
    <my:T2_action>Touch your teeth</my:T2_action> 
    </my:group7> 
    <my:group7> 
    <my:T2_trial>3</my:T2_trial> 
    <my:T2_go-nogo>1</my:T2_go-nogo> 
    <my:T2_score/> 
    <my:T2_delay>0</my:T2_delay> 
    <my:T2_action>Touch your ear</my:T2_action> 
    </my:group7> 
    <my:group7> 
    <my:T2_trial>4</my:T2_trial> 
    <my:T2_go-nogo>0</my:T2_go-nogo> 
    <my:T2_score/> 
    <my:T2_delay>0</my:T2_delay> 
    <my:T2_action>Clap your hands</my:T2_action> 
    </my:group7> 
    <my:group7> 
    <my:T2_trial>5</my:T2_trial> 
    <my:T2_go-nogo>1</my:T2_go-nogo> 
    <my:T2_score/> 
    <my:T2_delay>0</my:T2_delay> 
    <my:T2_action>Clap your hands</my:T2_action> 
    </my:group7> 
    <my:group7> 
    <my:T2_trial>6</my:T2_trial> 
    <my:T2_go-nogo>1</my:T2_go-nogo> 
    <my:T2_score/> 
    <my:T2_delay>0</my:T2_delay> 
    <my:T2_action>Touch your eyes</my:T2_action> 
    </my:group7> 
    <my:group7> 
    <my:T2_trial>7</my:T2_trial> 
    <my:T2_go-nogo>1</my:T2_go-nogo> 
    <my:T2_score/> 
    <my:T2_delay>0</my:T2_delay> 
    <my:T2_action>Touch your feet</my:T2_action> 
    </my:group7> 
    <my:group7> 
    <my:T2_trial>8</my:T2_trial> 
    <my:T2_go-nogo>0</my:T2_go-nogo> 
    <my:T2_score/> 
    <my:T2_delay>0</my:T2_delay> 
    <my:T2_action>Touch your nose</my:T2_action> 
    </my:group7> 
    <my:group7> 
    <my:T2_trial>9</my:T2_trial> 
    <my:T2_go-nogo>1</my:T2_go-nogo> 
    <my:T2_score/> 
    <my:T2_delay>0</my:T2_delay> 
    <my:T2_action>Touch your nose</my:T2_action> 
    </my:group7> 
    <my:group7> 
    <my:T2_trial>10</my:T2_trial> 
    <my:T2_go-nogo>0</my:T2_go-nogo> 
    <my:T2_score/> 
    <my:T2_delay>0</my:T2_delay> 
    <my:T2_action>Touch your tummy</my:T2_action> 
    </my:group7> 
    <my:group7> 
    <my:T2_trial>11</my:T2_trial> 
    <my:T2_go-nogo>1</my:T2_go-nogo> 
    <my:T2_score/> 
    <my:T2_delay>0</my:T2_delay> 
    <my:T2_action>Wave your hand</my:T2_action> 
    </my:group7> 
    <my:group7> 
    <my:T2_trial>12</my:T2_trial> 
    <my:T2_go-nogo>0</my:T2_go-nogo> 
    <my:T2_score/> 
    <my:T2_delay>0</my:T2_delay> 
    <my:T2_action>Touch your head</my:T2_action> 
    </my:group7> 
</my:group6> 
</my:myFields> 
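Can you provide some more information about your expected result? The expected column names you mention can only be found in the "SPSS" node. All other nodes have values whose names start with "P1", "P2", "T1", etc. Should all the 1s appear together? All the Ps? How should those map onto the values contained in the SPSS node? – SchaunW

Hi Schaun, I just added more detail about the expected result, listing the variables in the final result and the data frame structure (long format) to use where a variable occurs multiple times. Does that help? Thanks so much for looking at this. I don't know where to start. – dadrivr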

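Answer

In my experience, xmlToDataFrame only works when the XML is structured in a very consistent way. The data you are working with is structured in several different ways: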

library(XML)

# Assuming you've already read your data into a character vector called `xml_file`
xml_file <- xmlParse(xml_file)
xml_file <- xmlToList(xml_file)

stack(sapply(xml_file, length)) 
    values   ind 
1  22   SPSS 
2  0  notes_t1 
3  0  note_t2 
4  1 bachildpres30 
5  1 baparpres30 
6  1 bapassptgo 
7  1 bapassptnogo 
8  1 bamissgame 
9  1   P1_B3 
10  1   P1_B2 
11  1   P1_B1 
12  1   P1_B0 
13  1   P1_A3 
14  1   P1_A2 
15  1   P1_A1 
16  1   P1_A0 
17  1 P1_DelayTotal 
18  12  group2 
19  1  P1_trials 
20  1 P1_gotrials 
21  1 P1_nogotrials 
22  1 T1_gotrials 
23  1 T1_nogotrials 
24  1  T1_trials 
25  1 T2_gotrials 
26  1 T2_nogotrials 
27  1  T2_trials 
28  0 P1_notplay 
29  12  group4 
30  12  group6 
31  1  .attrs 

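So most of the nodes contain a single value. Some are empty. The "SPSS" node contains 22 values, all with different names, and "group2", "group4", and "group6" each contain 12 nodes, each of which contains multiple values, with similar values from node to node. When I looked at what Excel did on importing the file, it stacked the components of the 12-node groups on top of each other, then strung all 22 "SPSS" components together with all the single-value nodes, repeated that string as many times as there were rows created by stacking the 12-node components, and then bound the two pieces together column-wise.

To do that, separate the long string from the 12-node chunks: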

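# Everything that is not a 12-node group: single values plus the SPSS node
xml_file_singles <- xml_file[sapply(xml_file, length) != 12]
# Empty nodes become NA so they survive unlist()
xml_file_singles[sapply(xml_file_singles, length) == 0] <- NA
xml_file_singles <- unlist(xml_file_singles)

# The repeating groups (group2, group4, group6)
xml_file_multiples <- xml_file[sapply(xml_file, length) == 12]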
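
The split above depends on the repeating groups happening to have exactly 12 children. A possibly more robust sketch is to look at the structure instead: treat a node as a repeating group when it has several children that all share one name (as the `group3` children of `group2` do). The list `toy` and the helper `is_repeating` below are made up for illustration, and I am assuming `xmlToList()` keeps the shared child names the way it appears to here, so check your own list with `str()` first.

```r
# Toy stand-in for the list returned by xmlToList(); names and values are made up.
toy <- list(
  SPSS     = list(TCID = "10", tcAge = "30"),
  notes_t1 = NULL,
  P1_B3    = "4",
  group2   = list(
    group3 = list(P1_trial = "1", P1_score = "3"),
    group3 = list(P1_trial = "2", P1_score = "0")
  )
)

# Hypothetical helper: a node is a repeating group when it has more than one
# child and every child carries the same name.
is_repeating <- function(x) {
  length(x) > 1 && length(unique(names(x))) == 1
}

repeating     <- vapply(toy, is_repeating, logical(1))
toy_multiples <- toy[repeating]
toy_singles   <- toy[!repeating]

print(names(toy_multiples))
print(names(toy_singles))
```

With this rule the "SPSS" node stays with the singles because its children all have different names, which matches how it is handled above.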

Now take the 12-node chunks and turn each chunk into a data frame:

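# Turn each repeating group into one data frame, one row per child node
xml_file_multiples <- lapply(seq_along(xml_file_multiples), function(i) {

    x <- lapply(xml_file_multiples[[i]], function(y) {
        data.frame(as.list(unlist(y)), stringsAsFactors = FALSE)
    })
    x <- do.call("rbind", x)
    # Record which parent node (group2, group4, group6) the rows came from
    cbind("group" = names(xml_file_multiples)[i], x)
})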

Now use the rbind.fill function from the plyr package to put all the new data frames together:

library(plyr)

xml_file_multiples <- do.call("rbind.fill", xml_file_multiples)
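
If it helps to see what that step does, here is a rough base R sketch of the idea: give every data frame the union of all column names, fill the missing ones with NA, then rbind. This is only an illustration with made-up data and a made-up helper name (`bind_fill`); it is not how plyr implements rbind.fill.

```r
# Two toy data frames with partly different columns (values are made up).
a <- data.frame(group = "group2", P1_trial = c("1", "2"),
                stringsAsFactors = FALSE)
b <- data.frame(group = "group4", T1_trial = c("1", "2"),
                stringsAsFactors = FALSE)

# Hypothetical helper: pad each data frame to the union of all columns,
# filling the missing ones with NA, then stack the padded frames.
bind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) {
      d[[col]] <- NA
    }
    d[all_cols]
  })
  do.call(rbind, padded)
}

stacked <- bind_fill(list(a, b))
print(stacked)
```

Each group only fills its own columns, so the rows from the other groups get NA there. That is where the NAs in the final result come from.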

Now cbind your long string of values onto the data frames you bound together:

xml_final <- cbind(as.list(xml_file_singles), xml_file_multiples, 
    stringsAsFactors = FALSE) 
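
The reason this works is that the data frame method of cbind recycles the single row of values to match the number of rows in the stacked data frame. A small sketch with made-up values (`singles` and `trials` are illustrative names, not objects from the code above):

```r
# One "row" of single values, as a named character vector (made up).
singles <- c(TCID = "10", tcAge = "30")

# A small stacked data frame of repeated trials (made up).
trials <- data.frame(trial = c("1", "2", "3"),
                     score = c("3", "3", "0"),
                     stringsAsFactors = FALSE)

# as.list() turns each single value into its own column; cbind then repeats
# those columns on every row of `trials`.
wide <- cbind(as.list(singles), trials, stringsAsFactors = FALSE)
print(wide)
```

Every row of `wide` carries the same TCID and tcAge, which is the long format asked for in the question.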

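This approach, like Excel's, introduces a whole lot of NAs, because the column names of your different 12-node chunks are all slightly different. If you had done this before calling rbind.fill: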

xml_file_multiples <- lapply(seq_along(xml_file_multiples), function(i) {

    x <- lapply(xml_file_multiples[[i]], function(y) {
        data.frame(as.list(unlist(y)), stringsAsFactors = FALSE)
    })
    x <- do.call("rbind", x)
    x <- cbind("group" = names(xml_file_multiples)[i], x)
    # Drop the leading P1_/T1_/T2_ prefix so the groups share column names
    colnames(x) <- gsub("^\\w\\d_", "", colnames(x))
    x
})

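you would produce fewer NAs, because you would produce fewer redundant columns, but then you would have to rely on the values in the "group" column to keep track of which node each row originally came from.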
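
To see what that gsub call does to the names, here is a quick check on a few illustrative column names (the vector `cols` is made up, using names in the form data.frame would produce, e.g. "go.nogo" rather than "go-nogo"):

```r
# Illustrative column names, as they might look after data.frame() has
# cleaned them up.
cols <- c("group", "P1_trial", "T1_trial", "T2_go.nogo", "P1_Ecommand")

# Strip a leading "letter, digit, underscore" prefix such as P1_, T1_ or T2_.
stripped <- gsub("^\\w\\d_", "", cols)
print(stripped)
```

"P1_trial" and "T1_trial" both collapse to "trial", so the stacked groups share one column, and only the "group" column still says which block a row came from.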

Very helpful answer. Clear and well commented. Thank you! – dadrivr

This answer deserves more than just the two upvotes it has received. –