2016-05-31 37 views

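XML node notation (XPath to data frame in R)

I am relatively new to R and am trying to read an XML file with XPath and convert it to a data frame in R. I have already found a solution that converts the file into a list I can work with, but I need my program to run relatively fast. I checked the XPath tutorial on w3schools.com (http://www.w3schools.com/xsl/xpath_nodes.asp), but it does not explain the notation I found in my XML file. I would like to create a data frame containing the different customers and their attributes. The beginning of the file is not needed for my calculations.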

Here is an excerpt of the file:

$config 
<config> 
    <competition id="0" name="0" pomId="1.3.1-SNAPSHOT" timeslotLength="60" bootstrapTimeslotCount="336" bootstrapDiscardedTimeslots="24" timeslotsOpen="24" deactivateTimeslotsAhead="1" minimumOrderQuantity="0.01" timezoneOffset="-6" latitude="45" simulationRate="720" simulationModulo="3600000"> 
<description/> 
<simulationBaseTime> 
    <iMillis>1255132800000</iMillis> 
</simulationBaseTime> 
<broker>default broker</broker> 
<customer id="4097" name="HighIncome-2_8" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="85.0" multiContracting="false" canNegotiate="false"/> 
<customer id="4100" name="HighIncome-2_9" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="60.0" multiContracting="false" canNegotiate="false"/> 
<customer id="4103" name="HighIncome-2_10" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="60.0" multiContracting="false" canNegotiate="false"/> 
<customer id="4106" name="HighIncome-2_11" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="85.0" multiContracting="false" canNegotiate="false"/> 

How do I refer to each customer and their attributes?

Answer


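In XML, there are two structural types that hold values:

  1. Elements (also called nodes or tags), enclosed in angle brackets, whose value is held between the opening and closing tags: <element>value</element>
  2. Attributes, written inside an element's opening tag with the value following the equals sign (name="value"), and addressed in XPath with an @ prefix

In your particular XML, customer is the element, and id, name, population, powerType, customerClass, controllableKW, upRegulationKW, downRegulationKW, storageCapacity, multiContracting, and canNegotiate are its attributes.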
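To make the element/attribute distinction concrete, here is a minimal sketch using the XML package. The two-customer document is a hypothetical, trimmed-down stand-in for the excerpt in the question, and the variable names are only for illustration.

```r
library(XML)

# Hypothetical, shortened version of the file from the question
doc <- xmlParse('<config><competition id="0">
  <broker>default broker</broker>
  <customer id="4097" name="HighIncome-2_8"/>
  <customer id="4100" name="HighIncome-2_9"/>
</competition></config>', asText = TRUE)

# Element value: the text between <broker> and </broker>
brokerName <- xpathSApply(doc, "//broker", xmlValue)

# Attribute value: one named attribute read from every customer element
customerIds <- xpathSApply(doc, "//customer", xmlGetAttr, "id")

# The same kind of lookup written directly in XPath with the @ prefix
customerNames <- as.character(xpathSApply(doc, "//customer/@name"))
```

Here brokerName holds an element's text, while customerIds and customerNames hold attribute values, one per customer element.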

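With R's XML package, xpathSApply() evaluates XPath 1.0 expressions and extracts a set of values. Pass xmlValue as the fun argument to get element values, or xmlAttrs to get attribute values. From there you can reshape the resulting list or matrix into a data frame. For your needs in particular, you can simply extract the data into a matrix and convert it into the final data frame. A double forward slash (//) in the XPath expression matches the named node anywhere in the document, here customer.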

library(XML) 
xmlstr <- '<config> 
      <competition id="0" name="0" pomId="1.3.1-SNAPSHOT" timeslotLength="60" bootstrapTimeslotCount="336" bootstrapDiscardedTimeslots="24" timeslotsOpen="24" deactivateTimeslotsAhead="1" minimumOrderQuantity="0.01" timezoneOffset="-6" latitude="45" simulationRate="720" simulationModulo="3600000"> 
       <description/> 
       <simulationBaseTime> 
        <iMillis>1255132800000</iMillis> 
       </simulationBaseTime> 
       <broker>default broker</broker> 
       <customer id="4097" name="HighIncome-2_8" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="85.0" multiContracting="false" canNegotiate="false"/> 
       <customer id="4100" name="HighIncome-2_9" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="60.0" multiContracting="false" canNegotiate="false"/> 
       <customer id="4103" name="HighIncome-2_10" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="60.0" multiContracting="false" canNegotiate="false"/> 
       <customer id="4106" name="HighIncome-2_11" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="85.0" multiContracting="false" canNegotiate="false"/> 
      </competition> 
      </config>'  
xml <- xmlParse(xmlstr) 

# MATRIX OF CUSTOMER ATTRIBS 
customerAttribs <- xpathSApply(doc=xml, path="//customer", xmlAttrs) 

# TRANSPOSE TO DATA FRAME 
df <- data.frame(t(customerAttribs)) 

#     id            name population        powerType customerClass controllableKW
# 1 4097  HighIncome-2_8          1 ELECTRIC_VEHICLE         SMALL           -3.3
# 2 4100  HighIncome-2_9          1 ELECTRIC_VEHICLE         SMALL           -3.3
# 3 4103 HighIncome-2_10          1 ELECTRIC_VEHICLE         SMALL           -3.3
# 4 4106 HighIncome-2_11          1 ELECTRIC_VEHICLE         SMALL           -3.3
#   upRegulationKW downRegulationKW storageCapacity multiContracting canNegotiate
# 1           -3.3              3.3            85.0            false        false
# 2           -3.3              3.3            60.0            false        false
# 3           -3.3              3.3            60.0            false        false
# 4           -3.3              3.3            85.0            false        false
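Because xmlAttrs returns text, every column in the data frame above holds strings rather than numbers or logicals. If you need to compute on them, one possible follow-up is to convert the columns explicitly. The sketch below uses a small hand-built stand-in data frame (an assumption, not the real output) and assumes the columns are character vectors; on R versions before 4.0 you would need stringsAsFactors = FALSE in data.frame() for that to hold.

```r
# Stand-in for the character-only data frame built from xmlAttrs
# (two rows and four columns chosen only for illustration)
df <- data.frame(
  id              = c("4097", "4100"),
  name            = c("HighIncome-2_8", "HighIncome-2_9"),
  storageCapacity = c("85.0", "60.0"),
  canNegotiate    = c("false", "false"),
  stringsAsFactors = FALSE
)

# Convert each column to the type it represents
df$id              <- as.integer(df$id)
df$storageCapacity <- as.numeric(df$storageCapacity)
df$canNegotiate    <- df$canNegotiate == "true"   # XML writes booleans in lower case

str(df)
```

The comparison against "true" is used for the boolean column so the conversion does not depend on how lower-case strings are parsed.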