2017-06-13

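I have some online order data as XML and want to produce a report from it with the total number of orders, sales, returns, and so on. How can I generate a sales report from XML data using R? Should I convert it to a data frame?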

<ArrayOfItem> 
<Item> 
<total>333.3</total> 
<terminalid>1</terminalid> 
<subtotal>330</subtotal> 
<storeid>1000</storeid> 
<itemlist> 
<TransactionLine><LineNumber>1</LineNumber><Name>Moto G Turbo Edition Black</Name><ItemUPC>5479892348535</ItemUPC><Quantity>1</Quantity><SalePrice>330</SalePrice><IndividualPrice>330</IndividualPrice><CreatedDate>2017-06-13T09:42:52.1411148Z</CreatedDate><Status>0</Status><ShippingCost>0</ShippingCost><TotalTax>3.3</TotalTax><AppliedTaxes><LineTax><TaxId>0</TaxId><Amount>0</Amount><CreatedDate>0001-01-01T00:00:00</CreatedDate></LineTax></AppliedTaxes><AppliedDiscounts /><ItemCondition>SellableAsNew</ItemCondition><ReturnReason>PoorQuality</ReturnReason></TransactionLine> 
</itemlist> 
<transactiontenders>1</transactiontenders> 
<transactiontenders>2</transactiontenders> 
<transactiontenders>4</transactiontenders> 
<transactiontype>1</transactiontype> 
<transdate>2017-06-13T09:52:54Z</transdate> 
<transtime>09:52</transtime> 
</Item> 
<Item> 
<total>343.59</total> 
<terminalid>1</terminalid> 
<subtotal>340.29</subtotal> 
<storeid>1000</storeid> 
<itemlist> 
<TransactionLine><LineNumber>1</LineNumber><Name>Moto G Turbo Edition Black</Name><ItemUPC>5479892348535</ItemUPC><Quantity>1</Quantity><SalePrice>330</SalePrice><IndividualPrice>330</IndividualPrice><CreatedDate>2017-06-13T09:53:00.8548823Z</CreatedDate><Status>0</Status><ShippingCost>0</ShippingCost><TotalTax>3.3</TotalTax><AppliedTaxes><LineTax><TaxId>0</TaxId><Amount>0</Amount><CreatedDate>0001-01-01T00:00:00</CreatedDate></LineTax></AppliedTaxes><AppliedDiscounts /><ItemCondition>SellableAsNew</ItemCondition><ReturnReason>PoorQuality</ReturnReason></TransactionLine> 
<TransactionLine><LineNumber>2</LineNumber><Name>This Was A Man</Name><ItemUPC>777221028297</ItemUPC><Quantity>1</Quantity><SalePrice>4.99</SalePrice><IndividualPrice>4.99</IndividualPrice><CreatedDate>2017-06-13T09:53:07.8263895Z</CreatedDate><Status>0</Status><ShippingCost>0</ShippingCost><TotalTax>0</TotalTax><AppliedTaxes /><AppliedDiscounts /><ItemCondition>SellableAsNew</ItemCondition><ReturnReason>PoorQuality</ReturnReason></TransactionLine> 
<TransactionLine><LineNumber>3</LineNumber><Name>A Prisoner of Birth</Name><ItemUPC>4000111222302</ItemUPC><Quantity>1</Quantity><SalePrice>5.3</SalePrice><IndividualPrice>5.3</IndividualPrice><CreatedDate>2017-06-13T09:53:11.124866Z</CreatedDate><Status>0</Status><ShippingCost>0</ShippingCost><TotalTax>0</TotalTax><AppliedTaxes /><AppliedDiscounts /><ItemCondition>SellableAsNew</ItemCondition><ReturnReason>PoorQuality</ReturnReason></TransactionLine> 
</itemlist> 
<transactiontenders>1</transactiontenders><transactiontenders>2</transactiontenders> 
<transactiontype>1</transactiontype> 
<transdate>2017-06-13T09:53:29Z</transdate> 
<transtime>09:53</transtime> 
</Item> 
</ArrayOfItem> 

Here is what I have done so far:

library(XML) 
y <- xmlToDataFrame('C:\\App\\06122017.XML') 
nrow(y) # To get total number of order 
doc = xmlInternalTreeParse('C:\\App\\06122017.XML') 
transactionlineItems <- xpathSApply(doc, '//TransactionLine') # list 
transactionlineItems 

I tried to get the sum like this, but it does not work.

colSums(y[,c("total")]) # not working 

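transactionlineItems is a list of XML elements. From it I want to derive a data frame, apply some logic (checking whether a particular item is a sale or a return), and get separate totals for sales and for returns. I also want the quantity of each product, to see which product sells the most. Right now I do this in the browser, by applying the logic to the same data in JSON format. I want to move it to the server side and have chosen R for that.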

Answer

If you really have your heart set on a data frame conversion:

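You are on the right track. This answer combines your two ideas, xmlToDataFrame and XPath node selection (here through getNodeSet rather than xpathSApply). Take care that numeric values are not handled as character, or even as factor.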
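
To illustrate that caution, here is a small base R sketch of why the colSums call in the question fails and what a factor column would do to the numbers. The data frame below is a hand-built stand-in for the question's y (an assumption: the real one comes from xmlToDataFrame, whose columns arrive as text, either character or factor depending on the package and R version).

```r
# Stand-in for the question's data frame: the totals arrive as text, not numbers.
y <- data.frame(total = c("333.3", "343.59"), stringsAsFactors = FALSE)

# Selecting a single column with y[, "total"] drops to a plain vector,
# and colSums() requires something with at least two dimensions.
res <- try(colSums(y[, c("total")]), silent = TRUE)
failed <- inherits(res, "try-error")

# Convert the text to numbers first, then use sum() on the single column.
order.total <- sum(as.numeric(y$total))

# Factor pitfall: as.numeric() on a factor returns the internal level codes,
# not the values that were printed. Go through as.character() first.
f <- factor(c("330", "4.99", "5.3"))
codes <- as.numeric(f)                  # level codes
prices <- as.numeric(as.character(f))   # the actual prices

print(failed)
print(order.total)
print(prices)
```

For a single column, sum() is the natural choice; colSums() only makes sense once several numeric columns are selected together, as in the code further down.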

library(XML) 

order.xml.string <- '<?xml version="1.0" encoding="UTF-8"?> 
<ArrayOfItem> 
<Item> 
<total>333.3</total> 
<terminalid>1</terminalid> 
<subtotal>330</subtotal> 
<storeid>1000</storeid> 
<itemlist> 
<TransactionLine> 
<LineNumber>1</LineNumber> 
<Name>Moto G Turbo Edition Black</Name> 
<ItemUPC>5479892348535</ItemUPC> 
<Quantity>1</Quantity> 
<SalePrice>330</SalePrice> 
<IndividualPrice>330</IndividualPrice> 
<CreatedDate>2017-06-13T09:42:52.1411148Z</CreatedDate> 
<Status>0</Status> 
<ShippingCost>0</ShippingCost> 
<TotalTax>3.3</TotalTax> 
<AppliedTaxes> 
<LineTax> 
<TaxId>0</TaxId> 
<Amount>0</Amount> 
<CreatedDate>0001-01-01T00:00:00</CreatedDate> 
</LineTax> 
</AppliedTaxes> 
<AppliedDiscounts/> 
<ItemCondition>SellableAsNew</ItemCondition> 
<ReturnReason>PoorQuality</ReturnReason> 
</TransactionLine> 
</itemlist> 
<transactiontenders>1</transactiontenders> 
<transactiontenders>2</transactiontenders> 
<transactiontenders>4</transactiontenders> 
<transactiontype>1</transactiontype> 
<transdate>2017-06-13T09:52:54Z</transdate> 
<transtime>09:52</transtime> 
</Item> 
<Item> 
<total>343.59</total> 
<terminalid>1</terminalid> 
<subtotal>340.29</subtotal> 
<storeid>1000</storeid> 
<itemlist> 
<TransactionLine> 
<LineNumber>1</LineNumber> 
<Name>Moto G Turbo Edition Black</Name> 
<ItemUPC>5479892348535</ItemUPC> 
<Quantity>1</Quantity> 
<SalePrice>330</SalePrice> 
<IndividualPrice>330</IndividualPrice> 
<CreatedDate>2017-06-13T09:53:00.8548823Z</CreatedDate> 
<Status>0</Status> 
<ShippingCost>0</ShippingCost> 
<TotalTax>3.3</TotalTax> 
<AppliedTaxes> 
<LineTax> 
<TaxId>0</TaxId> 
<Amount>0</Amount> 
<CreatedDate>0001-01-01T00:00:00</CreatedDate> 
</LineTax> 
</AppliedTaxes> 
<AppliedDiscounts/> 
<ItemCondition>SellableAsNew</ItemCondition> 
<ReturnReason>PoorQuality</ReturnReason> 
</TransactionLine> 
<TransactionLine> 
<LineNumber>2</LineNumber> 
<Name>This Was A Man</Name> 
<ItemUPC>777221028297</ItemUPC> 
<Quantity>1</Quantity> 
<SalePrice>4.99</SalePrice> 
<IndividualPrice>4.99</IndividualPrice> 
<CreatedDate>2017-06-13T09:53:07.8263895Z</CreatedDate> 
<Status>0</Status> 
<ShippingCost>0</ShippingCost> 
<TotalTax>0</TotalTax> 
<AppliedTaxes/> 
<AppliedDiscounts/> 
<ItemCondition>SellableAsNew</ItemCondition> 
<ReturnReason>PoorQuality</ReturnReason> 
</TransactionLine> 
<TransactionLine> 
<LineNumber>3</LineNumber> 
<Name>A Prisoner of Birth</Name> 
<ItemUPC>4000111222302</ItemUPC> 
<Quantity>1</Quantity> 
<SalePrice>5.3</SalePrice> 
<IndividualPrice>5.3</IndividualPrice> 
<CreatedDate>2017-06-13T09:53:11.124866Z</CreatedDate> 
<Status>0</Status> 
<ShippingCost>0</ShippingCost> 
<TotalTax>0</TotalTax> 
<AppliedTaxes/> 
<AppliedDiscounts/> 
<ItemCondition>SellableAsNew</ItemCondition> 
<ReturnReason>PoorQuality</ReturnReason> 
</TransactionLine> 
</itemlist> 
<transactiontenders>1</transactiontenders> 
<transactiontenders>2</transactiontenders> 
<transactiontype>1</transactiontype> 
<transdate>2017-06-13T09:53:29Z</transdate> 
<transtime>09:53</transtime> 
</Item> 
</ArrayOfItem>' 

Then:

doc <- xmlParse(order.xml.string, asText = TRUE) 
y <- xmlToDataFrame(nodes = getNodeSet(doc, "//TransactionLine"),
                    stringsAsFactors = FALSE)
nrow(y) # number of transaction lines (4 here), not the number of orders

numeric.cols <- c("Quantity", 
        "SalePrice", 
        "IndividualPrice", 
        "ShippingCost", 
        "TotalTax") 

y[, numeric.cols] <- 
    lapply(y[, numeric.cols], as.numeric) 

colSums(y[(y$ItemCondition == "SellableAsNew" & 
      y$ReturnReason == "PoorQuality"), numeric.cols]) 

       Quantity       SalePrice IndividualPrice    ShippingCost        TotalTax
           4.00          670.29          670.29            0.00            6.60
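
Once the transaction lines are in one data frame, the sales/returns split and the per-product quantities are each a single aggregate() call. The sketch below rebuilds the four lines by hand so that it runs without the XML file. The name lines.df, the Kind column, and above all the rule "Status 0 means sale, anything else means return" are assumptions for illustration; the sample data only ever shows Status 0, so check the real coding before relying on it.

```r
# The four transaction lines from the sample XML, typed in by hand
# (only the columns needed for the report).
lines.df <- data.frame(
    Name = c("Moto G Turbo Edition Black", "Moto G Turbo Edition Black",
             "This Was A Man", "A Prisoner of Birth"),
    Quantity = c(1, 1, 1, 1),
    SalePrice = c(330, 330, 4.99, 5.3),
    Status = c(0, 0, 0, 0),
    stringsAsFactors = FALSE
)

# ASSUMPTION: Status 0 marks a sale and any other value marks a return.
lines.df$Kind <- ifelse(lines.df$Status == 0, "sale", "return")

# Summed quantity and summed SalePrice, separately for sales and returns.
totals.by.kind <- aggregate(cbind(Quantity, SalePrice) ~ Kind,
                            data = lines.df, FUN = sum)

# Quantity per product, best seller first.
qty.by.product <- aggregate(Quantity ~ Name, data = lines.df, FUN = sum)
qty.by.product <- qty.by.product[order(-qty.by.product$Quantity), ]

print(totals.by.kind)
print(qty.by.product)
```

With the real data, the same two aggregate() calls should work on y directly after the numeric conversion above, provided Status is converted to numeric as well.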

The xmlToList approach:

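I love data frames as much as anybody, but I don't often find xmlToDataFrame to be a good solution. I don't think this XML content really has a strictly rectangular shape. For example, even within the TransactionLine path, the tax and discount paths look nested rather than flat. Even if the current format is amenable to data frame conversion, it may change in the future, and then you would need to parse data structures out of data frame cells. Maybe consider xmlToList instead? Or even leave the data as XML and apply all your logic in XPath expressions with the xmlApply function.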

order.xml <- 
    xmlTreeParse(order.xml.string, 
       asText = TRUE, 
       useInternalNodes = TRUE) 
orders <- xmlRoot(order.xml) 
y <- xmlToList(orders) 

my.totals <- sapply(y, function(one.item) { 
    return(as.numeric(one.item$total)) 
}) 

total.total <- sum(my.totals) 
print(total.total) 

[1] 676.89 
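
The other option mentioned above, leaving the data as XML and querying it with XPath, might look like the sketch below. It uses xpathSApply and getNodeSet from the XML package on a cut-down copy of the sample document; small.xml and the variable names are made up for this example, and only the elements that the queries touch are kept.

```r
library(XML)

# Cut-down copy of the sample: two orders, four transaction lines.
small.xml <- '<ArrayOfItem>
<Item><total>333.3</total><itemlist>
<TransactionLine><Name>Moto G Turbo Edition Black</Name><Quantity>1</Quantity><SalePrice>330</SalePrice></TransactionLine>
</itemlist></Item>
<Item><total>343.59</total><itemlist>
<TransactionLine><Name>Moto G Turbo Edition Black</Name><Quantity>1</Quantity><SalePrice>330</SalePrice></TransactionLine>
<TransactionLine><Name>This Was A Man</Name><Quantity>1</Quantity><SalePrice>4.99</SalePrice></TransactionLine>
<TransactionLine><Name>A Prisoner of Birth</Name><Quantity>1</Quantity><SalePrice>5.3</SalePrice></TransactionLine>
</itemlist></Item>
</ArrayOfItem>'

doc <- xmlParse(small.xml, asText = TRUE)

# Number of orders: one <Item> per order.
n.orders <- length(getNodeSet(doc, "/ArrayOfItem/Item"))

# Order totals, read as text with xmlValue and converted to numbers.
order.totals <- as.numeric(xpathSApply(doc, "/ArrayOfItem/Item/total", xmlValue))

# Quantity sold of one product, selected with an XPath predicate on <Name>.
moto.qty <- sum(as.numeric(xpathSApply(
    doc, "//TransactionLine[Name='Moto G Turbo Edition Black']/Quantity", xmlValue)))

print(n.orders)
print(sum(order.totals))
print(moto.qty)
```

The predicate in square brackets does the filtering inside the XPath expression, so no intermediate data frame or list is needed for simple counts and sums.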

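Thanks. This is one day's worth of online order XML. On a shopping site there will be multiple orders in a day. Each order is represented by an <Item> tag. Within an order the customer may have bought many items, each represented by a "<TransactionLine>". Every transaction line has a quantity, a status (sale or return), and a price. If I could convert the list of all transaction lines into a single data frame, the remaining steps would be easier, such as finding which item is bought most and how many sales or returns were made. I see that the code above pulls out one element for the sum. – user3327953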

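Thanks, I will try both the data frame and the list. As of now, on the server side, I convert the XML to JSON. All the logic is done in the browser, with JavaScript looping over each order and pulling out the required properties. I am worried that the browser may crash if the response gets too large. My colleague asked me to try R. – user3327953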