2011-06-24 31 views
Parsing XML in Clojure

I'm new to Clojure, so please bear with me. I have an XML file that looks like this:

<?xml version="1.0" encoding="UTF-8"?> 
<XVar Id="cdx9" Type="Dictionary"> 
    <XVar Id="Base.AccruedPremium" Type="Multi" Value="" Rows="1" Columns="1"> 
    <Row Id="0"> 
     <Col Id="0" Type="Num" Value="0"/> 
    </Row> 
    </XVar> 
    <XVar Id="TrancheAnalysis.IndexDuration" Type="Multi" Value="" Rows="1" Columns="1"> 
    <Row Id="0"> 
     <Col Id="0" Type="Num" Value="3.4380728252313069"/> 
    </Row> 
    </XVar> 
    <XVar Id="TrancheAnalysis.IndexLevel01" Type="Multi" Value="" Rows="1" Columns="1"> 
    <Row Id="0"> 
     <Col Id="0" Type="Num" Value="30693.926279941188"/> 
    </Row> 
    </XVar> 
    <XVar Id="TrancheAnalysis.TrancheDelta" Type="Multi" Value="" Rows="1" Columns="1"> 
    <Row Id="0"> 
     <Col Id="0" Type="Num" Value="8.9304387917502073"/> 
    </Row> 
    </XVar> 
    <XVar Id="TrancheAnalysis.TrancheDuration" Type="Multi" Value="" Rows="1" Columns="1"> 
    <Row Id="0"> 
     <Col Id="0" Type="Num" Value="3.0775955481964035"/> 
    </Row> 
    </XVar> 
</XVar> 

This structure then repeats. From it I would like to produce a CSV file with these columns:

IndexName,TrancheAnalysis.IndexDuration,TrancheAnalysis.TrancheDuration 
cdx9,3.4380728252313069,3.0775955481964035 
......................................... 
......................................... 

I am able to parse a simple XML file like this one:

<?xml version="1.0" encoding="UTF-8"?> 
<CalibrationData> 
    <IndexList> 
    <Index> 
     <Calibrate>Y</Calibrate> 
     <UseClientIndexQuotes>Y</UseClientIndexQuotes> 
     <IndexName>HYCDX10</IndexName> 
     <Tenor>06/20/2013</Tenor> 
     <TenorName>3Y</TenorName> 
     <IndexLevels>219.6</IndexLevels> 
     <Tranche>Equity0To0.15</Tranche> 
     <TrancheStart>0</TrancheStart> 
     <TrancheEnd>0.15</TrancheEnd> 
     <UseBreakEvenSpread>1</UseBreakEvenSpread> 
     <UseTlet>0</UseTlet> 
     <IsTlet>0</IsTlet> 
     <PctExpectedLoss>0</PctExpectedLoss> 
     <UpfrontFee>52.125</UpfrontFee> 
     <RunningFee>0</RunningFee> 
     <DeltaFee>5.3</DeltaFee> 
     <CentralCorrelation>0.1</CentralCorrelation> 
     <Currency>USD</Currency> 
     <RescalingMethod>PTIndexRescaling</RescalingMethod> 
     <EffectiveDate>06/17/2011</EffectiveDate> 
    </Index> 
    </IndexList> 
</CalibrationData> 

using this code:

(ns DynamicProgramming
  (:require [clojure.xml :as xml]))

; Get the input files
(def calibrationFile "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/CalibrationQuotes.xml")
(def mktdataFile "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/MarketData.xml")
(def sample "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/Sample.xml")

; Parse the calibration input file and keep the text of the selected tags
(def CalibOp
  (for [x (xml-seq (xml/parse (java.io.File. calibrationFile)))
        :when (#{:IndexName :Tenor :UpfrontFee :RunningFee
                 :DeltaFee :IndexLevels :TrancheStart :TrancheEnd}
               (:tag x))]
    (first (:content x))))
(println CalibOp)

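But the second XML is simple. For the first XML sample, on the other hand, I don't know how to iterate over the nested structure and extract the information I want.

Any help would be great.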

Answer

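I would use data.zip (formerly clojure.contrib.zip-filter). It provides a lot of XML parsing capability and makes it easy to write XPath-like expressions. The README describes it as a system for filtering trees, and XML trees in particular.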
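For context, the filters work on clojure.zip zippers, so it can help to see what manual navigation looks like first. This is a minimal sketch using only clojure.xml and clojure.zip from the standard library; the inline XML string is a shortened copy of the question's sample, and the names `sample-xml`, `root` and `col` are introduced here purely for illustration.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])

;; Shortened copy of the question's XML (single-quoted attributes for readability).
(def sample-xml
  (str "<XVar Id='cdx9' Type='Dictionary'>"
       "<XVar Id='Base.AccruedPremium' Type='Multi'>"
       "<Row Id='0'><Col Id='0' Type='Num' Value='0'/></Row>"
       "</XVar>"
       "</XVar>"))

;; Parse from an in-memory stream so the sketch needs no file on disk.
(def root
  (zip/xml-zip
    (xml/parse (java.io.ByteArrayInputStream. (.getBytes sample-xml "UTF-8")))))

;; Walk down by hand: Dictionary XVar -> inner XVar -> Row -> Col.
(def col (-> root zip/down zip/down zip/down))

(println (-> root zip/node :attrs :Id))    ; id of the dictionary
(println (-> col zip/node :attrs :Value))  ; value held by the Col element
```

Chaining `zip/down` by hand gets tedious and breaks as soon as the element order changes, which is the problem the filter expressions are meant to solve.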

Below I have some sample code for creating a "row" of the CSV file. The row is a map from column name to attribute value.

(ns work
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip]
            ; in data.zip this namespace is clojure.data.zip.xml
            [clojure.contrib.zip-filter.xml :as zf]))

; create a zip from the xml file
(def zip (zip/xml-zip (xml/parse "data.xml")))

; pulls out a list of all of the root "Id" attribute values
(zf/xml-> zip (zf/attr :Id))

(defn value
  "Finds the id and value for a particular element"
  [xvar-zip]
  (let [id    (-> xvar-zip zip/node :attrs :Id) ; manual access
        value (zf/xml1-> xvar-zip ; use xpath like expression to pull value out
                         :Row ; need the row element
                         :Col ; then the column element
                         (zf/attr :Value))] ; and finally pull the Value out
    {id value}))

; gets the "column-value" pair for a single column
(zf/xml1-> zip
           (zf/attr= :Id "cdx9") ; filter on id "cdx9"
           :XVar ; filter on XVars under it
           (zf/attr= :Id "TrancheAnalysis.IndexDuration") ; filter on id
           value) ; apply the value function on the result of above

; creates a map of every column key to its corresponding value
(apply merge (zf/xml-> zip (zf/attr= :Id "cdx9") :XVar value))
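To get from such a row map to the CSV layout asked for in the question, one possible approach is to pick the wanted columns in a fixed order and join them with commas. This is only a sketch: `columns`, `csv-header`, `csv-line` and `sample-row` are hypothetical names, and `sample-row` is typed in by hand to mirror what the `(apply merge ...)` call above is expected to return for the sample XML.

```clojure
(require '[clojure.string :as str])

;; Columns wanted in the CSV, in output order (taken from the question).
(def columns ["TrancheAnalysis.IndexDuration"
              "TrancheAnalysis.TrancheDuration"])

(defn csv-header
  "Header line: IndexName followed by the selected column names."
  [columns]
  (str/join "," (cons "IndexName" columns)))

(defn csv-line
  "One CSV line from an index name and a column->value map.
  Missing columns become empty fields."
  [index-name row columns]
  (str/join "," (cons index-name (map #(get row % "") columns))))

;; Hand-written stand-in for the merged row map of the cdx9 dictionary.
(def sample-row
  {"Base.AccruedPremium"             "0"
   "TrancheAnalysis.IndexDuration"   "3.4380728252313069"
   "TrancheAnalysis.IndexLevel01"    "30693.926279941188"
   "TrancheAnalysis.TrancheDelta"    "8.9304387917502073"
   "TrancheAnalysis.TrancheDuration" "3.0775955481964035"})

(println (csv-header columns))
(println (csv-line "cdx9" sample-row columns))
```

Plain joining is enough here because the values are numeric strings. Values that can contain commas or quotes would need proper CSV escaping.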

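I'm not sure how the XML would work with multiple Dictionary XVars, since that is a root element. If you need it, another function that is useful for this kind of work is mapcat, which concatenates all the values returned by the mapping function.

The test source also has some examples.

My other big recommendation is to make sure you use lots of small functions. You'll find things much easier to debug, test, and work with.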
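As an illustration of mapcat over several dictionaries, here is a sketch that uses only clojure.xml. It assumes the repeated Dictionary XVars are wrapped in a single root element, here a hypothetical `<Results>`, because a well-formed XML document has exactly one root. The second dictionary, `cdx10`, and its values are made up for the example, and the helper names are not from the original answer.

```clojure
(require '[clojure.xml :as xml])

;; Hypothetical input: two dictionaries under an assumed <Results> root.
(def sample
  (str "<Results>"
       "<XVar Id='cdx9' Type='Dictionary'>"
       "<XVar Id='TrancheAnalysis.IndexDuration' Type='Multi'>"
       "<Row Id='0'><Col Id='0' Type='Num' Value='3.4380728252313069'/></Row></XVar>"
       "<XVar Id='TrancheAnalysis.TrancheDuration' Type='Multi'>"
       "<Row Id='0'><Col Id='0' Type='Num' Value='3.0775955481964035'/></Row></XVar>"
       "</XVar>"
       "<XVar Id='cdx10' Type='Dictionary'>"
       "<XVar Id='TrancheAnalysis.IndexDuration' Type='Multi'>"
       "<Row Id='0'><Col Id='0' Type='Num' Value='1.5'/></Row></XVar>"
       "<XVar Id='TrancheAnalysis.TrancheDuration' Type='Multi'>"
       "<Row Id='0'><Col Id='0' Type='Num' Value='2.5'/></Row></XVar>"
       "</XVar>"
       "</Results>"))

(defn parse-str
  "Parses an XML string with clojure.xml."
  [s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn children
  "Direct child elements of node that have the given tag."
  [node tag]
  (filter #(= tag (:tag %)) (:content node)))

(defn col-value
  "Value attribute of the first Row/Col under an XVar."
  [xvar]
  (->> (children xvar :Row)
       (mapcat #(children % :Col))
       first
       :attrs
       :Value))

(defn dictionary-entries
  "Seq of [dictionary-id variable-id value] triples for one dictionary."
  [dict]
  (for [xvar (children dict :XVar)]
    [(-> dict :attrs :Id) (-> xvar :attrs :Id) (col-value xvar)]))

;; Each dictionary yields a seq of triples; mapcat flattens them into one seq.
(def entries
  (mapcat dictionary-entries (children (parse-str sample) :XVar)))

(doseq [e entries] (println e))
```

With plain `map` the result would be one nested seq per dictionary. `mapcat` is what produces a single flat sequence that can be grouped or written out directly.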

This answer should be updated to reflect the changes to data.zip: https://github.com/clojure/data.zip/ – dgorissen