2012-07-18 22 views
10

Searching XML in Clojure

I have the following sample XML:

<data>
    <products>
        <product>
            <section>Red Section</section>
            <images>
                <image>img.jpg</image>
                <image>img2.jpg</image>
            </images>
        </product>
        <product>
            <section>Blue Section</section>
            <images>
                <image>img.jpg</image>
                <image>img3.jpg</image>
            </images>
        </product>
        <product>
            <section>Green Section</section>
            <images>
                <image>img.jpg</image>
                <image>img2.jpg</image>
            </images>
        </product>
    </products>
</data>

I know how to parse it in Clojure:

(require '[clojure.xml :as xml]) 
(def x (xml/parse "location/of/that/xml"))

This returns a nested map describing the XML:

{:tag :data,
 :attrs nil,
 :content [
    {:tag :products,
     :attrs nil,
     :content [
        {:tag :product,
         :attrs nil,
         :content [] ..

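This structure can of course be traversed with standard Clojure functions, but that can get quite verbose, especially compared to, say, querying it with XPath. Is there any helper for traversing and searching such a structure? How can I, for example:

  • get a list of all <product> elements
  • get only the products whose <images> tag contains an <image> with the text "img2.jpg"
  • get the product whose section is "Red Section"

Thanks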

Answers

9

Here is a solution for your second use case, using Zippers from data.zip:

(ns core
  (:use clojure.data.zip.xml)
  (:require [clojure.zip :as zip]
            [clojure.xml :as xml]))

;; PATH is a placeholder for the location of the XML file
(def data (zip/xml-zip (xml/parse PATH)))
(def products (xml-> data :products :product))

;; keep only the products that contain an <image> with the text "img2.jpg"
(for [product products
      :let [image (xml-> product :images :image)]
      :when (some (text= "img2.jpg") image)]
  {:section (xml1-> product :section text)
   :images  (map text image)})
=> ({:section "Red Section", :images ("img.jpg" "img2.jpg")}
    {:section "Green Section", :images ("img.jpg" "img2.jpg")})
0

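In many cases the thread-first macro, together with Clojure's map and vector semantics, is an adequate syntax for accessing XML. There are many cases where you want something more XML-specific (such as an XPath library), but in many others the existing language is nearly as concise, without adding any dependencies.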

(pprint (-> (xml/parse "/tmp/xml") 
     :content first :content second :content first :content first)) 
"Blue Section" 
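
The same dependency-free idea can be stretched to the searches in the question. The following is only a rough sketch, not part of the original answer: it assumes the XML is held in a string (a shortened two-product copy of the sample), and the helpers `parse-str`, `elements`, `child-text` and `has-image?` are hypothetical names introduced for illustration. It walks the parsed map with `tree-seq` instead of hard-coding positions.

```clojure
(require '[clojure.xml :as xml])

;; Shortened copy of the question's XML (two products instead of three).
(def sample-xml
  (str "<data><products>"
       "<product><section>Red Section</section>"
       "<images><image>img.jpg</image><image>img2.jpg</image></images></product>"
       "<product><section>Blue Section</section>"
       "<images><image>img.jpg</image><image>img3.jpg</image></images></product>"
       "</products></data>"))

;; Hypothetical helper: parse XML held in a string instead of a file.
(defn parse-str [s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; Hypothetical helper: every element with the given tag, at any depth.
(defn elements [root tag]
  (filter #(= tag (:tag %)) (tree-seq map? :content root)))

;; Hypothetical helper: text of the first direct child with the given tag.
(defn child-text [el tag]
  (->> (:content el) (filter #(= tag (:tag %))) first :content first))

;; Hypothetical helper: does the product contain an <image> with this text?
(defn has-image? [product file]
  (some #(= [file] (:content %)) (elements product :image)))

;; Use case 1: all <product> elements.
(def products (elements (parse-str sample-xml) :product))

;; Section names of all products.
(def all-sections (map #(child-text % :section) products))
;; => ("Red Section" "Blue Section")

;; Use case 2: products containing img2.jpg, reported by section name.
(def img2-sections
  (map #(child-text % :section) (filter #(has-image? % "img2.jpg") products)))
;; => ("Red Section")

;; Use case 3: products whose section is "Red Section".
(def red-products
  (filter #(= "Red Section" (child-text % :section)) products))
```

This stays within `clojure.core` and `clojure.xml`, at the cost of writing the small helpers yourself; a zipper-based library becomes more attractive once the queries need to move up and down the tree.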
3

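Here is an alternative version using data.zip that covers all three use cases. I have found that xml-> and xml1-> have very powerful navigation built in, with sub-queries written as vectors.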

;; [org.clojure/data.zip "0.1.1"] 

(ns example.core 
    (:require 
    [clojure.zip :as zip] 
    [clojure.xml :as xml] 
    [clojure.data.zip.xml :refer [text xml-> xml1->]])) 

(def data (zip/xml-zip (xml/parse "/tmp/products.xml"))) 

(let [all-products (xml-> data :products :product) 
     red-section (xml1-> data :products :product [:section "Red Section"]) 
     img2 (xml-> data :products :product [:images [:image "img2.jpg"]])] 
    {:all-products (map (fn [product] (xml1-> product :section text)) all-products) 
    :red-section (xml1-> red-section :section text) 
    :img2 (map (fn [product] (xml1-> product :section text)) img2)}) 

=> {:all-products ("Red Section" "Blue Section" "Green Section"), 
    :red-section "Red Section", 
    :img2 ("Red Section" "Green Section")} 
+0

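+1 I know you answered later, but yours is the only answer that covers all three questions, and you nicely separate navigation from reporting the results – 2017-02-03 14:45:28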

1

The Tupelo library can easily solve problems like this using the tupelo.forest tree data structure. Please see this question for more information. API docs can be found here.

Here we load your XML data and convert it first into Enlive format, and then into the native tree structure used by tupelo.forest. Libs & data def:

(ns tst.tupelo.forest-examples 
    (:use tupelo.forest tupelo.test) 
    (:require 
    [clojure.data.xml :as dx] 
    [clojure.java.io :as io] 
    [clojure.set :as cs] 
    [net.cgrand.enlive-html :as en-html] 
    [schema.core :as s] 
    [tupelo.core :as t] 
    [tupelo.string :as ts])) 
(t/refer-tupelo) 

(def xml-str-prod "<data> 
        <products> 
         <product> 
         <section>Red Section</section> 
         <images> 
          <image>img.jpg</image> 
          <image>img2.jpg</image> 
         </images> 
         </product> 
         <product> 
         <section>Blue Section</section> 
         <images> 
          <image>img.jpg</image> 
          <image>img3.jpg</image> 
         </images> 
         </product> 
         <product> 
         <section>Green Section</section> 
         <images> 
          <image>img.jpg</image> 
          <image>img2.jpg</image> 
         </images> 
         </product> 
        </products> 
        </data> ") 

and the initialization code:

(dotest 
    (with-forest (new-forest) 
    (let [enlive-tree   (->> xml-str-prod 
           java.io.StringReader. 
           en-html/html-resource 
           first) 
      root-hid    (add-tree-enlive enlive-tree) 
      tree-1    (hid->hiccup root-hid) 

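The hid suffix stands for "Hex ID", a unique hex value that acts like a pointer to a node or leaf in the tree. At this stage we have just loaded the data into the forest data structure, creating tree-1, which looks like: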

[:data 
[:tupelo.forest/raw "\n     "] 
[:products 
    [:tupelo.forest/raw "\n      "] 
    [:product 
    [:tupelo.forest/raw "\n      "] 
    [:section "Red Section"] 
    [:tupelo.forest/raw "\n      "] 
    [:images 
    [:tupelo.forest/raw "\n       "] 
    [:image "img.jpg"] 
    [:tupelo.forest/raw "\n       "] 
    [:image "img2.jpg"] 
    [:tupelo.forest/raw "\n      "]] 
    [:tupelo.forest/raw "\n      "]] 
    [:tupelo.forest/raw "\n      "] 
    [:product 
    [:tupelo.forest/raw "\n      "] 
    [:section "Blue Section"] 
    [:tupelo.forest/raw "\n      "] 
    [:images 
    [:tupelo.forest/raw "\n       "] 
    [:image "img.jpg"] 
    [:tupelo.forest/raw "\n       "] 
    [:image "img3.jpg"] 
    [:tupelo.forest/raw "\n      "]] 
    [:tupelo.forest/raw "\n      "]] 
    [:tupelo.forest/raw "\n      "] 
    [:product 
    [:tupelo.forest/raw "\n      "] 
    [:section "Green Section"] 
    [:tupelo.forest/raw "\n      "] 
    [:images 
    [:tupelo.forest/raw "\n       "] 
    [:image "img.jpg"] 
    [:tupelo.forest/raw "\n       "] 
    [:image "img2.jpg"] 
    [:tupelo.forest/raw "\n      "]] 
    [:tupelo.forest/raw "\n      "]] 
    [:tupelo.forest/raw "\n     "]] 
[:tupelo.forest/raw "\n     "]] 

Next, we remove all blank strings with this code:

blank-leaf-hid?  (fn [hid] (and (leaf-hid? hid) ; ensure it is a leaf node 
           (let [value (hid->value hid)] 
             (and (string? value) 
             (or (zero? (count value)) ; empty string 
              (ts/whitespace? value)))))) ; all whitespace string 

blank-leaf-hids  (keep-if blank-leaf-hid? (all-hids)) 
>>     (apply remove-hid blank-leaf-hids) 
tree-2    (hid->hiccup root-hid) 

which yields a much nicer result tree (in hiccup format):

[:data 
[:products 
    [:product 
    [:section "Red Section"] 
    [:images [:image "img.jpg"] [:image "img2.jpg"]]] 
    [:product 
    [:section "Blue Section"] 
    [:images [:image "img.jpg"] [:image "img3.jpg"]]] 
    [:product 
    [:section "Green Section"] 
    [:images [:image "img.jpg"] [:image "img2.jpg"]]]]] 

The following code then computes answers to the three questions above:

product-hids   (find-hids root-hid [:** :product]) 
product-trees-hiccup (mapv hid->hiccup product-hids) 

img2-paths   (find-paths-leaf root-hid [:data :products :product :images :image] "img2.jpg") 
img2-prod-paths  (mapv #(drop-last 2 %) img2-paths) 
img2-prod-hids  (mapv last img2-prod-paths) 
img2-trees-hiccup (mapv hid->hiccup img2-prod-hids) 

red-sect-paths  (find-paths-leaf root-hid [:data :products :product :section] "Red Section") 
red-prod-paths  (mapv #(drop-last 1 %) red-sect-paths) 
red-prod-hids  (mapv last red-prod-paths) 
red-trees-hiccup  (mapv hid->hiccup red-prod-hids)] 

with results:

(is= product-trees-hiccup 
    [[:product 
    [:section "Red Section"] 
    [:images 
     [:image "img.jpg"] 
     [:image "img2.jpg"]]] 
    [:product 
    [:section "Blue Section"] 
    [:images 
     [:image "img.jpg"] 
     [:image "img3.jpg"]]] 
    [:product 
    [:section "Green Section"] 
    [:images 
     [:image "img.jpg"] 
     [:image "img2.jpg"]]]]) 

(is= img2-trees-hiccup 
    [[:product 
    [:section "Red Section"] 
    [:images 
    [:image "img.jpg"] 
    [:image "img2.jpg"]]] 
    [:product 
    [:section "Green Section"] 
    [:images 
    [:image "img.jpg"] 
    [:image "img2.jpg"]]]]) 

(is= red-trees-hiccup 
    [[:product 
    [:section "Red Section"] 
    [:images 
    [:image "img.jpg"] 
    [:image "img2.jpg"]]]])))) 

The full example can be found in the forest-examples unit test.