2012-09-02

Converting LaTeX to XML in Elisp

I'm new to Elisp, and I need to convert a piece of LaTeX code to XML.

LaTeX:

\tag[read=true]{Please help}\tag[notread=false]{Please help II} 

XML:

<tag read='true'> Please help </tag> 
<tag notread='false'> please help </tag> 

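I wrote a regex that searches for and finds \tag, but now I need to somehow read the names read and notread, assign them as attributes, and then read the value after the "=". This is the regex I have tried: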

[..] (while (re-search-forward "\\\\\\<tag\\>\\[" nil t) [..] 
Please add the regex you tried. – 2012-09-02 08:46:44

@Tichodroma: Added it, can you help? – Daniel

Answer

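This is not a complete solution, but it hopefully demonstrates how to use regex backreferences.

In short, each group you create in the regex with \\(...\\) is captured and can be recalled with (match-string N), where N is the group's ordinal number. Numbering starts at 1 for the leftmost opening parenthesis and continues so that each opening parenthesis gets a number one higher than the previous one.

(So if you have alternation, some backreferences will be undefined. If you apply the regex "\\(foo\\)\\|\\(bar\\)" to the string "bar", (match-string 1) will be nil, because that group took no part in the match, and (match-string 2) will be "bar".)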
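The alternation case above can be checked directly. This is a minimal sketch; the function name `my-alternation-groups` is made up for illustration. Note that when matching against a string rather than a buffer, `match-string` needs that string as its second argument.

```elisp
(defun my-alternation-groups (string)
  "Match STRING against foo|bar and return the text of groups 1 and 2.
Hypothetical demo helper; a group that took no part in the match yields nil."
  (when (string-match "\\(foo\\)\\|\\(bar\\)" string)
    (list (match-string 1 string)     ; group 1: \\(foo\\)
          (match-string 2 string))))  ; group 2: \\(bar\\)

(my-alternation-groups "bar")  ; => (nil "bar")
(my-alternation-groups "foo")  ; => ("foo" nil)
```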

(while (re-search-forward
        "\\\\\\<\\(tag\\)\\>\\[\\([^][=]*\\)=\\([^][]*\\)\\]{\\([^}]*\\)}"
        nil t)
  ;; Insert the XML form right after each match.
  ;; The original LaTeX text is left in the buffer.
  (insert (concat "<" (match-string 1) " "
                  (match-string 2) "='" (match-string 3) "'>"
                  (match-string 4)
                  "</" (match-string 1) ">\n")))
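If the goal is to replace the LaTeX rather than insert the XML next to it, one possible variation is to pass the same string to `replace-match`. This is only a sketch: the function name `my-latex-tags-to-xml` is an assumption, it handles exactly one `name=value` attribute per tag, and it does not escape XML special characters.

```elisp
(defun my-latex-tags-to-xml ()
  "Replace each \\tag[NAME=VALUE]{TEXT} after point with its XML form.
Hypothetical helper, not part of the original answer."
  (interactive)
  (while (re-search-forward
          "\\\\\\<\\(tag\\)\\>\\[\\([^][=]*\\)=\\([^][]*\\)\\]{\\([^}]*\\)}"
          nil t)
    ;; FIXEDCASE and LITERAL are both t: insert the text exactly as built.
    (replace-match
     (concat "<" (match-string 1) " "
             (match-string 2) "='" (match-string 3) "'>"
             (match-string 4)
             "</" (match-string 1) ">\n")
     t t)))

;; Usage on the question's input, in a scratch buffer:
(with-temp-buffer
  (insert "\\tag[read=true]{Please help}\\tag[notread=false]{Please help II}")
  (goto-char (point-min))
  (my-latex-tags-to-xml)
  (buffer-string))
;; => "<tag read='true'>Please help</tag>\n<tag notread='false'>Please help II</tag>\n"
```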

This regex is of course a monster; you might want to break it up and document it.

(defconst latex-to-xml-regex
  (concat "\\\\"        ; literal backslash
          "\\<"         ; word boundary (not really necessary)
          "\\(tag\\)"   ; group 1: capture tag
          "\\["         ; literal open square bracket
          "\\("         ; group 2: attribute name
          "[^][=]*"     ; attribute name regex
          "\\)"         ; group 2 end
          "="           ; literal
          "\\("         ; group 3: attribute value
          "[^][]*"      ; attribute value regex
          "\\)"         ; group 3 end
          "\\]"         ; literal close square bracket
          "{"           ; begin text group
          "\\("         ; group 4: text
          "[^}]*"       ; text regex
          "\\)"         ; group 4 end
          "}")          ; end text group
  "Regex for `latex-to-xml` (assuming your function is called that)")
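The same four groups can also be referenced from a replacement string as `\\1` to `\\4`, which is convenient when converting a string instead of a buffer. The sketch below assumes a made-up function name, `my-latex-tag-string-to-xml`, and repeats the pattern inline so that it runs on its own; in real code you would probably pass the documented constant instead.

```elisp
(defun my-latex-tag-string-to-xml (string)
  "Return STRING with each \\tag[NAME=VALUE]{TEXT} rewritten as XML.
Hypothetical helper; the pattern follows the documented regex above."
  (replace-regexp-in-string
   "\\\\\\<\\(tag\\)\\[\\([^][=]*\\)=\\([^][]*\\)\\]{\\([^}]*\\)}"
   ;; \\1 = tag name, \\2 = attribute name, \\3 = attribute value, \\4 = text
   "<\\1 \\2='\\3'>\\4</\\1>"
   string
   t))  ; FIXEDCASE: do not adjust the case of the replacement

(my-latex-tag-string-to-xml "\\tag[read=true]{Please help}")
;; => "<tag read='true'>Please help</tag>"
```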