Replace a structure with a different structure and keep some values

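I want to transform the input below into the output below in bash. I tried to use sed, but it does not work; I may be doing it wrong. So far I have this (just to check whether I can extract the ID), but it does not work: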

sed 's;id="([a-zA-Z:]+)";\\1;p' input

Input

<mediaobject> 
    <imageobject id="fig:deployment"> 
     <caption>Application deployment</caption> 
     <imagedata fileref="images/deployment.png" width="90%" /> 
    </imageobject> 
</mediaobject>

Output

<img src="images/deployment.png" width="90%" id="fig:deployment" title="Application deployment" />

Source

2012-09-07 user219882

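awk is available almost everywhere bash is installed, and it can avoid some of the pitfalls you might run into with sed (for example, if the XML is not consistently ordered).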

awk ' 
    ## set a variable to mark that we are in a mediaobject block 
    $1=="<mediaobject>" { object=1 } 

    ## mark that we have exited the object block 
    $1=="</mediaobject>" { object=0 } 

    ## if we are in a mediaobject block and we find an imageobject 
    $1=="<imageobject" && object==1 { 
     iobject=1       ## record that we are in an imageobject 
     id = substr($2, 5, length($2) - 6) ## the id value, without the quotations 
    } 

    ## if we have a line with image data 
    $1~/<imagedata/ && iobject==1 { 
     fileref=substr($2,9,length($2)-8) ## the path, including the quotations 
     width=$3       ## the width attribute, e.g. width="90%" 
    } 

    ## if we have a caption line 
    $1~/<caption>/ && iobject==1 { 
     gsub("(</?caption>|^ *| *$)", "") ## remove xml and leading/trailing whitespace 
     caption=$0       ## record the modified line as the caption 
    } 

    ## when we arrive at the end of an imageobject 
    $1=="</imageobject>" && object==1 { 
     iobject=0               ## record it 
     printf("<img src=%s %s id=\"%s\" title=\"%s\" />\n", fileref, width, id, caption) ## print record 
    } 

' input

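As I said, this code should work regardless of the order in which the lines appear inside the block, but it will fail if the attributes on a line change order (which is unlikely). If that turns out to be a problem, you can do something like this: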

## use match() to find the fileref attribute wherever it sits on the line 
## then use substr() to pull only the value of fileref (with quotations) 
if (match($0, /fileref="[^"]*"/)) fileref = substr($0, RSTART + 8, RLENGTH - 8)
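The same idea can be generalized to any attribute. The following is only a sketch: `get_attr` is a hypothetical helper name that does not appear in the answer, and it assumes attribute values are wrapped in double quotes and contain no embedded double quotes.

```shell
# Hypothetical helper (not from the original answer): print the value of one
# attribute, quotation marks included, wherever it appears on the line.
get_attr() {
    # $1 = attribute name; the XML lines are read from stdin
    awk -v name="$1" '
        {
            # match() sets RSTART and RLENGTH when name="..." is found
            if (match($0, name "=\"[^\"]*\"")) {
                # skip the attribute name and the "=" sign
                print substr($0, RSTART + length(name) + 1, RLENGTH - length(name) - 1)
            }
        }
    '
}

# The attributes are deliberately in a different order than in the question.
line='<imagedata width="90%" fileref="images/deployment.png" />'
printf '%s\n' "$line" | get_attr fileref
printf '%s\n' "$line" | get_attr width
```

Because the lookup is driven by the attribute name rather than by field position ($2, $3), swapping `fileref` and `width` on the line does not change the result.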

Source

2012-09-07 18:11:48 worfly

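Nice, but it has one small flaw. The result is ' Application deployment '. How can I get rid of those spaces? – user219882

Could you please briefly explain the code? I have never seen awk like this, so I don't understand it... – user219882

It should work as written (so perhaps it is a cut-and-paste error). Specifically, the regular expression on the gsub line contains matches for "^ *" and " *$", and gsub replaces those with "" (removing them). – worfly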
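For readers who, like the asker, have not seen awk written this way: an awk program is a list of `pattern { action }` rules, each rule is tested against every input line, `$1` is the first whitespace-separated field, and variables keep their values from one line to the next. The sketch below is a reduced illustration, not the answer's script: the sample data and the `inside` flag are made up for the example. It also trims with `[ \t]+`, so tabs are removed as well as spaces; tab indentation is one possible, unconfirmed reason for leftover blanks around the caption.

```shell
# Sample data based on the question, plus one caption outside any block.
sample='<caption>outside</caption>
<mediaobject>
    <imageobject id="fig:deployment">
     <caption>Application deployment</caption>
    </imageobject>
</mediaobject>'

captions=$(printf '%s\n' "$sample" | awk '
    $1 == "<mediaobject>"  { inside = 1 }   # rule 1: set the flag on the opening tag
    $1 == "</mediaobject>" { inside = 0 }   # rule 2: clear it on the closing tag
    inside && /<caption>/ {                 # rule 3: runs only while the flag is set
        gsub(/<\/?caption>|^[ \t]+|[ \t]+$/, "")   # strip tags and surrounding blanks
        print
    }
')
printf '%s\n' "$captions"
```

Only the caption inside the `<mediaobject>` block is printed, because the first caption is read while the flag is still unset.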

Using xsh:

open 1.xml ; 
rename img mediaobject ; 
mv img/imageobject/@id into img ; 
set img/@title img/imageobject/caption ; 
set img/@src img/imageobject/imagedata/@fileref ; 
mv img/imageobject/imagedata/@width into img ; 
rm (img/* | img/text()) ;

Source

2012-09-07 14:18:32 choroba

That would be nice, but I don't have XSH on the server and it cannot be installed... – user219882

With sed:

sed -n '\!<mediaobject>!{ 
    n; 
    s/ *[^ ]* \(id="[^"]*"\).*/\1/; 
    h; n; 
    s/ *[^>]*>\([^<]*\).*/title="\1"/; 
    H; n; 
    s/ *<[^ ]* *fileref=\("[^"]*"\) *\(width="[^"]*"\).*/src=\1 \2/; 
    H; n; 
    x; 
    s/\n/ /g; 
    s/^/<img /; 
    s/$/ \/>/; 
    p 
}' input

Source

2012-09-07 14:32:30 perreal
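As a side note on the one-liner from the question: it most likely fails because, without `-E`, sed uses basic regular expressions, where unescaped `(`, `)` and `+` are literal characters, and because without `-n` every line is printed anyway. It also replaces only the attribute itself, so the rest of the line would remain. A minimal sketch of a corrected ID extraction follows; `extract_id` is a hypothetical helper name, and some older GNU sed versions may only accept `-r` in place of `-E`.

```shell
# Hypothetical helper name; reads XML lines on stdin and prints each id value.
extract_id() {
    # -n suppresses automatic printing; -E enables extended regular expressions,
    # so the group and the + quantifier work without backslashes.
    # The .* on both sides discards everything around the captured value.
    sed -nE 's/.*id="([a-zA-Z:]+)".*/\1/p'
}

printf '%s\n' '<mediaobject>' '    <imageobject id="fig:deployment">' | extract_id
```

Only lines on which the substitution succeeds are printed, thanks to the `p` flag combined with `-n`.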
