2011-08-25 40 views
0

AppleScript: substring or format HTML

I'm working on an AppleScript right now and I'm stuck here. Take this snippet of HTML code as an example:

<body><div>Apple don't behave accordingly <a href = "http://apple.com">apple</a></div></body>

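What I need is to get the words back without the HTML tags, either by deleting the angle brackets together with everything inside them, or by some other way of reformatting HTML into plain text.

The result should be:

Apple don't behave accordingly apple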

Answers

0

How about using textutil?

on run -- example (don't forget to escape quotes) 
    removeMarkup from "<body><div>Apple don't behave accordingly <a href = \"http://apple.com\">apple</a></div></body>" 
end run 

to removeMarkup from someText -- strip HTML using textutil 
    set someText to quoted form of ("<!DOCTYPE HTML PUBLIC>" & someText) -- fake an HTML document header 
    return (do shell script "echo " & someText & " | /usr/bin/textutil -stdin -convert txt -stdout") -- strip HTML 
end removeMarkup 
+0

Works like a charm.. thanks.. – sicKo

0
on findStrings(these_strings, search_string) 
    set the foundList to {} 
    repeat with this_string in these_strings 
        considering case 
            if the search_string contains this_string then set the end of the foundList to this_string 
        end considering 
    end repeat 
    return the foundList 
end findStrings 

findStrings({"List","Of","Strings","To","find..."}, "...in String to search") 
+0

I don't want to search for a string.. I'm trying to remove the HTML tags from HTML code.. and the code will be different every time.. – sicKo
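
What the comment asks for, dropping everything between angle brackets no matter what the markup looks like, can also be sketched in plain AppleScript without a shell call. This is only a minimal illustration: the handler name stripTags is made up for this example, it does not decode entities such as &amp;, and it assumes no literal < or > appears inside attribute values or text.

```applescript
-- stripTags is a hypothetical helper name, not part of any answer above.
-- It copies every character that is not inside a <...> pair.
on stripTags(htmlText)
    set plainText to ""
    set insideTag to false
    repeat with charRef in characters of htmlText
        -- the loop variable is a reference, so take its contents before comparing
        set c to contents of charRef
        if c is "<" then
            set insideTag to true
        else if c is ">" then
            set insideTag to false
        else if not insideTag then
            set plainText to plainText & c
        end if
    end repeat
    return plainText
end stripTags

stripTags("<body><div>Apple don't behave accordingly <a href = \"http://apple.com\">apple</a></div></body>")
-- returns "Apple don't behave accordingly apple"
```

For anything beyond simple snippets, the textutil approach is likely the safer choice, since it parses the HTML rather than scanning for brackets.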

1

Thought I'd add an extra answer because I ran into a problem with this. If you don't want UTF-8 characters to get lost, you need:

set plain_text to do shell script "echo " & quoted form of ("<!DOCTYPE HTML PUBLIC><meta charset=\"UTF-8\">" & html_string) & space & "| textutil -convert txt -stdin -stdout" 

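You basically need to add the <meta charset="UTF-8"> meta tag (with the quotes escaped as \" inside the AppleScript string) so that textutil treats the input as a UTF-8 encoded document.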
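
The charset hint can be folded into the removeMarkup handler from the first answer. The following is a sketch under a few assumptions: the handler name removeMarkupUTF8 is made up here, and printf '%s' is swapped in for echo so that backslashes in the input are passed through unchanged. Neither of those comes from the answers themselves.

```applescript
-- removeMarkupUTF8 is a hypothetical name combining the two textutil answers.
to removeMarkupUTF8 from htmlString
    -- fake an HTML document header and declare the encoding for textutil
    set htmlDoc to "<!DOCTYPE HTML PUBLIC><meta charset=\"UTF-8\">" & htmlString
    -- assumption: printf '%s' instead of echo, to avoid backslash interpretation
    set shellCommand to "printf '%s' " & quoted form of htmlDoc & " | /usr/bin/textutil -stdin -convert txt -stdout"
    return do shell script shellCommand
end removeMarkupUTF8

on run -- example with a non-ASCII character
    removeMarkupUTF8 from "<div>Café <a href=\"http://apple.com\">apple</a></div>"
end run
```

Keeping the header and the meta tag in one place means callers only pass the raw HTML fragment.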