2010-05-06 39 views
2

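AppleScript: cleaning a string

I have a string containing illegal characters that I want to remove, but I don't know in advance which characters might be present.

I built a list of the characters that I do not want filtered out, and I put together this script (adapted from another one I found online).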

on clean_string(TheString) 
    --Store the current TIDs. To be polite to other scripts. 
    set previousDelimiter to AppleScript's text item delimiters 
    set potentialName to TheString 
    set legalName to {} 
    set legalCharacters to {"a", "b", "c", "d", "e", "f", 
"g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", 
"s", "t", "u", "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", 
"F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", 
"S", "T", "U", "V", "W", "X", "Y", "Z", "1", "2", "3", "4", "5", 
"6", "7", "8", "9", "0", "?", "+", "-", "Ç", "ç", "á", "Á", "é", 
"É", "í", "Í", "ó", "Ó", "ú", "Ú", "â", "Â", "ã", "Ã", "ñ", "Ñ", 
"õ", "Õ", "à", "À", "è", "È", "ü", "Ü", "ö", "Ö", "!", "$", "%", 
"/", "(", ")", "&", "€", "#", "@", "=", "*", "+", "-", ",", ".", 
"–", "_", " ", ":", ";", ASCII character 10, ASCII character 13} 

    --Whatever you want to eliminate. 
    --Now iterate through the characters checking them. 
    repeat with thisCharacter in the characters of potentialName 
     set thisCharacter to thisCharacter as text 
     if thisCharacter is in legalCharacters then 
      set the end of legalName to thisCharacter 
      log (legalName as string) 

     end if 
    end repeat 
    --Make sure that you set the TIDs before making the 
    --list of characters into a string. 
    set AppleScript's text item delimiters to "" 
    --Check the name's length. 
    if length of legalName is greater than 32 then 
     set legalName to items 1 thru 32 of legalName as text 
    else 
     set legalName to legalName as text 
    end if 
    --Restore the current TIDs. To be polite to other scripts. 
    set AppleScript's text item delimiters to previousDelimiter 
    return legalName 
end clean_string 

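The problem is that this script is slow as hell and gives me timeouts.

What I'm doing is going through the string character by character and comparing each one against the legalCharacters list. If the character is in the list, fine. If not, it is skipped.

Is there a faster way to do this?

"Look at every character of TheString and remove those that are not in legalCharacters."

Thanks for any help.

Answers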

3

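Which non-ASCII characters are you running into? What is the encoding of your file?

Processing text with a shell script and tr, sed, or perl is far more efficient. All of them are installed by default on OS X.

You can strip returns with a shell script using tr (as in the example below), and you can also strip spaces with sed (not shown in the example below):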

set clean_text to do shell script "echo " & quoted form of the_string & "| tr -d '\\r\\n' " 

Technical Note TN2065: do shell script in AppleScript
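As a rough sketch of how the asker's whitelist idea could be expressed with tr: `-c` complements the set and `-d` deletes, so `tr -cd SET` keeps only the characters in SET. The set below is an ASCII-only subset of legalCharacters picked for illustration, and the function name `clean_ascii` is hypothetical. Running under `LC_ALL=C` makes tr work on bytes, so this version drops accented characters; keeping those would need a Unicode-aware tool.

```shell
#!/bin/sh
# clean_ascii: hypothetical helper, not part of the original answer.
# Keeps only an ASCII whitelist and truncates to 32 characters,
# mirroring the length limit used in the question.
clean_ascii() {
    # -c complements the set, -d deletes: everything NOT listed is removed.
    # The trailing "-" in the set is a literal hyphen.
    printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9 ?+_.,:;!-' | cut -c1-32
}

clean_ascii 'Hello, <World>! {2010}'
```

From AppleScript this would be called through `do shell script` with `quoted form of` around the input, as in the tr example above.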

Alternatively, with Perl, this strips every character that is not alphanumeric or whitespace:

set x to quoted form of "Sample text. smdm#$%%&"
set y to do shell script "echo " & x & " | perl -pe 's/[^[:alnum:][:space:]]//g'"

Search SO for other examples of using tr, sed, and perl to process text from AppleScript, or search the MacScripter / AppleScript forums.

2

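Iteration in AppleScript is always slow, and there really is no faster way around these problems. Logging inside a loop is an absolute guarantee of a slowdown. Use the log command sparingly.

In your particular case, however, you have a length limit, and moving the length check into the repeat loop may cut the processing time dramatically (it ran in under a second in Script Debugger, regardless of the length of the text):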

on clean_string(TheString) 
    set potentialName to TheString 
    set legalName to {} 
    set legalCharacters to {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "1", "2", "3", "4", "5", "6", "7", "8", "9", "0", "?", "+", "-", "Ç", "ç", "á", "Á", "é", "É", "í", "Í", "ó", "Ó", "ú", "Ú", "â", "Â", "ã", "Ã", "ñ", "Ñ", "õ", "Õ", "à", "À", "è", "È", "ü", "Ü", "ö", "Ö", "!", "$", "%", "/", "(", ")", "&", "€", "#", "@", "=", "*", "+", "-", ",", ".", "–", "_", " ", ":", ";", ASCII character 10, ASCII character 13} 
    with timeout of 86400 seconds --86400 seconds = 24 hours
        repeat with thisCharacter in the characters of potentialName
            set thisCharacter to thisCharacter as text
            if thisCharacter is in legalCharacters then
                set the end of legalName to thisCharacter
                --Stop as soon as the 32-character limit is reached.
                if length of legalName is greater than or equal to 32 then
                    return legalName as text
                end if
            end if
        end repeat
    end timeout
    return legalName as text
end clean_string
+0

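Thanks, but this loop gives me this error: error "AppleEvent timed out." number -1712 ... I guess the text is too long and AppleScript isn't willing to wait for it to finish. – SpaceDog 2010-05-06 22:00:02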

+0

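I've added a timeout block to the code, but you shouldn't be hitting a timeout here (I believe the default timeout is 60 seconds). I ran the code on the full text of this page without any problem. I think you may have to wrap the call to the subroutine, or something higher up the call stack, in the timeout block instead. – 2010-05-07 00:03:51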

2

Another shell-script approach could be:

set clean_text to do shell script "echo " & quoted form of the_string & "|sed \"s/[^[:alnum:][:space:]]//g\"" 
This uses sed to delete everything that is not an alphanumeric character or whitespace.

More regular-expression reference here

+0

That's a good one-liner for processing text, too. – markratledge 2010-05-08 15:51:23

+0

Short and sweet... +1 – Marlon 2015-08-19 15:51:32

0

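BBEdit or TextWrangler will be much faster at this. Download TextWrangler (it's free), then open your file and run Text -> Zap Gremlins... on it. Does that do what you need? If so, celebrate with a cold drink. If not, try BBEdit (it's not free): create a new Text Factory with as many "Replace All" conditions as you need, then open the file and run the Text Factory on it.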