2012-01-06

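So, my code creates an inline diff of a string (on a per-word basis) using HTML tags, so that CSS can hide/show what was deleted/added. In my tests I use () for additions and {} for deletions. While doing some string manipulation, I ran into some strange encoding.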

Here is my input:

"e&nbsp;<b><u>Zerg</u></b>&nbsp;a" 
"e Zerg a" 

Output:

"e(?)(\240){&nbsp;<b>}{<u>}Zerg(?)(\240){</u>}{</b>}{&nbsp;}a" 

Now, I'm not changing the encoding anywhere, so... I'm really confused as to how a question mark and \240 got in there. o.o

What kind of encoding is this?

I'm using Ruby 1.8.7.

Found the source of the problem. It happens when I convert my new string into an array to use with Diff::LCS:

The code:

def self.convert_html_string_to_html_array(str)
=begin
    Things like &nbsp; (and other character references) and tags need to be treated as a single element.
    Also handles the decision to diff per character or per word.

    Also needs to take into consideration JavaScript and CSS that might be in the middle of a selection.
=end
  result = Array.new
  compare_words = str.has_at_least_one_word?
  i = 0
  while i < str.length do
    cur_char = str[i..i]
    case cur_char
    when "&"
      # two situations: a stray character reference, or a character reference preceding a tag
      next_index = str.index(";", i) || i # a bare "&" with no ";" after it stays a single element
      if str[next_index + 1..next_index + 1] == "<" # check to see if a tag follows
        next_index = str.index(">", i) || str.length - 1
      end
      result << str[i..next_index]
      i = next_index
    when "<"
      next_index = str.index(">", i) || str.length - 1
      result << str[i..next_index]
      i = next_index
    when " "
      result << cur_char
    else
      if compare_words
        # check the above rules again, because tags can be touching regular text
        next_index = str.index(" ", i + 1) || str.length
        next_index -= 1 # don't include the space

        if str[i..next_index].include?("<") # beginning of a tag
          next_index = str.index(">", i) || next_index
        end

        result << str[i..next_index]
        i = next_index
      else
        # on 1.8.7, str[i..i] is a single *byte*, which splits multi-byte
        # characters such as the UTF-8 no-break space ("\302\240");
        # the /u regex grabs the whole character instead
        cur_char = str[i..-1][/\A./mu] || cur_char
        result << cur_char
        i += cur_char.length - 1 # skip the remaining bytes of the character
      end
    end
    i += 1
  end

  result
end

To clarify, this:

'e Zerg a' 

gets turned into this:

[ 
    [0] "e", 
    [1] "\302", 
    [2] "\240", 
    [3] "Z", 
    [4] "e", 
    [5] "r", 
    [6] "g", 
    [7] "\302", 
    [8] "\240", 
    [9] "a" 
] 
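For reference: "\302" and "\240" are octal escapes for the bytes 0xC2 and 0xA0. Together they are the UTF-8 encoding of a single character, U+00A0 NO-BREAK SPACE, which is what `&nbsp;` renders as and what sits between the words in the input. The snippet below (1.9+ syntax, so it won't run on 1.8.7 itself) reassembles the two bytes and shows that the lead byte on its own is not valid UTF-8, which would plausibly explain why it showed up as "?" in the diff output.

```ruby
# Octal 302 and 240 are the bytes 0xC2 (194) and 0xA0 (160).
bytes = [0302, 0240]
puts bytes.map { |b| format("0x%02X", b) }.join(" ")  # prints "0xC2 0xA0"

# Reassembled, they form one UTF-8 character: U+00A0 NO-BREAK SPACE (&nbsp;).
nbsp = bytes.pack("C*").force_encoding("UTF-8")
puts nbsp == "\u00A0"          # prints "true"
puts nbsp.codepoints.inspect   # prints "[160]"

# Split apart, the lead byte alone is not valid UTF-8, so a renderer may
# fall back to "?" for it, while the trailing byte is shown as \240.
lead = bytes.first.chr.force_encoding("UTF-8")
puts lead.valid_encoding?      # prints "false"
```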

Answer


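Update to 1.9.2 or later (I recommend using RVM). In 1.8.7, strings are plain byte sequences with no encoding attached, so indexing such as str[i..i] returns single bytes and splits multi-byte characters apart.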
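To illustrate the difference, here is a small check on a 1.9+ Ruby, where a String carries its encoding and `String#[]` counts characters. `String#byteslice` (added in 1.9.3) is used here only to imitate the byte-based slicing that `str[i..i]` performed on 1.8.7.

```ruby
# Ruby 1.9+: a String carries its encoding, so String#[] counts characters.
s = "e\u00A0a"                # "e", U+00A0 NO-BREAK SPACE, "a"

puts s.length                  # prints 3 (characters)
puts s.bytesize                # prints 4 (bytes)
p s.bytes                      # prints [101, 194, 160, 97]

# Character-based slicing keeps the no-break space whole...
puts s[1..1] == "\u00A0"       # prints true

# ...while byte-based slicing (what str[i..i] did on 1.8.7) splits it.
lead = s.byteslice(1, 1)
p lead.bytes                   # prints [194]
puts lead.valid_encoding?      # prints false
```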


lol http://stackoverflow.com/questions/8761092/trying-to-upgrade-from-from-ruby-1-8-7-to-1-9-2-while-still-using-rails-2-3-8 working on it =P – NullVoxPopuli 2012-01-06 17:18:23


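I'm just going to assume 1.9.2 solves this, since the problem only affects Unicode characters that need more than 8 bits (one byte) to encode. – NullVoxPopuli 2012-01-06 17:45:15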
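For anyone who has to stay on 1.8.7 for a while (the Rails 2.3.8 upgrade mentioned in the comment above), one possible workaround is to let a UTF-8 regex do the splitting instead of indexing the string byte by byte. Below is a minimal sketch with a hypothetical `tokenize_html` helper. It is a simplification of `convert_html_string_to_html_array`, not a drop-in replacement: it assumes UTF-8 input, emits `&nbsp;` and a tag that follows it as two separate elements, and only treats ASCII whitespace as a word separator. The helper uses only 1.8-compatible syntax (the `/u` flag is the 1.8 way to make `.` match a whole UTF-8 character), but I've reasoned about its 1.8.7 behavior rather than run it there; the usage lines use 1.9+ `\u` escapes.

```ruby
# Hypothetical helper (not part of Diff::LCS or the original code).
# Tags (<...>) and character references (&nbsp;, &#160;) become one element
# each; other text is split per word or per character. The trailing "." in
# each pattern picks up a stray "<" or "&" that starts no tag or reference.
WORD_TOKEN = /<[^>]*>|&#?\w+;|\s|[^\s<&]+|./mu
CHAR_TOKEN = /<[^>]*>|&#?\w+;|./mu

def tokenize_html(str, per_word)
  # String#scan walks the string by matches, so a multi-byte character is
  # always matched as a whole and never split into bytes.
  str.scan(per_word ? WORD_TOKEN : CHAR_TOKEN)
end

p tokenize_html("e&nbsp;<b><u>Zerg</u></b>&nbsp;a", true)
# prints ["e", "&nbsp;", "<b>", "<u>", "Zerg", "</u>", "</b>", "&nbsp;", "a"]

puts tokenize_html("e\u00A0Zerg\u00A0a", false).length  # prints 8
```

Compared with the byte-split array in the question, per-character mode yields eight elements with each no-break space kept whole, which is the shape Diff::LCS needs to compare characters rather than bytes.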