Remove duplicate text from multiple strings

I have the following strings:

a = "This is Product A with property B and propery C. Buy it now!" 
b = "This is Product B with property X and propery Y. Buy it now!" 
c = "This is Product C having no properties. Buy it now!"

I am looking for an algorithm that does this:

> magic(a, b, c) 
=> ['A with property B and propery C', 
    'B with property X and propery Y', 
    'C having no properties']

I have to find the duplicates in 1000+ texts. Top performance is not a must, but it would be nice.

- Update

I am looking for sequences of words. So if:

d = 'This is Product D with text engraving: "Buy". Buy it now!'

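The first "Buy" should not count as a duplicate. I guess I have to use a threshold of n consecutive words before something is seen as a duplicate.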
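One way to read the threshold idea, as a minimal sketch rather than a finished solution: split each string on whitespace and keep only the runs of n consecutive words that occur in every string. The helper names `word_ngrams` and `shared_ngrams` are hypothetical, and whitespace tokenization is an assumption (punctuation stays attached to its word).

```ruby
require 'set'

# Hypothetical helper: every run of n consecutive words in a string.
def word_ngrams(text, n)
  text.split.each_cons(n).map { |words| words.join(' ') }
end

# Hypothetical helper: the n-word sequences that occur in every string.
def shared_ngrams(strings, n)
  strings.map { |s| word_ngrams(s, n).to_set }.reduce(:&)
end

a = "This is Product A with property B and propery C. Buy it now!"
d = 'This is Product D with text engraving: "Buy". Buy it now!'

shared_ngrams([a, d], 3).to_a
# => ["This is Product", "Buy it now!"]
```

With n = 3, a word that matches only on its own never qualifies, because a duplicate needs at least three matching words in a row.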


2013-08-24 Willian

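The question is unclear. How do you define duplicate text? –

Why isn't "with property" treated as a duplicate when it repeats? :D – fl00r

1) If there were a fourth string "Bumblebee zebra", would 'magic(a, b, c, d)' be expected to return all four strings unmodified? 2) How is positional information expected to be used? For example, the 'magic' example removes "Buy it now!" despite the fact that it sits at a different position in each string. Perhaps you are looking for a 'diff' function? –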

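# Number of leading characters shared by all strings.
def common_prefix_length(*args)
    first = args.shift
    # First index at which some string differs from the first one;
    # fall back to the full length when all strings are identical.
    (0..first.size).find_index { |i| args.any? { |a| a[i] != first[i] } } || first.size
end

def magic(*args)
    # Strip the common prefix, then reverse so the common suffix becomes a prefix.
    i = common_prefix_length(*args)
    args = args.map { |a| a[i..-1].reverse }
    i = common_prefix_length(*args)
    args.map { |a| a[i..-1].reverse }
end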

a = "This is Product A with property B and propery C. Buy it now!" 
b = "This is Product B with property X and propery Y. Buy it now!" 
c = "This is Product C having no properties. Buy it now!" 

magic(a,b,c) 
# => ["A with property B and propery C", 
#  "B with property X and propery Y", 
#  "C having no properties"]


2013-08-24 11:05:24 falsetru

I like that your solution looks at sequences rather than single words! – Willian

Your data
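The answer above compares character by character, so two product names that start with the same letters (say "AB" and "AC") would lose their shared "A". A possible word-level variant is sketched below. It assumes whitespace tokenization, and the names `common_word_prefix_length` and `magic_words` are hypothetical, not part of the answer.

```ruby
# Hypothetical helper: number of leading words shared by all word lists.
def common_word_prefix_length(word_lists)
  shortest = word_lists.map(&:size).min || 0
  # First position where the lists disagree, or the shortest length if none does.
  (0...shortest).find { |i| word_lists.map { |w| w[i] }.uniq.size > 1 } || shortest
end

# Hypothetical word-level counterpart of magic: strips the shared leading
# and trailing words instead of the shared characters.
def magic_words(*strings)
  lists = strings.map(&:split)
  pre = common_word_prefix_length(lists)
  lists = lists.map { |w| w.drop(pre) }
  suf = common_word_prefix_length(lists.map(&:reverse))
  lists.map { |w| w[0, w.size - suf].join(' ') }
end

magic_words("This is Product AB. Buy it now!",
            "This is Product AC. Buy it now!")
# => ["AB.", "AC."]
```

Because punctuation stays attached to its word, the trailing period is kept here, unlike in the character-level result.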

sentences = [ 
    "This is Product A with property B and propery C. Buy it now!", 
    "This is Product B with property X and propery Y. Buy it now!", 
    "This is Product C having no properties. Buy it now!" 
]

Your magic

def magic(data) 
    prefix, postfix = 0, -1 
    data.map{ |d| d[prefix] }.uniq.compact.size == 1 && prefix += 1 or break while true 
    data.map{ |d| d[postfix] }.uniq.compact.size == 1 && prefix > -postfix && postfix -= 1 or break while true 
    data.map{ |d| d[prefix..postfix] } 
end

Your output

magic(sentences) 
#=> [ 
#=> "A with property B and propery C", 
#=> "B with property X and propery Y", 
#=> "C having no properties" 
#=> ]

Or you can use loop instead of while true

def magic(data) 
    prefix, postfix = 0, -1 
    loop{ data.map{ |d| d[prefix] }.uniq.compact.size == 1 && prefix += 1 or break } 
    loop{ data.map{ |d| d[postfix] }.uniq.compact.size == 1 && prefix > -postfix && postfix -= 1 or break } 
    data.map{ |d| d[prefix..postfix] } 
end


2013-08-24 12:07:57 fl00r

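Your 'magic' does not terminate when 'data' happens to be an array of identical strings. You have to check that a character actually exists in 'd' at the 'prefix' and 'postfix' indices. – sawa

Good catch, @sawa! Fixed – fl00r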


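Edit: This code has bugs. I am leaving my answer here for reference, because I don't like it when people delete their answers after being downvoted. Everyone makes mistakes :-)

I like @falsetru's approach, but felt the code was unnecessarily complicated. Here is my attempt: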

def common_prefix_length(strings) 
    i = 0 
    i += 1 while strings.map{|s| s[i] }.uniq.size == 1 
    i 
end 

def common_suffix_length(strings) 
    common_prefix_length(strings.map(&:reverse)) 
end 

def uncommon_infixes(strings) 
    pl = common_prefix_length(strings) 
    sl = common_suffix_length(strings) 
    strings.map{|s| s[pl...-sl] } 
end

Since the OP may care about performance, I ran a quick benchmark:

require 'fruity' 
require 'securerandom' 

prefix = 'PREFIX ' 
suffix = ' SUFFIX' 
test_data = Array.new(1000) do 
    prefix + SecureRandom.hex + suffix 
end 

def fl00r_meth(data) 
    prefix, postfix = 0, -1 
    data.map{ |d| d[prefix] }.uniq.size == 1 && prefix += 1 or break while true 
    data.map{ |d| d[postfix] }.uniq.size == 1 && postfix -= 1 or break while true 
    data.map{ |d| d[prefix..postfix] } 
end 

def falsetru_common_prefix_length(*args) 
    first = args.shift 
    (0..first.size).find_index { |i| args.any? { |a| a[i] != first[i] } } 
end 

def falsetru_meth(*args) 
    i = falsetru_common_prefix_length(*args) 
    args = args.map { |a| a[i..-1].reverse } 
    i = falsetru_common_prefix_length(*args) 
    args.map { |a| a[i..-1].reverse } 
end 

def padde_common_prefix_length(strings) 
    i = 0 
    i += 1 while strings.map{|s| s[i] }.uniq.size == 1 
    i 
end 

def padde_common_suffix_length(strings) 
    padde_common_prefix_length(strings.map(&:reverse)) 
end 

def padde_meth(strings) 
    pl = padde_common_prefix_length(strings) 
    sl = padde_common_suffix_length(strings) 
    strings.map{|s| s[pl...-sl] } 
end 

compare do
    fl00r do
        fl00r_meth(test_data.dup)
    end

    falsetru do
        falsetru_meth(*test_data.dup)
    end

    padde do
        padde_meth(test_data.dup)
    end
end

The results are as follows:

Running each test once. Test will take about 1 second. 
fl00r is similar to padde 
padde is faster than falsetru by 30.000000000000004% ± 10.0%


2013-08-24 14:56:14

Would the downvoter care to explain? –

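Your code does not terminate when the data happens to be an array of identical strings. You have to check that a character actually exists in the strings at index 'i'. – sawa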

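Your code is similar to the first version of my answer. I changed it to the current version because I thought creating and discarding intermediate arrays ('map {..} .uniq.size') might hurt performance. According to your benchmark, I was wrong. ;) – falsetru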
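The index check sawa describes could look roughly like the sketch below. It is an illustration, not the answerer's actual fix: the `bounded_` names are hypothetical, and the suffix is measured on what remains after the prefix is removed, so the two can never overlap.

```ruby
# Hypothetical bounded variant: never looks past the shortest string,
# so the loop also terminates when all strings are identical.
def bounded_common_prefix_length(strings)
  limit = strings.map(&:size).min || 0
  i = 0
  i += 1 while i < limit && strings.map { |s| s[i] }.uniq.size == 1
  i
end

def bounded_uncommon_infixes(strings)
  pl = bounded_common_prefix_length(strings)
  rest = strings.map { |s| s[pl..-1] }
  # The common suffix is the common prefix of the reversed remainders.
  sl = bounded_common_prefix_length(rest.map(&:reverse))
  # Slice by length rather than with a negative index, which breaks when sl is 0.
  rest.map { |s| s[0, s.size - sl] }
end

bounded_uncommon_infixes(["same", "same"])
# => ["", ""]
```

Slicing with `s[0, s.size - sl]` avoids the `s[pl...-sl]` form, which returns an empty string whenever there is no common suffix.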
