2014-12-02

Suppose I have two strings. How can I find all the overlapping phrases between them in Java?

  1. I like chicken salad, it's my favorite food.

  2. This book contains tons of recipes on making all sorts of food, including cakes, chicken salad, etc.

The overlapping phrases between these two strings are: chicken, salad, chicken salad, food.

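What is the best way to find the overlapping phrases between two strings? Assume both are grammatically and semantically clean, and that the first is always shorter than the second.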


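You could build a hash of the words in the shorter string and then check each word of the second string against it, or simply insert every word into the hash and, if a word is already there, use that to mark it as overlapping. – Abbath 2014-12-02 05:00:16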


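I would tokenize the short string and search for the tokens in the long string. As a side note, if you search the longer string for every possible substring of the shorter one, consider using an additional stop-word list so the search ignores common words such as to, at, it, etc. – 2014-12-02 05:01:09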


, generating a lot of tokens – 2014-12-02 05:04:44
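The tokenize-and-search suggestion with a stop-word list from the comments above can be sketched as follows. This is a minimal sketch under stated assumptions: the class and method names (`PhraseOverlap`, `tokenize`, `overlappingPhrases`) are hypothetical, the `STOP_WORDS` set is a deliberately tiny hypothetical list (only to, at, it come from the comment), and "phrase" is taken to mean a contiguous run of whole words. It enumerates every contiguous word sequence of the shorter string, pads the longer text with spaces so matches are whole-word only, and skips phrases made entirely of stop words.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class PhraseOverlap {
    // Hypothetical, deliberately tiny stop-word list (to, at, it are from the comment)
    static final Set<String> STOP_WORDS =
            Set.of("a", "an", "the", "to", "at", "it", "of", "on", "my", "it's");

    // Lowercase and split on anything that is not a letter or an apostrophe
    static List<String> tokenize(String s) {
        return Arrays.asList(
                s.toLowerCase(Locale.ROOT).replaceAll("[^a-z']+", " ").trim().split(" "));
    }

    // Every contiguous word sequence of `shorter` that also occurs, as whole
    // words, in `longer`, skipping phrases made only of stop words.
    static Set<String> overlappingPhrases(String shorter, String longer) {
        List<String> shortWords = tokenize(shorter);
        // Pad with spaces so contains(" " + phrase + " ") only matches whole words
        String longText = " " + String.join(" ", tokenize(longer)) + " ";
        Set<String> result = new LinkedHashSet<>();
        for (int start = 0; start < shortWords.size(); start++) {
            for (int end = start + 1; end <= shortWords.size(); end++) {
                List<String> phraseWords = shortWords.subList(start, end);
                String phrase = String.join(" ", phraseWords);
                if (!longText.contains(" " + phrase + " ")) {
                    break; // a longer phrase from this start cannot match either
                }
                if (!STOP_WORDS.containsAll(phraseWords)) {
                    result.add(phrase);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(overlappingPhrases(
                "I like chicken salad, it's my favorite food.",
                "This book contains tons of recipes on making all sorts of food, including cakes, chicken salad, etc."));
    }
}
```

With the question's two sentences this prints `[chicken, chicken salad, salad, food]`. The early `break` is safe because if a phrase does not occur in the longer text, no phrase that extends it can occur either. In the worst case the number of candidate phrases still grows quadratically with the length of the shorter string, which is the "lot of tokens" concern raised in the comments.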

Answers


I tried the following approach. It seems good enough to find the overlapping phrases salad, chicken, chicken salad, and food.

import java.util.HashSet;
import java.util.Set;

public class OverlapFinder {
    public static void main(String[] args) {
        String firstSentence = "I like chicken salad, it's my favorite food";
        String secondSentence = "This book contains tons of recipes on making all sorts of food, including cakes, chicken salad, etc";
        // Strip periods and commas, then split the shorter sentence into words
        String[] firstSentenceWords = firstSentence.replaceAll("[.,]", "").split(" ");
        Set<String> overlappingPhrases = new HashSet<>();
        String lastPhrase = ""; // current run of consecutive matching words
        for (String word : firstSentenceWords) {
            lastPhrase = lastPhrase.isEmpty() ? word : lastPhrase + " " + word;
            // Note: contains() matches substrings, not whole words
            if (secondSentence.contains(word)) {
                overlappingPhrases.add(word);
                if (secondSentence.contains(lastPhrase)) {
                    overlappingPhrases.add(lastPhrase);
                } else {
                    // The run broke; start a new run at the current word
                    lastPhrase = word;
                }
            } else {
                lastPhrase = "";
            }
        }
        System.out.println(overlappingPhrases);
    }
}

The overlappingPhrases set contains [chicken salad, chicken, salad, food]. (A HashSet does not guarantee iteration order, so the printed order may differ.)
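One caveat with `String.contains`, used in the answer above: it matches substrings, so a short word such as "in" would count as overlapping because it occurs inside "contains" and "including". A minimal sketch of a whole-word check using regex word boundaries follows; the class and method names (`WholeWordMatch`, `containsWord`) are hypothetical, and it assumes the words start and end with letters or digits, since `\b` does not behave as a word boundary next to symbols such as `+`.

```java
import java.util.regex.Pattern;

class WholeWordMatch {
    // True if `word` occurs in `text` as a whole word, not just as a substring.
    // \b is a regex word boundary; Pattern.quote escapes regex metacharacters in `word`.
    static boolean containsWord(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        String second = "This book contains tons of recipes on making all sorts of food, including cakes, chicken salad, etc";
        System.out.println(second.contains("in"));         // true: "in" occurs inside "contains"
        System.out.println(containsWord(second, "in"));    // false: "in" is never a separate word
        System.out.println(containsWord(second, "salad")); // true
    }
}
```

`containsWord` could replace the `contains(word)` and `contains(lastPhrase)` calls in the loop above; compiling the pattern on every call is fine for short sentences but could be cached for large inputs.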


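First, I think you could use a brute-force algorithm: split the short string into words, and split the long string into words as well, like this: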

String[] short_words = short_string.split(" "); 
String[] long_words = long_string.split(" "); 

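Next, iterate over the short_words array and check whether each word is in the long_words array. But the complexity is poor: O(m * n). Second, I think you could use a hash function to do this faster.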
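The hashing idea in this answer can be sketched with a `HashSet`: put the long string's words into a set once, then test each short-string word against it. This is a minimal sketch; the class and method names (`CommonWords`, `commonWords`) are hypothetical, and punctuation is stripped the same way as in the other answers (`replaceAll("[.,]", "")`). Treating hashing a word as constant time, building the set is O(n) and each lookup is O(1) on average, so the whole pass is O(m + n) expected time instead of O(m * n).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CommonWords {
    // Words of `shorter` that also appear as whole words in `longer`
    static List<String> commonWords(String shorter, String longer) {
        // One pass over the longer string to build the lookup set
        Set<String> longWords =
                new HashSet<>(Arrays.asList(longer.replaceAll("[.,]", "").split(" ")));
        List<String> result = new ArrayList<>();
        // One pass over the shorter string, with an average O(1) lookup per word
        for (String word : shorter.replaceAll("[.,]", "").split(" ")) {
            if (longWords.contains(word)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(commonWords(
                "I like chicken salad, it's my favorite food.",
                "This book contains tons of recipes on making all sorts of food, including cakes, chicken salad, etc."));
    }
}
```

This prints `[chicken, salad, food]`. Like the brute-force version, it only finds single words; it does not report the multi-word phrase "chicken salad".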


But the brute-force algorithm won't return 'chicken salad' to me, only 'chicken', 'salad', ... – bcbishop 2014-12-02 05:18:13


You could try something like this:


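import java.util.ArrayList;
import java.util.List;

public class OverlapWords {
    public static void main(String[] args) {
        List<String> al = new ArrayList<>();
        String one = "I like chicken salad, it's my favorite food.";
        // Remove periods and commas, then split into words
        String result = one.replaceAll("[.,]", "");
        String[] tokens = result.split(" ");
        String second = "This book contains tons of recipes on making all sorts of food, including cakes, chicken salad, etc.";
        System.out.println(result);
        // Keep every word of the first sentence that occurs (as a substring) in the second
        for (int i = 0; i < tokens.length; i++) {
            if (second.indexOf(tokens[i]) >= 0) {
                al.add(tokens[i]);
            }
        }
        System.out.println(al);
    }
}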



This is the same algorithm @thinkinjava described; it won't work because it can't return 'chicken salad' to me... – bcbishop 2014-12-02 05:32:27


Yes, you're right. – 2014-12-03 04:47:41


One approach to your requirement:

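import java.util.ArrayList;
import java.util.List;

public class OverlappingPhrases {
    public static void overlappingPhrases() {
        List<String> list = new ArrayList<>();
        String string1 = "I like chicken salad, it's my favorite food.";
        String string2 = "This book contains tons of recipes on making all sorts of food, including cakes, chicken salad, etc.";
        String[] words = string1.replaceAll("[.,]", "").split(" ");
        System.out.println(string1 + "\n" + string2);
        for (int i = 0; i < words.length; i++) {
            // Note: indexOf() matches substrings, not whole words
            if (string2.indexOf(words[i]) >= 0) {
                list.add(words[i]);
                // Greedily extend the phrase while the longer phrase still occurs in string2
                int j = i;
                String tmp = words[i];
                while (j + 1 < words.length && string2.indexOf(tmp + " " + words[j + 1]) >= 0) {
                    j++;
                    tmp = tmp + " " + words[j];
                }
                // Record the longest multi-word phrase starting at words[i]
                // (also when it runs to the end of the sentence)
                if (!tmp.equals(words[i])) {
                    list.add(tmp);
                }
            }
        }
        System.out.println("Overlapping phrases: " + list);
    }

    public static void main(String[] args) {
        overlappingPhrases();
    }
}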

Output:

Overlapping phrases: [chicken, chicken salad, salad, food] 

Nice idea, although the big-O complexity doesn't look low. – bcbishop 2014-12-03 15:57:57
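On the complexity concern: each `indexOf` call in the greedy answer scans the whole long string, and the scans repeat for every starting word. One alternative (a swapped-in technique, not any poster's method) is the table used by the classic longest-common-substring dynamic program, applied to words instead of characters. `run[i][j]` counts the consecutive equal words ending at `a[i-1]` and `b[j-1]`; filling it takes O(m * n) word comparisons, plus the cost of building the phrase strings it reports. Because every suffix of a common phrase is also common, the longest run ending at each word of `a` yields all shared phrases ending there. The class and method names (`CommonPhrasesDp`, `commonPhrases`) are hypothetical, and inputs are assumed to be pre-split word arrays.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class CommonPhrasesDp {
    // All contiguous word sequences shared by a and b (whole-word matching)
    static Set<String> commonPhrases(String[] a, String[] b) {
        int m = a.length, n = b.length;
        // run[i][j] = length of the common word run ending at a[i-1] and b[j-1]
        int[][] run = new int[m + 1][n + 1];
        Set<String> result = new LinkedHashSet<>();
        for (int i = 1; i <= m; i++) {
            int longest = 0; // longest common phrase ending at a[i-1]
            for (int j = 1; j <= n; j++) {
                if (a[i - 1].equals(b[j - 1])) {
                    run[i][j] = run[i - 1][j - 1] + 1;
                    longest = Math.max(longest, run[i][j]);
                }
            }
            // Every suffix of a common phrase is also common, so emit all
            // phrases ending at a[i-1] with length 1..longest (shortest first)
            for (int len = 1; len <= longest; len++) {
                result.add(String.join(" ", Arrays.copyOfRange(a, i - len, i)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] a = "I like chicken salad, it's my favorite food.".replaceAll("[.,]", "").split(" ");
        String[] b = "This book contains tons of recipes on making all sorts of food, including cakes, chicken salad, etc."
                .replaceAll("[.,]", "").split(" ");
        System.out.println(commonPhrases(a, b));
    }
}
```

This prints `[chicken, salad, chicken salad, food]`. If memory matters, only the previous row of `run` is needed, so the table could be reduced to two rows of length n + 1.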
