使用动态编程将字符串拆分为一个有效字符串

我需要找到一个动态编程算法来解决这个问题。我尝试过但无法弄清楚。这里是问题：使用动态编程将字符串拆分为一个有效字符串

给你一串n个字符s [1 ... n]，你认为它是一个损坏的文本文件，其中所有的标点符号都已经消失了（所以它看起来像“itwasthebestoftimes ......“）。您希望使用字典来重建文档，该字典以布尔函数dict（*）的形式提供，以便对于任何字符串w，如果w是有效字，则dict（w）的值为1，并且值为0除此以外。

给出一个动态规划算法，确定字符串s [*]是否可以重组为一个有效词序列。运行时间应该至多为O（n^2），假设每次调用字典都需要单位时间。
如果字符串有效，使您的算法输出相应的单词序列。

来源

2011-03-15 Pet

那么，这是一本教科书的练习。我没有解决方案，我不知道如何解决这个问题。 – Pet

想到的第一件事 - 模棱两可。假设你的字典有单词“是”，“她”和“洗衣机”。不过，你可以选择最短的单词。等等，不......你可以从单词中剔除一部分，并渲染字符串无效 - 就像从'自动'中捕获'auto'一样。 – alxx

您是否尝试先搜索答案？关于这个问题已经很少有问题了 - http://stackoverflow.com/questions/4755157/split-string-into-words，http://stackoverflow.com/questions/3553958/tokenize-valid-words-from -a-long-string，http://stackoverflow.com/questions/3466972/how-to-split-a-string-into-words-ex-stringintowords-string-into-words。其中一些提到动态编程解决方案。 – hoha

字符串s []可能会被分成多种方式。下面的方法找到我们可以分割s []的最大单词数。下面是该算法

bestScore的草图/伪代码[I] - >存储在其第i个字符可以被分开单词的最大数量（这将是MINUS_INFINITY否则）

for (i = 1 to n){ 
    bestScore[i] = MINUS_INFINITY 
    for (k = 1 to i-1){ 
     bestScore[i] = Max(bestSCore[i], bestScore[i-k]+ f(i,k)) 
    } 
}

其中f （I，K）被定义为：

f(i,k) = 1 : if s[i-k+1 to i] is in dictionary 
     = MINUS_INFINITY : otherwise

bestScore [n]的将存储字的最大数量，其中S []可以被分割（如果该值MINUS_INFINIY，S []不能拆分）

显然，运行时间为O（n^2）

由于这看起来像一本教科书练习，我不会编写代码来重构实际的拆分位置。

来源

2011-03-15 11:35:30

让你的压实文档的长度为N.

令B（n）是一个布尔值：如果该文档可以被分成从位置n处的文档中开始的话。

b（N）为真（因为空字符串可以分成0个字）。给定b（N），b（N-1），... b（N-k），你可以通过考虑以字符N - k - 1开始的所有单词来构造b（N - k - 1）有b（N - k - 1 + len（w））集，然后将b（N - k - 1）设为真。如果没有这样的单词，则将b（N - k - 1）设置为false。

最终，您计算b（0），它告诉您是否可以将整个文档拆分为单词。

在伪代码：

def try_to_split(doc): 
    N = len(doc) 
    b = [False] * (N + 1) 
    b[N] = True 
    for i in range(N - 1, -1, -1): 
    for word starting at position i: 
     if b[i + len(word)]: 
     b[i] = True 
     break 
    return b

有一些技巧，你可以做的就是“字开始位置i”高效，但你要求的O（N^2）算法，所以你可以查找字典中从i开始的每个字符串。

要生成的话，就可以修改上述算法来存储好词语，或只是产生这样的：

def generate_words(doc, b, idx=0): 
    length = 1 
    while true: 
    assert b(idx) 
    if idx == len(doc): return 
    word = doc[idx: idx + length] 
    if word in dictionary and b(idx + length): 
     output(word) 
     idx += length 
     length = 1

这里b是该算法的第一部分产生的布尔数组。

来源

2011-03-15 11:39:37

如果考虑以字符N - k - 1开始的所有单词，这样做效率不高吗？如果存在i <= j

0123pDp很明显，但如果你知道字典的话，我想你可以使用一些预计算来让它在O(N)中更快。 Aho-Corasick

来源

2011-03-15 11:56:06 mariusgherman

C++中的DP解决方案：

int main() 
{ 
    set<string> dict; 
    dict.insert("12"); 
    dict.insert("123"); 
    dict.insert("234"); 
    dict.insert("12345"); 
    dict.insert("456"); 
    dict.insert("1234"); 
    dict.insert("567"); 
    dict.insert("123342"); 
    dict.insert("42"); 
    dict.insert("245436564"); 
    dict.insert("12334"); 

    string str = "123456712334245436564"; 

    int size = str.size(); 
    vector<int> dp(size+1, -1); 
    dp[0] = 0; 
    vector<string > res(size+1); 
    for(int i = 0; i < size; ++i) 
    { 
     if(dp[i] != -1) 
     { 
      for(int j = i+1; j <= size; ++j) 
      { 
       const int len = j-i; 
       string substr = str.substr(i, len); 
       if(dict.find(substr) != dict.end()) 
       { 
        string space = i?" ":""; 
        res[i+len] = res[i] + space + substr; 
        dp[i+len] = dp[i]+1; 
       } 
      } 
     } 
    } 
    cout << *dp.rbegin() << endl; 
    cout << *res.rbegin() << endl; 

    return 0; 
}

来源

2011-03-15 12:27:22 hidayat

你为什么不给出描述你做了什么并解释你为什么这样做？ –

@tobias能否请你解释一下它的算法 – EmptyData

只是一些随机代码不能帮助任何人。你应该提供一个解释。 – adijo

为了形式化什么@MinhPham建议。

这是一个动态编程解决方案。

给定一个字符串str，让

B [I] =真，如果子串STR [0 ... I]（含）可以分成有效单词。

将一些起始字符添加到str，例如！，以表示空白字。 str =“！” + STR

基础案例是空字符串，所以

B [0] =真。

对于迭代情况：

B [j]的=真，如果B [I] == TRUE和STR [i..j]是一个字对于所有的i <Ĵ

来源

2013-07-03 19:12:17 mingxiao

以下是针对此问题的O（n^2）解决方案。

void findstringvalid() { 
string s = "itwasthebestoftimes"; 
set<string> dict; 
dict.insert("it"); 
dict.insert("was"); 
dict.insert("the"); 
dict.insert("best"); 
dict.insert("of"); 
dict.insert("times"); 

vector<bool> b(s.size() + 1, false); 
vector<int> spacepos(s.size(), -1); 
//Initialization phase 
b[0] = true; //String of size 0 is always a valid string 
for (int i = 1; i <= s.size(); i++) { 
    for (int j = 0; j <i; j++) { 
     //string of size s[ j... i] 
     if (!b[i]) { 
      if (b[j]) { 
       //check if string "j to i" is in dictionary 
       string temp = s.substr(j, i - j); 
       set<string>::iterator it = dict.find(temp); 
       if (it != dict.end()) { 
        b[i] = true; 
        spacepos[i-1] = j; 
       } 
      } 
     } 
    } 
} 
if(b[s.size()]) 
    for (int i = 1; i < spacepos.size(); i++) { 
     if (spacepos[i] != -1) { 
      string temp = s.substr(spacepos[i], i - spacepos[i] + 1); 
      cout << temp << " "; 
    } 
    } 
}

来源

2016-09-27 23:35:07

使用动态编程将字符串拆分为一个有效字符串

回答

相关问题