在Prolog中将字符串拆分为单词

我编写了以下代码以将CharList（整数列表）拆分为单词列表。我已经用跟踪运行它，它完成所有工作并将它全部放入列表中，但它不会将WordList变量绑定到值。使用Swipl在Prolog中将字符串拆分为单词

splitintowords(CharList, WordList):- splitintowords_(CharList, [], [], WordList). 
splitintowords_([],[], Res, Res). 
splitintowords_([],Word, WordList,_):- 
    \+ Word = [], 
    append([Word], WordList, NWordList), 
    splitintowords_([],[], NWordList,_). 
splitintowords_([H|T],Word, WordList,_):- 
    alphabetic(H), 
    append([H], Word, NWord), 
    splitintowords_(T,NWord, WordList,_). 
splitintowords_([H|T],Word, WordList,_):- 
    \+ alphabetic(H), 
    append([Word], WordList, NWordList), 
    splitintowords_(T, [], NWordList,_).

测试用例：

string_codes('hello there, how are you?',T), splitintowords(T, W).

这里有一个替代的实现，我写的第一个，这一个不工作，要么：

splitintowords(CharList, WordList):- 
    splitintowords_(CharList, [[]], WordList). 
splitintowords_([], WordList, WordList). 
splitintowords_([H|T], [Word|Words], _):- 
    alphabetic(H), 
    WordN = [H|Word], 
    splitintowords_(T, [WordN|Words], _). 
splitintowords_([H|T], Words):- 
    \+ alphabetic(H), 
    WordsN = [[]|Words], 
    splitintowords(T, WordsN).

这一个是简单，似乎跟踪做同样的事情。

来源

2016-10-07 ejbs

请提供谓词字母/ 1。 – coder

字母（N）： - char_type（N，alpha）。 – ejbs

这个例子中的词是什么？ “你好，那里，你怎么样，你呢？” –

对于你的第一个解决方案，问题是虽然你建立了正确的列表，但你在每个步骤中传递一个匿名变量'_'，它与任何事物相统一，而不是你想要返回的列表变量的名字结束。

我试图构建解决方案谓词is_alpha的/ 1代替字母，因为你的避风港;吨给出的代码，但它应该是相同的....有一些变化代码：

splitintowords(CharList, WordList):- splitintowords_(CharList, [], [], WordList). 

splitintowords_([],[], Res, Res). 
splitintowords_([],Word, WordList,L):- 
    \+ Word = [], 
    append([Word], WordList, NWordList), 
    splitintowords_([],[], NWordList,L). 
splitintowords_([H|T],Word, WordList,L):- 
    is_alpha(H), 
    append([H], Word, NWord), 
    splitintowords_(T,NWord, WordList,L). 
splitintowords_([H|T],Word, WordList,L):- 
    \+ is_alpha(H), 
    append([Word], WordList, NWordList), 
    splitintowords_(T, [], NWordList,L).

例如：

string_codes('hello there, how are you?',T), splitintowords(T, W). 
T = [104, 101, 108, 108, 111, 32, 116, 104, 101|...], 
W = [[117, 111, 121], [101, 114, 97], [119, 111, 104], [], [101, 114, 101, 104|...], [111, 108, 108|...]] ; 
false.

注意，因为该解决方案是非常大的它显示...，你可以通过按w看到完整的解决方案SWI-序言：

?- string_codes('hello there, how are you?',T), splitintowords(T, W). 
T = [104, 101, 108, 108, 111, 32, 116, 104, 101|...], 
W = [[117, 111, 121], [101, 114, 97], [119, 111, 104], [], [101, 114, 101, 104|...], [111, 108, 108|...]] [write] 
T = [104, 101, 108, 108, 111, 32, 116, 104, 101, 114, 101, 44, 32, 104, 111, 119, 32, 97, 114, 101, 32, 121, 111, 117, 63], 
W = [[117, 111, 121], [101, 114, 97], [119, 111, 104], [], [101, 114, 101, 104, 116], [111, 108, 108, 101, 104]] ; 
false.

（第二个T，W是第一个T，W的完整解）。

而且你第二个解决方案的一些类似的变化：

splitintowords(CharList, WordList):- 
    splitintowords_(CharList, [[]], WordList). 

splitintowords_([], WordList, WordList). 

splitintowords_([H|T], [Word|Words], L):- 
    is_alpha(H), 
    WordN = [H|Word], 
    splitintowords_(T, [WordN|Words], L). 

splitintowords_([H|T], Words,L):- 
    \+ is_alpha(H), 
    WordsN = [[]|Words], 
    splitintowords_(T, WordsN,L).

例子：

?- string_codes('hello there, how are you?',T), splitintowords(T, W). 
T = [104, 101, 108, 108, 111, 32, 116, 104, 101, 114, 101, 44, 32, 104, 111, 119, 32, 97, 114, 101, 32, 121, 111, 117, 63], 
W = [[], [117, 111, 121], [101, 114, 97], [119, 111, 104], [], [101, 114, 101, 104, 116], [111, 108, 108, 101, 104]] ; 
false.

需要注意的是在中间某个地方有一个空表比以前的解决方案的更多。如果你想摆脱空列表，这将是写一个更断言非常简单：

delete([],[]). 
delete([H|T],[H|T1]):-dif(H,[]),delete(T,T1). 
delete([[]|T],T1):-delete(T,T1).

例子：

?- delete([[], [117, 111, 121], [101, 114, 97], [119, 111, 104], [], [101, 114, 101, 104, 116], [111, 108, 108, 101, 104]],L). 
L = [[117, 111, 121], [101, 114, 97], [119, 111, 104], [101, 114, 101, 104, 116], [111, 108, 108, 101, 104]] ; 
false.

（它可以除去空列表）。

编辑

我也与你的谓语alphabetic/1进行了测试，并产生完全相同的输出与is_alpha/1。

来源

2016-10-07 21:09:56 coder

当然哦！实际上我不知道_是否有语义上的差异，我认为这只是一个和其他任何人一样的名字。也是的，我怀疑我会用我的算法得到一些空列表:)。 – ejbs

很高兴帮助，注意'_'只是一个匿名变量，你不想成为匿名的你想传递一个列表，在其他情况下，你没有递归'_'可以工作，通常是当你不关心谓词返回的参数时使用。 – coder

这里是我使用差异列表实现更好的速度。它的工作原理非常简单，从Chars中抽取一个头部，检查H是否可以与任何分割字符统一，如果是，它会创建一个新的单词并将其添加到Bcu中，如果否，它会将头部附加到Acu的末尾。

%split_words(+Chars, +Splits, -Words). 
% it needs list of Chars, list of Split characters, and returns list of 
% words. 
split_words(Chars, Splits, Words) :- 
    split_words(Chars, Splits, X - X, Y - Y, Words). 
split_words([], _, [], Bcu, Words) :- listify(Bcu, Words). 
split_words([], _, Acu, Bcu, Words) :- 
    listify(Acu, W), add(Bcu, W, New), listify(New, Words). 
split_words([H|C], Splits, Acu, Bcu, Words) :- 
    member(H, Splits), !, listify(Acu, W), add(Bcu, W, New), 
    split_words(C, Splits, X - X, New, Words). 
split_words([H|C], Splits, Acu, Bcu, Words) :- 
add(Acu, H, New), split_words(C, Splits, New, Bcu, Words). 

add(X - [I|Y], I, X - Y). 

listify(X - [], X).

来源

2016-10-07 21:44:40 Floctioncers

我一定是误解了这个问题。我将使用字符而不是代码，以便在顶层查看发生的事情。如果我理解的措辞，并尝试给它Prolog的，你正在尝试做的：

?- string_chars("hello there, how are you?", Chars), 
    chars_words(Chars, Words). 
Chars = [h, e, l, l, o, ' ', t, h, e|...], 
Words = [[h, e, l, l, o], [t, h, e, r, e], [h, o, w], [a, r, e], [y, o, u]].

在这里，你已经采取的连续alpha人物所有的运行，并把再一起的话;你已经放弃了它们之间的所有非alpha字符。

在这个假设下，这里是你如何来解决：

条主要非字母字符
的在前面连续的字母字符是一个字
丢弃后非字母字符和去2

在Prolog：

chars_words([], []). 
chars_words([C|Cs], Words) :- 
     strip_non_alpha([C|Cs], Rest), 
     chars_words_aux(Rest, Words). 

chars_words_aux([], []). 
chars_words_aux([C|Cs], [W|Ws]) :- 
     word_rest([C|Cs], W, Rest0), 
     strip_non_alpha(Rest0, Rest), 
     chars_words_aux(Rest, Ws).

在任何时候都没有必要使用append/3，因为这些单词并不真正改变从原始列表到单词列表的字符顺序。

我已经使用称为partition_sorted/4的谓词来定义word_rest/3和strip_non_alpha/2。它接受一个列表并将其拆分为两部分：谓词成功的前部和休息（其余部分的第一个元素是谓词失败的原始列表中的第一个元素）。

strip_non_alpha(List, Rest) :- 
     partition_sorted(not_alpha, List, _, Rest). 

word_rest(List, Word, Rest) :- 
     partition_sorted(alpha, List, Word, Rest). 

not_alpha(X) :- \+ char_type(X, alpha). 
alpha(X) :- char_type(X, alpha).

最后，partition_sorted/4天真的定义：

partition_sorted(Goal, List, True, False) :- 
     partition_sorted_aux(List, Goal, True, False). 

partition_sorted_aux([], _, [], []). 
partition_sorted_aux([X|Xs], Goal, True, False) :- 
     (  call(Goal, X) 
     ->  True = [X|True0], 
       partition_sorted_aux(Xs, Goal, True0, False) 
     ;  True = [], False = [X|Xs] 
     ).

这个定义是幼稚的，因为它只是正常工作，如果Goal成功或不会留下选择点失败一次，输入列表是地面。

来源

2016-10-08 07:13:21

嗨！感谢您花时间回答您的问题。你对我的问题的理解是正确的，你的实施很好。我选择不接受（但是'upvote'），因为我主要关心的是理解我在统一变量的过程中做了什么错误，而不是算法本身。 – ejbs

@ejbs我希望这确实有帮助。被接受的答案和我的答案在相同的输入上给出了不同的输出，而且你还没有在你的问题中写下你期望的输出。 –

我没有意识到它的确如此。这是否在某些边缘情况下发生？事实上，如果我有字符串“你好，这些？很多单词”我会期待[“你好”，“这些”，“是”，“许多”，“单词”]。只要它是我正在使用/修复的算法，就可以处理输出错误（回馈多余的空列表）。 – ejbs

在Prolog中将字符串拆分为单词

回答

相关问题