Split words at capital letters

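I have a problem with text extracted from a page. Sometimes I get words like this:

EatAppleGood

but I want

Eat Apple Good

The three words come out joined together. How can I split the words at capital letters?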

2013-10-04 Joachim Low

Have a go: take the expression from http://jason.diamond.name/weblog/2009/08/15/split-camelcase-with-regular-expressions/ and convert it to Prolog – Najzero

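The code differs quite a bit depending on whether you work with atoms or with strings (i.e. lists of character codes), because these are genuinely different data types.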
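To make the distinction concrete, here is a small sketch that takes one atom and produces the same text as a string and as a list of character codes. It assumes SWI-Prolog 7 or later, where strings are a separate type; text_forms/3 is a hypothetical helper name, not a library predicate.

```prolog
% text_forms(+Atom, -String, -Codes): the same text as an atom, a string
% and a list of character codes (three different data types).
text_forms(Atom, String, Codes) :-
    atom_string(Atom, String),   % atom -> string object
    atom_codes(Atom, Codes).     % atom -> list of character codes
```

For example, text_forms('Eat', S, C) gives S = "Eat" and C = [69, 97, 116]. Which of these a double-quoted literal such as "EatAppleGood" produces depends on the double_quotes flag.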

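Either way, copy the input while keeping

- the current word, initially empty
- an accumulator holding the words seen so far

and then decide how to handle whitespace, etc.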

To keep it simple, let's use the most idiomatic representation: lists of character codes.

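% words_to_lowercase(String, Word, WordsSeen, Result)
%   String:    remaining input, a list of character codes
%   Word:      the current word, in reverse order
%   WordsSeen: completed words, most recent first
%   Result:    the list of words found
%
words_to_lowercase([C|Cs], WordR, Words, Result) :-
    (   code_type(C, upper(L))       % an uppercase letter starts a new word
    ->  reverse(WordR, Word),        % finish the current word
        WordsUpdated = [Word|Words],
        Updated = [L]                % new word begins with the lowercased letter
    ;   Updated = [C|WordR],         % otherwise extend the current word
        WordsUpdated = Words
    ),
    words_to_lowercase(Cs, Updated, WordsUpdated, Result).

% end of input: add the last word (still reversed) and restore the word order
words_to_lowercase([], W, Seen, Result) :-
    reverse([W|Seen], Result).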

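which yields the following (this assumes double-quoted text is a list of character codes, the default in SWI-Prolog before version 7; in later versions write the input in back quotes or set the double_quotes flag to codes first):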

?- words_to_lowercase("EatAppleGood",[],[],R), maplist(atom_codes,L,R). 
R = [[], [101, 97, 116], [97, 112, 112, 108, 101], [100, 111, 111, 103]], 
L = ['', eat, apple, doog].

You can get rid of the empty word at the start by, for example, pattern matching in the base case:

words_to_lowercase([], W, Seen, Result) :- 
    reverse([W|Seen], [[]|Result]).

Edit: oops, I forgot to reverse the last word...

words_to_lowercase([], W, Seen, Result) :- 
    reverse(W, R), 
    reverse([R|Seen], [[]|Result]).
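Putting both fixes together, here is a minimal sketch of a complete version. The names camel_split/2, split_codes/4 and push_word/3 are hypothetical. It takes an atom (so it does not depend on the double_quotes flag), returns lowercase atoms, and skips the empty word only when it actually occurs, so it also accepts input that starts with a lowercase letter (the [[]|Result] pattern above would fail there).

```prolog
% camel_split(+Atom, -Words): split Atom at every uppercase letter and
% return the parts as lowercase atoms.
camel_split(Atom, Words) :-
    atom_codes(Atom, Codes),
    split_codes(Codes, [], [], WordCodes),
    maplist(atom_codes, Words, WordCodes).

% split_codes(+Codes, +CurRev, +SeenRev, -Words)
%   CurRev:  the word being built, in reverse order
%   SeenRev: completed words, most recent first
split_codes([], CurRev, SeenRev, Words) :-
    push_word(CurRev, SeenRev, AllRev),      % flush the last word
    reverse(AllRev, Words).
split_codes([C|Cs], CurRev, SeenRev, Words) :-
    (   code_type(C, upper(L))               % uppercase: start a new word
    ->  push_word(CurRev, SeenRev, SeenRev1),
        split_codes(Cs, [L], SeenRev1, Words)
    ;   split_codes(Cs, [C|CurRev], SeenRev, Words)
    ).

% push_word(+CurRev, +Seen0, -Seen): add the finished word, back in reading
% order, unless it is empty.
push_word([], Seen, Seen) :- !.
push_word(CurRev, Seen, [Word|Seen]) :-
    reverse(CurRev, Word).
```

With this sketch, camel_split('EatAppleGood', W) gives W = [eat, apple, good] and camel_split(eatApple, W) gives W = [eat, apple]. As in the answer, every uppercase letter starts a new word, so runs of capitals are split letter by letter.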

Edit: regarding the regex suggestion in the comment you got from Najzero, you can make good use of the recently released regex pack. Start with

?- pack_install(regex).

then

?- [library(regex)]. 
?- regex('([A-Z][a-z]+)+', [], 'EatAppleGood', L),maplist(atom_codes,A,L). 
L = [[69, 97, 116], [65, 112, 112, 108, 101], [71, 111, 111, 100]], 
A = ['Eat', 'Apple', 'Good'].

and since downcase_atom/2 is readily available, we can do

?- regex('([A-Z][a-z]+)+', [], 'EatAppleGood', L),maplist(atom_codes,A,L),maplist(downcase_atom,A,D). 
L = [[69, 97, 116], [65, 112, 112, 108, 101], [71, 111, 111, 100]], 
A = ['Eat', 'Apple', 'Good'], 
D = [eat, apple, good].
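If installing a pack is not an option, a DCG can do a similar job with built-ins only. This is a swapped-in technique, not part of the regex pack, and the nonterminal and predicate names are hypothetical. Each word is matched like [A-Z][a-z]*, i.e. one uppercase letter followed by zero or more lowercase letters:

```prolog
% camel_words(-Words)//: Words is the list of capitalised words (code lists)
% that make up the whole input.
camel_words([W|Ws]) --> word(W), !, camel_words(Ws).
camel_words([])     --> [].

% word(-Codes)//: one uppercase letter followed by lowercase letters.
word([U|Ls]) --> [U], { code_type(U, upper) }, lowers(Ls).

% lowers(-Codes)//: the longest run of lowercase letters (possibly empty).
lowers([L|Ls]) --> [L], { code_type(L, lower) }, !, lowers(Ls).
lowers([])     --> [].

% camel_atoms(+Atom, -Atoms): convenience wrapper returning atoms.
camel_atoms(Atom, Atoms) :-
    atom_codes(Atom, Codes),
    phrase(camel_words(Words), Codes),
    maplist(atom_codes, Atoms, Words).
```

Here camel_atoms('EatAppleGood', A) gives A = ['Eat', 'Apple', 'Good'], and maplist(downcase_atom, A, D) then lowercases them as above. Because phrase/2 must consume the whole input, the parse fails on text containing anything else (a leading lowercase letter, digits, spaces).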

2013-10-04 08:34:30 CapelliC