How to remove punctuation and check for spelling errors?

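1. Remove punctuation
2. Split the words at newlines and spaces, then store them in an array
3. Check the text file for errors using the function file checkSpelling.m
4. Sum up the total number of errors in the article, assuming:
   - no suggestion means no error: return -1
   - sum of errors > 20: return 1
   - sum of errors <= 20: return -1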
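The scoring rule in step 4 can be sketched in a few lines of MATLAB. The suggestion cell array below is made-up sample data in the shape that cellfun(@checkSpelling, ...) returns. Like the question's f19.m, it counts every non-empty result (including 'no suggestion') as one error; unlike f19.m, which tests >= 20, it uses > 20 as the rule above states.

```matlab
% Hypothetical sample: checkSpelling results for 5 words.
% [] = spelled correctly; a suggestion list or 'no suggestion' = error.
suggestion = {[], {'world', 'word'}, [], 'no suggestion', []};

% Count every non-empty result as one error (same as f19.m).
numErrors = sum(~cellfun(@isempty, suggestion));

% Threshold taken from the rule as written: > 20 -> 1, otherwise -1.
threshold = 20;
if numErrors > threshold
    feature19 = 1;
else
    feature19 = -1;
end
fprintf('%d errors -> feature19 = %d\n', numErrors, feature19);
```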

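I want to check a paragraph for spelling errors, but I'm having trouble removing the punctuation. There may be other problems as well; I get the following error:

[Screenshot of the error message]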

My DATA2 file is:

[Screenshot of the DATA2 file]

checkSpelling.m

function suggestion = checkSpelling(word)
% Check one word with Microsoft Word via ActiveX (Windows only).
h = actxserver('word.application');
h.Documents.Add;                        % open a blank document for the spell checker
correct = h.CheckSpelling(word);
if correct
    suggestion = [];                    % return empty if spelled correctly
else
    % If incorrect and there are suggestions, return them in a cell array
    suggestions = h.GetSpellingSuggestions(word);
    count = suggestions.Count;
    if count > 0
        suggestion = cell(1, count);
        for i = 1:count
            suggestion{i} = suggestions.Item(i).get('name');
        end
    else
        % If incorrect but there are no suggestions, return this:
        suggestion = 'no suggestion';
    end
end
% Quit Word and release the COM server
h.Quit;
delete(h);

f19.m

for i = 1:1 

data2=fopen(strcat('DATA\PRE-PROCESS_DATA\F19\',int2str(i),'.txt'),'r') 
CharData = fread(data2, '*char')'; %read text file and store data in CharData 
fclose(data2); 

word_punctuation=regexprep(CharData,'[`~!@#$%^&*()-_=+[{]}\|;:\''<,>.?/','') 

word_newLine = regexp(word_punctuation, '\n', 'split') 

word = regexp(word_newLine, ' ', 'split') 

[sizeData b] = size(word) 

suggestion = cellfun(@checkSpelling, word, 'UniformOutput', 0) 

A19(i)=sum(~cellfun(@isempty,suggestion)) 

feature19(A19(i)>=20)=1 
feature19(A19(i)<20)=-1 
end


2014-05-06 user3340270

Replace your regexprep call with:

word_punctuation=regexprep(CharData,'\W','\n');

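Here \W matches every non-word character (anything other than a letter, digit or underscore, so spaces are included), and each one is replaced with a newline.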

Then:

word = regexp(word_punctuation, '\n', 'split');

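As you can see, you no longer need a separate split on spaces. You may, however, want to remove the empty cells that consecutive separators leave behind: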

word(cellfun(@isempty,word)) = [];
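Put together, the three steps above look like this on a small sample. The sample string is a made-up stand-in for the contents of 1.txt; the misspelled "tset" is deliberate.

```matlab
% Hypothetical sample text standing in for the file contents.
CharData = sprintf('Hello, world!\nThis is a tset.');

% 1) Replace every non-word character (spaces included) with a newline.
word_punctuation = regexprep(CharData, '\W', '\n');

% 2) A single split on newlines now separates all the words.
word = regexp(word_punctuation, '\n', 'split');

% 3) Runs such as ", " produce empty strings; drop them.
word(cellfun(@isempty, word)) = [];

disp(word)
```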

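Everything worked for me. I have to say, though, that your checkSpelling function is very slow: every call creates an ActiveX server object, adds a new document, and deletes the object once the check is done. Consider rewriting the function to accept a cell array of strings.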
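A minimal sketch of that rewrite, assuming Windows with Microsoft Word installed. The name checkSpellingBatch is hypothetical; it starts Word once, checks every word in the cell array, and returns one result per word in the same form as checkSpelling ([] if correct, a cell array of suggestions, or 'no suggestion'). It has not been tested against every Word version.

```matlab
function suggestions = checkSpellingBatch(words)
% Hypothetical batch version of checkSpelling: one Word session for
% the whole cell array instead of one session per word.
suggestions = cell(size(words));
if isempty(words)
    return;                               % nothing to check, don't start Word
end
h = actxserver('word.application');
cleanup = onCleanup(@() quitWord(h));     % quit Word even if a call errors
h.Documents.Add;                          % blank document for the spell checker
for k = 1:numel(words)
    if h.CheckSpelling(words{k})
        continue;                         % correct: leave suggestions{k} = []
    end
    s = h.GetSpellingSuggestions(words{k});
    if s.Count > 0
        suggestions{k} = cell(1, s.Count);
        for j = 1:s.Count
            suggestions{k}{j} = s.Item(j).get('name');
        end
    else
        suggestions{k} = 'no suggestion';
    end
end
end

function quitWord(h)
% Close Word without saving the scratch document (0 = wdDoNotSaveChanges),
% then release the COM object.
h.Quit(0);
delete(h);
end
```

In f19.m the cellfun line would then become suggestion = checkSpellingBatch(word);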

UPDATE

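The only problem I see is that the apostrophe (') is also treated as a separator, so words like I'm and don't get split in two. You can temporarily replace apostrophes with an underscore (yes, \W treats it as a word character) or with any other unused character sequence. Alternatively, instead of \W, you can list all the non-alphanumeric characters to remove inside square brackets.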
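A sketch of the underscore workaround, on a made-up sample sentence. It assumes the text contains no real underscores, because every '_' is turned back into an apostrophe at the end.

```matlab
% Hypothetical sample; assumes no genuine '_' characters in the text.
CharData = 'I don''t think it''s wrong.';

tmp = strrep(CharData, '''', '_');     % protect apostrophes (\W keeps '_')
tmp = regexprep(tmp, '\W', '\n');      % other punctuation and spaces -> newline
word = regexp(tmp, '\n', 'split');
word(cellfun(@isempty, word)) = [];    % drop empty cells
word = strrep(word, '_', '''');        % restore the apostrophes

disp(word)
```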

UPDATE 2

Another solution to the problem from the first update:

word_punctuation=regexprep(CharData,'[^A-Za-z0-9''_]','\n');
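Here is that pattern run through the same split-and-clean steps on a made-up sample. Only ASCII letters are listed, so accented letters would be treated as separators, and an apostrophe used as a quotation mark would stay attached to its word.

```matlab
% Hypothetical sample with apostrophes, digits and mixed punctuation.
CharData = sprintf('Don''t stop: it''s 2014!\n(Really.)');

% Anything that is NOT a letter, digit, apostrophe or underscore becomes
% a newline, so apostrophes stay inside words.
word_punctuation = regexprep(CharData, '[^A-Za-z0-9''_]', '\n');
word = regexp(word_punctuation, '\n', 'split');
word(cellfun(@isempty, word)) = [];

disp(word)
```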


2014-05-07 20:32:41 yuk
