2014-08-28 53 views

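I want to plot the character distribution of a text file, and I found that I should also include the digits 0-9 and the characters _ and -. Below is my MATLAB code for the letter distribution: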

f = fopen('c:\nouns.txt'); 
ns = textscan(f, '%s'); 
fclose(f); 
%// Convert everything to chars 
letters_char = reshape(char(ns{:}),[],1); 

%// Get the case-insensitive count of each letter 
count_lettters = sum(bsxfun(@eq,letters_char,97:122),1) + ... 
    sum(bsxfun(@eq,letters_char,65:90),1) 

plot(count_lettters./sum(count_lettters)) 
bar(count_lettters./sum(count_lettters)) 
set(gca, 'XTickLabel',cellstr(char(97:122)'),'XTick',1:26) 

This counts and plots the distribution of the letters a-z. I want to include a-z as well as 0-9, - and _. Any suggestions?


Please provide a minimal working example and be more precise about the problem you are running into. – fuesika 2014-08-28 20:06:04


Is this enough, or do you need more details? – user2085339 2014-08-28 20:13:03


Try running just the part you provided. I think at least a definition of '@eq' is still missing. – fuesika 2014-08-28 20:14:28

Answer

2

Code

f = fopen(path_to_text_file); 
ns = textscan(f, '%s'); 
fclose(f); 

%// Convert everything to chars 
letters_char = reshape(char(ns{:}),[],1); 

%// Get the case-insensitive count of each letter 
count_lettters = sum(bsxfun(@eq,letters_char,97:122),1) + ... 
    sum(bsxfun(@eq,letters_char,65:90),1); 

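%// Count the digits 0-9 (ASCII codes 48-57) 
count_numbers = sum(bsxfun(@eq,letters_char,48:57),1); 

%// Count underscores and hyphens 
underscore_c = sum(letters_char=='_'); 
hyphen_c = sum(letters_char=='-'); 

%// Combine the counts in the same order as the tick labels below 
counts = [underscore_c hyphen_c count_numbers count_lettters]; 

%// Tick labels: '_', '-', '0'-'9', 'a'-'z' 
xtickstr = [{'_'; '-'}; cellstr(num2str((0:9)')); cellstr(char(97:122)')]; 
bar(counts./sum(counts)) 
set(gca, 'XTickLabel',xtickstr,'XTick',1:numel(xtickstr)) 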

xlabel('ASCII Characters') 
ylabel('Probability Distribution') 

Output plot for a typical text file

[image: bar chart of the character distribution]
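
As a possible variation on the code above, the separate counts could be collapsed into a single bsxfun call by comparing against one combined alphabet vector. This is only a sketch: the names words, chars, alphabet and freq are made up for illustration, and the small cell array stands in for the ns{:} result of textscan so the snippet runs without a file.

```matlab
% Hypothetical sample input standing in for the words read by textscan
words = {'foo_bar'; 'Baz-42'; 'a1'};

% Pad to a char matrix, flatten to a column, and fold upper case to lower case
chars = lower(reshape(char(words), [], 1));

% All 38 symbols of interest, in the order they should appear on the x-axis:
% '_', '-', the digits 0-9 (codes 48-57), then the letters a-z (codes 97-122)
alphabet = ['_-', char(48:57), char(97:122)];

% One comparison against the whole alphabet; column k counts alphabet(k).
% Padding spaces match nothing in the alphabet and are therefore ignored.
counts = sum(bsxfun(@eq, chars, alphabet), 1);

% Normalize to relative frequencies
freq = counts ./ sum(counts);
```

freq can then be passed to bar, with cellstr(alphabet') as the tick labels, in the same way as in the answer. Calling lower up front replaces the separate 65:90 comparison for upper-case letters.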