2016-10-30

LaTeX document word statistics

I know there are many ways of counting words in a LaTeX document, some more accurate than others.

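What I'm after is a way of running simple statistics on a LaTeX document. That is, rather than lumping all the words together and reporting one total, I would like to count the number of occurrences of each word separately.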

The output would look something like this:

1. (15% - 456) that 
++++++++++++++++++++++++++++++++++++++++++++ 
2. (10% - 308) the 
++++++++++++++++++++++++++++++ 
3. (8% - 213) is 
+++++++++++++++++++++ 
4. (4% - 102) of 
+++++++++ 
5. (2% - 55) and 
++++ 

Are there any tools out there that can do something like this?

Answer

I couldn't find any package or script that did what I needed, so I ended up writing my own.

It is a small (basic) Python script, but it gets the job done. The output looks like this:

Number of unique words: 1945 
Total number of words: 16660 

    0. 1210  (7.26%) - the 
    1. 461  (2.77%) - in 
    2. 431  (2.59%) - of 
    3. 317  (1.90%) - a 
    4. 313  (1.88%) - and 
    5. 304  (1.82%) - for 
    6. 304  (1.82%) - to 
    7. 241  (1.45%) - is 
    8. 176  (1.06%) - words 
    9. 165  (0.99%) - by 
Sum percentage: 23.5% 

Word lengths distribution: 
1 ++ (317) 
2 ++++++++++++++++++++ (2602) 
3 ++++++++++++++++++++++++++++++ (3947) 
4 ++++++++++++++++++ (2342) 
5 +++++++++++++ (1752) 
6 ++++++++++ (1348) 
7 +++++++++ (1154) 
8 ++++++++ (1071) 
9 ++++++ (787) 
10 ++++ (586) 
11 +++ (383) 
12 + (129) 
13 + (123) 
14 + (36) 
15 + (83) 

It is uploaded to a repo on GitHub: LaTexWordStats
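
For readers who want to see the general idea without opening the repo, here is a minimal sketch of this kind of script. It is not the code from LaTexWordStats. The function names (`extract_words`, `report`) and the regular expressions are assumptions made for illustration, and the LaTeX handling is deliberately rough: it drops comments, command names, and `\begin{...}`/`\end{...}` markers, but other command arguments (for example the class name in `\documentclass{article}`) are still counted as words.

```python
import re
from collections import Counter

# Rough approximations of LaTeX syntax (assumptions, not a full parser).
COMMENT_RE = re.compile(r"(?<!\\)%.*")  # unescaped % up to end of line
COMMAND_RE = re.compile(r"\\(?:begin|end)\{[^}]*\}|\\[A-Za-z]+\*?|\\.")
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def extract_words(tex):
    """Return lowercase words, skipping comments and command names."""
    text = COMMENT_RE.sub("", tex)
    text = COMMAND_RE.sub(" ", text)
    return [w.lower() for w in WORD_RE.findall(text)]


def report(tex, top=10, bar_width=30):
    """Build a text report: top words and a word-length histogram."""
    words = extract_words(tex)
    counts = Counter(words)
    total = len(words)
    lines = [
        f"Number of unique words: {len(counts)}",
        f"Total number of words: {total}",
        "",
    ]
    # Most frequent words with their share of the total.
    for rank, (word, n) in enumerate(counts.most_common(top)):
        lines.append(f"{rank}. {n} ({100 * n / total:.2f}%) - {word}")

    lengths = Counter(len(w) for w in words)
    if lengths:
        lines += ["", "Word lengths distribution:"]
        largest = max(lengths.values())
        for length in sorted(lengths):
            n = lengths[length]
            # Scale bars so the most common length gets bar_width plus signs.
            bar = "+" * max(1, round(bar_width * n / largest))
            lines.append(f"{length} {bar} ({n})")
    return "\n".join(lines)
```

To try it on a real document, read the file into a string and print the result, for example `print(report(open("thesis.tex", encoding="utf-8").read()))`, where `thesis.tex` is a placeholder file name. For anything beyond a quick look, a proper LaTeX-to-text step before counting would give cleaner numbers than these regular expressions.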