2012-08-01 25 views
0

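Removing several words from an array - Javascript

I have some words that I need to sort by frequency. Before I do that, I need to remove words such as "the", "it", etc. (any word shorter than three letters), as well as all numbers and any word that starts with # (the words come from Twitter, although the example below is just a random paragraph from Wikipedia).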

I can remove a single word, but I have been going crazy trying to remove several words or a range. Any suggestions? Thanks!

http://jsfiddle.net/9NzAC/6/

HTML:

<div id="text" style="background-color:Teal;position:absolute;left:100px;top:10px;height:500px;width:500px;"> 
Phrenology is a pseudoscience primarily focused on measurements of the human skull, based on the concept that the brain is the organ of the mind, and that certain brain areas have localized, specific functions or modules. The distinguishing feature of phrenology is the idea that the sizes of brain areas were meaningful and could be inferred by examining the skull of an individual. 
</div> 

JS:

//this is the function to remove words 
<script type="text/javascript"> 
    function removeA(arr) {
        var what, a = arguments, L = a.length, ax;
        while (L > 1 && arr.length) {
            what = a[--L];
            while ((ax = arr.indexOf(what)) != -1) {
                arr.splice(ax, 1);
            }
        }
        return arr;
    }
</script> 

//and this does the sorting & counting 
<script type="text/javascript"> 
    var getMostFrequentWords = function(words) {
        var freq = {}, freqArr = [], i;

        // Map each word to its frequency in "freq".
        for (i = 0; i < words.length; i++) {
            freq[words[i]] = (freq[words[i]] || 0) + 1;
        }

        // Sort from most to least frequent.
        for (i in freq) freqArr.push([i, freq[i]]);
        return freqArr.sort(function(a, b) { return b[1] - a[1]; });
    };

    var words = $('#text').get(0).innerText.split(/\s+/);

    // Remove articles & words we don't care about.
    var badWords = "the";
    removeA(words, badWords);
    var mostUsed = getMostFrequentWords(words);
    alert(words);

</script> 
+0

I suggest you do 'array[i] = null' (or '""') and then just clean the empty nodes out of your array. You can do that easily with 'Array#filter'. – 2012-08-01 03:24:30

+1

If you run into any problems, take a look at this for help. http://jsfiddle.net/n2jj4/1/ – 2012-08-01 05:00:26

+0

This is a very useful and comprehensive piece of code. Thank you very much. This was very helpful. – user1307028 2012-08-01 05:48:33

Answers

2

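Instead of removing items from the original array, just push the words you want to keep into a new one. It is simpler, and it makes your code shorter and more readable.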

var words = ['the', 'it', '12', '#twit', 'aloha', 'hello', 'bye'] 
var filteredWords = [] 

for (var i = 0, l = words.length, w; i < l; i++) { 
    w = words[i] 
    if (!/^(#|\d+)/.test(w) && w.length > 3) 
     filteredWords.push(w) 
} 

console.log(filteredWords) // ['aloha', 'hello'] 

Demo: http://jsfiddle.net/VcfvU/
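
The same test can also be written with `Array.prototype.filter` and fed straight into a frequency count like the one in the question. What follows is only a sketch under assumptions: the helper names `isWanted` and `countWords` are made up for illustration, and it keeps this answer's `length > 3` cutoff.

```javascript
// Hypothetical helper: true for words we want to keep.
// Rejects words starting with "#" or a digit, and words of 3 letters or fewer.
function isWanted(word) {
    return !/^(#|\d)/.test(word) && word.length > 3;
}

// Hypothetical helper: returns [word, count] pairs, most frequent first.
function countWords(words) {
    var freq = Object.create(null); // no inherited keys such as "constructor"
    words.forEach(function (w) {
        freq[w] = (freq[w] || 0) + 1;
    });
    return Object.keys(freq)
        .map(function (w) { return [w, freq[w]]; })
        .sort(function (a, b) { return b[1] - a[1]; });
}

var sample = ['the', 'it', '12', '#twit', 'aloha', 'hello', 'aloha', 'bye'];
var kept = sample.filter(isWanted);   // filter returns a new array
var ranked = countWords(kept);
console.log(ranked); // [['aloha', 2], ['hello', 1]]
```

Because `filter` leaves `sample` untouched, the original word list stays available if a different cutoff needs to be tried later.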

+0

Wow. That's it. Thank you so much, much appreciated. – user1307028 2012-08-01 03:46:50

+0

I strongly advise against omitting the braces, and I also recommend adding semicolons D: – 2012-08-01 16:00:14

1

I suggest you do array[i] = null (or "") and then just clean the empty nodes out of your array. You can do that easily using Array#filter.

Test: http://jsfiddle.net/6LPep/ Code:

var FORGETABLE_WORDS = ',the,of,an,and,that,which,is,was,'; 

var words = text.innerText.split(" "); 

// index-based loop: an empty string in "words" must not end the loop early
for (var i = 0; i < words.length; i++) { 
    var word = words[i]; 
    if (FORGETABLE_WORDS.indexOf(',' + word + ',') > -1 || word.length < 3) { 
        words[i] = ""; 
    } 
} 

// falsy entries get deleted; filter returns a new array, so reassign it 
words = words.filter(function(e){return e}); 
// as example 
output.innerHTML = words.join(" "); 

// just continue doing your stuff with "words" array. 
// ... 

I think this is cleaner than the way you are currently doing it. If you need anything else, I will update this answer.
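
As a variation on the stop-word lookup, the list can be stored as object keys, which avoids building the `',' + word + ','` search string for every word. This is a sketch under assumptions: `STOP_WORDS` and `cleanWords` are illustrative names, and the case-insensitive match is an extra step that the code above does not perform.

```javascript
// Illustrative stop-word table; the keys mirror FORGETABLE_WORDS.
var STOP_WORDS = Object.create(null);
['the', 'of', 'an', 'and', 'that', 'which', 'is', 'was'].forEach(function (w) {
    STOP_WORDS[w] = true;
});

// Hypothetical helper: returns a new array without stop words or short words.
function cleanWords(text) {
    return text
        .split(/\s+/)                       // split on any run of whitespace
        .filter(function (word) {
            var lower = word.toLowerCase(); // assumption: match case-insensitively
            return word.length >= 3 && !STOP_WORDS[lower];
        });
}

var cleaned = cleanWords('The skull of an individual is the organ');
console.log(cleaned.join(' ')); // "skull individual organ"
```

Empty strings produced by leading or trailing whitespace fail the length check, so they are dropped in the same pass.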

+1

Thank you so much for your help with this! Learned a new technique - thanks! – user1307028 2012-08-01 03:48:53
