2011-03-14
0
for (a = 0; a < filename; a++) { 

     try { 
      System.out 
        .println(" _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ "); 
      System.out.println("\n"); 
      System.out.println("The word inputted : " + word2); 
      File file = new File(
        "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc" + a 
          + ".txt"); 
      System.out.println(" _________________"); 

      System.out.print("| File = abc" + a + ".txt | \t\t \n"); 

      for (int i = 0; i < array2.length; i++) { 

       totalCount = 0; 
       wordCount = 0; 

       Scanner s = new Scanner(file); 
       { 
        while (s.hasNext()) { 
         totalCount++; 
         if (s.next().equals(array2[i])) 
          wordCount++; 

        } 

        System.out.print(array2[i] + " --> Word count = " 
          + "\t " + "|" + wordCount + "|"); 
        System.out.print(" Total count = " + "\t " + "|" 
          + totalCount + "|"); 
        System.out.printf(" Term Frequency = | %8.4f |", 
          (double) wordCount/totalCount); 

        System.out.println("\t "); 

        double inverseTF = Math.log10((float) numDoc 
          /(numofDoc[i])); 
        System.out.println(" --> IDF = " + inverseTF); 

        double TFIDF = (((double) wordCount/totalCount) * inverseTF); 
        System.out.println(" --> TF/IDF = " + TFIDF + "\n"); 



       } 
      } 
     } catch (FileNotFoundException e) { 
      System.out.println("File is not found"); 
     } 
    } 
} 

}

How do I sum up the total values?

This is the sample output:

The word inputted : how are you


| File = abc0.txt |

how --> Word count = | 4 | Total count = | 957 | Term Frequency = | 0.0042 |

--> IDF = 0.5642714398516419

--> TF/IDF = 0.0023585013159943234

are --> Word count = | 7 | Total count = | 957 | Term Frequency = | 0.0073 |

--> IDF = 0.1962946357308887

--> TF/IDF = 0.00143580193324579

you --> Word count = | 10 | Total count = | 957 | Term Frequency = | 0.0104 |

--> IDF = 0.1962946357308887

--> TF/IDF = 0.002051145618922557

How do I sum all 3 TF/IDF values for each text file?

Answers

1

Assuming you just want a running total that you can display, add something like this before your for loop:

double runningTfIDF = 0; 

Then, after calculating the current TF/IDF, add the line

runningTfIDF += TFIDF; 

Then, after your for loop, you can add a line to print runningTfIDF.
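The three steps above can be sketched as a small standalone program. This is only an illustration under assumptions: the class name `RunningTotal` and the helper `sumTfIdf` are hypothetical, and the word counts, total count, and IDF values are copied from the sample output in the question rather than read from the files.

```java
public class RunningTotal {

    // Adds up TF * IDF for every word, keeping a running total.
    // wordCounts[i] and idf[i] belong to the same word (hypothetical helper).
    static double sumTfIdf(int[] wordCounts, int totalCount, double[] idf) {
        double runningTfIDF = 0;                       // declared before the loop
        for (int i = 0; i < wordCounts.length; i++) {
            double tfidf = ((double) wordCounts[i] / totalCount) * idf[i];
            System.out.println(" --> TF/IDF = " + tfidf);
            runningTfIDF += tfidf;                     // accumulated inside the loop
        }
        return runningTfIDF;                           // complete once the loop ends
    }

    public static void main(String[] args) {
        // Values taken from the sample output for abc0.txt ("how", "are", "you").
        int[] wordCounts = {4, 7, 10};
        double[] idf = {0.5642714398516419, 0.1962946357308887, 0.1962946357308887};
        System.out.println(" --> Total TF/IDF = " + sumTfIdf(wordCounts, 957, idf));
    }
}
```

In the question's code the declaration would go just before the inner `for (int i = 0; ...)` loop, so the total starts again from zero for each file. With the sample values the total comes to roughly 0.0058.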

Edited to include a more complete answer

// Needs java.math.BigDecimal, java.util.LinkedHashMap and java.util.Map.
// LinkedHashMap keeps the words in the order they were inserted.
Map<String, BigDecimal> runningTfIDF = new LinkedHashMap<String, BigDecimal>();
Map<String, BigDecimal> wordCount = new LinkedHashMap<String, BigDecimal>();
Map<String, BigDecimal> frequency = new LinkedHashMap<String, BigDecimal>();
Map<String, BigDecimal> inverseTF = new LinkedHashMap<String, BigDecimal>();
for (int i = 0; i < array2.length; i++) {

    totalCount = 0;
    wordCountVal = 0;

    // Count every token in the file and every occurrence of the current word
    Scanner s = new Scanner(file);
    while (s.hasNext()) {
        totalCount++;
        if (s.next().equals(array2[i]))
            wordCountVal++;
    }
    s.close();

    wordCount.put(array2[i], BigDecimal.valueOf(wordCountVal));

    // BigDecimal.valueOf(double) throws NumberFormatException for NaN or
    // infinity, so an empty file or numofDoc[i] == 0 must be handled first.
    double frequencyVal = (double) wordCountVal / totalCount;
    frequency.put(array2[i], BigDecimal.valueOf(frequencyVal));

    double inverseTFVal = Math.log10((float) numDoc / (numofDoc[i]));
    inverseTF.put(array2[i], BigDecimal.valueOf(inverseTFVal));

    // TF/IDF for this word, stored so it can be printed or summed later
    runningTfIDF.put(array2[i], BigDecimal.valueOf(frequencyVal * inverseTFVal));
}

// Print the stored values once every word has been processed
for (String word : wordCount.keySet()) {
    System.out.print(word + " --> word count "
            + "\t |" + wordCount.get(word) + "|");
    System.out.print(" Total count = " + "\t " + "|"
            + totalCount + "|");
    System.out.printf(" Term Frequency = | %8.4f |",
            frequency.get(word));

    System.out.println("\t ");

    System.out.println(" --> IDF = " + inverseTF.get(word));

    System.out.println(" --> TF/IDF = " + runningTfIDF.get(word) + "\n");
}

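This isn't the cleanest implementation at the moment, but in short you need to store your information for each word, and then loop over the words after the total has been computed if you want the total to be displayed starting with the first result. Does that make sense?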
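The store-first, print-later idea can be sketched as follows. This is a minimal sketch, not the answerer's code: the class `TfIdfWithTotal` and its methods `total` and `report` are hypothetical names, plain `Double` values stand in for `BigDecimal` to keep it short, and the TF/IDF values are copied from the sample output in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TfIdfWithTotal {

    // First pass: add up every stored TF/IDF value.
    static double total(Map<String, Double> tfidfByWord) {
        double sum = 0;
        for (double value : tfidfByWord.values()) {
            sum += value;
        }
        return sum;
    }

    // Second pass: the total is already known, so it can be shown
    // under every word, including the first one.
    static String report(Map<String, Double> tfidfByWord) {
        double sum = total(tfidfByWord);
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Double> entry : tfidfByWord.entrySet()) {
            out.append(entry.getKey()).append(" --> TF/IDF = ")
               .append(entry.getValue()).append('\n');
            out.append(" --> Total TF/IDF = ").append(sum).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order, so words print as entered.
        Map<String, Double> tfidfByWord = new LinkedHashMap<>();
        tfidfByWord.put("how", 0.0023585013159943234);
        tfidfByWord.put("are", 0.00143580193324579);
        tfidfByWord.put("you", 0.002051145618922557);
        System.out.print(report(tfidfByWord));
    }
}
```

Splitting the work into two passes is what makes the total available before the first word is printed; a single loop can only show the sum accumulated so far.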

+0

Thank you sir, but I need to total it and show the total under each word's TF/IDF. Could you guide me, sir? –

+0

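Do you mean you would display the TFIDF of the first word twice, then the TFIDF of the second, followed by the summed TFIDF? You can print runningTfIDF on each iteration, and it will give you the sum at that point in time. – dmcnelis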

+0

I mean show the whole [3-word sum] with the first word.. I've made it so confusing.. –