2016-07-18 33 views
0

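So basically this is a parser / cosine-similarity calculator, but I keep getting a compile error. I think I have the correct input path for reading the text file, but it still won't compile. I suspect my input file path is wrong, but I can't figure out what I did wrong.

Here is my main class: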

import java.io.FileNotFoundException;
import java.io.IOException;

public class TfIdfMain {

    public static void main(String args[]) throws FileNotFoundException, IOException {
        DocumentParser dp = new DocumentParser();
        dp.parseFiles("C:/Users/dachen/Documents/doc1.txt"); // give the location of source file
        dp.tfIdfCalculator(); //calculates tfidf
        dp.getCosineSimilarity(); //calculates cosine similarity
    }
}

My parser class:

import java.io.BufferedReader; 
import java.io.File; 
import java.io.FileNotFoundException; 
import java.io.FileReader; 
import java.io.IOException; 
import java.util.ArrayList; 
import java.util.List; 

public class DocumentParser { 

    //This variable will hold all terms of each document in an array. 
    private List<String[]> termsDocsArray = new ArrayList<String[]>(); 
    private List<String> allTerms = new ArrayList<String>(); //to hold all terms 
    private List<double[]> tfidfDocsVector = new ArrayList<double[]>(); 

    /** 
    * Method to read files and store in array. 
    */ 
    public void parseFiles(String filePath) throws FileNotFoundException, IOException { 
     File[] allfiles = new File(filePath).listFiles(); 
     BufferedReader in = null; 
     for (File f : allfiles) { 
      if (f.getName().endsWith(".txt")) { 
       in = new BufferedReader(new FileReader(f)); 
       StringBuilder sb = new StringBuilder(); 
       String s = null; 
       while ((s = in.readLine()) != null) { 
        sb.append(s); 
       } 
       String[] tokenizedTerms = sb.toString().replaceAll("[\\W&&[^\\s]]", "").split("\\W+"); //to get individual terms 
       for (String term : tokenizedTerms) { 
        if (!allTerms.contains(term)) { //avoid duplicate entry 
         allTerms.add(term); 
        } 
       } 
       termsDocsArray.add(tokenizedTerms); 
      } 
     } 

    } 

    /** 
    * Method to create termVector according to its tfidf score. 
    */ 
    public void tfIdfCalculator() { 
     double tf; //term frequency 
     double idf; //inverse document frequency 
     double tfidf; //term frequency inverse document frequency
     for (String[] docTermsArray : termsDocsArray) { 
      double[] tfidfvectors = new double[allTerms.size()]; 
      int count = 0; 
      for (String terms : allTerms) { 
       tf = new TfIdf().tfCalculator(docTermsArray, terms); 
       idf = new TfIdf().idfCalculator(termsDocsArray, terms); 
       tfidf = tf * idf; 
       tfidfvectors[count] = tfidf; 
       count++; 
      } 
      tfidfDocsVector.add(tfidfvectors); //storing document vectors;    
     } 
    } 

    /** 
    * Method to calculate cosine similarity between all the documents. 
    */ 
    public void getCosineSimilarity() { 
     for (int i = 0; i < tfidfDocsVector.size(); i++) { 
      for (int j = 0; j < tfidfDocsVector.size(); j++) { 
       System.out.println("between " + i + " and " + j + " = " 
            + new CosineSimilarity().cosineSimilarity 
             (
             tfidfDocsVector.get(i), 
             tfidfDocsVector.get(j) 
             ) 
           ); 
      } 
     } 
    } 
} 

Here is my error:

Exception in thread "main" java.lang.NullPointerException 
    at DocumentParser.parseFiles(DocumentParser.java:22) 
    at TfIdfMain.main(TfIdfMain.java:7) 

Is the path to my text file in Documents wrong?

+2

I'm confused: you say you get a compile error, but then you show a runtime exception instead. Can you clarify? – ruakh

+0

Sorry, I'm not sure where the runtime error is. It shows: Exception in thread "main" java.lang.NullPointerException at DocumentParser.parseFiles(DocumentParser.java:22) at TfIdfMain.main(TfIdfMain.java:7) –

Answers

1

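Windows file paths are conventionally written with \ rather than /, although java.io.File accepts both. There is also another bug: the code expects a directory path, not a full file path. So instead of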

dp.parseFiles("C:/Users/dachen/Documents/doc1.txt"); 

it should be

dp.parseFiles("C:\\Users\\dachen\\Documents"); 
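To illustrate the directory-versus-file point, here is a minimal sketch of a helper that maps whatever path the caller passes to the directory that should be scanned. The class name `ParsePathSketch` and the method `directoryToScan` are hypothetical and not part of the question's code; the sketch assumes that falling back to the parent directory is the behavior you want.

```java
import java.io.File;

public class ParsePathSketch {

    /**
     * Hypothetical helper: returns the directory to scan for a given path.
     * If the path is already a directory it is returned unchanged; otherwise
     * (a regular file, or a path that does not exist) its parent is returned,
     * which may be null when the path has no parent component.
     */
    static File directoryToScan(File path) {
        if (path.isDirectory()) {
            return path;
        }
        return path.getParentFile();
    }

    public static void main(String[] args) {
        // The path from the question points at a file, not a directory.
        File given = new File("C:/Users/dachen/Documents/doc1.txt");
        System.out.println("Directory to scan: " + directoryToScan(given));
    }
}
```

With a helper like this, `parseFiles` could call `listFiles()` on the returned directory, so passing either `doc1.txt` or its containing folder would lead to the same folder being scanned.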
+1

You could show the correct string... –

+0

@mursaleen Ahmed Sorry, Mr. Ahmed, I still get the error: Exception in thread "main" java.lang.NullPointerException at DocumentParser.parseFiles(DocumentParser.java:22) at TfIdfMain.main(TfIdfMain.java:7) –

+0

Should you be passing the whole file name in the path? In the `parseFiles` function you already do this: `File[] allfiles = new File(filePath).listFiles();` –

0

The documentation for listFiles() states that it:

Returns null if this abstract pathname does not denote a directory

The path you are passing is not a directory.
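A small sketch of how that `null` return could be guarded against, so the `for` loop in `parseFiles` never iterates over `null`. The class `ListFilesCheck` and the method `listFilesOrEmpty` are hypothetical names introduced for illustration; returning an empty array is one possible policy, and throwing a descriptive exception would be another.

```java
import java.io.File;

public class ListFilesCheck {

    /**
     * Hypothetical helper: lists the entries of a directory, returning an
     * empty array instead of null when the path is not a directory or an
     * I/O error occurs.
     */
    static File[] listFilesOrEmpty(String path) {
        File[] files = new File(path).listFiles();
        return files != null ? files : new File[0];
    }

    public static void main(String[] args) {
        // Defaults to the file path from the question; pass a directory to compare.
        String path = args.length > 0 ? args[0] : "C:/Users/dachen/Documents/doc1.txt";

        File[] raw = new File(path).listFiles();
        System.out.println("listFiles() returned null: " + (raw == null));

        // Safe to iterate even when the path is not a directory.
        for (File f : listFilesOrEmpty(path)) {
            System.out.println(f.getName());
        }
    }
}
```

Silently returning an empty array can hide a mistyped path, so in real code it may be preferable to fail fast with a message that names the offending path.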
