Searching for a Unicode string in a file using Java

How do I search for a Unicode string in a file using Java? Below is the code I tried. It works for strings that are not Unicode.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.InputStreamReader;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class file1 {
        public static void main(String[] args) throws Exception {
            // Use a single reader for System.in: two BufferedReaders on the
            // same stream can swallow each other's buffered input.
            BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
            System.out.println("Enter File name:");
            String fileName = stdin.readLine();
            System.out.println("Enter the string to be found");
            String s = stdin.readLine();

            // The search string is compiled as a regular expression.
            Pattern p = Pattern.compile(s);
            int count = 0;
            // FileReader decodes the file with the platform default encoding.
            try (BufferedReader bfr = new BufferedReader(new FileReader(fileName))) {
                String line;
                // Search every line of the file, not only the first one.
                while ((line = bfr.readLine()) != null) {
                    Matcher matcher = p.matcher(line);
                    while (matcher.find()) {
                        count++;
                    }
                }
            }
            System.out.println(count);
        }
    }


2011-10-30 Rekharaj

Well, I can see three possible sources of the problem:

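The regular expression may be incorrect. Do you really need to use a regular expression? Are you trying to match a pattern, or just a plain string?
You may not be getting the non-ASCII input correctly from the command line. You should dump the string in terms of its Unicode characters to check (see the code below).
You may be reading the file in the wrong encoding. At the moment you're using FileReader, which always uses the platform default encoding. What encoding is the file you're trying to read? I'd suggest using a FileInputStream wrapped in an InputStreamReader, with an explicit encoding that matches the file (e.g. UTF-8).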
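
Putting the first and third suggestions together, a minimal sketch might read the file through a FileInputStream wrapped in an InputStreamReader with an explicit charset, and count literal (non-regex) matches using `Pattern.quote`. It assumes the file is UTF-8; the class name `UnicodeSearch` and the method `countOccurrences` are hypothetical, chosen only for illustration.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeSearch {
    // Hypothetical helper: counts literal occurrences of `needle` in `file`,
    // decoding the bytes with an explicit charset instead of the platform default.
    static int countOccurrences(File file, String needle, Charset charset) throws IOException {
        // Pattern.quote makes regex metacharacters such as '.' or '(' match literally.
        Pattern p = Pattern.compile(Pattern.quote(needle));
        int count = 0;
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            String line;
            // Search every line of the file.
            while ((line = in.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumptions: args[0] is the file name, args[1] is the text to find,
        // and the file is encoded as UTF-8.
        System.out.println(countOccurrences(new File(args[0]), args[1], StandardCharsets.UTF_8));
    }
}
```

Matches that span a line break are not counted, since the search runs line by line. If the search string itself comes from the command line or System.in, it is still decoded with the platform default, which is what the second point above is about.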

To debug the real value of a string, I'd usually use something like this:

private static void dumpString(String text) { 
    for (int i = 0; i < text.length(); i++) { 
     char c = text.charAt(i); 
     System.out.printf("%d: %4h (%c)", i, c, c); 
     System.out.println(); 
    } 
}

That way you can see exactly which UTF-16 code unit each char in the string holds.
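
One caveat: because a `char` is a single UTF-16 code unit, a character outside the Basic Multilingual Plane (U+1F600, for example) shows up as two surrogate chars in the dump above. The sketch below is a variant that walks the string by code point using `String.codePointAt` and `Character.charCount`; the names `CodePointDump`, `describeCodePoints` and `dumpCodePoints` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CodePointDump {
    // Hypothetical variant of dumpString: one entry per Unicode code point
    // rather than per UTF-16 char, so a surrogate pair is reported as one character.
    static List<String> describeCodePoints(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            // i is the char index where this code point starts.
            out.add(String.format(Locale.ROOT, "%d: U+%04X (%s)",
                    i, cp, new String(Character.toChars(cp))));
            // Advance by 1 for BMP characters, 2 for surrogate pairs.
            i += Character.charCount(cp);
        }
        return out;
    }

    static void dumpCodePoints(String text) {
        for (String line : describeCodePoints(text)) {
            System.out.println(line);
        }
    }

    public static void main(String[] args) {
        // 'a', U+00E9 and U+1F600 (the last is stored as a surrogate pair).
        dumpCodePoints("a\u00e9\uD83D\uDE00");
    }
}
```

For that input the string has four chars but only three code points, so this version prints three lines where the char-by-char dump prints four.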


2011-10-30 07:40:21
