2009-11-20

Possible duplicate:
How to parse this output and separate each field/word
How to parse text with delimiters?

I want to parse the following data so that I get the output specified below.

Input:

 
RTRV-ALM-EQPT::ALL:RA01; 

    SIMULATOR 09-11-20 13:52:15 
M RA01 COMPLD 
    "SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\"Fan-T\"," 
    "SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\"Battery-T\"," 
    "SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\"Processor Failure\"," 
    "SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\"Laser-T\"," 
    "SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\" Laser-T\"," 
    "SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\"Laser-T\"," 
    "SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\"Laser-T\"," 
; 

Output:

 
1) RTRV-ALM-EQPT::ALL:RA01; 
2) SIMULATOR 
3) 09-11-20 
4) 13:52:15 
5) M 
6) RA01 
7) COMPLD 
8) "SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\"Fan-T\"," 
9) "SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\"Battery-T\"," 
10) "SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\"Processor Failure\"," 
11) "SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\"Laser-T\"," 
12) "SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\" Laser-T\"," 
13) "SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\"Laser-T\"," 
14) "SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\"Laser-T\"," 

So you want to split on whitespace, except inside quoted strings, which may themselves contain \-escaped quotes? – 2009-11-20 10:59:48


I think this is the third time this question has been asked. Is this some kind of homework? – 2009-11-20 11:00:13

Answers


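To parse arbitrary input, you need to know its structure.

  1. Are the first four lines always present?
  2. What is the format of each of those four lines?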
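If the answers turn out to be "yes" and "as in the sample", the header can be validated before anything else is parsed. This is only a sketch under assumptions inferred from the sample, not from any specification: the first non-blank line is the echoed command ending in ";", the second is an identifier followed by a yy-MM-dd date and an HH:mm:ss time, and the third is "M", a tag and a status word. HeaderCheck and hasExpectedHeader are hypothetical names, and Java 16+ is assumed for Stream.toList().

```java
import java.util.List;
import java.util.regex.Pattern;

public class HeaderCheck {
    // Hypothetical header patterns, guessed from the sample only:
    //   "SIMULATOR 09-11-20 13:52:15" -> identifier, yy-MM-dd date, HH:mm:ss time
    //   "M RA01 COMPLD"               -> 'M', tag, status word
    static final Pattern STAMP_LINE =
            Pattern.compile("\\s*(\\S+)\\s+(\\d{2}-\\d{2}-\\d{2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s*");
    static final Pattern STATUS_LINE =
            Pattern.compile("\\s*M\\s+(\\S+)\\s+(\\S+)\\s*");

    // True if the first three non-blank lines are: a command ending in ';',
    // a stamp line and a status line (in that order).
    static boolean hasExpectedHeader(List<String> lines) {
        List<String> nonBlank = lines.stream().filter(l -> !l.isBlank()).toList();
        return nonBlank.size() >= 3
                && nonBlank.get(0).trim().endsWith(";")
                && STAMP_LINE.matcher(nonBlank.get(1)).matches()
                && STATUS_LINE.matcher(nonBlank.get(2)).matches();
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "RTRV-ALM-EQPT::ALL:RA01;",
                "",
                "    SIMULATOR 09-11-20 13:52:15",
                "M RA01 COMPLD");
        System.out.println(hasExpectedHeader(sample));                   // true
        System.out.println(hasExpectedHeader(List.of("M RA01 COMPLD"))); // false
    }
}
```

If the check fails, the input does not have the structure this sketch assumes, and it is better to stop with an error than to guess.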

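The best approach is probably not to transform the first text directly into the second.

Instead, first parse the first text into a set of Java objects that represent what the data actually is. For example, the second and third lines of the input might be represented by a Test class with "area", "day" and "time" properties. (Only you can come up with a sensible model, based on your knowledge of what the data means.)

Then, once you have a good in-memory representation of the file's information, you can think about printing it in the second format. It should be easy to print the various fields and properties from the Java objects, rather than trying to transform the input text on the fly.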
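A minimal sketch of that idea, assuming Java 16+ for records. The type and field names (Header, Alarm, Report, source, tag, status) are hypothetical guesses based only on the sample, and each quoted alarm line is kept as one raw string because the meaning of its comma-separated fields is not stated here. Printing the numbered list then becomes a walk over the model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlarmReport {
    // Hypothetical model; names are illustrative, not from any spec.
    record Header(String source, String date, String time, String tag, String status) {}
    record Alarm(String raw) {}  // quoted alarm line, kept verbatim
    record Report(String command, Header header, List<Alarm> alarms) {}

    static final Pattern STAMP = Pattern.compile("\\s*(\\S+)\\s+(\\S+)\\s+(\\S+)\\s*");
    static final Pattern STATUS = Pattern.compile("\\s*M\\s+(\\S+)\\s+(\\S+)\\s*");

    static Report parse(List<String> lines) {
        // Drop blank lines and surrounding whitespace.
        List<String> rows = new ArrayList<>();
        for (String l : lines) if (!l.isBlank()) rows.add(l.trim());
        if (rows.size() < 3) throw new IllegalArgumentException("header too short");

        Matcher stamp = STAMP.matcher(rows.get(1));
        Matcher status = STATUS.matcher(rows.get(2));
        if (!stamp.matches() || !status.matches()) {
            throw new IllegalArgumentException("unexpected header");
        }
        Header h = new Header(stamp.group(1), stamp.group(2), stamp.group(3),
                status.group(1), status.group(2));

        // Every remaining line that starts with a quote is treated as an alarm.
        List<Alarm> alarms = new ArrayList<>();
        for (String r : rows.subList(3, rows.size())) {
            if (r.startsWith("\"")) alarms.add(new Alarm(r));
        }
        return new Report(rows.get(0), h, alarms);
    }

    // Produce the numbered output from the model, not from the raw text.
    static List<String> numbered(Report r) {
        List<String> items = new ArrayList<>(List.of(
                r.command(), r.header().source(), r.header().date(), r.header().time(),
                "M", r.header().tag(), r.header().status()));
        for (Alarm a : r.alarms()) items.add(a.raw());
        List<String> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) out.add((i + 1) + ") " + items.get(i));
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "RTRV-ALM-EQPT::ALL:RA01;", "",
                "    SIMULATOR 09-11-20 13:52:15",
                "M RA01 COMPLD",
                "    \"SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\\\"Processor Failure\\\",\"",
                ";");
        numbered(parse(input)).forEach(System.out::println);
    }
}
```

Once the model exists, a different output format only requires a different walk over the Report, not another round of text surgery.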


Assuming the file is relatively small, so that it can be read into memory, try something like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text = "RTRV-ALM-EQPT::ALL:RA01;\n" +
                "\n" +
                " SIMULATOR 09-11-20 13:52:15\n" +
                "M RA01 COMPLD\n" +
                " \"SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\\\"Fan-T\\\",\"\n" +
                " \"SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\\\"Battery-T\\\",\"\n" +
                " \"SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\\\"Processor Failure\\\",\"\n" +
                " \"SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\\\"Laser-T\\\",\"\n" +
                " \"SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\\\" Laser-T\\\",\"\n" +
                " \"SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\\\"Laser-T\\\",\"\n" +
                " \"SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\\\"Laser-T\\\",\"\n" +
                ";";
        // Match either a double-quoted string (which may contain \-escaped characters)
        // or a run of non-whitespace characters. The character class [^\\"] must
        // exclude both the backslash and the double quote, hence \\\\ and \" below.
        Matcher m = Pattern.compile("\"(?:\\\\.|[^\\\\\"])*\"|\\S+").matcher(text);
        int n = 0;
        while (m.find()) {
            System.out.println((++n) + ") " + m.group());
        }
    }
}
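The answer embeds the input as a string literal. Under the same assumption that the file fits in memory, a hypothetical variant can read it with Files.readString (Java 11+, which decodes UTF-8 by default and throws on malformed input) and reuse the same pattern. TokenizeFile and tokens are illustrative names, and taking the path from the first command-line argument is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizeFile {
    // Quoted string (allowing \-escaped characters) OR a run of non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("\"(?:\\\\.|[^\\\\\"])*\"|\\S+");

    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the report file; the whole file is read into memory.
        String text = Files.readString(Path.of(args[0]));
        List<String> toks = tokens(text);
        for (int i = 0; i < toks.size(); i++) {
            System.out.println((i + 1) + ") " + toks.get(i));
        }
    }
}
```

Separating tokens() from main() keeps the matching logic testable without a file on disk.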

Output:

1) RTRV-ALM-EQPT::ALL:RA01; 
2) SIMULATOR 
3) 09-11-20 
4) 13:52:15 
5) M 
6) RA01 
7) COMPLD 
8) "SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\"Fan-T\"," 
9) "SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\"Battery-T\"," 
10) "SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\"Processor Failure\"," 
11) "SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\"Laser-T\"," 
12) "SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\" Laser-T\"," 
13) "SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\"Laser-T\"," 
14) "SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\"Laser-T\"," 
15) ; 

The only difference is a 15th match, ";", which I believe you forgot.

The raw regex (without all the Java string escaping) looks like this:

"(?:\\.|[^\\"])*"|\S+ 

It matches:

"        # match a double quote
(?:      # open a non-capturing group
    \\.    # match a backslash followed by any char (except line breaks)
    |      # OR
    [^\\"] # match any char except a backslash and a double quote
)*       # close the non-capturing group and repeat it zero or more times
"        # match a double quote
|        # OR
\S+      # match one or more non-whitespace characters

In other words: match a quoted string, or otherwise match a run of one or more non-whitespace characters.
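A small demo of the two branches, using the same pattern with Java escaping. RegexDemo and split are hypothetical names. The first input shows a quoted string containing \" and a space surviving as a single token; the second has an unterminated quote, so the first branch cannot match and \S+ takes over.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    // Raw form: "(?:\\.|[^\\"])*"|\S+
    static final Pattern TOKEN = Pattern.compile("\"(?:\\\\.|[^\\\\\"])*\"|\\S+");

    static List<String> split(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) out.add(m.group());
        return out;
    }

    public static void main(String[] args) {
        // Input: M "Processor \"Failure\"" ;
        // The escaped quotes are consumed by the \\. branch, so the space stays inside.
        System.out.println(split("M \"Processor \\\"Failure\\\"\" ;"));
        // prints [M, "Processor \"Failure\"", ;]

        // Input: "abc\" def   (no closing quote)
        // The quoted branch fails, so \S+ matches each whitespace-separated run.
        System.out.println(split("\"abc\\\" def"));
        // prints ["abc\", def]
    }
}
```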


Nice answer :) – 2009-11-20 11:42:28


Thanks, Andreas. – 2009-11-20 12:28:41