2010-09-15 63 views

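Parsing a multi-line fixed-width file

I have a fixed-width flat file. To make matters worse, each line can be either a new record or a sub-record of the line above it, identified by the first character of each line: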

A0020SOME DESCRIPTION MORE DESCRIPTION 922 2321  # Separate 
A0021ANOTHER DESCRIPTIONMORE DESCRIPTION 23111442  # records 
B0021ANOTHER DESCRIPTION THIS TIME IN ANOTHER FORMAT # sub-record of record "0021" 
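To make the layout concrete, here is a minimal stdlib-only sketch of cutting one such line into fields by width. The layout (a 1-character record type, a 4-character record id, then the rest of the line) is an assumption read off the sample above, and `FixedWidthSlice` and `slice` are hypothetical names, not part of Flatworm or any other library.

```java
import java.util.ArrayList;
import java.util.List;

class FixedWidthSlice {
    // Cuts a line into consecutive fields of the given widths and trims the padding.
    // A line shorter than the layout yields short or empty trailing fields instead of failing.
    static List<String> slice(String line, int... widths) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        for (int w : widths) {
            int start = Math.min(pos, line.length());
            int end = Math.min(pos + w, line.length());
            fields.add(line.substring(start, end).trim());
            pos += w;
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical layout: 1-char record type, 4-char record id, then up to 100 chars of text.
        String line = "B0021ANOTHER DESCRIPTION THIS TIME IN ANOTHER FORMAT";
        List<String> fields = slice(line, 1, 4, 100);
        System.out.println(fields.get(0)); // B
        System.out.println(fields.get(1)); // 0021
    }
}
```

Once the type character is split off, it decides whether a line opens a new record or belongs to the one above it.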

I'm using Flatworm, which seems like a good library for parsing fixed-width data. Unfortunately, its documentation states:

"Repeating segments are supported only for delimited files" 

(ibid., "Repeating Segments").

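I'd rather not write a custom parser. (1) Is it possible to do this in Flatworm? (2) Is there a library that provides this (multi-line, multiple sub-record) functionality?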

Answers

2

Have you looked at JRecordBind?

http://jrecordbind.org/

"JRecordBind supports hierarchical fixed-length files: records of some type that are 'sons' of other record types."

0

With uniVocity-parsers you can read not only fixed-width input but also master-detail data (where one row has child rows).

Here is an example:

import com.univocity.parsers.common.ParsingContext;
import com.univocity.parsers.common.processor.*;
import com.univocity.parsers.fixed.*;
import java.io.FileReader;
import java.util.List;

// 1st, use a RowProcessor for the "detail" rows.
ObjectRowListProcessor detailProcessor = new ObjectRowListProcessor();

// 2nd, create a MasterDetailListProcessor to identify whether or not a row is the master row.
// The row placement argument indicates whether the master row occurs before (TOP) or after (BOTTOM) a sequence of "detail" rows.
MasterDetailListProcessor masterRowProcessor = new MasterDetailListProcessor(RowPlacement.TOP, detailProcessor) {
    @Override
    protected boolean isMasterRecord(String[] row, ParsingContext context) {
        // Returns true if the parsed row is a master row.
        // In the question's data, "A" lines are records and the "B" lines below them are sub-records.
        return row[0].startsWith("A");
    }
};

// These field lengths are illustrative; adjust them to the file's actual layout.
FixedWidthParserSettings parserSettings = new FixedWidthParserSettings(new FixedWidthFieldLengths(4, 5, 40, 40, 8));

// Set the RowProcessor to the masterRowProcessor.
parserSettings.setRowProcessor(masterRowProcessor);

FixedWidthParser parser = new FixedWidthParser(parserSettings);
parser.parse(new FileReader(yourFile));

// Here we get the MasterDetailRecord elements.
List<MasterDetailRecord> rows = masterRowProcessor.getRecords();
for (MasterDetailRecord masterRecord : rows) {
    // Each master record has one master row and zero or more detail rows.
    Object[] masterRow = masterRecord.getMasterRow();
    List<Object[]> detailRows = masterRecord.getDetailRows();
}
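For comparison, the grouping that `RowPlacement.TOP` describes (a master row followed by its detail rows) can be sketched in plain Java without any library. This is only an illustration of the idea, not uniVocity's implementation; `MasterDetailSketch`, `Group` and `group` are hypothetical names, and the sketch assumes "A" marks a record and "B" a sub-record, as in the question's sample. It uses `List.of`, so it needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

class MasterDetailSketch {
    // Hypothetical holder for one master row plus the detail rows that follow it.
    static final class Group {
        final String master;
        final List<String> details = new ArrayList<>();
        Group(String master) { this.master = master; }
    }

    // Master row on TOP: an "A" line opens a new group and every following
    // "B" line is attached to the most recent group.
    static List<Group> group(List<String> lines) {
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (String line : lines) {
            if (line.isEmpty()) continue; // skip blank lines
            char type = line.charAt(0);
            if (type == 'A') {
                current = new Group(line);
                groups.add(current);
            } else if (type == 'B') {
                if (current == null) {
                    throw new IllegalStateException("detail row before any master row: " + line);
                }
                current.details.add(line);
            } else {
                throw new IllegalArgumentException("unknown record type: " + type);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "A0020SOME DESCRIPTION MORE DESCRIPTION 922 2321",
            "A0021ANOTHER DESCRIPTIONMORE DESCRIPTION 23111442",
            "B0021ANOTHER DESCRIPTION THIS TIME IN ANOTHER FORMAT");
        for (Group g : group(lines)) {
            // Characters 1..4 hold the record id in the sample data.
            System.out.println(g.master.substring(1, 5) + " -> " + g.details.size() + " sub-record(s)");
        }
    }
}
```

Run on the sample, this prints `0020 -> 0 sub-record(s)` and `0021 -> 1 sub-record(s)`. With `RowPlacement.BOTTOM` the logic would flip: detail rows accumulate until a master row closes the group.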

Disclosure: I am the author of this library. It is open source and free (Apache 2.0 license).