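One way using awk.

It's not a simple script. The process in short: the key point is the variable 'all_ranges'. While it is reset (0), the script reads from the ranges file and saves that data in an array; once it is set (1), the script stops reading ranges, moves on to the lines of the 'id-position' file, checks each position against the ranges held in the array, and prints the line if it falls inside one of them. I tried to avoid processing the ranges file more than once and to work through it in chunks, which makes the script more complex.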
EDIT to add that I assume the id field is sorted in both files. Otherwise this script will fail and you will need another approach.

Content of script.awk:
BEGIN {
    ## Arguments:
    ## ARGV[0] = awk
    ## ARGV[1] = <first_input_argument>
    ## ARGV[2] = <second_input_argument>
    ## ARGC = 3
    f2 = ARGV[ --ARGC ];
    all_ranges = 0

    ## Read first line from file with ranges to get 'class' header.
    getline line <f2
    split(line, fields)
    class_header = fields[2];
}

## Special case for the header.
FNR == 1 {
    printf "%s\t%s\n", $0, class_header;
    next;
}

## Data.
FNR > 1 {
    while (1) {
        if (! all_ranges) {
            ## Read line from file with range positions.
            ret = getline line <f2

            ## Check error.
            if (ret == -1) {
                printf "%s\n", "ERROR: " ERRNO
                close(f2);
                exit 1;
            }

            ## Check end of file.
            if (ret == 0) {
                break;
            }

            ## Split line in spaces.
            num = split(line, fields)
            if (num != 4) {
                printf "%s\n", "ERROR: Bad format of file " f2;
                exit 2;
            }

            range_id = fields[1];
            if ($1 == fields[1]) {
                ranges[ fields[3], fields[4] ] = fields[2];
                continue;
            }
            else {
                all_ranges = 1
            }
        }

        if (range_id == $1) {
            delete ranges;
            ranges[ fields[3], fields[4] ] = fields[2];
            all_ranges = 0;
            continue;
        }

        for (range in ranges) {
            split(range, pos, SUBSEP)
            if ($2 >= pos[1] && $2 <= pos[2]) {
                printf "%s\t%s\n", $0, ranges[ range ];
                break;
            }
        }
        break;
    }
}

END {
    for (range in ranges) {
        split(range, pos, SUBSEP)
        if ($2 >= pos[1] && $2 <= pos[2]) {
            printf "%s\t%s\n", $0, ranges[ range ];
            break;
        }
    }
}
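Since the script relies on both files listing their ids in the same sorted order, it may help to sort them first. Here is a minimal sketch of one way to do that, assuming each file has a single header line that should stay on top. The helper name `sort_keep_header` is made up for illustration and is not part of the script above.

```shell
# Hypothetical helper: sort a whitespace-separated file by its first
# column (the id) while keeping the header line in first place.
# Assumes the argument is a regular file with exactly one header line.
sort_keep_header() {
    head -n 1 "$1"                   # pass the header through untouched
    tail -n +2 "$1" | sort -k1,1     # sort the data lines by id only
}

# Possible usage (the file names are the ones used in this answer):
#   sort_keep_header file1 > file1.sorted
#   sort_keep_header file2 > file2.sorted
#   awk -f script.awk file1.sorted file2.sorted | column -t
```

Sort order depends on the locale, so running both sorts under the same locale (for example with `LC_ALL=C`) should keep the two files consistent with each other.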
Run it like:
awk -f script.awk file1 file2 | column -t
With the following result:
id  position  class
a1  21        Xfact
a1  39        Xfact
a1  77        xbreak
b1  88        Xbreak
b1  122       Xbreak
c1  22        Xbreak
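If the ids are not sorted, one possible "other approach" is to give up on reading the ranges in chunks and hold all of them in memory. The following is only a rough sketch of that idea, not the script above. It assumes the ranges file has a header line and four fields per line (id, class, start, end), which is what script.awk expects, and that it fits in memory. The function name `annotate_unsorted` and the array names are made up for illustration.

```shell
# Hypothetical alternative: load every range into memory first, so
# neither input has to be sorted by id.
# Usage: annotate_unsorted <id-position file> <ranges file>
annotate_unsorted() {
    awk '
        # First file read by awk: the ranges file (id class start end).
        NR == FNR {
            if (FNR == 1) { class_header = $2; next }
            n[$1]++                      # number of ranges seen for this id
            class[$1, n[$1]] = $2
            lo[$1, n[$1]] = $3
            hi[$1, n[$1]] = $4
            next
        }

        # Header of the id-position file: append the class header.
        FNR == 1 { print $0 "\t" class_header; next }

        # Data: print the line with the class of the first matching range.
        {
            for (i = 1; i <= n[$1]; i++) {
                if ($2 + 0 >= lo[$1, i] + 0 && $2 + 0 <= hi[$1, i] + 0) {
                    print $0 "\t" class[$1, i]
                    break
                }
            }
        }
    ' "$2" "$1"
}
```

Like the original script, this prints only the positions that fall inside some range. The trade-off is memory: every range stays in the arrays for the whole run, which is what the chunked version tries to avoid.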
Is this homework? It looks rather over the top. – Vatine 2012-07-20 09:58:04