Convert a 3-column file into a matrix

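I have a file whose information is split into 3 columns. The first column holds the categories that will fill the first row of the matrix, and the second column holds the categories that will go in the first column of the matrix. The third column holds the values that will fill most of the matrix. Columns 1 and 2 of the original file could be swapped; it makes no difference.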

The file looks like this

Category1 type1 + 
Category1 type2 - 
Category1 type3 + 
Category2 type1 + 
Category2 type2 + 
Category2 type3 + 
Category3 type1 + 
Category3 type2 - 
Category3 type3 -

I would like to turn it into something that looks like this

Category1 Category2 Category3 
type1 + + + 
type2 - + - 
type3 + + -

I think awk can probably do this; I just don't know how to make awk do it.

Source

2017-05-15 Jacob

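Regarding the input data: are the columns separated by tabs or by spaces? And how should the output be delimited? – Scheff

@Scheff Everything is tab-delimited – Jacob

Aha. I'll post a solution shortly. (It currently handles space-separated input and writes tab-separated output.) – Scheff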

awk to the rescue!

awk 'BEGIN {FS=OFS="\t"} 
      {col[$1]; row[$2]; val[$2,$1]=$3} 
    END {for(c in col) printf "%s", OFS c; print ""; 
      for(r in row) 
       {printf "%s", r; 
       for(c in col) printf "%s", OFS val[r,c] 
       print ""}}' file 

     Category1  Category2  Category3 
type1 +  +  + 
type2 -  +  - 
type3 +  +  -
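
One caveat with the one-liner above: awk does not specify the order in which `for (c in col)` visits keys, so rows and columns may come out in a different order from the input. Here is a minimal sketch, assuming tab-separated input and plain POSIX awk, that keeps labels in first-seen order. The function name `pivot_ordered` and the helper arrays are made up for illustration and are not part of the original answer.

```shell
# Sketch only: pivot a 3-column, tab-separated file, keeping labels in first-seen order.
# pivot_ordered is a hypothetical helper name; input comes from the given files or stdin.
pivot_ordered() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        if (!($1 in seen_col)) { seen_col[$1]; cols[++nc] = $1 }   # new column label
        if (!($2 in seen_row)) { seen_row[$2]; rows[++nr] = $2 }   # new row label
        val[$2, $1] = $3                                           # cell value
    }
    END {
        # header: empty corner cell, then one label per column
        for (j = 1; j <= nc; j++) printf "%s%s", OFS, cols[j]
        print ""
        # body: row label followed by one value per column
        for (i = 1; i <= nr; i++) {
            printf "%s", rows[i]
            for (j = 1; j <= nc; j++) printf "%s%s", OFS, val[rows[i], cols[j]]
            print ""
        }
    }' "$@"
}
```

It would be called as `pivot_ordered file`. The trade-off is two extra index arrays in exchange for a predictable layout.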

Source

2017-05-15 16:54:44 karakfa

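I think this is the point where it belongs in a script rather than a "one-liner". – 123

I don't disagree... – karakfa

This is a solution based on GNU awk. I emphasize this because true multi-dimensional arrays (arrays of arrays, used here to keep the solution convenient) are a GNU awk specific feature.

My script table2matrix.awk: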

# collect values
{
    # category=$1 ; type=$2 ; value=$3
    if (!($1 in categories)) { categories[$1] }
    types[$2][$1] = $3
}
# output of values
END {
    # print col. header
    for (category in categories) { printf("\t%s", category); }
    print ""
    # print rows
    for (type in types) {
        printf("%s", type);
        for (category in categories) {
            printf("\t%s", types[type][category]);
        }
        print ""
    }
}

Sample session:

$ cat >table.txt <<EOF 
> Category1 type1 + 
> Category1 type2 - 
> Category1 type3 + 
> Category2 type1 + 
> Category2 type2 + 
> Category2 type3 + 
> Category3 type1 + 
> Category3 type2 - 
> Category3 type3 - 
> EOF 

$ awk -f table2matrix.awk table.txt 
     Category1  Category2  Category3 
type1 +  +  + 
type2 -  +  - 
type3 +  +  - 

$ cat table.txt | sed $'s/ /\t/g' >table-tabs.txt 

$ awk -f table2matrix.awk table-tabs.txt 
     Category1  Category2  Category3 
type1 +  +  + 
type2 -  +  - 
type3 +  +  - 

$ cat >table-sorted.txt <<EOF 
> Category1 type1 + 
> Category1 type3 + 
> Category2 type1 + 
> Category2 type2 + 
> Category2 type3 + 
> Category3 type1 + 
> Category1 type2 - 
> Category3 type2 - 
> Category3 type3 - 
> EOF 

$ awk -f table2matrix.awk table-sorted.txt 
     Category1  Category2  Category3 
type1 +  +  + 
type2 -  +  - 
type3 +  +  - 

$ tac table.txt >table-reverse.txt 

$ awk -f table2matrix.awk table-reverse.txt 
     Category1  Category2  Category3 
type1 +  +  + 
type2 -  +  - 
type3 +  +  - 

$ grep '+' table.txt >table-incompl.txt 

$ awk -f table2matrix.awk table-incompl.txt 
     Category1  Category2  Category3 
type1 +  +  + 
type2   + 
type3 +  + 

$
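
The last run in the session shows missing category/type combinations coming out as empty cells. If a visible placeholder is preferred, something like the following sketch could work. It uses the POSIX `(r, c) in val` membership test instead of gawk arrays of arrays. `pivot_fill` and the `fill` variable are hypothetical names, and the row and column order is still whatever awk's `for ... in` happens to yield.

```shell
# Sketch only: pivot whitespace-separated 3-column input and print a
# placeholder for combinations that never occur in the input.
# Usage (hypothetical helper): pivot_fill PLACEHOLDER [FILE...]
pivot_fill() {
    fill=$1; shift
    awk -v fill="$fill" '
    { cols[$1]; rows[$2]; val[$2, $1] = $3 }    # collect labels and values
    END {
        for (c in cols) printf "\t%s", c        # header row
        print ""
        for (r in rows) {
            printf "%s", r
            for (c in cols)                     # placeholder when the cell is absent
                printf "\t%s", (((r, c) in val) ? val[r, c] : fill)
            print ""
        }
    }' "$@"
}
```

For example, `pivot_fill NA table-incompl.txt` would print `NA` where the session output above shows an empty cell.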

table.txt is space-separated (copied and pasted from the web browser); table-tabs.txt is table.txt with the spaces replaced by tabs.
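
As an aside, the space-to-tab conversion can also be done in awk itself. A small sketch, assuming the fields never contain embedded whitespace; `spaces_to_tabs` is a made-up helper name. Unlike the `sed` call in the session, which turns every single space into a tab, this collapses runs of whitespace and drops leading and trailing blanks.

```shell
# Sketch only: re-join whitespace-separated fields with single tabs.
# Assigning $1 to itself forces awk to rebuild the record using OFS.
spaces_to_tabs() {
    awk -v OFS='\t' '{ $1 = $1; print }' "$@"
}
```

Usage would be `spaces_to_tabs table.txt >table-tabs.txt`.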

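As can be seen from the script (but not from the code samples as rendered in the web browser), the output is tab-separated.

After testing some variations of the original sample input, I fixed my awk script. It became a bit shorter and more similar to karakfa's other solution...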

Source

2017-05-15 16:57:38 Scheff
