Sort a field in ascending order and delete the first and last numbers

a  10,5,3,66,50 
b  2,10,1,88,5,8,9 
c  4,60,10,39,55,22 
d  1,604,3,503,235,45,60,7 
e  20,59,33,2,6,45,36,34,22

I want to sort the data in the second column in ascending order:

a  3,5,10,50,66 
b  1,2,5,8,9,10,88 
c  4,10,22,39,55,60 
.... 
....

Then delete the smallest and largest values, so that it looks like this:

a  5,10,50 
b  2,5,8,9,10 
c  10,22,39,55 
.... 
....

Any help would be greatly appreciated!

2014-04-21 user3546860

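You could write a program to do it for you. Then run that program. – juanchopanza

Cool data. Is it a text file? A CSV? Have you read it in yet? What do you have so far? – thegrinner

It's a text file. I'm still trying to figure out how to sort the data within a cell in ascending order. I don't even know what to search for to find this. – user3546860

Python: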

with open('the_file.txt', 'r') as fin, open('result.txt', 'w') as fout:
    for line in fin:
        f0, f1 = line.split()
        fout.write('%s\t%s\n' % (f0, ','.join(sorted(f1.split(','), key=int)[1:-1])))

The loop body can be expanded into:

        f0, f1 = line.split()           # split fields on whitespace
        items = f1.split(',')           # split second field on commas
        items = sorted(items, key=int)  # sort items as ints (or: items.sort(key=int))
        items = items[1:-1]             # get rid of first and last items
        f1 = ','.join(items)            # reassemble field as CSV
        line = '%s\t%s\n' % (f0, f1)    # reassemble line
        fout.write(line)                # write it out

2014-04-21 15:23:18
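A minimal sketch of the same loop body wrapped in a function, so it can be checked on the sample rows without files. It assumes blank lines should be skipped and that every other line has a label followed by one comma-separated field; trim_row is a hypothetical name, not part of the answer above.

```python
def trim_row(line):
    """Sort the numbers in the second field and drop the smallest and largest.

    Returns None for blank lines (assumption: they are skipped).
    """
    fields = line.split()  # split on any run of whitespace
    if not fields:
        return None
    label, values = fields[0], fields[1]
    items = sorted(values.split(','), key=int)  # numeric order, items stay strings
    return '%s\t%s' % (label, ','.join(items[1:-1]))


rows = ['a  10,5,3,66,50 \n', 'b  2,10,1,88,5,8,9 \n', '\n']
for row in rows:
    result = trim_row(row)
    if result is not None:
        print(result)
```

A row with only one or two numbers comes out with nothing after the tab, because items[1:-1] is then empty.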

You can also use: f1 = sorted(f1.split(','), key=int)[1:-1] –

If the number of spaces between your "index" and the numbers varies, you can use a regular expression: re.split('\s\W+', line) –

@IanLaird: str.split is up to the task: 'blue \t \r 1,2 \n'.split() gives ['blue', '1,2']. –
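A quick check of the point made in the comments: with no argument, str.split already treats any run of whitespace (spaces, tabs, \r, \n) as one separator and ignores leading and trailing whitespace. The r'\s+' pattern below is an illustrative choice for comparison, not the pattern from the comment.

```python
import re

line = 'blue \t \r 1,2 \n'

# With no argument, str.split splits on runs of any whitespace and
# ignores leading/trailing whitespace.
print(line.split())                     # ['blue', '1,2']

# A regex split on r'\s+' leaves an empty string for the trailing newline...
print(re.split(r'\s+', line))           # ['blue', '1,2', '']
# ...unless the line is stripped first.
print(re.split(r'\s+', line.strip()))   # ['blue', '1,2']
```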

Here you go:

awk '{l=split($2,a,",");asort(a);printf "%s\t",$1;for(i=2;i<l;i++) printf "%s"(i==l-1?RS:","),a[i]}' t 
a  5,10,50 
b  2,5,8,9,10 
c  10,22,39,55 
d  3,7,45,60,235,503 
e  6,20,22,33,34,36,45

PS: If I remember correctly, you need GNU awk because of asort.

How it works:

awk '
    {l=split($2,a,",")                   # Split field 2 on "," into array "a"; "l" is the number of elements
    asort(a)                             # Sort array "a" (needs GNU awk)
    printf "%s\t",$1                     # Print the first column followed by a tab
    for(i=2;i<l;i++)                     # Loop from the second element to the second-to-last element of "a"
        printf "%s"(i==l-1?RS:","),a[i]  # Print the element followed by ","; after the last one, print RS (a newline) instead
    }' file                              # Read the input file

2014-04-21 15:20:35 Jotne

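Instead of using the length(a) function, you can use the return value of split or asort in the for loop. Also, in the ternary operator you can write ?"\n":"," and skip the print "". –

@JS웃 Hi, thanks for the info, the post has been updated. I only modified a post I found using Google :). PS: I think RS is better than "\n". – Jotne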
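For readers more at home in Python, here is a sketch that mirrors the awk program step by step, including the ternary that prints a newline after the last kept element and "," otherwise. awk_style is a hypothetical name; the main thing it shows is the index shift: awk's 1-based loop i = 2 .. l-1 becomes range(1, n - 1) over a 0-based list.

```python
def awk_style(label, csv):
    """Python mirror of the awk one-liner for a single row."""
    a = sorted(csv.split(','), key=int)  # split() + asort(), but 0-based
    n = len(a)                           # awk's l: number of elements
    out = label + '\t'                   # printf "%s\t",$1
    # awk: for(i=2;i<l;i++) over 1-based indices -> 0-based indices 1 .. n-2
    for i in range(1, n - 1):
        # the ternary: newline after the last kept element, otherwise ","
        out += a[i] + ('\n' if i == n - 2 else ',')
    return out


print(awk_style('a', '10,5,3,66,50'), end='')  # a<TAB>5,10,50
```

Like the awk version, it prints no trailing newline at all when a row has two or fewer numbers, because the loop body never runs.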

A complete Python example. It assumes your data is in a text file. You would call it like this:

./parser.py filename

or you can pipe data into it like this:

echo 'a 3,2,1,4,5' | ./parser.py -

Code:

#!/usr/bin/env python
import argparse
import sys

def splitAndTrim(d):
    line = str.split(d)                          # split the line on whitespace
    arr = sorted(map(int, line[1].split(',')))   # numbers from field 2, sorted numerically
    print("{0} {1}".format(line[0], ",".join(map(str, arr[1:-1]))))  # drop min and max


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # nargs='?' makes FILE optional, so default=sys.stdin is actually used
    parser.add_argument('FILE', nargs='?', type=argparse.FileType('r'), default=sys.stdin)
    args = parser.parse_args(sys.argv[1:])
    for line in args.FILE:
        splitAndTrim(line)

2014-04-21 16:09:12
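A sketch of the same script restructured so it can be exercised without a terminal, assuming blank lines should be skipped: split_and_trim returns the string instead of printing it, and run accepts any iterable of lines, so io.StringIO can stand in for a file or stdin. Both names are hypothetical. The parser lines also show that argparse only falls back to default=sys.stdin for a positional argument declared with nargs='?' (without it, FILE is required and running the script with no argument is an error); given '-', argparse.FileType('r') returns sys.stdin.

```python
import argparse
import io
import sys


def split_and_trim(d):
    """Return 'label n,n,...' with the smallest and largest numbers dropped."""
    line = d.split()
    arr = sorted(map(int, line[1].split(',')))
    return "{0} {1}".format(line[0], ",".join(map(str, arr[1:-1])))


def run(stream, out=None):
    """Process every non-blank line of stream; out=None means sys.stdout."""
    for line in stream:
        if line.strip():  # assumption: blank lines are skipped
            print(split_and_trim(line), file=out)


parser = argparse.ArgumentParser()
# Without nargs='?', FILE is required and default=sys.stdin is never used.
parser.add_argument('FILE', nargs='?', type=argparse.FileType('r'), default=sys.stdin)

# Same input as the echo example, fed through in-memory streams
buf = io.StringIO()
run(io.StringIO('a 3,2,1,4,5\n'), out=buf)
print(buf.getvalue(), end='')  # a 2,3,4
```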

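If you use the key=int argument to sorted, you don't need to convert from str to int and back to str. Also, if you really want to go all out, capture the whitespace in a regular expression and reuse it in the output. –

Thanks for the regex suggestion. The mapping to int is still required, because it strips the trailing '\n' from the input data. But I think what you showed me about split removes the need for a regex altogether. –

str.split(d) is the same as d.split(). –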
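The points in this comment thread can be checked in a few lines: plain sorted() on strings sorts lexicographically, key=int sorts numerically while keeping the strings (so no round trip through int is needed), and int() itself ignores surrounding whitespace such as a trailing '\n'. The last part sketches the suggestion to capture the whitespace with a regex and reuse it in the output; the pattern ROW and the function trim_keep_spacing are hypothetical.

```python
import re

values = '10,5,3,66,50'.split(',')

# Plain sorted() compares strings character by character: '10' < '3' < '5'.
print(sorted(values))                    # ['10', '3', '5', '50', '66']

# key=int compares numerically but keeps the original strings,
# so the result can be joined directly without an int -> str round trip.
trimmed = sorted(values, key=int)[1:-1]
print(','.join(trimmed))                 # 5,10,50

# int() ignores leading/trailing whitespace, including a trailing newline.
print(int('50\n'))                       # 50

# Hypothetical pattern: label, the whitespace run after it, then the numbers.
ROW = re.compile(r'(\S+)(\s+)(\S+)')


def trim_keep_spacing(line):
    """Drop min and max from the second field, reusing the original separator."""
    m = ROW.match(line)
    if m is None:
        return line  # assumption: lines that don't match pass through unchanged
    label, gap, numbers = m.groups()
    items = sorted(numbers.split(','), key=int)[1:-1]
    return label + gap + ','.join(items)


print(repr(trim_keep_spacing('b\t2,10,1,88,5,8,9\n')))  # 'b\t2,5,8,9,10'
```

The match stops before the trailing newline, so the caller has to add it back when writing the line out.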

Well, here is an alternative solution in Perl:

$ perl -F'\s+|,' -lane ' 
print $F[0] . "\t" . join "," , splice @{[sort { $a<=>$b } @F[1..$#F]]} , 1, $#F-2' file 
a  5,10,50 
b  2,5,8,9,10 
c  10,22,39,55 
d  3,7,45,60,235,503 
e  6,20,22,33,34,36,45

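Or, with Perl 5.14 through 5.22, where splice could take an array reference directly (the experimental autoderef feature, removed in Perl 5.24), you can drop the @{..}: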

perl -F'\s+|,' -lane ' 
    print $F[0] . "\t" . join "," , splice [sort { $a<=>$b } @F[1..$#F]] , 1, $#F-2 
' file

Or just use a list slice:

perl -F'\s+|,' -lane '
    print $F[0] . "\t" . join "," , (sort { $a<=>$b } @F[1..$#F])[1..$#F-2]
' file

2014-04-21 17:47:20

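Nice! (Note that you can omit the @{..} in the splice command and just use splice [sort { $a<=>$b } @F[1..$#F]], 1, $#F-2.) –

Thanks @HåkonHægland. The first argument to splice should be an array, so it won't accept an anonymous array unless you dereference it. –

Actually, I think the square brackets produce an array reference (see http://perldoc.perl.org/perlref.html), which is why you don't need to dereference it. –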
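To see why the splice call and the list slice keep the same elements, here is the index arithmetic replayed in Python (a re-creation for illustration, not Perl). For the first sample row, @F holds the label plus five numbers, so $#F is 5; the sorted numbers occupy indices 0 .. $#F-1, splice(LIST, 1, $#F-2) takes $#F-2 elements starting at offset 1, and the slice [1..$#F-2] takes indices 1 through $#F-2 inclusive. Both skip exactly the first and last sorted values.

```python
# @F after perl -F'\s+|,' splits "a  10,5,3,66,50 "
F = ['a', '10', '5', '3', '66', '50']
last = len(F) - 1                      # Perl's $#F (here 5)
nums = sorted(F[1:last + 1], key=int)  # sort { $a<=>$b } @F[1..$#F]: $#F elements

# splice(LIST, 1, $#F-2): start at offset 1, take $#F-2 elements
spliced = nums[1:1 + (last - 2)]
# list slice (...)[1..$#F-2]: indices 1 through $#F-2 inclusive
sliced = nums[1:(last - 2) + 1]
print(spliced, sliced)                 # both ['5', '10', '50']
```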
