Trying to merge files after removing duplicate content

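I have n files that all contain overlapping, common text. I want to build a single file from these n files so that the new file contains each distinct line found across the n files exactly once.

I am looking for a bash command or a Python API that can do this for me. If there is an algorithm for it, I can also try to code it myself.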

2011-12-01 abc

If the order of the lines is not important, you can do this:

sort -u file1 file2 ...

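This will (a) sort all the lines from all the files together, and then (b) remove the duplicates. The result is the set of unique lines across all the files.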
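As a small illustration of the sort -u approach, here is a sketch that builds two throwaway sample files and merges them. The file names a.txt and b.txt and their contents are made up for the example, and LC_ALL=C is an added assumption to make the sort order independent of the locale.

```shell
# Hypothetical sample data: two files that share the lines "banana" and "cherry".
workdir=$(mktemp -d)
printf 'banana\napple\ncherry\n' > "$workdir/a.txt"
printf 'cherry\ndate\nbanana\n'  > "$workdir/b.txt"

# Sort the lines of both files together and keep one copy of each distinct line.
# LC_ALL=C forces plain byte-order sorting so the result does not depend on the locale.
merged=$(LC_ALL=C sort -u "$workdir/a.txt" "$workdir/b.txt")
printf '%s\n' "$merged"

# Clean up the temporary sample files.
rm -r "$workdir"
```

With this sample data the merged output lists apple, banana, cherry and date, one per line, in sorted order rather than in the order the lines first appeared.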

2011-12-01 01:58:32 larsks

Thanks, this helps. – abc

For the lines that the files have in common, you can use comm:

DESCRIPTION 
    The comm utility reads file1 and file2, which should be sorted lexically, 
and produces three text columns as output: lines only in file1; lines only in 
file2; and lines in both files.
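A minimal sketch of how comm's three columns can be selected, assuming two made-up sample files (a.txt and b.txt are hypothetical). comm compares exactly two files at a time, so handling n files this way would mean chaining it pairwise. The inputs are sorted first because comm expects sorted input, and LC_ALL=C is an added assumption to keep sort and comm using the same ordering.

```shell
# Hypothetical sample data: two files that share the lines "banana" and "cherry".
workdir=$(mktemp -d)
printf 'banana\napple\ncherry\n' > "$workdir/a.txt"
printf 'cherry\ndate\nbanana\n'  > "$workdir/b.txt"

# comm expects sorted input, so sort each file first (same locale for sort and comm).
LC_ALL=C sort "$workdir/a.txt" > "$workdir/a.sorted"
LC_ALL=C sort "$workdir/b.txt" > "$workdir/b.sorted"

# -1 and -2 suppress the "only in file1" and "only in file2" columns,
# so -12 leaves only the lines present in both files.
common=$(LC_ALL=C comm -12 "$workdir/a.sorted" "$workdir/b.sorted")

# -2 and -3 suppress "only in file2" and "in both", leaving lines unique to the first file.
only_a=$(LC_ALL=C comm -23 "$workdir/a.sorted" "$workdir/b.sorted")

# -1 and -3 suppress "only in file1" and "in both", leaving lines unique to the second file.
only_b=$(LC_ALL=C comm -13 "$workdir/a.sorted" "$workdir/b.sorted")

printf 'common:\n%s\nonly in a:\n%s\nonly in b:\n%s\n' "$common" "$only_a" "$only_b"

# Clean up the temporary sample files.
rm -r "$workdir"
```

Note that comm answers a slightly different question from sort -u: it separates shared lines from file-specific ones instead of producing one merged, de-duplicated list.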

Another useful tool is merge:

DESCRIPTION 
merge incorporates all changes that lead from file2 to file3 into file1. 
The result ordinarily goes into file1. merge is useful for combining separate 
changes to an original.

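sort might mess up the order of your lines. You can try the following awk command instead. It has not been tested, so make sure you back up your files. :)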

awk ' !x[$0]++' big_merged_file

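This prints the file with all repeated lines removed, keeping the first occurrence of each line in its original position. The output goes to standard output; the file itself is not modified in place.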
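The same awk idiom can also read several files directly, which avoids building a big merged file first. The sketch below assumes two made-up sample files (a.txt and b.txt are hypothetical) and uses the array name seen instead of x purely for readability.

```shell
# Hypothetical sample data: two files that share the lines "banana" and "cherry".
workdir=$(mktemp -d)
printf 'banana\napple\ncherry\n' > "$workdir/a.txt"
printf 'cherry\ndate\nbanana\n'  > "$workdir/b.txt"

# seen[$0]++ evaluates to 0 (false) the first time a line is read and then increments,
# so !seen[$0]++ is true only for first occurrences; awk's default action prints them.
deduped=$(awk '!seen[$0]++' "$workdir/a.txt" "$workdir/b.txt")
printf '%s\n' "$deduped"

# Clean up the temporary sample files.
rm -r "$workdir"
```

Here the lines come out in first-seen order (banana, apple, cherry, date) rather than sorted. The trade-off is that awk keeps every distinct line in memory, which may matter for very large inputs.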

2011-12-01 02:25:57

This might work for you:

# (seq 1 5; seq 3 7;) 
1 
2 
3 
4 
5 
3 
4 
5 
6 
7 
# (seq 1 5; seq 3 7;) | sort -nu 
1 
2 
3 
4 
5 
6 
7 
# (seq 1 5; seq 3 7;) | sort -n | uniq -u 
1 
2 
6 
7 
# (seq 1 5; seq 3 7;) | sort -n | uniq -d 
3 
4 
5

2011-12-01 03:34:08 potong

You need to merge everything first, then sort it, and finally remove the duplicates:

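#!/bin/bash
# Concatenate every file under test/ into one file.
for file in test/*
do
    cat "$file" >> final
done
# Sort so that identical lines become adjacent, which uniq requires.
sort final > final2
# Write the de-duplicated lines back to final, then drop the temporary file.
uniq final2 final
rm -f final2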

2011-12-01 03:42:11 mezzie
