How do I sort the files on a Linux server by their names?

How can I use the ls command and its options to list duplicate file names that are in different directories?

2017-03-04 MasterGL

Your question is better suited to [Super User](http://superuser.com/tour). [Stack Overflow is a question and answer site for professional and enthusiast programmers](http://stackoverflow.com/tour). – Cyrus

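@Cyrus, that's a fair comment. It turned out to be more complicated than a simple bash globbing question and arguably requires some (albeit light) programming, even though the asker didn't realize that when asking. So I answered it here. But you're right that the question is more Super User-oriented. – eewanco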

Don't forget to mark the answer as "accepted" if it works for you! – eewanco

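You can't do this with a single, basic ls command. You have to use a combination of other POSIX/Unix/GNU utilities. For example, first find the duplicate file names: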

find . -type f -exec basename "{}" \; | sort | uniq -d > dupes

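This uses find to walk the entire directory hierarchy starting at the current directory (.), select all regular files (-type f), and execute (-exec) the basename command on each one ({} is replaced by the file's path and \; ends the command), which strips the leading directories and leaves only the file name. The names are then sorted (sort) and only the duplicated lines are printed (uniq -d). The result is saved in the file dupes. Now you have the duplicated file names, but you don't know which directories they are in. Use find again to locate them. With bash as your shell: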
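A self-contained sketch of this first step, run against a small scratch tree. The directories and file names are hypothetical, chosen to mirror the numeric names in the question, and the dupes file is written outside the searched tree so that find never lists it:

```shell
#!/usr/bin/env bash
# Sketch of step 1: print each base name that occurs more than once under a tree.
set -euo pipefail

workdir=$(mktemp -d)               # scratch area, removed when the script exits
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/tree/1/2" "$workdir/tree/2/3" "$workdir/tree/3" "$workdir/tree/4" "$workdir/tree/5"
cd "$workdir/tree"
touch 1/0000 2/1111 3/2222 4/2222 5/0000 1/2/1111 2/3/4444

# basename drops the directory part of each path; sort puts equal names next to
# each other so that uniq -d can print every repeated name exactly once.
find . -type f -exec basename {} \; | sort | uniq -d > "$workdir/dupes"

cat "$workdir/dupes"
```

With these names it prints 0000, 1111 and 2222; 4444 occurs only once and is left out.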

while read filename; do find . -name "$filename" -print; done < dupes

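This loops (while) over the file dupes, reading (read) each line into the variable filename. For each line, it runs find again to search for that specific name (-name "$filename") and prints each match (-print, which is the default action, so it is redundant here).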
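A runnable sketch of the second step on a hypothetical two-directory tree. It uses IFS= read -r instead of plain read (a common bash idiom, not part of the original loop) so that leading or trailing blanks and backslashes in a name are preserved. Note that -name still treats the name as a glob pattern, so names containing *, ? or [ can over-match:

```shell
#!/usr/bin/env bash
# Sketch of step 2: for each name listed in the dupes file, print every path with that name.
set -euo pipefail

workdir=$(mktemp -d)               # scratch area, removed when the script exits
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/tree/a" "$workdir/tree/b"
touch "$workdir/tree/a/report" "$workdir/tree/b/report" "$workdir/tree/a/unique"
cd "$workdir/tree"

# Step 1, as in the answer, but with the dupes file kept outside the searched tree.
find . -type f -exec basename {} \; | sort | uniq -d > "$workdir/dupes"

# Step 2: IFS= keeps surrounding blanks, -r keeps backslashes literal.
while IFS= read -r filename; do
    find . -name "$filename" -print
done < "$workdir/dupes" | sort > "$workdir/found"

cat "$workdir/found"
```

Only report is duplicated here, so the output is ./a/report followed by ./b/report.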

In fact, you can combine these without an intermediate file:

find . -type f -exec basename "{}" \; | sort | uniq -d | while read filename; do find . -name "$filename" -print; done

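If you're not familiar with it, the | operator (a pipe) takes the output of the preceding command and feeds it as input to the following command. Example: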

~$ mkdir test
~$ cd test
~/test$ mkdir 1 2 3 4 5
~/test$ mkdir 1/2 2/3
~/test$ touch 1/0000 2/1111 3/2222 4/2222 5/0000 1/2/1111 2/3/4444
~/test$ find . -type f -exec basename "{}" \; | sort | uniq -d | while read filename; do find . -name "$filename" -print; done
./1/0000 
./5/0000 
./1/2/1111 
./2/1111 
./3/2222 
./4/2222

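Disclaimer: the requirements state that the file names are all numeric. Although I tried to design the code to handle file names with spaces (and it works when tested on my system), it may break on special characters, newlines, NULs, or other unusual cases. Note that -exec has special security considerations and should not be used by root on arbitrary users' files. The simplified example provided is for illustration and teaching purposes only. Consult your man pages and the relevant CERT advisories for the full security implications.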
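One way to avoid starting a basename process through -exec for every file, assuming GNU find (the -printf action and its %f directive are GNU extensions, not POSIX), is to let find print each base name itself. This is a swapped-in variant, not the answer's own command; the scratch files are hypothetical and include a name with spaces:

```shell
#!/usr/bin/env bash
# Sketch, GNU find only: %f prints a file's name with the leading directories removed,
# so no external basename process is started per file.
set -euo pipefail

workdir=$(mktemp -d)               # scratch area, removed when the script exits
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/x" "$workdir/y"
touch "$workdir/x/file01 01 17" "$workdir/y/file01 01 17" "$workdir/x/other"
cd "$workdir"

# One name per line: spaces are fine, but a name containing a newline would still be split.
dupe_names=$(find . -type f -printf '%f\n' | sort | uniq -d)
printf '%s\n' "$dupe_names"
```

It prints the single duplicated name, file01 01 17.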


2017-03-04 16:41:54 eewanco

Touch a file named "file01 01 17" in several locations and try your code. –

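@GeorgeVasiliou, it works for me; I just tried it. Also, the poster indicated that his file names are all numeric. There was no requirement to handle file names with spaces, so I hadn't specifically tested the code for that scenario. Still, I'll add a disclaimer. – eewanco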

I have a function for finding duplicate files in my bash profile (bash 4.4). Indeed, find is the right tool.

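I combine find with its -print0 option, which separates the results with a null character instead of a newline (find's default). That way I can capture every file under the current directory and its subdirectories.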

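This ensures the results are correct even if file names contain special characters such as spaces or (in some rare cases) newlines. Instead of running find twice, you can build an array and then find the duplicate file names within that array. You then use the "duplicates" as patterns to grep the whole array.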
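A quick way to see why the null delimiter matters, using a deliberately awkward, hypothetical file name that contains a newline. This only illustrates how find terminates each path; any later step that prints with '%s\n' is newline-separated again:

```shell
#!/usr/bin/env bash
# Count one file two ways: by newline-terminated and by NUL-terminated find output.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
touch "$workdir/"$'line1\nline2'   # a single file whose name contains a newline

# Default -print ends each path with \n, so this one name spans two lines.
newline_count=$(find "$workdir" -type f | wc -l)
# -print0 ends each path with \0; counting NUL bytes gives the real number of files.
nul_count=$(find "$workdir" -type f -print0 | tr -cd '\0' | wc -c)

echo "newline-delimited count: $newline_count, NUL-delimited count: $nul_count"
```

The newline-delimited count is 2 and the NUL-delimited count is 1, although only one file exists.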

So something like this works fine for me:

$ IFS= readarray -t -d '' fn< <(find . -name 'file*' -print0) 
$ dupes=$(LC_ALL=C sort <(printf '\<%s\>$\n' "${fn[@]##*/}") |uniq -d) 
$ grep -e "$dupes" <(printf '%s\n' "${fn[@]}") |awk -F/ '{print $NF,"==>",$0}' |LC_ALL=C sort

Here is a test:

$ IFS= readarray -t -d '' fn< <(find . -name 'file*' -print0) 
# find all files and load them in an array using null delimiter 
$ printf '%s\n' "${fn[@]}" #print the array 
./tmp/file7 
./tmp/file14 
./tmp/file11 
./tmp/file8 
./tmp/file9 
./tmp/tmp2/file09 99 
./tmp/tmp2/file14.txt 
./tmp/tmp2/file15.txt 
./tmp/tmp2/file$100 
./tmp/tmp2/file14.txt.bak 
./tmp/tmp2/file15.txt.bak 
./tmp/file1 
./tmp/file4 
./file09 99 
./file14 
./file$100 
./file1 

$ dupes=$(LC_ALL=C sort <(printf '\<%s\>$\n' "${fn[@]##*/}") |uniq -d) 
#Locate duplicate files 
$ echo "$dupes" 
\<file$100\>$ #Mind this one with special char $ in filename 
\<file09 99\>$ #Mind also this one with spaces 
\<file14\>$ 
\<file1\>$ 
#I have deliberately enclosed the results in \<...\> so that grep later matches whole words only, and file1 does not match file1.txt or file11

$ grep -e "$dupes" <(printf '%s\n' "${fn[@]}") |awk -F/ '{print $NF,"==>",$0}' |LC_ALL=C sort 
file$100 ==> ./file$100   #File with special char correctly captured 
file$100 ==> ./tmp/tmp2/file$100 
file09 99 ==> ./file09 99  #File with spaces in name also correctly captured 
file09 99 ==> ./tmp/tmp2/file09 99 
file1 ==> ./file1 
file1 ==> ./tmp/file1 
file14 ==> ./file14    #other files whose names start with file14, such as file14.txt and file14.txt.bak, are not captured since they are not duplicates.
file14 ==> ./tmp/file14
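The effect of the \<...\>$ wrapping can be checked on a few hypothetical paths. \< and \> are GNU grep extensions. Note that \< only requires a non-word character before the name, so a path ending in my-file1 would still match, and regex characters such as . in a name are treated as pattern syntax rather than literally:

```shell
#!/usr/bin/env bash
# Compare a bare pattern with the \<...\>$ form built by the answer (GNU grep).
paths=$'./file1\n./tmp/file11\n./tmp/file1.txt\n./tmp/file1'

bare=$(grep -e 'file1' <<<"$paths")            # substring match: every path contains file1
anchored=$(grep -e '\<file1\>$' <<<"$paths")   # file1 must be a whole word at the very end of the line

printf 'bare:\n%s\nanchored:\n%s\n' "$bare" "$anchored"
```

The bare pattern returns all four paths, while the anchored one returns only ./file1 and ./tmp/file1.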

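Tips:

This one, <(printf '\<%s\>$\n' "${fn[@]##*/}"), uses process substitution together with bash's built-in parameter expansion to get the base names of the find results.
LC_ALL=C is needed for sort to order the file names correctly.
In bash versions before 4.4, readarray does not accept the -d (delimiter) option. In that case, you can load the find results into the array like this:

while IFS= read -r -d '' res; do fn+=("$res"); done < <(find .... -print0)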
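Since the author keeps this as a function in a bash profile, here is a hedged sketch of such a wrapper. The name finddupes, its optional pattern argument and the added -type f are hypothetical, not the author's actual function. It fills the array with the read loop from the last tip, so it does not need readarray -d. It also returns early when nothing is duplicated, because grep -e with an empty pattern would match every line:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the answer's three commands, using the pre-4.4 read loop.
finddupes() {
    local pattern="${1:-*}"   # hypothetical optional -name pattern; default matches every name
    local -a fn=()
    local res dupes
    # Collect NUL-delimited paths; -type f (not in the original command) skips directories.
    while IFS= read -r -d '' res; do
        fn+=("$res")
    done < <(find . -type f -name "$pattern" -print0)
    # Wrap each base name as \<name\>$ and keep the ones that occur more than once.
    dupes=$(LC_ALL=C sort <(printf '\<%s\>$\n' "${fn[@]##*/}") | uniq -d)
    # Guard: an empty pattern list would make grep match every path.
    [[ -n $dupes ]] || return 0
    grep -e "$dupes" <(printf '%s\n' "${fn[@]}") | awk -F/ '{print $NF,"==>",$0}' | LC_ALL=C sort
}
```

Usage: run finddupes 'file*' from the directory to scan; the output has the same "name ==> path" format as the test above.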


2017-03-05 01:15:17
