Merge two text files, removing duplicates
I have 2 text files like the following (the large numbers such as 1466786391 are unique timestamps):
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 49 packets received, 2% packet loss
round-trip min/avg/max = 20.917/70.216/147.258 ms
1466786342
PING 10.0.0.6 (10.0.0.6): 56 data bytes
....
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 50 packets received, 0% packet loss
round-trip min/avg/max = 29.535/65.768/126.983 ms
1466786391
and this:
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 49 packets received, 2% packet loss
round-trip min/avg/max = 20.917/70.216/147.258 ms
1466786342
PING 10.0.0.6 (10.0.0.6): 56 data bytes
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 50 packets received, 0% packet loss
round-trip min/avg/max = 29.535/65.768/126.983 ms
1466786391
PING 10.0.0.6 (10.0.0.6): 56 data byte
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 44 packets received, 12% packet loss
round-trip min/avg/max = 30.238/62.772/102.959 ms
1466786442
PING 10.0.0.6 (10.0.0.6): 56 data bytes
....
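So the first file ends with a timestamp (1466786391 above), and the second file contains the same block of data somewhere in the middle, followed by more data. Everything before that timestamp in the second file is identical to the first file.
So I want the output to look like this: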
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 49 packets received, 2% packet loss
round-trip min/avg/max = 20.917/70.216/147.258 ms
1466786342
PING 10.0.0.6 (10.0.0.6): 56 data bytes
....
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 50 packets received, 0% packet loss
round-trip min/avg/max = 29.535/65.768/126.983 ms
1466786391
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 44 packets received, 12% packet loss
round-trip min/avg/max = 30.238/62.772/102.959 ms
1466786442
PING 10.0.0.6 (10.0.0.6): 56 data bytes
....
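That is, concatenate the two files into a third one, leaving out the parts of the second file that are duplicates (text blocks that already exist in the first file). Here is my code: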
public static void UnionFiles()
{
    string folderPath = Path.Combine(Path.GetDirectoryName(Assembly.GetEntryAssembly().Location), "http");
    string outputFilePath = Path.Combine(Path.GetDirectoryName(Assembly.GetEntryAssembly().Location), "http\\union.dat");
    var union = Enumerable.Empty<string>();
    foreach (string filePath in Directory
        .EnumerateFiles(folderPath, "*.txt")
        .OrderBy(x => Path.GetFileNameWithoutExtension(x)))
    {
        union = union.Union(File.ReadAllLines(filePath));
    }
    File.WriteAllLines(outputFilePath, union);
}
This is the wrong output I get (the file structure is destroyed):
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 49 packets received, 2% packet loss
round-trip min/avg/max = 20.917/70.216/147.258 ms
1466786342
PING 10.0.0.6 (10.0.0.6): 56 data bytes
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 50 packets received, 0% packet loss
round-trip min/avg/max = 29.535/65.768/126.983 ms
1466786391
round-trip min/avg/max = 30.238/62.772/102.959 ms
1466786442
round-trip min/avg/max = 5.475/40.986/96.964 ms
1466786492
round-trip min/avg/max = 5.276/61.309/112.530 ms
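Edit: this code was written to handle multiple files, but I would be happy even if it only worked correctly for 2.
However, it does not remove the text blocks as it should. Instead it removes several useful lines and makes the output completely useless. I am stuck.
How can I achieve this? Thanks.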
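'union = union.Union(File.ReadAllLines(filePath));' Shouldn't this create a boolean union and thereby remove the duplicate blocks? –
Yes, it should. I assume a format (UTF8?) or whitespace problem? – Ouarzy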
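For reference, a small sketch of what that line does. Enumerable.Union compares individual lines, not blocks, so a line that occurs in every block (such as the statistics header) is kept only the first time it appears. That matches the output shown in the question, where later blocks are reduced to their round-trip and timestamp lines. The class and method names here are made up for illustration; the sample lines are copied from the question.

```csharp
using System;
using System.Linq;

public static class UnionDemo
{
    // Line-level set union, the same operation the loop in the question performs.
    public static string[] LineUnion(string[] first, string[] second)
    {
        return first.Union(second).ToArray();
    }

    public static void Main()
    {
        string[] first =
        {
            "--- 10.0.0.6 ping statistics ---",
            "round-trip min/avg/max = 29.535/65.768/126.983 ms",
            "1466786391",
        };
        string[] second =
        {
            "--- 10.0.0.6 ping statistics ---",
            "round-trip min/avg/max = 30.238/62.772/102.959 ms",
            "1466786442",
        };

        // The header line of the second block is dropped because an equal
        // line was already yielded from the first array.
        foreach (string line in LineUnion(first, second))
        {
            Console.WriteLine(line);
        }
    }
}
```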
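You need to actually _parse_ the files and extract the individual blocks for comparison, as Ouarzy suggested. Anything else will result in ugly, unmaintainable hacks. –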
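One possible reading of the parsing suggestion, offered as a sketch under stated assumptions rather than a definitive implementation. It assumes that a line consisting only of digits is a timestamp, that each timestamp closes a block, and that timestamps are unique, as the question states. BlockMerge, Merge and IsTimestamp are hypothetical names that do not appear in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlockMerge
{
    // Assumption: a non-empty line made up only of digits is a timestamp.
    private static bool IsTimestamp(string line)
    {
        return line.Length > 0 && line.All(char.IsDigit);
    }

    // Merges the files in order. A block is every line up to and including
    // a timestamp; a block is skipped if its timestamp was already emitted.
    public static List<string> Merge(IReadOnlyList<string[]> files)
    {
        var seen = new HashSet<string>();
        var output = new List<string>();

        for (int i = 0; i < files.Count; i++)
        {
            var block = new List<string>();
            foreach (string raw in files[i])
            {
                block.Add(raw);

                // Trim only for the comparison, so stray whitespace or a
                // trailing '\r' does not hide a timestamp.
                string key = raw.Trim();
                if (!IsTimestamp(key)) continue;

                if (seen.Add(key)) output.AddRange(block);
                block.Clear();
            }

            // Lines after the last timestamp form an unfinished block.
            // Design choice: keep them only from the final file.
            if (i == files.Count - 1) output.AddRange(block);
        }
        return output;
    }

    public static void Main()
    {
        // Shortened sample data in the shape of the question's files.
        var first = new[] { "round-trip A", "1466786342", "PING", "round-trip B", "1466786391" };
        var second = new[]
        {
            "round-trip A", "1466786342", "PING", "round-trip B", "1466786391",
            "PING", "round-trip C", "1466786442", "PING",
        };

        foreach (string line in Merge(new List<string[]> { first, second }))
        {
            Console.WriteLine(line);
        }
    }
}
```

To use this with the question's folder, each array could come from File.ReadAllLines over the ordered Directory.EnumerateFiles result, and the returned list can be passed to File.WriteAllLines. Because whole blocks are compared by timestamp, repeated lines inside different blocks are left alone.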