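Efficient way to read and cut files

I have several text files (.txt), each about 2 GB. I need to cut them up: whenever the marker '%%XGF NEW_SET' appears, I need to start a new file and store it. I think the marker appears roughly once every 40-50 lines, and each line has 4-20 characters. So I need to cut the big files into thousands of small files and then process those. I came up with sample code like this: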
DirectoryInfo di = new DirectoryInfo(ConfigurationManager.AppSettings["BilixFilesDir"]);
var files = di.GetFiles();
int count = 0;
bool hasObject = false;
StringBuilder sb = new StringBuilder();
string line = "";
foreach (var file in files)
{
    using (StreamReader sr = new StreamReader(file.FullName, Encoding.GetEncoding(1250)))
    {
        while ((line = sr.ReadLine()) != null)
        {
            // a marker line starts a new output file
            if (line.Contains("%%XGF NEW_SET"))
            {
                // if a previous set is buffered, store it first
                if (hasObject)
                {
                    File.WriteAllText(string.Format("{0}/{1}-{2}", ConfigurationManager.AppSettings["OutputFilesDir"], count++, file.Name), sb.ToString());
                    sb.Length = 0;
                    sb.Capacity = 0;
                }
                // remember that a set is now open
                hasObject = true;
            }
            // not a marker line: append it to the open set, if any
            else if (hasObject)
                sb.AppendLine(line);
        }
        // end of input file: save the last set
        if (hasObject)
        {
            File.WriteAllText(string.Format("{0}/{1}-{2}", ConfigurationManager.AppSettings["OutputFilesDir"], count++, file.Name), sb.ToString());
            sb.Length = 0;
            sb.Capacity = 0;
        }
    }
}
So that is what my example looks like, but I need it to be efficient. Any ideas on how I can improve my solution? Thanks.
Is '%%XGF NEW_SET' the only thing on the dividing line? If not, you are losing the other information, because you are throwing that line away. – 2011-02-11 14:47:58
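One possible direction, offered only as a sketch: instead of buffering each set in a StringBuilder and then calling File.WriteAllText, the lines could be streamed straight into a StreamWriter that is swapped out whenever the marker appears. The class name SetSplitter, the method Split and the keepMarkerLine switch are hypothetical names introduced here for illustration; the switch is one way to address the comment's concern about discarding the marker line. The sketch assumes .NET 4 or later (for File.ReadLines) and C# 6 or later (for `?.`), and, unlike the question's code, it restarts the counter for each input file.

```csharp
using System;
using System.IO;
using System.Text;

public static class SetSplitter
{
    // Marker string as written in the question's code.
    private const string Marker = "%%XGF NEW_SET";

    // Splits one input file into numbered output files, one per marker line.
    // Returns the number of output files written.
    // keepMarkerLine is a hypothetical option: when true, the marker line
    // itself is copied into the new file instead of being dropped.
    public static int Split(string inputPath, string outputDir,
                            Encoding inputEncoding, bool keepMarkerLine = false)
    {
        int count = 0;
        string name = Path.GetFileName(inputPath);
        StreamWriter writer = null;
        try
        {
            // File.ReadLines enumerates lazily, so the 2 GB input is never
            // loaded into memory as a whole.
            foreach (string line in File.ReadLines(inputPath, inputEncoding))
            {
                if (line.Contains(Marker))
                {
                    // Close the previous set (if any) and open the next one.
                    writer?.Dispose();
                    string outPath = Path.Combine(outputDir, count + "-" + name);
                    // Default StreamWriter encoding is UTF-8 without BOM,
                    // which matches File.WriteAllText(path, text).
                    writer = new StreamWriter(outPath);
                    count++;
                    if (keepMarkerLine)
                        writer.WriteLine(line);
                }
                else
                {
                    // Lines before the first marker are skipped, as in the
                    // question's code (writer is still null at that point).
                    writer?.WriteLine(line);
                }
            }
        }
        finally
        {
            writer?.Dispose();
        }
        return count;
    }
}
```

With the question's setup, this would be called once per input file, for example `SetSplitter.Split(file.FullName, ConfigurationManager.AppSettings["OutputFilesDir"], Encoding.GetEncoding(1250))`. Whether it is actually faster depends on the disk: creating thousands of small files is likely to dominate the run time either way, so it is worth measuring before and after.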