Find and remove elements from a list while retaining their position for later insertion

I am using the following in Python 2.7:

dfile = 'new_data.txt' # Depth file no. 1 
d_row = [line.strip() for line in open(dfile)]

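I have loaded the data file into a list, without the newline characters. Now I want to index every element of d_row whose string does not start with a digit and/or is empty. Next, I need to: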

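1. Remove all of the non-numeric instances described above, and
2. Save the removed strings and their indices so they can be inserted later into an updated file.

Example of the data: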

Thu Mar 14 18:17:05 2013              
Fri Mar 15 01:40:25 2013 

FT 

DepthChange: 0.000000,2895.336,0.000 
1363285025.250000,9498.970 
1363285025.300000,9498.970 
1363285026.050000,9498.970 
1363287840.450042,9458.010 
1363287840.500042,9458.010 
1363287840.850042,9458.010 
1363287840.900042,9458.010 
DepthChange: 0.000000,2882.810,9457.200 
1363287840.950042,9458.010 
DepthChange: 0.000000,2882.810,0.000 
1363287841.000042,9457.170 
1363287841.050042,9457.170 
1363287841.100042,9457.170 
1363287841.150042,9457.170 
1363287841.200042,9457.170 
1363287841.250042,9457.170 
1363287841.300042,9457.170 
1363291902.750102,9149.937 
1363291902.800102,9149.822 
1363291902.850102,9149.822 
1363291902.900102,9149.822 
1363291902.950102,9149.822 
1363291903.000102,9149.822 
1363291903.050102,9149.708 
1363291903.100102,9149.708 
1363291903.150102,9149.708 
1363291903.200102,9149.708 
1363291903.250102,9149.708 
1363291903.300102,9149.592 
1363291903.350102,9149.592 
1363291903.400102,9149.592 
1363291903.450102,9149.592 
1363291903.500102,9149.592 
DepthChange: 0.000000,2788.770,2788.709 
1363291903.550102,9149.479 
1363291903.600102,9149.379

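I have been doing the removal step by hand, which is time-consuming because the file contains more than 500,000 lines. At the moment I am unable to rewrite the file with some modifications while keeping all of the original elements.

Any tips would be greatly appreciated.

Source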

2013-10-22 user2904397

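As a side note, you are not closing your input file. That is not as bad as leaking an output file, but it is still bad. Unless you have a good reason not to, you should almost never use 'open' anywhere except in a 'with' statement. – abarnert

Can you say more about the merging process? One thing to keep in mind, of course, is that if you delete lines, the indices of all the lines after a deleted one will change. There are several ways to deal with that, but it is not clear what you are doing. –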

dfile = 'new_data.txt'
with open(dfile) as infile:
    numericLines = set()  # line numbers of lines that start with a digit
    emptyLines = set()    # line numbers of lines that are empty
    charLines = []        # (line number, text) of lines that start with a letter
    for lineno, line in enumerate(infile):
        if line[0].isalpha():  # must be called; a bare line[0].isalpha is always true
            charLines.append((lineno, line.strip()))
        elif line[0].isdigit():
            numericLines.add(lineno)
        elif not line.strip():
            emptyLines.add(lineno)

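Source

2013-10-22 00:33:35 inspectorG4dget

The simplest way is to do it in two passes: one that collects the line numbers and rows that match, and one that collects the line numbers and rows that do not.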

# is_good_row() is a placeholder for whatever test decides that a row is wanted
with open(dfile) as f:
    d_rows = [line.strip() for line in f]
good_rows = [(i, row) for i, row in enumerate(d_rows) if is_good_row(row)]
bad_rows = [(i, row) for i, row in enumerate(d_rows) if not is_good_row(row)]

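This does mean making two passes over the list, but who cares? If the list is small enough that you can read the whole thing into memory, the extra cost is probably negligible.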
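
The snippet above leaves is_good_row undefined. As a minimal sketch, assuming that a "good" row is simply one whose first character is a digit (which fits the sample data in the question, but is an assumption rather than something the answer specifies), the two passes could look like this on a few rows copied from the question:

```python
def is_good_row(row):
    # Assumption: a "good" row starts with a digit. Empty rows and rows
    # such as "FT" or "DepthChange: ..." are treated as "bad".
    # row[:1] is used instead of row[0] so an empty string does not raise.
    return row[:1].isdigit()

# A few stripped rows taken from the sample data in the question.
d_rows = [
    "Thu Mar 14 18:17:05 2013",
    "",
    "FT",
    "DepthChange: 0.000000,2895.336,0.000",
    "1363285025.250000,9498.970",
    "1363285025.300000,9498.970",
]

# Each list holds (original index, row) pairs, so positions are preserved.
good_rows = [(i, row) for i, row in enumerate(d_rows) if is_good_row(row)]
bad_rows = [(i, row) for i, row in enumerate(d_rows) if not is_good_row(row)]
```

Because every pair carries the index from the original list, the "bad" rows can be located again later even after the "good" rows have been written elsewhere.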

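Alternatively, if you need to avoid the cost of building two lists in two passes, then you probably also need to avoid reading the whole file at once, so you have to do something a little cleverer: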

d_rows = (line.strip() for line in open(dfile))  # notice genexp, not list comp
good_rows, bad_rows = [], []
for i, row in enumerate(d_rows):
    if is_good_row(row):
        good_rows.append((i, row))
    else:
        bad_rows.append((i, row))

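If you can push things back even further, to the point where the rows are actually used, you do not even need explicit good_rows and bad_rows lists. You can handle everything as you go in a single iteration, without wasting memory or spending time reading everything up front: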

d_rows = (line.strip() for line in open(dfile))  # notice genexp, not list comp
with open(outfile, 'w') as f:
    for i, row in enumerate(d_rows):
        if is_good_row(row):
            f.write(row + '\n')
        else:
            whatever_you_wanted_to_do_with(i, row)
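
One possible way to fill in whatever_you_wanted_to_do_with is to write the removed rows, together with their original indices, to a second file. The sketch below is only an illustration: split_file, its parameters, and the tab-separated side-file layout are hypothetical choices, and a "good" row is again assumed to be one that starts with a digit.

```python
def split_file(in_path, good_path, bad_path):
    # Stream in_path once. Rows starting with a digit go to good_path;
    # every other row goes to bad_path as "<original index><TAB><text>".
    # Returns (number of good rows, number of bad rows).
    n_good = n_bad = 0
    with open(in_path) as src, open(good_path, "w") as good, open(bad_path, "w") as bad:
        for i, line in enumerate(src):
            row = line.strip()
            if row[:1].isdigit():
                good.write(row + "\n")
                n_good += 1
            else:
                bad.write("%d\t%s\n" % (i, row))
                n_bad += 1
    return n_good, n_bad
```

A call such as split_file('new_data.txt', 'numeric.txt', 'removed.txt') (the two output names are made up) would process the file line by line, so memory use stays flat even for a file with 500,000 lines.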

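Source

2013-10-22 00:35:56 abarnert

Thanks to everyone who answered my question. Using parts of each reply, I was able to reach the desired result. The final working version is as follows: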

import numpy as np

goodrow_ind, badrow_ind, badrows = [], [], []

# ifile and ofile are the input and output file names (strings)
with open(ifile) as fin, open(ofile, 'w') as f:
    for i, row in enumerate(fin):
        if row[0].isdigit():
            f.write(row)
            goodrow_ind.append(i)
        else:
            badrow_ind.append(i)
            badrows.append(row)

# both files are closed by the with block; calling ifile.close() on the name string would fail
data = np.loadtxt(ofile, delimiter=',')

The result is the "good" and "bad" rows separated from each other, each with its own list of indices.
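
The later re-insertion step is not shown in the thread. As a rough sketch of how the saved indices could be used, assuming badrow_ind holds the original, unique line positions of the removed rows, the two groups can be merged back into their original order. The reinsert function and its parameter names are hypothetical.

```python
def reinsert(good_rows, bad_indices, bad_rows):
    # Rebuild the original sequence from the kept rows plus the removed
    # rows and the positions they originally occupied.
    # Assumption: bad_indices are unique indices into the ORIGINAL list.
    removed = dict(zip(bad_indices, bad_rows))
    total = len(good_rows) + len(removed)
    good_iter = iter(good_rows)
    merged = []
    for i in range(total):
        if i in removed:
            merged.append(removed[i])       # slot originally held a removed row
        else:
            merged.append(next(good_iter))  # otherwise take the next kept row
    return merged
```

Walking over the original positions, rather than calling list.insert repeatedly, avoids the shifting-index problem raised in the comments and takes a single pass over the data.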

Source

2013-10-24 01:13:50 user2904397
