Python: cleaning up a text file

I am trying to clean up some data in a text file. I'm a Python newbie, so please bear with me.

My text looks like this:

NHIST_0003 (ZS.MC.BGE.0424SPVCOS) (21.12) 14.08
(ZS.MC.BLK.0424SPVCOS) (21.12) 14.08
(ZS.MC.GRY.0424SPVCOS) (21.12) 14.08
(ZS.MC.BLK.0525SPVCOS3) (21.12) 14.08
(ZS.MC.GRY.0525SPVCOS2) (21.12) 14.08
NHIST_0004 (ZS.MC.BGE.0424SPVCOS) (21.12) 14.08

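I need to remove any text that comes before the first "(" if the line has any, and also remove the brackets around the text I want to keep. I also need to get rid of the numbers in brackets. Looking at line one, I just want to keep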

ZS.MC.BGE.0424SPVCOS 14.08

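This is the code I came up with to try to sort things out. I'd rather not use regular expressions, because at this stage they're too advanced for me.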

fileName='reach.txt' 
fileName2='outreach.txt' 


while True:
    f=open(fileName,'r')
    for words in f:
        x=words.split('(', 1)[-1]
        g = open(fileName2,'w')
        g.write(x)
        g.close()

This loop is infinite. I thought that by closing the file I was telling the system to stop processing lines.
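The loop never stops because while True: has no break; closing a file only closes that file, it does not end any loop, so once the for loop reaches the end of reach.txt the while loop simply reopens it. The output file is also reopened in 'w' mode for every line, which truncates it each time, so only the last processed line would survive. A minimal sketch of the same split-based approach without regular expressions, assuming the question's file names and that every line worth keeping contains a "(...)" field; clean_line and process are hypothetical helper names:

```python
def clean_line(line):
    """Keep the first (...) field without brackets, plus the last field."""
    if '(' not in line:
        return None  # blank or unexpected line: nothing to keep
    # Drop everything up to and including the first "(".
    rest = line.split('(', 1)[1]
    # The code we want runs up to the next ")".
    code = rest.split(')', 1)[0]
    # The value we want is the last whitespace-separated field.
    value = line.split()[-1]
    return code + ' ' + value


def process(in_name='reach.txt', out_name='outreach.txt'):
    # Open each file once; the for loop stops by itself at end of file.
    with open(in_name) as f, open(out_name, 'w') as g:
        for line in f:
            cleaned = clean_line(line)
            if cleaned is not None:
                g.write(cleaned + '\n')
```

Calling process() reads the input once and writes the output once. For example, clean_line("NHIST_0003 (ZS.MC.BGE.0424SPVCOS) (21.12) 14.08") returns 'ZS.MC.BGE.0424SPVCOS 14.08'.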

Any help would be greatly appreciated.

Thanks

2014-04-08 weemo

Try with open(file, 'r') as fh: for row in fh: row[:row.find('(')], or just do row.split() and take what you want, e.g. x = row.split() and then x[1], x[3]. – Torxed

But does x = row.split() with x[1], x[3] still work if the text file isn't all formatted the same? – weemo

It doesn't, so I rewrote the code to look for (...) and then take the last item in the line, since that seems to be consistent. – Torxed
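To make the comments above concrete: row.find('(') returns the index of the first "(" (or -1 if there is none), so row[:i] is the prefix to discard. Fixed positions break because the first sample line splits into four fields and the second into three, so x[1] points at different things, while x[-1] is the trailing value on both. A small sketch using two of the sample lines:

```python
rows = [
    "NHIST_0003 (ZS.MC.BGE.0424SPVCOS) (21.12) 14.08",
    "(ZS.MC.BLK.0424SPVCOS) (21.12) 14.08",
]

summary = []
for row in rows:
    i = row.find('(')  # index of the first "(", or -1 if there is none
    # Guard against -1: row[:-1] would silently drop the last character.
    prefix = row[:i] if i != -1 else ''
    x = row.split()
    # x[1] is the code on the first row but "(21.12)" on the second,
    # while x[-1] is the trailing value on both.
    summary.append((prefix, len(x), x[1], x[-1]))
    print(summary[-1])
```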

You can iterate over the lines in a file like this:

with open('filename.txt') as f:
    for line in f.readlines():
        pass  # do stuff with line here
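A file object is itself an iterable of lines, so iterating over f directly works as well as readlines() and does not build a list of the whole file first. Each line keeps its trailing newline. To try this without creating a file, the sketch below uses io.StringIO from the standard library as a stand-in for the opened file, filled with two of the sample lines:

```python
import io

# io.StringIO behaves like a text file opened for reading.
f = io.StringIO(
    "NHIST_0003 (ZS.MC.BGE.0424SPVCOS) (21.12) 14.08\n"
    "(ZS.MC.BLK.0424SPVCOS) (21.12) 14.08\n"
)

lines = []
for line in f:                       # iterate the file object directly
    lines.append(line.rstrip('\n'))  # each line arrives with its '\n' attached

print(lines)
```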

To pull the information you want out of a line, you can do this:

cleaned = []
items = line.split()
for item in items:
    if item.startswith('(') and item.endswith(')'):
        cleaned.append(item.strip('()'))
        break
cleaned.append(items[-1])
cleaned = ' '.join(cleaned)
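The snippet assumes line is already defined. Wrapping the same steps in a function (clean_snippet is a hypothetical name) makes it easy to check on a sample line, and it also shows one likely source of the IndexError reported in the comments: a blank line, even one containing only '\n', splits into an empty list, so items[-1] fails.

```python
def clean_snippet(line):
    # The answer's steps, wrapped in a function so they can be tested.
    cleaned = []
    items = line.split()
    for item in items:
        # first field fully wrapped in brackets, e.g. "(ZS.MC.BGE.0424SPVCOS)"
        if item.startswith('(') and item.endswith(')'):
            cleaned.append(item.strip('()'))
            break
    cleaned.append(items[-1])  # last field, e.g. "14.08"
    return ' '.join(cleaned)


result = clean_snippet("NHIST_0003 (ZS.MC.BGE.0424SPVCOS) (21.12) 14.08")
print(result)

# A blank line splits into [], so items[-1] raises IndexError.
try:
    clean_snippet("\n")
    blank_raises = False
except IndexError:
    blank_raises = True
print(blank_raises)
```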

Full program:

in_file = 'reach.txt'
out_file = 'outreach.txt'

def clean(string):
    items = string.split()
    if not items:
        # blank line (possibly just '\n'): nothing to keep, and
        # items[-1] below would raise IndexError
        return ''

    cleaned = []
    for item in items:
        # the first field fully wrapped in brackets is the code we keep
        if item.startswith('(') and item.endswith(')'):
            cleaned.append(item.strip('()'))
            break
    cleaned.append(items[-1])  # trailing value, e.g. 14.08
    return ' '.join(cleaned)

with open(in_file) as i, open(out_file, 'w') as o:
    o.write('\n'.join([clean(line) for line in i]))

2014-04-08 22:42:14

Or just for line in f:, same thing. It also gave a syntax error because of a missing ':' (fixed it for you). – Torxed

Brilliant! Thanks so much. I like how you wrote it: very readable and simple. – weemo

@Scorpion_God I like the code, but it throws an IndexError. – weemo

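fileName = 'reach.txt'
fileName2 = 'outreach.txt'

def isfloat(s):
    # True if float() accepts s, e.g. '14.08' (but not '(21.12)')
    try:
        float(s)
        return True
    except ValueError:
        return False

with open(fileName, 'r') as fh, open(fileName2, 'w') as g:
    for row in fh:
        x = row.split()
        if not x:
            continue  # skip blank lines
        first = None
        for item in x:
            # the first "(...)" field is the code we want to keep
            if '(' in item and ')' in item:
                first = item.strip('()')
                break
        second = None
        # walk backwards from the end to find the last numeric field
        for item in reversed(x):
            if isfloat(item):
                second = item
                break
        if first is None or second is None:
            continue  # line doesn't match the expected format
        print(first, second)
        g.write(' '.join((first, second)) + '\n')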

Which gives:

ZS.MC.BGE.0424SPVCOS 14.08 
ZS.MC.BLK.0424SPVCOS 14.08 
ZS.MC.GRY.0424SPVCOS 14.08 
ZS.MC.BLK.0525SPVCOS3 14.08 
ZS.MC.GRY.0525SPVCOS2 14.08 
ZS.MC.BGE.0424SPVCOS 14.08

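There we go. This code handles several kinds of malformed data: for example, if the float value isn't at the end of the line it is still found, and if the (...) data isn't in, say, the second position but the first, that is covered as well.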
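The backwards scan relies on float() rejecting bracketed tokens, so "(21.12)" is skipped and only a bare number such as 14.08 is picked. The sketch below checks that. last_number is a hypothetical helper doing roughly the same job as the backwards loop in the answer, and the line with a trailing END field is made up to show a number that is not in the last position:

```python
def isfloat(s):
    # Same helper as in the answer: does float() accept this token?
    try:
        float(s)
        return True
    except ValueError:
        return False


def last_number(fields):
    # Scan from the right; return the first field that parses as a float.
    for item in reversed(fields):
        if isfloat(item):
            return item
    return None


checks = [isfloat('14.08'), isfloat('(21.12)'), isfloat('NHIST_0003')]
print(checks)
print(last_number("(ZS.MC.GRY.0424SPVCOS) (21.12) 14.08 END".split()))
```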

2014-04-08 22:35:36 Torxed

You could try a regular expression, if every line contains (code you want) (thing you don't want).

import re

infile = 'reach.txt'
outfile = 'outreach.txt'

with open(infile, 'r') as inf, open(outfile, 'w') as outf:
    for line in inf:
        # each line has "* (what you want) (trash) *"
        # always take the first one
        matches = re.findall(r"(\([A-z0-9\.]*\))", line)
        if not matches:
            continue  # blank or malformed line: nothing to extract
        first = matches[0].strip('()')  # drop the surrounding parentheses

        items = line.split()
        second = items[-1]  # last field, e.g. 14.08 (line[-1] would be '\n')
        to_write = " ".join((first, second))
        outf.write(to_write + "\n")

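The regular expression "(\([A-z0-9\.]*\))" matches any run (the character class [...] repeated zero or more times by *) of:

letters (A-z; strictly, this ASCII range also includes the characters [ \ ] ^ _ and `, so A-Za-z is tighter),
digits (0-9), and
periods (\.)

that is surrounded by parentheses (\( and \)). Because the outer (...) is a capturing group around the escaped parentheses, each match includes the parentheses themselves.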

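In your example there will always be two matches per line, e.g. (ZS.MC.BLK.0424SPVCOS) and (21.12), parentheses included. re.findall returns them in the order they appear, and since the one you want is always first, grab it with re.findall(regex, line)[0].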
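If you'd rather get the code without its parentheses straight from the regex, one option is to put the capturing group inside the escaped parentheses, so findall returns only the text between them. This is a variation on the answer's pattern, not the original: it also swaps A-z for A-Za-z and uses + so an empty "()" is not matched. Inside a character class a period is already literal, so it needs no backslash.

```python
import re

# Group *inside* the escaped brackets: findall returns just the inner text.
PAREN_FIELD = re.compile(r"\(([A-Za-z0-9.]+)\)")

line = "NHIST_0003 (ZS.MC.BGE.0424SPVCOS) (21.12) 14.08"
matches = PAREN_FIELD.findall(line)
print(matches)

# First bracketed field plus the last whitespace-separated field.
print(matches[0], line.split()[-1])
```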

2014-04-08 22:39:51 wflynny

Not yet, it's too advanced for me. I read about it, but I just don't get the wildcards. – weemo

No better time to learn than now! – wflynny

@weemo '.' just means any character, so "a.." will match any three-character string starting with "a". –
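A quick check of what the comment describes, using re.fullmatch so the whole string must match (re.match and re.search would also accept longer strings that merely start with or contain such a run). Escaping the period, or putting it inside a character class as the answer does with [A-z0-9\.], makes it match only a literal period:

```python
import re

# Outside a character class, "." matches any single character except a newline.
pattern = re.compile(r"a..")
results = {s: bool(pattern.fullmatch(s)) for s in ["abc", "a.9", "ab", "xab"]}
print(results)

# Escaped, the period only matches a literal ".".
literal = (bool(re.fullmatch(r"a\.b", "a.b")), bool(re.fullmatch(r"a\.b", "axb")))
print(literal)
```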

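blacklist = set('1234567890.')
with open('reach.txt') as infile, open('outreach.txt', 'w') as outfile:
    for line in infile:
        line = line.strip()
        if not line or '(' not in line:
            continue  # skip blank lines and lines with no "(...)" field
        # drop everything before the first "("
        _left, line = line.split("(", 1)
        # strip the brackets from every remaining field
        parts = [p.rstrip(")").lstrip("(") for p in line.split()]
        # drop purely numeric fields such as 21.12, but always keep the last one
        parts = [p for i, p in enumerate(parts)
                 if not all(char in blacklist for char in p) or i == len(parts) - 1]
        outfile.write("%s\n" % ' '.join(parts))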

With your sample reach.txt, I get:

ZS.MC.BGE.0424SPVCOS 14.08 
ZS.MC.BLK.0424SPVCOS 14.08 
ZS.MC.GRY.0424SPVCOS 14.08 
ZS.MC.BLK.0525SPVCOS3 14.08 
ZS.MC.GRY.0525SPVCOS2 14.08 
ZS.MC.BGE.0424SPVCOS 14.08
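The second list comprehension in this answer is its least obvious step: a field is dropped when every character is a digit or a period, except that the last field is always kept. Isolated on one sample line, with is_numeric_field as a hypothetical helper name:

```python
blacklist = set('1234567890.')


def is_numeric_field(p):
    # True when every character is a digit or a period, e.g. "21.12"
    return all(char in blacklist for char in p)


parts = ['ZS.MC.BLK.0525SPVCOS3', '21.12', '14.08']
kept = [p for i, p in enumerate(parts)
        if not is_numeric_field(p) or i == len(parts) - 1]
print(kept)
```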

2014-04-08 22:46:19 inspectorG4dget

ValueError: need more than 1 value to unpack – weemo

@weemo: Show me the input. I suspect there's an empty line at the end of the file. If so, the edit should help. – inspectorG4dget

Is there any way I can post the whole text file? It's 4000 lines. Unfortunately, it doesn't strictly follow the format. – weemo
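The ValueError comes from unpacking: line.split("(", 1) returns a one-element list when a line contains no "(", so it cannot be assigned to two names. Since the file doesn't strictly follow the format, any such line (not only a trailing blank one) could trigger it. str.partition always returns three parts, which avoids the problem. A sketch, with rest_after_paren as a hypothetical helper name and a made-up header line as input:

```python
def rest_after_paren(line):
    # partition always returns (before, sep, after); sep is '' if "(" is absent.
    _left, sep, rest = line.partition("(")
    if not sep:
        return None  # no "(" on this line: skip it instead of crashing
    return rest


line = "a header line with no brackets"
try:
    _left, rest = line.split("(", 1)  # the unpacking used in the answer
    split_failed = False
except ValueError:
    split_failed = True  # only one piece came back, so unpacking fails

print(split_failed)
print(rest_after_paren(line))
print(rest_after_paren("NHIST_0003 (ZS.MC.BGE.0424SPVCOS) (21.12) 14.08"))
```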
