Converting a list of strings to a NumPy array (Python)

I am trying to extract some data from a text file. At the moment I can pick out the correct lines that contain the data, which gives me output that looks like this:

[ 0.2  0.148 100. ] 
[ 0.3  0.222 100. ] 
[ 0.4  0.296 100. ] 
[ 0.5  0.37 100. ] 
[ 0.6  0.444 100. ]

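So basically I have 5 lists with one string in each. However, as you can imagine, I would like to combine all of them into a single numpy array, with each string split into 3 values. Like this: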

[[0.2, 0.148, 100], 
[0.3, 0.222, 100], 
[0.4, 0.296, 100], 
[0.5, 0.37, 100], 
[0.6, 0.444, 100]]

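But the separator in the output is more or less random, i.e. I don't know whether it is 3 spaces, 5 spaces or a tab, so I am kind of lost on how to do this.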

UPDATE：

So the data looks a bit like this:

data_file = 

Equiv. Sphere Diam. [cm]: 6.9 
Conformity Index: N/A 
Gradient Measure [cm]: N/A 

Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%] 
       0     0      100 
       0.1    0.074      100 
       0.2    0.148      100 
       0.3    0.222      100 
       0.4    0.296      100 
       0.5    0.37      100 
       0.6    0.444      100 
       0.7    0.518      100 
       0.8    0.592      100 

Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1) 
Dose Cover.[%]: 100.0 
Sampling Cover.[%]: 100.0 

Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%] 
       0     0      100 
       0.1    0.074      100 
       0.2    0.148      100 
       0.3    0.222      100 
       0.4    0.296      100 
       0.5    0.37      100 
       0.6    0.444      100

And the code to get the lines is:

with open(data_file) as input_data: 
     # Skips text before the beginning of the interesting block: 
     for line in input_data: 
      if line.strip() == 'Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%]': # Or whatever test is needed 
       break 
     # Reads text until the end of the block: 
     for line in input_data: # This keeps reading the file 
      if line.strip() == 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)': 
       break 
      text_line = np.fromstring(line, sep='\t') 
      print text_line

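The text before the data is also random, so I can't just say "skip the first 5 lines". The header line is always the same, though, and so is the line that ends the block (just before the next data starts). So I just need a way to get the raw data and put it into an array, and then I can work with it from there.

Hopefully it makes more sense now.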

Source

2017-03-13 Denver Dang

Use a regular expression to split on '\s+' – BlackBear

Is the input supposed to be a string, given that the quotes are missing? – languitar

It doesn't have quotes, that's for sure. What is the correct term if it is not a string? –
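The regular-expression suggestion in the comments above could look roughly like the sketch below. The helper name `parse_row` and the sample line are made up for illustration and are not from the question. It assumes the line is not blank and contains only numbers.

```python
import re

def parse_row(line):
    """Split one non-blank line on runs of whitespace and convert each field to float."""
    fields = re.split(r'\s+', line.strip())
    return [float(x) for x in fields]

# Hypothetical sample line mixing several spaces and a tab.
sample = "       0.2    0.148 \t     100 \n"
row = parse_row(sample)
print(row)

# str.split() with no arguments also splits on runs of any whitespace,
# so it yields the same fields without needing a regex.
assert row == [float(x) for x in sample.split()]
```

Since plain `str.split()` already treats spaces and tabs alike, the regex is mostly a matter of taste here.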

With print text_line you are seeing the arrays formatted as strings. Each one is formatted individually, so the columns don't line up.

[ 0.2  0.148 100. ] 
[ 0.3  0.222 100. ] 
[ 0.4  0.296 100. ] 
[ 0.5  0.37 100. ] 
[ 0.6  0.444 100. ]

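Instead of printing, you could collect the values in a list and concatenate them at the end.

Without actually testing it, I think this will work: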

import numpy as np

data = []
with open(data_file) as input_data:
    # Skip the text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%]':  # Or whatever test is needed
            break
    # Read text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)':
            break
        if not line.strip():
            continue  # weed out blank lines so every row has 3 values
        arr_line = np.fromstring(line, sep='\t')
        data.append(arr_line)
data = np.vstack(data)  # stack the 1-D rows into one 2-D array

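Another option is to collect the lines without parsing them, and pass them to np.genfromtxt. In other words, use your code as a filter that feeds the numpy function just the right lines. It accepts input from anything that supplies lines: a file, a list, or a generator.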

import numpy as np

def filter_lines(input_data):  # renamed so it does not shadow the built-in filter()
    # Skip the text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%]':  # Or whatever test is needed
            break
    # Read text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)':
            break
        yield line

with open(data_file) as f:
    data = np.genfromtxt(filter_lines(f))  # default delimiter=None splits on any whitespace
print(data)
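To try the generator approach without the real file, here is a minimal self-contained sketch. The sample text is a shortened, made-up stand-in for the data in the question, fed through io.StringIO instead of open(); `block_lines`, `HEADER`, `FOOTER` and `SAMPLE` are names chosen for this illustration only.

```python
import io
import numpy as np

HEADER = 'Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%]'
FOOTER = 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)'

# Shortened, made-up stand-in for the real file (mixed spaces and a tab).
SAMPLE = "\n".join([
    "Conformity Index: N/A",
    "",
    HEADER,
    "       0     0      100",
    "       0.1    0.074      100",
    "       0.2\t0.148      100",
    "",
    FOOTER,
    "Dose Cover.[%]: 100.0",
    "",
])

def block_lines(lines):
    """Yield only the non-blank lines between the header and the footer."""
    it = iter(lines)  # works for files, lists and other iterables
    for line in it:
        if line.strip() == HEADER:
            break
    for line in it:
        if line.strip() == FOOTER:
            break
        if line.strip():
            yield line

data = np.genfromtxt(block_lines(io.StringIO(SAMPLE)))
print(data.shape)
print(data)
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by the open file object.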

Source

2017-03-13 16:44:59 hpaulj

Given a text file called tmp.txt like this:

0.2  0.148 100. 
    0.3  0.222 100. 
    0.4  0.296 100. 
    0.5  0.37 100. 
    0.6  0.444 100.

the snippet:

with open('tmp.txt', 'r') as in_file:
    # split() with no arguments splits on any run of spaces or tabs
    print([list(map(float, line.split())) for line in in_file])

will output:

[[0.2, 0.148, 100.0], [0.3, 0.222, 100.0], [0.4, 0.296, 100.0], [0.5, 0.37, 100.0], [0.6, 0.444, 100.0]]

which is hopefully what you want.

Source

2017-03-13 13:23:09 Szabolcs

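The problem (I think) is that I am parsing an entire .txt file, which contains a lot more than just the values shown. So I'm not sure whether this approach would work? (I have updated my question, so it might make more sense now) –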

1) Add before the with open:

import re 
d_input = []

2) Replace

 text_line = np.fromstring(line, sep='\t') 
     print text_line

with

 d_input.append([float(x) for x in re.sub(r'\s+', ',', line.strip()).split(',')])

3) Add at the bottom:

d_array = np.array(d_input)
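Put together, the three steps above might look like the sketch below. The function `rows_to_array` and the sample lines are hypothetical and only serve to make the snippet runnable on its own. The blank-line check is an addition to this answer's steps, since float('') would otherwise raise a ValueError.

```python
import re
import numpy as np

def rows_to_array(lines):
    """Parse whitespace-separated numeric lines into a 2-D array."""
    d_input = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # added here: skip blank lines
        # Collapse each run of whitespace to a single comma, then split on it.
        d_input.append([float(x) for x in re.sub(r'\s+', ',', stripped).split(',')])
    return np.array(d_input)

# Made-up lines with mixed spaces and tabs, plus one blank line.
lines = ["  0.2    0.148      100 \n", "\n", "\t0.3  0.222\t100\n"]
d_array = rows_to_array(lines)
print(d_array)
```

In the original code, the body of the loop would sit where text_line was parsed and printed.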

Source

2017-03-13 13:46:29
