Python - Filter a column in a .dat file and return the matching values from another column

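I'm new to Python and have been working with a data set I created (150 rows) of student ID numbers, grades, ages, class_code, area_code and so on. What I want to do with the data is not only filter by a given column (by grade, age, etc.), but also build a list from a different column of the matching rows (the student ID). I've managed to work out how to isolate the column I need to search for a particular value, but I can't figure out how to create the list of values I need returned.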

So, here is a sample of the data (six rows):

1/A/15/13/43214 
2/I/15/21/58322 
3/C/17/89/68470 
4/I/18/6/57362 
5/I/14/4/00000 
6/A/16/23/34567

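I need a list of the first column (student ID) based on filtering the second column (grade)... (and eventually the third column, fourth column, etc., but if I see how it looks for just the second one, I think I can figure out the others). Also note: I am not using a header row in the .dat file.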

I figured out how to isolate/view the second column.

import numpy 

data = numpy.genfromtxt('/testdata.dat', delimiter='/', dtype='unicode') 

grades = data[:,1] 
print (grades)

This prints:

['A' 'I' 'C' 'I' 'I' 'A']

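But now, how can I pull just the first-column values that correspond to the A's, C's and I's into separate lists?

So I would like to see, for each of the A's, C's and I's, a list of the column 1 integers separated by commas: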

list from A = [1, 6] 
list from C = [3] 
list from I = [2, 4, 5]

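Again, if I can see how it's done for just the second column and just one value (say A), I think I can work out how to do it for the B's, C's, D's and so on, and for the other columns. I just need to see one example of how the syntax is applied, and then I can do the same for the rest.

Also, I have been using numpy, but I have also read about pandas and csv, and I think those libraries could work too. But like I said, I've been using numpy for the .dat file. I don't know whether one of the other libraries would be easier to use?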


2017-06-04 chitown88

A pandas solution:

import pandas as pd 

df = pd.read_csv('data.txt', header=None, sep='/') 
dfs = {k:v for k,v in df.groupby(1)}

So we have a dictionary of DataFrames:

In [59]: dfs.keys()
Out[59]: dict_keys(['I', 'C', 'A'])

In [60]: dfs['I']
Out[60]:
   0  1   2   3      4
1  2  I  15  21  58322
3  4  I  18   6  57362
4  5  I  14   4      0

In [61]: dfs['C']
Out[61]:
   0  1   2   3      4
2  3  C  17  89  68470

In [62]: dfs['A']
Out[62]:
   0  1   2   3      4
0  1  A  15  13  43214
5  6  A  16  23  34567

If you want the first column as lists, grouped by grade:

In [67]: dfs['I'].iloc[:, 0].tolist() 
Out[67]: [2, 4, 5] 

In [68]: dfs['C'].iloc[:, 0].tolist() 
Out[68]: [3] 

In [69]: dfs['A'].iloc[:, 0].tolist() 
Out[69]: [1, 6]
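
If only the ID lists are needed, the intermediate DataFrames can be skipped. The following is a minimal sketch, not part of the answer above: the sample rows are inlined through `io.StringIO` so it runs without a file, and `dtype={4: str}` is an added assumption that the area code should stay text (otherwise `00000` is read as `0`, as in the output above).

```python
import io

import pandas as pd

# The six sample rows from the question, inlined so the sketch runs on its own.
SAMPLE = """1/A/15/13/43214
2/I/15/21/58322
3/C/17/89/68470
4/I/18/6/57362
5/I/14/4/00000
6/A/16/23/34567
"""

# dtype={4: str} is an assumption added here: keep the area code as text.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, sep="/", dtype={4: str})

# One dict of student-ID lists, keyed by grade (column 1 -> values of column 0).
ids_by_grade = {grade: group[0].tolist() for grade, group in df.groupby(1)}

# A single grade, selected with a boolean mask instead of groupby.
a_ids = df.loc[df[1] == "A", 0].tolist()

print(ids_by_grade)
print(a_ids)
```

The mask form (`df[1] == "A"`) is probably the easier one to adapt to other columns, for example `df[2] == 15` for age.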


2017-06-04 16:08:18 MaxU

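You can loop over the grades and build a boolean mask that selects the rows of the array matching a particular grade. This could probably use some refinement.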

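import numpy as np 

# despite the name, `grades` holds the whole table (every column read as text) 
grades = np.genfromtxt('data.txt', delimiter='/', skip_header=0, dtype='unicode') 

res = {} 
for grade in set(grades[:, 1].tolist()): 
    # the boolean mask keeps the rows with this grade; column 0 holds the student IDs 
    res[grade] = grades[grades[:, 1]==grade][:,0].tolist() 

print(res)  # the IDs come back as strings, e.g. res['A'] is ['1', '6']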
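
For the single-value case the question asks about (just grade A), the same boolean-mask idea can be written without the loop. This is only a sketch: the array is built from inlined sample rows rather than with `genfromtxt`, which is assumed to give an equivalent 2-D array of strings.

```python
import numpy as np

# The six sample rows from the question, inlined so the sketch runs on its own.
SAMPLE = """1/A/15/13/43214
2/I/15/21/58322
3/C/17/89/68470
4/I/18/6/57362
5/I/14/4/00000
6/A/16/23/34567
"""

# Stand-in for genfromtxt(..., dtype='unicode'): a 2-D array of strings.
data = np.array([line.split("/") for line in SAMPLE.strip().splitlines()])

# True where column 1 (grade) equals "A".
mask = data[:, 1] == "A"

# Apply the mask to the rows, take column 0 (student ID), convert to int.
a_ids = data[mask, 0].astype(int).tolist()
print(a_ids)  # [1, 6]

# Masks combine with & (and) and | (or); each condition needs parentheses.
young_i = (data[:, 1] == "I") & (data[:, 2].astype(int) <= 15)
young_i_ids = data[young_i, 0].astype(int).tolist()
print(young_i_ids)  # [2, 5]
```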


2017-06-04 15:45:08

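So I've been playing with the different solutions posted so far. I like your solution. It displays res as a set of lists. I've tried looking it up, and I'm still searching, but is there a way to separate the lists from one another? So that I could basically have one list of the 'A' grades from res, another of the 'C' grades from res, and so on? All I've found is how to add a list to a set, or remove a list from a list, or take subsets of lists. But I can't seem to find anything about a set of multiple lists. – chitown88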
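
On the comment above: `res` is a dict keyed by grade rather than a set, so each list can be pulled out by its key. A small sketch, where the `res` literal is simply what the loop would produce for the six sample rows:

```python
# What the numpy loop builds for the six sample rows (IDs are strings there).
res = {"A": ["1", "6"], "C": ["3"], "I": ["2", "4", "5"]}

# Pull out one list per grade by indexing with the key.
a_ids = res["A"]
c_ids = res["C"]

# .get() avoids a KeyError for a grade that does not occur in the data.
b_ids = res.get("B", [])

# Convert to integers if numbers are wanted instead of strings.
a_ids_int = [int(student_id) for student_id in a_ids]

print(a_ids)      # ['1', '6']
print(c_ids)      # ['3']
print(b_ids)      # []
print(a_ids_int)  # [1, 6]
```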

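Actually, you don't need any additional modules for such a simple task. A pure-Python solution would read the file line by line and "parse" each line with str.split(), which gives you your lists, and then you can filter on pretty much any parameter. Something like: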

students = {} # store for our students by grade 
with open("testdata.dat", "r") as f: # open the file 
    for line in f: # read the file line by line 
        row = line.strip().split("/") # split the line into individual columns 
        # you can now directly filter your row, or you can store the row in a list for later 
        # let's split them by grade: 
        grade = row[1] # second column of our row is the grade 
        # create/append the sublist in our `students` dict keyed by the grade 
        students[grade] = students.get(grade, []) + [row] 
# now your students dict contains all students split by grade, e.g.: 
a_students = students["A"] 
# [['1', 'A', '15', '13', '43214'], ['6', 'A', '16', '23', '34567']] 

# if you want only to collect the A-grade student IDs, you can get a list of them as: 
student_ids = [entry[0] for entry in students["A"]] 
# ['1', '6']
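
Since the question also mentions the csv module: the grouping step above could be written with `csv.reader` and `collections.defaultdict` as well. This is a sketch under assumptions, not part of the answer: `io.StringIO` stands in for `open("testdata.dat", newline="")`, and the IDs are converted to `int` to match the lists the question asks for.

```python
import csv
import io
from collections import defaultdict

# The six sample rows from the question, inlined so the sketch runs on its own.
SAMPLE = """1/A/15/13/43214
2/I/15/21/58322
3/C/17/89/68470
4/I/18/6/57362
5/I/14/4/00000
6/A/16/23/34567
"""

ids_by_grade = defaultdict(list)  # a missing grade starts as an empty list

with io.StringIO(SAMPLE) as f:  # stand-in for the real file
    for row in csv.reader(f, delimiter="/"):
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        ids_by_grade[row[1]].append(int(row[0]))  # grade -> student IDs

print(dict(ids_by_grade))  # {'A': [1, 6], 'I': [2, 4, 5], 'C': [3]}
```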

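But let's take a few steps back: if you want a more generalized solution, you should just store your rows in a list and then create a function that filters them by the parameters passed to it, so: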

# define a filter function 
# `filters` should contain a list of filters, where each filter is defined as: 
# [position, [values]] 
# and you can define as many as you want 
def filter_sublists(source, filters=None): 
    result = [] # store for our result 
    filters = filters or [] # in case no filters were passed 
    for element in source: # go through every element of our source data 
        try: 
            if all(element[f[0]] in f[1] for f in filters): # check if all our filters match 
                result.append(element) # add the element 
        except IndexError: # invalid filter position or data position, ignore 
            pass 
    return result # return the result 

# now we can use it to filter our data, first lets load our data: 

with open("testdata.dat", "r") as f: # open the file 
    students = [line.strip().split("/") for line in f] # store all our students as a list 

# now we have all the data in the `students` list and we can filter it by any element 
a_students = filter_sublists(students, [[1, ["A"]]]) 
# [['1', 'A', '15', '13', '43214'], ['6', 'A', '16', '23', '34567']] 

# or again, if you just need the IDs: 
a_student_ids = [entry[0] for entry in filter_sublists(students, [[1, ["A"]]])] 
# ['1', '6'] 

# but you can filter by any parameter, for example: 
age_15_students = filter_sublists(students, [[2, ["15"]]]) 
# [['1', 'A', '15', '13', '43214'], ['2', 'I', '15', '21', '58322']] 

# or you can get all I-grade students aged 14 or 15: 
i_students = filter_sublists(students, [[1, ["I"]], [2, ["14", "15"]]]) 
# [['2', 'I', '15', '21', '58322'], ['5', 'I', '14', '4', '00000']]
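
One thing to watch with the string-based filters above: every field is text, so range conditions such as "aged 16 or older" need a conversion first (as strings, `"9" > "15"` is true). A possible variation, sketched with a hypothetical helper `filter_by` that takes one test function per column instead of a list of allowed values:

```python
# The six sample rows from the question, inlined so the sketch runs on its own.
SAMPLE = """1/A/15/13/43214
2/I/15/21/58322
3/C/17/89/68470
4/I/18/6/57362
5/I/14/4/00000
6/A/16/23/34567
"""

students = [line.split("/") for line in SAMPLE.strip().splitlines()]


def filter_by(rows, predicates):
    """Hypothetical helper: keep the rows where every test passes.

    `predicates` maps a column position to a function that receives the
    (string) value of that column and returns True or False.
    """
    return [
        row for row in rows
        if all(test(row[col]) for col, test in predicates.items())
    ]


# students aged 16 or older; int() makes the comparison numeric
older = filter_by(students, {2: lambda age: int(age) >= 16})
older_ids = [int(row[0]) for row in older]
print(older_ids)  # [3, 4, 6]

# A- or C-grade students aged 16 or older
older_ac = filter_by(students, {
    1: lambda grade: grade in ("A", "C"),
    2: lambda age: int(age) >= 16,
})
older_ac_ids = [int(row[0]) for row in older_ac]
print(older_ac_ids)  # [3, 6]
```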


2017-06-04 16:19:55 zwer
