Is there a built-in method in the Python csv module to enumerate all possible values of a specific column?

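I have a csv file with many columns. Now I need to find all the possible values of one particular column. Is there a method built into the Python csv module that enumerates all possible values of a specific column?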

Is there any built-in function in Python that can help me get these values?

2016-02-05 Apoorva sahay

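The question is unclear: you have many columns, and you want to find all possible values for "that" column? Which column? You have many. – Ramast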

@Ramast I have edited it. –

@Apoorvasahay Do any of the answers below solve your problem? If so, please accept one of them. – Igor

You can use pandas.

Example file many_cols.csv:

col1,col2,col3
1,10,100
1,20,100
2,10,100
3,30,100

Find the unique values of each column:

>>> import pandas as pd 
>>> df = pd.read_csv('many_cols.csv') 
>>> df.col1.drop_duplicates().tolist() 
[1, 2, 3] 
>>> df['col2'].drop_duplicates().tolist() 
[10, 20, 30] 
>>> df['col3'].drop_duplicates().tolist() 
[100]

For all columns:

import pandas as pd 

df = pd.read_csv('many_cols.csv') 

for col in df.columns: 
    print(col, df[col].drop_duplicates().tolist())

Output:

col1 [1, 2, 3] 
col2 [10, 20, 30] 
col3 [100]
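
If pandas is not available, a similar per-column listing can be sketched with the standard library alone. This is a minimal sketch, assuming the many_cols.csv contents shown above (embedded via io.StringIO so it runs on its own); it uses csv.DictReader and dict.fromkeys to drop duplicates while keeping first-seen order. Unlike pandas, the csv module returns every value as a string, so the lists hold '1', '2', ... rather than integers.

```python
import csv
import io

# Same contents as many_cols.csv above; swap io.StringIO(...) for
# open('many_cols.csv', newline='') to read the real file.
data = io.StringIO("col1,col2,col3\n1,10,100\n1,20,100\n2,10,100\n3,30,100\n")

reader = csv.DictReader(data)
rows = list(reader)  # each row is a dict keyed by the header names

unique_by_column = {}
for col in reader.fieldnames:
    # dict.fromkeys keeps only the first occurrence of each value, in order
    unique_by_column[col] = list(dict.fromkeys(row[col] for row in rows))

for col, values in unique_by_column.items():
    print(col, values)
```

This prints col1 ['1', '2', '3'], col2 ['10', '20', '30'] and col3 ['100'], mirroring the pandas output apart from the value types.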

2016-02-05 20:06:25

I would use set() for this.

Let's say the csv file looks like this, and we only want the unique values from the second column.

foo,1,bar
baz,2,foo
red,3,blue
git,3,foo

The following code does this. I just print out the unique values to check that it works.

import csv

def parse_csv_file(rawCSVFile):
    fileLineList = []

    with open(rawCSVFile, newline='') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            fileLineList.append(row)

    return fileLineList

def main():
    uniqueColumnValues = set()
    fileLineList = parse_csv_file('sample.csv')

    for row in fileLineList:
        uniqueColumnValues.add(row[1])  # Selecting 2nd column here.

    print(uniqueColumnValues)

if __name__ == '__main__':
    main()
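
A variation on the same set() idea that avoids keeping the whole file in memory: iterate the reader directly and build the set with a comprehension. The helper name unique_values is hypothetical, and the sample data is embedded with io.StringIO only so the sketch runs on its own.

```python
import csv
import io

def unique_values(csvfile, column_index):
    """Return the set of distinct values in one column (hypothetical helper).

    Rows are read one at a time, so the file is never fully loaded into memory.
    Blank lines (empty rows) are skipped so they cannot raise IndexError.
    """
    return {row[column_index] for row in csv.reader(csvfile) if row}

# Same sample data as the answer above.
sample = io.StringIO("foo,1,bar\nbaz,2,foo\nred,3,blue\ngit,3,foo\n")
result = unique_values(sample, 1)  # second column
print(sorted(result))  # ['1', '2', '3']
```

With a real file you would call it as `with open('sample.csv', newline='') as f: print(unique_values(f, 1))`. Rows that are non-empty but shorter than column_index + 1 fields would still raise IndexError.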

2016-02-05 19:41:44 Igor

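An overly "clever" way to find the unique values of every column at once (it assumes all rows have the same number of columns, though it ignores completely empty rows):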

import csv

# Assumes somefile was opened properly earlier, e.g. open('file.csv', newline='')
csvin = filter(None, csv.reader(somefile))  # skip completely empty rows
for i, vals in enumerate(map(sorted, map(set, zip(*csvin)))):
    print("Unique values for column", i)
    print(vals)

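It uses zip(*csvin) to do a table rotation (turning one-row-at-a-time into one-column-at-a-time), then uses set to remove the duplicates from each column and (for nicer output) sorted to sort them.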
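
To see what the zip(*csvin) rotation does step by step, here is a small self-contained run. The inline data, including one blank line that filter(None, ...) drops, is made up for illustration.

```python
import csv
import io

sample = io.StringIO("a,x\nb,y\n\na,y\n")

# filter(None, ...) drops the empty list csv.reader yields for the blank line
rows = list(filter(None, csv.reader(sample)))
# rows == [['a', 'x'], ['b', 'y'], ['a', 'y']]

# zip(*rows) turns rows into columns: one tuple per column
columns = list(zip(*rows))
# columns == [('a', 'b', 'a'), ('x', 'y', 'y')]

# set() removes duplicates in each column; sorted() gives a stable, readable order
unique_per_column = [sorted(set(col)) for col in columns]
print(unique_per_column)  # [['a', 'b'], ['x', 'y']]
```

Because zip stops at its shortest input, a row with fewer fields than the others makes the trailing columns disappear from the result entirely, which is why the answer assumes every row has the same number of columns.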

2016-02-05 19:47:41 ShadowRanger
