2016-05-20

Pandas raises a ValueError when I try to use a list of column names. The following code triggers the error, and I don't know why, since using a normal list works fine.

fileFields = [
    str(input("Please enter the column name for the pedigree field in "
              "your request file.\n")),
    str(input("Please enter the column name for the pedigree field "
              "in the Tissue Library file.\n")),
    str(input("Please enter the column name for the sourceID field "
              "in the Tissue Library file.\n")),
    str(input("Please enter the column name for the pedigree field in "
              "the Gold Standard file.\n")),
    str(input("Please enter the column name for the sourceID field in "
              "the Gold Standard file.\n")),
]

dfRequests = pd.read_csv(fileInputs[0], skipinitialspace=True,
                         usecols=fileFields[0])
dfTissueLibrary = pd.read_csv(fileInputs[1], skipinitialspace=True,
                              usecols=fileFields[1:2])
dfGoldStandard = pd.read_csv(fileInputs[2], skipinitialspace=True,
                             usecols=fileFields[3:4])

Result:

Traceback (most recent call last): 
    File "filepathway hidden for security", line 74, in <module> 
    usecols=fileFields[0]) 
    File "filepathway hidden for security\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 529, in parser_f 
    return _read(filepath_or_buffer, kwds) 
    File "filepathway hidden for security\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 295, in _read 
    parser = TextFileReader(filepath_or_buffer, **kwds) 
    File "filepathway hidden for security\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 612, in __init__ 
    self._make_engine(self.engine) 
    File "filepathway hidden for security\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 747, in _make_engine 
    self._engine = CParserWrapper(self.f, **self.options) 
    File "filepathway hidden for security\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1154, in __init__ 
    col_indices.append(self.names.index(u)) 
ValueError: 'd' is not in list 

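It seems to me that pandas is taking the string at each index of the fileFields list and turning it into a list of strings. I tried to work around this by building a list from the indexed strings after retrieving them, but that didn't work. Any suggestions?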

'fileFields[0]' returns a string (the first column name entered), so 'd' is probably the first character of that column name, right? If so, set 'usecols=fileFields'. – miraculixx
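
To illustrate the comment: indexing with fileFields[0] yields a single string, and the traceback suggests that this pandas version iterates over it one character at a time, which would explain "'d' is not in list". Separately, a slice excludes its end index, so fileFields[1:2] holds one name rather than two. Below is a minimal sketch of one possible fix. The column names and the in-memory CSV text are invented placeholders, not taken from the question.

```python
import io

import pandas as pd

# Hypothetical column names standing in for what the user would type.
file_fields = ["pedigree", "lib_pedigree", "lib_source",
               "gold_pedigree", "gold_source"]

# Indexing gives one string; iterating over a string yields characters.
chars = list(file_fields[0])

# A slice stops before its end index: [1:2] is one name, [1:3] is two.
one_name = file_fields[1:2]
two_names = file_fields[1:3]

# Placeholder CSV data (assumption) in place of the real files.
requests_csv = "pedigree,extra\nA1,1\nA2,2\n"
library_csv = "lib_pedigree,lib_source,other\nA1,S1,x\nA2,S2,y\n"

# Wrap the single name in a list so usecols receives a list of names.
df_requests = pd.read_csv(io.StringIO(requests_csv), skipinitialspace=True,
                          usecols=[file_fields[0]])

# Use a slice that covers both Tissue Library columns.
df_library = pd.read_csv(io.StringIO(library_csv), skipinitialspace=True,
                         usecols=file_fields[1:3])
```

Under the same assumption, the Gold Standard read would use file_fields[3:5].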

Answer

Any suggestions?

My approach would be the following: keep the process simple and safe by using a small helper function:

import pandas as pd

def selective_read_csv(purpose, path):
    # read the header plus one data row, just to get the column names
    columns = list(pd.read_csv(path, nrows=1).columns.values)
    df = None
    while df is None:
        # present the user with the actual columns, taking out the
        # guesswork
        file_fields = input("[%s] Enter columns as a comma-separated list %s " % (purpose, columns))
        try:
            # strip stray spaces around each name before passing the list on
            usecols = [name.strip() for name in file_fields.split(',')]
            df = pd.read_csv(path, usecols=usecols)
        except ValueError as e:
            # pandas rejects unknown column names; report and ask again
            print("Sorry, %s" % e)
            df = None
    return df

df = selective_read_csv('requests file', '/tmp/data.csv')

This way the user is shown the columns that are actually in the file, and incorrect input is handled gracefully:

[requests file] Enter columns as a comma-separated list ['a', 'b'] aaa
Sorry, 'aaa' is not in list
[requests file] Enter columns as a comma-separated list ['a', 'b']

Then call this function for each file type, e.g.:

dfRequests = selective_read_csv('requests file', fileInputs[0]) 
dfTissueLibrary = selective_read_csv('tissue library', fileInputs[1]) 
dfGoldStandard = selective_read_csv('gold standard', fileInputs[2])
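
The same idea can also be checked up front, without relying on the exception from read_csv. The following is only a rough sketch: parse_column_list and missing_columns are hypothetical helper names (neither is part of pandas), and the in-memory CSV stands in for a real file.

```python
import io

import pandas as pd


def parse_column_list(text):
    """Split a comma-separated reply into clean column names."""
    return [name.strip() for name in text.split(",") if name.strip()]


def missing_columns(source, requested):
    """Return the requested names that do not appear in the CSV header."""
    # nrows=0 reads only the header line, so no data rows are parsed.
    header = set(pd.read_csv(source, nrows=0).columns)
    return [name for name in requested if name not in header]


# Placeholder data (assumption) in place of a real file on disk.
csv_text = "a,b\n1,2\n3,4\n"

requested = parse_column_list("a, aaa")
missing = missing_columns(io.StringIO(csv_text), requested)

# Read only the names that passed the check.
valid = [name for name in requested if name not in missing]
df = pd.read_csv(io.StringIO(csv_text), usecols=valid)
```

Reporting the missing names before reading lets the prompt point out exactly which entries were wrong, independent of how a given pandas version words its error message.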