2014-07-24 47 views

TypeError when exporting a pandas DataFrame to Excel

I am trying to export a DataFrame from pandas to Excel, and this is how I do it:

writer = pd.io.excel.ExcelWriter(args.out_file, engine='xlsxwriter', options={'constant_memory': True}) 
summary_data.to_excel(writer, sheet_name='summary', na_rep='NA', index=False) 

But I get this message:

"cannot convert the series to {0}".format(str(converter))) 
TypeError: cannot convert the series to <type 'float'> 

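I am a bit confused by this error message and cannot tell what is wrong with my DataFrame. The export works when the DataFrame contains fewer than 1000 rows, but once it gets bigger, this error occurs.

Any ideas?

Thanks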

Update: output of summary_data.info():

<class 'pandas.core.frame.DataFrame'> 
Int64Index: 2176 entries, 0 to 2175 
Data columns (total 27 columns): 
chrom         2176 non-null object 
coord         2176 non-null int64 
ref_base        2176 non-null object 
var_base        2176 non-null object 
normal_ref_counts      2176 non-null int64 
normal_var_counts      2176 non-null int64 
VOA867-A1_S43_merged_ref_counts   2176 non-null object 
VOA867-A1_S43_merged_var_counts   2176 non-null object 
VOA867-A1_S43_merged_somatic_status  2176 non-null object 
VOA867-E02_S73_merged_ref_counts  2176 non-null object 
VOA867-E02_S73_merged_var_counts  2176 non-null object 
VOA867-E02_S73_merged_somatic_status 2176 non-null object 
VOA867-F03_S76_merged_ref_counts  2176 non-null object 
VOA867-F03_S76_merged_var_counts  2176 non-null object 
VOA867-F03_S76_merged_somatic_status 2176 non-null object 
VOA867-F04_S75_merged_ref_counts  2176 non-null object 
VOA867-F04_S75_merged_var_counts  2176 non-null object 
VOA867-F04_S75_merged_somatic_status 2176 non-null object 
VOA867-F09_S74_merged_ref_counts  2176 non-null object 
VOA867-F09_S74_merged_var_counts  2176 non-null object 
VOA867-F09_S74_merged_somatic_status 2176 non-null object 
VOA867-T_S41_merged_ref_counts   2176 non-null object 
VOA867-T_S41_merged_var_counts   2176 non-null object 
VOA867-T_S41_merged_somatic_status  2176 non-null object 
VOA867xeno_S18_merged_ref_counts  2176 non-null object 
VOA867xeno_S18_merged_var_counts  2176 non-null object 
VOA867xeno_S18_merged_somatic_status 2176 non-null object 
dtypes: int64(3), object(24)None 

Here is the function that generates it:

def get_summary_data(data, normal_sample):
    summary_data = []
    for index, normal_row in data[normal_sample].iterrows():
        out_row = {'chrom': index[0],
                   'coord': index[1],
                   'ref_base': normal_row['ref_base'],
                   'var_base': normal_row['var_base'],
                   'normal_ref_counts': normal_row['ref_counts'],
                   'normal_var_counts': normal_row['var_counts'],
                   }

        normal_variant_status = normal_row['variant_status']

        normal_depth = out_row['normal_ref_counts'] + out_row['normal_var_counts']

        if normal_depth > 0:
            normal_var_freq = out_row['normal_var_counts']/normal_depth
        else:
            normal_var_freq = 0

        for sample in data:
            if sample == normal_sample:
                continue

            sample_row = data[sample].ix[[index]]

            out_row['{0}_ref_counts'.format(sample)] = sample_row['ref_counts']

            out_row['{0}_var_counts'.format(sample)] = sample_row['var_counts']

            sample_variant_status = str(sample_row['variant_status'].iget(0))

            sample_somatic_status = call_somatic_status(normal_variant_status,
                                                        sample_variant_status,
                                                        normal_var_freq,
                                                        args.min_normal_germline_var_freq)

            out_row['{0}_somatic_status'.format(sample)] = sample_somatic_status

        summary_data.append(out_row)

    columns = ['chrom', 'coord', 'ref_base', 'var_base', 'normal_ref_counts', 'normal_var_counts']

    for sample in data:
        if sample == normal_sample:
            continue

        columns.append('{0}_ref_counts'.format(sample))

        columns.append('{0}_var_counts'.format(sample))

        columns.append('{0}_somatic_status'.format(sample))

    summary_data = pd.DataFrame(summary_data, columns=columns)

    return summary_data

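The count columns should be int, but I can see that they are treated as strings here (object dtype), possibly because they were extracted from another DataFrame?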
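One way to probe that suspicion is to look at what a list-style row selection actually returns. The sketch below is only an illustration under assumptions: it uses `.loc` in place of `.ix` (which newer pandas releases no longer provide), and the `sample` frame with its duplicated label "b" is made up, not taken from the real data. It shows that selecting with a list gives back a frame, so the `ref_counts` value is a Series rather than a number, and that `float()` raises a TypeError once that Series holds more than one element. If duplicated index entries only show up in the larger inputs, that might explain why small frames export fine, but the question does not confirm this.

```python
import pandas as pd

# Hypothetical per-sample table; the label "b" is deliberately duplicated.
sample = pd.DataFrame({"ref_counts": [10, 3, 4]}, index=["a", "b", "b"])

# Selecting with a list (as in data[sample].ix[[index]]) returns a DataFrame,
# so picking a column from it yields a Series, not a scalar.
one = sample.loc[["a"]]["ref_counts"]    # Series with a single element
many = sample.loc[["b"]]["ref_counts"]   # Series with two elements

print(type(one).__name__, len(one))
print(type(many).__name__, len(many))

# Converting a multi-element Series to float is what fails.
try:
    float(many)
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)
```

Storing such a Series in `out_row` would leave the resulting column with object dtype, which matches the info() output above.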

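Show df.info() – Jeff

Only columns that really hold objects (such as strings) should have object dtype. How are you generating the data? – Jeff

That is right, but I can see that the count columns have suspicious object dtypes – Rad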

Answer


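It seems that .to_excel only copes with columns of object dtype. A quick way to work around this is to force all columns to object dtype before writing: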

summary_data = summary_data.astype(object) 

Then you can write it without crashing:

summary_data.to_excel(writer, sheet_name='summary', na_rep='NA', index=False) 

There is some munging going on here; in some cases I had to copy the columns as object type. Strange. Another option is to drop the problematic columns.
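A different angle, sketched here under assumptions rather than offered as a confirmed fix: the code in the question stores `sample_row['ref_counts']` taken from a `.ix[[index]]` selection, which returns a frame, so the count cells may hold whole pandas Series objects instead of numbers. If that is the cause, reducing each such cell to a single scalar before writing should give plain numeric columns. `first_scalar` and `flatten_series_cells` are hypothetical helper names, not pandas functions, and keeping only the first element is an arbitrary choice when a cell holds several values.

```python
import pandas as pd


def first_scalar(value):
    """Return the first element of a Series cell; pass other values through."""
    if isinstance(value, pd.Series):
        return value.iloc[0] if len(value) else None
    return value


def flatten_series_cells(df):
    """Return a copy of df whose object columns no longer contain Series cells."""
    out = df.copy()
    for col in out.columns:
        if out[col].dtype == object:
            out[col] = [first_scalar(v) for v in out[col]]
    return out


# Made-up rows shaped like the out_row dicts in the question.
demo = pd.DataFrame([
    {"coord": 1, "s_ref_counts": pd.Series([10])},
    {"coord": 2, "s_ref_counts": pd.Series([3, 4])},
])

flat = flatten_series_cells(demo)
print(flat.dtypes)
```

In the question's own function the same effect could probably be had at the source, by storing a scalar (for example the first element of the selected column) instead of the Series itself.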