2017-07-09 28 views

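str error when replacing values in a pandas DataFrame

My code scrapes information from a website and puts it into a DataFrame. But I don't understand why the order of the statements leads to this error: AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas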

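Basically, the scraped data has more than 20 rows and 10 columns.

  • Some values are in parentheses, e.g. (2,333), which I want to change to -2333
  • Some values contain the text n.a., which I want to change to numpy.nan
  • Some values are just -, which I also want to change to numpy.nan

Does not work: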

for final_df, engine_name in zip((df_foo, df_bar, df_far), ['engine_foo', 'engine_bar', 'engine_far']):

    # Replacing necessary items for final clean up
    final_df.replace('-', numpy.nan, inplace=True)
    final_df.replace('n.a.', numpy.nan, inplace=True)

    for i in final_df.columns:
        final_df[i] = final_df[i].str.replace(')', '')
        final_df[i] = final_df[i].str.replace(',', '')
        final_df[i] = final_df[i].str.replace('(', '-')

    # Appending Code to dataframe
    final_df = final_df.T
    final_df.insert(loc=0, column='Code', value=some_code)

# This produces the error - AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

Works:

for final_df, engine_name in zip((df_foo, df_bar, df_far), ['engine_foo', 'engine_bar', 'engine_far']):

    # Replacing necessary items for final clean up
    for i in final_df.columns:
        final_df[i] = final_df[i].str.replace(')', '')
        final_df[i] = final_df[i].str.replace(',', '')
        final_df[i] = final_df[i].str.replace('(', '-')

    final_df.replace('-', numpy.nan, inplace=True)
    final_df.replace('n.a.', numpy.nan, inplace=True)

    # Appending Code to dataframe
    final_df = final_df.T
    final_df.insert(loc=0, column='Code', value=some_code)

# This doesn't give me any errors and returns what I want.

Any ideas why this happens?

This is not reproducible with an arbitrary DataFrame; can you give a sample of the data? – PRMoureu

Answer

For me a double replace works: the first with regex=True to replace substrings, and the second to replace whole values:

import numpy as np
import pandas as pd

np.random.seed(23)
df = pd.DataFrame(np.random.choice(['(2,333)', 'n.a.', '-', 2.34], size=(3, 3)),
                  columns=list('ABC'))
print(df)
      A     B        C
0  2.34     -  (2,333)
1  n.a.     -  (2,333)
2  2.34  n.a.  (2,333)

# Substring replacement first (regex=True), then whole-value replacement
df1 = df.replace([r'\(', r'\)', ','], ['-', '', ''], regex=True).replace(['-', 'n.a.'], np.nan)
print(df1)
      A    B      C
0  2.34  NaN  -2333
1   NaN  NaN  -2333
2  2.34  NaN  -2333

# The reverse order gives the same result, because DataFrame.replace skips non-string values
df1 = df.replace(['-', 'n.a.'], np.nan).replace([r'\(', r'\)', ','], ['-', '', ''], regex=True)
print(df1)
      A    B      C
0  2.34  NaN  -2333
1   NaN  NaN  -2333
2  2.34  NaN  -2333
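
If the end goal is numeric columns rather than cleaned-up strings, one possible variation is to cast everything to str first and let pd.to_numeric coerce whatever is left over. This is only a sketch under my own assumptions: to_numeric_frame and the sample frame are hypothetical names and data, not part of the question, and the result holds numbers instead of strings.

```python
import numpy as np
import pandas as pd


def to_numeric_frame(frame):
    """Hypothetical helper: turn accounting-style strings into numbers."""
    # Cast to str first so that .str is always available,
    # even for columns that have already become float64.
    out = frame.astype(str)
    out = out.apply(
        lambda col: col.str.replace(',', '', regex=False)
                       .str.replace(')', '', regex=False)
                       .str.replace('(', '-', regex=False)
    )
    # Anything that does not parse as a number ('-', 'n.a.') is coerced to NaN.
    return out.apply(pd.to_numeric, errors='coerce')


# Hypothetical sample mirroring the values described in the question
sample = pd.DataFrame({
    'A': ['2.34', 'n.a.', '2.34'],
    'B': ['-', '-', 'n.a.'],
    'C': ['(2,333)', '(2,333)', '(2,333)'],
})
cleaned = to_numeric_frame(sample)
print(cleaned)
```

Because every column is cast to str before .str is used, it should not matter whether '-' and 'n.a.' were already replaced by NaN beforehand.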

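Edit:

Your error means that str.replace is being applied to a non-string column (for example column B, where every value is NaN after the first replace):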

# .str first, while every column still holds strings
df1 = df.apply(lambda x: x.str.replace('(', '-', regex=False)
                          .str.replace(')', '', regex=False)
                          .str.replace(',', '', regex=False)).replace(['-', 'n.a.'], np.nan)
print(df1)
      A    B      C
0  2.34  NaN  -2333
1   NaN  NaN  -2333
2  2.34  NaN  -2333

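# Reversed order: column B is already all NaN when .str is applied
df1 = (df.replace(['-', 'n.a.'], np.nan)
         .apply(lambda x: x.str.replace('(', '-', regex=False)
                           .str.replace(')', '', regex=False)
                           .str.replace(',', '', regex=False)))
print(df1)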

AttributeError: ('Can only use .str accessor with string values, which use np.object_ dtype in pandas', 'occurred at index B')

This happens because the dtype of column B is float64 after the first replace:

df1 = df.replace(['-', 'n.a.'], np.nan)
print(df1)
      A    B        C
0  2.34  NaN  (2,333)
1   NaN  NaN  (2,333)
2  2.34  NaN  (2,333)

print(df1.dtypes)
A     object
B    float64
C     object
dtype: object
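
To see the failure in isolation, and one way to keep the per-column .str approach from the question, here is a small sketch. The frame and the helper name strip_accounting_format are made up for illustration; skipping numeric columns is my assumption about what you would want, since such columns have nothing to strip anyway.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical frame: column B holds only NaN, so pandas stores it as float64
df = pd.DataFrame({'A': ['(1,000)', np.nan], 'B': [np.nan, np.nan]})

# The .str accessor is not available on a float64 column
try:
    df['B'].str.replace(',', '', regex=False)
    raised = False
except AttributeError:
    raised = True


def strip_accounting_format(frame):
    """Hypothetical helper: apply .str.replace only to non-numeric columns."""
    out = frame.copy()
    for col in out.columns:
        if is_numeric_dtype(out[col]):
            continue  # nothing to clean, and .str would raise here
        out[col] = (out[col].str.replace(')', '', regex=False)
                            .str.replace(',', '', regex=False)
                            .str.replace('(', '-', regex=False))
    return out


result = strip_accounting_format(df)
print(result)
```

With a guard like this, the loop in the question should no longer depend on whether the NaN replacement runs before or after the string clean-up.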