
I'm trying to use the pandas drop_duplicates function (http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.drop_duplicates.html) to remove duplicate rows from a DataFrame loaded from BigQuery:

ssc_df = bq.Query(ssc_ciq_match).to_dataframe() 
ssc_df.drop_duplicates(ssc_df.ssc_ssc_key, keep = False) 

I get this error:

ErrorTraceback (most recent call last) 
<ipython-input-9-3b85467271be> in <module>() 
----> 1 ssc_df.drop_duplicates(ssc_df.ssc_ssc_key, keep = False) 

/usr/local/lib/python2.7/dist-packages/pandas/util/decorators.pyc in wrapper(*args, **kwargs) 
    89     else: 
    90      kwargs[new_arg_name] = new_arg_value 
---> 91    return func(*args, **kwargs) 
    92   return wrapper 
    93  return _deprecate_kwarg 

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in drop_duplicates(self, subset, keep, inplace) 
    3136   deduplicated : DataFrame 
    3137   """ 
-> 3138   duplicated = self.duplicated(subset, keep=keep) 
    3139 
    3140   if inplace: 

/usr/local/lib/python2.7/dist-packages/pandas/util/decorators.pyc in wrapper(*args, **kwargs) 
    89     else: 
    90      kwargs[new_arg_name] = new_arg_value 
---> 91    return func(*args, **kwargs) 
    92   return wrapper 
    93  return _deprecate_kwarg 

Answer


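My first thought is that the first argument (subset) should be a column label given as a string, or a list of such strings, not the column itself (ssc_df.ssc_ssc_key is a Series of values). Could you try the following?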

ssc_df = bq.Query(ssc_ciq_match).to_dataframe() 
ssc_df.drop_duplicates('ssc_ssc_key', keep = False) 
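A minimal sketch of the suggested call, using a small made-up DataFrame in place of the BigQuery result (only the column name ssc_ssc_key comes from the question; other_col and all values are hypothetical). It shows that subset takes a column label or a list of labels, what keep=False does compared with keep="first", and that drop_duplicates returns a new DataFrame instead of changing ssc_df, so the result needs to be assigned (or inplace=True used).

```python
import pandas as pd

# Hypothetical stand-in for the BigQuery result; only the column name
# ssc_ssc_key is from the question, the rest is made-up sample data.
ssc_df = pd.DataFrame({
    "ssc_ssc_key": [1, 1, 2, 3, 3, 4],
    "other_col": ["a", "b", "c", "d", "e", "f"],
})

# subset takes a column label (string), not the column's values (a Series).
# keep=False drops every row whose key occurs more than once.
unique_only = ssc_df.drop_duplicates(subset="ssc_ssc_key", keep=False)

# A list of labels also works (and allows several columns);
# keep="first" keeps the first row of each duplicated group.
first_kept = ssc_df.drop_duplicates(subset=["ssc_ssc_key"], keep="first")

# drop_duplicates returns a new DataFrame; ssc_df itself is unchanged
# unless the result is assigned back or inplace=True is passed.
print(unique_only)
print(first_kept)
```

With keep=False only the rows with keys 2 and 4 survive, because keys 1 and 3 each appear twice; with keep="first" one row per key is kept.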

If that doesn't fix the problem, could you post the full stack trace? The question only shows part of it.