Pandas: error when looping over the rows of a given pandas DataFrame
I have the following code:
df_boundry = df_in.dropna().quantile([0.0, .8])
for row in df_in.iterrows():
    for column in row:
        if row[column] > df_boundry[column][0.8]:
            row[column] = df_boundry[column][0.8]
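Basically, for each row (record), we check the value in every column. If a value exceeds the 80th percentile, we replace it with the 80th-percentile value. But the code above raises the following error: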
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-67-81b2be77cc8a> in <module>()
4 for row in df_in.iterrows():
5 for column in row:
----> 6 if row[column] > df_boundry[column][0.8]:
7 row[column] = df_boundry[column][0.8]
8
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
1995 return self._getitem_multilevel(key)
1996 else:
-> 1997 return self._getitem_column(key)
1998
1999 def _getitem_column(self, key):
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
2002 # get column
2003 if self.columns.is_unique:
-> 2004 return self._get_item_cache(key)
2005
2006 # duplicate columns & possible reduce dimensionality
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
1348 res = cache.get(item)
1349 if res is None:
-> 1350 values = self._data.get(item)
1351 res = self._box_item_values(item, values)
1352 cache[item] = res
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
3288
3289 if not isnull(item):
-> 3290 loc = self.items.get_loc(item)
3291 else:
3292 indexer = np.arange(len(self.items))[isnull(self.items)]
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
1945 return self._engine.get_loc(key)
1946 except KeyError:
-> 1947 return self._engine.get_loc(self._maybe_cast_indexer(key))
1948
1949 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)()
KeyError: 0
Here is some sample data for df_in:
column_A | column_B | column_C
--------------------------------
0.5 | 0.5 | NaN
1.2 | NaN | NaN
NaN | 8.1 | 21.1
9.1 | 9.3 | 2.1
4.5 | 90.1 | 1.4
112.3 | 79.2 | 1.1
:
:
and for df_boundry:
| column_A | column_B | column_C
----------------------------------------
0.0 | 0.1 | 0.4 | 0.0
0.8 | 110.4 | 80.1 | 20.5
The expected result for the sample data is:
column_A | column_B | column_C
--------------------------------
0.5 | 0.5 | NaN
1.2 | NaN | NaN
NaN | 8.1 | 20.5
9.1 | 9.3 | 2.1
4.5 | 80.1 | 1.4
110.4 | 79.2 | 1.1
:
:
That is, only when a cell value > df_boundry[column][0.8] do we replace it with df_boundry[column][0.8].
Does anyone know what I am missing here? Thanks!
Can you post a sample dataset (5-7 rows)? – MaxU
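Just so you understand the error: df_in.iterrows() returns a tuple of (index, row). You can fix that by writing 'for idx, row in df_in.iterrows()', but even after you do that, row is a Series, so 'for column in row' actually yields each value in the row rather than the column names. Try printing some of the variables inside the loop to explore this further. – shawnheide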
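To make the comment above concrete, here is a minimal sketch built from the sample rows in the question. The bounds in df_boundry are typed in by hand from the table shown in the question (not recomputed with quantile), and the names first, upper, df_loop and df_clipped are made up for illustration. In the original loop, iterating over the (index, row) tuple yields the index 0 first, so df_boundry[0] is looked up and raises KeyError: 0. The sketch unpacks the tuple, iterates over column names, and writes back to a copy of the frame, because the row yielded by iterrows() is itself a copy. It also shows DataFrame.clip as a possible loop-free alternative.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df_in = pd.DataFrame({
    "column_A": [0.5, 1.2, np.nan, 9.1, 4.5, 112.3],
    "column_B": [0.5, np.nan, 8.1, 9.3, 90.1, 79.2],
    "column_C": [np.nan, np.nan, 21.1, 2.1, 1.4, 1.1],
})

# Assumption: bounds entered by hand to match the df_boundry table in the
# question, instead of calling df_in.dropna().quantile([0.0, .8]).
df_boundry = pd.DataFrame(
    {"column_A": [0.1, 110.4], "column_B": [0.4, 80.1], "column_C": [0.0, 20.5]},
    index=[0.0, 0.8],
)

# iterrows() yields (index, Series) tuples, not bare rows.
first = next(df_in.iterrows())

# Row of 80th-percentile bounds, indexed by column name.
upper = df_boundry.loc[0.8]

# Corrected loop: unpack the tuple, iterate over column names,
# and write to the DataFrame rather than to the row copy.
df_loop = df_in.copy()
for idx, row in df_in.iterrows():
    for column in df_in.columns:
        # NaN > x is False, so missing values are left alone.
        if row[column] > upper[column]:
            df_loop.loc[idx, column] = upper[column]

# Loop-free alternative: cap each column at its own upper bound.
df_clipped = df_in.clip(upper=upper, axis=1)

print(df_clipped)
```

On larger frames the clip version is likely to be much faster than iterrows(), since it avoids a Python-level loop over every cell.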