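Pandas: how to select groups with a small within-group standard deviation?
I have a dataframe with roughly 100 rows per group ID. I want to group by group ID and then keep only the groups where the standard deviation of a column is below a threshold. I use the code below: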
# df is the dataframe with all rows
# group on groupID
df_grouped = df.groupby('groupID')
# this gives a table with groupID and the std within a group
df_grouped_std = df_grouped.std()
# from the df with standard deviations, I select only the groups
# where the standard deviation is within limits
selection = df_grouped_std[df_grouped_std['col1']<1][df_grouped_std['col2']<0.05]
# now I try to select from the original dataframe 'df_grouped' the groups that were selected in the previous step.
df_plot = df_grouped[selection]
Stack trace:
Traceback (most recent call last):
File "<ipython-input-72-2cd045ecb262>", line 1, in <module>
runfile('C:/Documents and Settings/a708818/Desktop/coloredByRol.py', wdir='C:/Documents and Settings/a708818/Desktop')
File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
execfile(filename, namespace)
File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Documents and Settings/a708818/Desktop/coloredByRol.py", line 50, in <module>
df_plot = df_grouped[selection]
File "C:\Anaconda\lib\site-packages\pandas\core\groupby.py", line 3170, in __getitem__
if key not in self.obj:
File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 688, in __contains__
return key in self._info_axis
File "C:\Anaconda\lib\site-packages\pandas\core\index.py", line 885, in __contains__
hash(key)
File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 647, in __hash__
' hashed'.format(self.__class__.__name__))
TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
I can't figure out how to select the data I want. Any hints?
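The traceback suggests that indexing the GroupBy object with `selection` is treated as a column lookup, so pandas tries to hash the DataFrame and fails. One possible way around this, sketched here with made-up sample data (only the column names `groupID`, `col1` and `col2` come from the question), is to take the group IDs that pass the thresholds and select the matching rows from the original dataframe with `isin`:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "groupID": ["a", "a", "a", "b", "b", "b"],
    "col1": [1.0, 1.1, 0.9, 1.0, 5.0, 9.0],
    "col2": [0.50, 0.51, 0.49, 0.50, 0.51, 0.49],
})

# Standard deviation of each column within each group, indexed by groupID.
std = df.groupby("groupID")[["col1", "col2"]].std()

# Group IDs whose standard deviations are both within limits.
# The two conditions are combined into one boolean mask with &.
keep = std[(std["col1"] < 1) & (std["col2"] < 0.05)].index

# Select the rows of the original dataframe that belong to those groups.
df_plot = df[df["groupID"].isin(keep)]
```

Here group `b` is dropped because its `col1` values vary too much, so `df_plot` contains only the rows of group `a`.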
The solution using filter looks cleaner. Thanks! – marqram
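The filter-based solution the comment refers to is not shown here. A minimal sketch of what such an approach might look like uses `GroupBy.filter`, which keeps every row of each group for which a function returns True. The sample data and the helper name `within_limits` are assumptions for illustration; the thresholds are the ones from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "groupID": ["a", "a", "a", "b", "b", "b"],
    "col1": [1.0, 1.1, 0.9, 1.0, 5.0, 9.0],
    "col2": [0.50, 0.51, 0.49, 0.50, 0.51, 0.49],
})


def within_limits(group):
    """Return True if both columns of this group have a small enough std."""
    # within_limits is an illustrative helper, not part of pandas.
    return bool(group["col1"].std() < 1 and group["col2"].std() < 0.05)


# filter() calls within_limits once per group and returns the rows of the
# original dataframe that belong to the groups where it returned True.
df_plot = df.groupby("groupID").filter(within_limits)
```

The result is an ordinary DataFrame with the original rows and index, so it can be passed on for plotting without a separate lookup step.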