Issue with NaNs: set_index().reset_index() corrupts data

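I have read that NaNs are problematic, but the following actually corrupts my data instead of raising an error. Is this a bug? Have I missed something basic in the documentation? I would expect the second command either to raise an error or to return the same result as the first command: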

ipdb> df
    year  PRuid   QC      data
18  2007  nonQC    0  8.014261
19  2008  nonQC    0  7.859152
20  2010  nonQC    0  7.468260
21  1985     10  NaN  0.861403
22  1985     11  NaN  0.878531
23  1985     12  NaN  0.842704
24  1985     13  NaN  0.785877
25  1985     24    1  0.730625
26  1985     35  NaN  0.816686
27  1985     46  NaN  0.819271
28  1985     47  NaN  0.807050
ipdb> df.set_index(['year','PRuid','QC']).reset_index()
    year  PRuid  QC      data
0   2007  nonQC   0  8.014261
1   2008  nonQC   0  7.859152
2   2010  nonQC   0  7.468260
3   1985     10   1  0.861403
4   1985     11   1  0.878531
5   1985     12   1  0.842704
6   1985     13   1  0.785877
7   1985     24   1  0.730625
8   1985     35   1  0.816686
9   1985     46   1  0.819271
10  1985     47   1  0.807050

The values of "QC" have actually been changed from NaN to 1, where they should have remained NaN.

By the way, I added the ".reset_index()" call here, but the data corruption is introduced by set_index itself.
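For anyone who wants to check their own installation, here is a minimal sketch of the round trip on a few rows modelled on the output above. The frame is a hypothetical reconstruction: it assumes PRuid holds strings and QC is a float column, which the question does not confirm. On a release that contains the fix, the flag should come out True; the behaviour reported above would make it False.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of a few rows from the question.
# Assumption: PRuid holds strings and QC is a float column with NaN gaps.
df = pd.DataFrame({
    "year": [2007, 1985, 1985, 1985],
    "PRuid": ["nonQC", "10", "24", "35"],
    "QC": [0.0, np.nan, 1.0, np.nan],
    "data": [8.014261, 0.861403, 0.730625, 0.816686],
})

indexed = df.set_index(["year", "PRuid", "QC"])
roundtrip = indexed.reset_index()

# Record which rows hold NaN in QC at each stage of the round trip.
nan_before = df["QC"].isna().tolist()
nan_in_index = pd.isna(indexed.index.get_level_values("QC")).tolist()
nan_after = roundtrip["QC"].isna().tolist()

# True only if NaN survives both set_index and reset_index unchanged.
nan_preserved = nan_before == nan_in_index == nan_after
print("pandas", pd.__version__, "- NaN preserved:", nan_preserved)
```

Checking the index level separately from the reset frame helps to tell whether a mismatch comes from set_index or from reset_index.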

And in case it is of interest, the version is:

pd.version 
<module 'pandas.version' from '/usr/lib/python2.6/site-packages/pandas-0.10.1-py2.6-linux-x86_64.egg/pandas/version.pyc'>

Source

2013-05-12 CPBL

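Indexing on NaN values sounds buggy. I'm on 0.11, and set_index shows the NaN values in the QC index level. But a look at the reset_index source code shows that 'self.index.labels' and 'self.index.levels' do not return the correct NaN values. I would suggest you file a bug with the pandas team. – Boud 2013-05-12 20:21:19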

Thanks. OK, I filed one at https://github.com/pydata/pandas/issues/3586 – CPBL 2013-05-13 10:52:49

And it has been fixed by this: https://github.com/pydata/pandas/pull/3587. It was a bug, thanks! – Jeff 2013-05-13 18:09:36

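So, this is a bug. pandas 0.11.1 should be released with the bug fix by the end of May 2013 (see the comments on the question). In the meantime, I am avoiding NaN values in any MultiIndex, for instance by using some other flag value (-99) for the NaNs in the 'QC' column.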
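The flag-value workaround could look roughly like the sketch below. The sample rows and the SENTINEL name are illustrative assumptions, not the original data or code; the only thing taken from the answer is the idea of substituting -99 for NaN before building the MultiIndex. The flag must be a value that never occurs legitimately in the column.

```python
import numpy as np
import pandas as pd

SENTINEL = -99  # flag value from the answer; must not occur in the real QC data

# Hypothetical sample rows modelled on the question (PRuid assumed to be strings).
df = pd.DataFrame({
    "year": [2007, 1985, 1985, 1985],
    "PRuid": ["nonQC", "10", "24", "35"],
    "QC": [0.0, np.nan, 1.0, np.nan],
    "data": [8.014261, 0.861403, 0.730625, 0.816686],
})

# Replace NaN with the flag before QC becomes an index level.
flagged = df.assign(QC=df["QC"].fillna(SENTINEL))
indexed = flagged.set_index(["year", "PRuid", "QC"])

# ... work with the MultiIndex here; no level contains NaN ...

# Turn the flag back into NaN once QC is an ordinary column again.
restored = indexed.reset_index()
restored["QC"] = restored["QC"].replace(SENTINEL, np.nan)
```

Using `assign` keeps the original `df` untouched, so the NaN values are never lost even if the indexed copy is modified along the way.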

Source

2013-05-13 21:45:32 CPBL
