Dropping rows on a condition

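I'm trying to solve the following problem: I have a dataframe. In one of its columns I have NaNs and numbers, distributed at random. I want to drop rows based on this column. My criterion is: if the row above and the row below both have NaN values, I drop the row. Otherwise, I keep it in my dataframe.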

This is what I managed to come up with, but I'm pretty sure it's wrong... Any help is greatly appreciated!

i=0 
while i <= 500: 
    if (np.isnan(df.iloc[i+1]['column1'])) & (np.isnan(df.iloc[i-1]['column1'])): 
     df2[i] = df.drop(df[i])

Source

2017-08-14 python_newbie

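Do you want to drop all the NaNs together with that value, or only that one row while keeping the NaNs? Either way, will you end up dropping all the NaNs?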

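I don't want to drop all the NaNs. I merged two datasets, and I'm only interested in the values that belong to dataset 2 plus the values from dataset 1 that come immediately before and after them. PS: the datasets have different columns, which is why all these NaNs are there.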

Create sample data:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'column1': np.random.randn(10)})
df.iloc[[2, 4, 7], 0] = np.nan
>>> df
    column1
0  1.764052
1  0.400157
2       NaN
3  2.240893  # <<< Drop.
4       NaN
5 -0.977278
6  0.950088
7       NaN
8 -0.103219
9  0.410599

Apply the filter.

>>> df[~((df['column1'].shift(1).isnull()) & (df['column1'].shift(-1).isnull()))]
    column1
0  1.764052
1  0.400157
2       NaN
4       NaN
5 -0.977278
6  0.950088
7       NaN
8 -0.103219
9  0.410599
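
The same filter can be easier to read when the two neighbor checks are given names. A minimal sketch, assuming the sample frame above; the names `prev_is_nan`, `next_is_nan`, and `filtered` are illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'column1': np.random.randn(10)})
df.iloc[[2, 4, 7], 0] = np.nan

# True where the row directly above holds NaN. The first row has no row above,
# so shift(1) fills it with NaN and the check is True there as well.
prev_is_nan = df['column1'].shift(1).isnull()
# True where the row directly below holds NaN (same edge effect for the last row).
next_is_nan = df['column1'].shift(-1).isnull()

# Keep every row except those with NaN both above and below.
filtered = df[~(prev_is_nan & next_is_nan)]
print(filtered.index.tolist())
```

Because the first and last rows have only one real neighbor, they are dropped whenever that single neighbor is NaN.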

Source

2017-08-14 15:35:27 Alexander

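Hard to say without seeing the data. My guess is that the 'NaN' may be a text value rather than a numpy NaN. Note that if you have three consecutive NaN rows, the middle one will be dropped under your requirement. – Alexander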

Hi Alex, I found the problem. I needed one extra condition inside the logical expression. In the end, this is the final fix: df1 = df[~((df['col1'].shift(1).isnull()) & (df['col1'].shift(-1).isnull()) & (df['col1'].isnull()))]

So your condition is to drop a row when the one above it is NaN, the one below it is NaN, and the row itself is not NaN. – Alexander
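
The condition as Alexander words it can be sketched as a three-part mask. This is only a sketch: it uses `notnull()` for the row itself to follow the wording of the comment, and the sample values and the names `above_is_nan`, `below_is_nan`, and `row_has_value` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not taken from the thread.
df = pd.DataFrame({'col1': [1.0, np.nan, 2.0, np.nan, np.nan, np.nan, 3.0, 4.0]})

above_is_nan = df['col1'].shift(1).isnull()    # neighbor above is NaN (or missing)
below_is_nan = df['col1'].shift(-1).isnull()   # neighbor below is NaN (or missing)
row_has_value = df['col1'].notnull()           # the row itself holds a number

# Drop only numeric rows that sit between two NaNs; NaN rows are always kept.
df1 = df[~(above_is_nan & below_is_nan & row_has_value)]
print(df1.index.tolist())
```

With this mask the NaN in the middle of a run of three NaNs (index 4 here) is kept, whereas the two-condition filter would drop it.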

Sample data:

import numpy as np
import pandas as pd

my_df = pd.DataFrame({
    "col1": [5.43, np.nan, np.nan, 0.5, 0.4, 0.5, np.nan, 0.1, np.nan, 0.33]
})

You can create shifted columns and avoid looping over your dataset.

my_df['forward_shift'] = my_df.col1.shift(periods=1)    # value from the row above
my_df['backward_shift'] = my_df.col1.shift(periods=-1)  # value from the row below

# Keep rows unless both neighbors are NaN; `~` negates the boolean mask.
out = my_df[~(np.isnan(my_df.forward_shift) & np.isnan(my_df.backward_shift))]
out['col1'].reset_index(drop=True)

0    NaN
1    NaN
2    0.5
3    0.4
4    0.5
5    NaN
6    NaN
Name: col1, dtype: float64

Source

2017-08-14 15:37:34 gobrewers14

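I tried your approach, but I got the following error: {TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''}. I think this is because all the dtypes in my dataframe are object. I don't know how to fix this...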

@LauraSimonsenLeal You can try `df['col1'] = df['col1'].astype(np.float32)`. That should convert it from object to float32. – gobrewers14
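
If the object column also holds text that cannot be parsed as a number, `astype` will raise. `pd.to_numeric` with `errors='coerce'` is another option that turns such values into NaN. A minimal sketch with made-up values, not the asker's data:

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype column mixing numeric strings, text, and None.
df = pd.DataFrame({'col1': ['1.5', 'NaN', 2, 'n/a', None]}, dtype=object)

# Anything that cannot be parsed as a number becomes NaN; the result is float64.
df['col1'] = pd.to_numeric(df['col1'], errors='coerce')

print(df['col1'].dtype)
print(np.isnan(df['col1']).tolist())
```

After the conversion, `np.isnan` and `isnull()` both work on the column, so the shift-based filters apply unchanged.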
