2016-12-09 127 views
0

I have a DataFrame with two columns (time and pressure). df.value.apply returns NaN

timestep value 
    0 393 
    1 389 
    2 402 
    3 408 
    4 413 
    5 463 
    6 471 
    7 488 
    8 422 
    9 404 
    10 370 

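First I need to find the frequency and rank of each pressure value; df['freq_rank'] works fine. But when I mask the DataFrame by comparing the column against the count value and take the difference between intervals, I get NaN as the result.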

import numpy as np 
import pandas as pd 
from matplotlib.pylab import * 
import re 
import pylab 
from pylab import * 
import datetime 
from scipy import stats 
import matplotlib.pyplot 

df = pd.read_csv('copy.csv') 
dataset = np.loadtxt(df, delimiter=";") 
df.columns = ["Timestamp", "Pressure"] 

## Timestep as int 
df = pd.DataFrame({'timestep':np.arange(3284), 'value': df.Pressure}) 

## Rank of the frequency of each value in the df 
vcs = {v: i for i, v in enumerate(df.value.value_counts().index)} 
df['freq_rank'] = df.value.apply(vcs.get) 
print(df.freq_rank) 


>>Output: 
>>0 131 
>>1 235 
>>2  99 
>>3  99 
>>4 101 
>>5 101 
>>6 131 
>>7  79 
>>8  79 



## Find most frequent value 
count = df['value'].value_counts().sort_values(ascending=[False]).nlargest(10).index.values[0] 

## Mask the DF by comparing the column against count value & find interval diff. 
x = df.loc[df['value'] == count, 'timestep'].diff() 
print(x) 

>>Output: 
>>50  1.0 
>>112  62.0 
>>215  103.0 
>>265  50.0 
>>276  11.0 
>>277  1.0 
>>278  1.0 
>>318  40.0 
>>366  48.0 
>>367  1.0 
>>368  1.0 
>>372  4.0 

df['freq'] = df.value.apply(x.get) 
print(df.freq) 

>>Output: 
>>0 NaN 
>>1 NaN 
>>2 NaN 
>>3 NaN 
>>4 NaN 
>>5 NaN 
>>6 NaN 
>>7 NaN 
>>8 NaN 

I don't understand why print(x) returns the correct output while print(df['freq']) returns NaN.

+1

Could you create an [mcve](http://stackoverflow.com/help/mcve)? See [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) –

+0

What further information do you need? I have also included a piece of my DataFrame. – joasa

Answers

1

I think your problem is with the last statement, df['freq'] = df.value.apply(x.get).

If you just want to copy x into a new column df['freq'], you can do:

df['freq'] = x

Then print(df.freq) will give you the same result as your print(x) statement.
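As a rough illustration of what that assignment does, here is a minimal sketch on made-up data: the first four values come from the question's sample, and the two extra 402 readings are an assumption added so that one value repeats. pandas aligns the assigned Series on the index, so only the rows whose labels appear in x receive a value and every other row stays NaN.

```python
import numpy as np
import pandas as pd

# First four values are from the question's sample; the trailing 402s are
# hypothetical, added only so that one value repeats.
df = pd.DataFrame({
    "timestep": np.arange(6),
    "value": [393, 389, 402, 408, 402, 402],
})

# Most frequent value (402 in this toy data).
most_frequent = df["value"].value_counts().idxmax()

# Gaps between consecutive occurrences; x keeps the original row labels 2, 4, 5.
x = df.loc[df["value"] == most_frequent, "timestep"].diff()

# Assignment aligns on the index: rows 4 and 5 receive the gaps, row 2 holds
# the leading NaN produced by diff(), and rows absent from x are NaN.
df["freq"] = x
print(df)
```

So a NaN in df.freq does not necessarily mean the assignment failed; it can simply mark rows that have no entry in x.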


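Update: Your problem is with the indices. df only has index values 0-10, whereas x has 50, 112, 215, ... When x is assigned to df, only the values whose index already exists in df are added.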
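The same index mismatch would also explain the NaN from df.value.apply(x.get). The sketch below uses a hypothetical stand-in for x, built from the first three rows of the printed output in the question. Series.get looks a key up among the index labels, so passing a pressure reading such as 393 finds nothing and returns None, which shows up as a missing value in the result.

```python
import pandas as pd

# Hypothetical stand-in for x: the index holds row labels, the data holds gaps.
x = pd.Series([1.0, 62.0, 103.0], index=[50, 112, 215])

# Pressure readings taken from the question's sample.
values = pd.Series([393, 389, 402])

# Series.get treats its argument as an index label, not as a data value.
missing = x.get(393)   # 393 is a pressure, not a label in x
present = x.get(112)   # 112 is a label in x

# Every reading is looked up as a label, none match, so all results are missing.
looked_up = values.apply(x.get)
print(missing, present)
print(looked_up.isna().all())
```

If the aim is to attach each gap to the row where it occurs, index-aligned assignment (df['freq'] = x) is the fitting tool. apply(x.get) would only make sense if x were keyed by pressure value.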

+0

I tried that. Even when I do `df['freq'] = x`, I still see NaN values when I try `print(df)` or `print(df.freq)` – joasa

+0

What does `print(x)` give you? – wonderkid2

+0

You can see it in the question – joasa