df.value.apply returns NaN

I have a DataFrame with two columns (time and pressure):
```
timestep  value
0         393
1         389
2         402
3         408
4         413
5         463
6         471
7         488
8         422
9         404
10        370
```
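The sample above can be typed in directly so the example runs without `copy.csv` (which is what the MCVE request in the comments is after). This sketch reproduces only the 11 rows shown, not the full file:

```python
import pandas as pd

# The 11 sample rows from the question, entered by hand so no CSV file is needed.
df = pd.DataFrame({
    "timestep": range(11),
    "value": [393, 389, 402, 408, 413, 463, 471, 488, 422, 404, 370],
})
print(df)
```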
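First I need to find the frequency rank of each pressure value (`df['freq_rank']`).
That works fine, but when I mask the DataFrame by comparing the column against the `count` value and then compute the interval differences, I get NaN as the result.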
```python
import numpy as np
import pandas as pd

df = pd.read_csv('copy.csv')
df.columns = ["Timestamp", "Pressure"]

## Timestep as int: one integer per row
df = pd.DataFrame({'timestep': np.arange(len(df)), 'value': df.Pressure})

## Rank of the frequency of each value in the df (0 = most frequent value)
vcs = {v: i for i, v in enumerate(df.value.value_counts().index)}
df['freq_rank'] = df.value.apply(vcs.get)
print(df.freq_rank)
```
Output:
```
0    131
1    235
2     99
3     99
4    101
5    101
6    131
7     79
8     79
```
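The rank lookup works because the keys of `vcs` are pressure values, the same kind of data stored in `df.value`, so every row finds a match. A minimal sketch on hypothetical data (the values 5, 7 and 9 are made up so that every count is distinct and the ranks are unambiguous); `Series.map` with a dict is used here in place of `apply(vcs.get)`, and it behaves the same as long as every value is a key:

```python
import pandas as pd

# Hypothetical readings: 5 occurs 3 times, 7 twice, 9 once.
df = pd.DataFrame({"timestep": range(6), "value": [5, 7, 5, 5, 9, 7]})

# value_counts() is indexed by the values themselves, most frequent first.
vcs = {v: i for i, v in enumerate(df["value"].value_counts().index)}

# Each row's value is a key of vcs, so the lookup never misses.
df["freq_rank"] = df["value"].map(vcs)
print(df)
```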
```python
## Find the most frequent value (value_counts() already sorts by count, descending)
count = df['value'].value_counts().index[0]
## Mask the DF by comparing the column against the count value & find the interval diff
x = df.loc[df['value'] == count, 'timestep'].diff()
print(x)
```
Output:
```
50      1.0
112    62.0
215   103.0
265    50.0
276    11.0
277     1.0
278     1.0
318    40.0
366    48.0
367     1.0
368     1.0
372     4.0
```
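The detail that matters in `x` is its index: `diff()` keeps the row labels of the rows that survived the mask, so `x` is keyed by row position in `df`, not by pressure value. A sketch on hypothetical readings in which 393 occurs four times; `value_counts().idxmax()` is used here as a shorter way to pick the most frequent value:

```python
import pandas as pd

# Hypothetical readings: 393 appears four times, so it is the most frequent value.
df = pd.DataFrame({"timestep": range(6), "value": [393, 389, 393, 393, 402, 393]})

# value_counts() is indexed by pressure value; idxmax() returns the value with the highest count.
count = df["value"].value_counts().idxmax()

# Mask to the rows holding that value, then take gaps between consecutive timesteps.
x = df.loc[df["value"] == count, "timestep"].diff()
print(x)

# x keeps the row labels of the surviving rows as its index;
# the first gap is NaN because there is no earlier row to subtract from.
print(x.index.tolist())
```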
```python
df['freq'] = df.value.apply(x.get)
print(df.freq)
```
Output:
```
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
8   NaN
```
I don't understand why `print(x)` returns the correct output while `print(df['freq'])` returns NaN.
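One way to see the difference: `Series.get` looks a key up in the Series' index. In the `freq_rank` step the keys of `vcs` are pressure values, so `vcs.get(value)` matches. Here `x` is indexed by row labels, so `x.get(value)` asks for a row label equal to a pressure reading; that label generally does not exist, `get` returns `None`, and pandas shows it as missing. (If a reading happens to equal one of the row labels, it would even pick up an unrelated gap.) The sketch below reproduces this on hypothetical data. The final `df["freq"] = x` assignment assumes the goal is to attach each interval to the row it was measured at, which the question does not state:

```python
import pandas as pd

# Hypothetical pressure readings: 393 is the most frequent value.
df = pd.DataFrame({"timestep": range(6), "value": [393, 389, 393, 393, 402, 393]})
count = df["value"].value_counts().idxmax()
x = df.loc[df["value"] == count, "timestep"].diff()

# x is indexed by row labels (0, 2, 3, 5), so x.get() expects a row label.
# Passing pressure values (393, 389, ...) finds no matching label and
# returns None for every row, which pandas treats as missing.
looked_up = df["value"].apply(x.get)
print(looked_up.isna().all())

# Assumption: attach each gap to its own row. Plain assignment aligns on the
# row index instead of looking up values; rows not present in x become NaN.
df["freq"] = x
print(df)
```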
Could you please create an [mcve](http://stackoverflow.com/help/mcve)? See [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples).
What further information do you need? I have also included a snippet of my DataFrame. – joasa