2014-04-07 51 views

A simplified example of the problem: inserting calculated rows into a pandas DataFrame

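Essentially, I want to insert new rows between existing rows, where each new row is calculated from the values in the two rows it sits between.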

In my example, you can see that each inserted row holds the midpoint of the row before it and the row after it.

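My real goal is to use a function that calculates the midpoint between two latitude/longitude points and insert that value. I think this simple example demonstrates the technique I need. If I get an answer, I will include the full working code for the latitude/longitude case.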

import pandas as pd 
import numpy as np 

def midpoint(x,y): 
    return (x+y)/2 

#we start with this 
pd.DataFrame(np.arange(2,10).reshape((4,2)),columns=['A','B']) 

    A B 
0 2 3 
1 4 5 
2 6 7 
3 8 9 

#want to get to this. 
pd.DataFrame(np.array([2,3,3,4,4,5,5,6,6,7,7,8,8,9]).reshape((7,2)),columns=['A','B']) 

    A B 
0 2 3 
1 3 4 
2 4 5 
3 5 6 
4 6 7 
5 7 8 
6 8 9 

Ok here is the example with the LatLons 

gp = pd.DataFrame(np.array([[25.7,-87.7],[26.3,-88.6],[27.2,-89.2],[28.2,-89.6]]),columns=['Latitude','Longitude']) 

    Latitude Longitude 
0  25.7  -87.7 
1  26.3  -88.6 
2  27.2  -89.2 
3  28.2  -89.6 

# NOTE: midpoint(cords) used below is the great-circle version defined after the separator
x = gp[['Latitude','Longitude']] 
y = gp[['Latitude','Longitude']].shift(periods=-1) 
# pair each row with the next one; the index flags must be booleans, not the string "True"
foo = pd.merge(x, y, suffixes=('1','2'), left_index=True, right_index=True) 
#trim the last row as it has NaNs 
bar = foo[['Latitude1','Longitude1','Latitude2','Longitude2']][:-1] 
#calculate midpoint and stitch back to main data 
bar = bar.apply(midpoint, axis=1) 
fogazzi = np.vstack((gp[['Latitude','Longitude']].values,bar[['MidPointLatitude','MidPointLongitude']].values)) 
# DataFrame.sort() was removed from pandas; sort_values() is the current equivalent
gp = pd.DataFrame(fogazzi,columns =['Latitude','Longitude']).sort_values(by=['Latitude','Longitude']) 

    Latitude Longitude 
0 25.700000 -87.700000 
4 26.000696 -88.148851 
1 26.300000 -88.600000 
5 26.750316 -88.898812 
2 27.200000 -89.200000 
6 27.700144 -89.399084 
3 28.200000 -89.600000 

------------------------------------- 

import math 

def midpoint(cords): 
    # cords is one row: (lat1, lon1, lat2, lon2) in degrees 
    lat1, lon1, lat2, lon2 = cords 
    assert -90 <= lat1 <= 90 
    assert -90 <= lat2 <= 90 
    assert -180 <= lon1 <= 180 
    assert -180 <= lon2 <= 180 
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2)) 
    dlon = lon2 - lon1 
    dx = math.cos(lat2) * math.cos(dlon) 
    dy = math.cos(lat2) * math.sin(dlon) 
    lat3 = math.atan2(math.sin(lat1) + math.sin(lat2), math.sqrt((math.cos(lat1) + dx) * (math.cos(lat1) + dx) + dy * dy)) 
    lon3 = lon1 + math.atan2(dy, math.cos(lat1) + dx) 
    return pd.Series({'MidPointLatitude': math.degrees(lat3), 'MidPointLongitude': math.degrees(lon3)}) 
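
The sort by Latitude and Longitude in the stitching step above puts the midpoints in the right place only because this track's latitude increases steadily. Here is a minimal sketch of an order-preserving variant, assuming a default integer index. It repeats the midpoint formula so the block runs on its own, and the names `great_circle_midpoint` and `with_midpoints` are hypothetical, not part of the original post.

```python
import math

import pandas as pd


def great_circle_midpoint(lat1, lon1, lat2, lon2):
    """Midpoint of two points on a sphere, in degrees (same formula as above)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlon = lon2 - lon1
    dx = math.cos(lat2) * math.cos(dlon)
    dy = math.cos(lat2) * math.sin(dlon)
    lat3 = math.atan2(math.sin(lat1) + math.sin(lat2),
                      math.hypot(math.cos(lat1) + dx, dy))
    lon3 = lon1 + math.atan2(dy, math.cos(lat1) + dx)
    return math.degrees(lat3), math.degrees(lon3)


def with_midpoints(gp):
    """Insert a midpoint row between each pair of consecutive rows of gp."""
    # Pair every row with its successor; the last row has none, so it is skipped.
    pairs = zip(gp.iloc[:-1].itertuples(), gp.iloc[1:].itertuples())
    mids = [great_circle_midpoint(a.Latitude, a.Longitude, b.Latitude, b.Longitude)
            for a, b in pairs]
    # Half-step labels slot each midpoint between its two source rows.
    mid_df = pd.DataFrame(mids, columns=['Latitude', 'Longitude'],
                          index=gp.index[:-1] + 0.5)
    # Sorting by index label (not by value) keeps the original track order.
    return pd.concat([gp, mid_df]).sort_index().reset_index(drop=True)


gp = pd.DataFrame([[25.7, -87.7], [26.3, -88.6], [27.2, -89.2], [28.2, -89.6]],
                  columns=['Latitude', 'Longitude'])
result = with_midpoints(gp)
print(result)
```

Because the rows are ordered by index label rather than by coordinate value, this should also work for tracks that double back in latitude.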

Answers

0

You can use a merge like this:

In [54]: 

df = pd.DataFrame(np.arange(2,10).reshape((4,2)),columns=['A','B']) 
df 
Out[54]: 
    A B 
0 2 3 
1 4 5 
2 6 7 
3 8 9 

[4 rows x 2 columns] 
In [53]: 

(df + df.shift(periods=-1))/2 
Out[53]: 
    A B 
0 3 4 
1 5 6 
2 7 8 
3 NaN NaN 

[4 rows x 2 columns] 
In [59]: 

combined = df.merge((df + df.shift(periods=-1))/2, how='outer') 
combined.sort_values(by=['A'], inplace=True)  # DataFrame.sort() was removed from pandas 
In [60]: 

combined 
Out[60]: 
    A B 
0 2 3 
4 3 4 
1 4 5 
5 5 6 
2 6 7 
6 7 8 
3 8 9 
7 NaN NaN 

[8 rows x 2 columns] 
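
A possible variation on the merge above, sketched for a recent pandas version and assuming a default integer index: give each midpoint row a half-step index label, concatenate, and sort by index. This keeps the original row order even when column A is not increasing, and it drops the trailing NaN row. The helper name `insert_midpoints` is hypothetical.

```python
import numpy as np
import pandas as pd


def insert_midpoints(df):
    """Return df with the mean of each adjacent row pair inserted between them.

    Hypothetical helper, not from the original answer.
    """
    # Row i holds the midpoint of rows i and i+1; the last row is NaN, so drop it.
    mids = ((df + df.shift(-1)) / 2).iloc[:-1]
    # Half-step labels place each midpoint between its two source rows.
    mids.index = mids.index + 0.5
    # Sorting by index (not by value) preserves the original row order.
    return pd.concat([df, mids]).sort_index().reset_index(drop=True)


df = pd.DataFrame(np.arange(2, 10).reshape((4, 2)), columns=['A', 'B'])
result = insert_midpoints(df)
print(result)
```

Note that the concatenated columns become floats, because the midpoint rows are computed by division.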
0

Say we set up the index slightly differently, so the original rows sit at every other label:

df = pd.DataFrame(np.arange(2,10).reshape((4,2)), index=range(0, 8, 2), columns=['A','B']) 

Then:

res = pd.DataFrame(index=range(len(df) * 2 - 1)).join(df) 
res.interpolate() 
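
To make the effect of this approach concrete, here is a sketch that runs it end to end. Joining onto a denser index leaves all-NaN rows at the odd labels, and `interpolate()` (linear by default) fills each one with the average of its two neighbours. This only reproduces an arithmetic midpoint; it would not apply a custom function such as a great-circle midpoint. The variable name `filled` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Original rows sit at the even index labels 0, 2, 4, 6.
df = pd.DataFrame(np.arange(2, 10).reshape((4, 2)),
                  index=range(0, 8, 2), columns=['A', 'B'])

# An empty frame with labels 0..6; the left join leaves NaN rows at 1, 3, 5.
res = pd.DataFrame(index=range(len(df) * 2 - 1)).join(df)

# Linear interpolation fills each NaN row with the mean of its two neighbours.
filled = res.interpolate()
print(filled)
```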

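Thanks for answers (1) and (2); both work for my example. I will update the original post and add the actual latitude/longitude midpoint calculation. – Dickster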