2017-07-12 24 views

Tuples from a pandas DataFrame (for the haversine module)

I have a pandas DataFrame my_df with the following columns:

id lat1 lon1 lat2 lon2 
1 45 0 41 3 
2 40 1 42 4 
3 42 2 37 1 

Basically, I want to do the following:

import haversine 

haversine.haversine((45, 0), (41, 3)) # just to show syntax of haversine() 
> 507.20410687342115 

# what I'd like to do 
my_df["dist"] = haversine.haversine((my_df["lat1"], my_df["lon1"]),(my_df["lat2"], my_df["lon2"])) 

TypeError: cannot convert the series to <class 'float'>
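The TypeError comes from the fact that haversine.haversine works on one pair of scalar coordinates at a time. As a rough illustration, here is a minimal pure-Python stand-in; scalar_haversine is a hypothetical name, and the assumption (suggested by the error message, not confirmed from the package source here) is that the library converts each coordinate with math-module functions such as math.radians, which accept only real numbers. It uses a mean Earth radius of 6371 km, which reproduces the 507.204... km figure above.

```python
from math import asin, cos, radians, sin, sqrt


def scalar_haversine(point1, point2, earth_radius=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Hypothetical stand-in for haversine.haversine: each point must be one
    (lat, lon) pair of real numbers, not columns of values.
    """
    lat1, lon1 = point1
    lat2, lon2 = point2
    # math.radians only accepts real numbers; a list (or a multi-row Series) fails here
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * earth_radius * asin(sqrt(a))


# Scalars work: one call, one distance (about 507.204 km for the first row)
print(scalar_haversine((45, 0), (41, 3)))

# Whole columns do not: each "coordinate" in the pair is now a list
try:
    scalar_haversine(([45, 40], [0, 1]), ([41, 42], [3, 4]))
except TypeError as exc:
    print("TypeError:", exc)
```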

Using this, I tried the following:

my_df['dist'] = haversine.haversine(
     list(zip(*[my_df[['lat1','lon1']][c].values.tolist() for c in my_df[['lat1','lon1']]])) 
     , 
     list(zip(*[my_df[['lat2','lon2']][c].values.tolist() for c in my_df[['lat2','lon2']]])) 
     ) 

File "blabla\lib\site-packages\haversine\__init__.py", line 20, in haversine
    lat1, lng1 = point1

ValueError: too many values to unpack (expected 2)

Any ideas what I'm doing wrong, or how I can achieve what I want?
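The ValueError has a related cause: each argument is now a list of three (lat, lon) tuples, so unpacking it into two names (lat1, lng1 = point1) fails. One way around it, sketched below, is to zip the two lists of points and call the function once per pair. To keep the sketch runnable without the package, scalar_haversine is a hypothetical math-module stand-in for haversine.haversine (6371 km mean radius assumed); with the real package you would call haversine.haversine(p1, p2) instead.

```python
from math import asin, cos, radians, sin, sqrt

import pandas as pd


def scalar_haversine(point1, point2, earth_radius=6371.0):
    """Hypothetical stand-in for haversine.haversine: one (lat, lon) pair per point."""
    lat1, lon1 = point1
    lat2, lon2 = point2
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * earth_radius * asin(sqrt(a))


my_df = pd.DataFrame({
    "id": [1, 2, 3],
    "lat1": [45, 40, 42],
    "lon1": [0, 1, 2],
    "lat2": [41, 42, 37],
    "lon2": [3, 4, 1],
})

# One (lat, lon) tuple per row for each endpoint
points1 = list(zip(my_df["lat1"], my_df["lon1"]))
points2 = list(zip(my_df["lat2"], my_df["lon2"]))

# Passing the whole lists reproduces the error: three tuples cannot unpack into two names
try:
    scalar_haversine(points1, points2)
except ValueError as exc:
    print("ValueError:", exc)

# Pair the points row by row and call the scalar function once per pair
my_df["dist"] = [scalar_haversine(p1, p2) for p1, p2 in zip(points1, points2)]
print(my_df)
```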


Possible dupe: https://stackoverflow.com/questions/25767596/vectorised-haversine-formula-with-a-pandas-dataframe – EdChum

Answers


Use apply with axis=1:

my_df["dist"] = my_df.apply(lambda row : haversine.haversine((row["lat1"], row["lon1"]),(row["lat2"], row["lon2"])), axis=1) 

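This calls the haversine function on each row. The function expects scalar values, not array-like values, hence the error. Calling apply with axis=1 iterates row by row, so each column value can be accessed and passed in the form the function expects.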
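A self-contained version of this answer might look like the sketch below. So that it runs without the haversine package installed, scalar_haversine is a hypothetical math-module stand-in for haversine.haversine (same formula, 6371 km mean radius assumed); the apply call itself is unchanged.

```python
from math import asin, cos, radians, sin, sqrt

import pandas as pd


def scalar_haversine(point1, point2, earth_radius=6371.0):
    """Hypothetical stand-in for haversine.haversine: one (lat, lon) pair per point."""
    lat1, lon1 = point1
    lat2, lon2 = point2
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * earth_radius * asin(sqrt(a))


my_df = pd.DataFrame({
    "id": [1, 2, 3],
    "lat1": [45, 40, 42],
    "lon1": [0, 1, 2],
    "lat2": [41, 42, 37],
    "lon2": [3, 4, 1],
})

# With axis=1, apply passes each row as a Series indexed by column name,
# so row["lat1"] and the others are scalars the function can handle
my_df["dist"] = my_df.apply(
    lambda row: scalar_haversine((row["lat1"], row["lon1"]), (row["lat2"], row["lon2"])),
    axis=1,
)
print(my_df)
```

Keep in mind that apply with axis=1 still invokes a Python function once per row, so on large frames a vectorized formula is usually much faster.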

Also, I'm not sure what the difference between the two is, but there is a vectorised version of the haversine formula.


How about using a vectorized approach:

import numpy as np
import pandas as pd

# vectorized haversine function
def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    slightly modified version of http://stackoverflow.com/a/29546836/2901002

    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians)

    All (lat, lon) coordinates must have numeric dtypes and be of equal length.

    """
    # numpy is used directly: pd.np is deprecated and was removed in pandas 2.0
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2

    return earth_radius * 2 * np.arcsin(np.sqrt(a))

Demo:

In [38]: df 
Out[38]: 
    id lat1 lon1 lat2 lon2 
0 1 45  0 41  3 
1 2 40  1 42  4 
2 3 42  2 37  1 

In [39]: df['dist'] = haversine(df.lat1, df.lon1, df.lat2, df.lon2) 

In [40]: df 
Out[40]: 
    id lat1 lon1 lat2 lon2  dist 
0 1 45  0 41  3 507.204107 
1 2 40  1 42  4 335.876312 
2 3 42  2 37  1 562.543582 
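As a quick sanity check, the vectorized function can be compared against a plain row-by-row calculation with the math module; np.allclose should report agreement. The sketch repeats the vectorized function (importing numpy directly) so it runs on its own, and also exercises the to_radians=False path by converting to radians up front. rowwise_haversine is a hypothetical helper introduced only for this comparison.

```python
import math

import numpy as np
import pandas as pd


def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """Vectorized great-circle distance, same formula as the answer above."""
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = np.sin((lat2 - lat1) / 2.0) ** 2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    return earth_radius * 2 * np.arcsin(np.sqrt(a))


def rowwise_haversine(lat1, lon1, lat2, lon2, earth_radius=6371):
    """Hypothetical scalar reference using the math module, one row at a time."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = math.sin((lat2 - lat1) / 2) ** 2 + \
        math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return earth_radius * 2 * math.asin(math.sqrt(a))


df = pd.DataFrame({
    "id": [1, 2, 3],
    "lat1": [45, 40, 42],
    "lon1": [0, 1, 2],
    "lat2": [41, 42, 37],
    "lon2": [3, 4, 1],
})
cols = ["lat1", "lon1", "lat2", "lon2"]

# Whole columns in, one array of distances out
vectorized = haversine(df.lat1, df.lon1, df.lat2, df.lon2)

# Same inputs converted to radians up front; to_radians=False skips the conversion
pre_converted = haversine(*np.radians(df[cols].to_numpy().T), to_radians=False)

# Plain Python loop over the rows as a reference
reference = [rowwise_haversine(*row) for row in df[cols].itertuples(index=False)]

print(np.allclose(vectorized, reference), np.allclose(pre_converted, reference))
```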

AttributeError: 'numpy.float64' object has no attribute 'radians' :(


@fmalaussena, make sure you haven't overwritten np (the alias for numpy) with a float64 variable of the same name. If you don't use the "classic" numpy alias np, you can use numpy.radians or pd.np.radians instead. – MaxU
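The pitfall described in this comment can be reproduced in a few lines: once the name np is rebound to a numpy.float64 value, np.radians looks up an attribute on that float instead of on the numpy module. The function name shadowed_np_error is hypothetical; only the numpy calls are real.

```python
import numpy


def shadowed_np_error():
    """Rebind the name np to a float64, as an accidental assignment would."""
    np = numpy.float64(45.0)
    try:
        np.radians(30.0)  # attribute lookup on the float, not on the module
    except AttributeError as exc:
        return str(exc)
    return None


print(shadowed_np_error())

# Referring to the module by its full name sidesteps the clash
print(numpy.radians(180.0))
```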