2017-08-10

Here is my data set, grouped by two variables, which I want to plot:

dropoff_latitude dropoff_longitude 
(40.6, 40.65]  (-74.03, -73.98]  1364 
        (-73.98, -73.93]  2123 
        (-73.93, -73.88]  368 
        (-73.88, -73.83]   20 
        (-73.83, -73.78]  9564 
(40.65, 40.7]  (-74.03, -73.98]  18629 
        (-73.98, -73.93]  22453 
        (-73.93, -73.88]  4343 
        (-73.88, -73.83]  1027 
        (-73.83, -73.78]  2170 
(40.7, 40.75]  (-74.03, -73.98]  443893 
        (-73.98, -73.93]  84331 
        (-73.93, -73.88]  9658 
        (-73.88, -73.83]  4700 
        (-73.83, -73.78]  1756 
(40.75, 40.8]  (-74.03, -73.98]  249840 
        (-73.98, -73.93]  486286 
        (-73.93, -73.88]  15424 
        (-73.88, -73.83]  18957 
        (-73.83, -73.78]  911 
(40.8, 40.85]  (-74.03, -73.98]   34 
        (-73.98, -73.93]  49718 
        (-73.93, -73.88]  4283 
        (-73.88, -73.83]  1070 
        (-73.83, -73.78]  218 
(40.85, 40.9]  (-74.03, -73.98]   52 
        (-73.98, -73.93]  2295 
        (-73.93, -73.88]  4427 
        (-73.88, -73.83]  1020 
        (-73.83, -73.78]  132 

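Data visualization is definitely not my strong suit, and I'm struggling to come up with a way to plot this properly. To give you an idea of what I'm going for: I want a grid broken up as in the table above, with each section of the grid shaded according to its volume (count).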

I tried playing with seaborn's heatmap method, but had no luck. Do I need to reformat my data?

Answer


It may be easier if you use latitude and longitude as the data frame's index and column names, respectively.

import numpy as np 
import pandas as pd 
import seaborn as sns 

# sample data 
dropoff_latitude = ["(40.6, 40.65]", "(40.65, 40.7]", "(40.7, 40.75]", 
        "(40.75, 40.8]", "(40.8, 40.85]", "(40.85, 40.9]"] 

dropoff_longitude = ["(-74.03, -73.98]", "(-73.98, -73.93]", "(-73.93, -73.88]", 
        "(-73.88, -73.83]", "(-73.83, -73.78]"] 

values = np.array([1364, 2123, 368, 20, 9564, 18629, 22453, 
        4343, 1027, 2170, 443893, 84331, 9658, 4700, 
        1756, 249840, 486286, 15424, 18957, 911, 34, 
        49718, 4283, 1070, 218, 53, 2295, 4427, 1020, 132]) 
values = values.reshape(6,5) 

df = pd.DataFrame(values, index=dropoff_latitude, columns=dropoff_longitude) 

print(df) 
       (-74.03, -73.98] (-73.98, -73.93] (-73.93, -73.88] \ 
(40.6, 40.65]    1364    2123    368 
(40.65, 40.7]    18629    22453    4343 
(40.7, 40.75]   443893    84331    9658 
(40.75, 40.8]   249840   486286    15424 
(40.8, 40.85]    34    49718    4283 
(40.85, 40.9]    53    2295    4427 

       (-73.88, -73.83] (-73.83, -73.78] 
(40.6, 40.65]    20    9564 
(40.65, 40.7]    1027    2170 
(40.7, 40.75]    4700    1756 
(40.75, 40.8]    18957    911 
(40.8, 40.85]    1070    218 
(40.85, 40.9]    1020    132 

Now you can use Seaborn's heatmap():

sns.heatmap(df) 

[heatmap image]
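If the plain heatmap is hard to read, one option (an addition, not part of the original answer) is to print each cell's count inside it and label the axes. This is a minimal sketch assuming the same sample `df` as above, integer counts (required by `fmt="d"`), and a headless Matplotlib backend; the `viridis` colormap and figure size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

dropoff_latitude = ["(40.6, 40.65]", "(40.65, 40.7]", "(40.7, 40.75]",
                    "(40.75, 40.8]", "(40.8, 40.85]", "(40.85, 40.9]"]
dropoff_longitude = ["(-74.03, -73.98]", "(-73.98, -73.93]", "(-73.93, -73.88]",
                     "(-73.88, -73.83]", "(-73.83, -73.78]"]
values = np.array([1364, 2123, 368, 20, 9564, 18629, 22453,
                   4343, 1027, 2170, 443893, 84331, 9658, 4700,
                   1756, 249840, 486286, 15424, 18957, 911, 34,
                   49718, 4283, 1070, 218, 53, 2295, 4427, 1020, 132]).reshape(6, 5)
df = pd.DataFrame(values, index=dropoff_latitude, columns=dropoff_longitude)

fig, ax = plt.subplots(figsize=(9, 5))
# annot=True writes each cell's value into the cell; fmt="d" formats it as an integer
sns.heatmap(df, annot=True, fmt="d", cmap="viridis", ax=ax)
ax.set_xlabel("dropoff_longitude")
ax.set_ylabel("dropoff_latitude")
fig.tight_layout()
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()`, or save the figure with `fig.savefig(...)`.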

UPDATE (per comments):

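It's possible to get from your current organization to the one I'm recommending. First, let's replicate the sample MultiIndex data frame you provided, using the variables defined above: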

lat_lon = [(lat, lon) for lat in dropoff_latitude for lon in dropoff_longitude] 
lat, lon = zip(*lat_lon) 

# values was reshaped to 6x5 above; flatten it back to one entry per (lat, lon) pair
data = {'dropoff_latitude':lat, 
     'dropoff_longitude':lon, 
     'values':values.ravel()} 
df2 = pd.DataFrame(data).set_index(['dropoff_latitude','dropoff_longitude']) 

df2 is now the same as the OP's data frame:

        values 
dropoff_latitude dropoff_longitude   
(40.6, 40.65] (-74.03, -73.98]  1364 
       (-73.98, -73.93]  2123 
       (-73.93, -73.88]  368 
       (-73.88, -73.83]  20 
       (-73.83, -73.78]  9564 
(40.65, 40.7] (-74.03, -73.98] 18629 
       (-73.98, -73.93] 22453 
       (-73.93, -73.88]  4343 
       (-73.88, -73.83]  1027 
       (-73.83, -73.78]  2170 
(40.7, 40.75] (-74.03, -73.98] 443893 
       (-73.98, -73.93] 84331 
       (-73.93, -73.88]  9658 
       (-73.88, -73.83]  4700 
       (-73.83, -73.78]  1756 
(40.75, 40.8] (-74.03, -73.98] 249840 
       (-73.98, -73.93] 486286 
       (-73.93, -73.88] 15424 
       (-73.88, -73.83] 18957 
       (-73.83, -73.78]  911 
(40.8, 40.85] (-74.03, -73.98]  34 
       (-73.98, -73.93] 49718 
       (-73.93, -73.88]  4283 
       (-73.88, -73.83]  1070 
       (-73.83, -73.78]  218 
(40.85, 40.9] (-74.03, -73.98]  53 
       (-73.98, -73.93]  2295 
       (-73.93, -73.88]  4427 
       (-73.88, -73.83]  1020 
       (-73.83, -73.78]  132 
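A shorter way to build the same two-level frame is `pd.MultiIndex.from_product`, which pairs every latitude bin with every longitude bin with latitude varying slowest, the same order as the list comprehension and as `values.ravel()`. This sketch assumes the sample lists and 6x5 `values` array from the first code block, repeated here so it runs on its own.

```python
import numpy as np
import pandas as pd

dropoff_latitude = ["(40.6, 40.65]", "(40.65, 40.7]", "(40.7, 40.75]",
                    "(40.75, 40.8]", "(40.8, 40.85]", "(40.85, 40.9]"]
dropoff_longitude = ["(-74.03, -73.98]", "(-73.98, -73.93]", "(-73.93, -73.88]",
                     "(-73.88, -73.83]", "(-73.83, -73.78]"]
values = np.array([1364, 2123, 368, 20, 9564, 18629, 22453,
                   4343, 1027, 2170, 443893, 84331, 9658, 4700,
                   1756, 249840, 486286, 15424, 18957, 911, 34,
                   49718, 4283, 1070, 218, 53, 2295, 4427, 1020, 132]).reshape(6, 5)

# Cartesian product of the two bin lists, named like the OP's index levels
idx = pd.MultiIndex.from_product(
    [dropoff_latitude, dropoff_longitude],
    names=["dropoff_latitude", "dropoff_longitude"],
)
# ravel() flattens row by row, matching the product order above
df2 = pd.DataFrame({"values": values.ravel()}, index=idx)
print(df2.head())
```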

Next, reset the index back into columns, and pivot the longitude data from row entries into column names:

# plot_df is now in the same form as df in my original answer. 
# values='values' keeps the columns a flat index of longitude bins
plot_df = (df2.reset_index() 
       .pivot(index='dropoff_latitude', columns='dropoff_longitude', values='values')) 
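The `reset_index()` + `pivot()` round trip can also be written as a single `unstack()`, which moves the inner index level (longitude) into the columns. This sketch rebuilds the sample data as a `values` Series (repeated here so it runs on its own) and checks that both routes give the same frame; it is an alternative spelling, not the method used in the answer.

```python
import numpy as np
import pandas as pd

dropoff_latitude = ["(40.6, 40.65]", "(40.65, 40.7]", "(40.7, 40.75]",
                    "(40.75, 40.8]", "(40.8, 40.85]", "(40.85, 40.9]"]
dropoff_longitude = ["(-74.03, -73.98]", "(-73.98, -73.93]", "(-73.93, -73.88]",
                     "(-73.88, -73.83]", "(-73.83, -73.78]"]
values = np.array([1364, 2123, 368, 20, 9564, 18629, 22453,
                   4343, 1027, 2170, 443893, 84331, 9658, 4700,
                   1756, 249840, 486286, 15424, 18957, 911, 34,
                   49718, 4283, 1070, 218, 53, 2295, 4427, 1020, 132]).reshape(6, 5)
idx = pd.MultiIndex.from_product([dropoff_latitude, dropoff_longitude],
                                 names=["dropoff_latitude", "dropoff_longitude"])
series = pd.Series(values.ravel(), index=idx, name="values")

# one step: turn the longitude level into columns
plot_df = series.unstack("dropoff_longitude")

# two steps, as in the answer: index -> columns, then pivot
via_pivot = series.reset_index().pivot(index="dropoff_latitude",
                                       columns="dropoff_longitude",
                                       values="values")
```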

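From here, sns.heatmap(plot_df) produces the desired heatmap: the same as the one shown above, except that the x-axis is now sorted from smallest to largest value.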
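Going straight from raw coordinates, the binning with `pd.cut` and the reshape can be combined. This is a sketch assuming a hypothetical DataFrame `trips` with `dropoff_latitude`/`dropoff_longitude` columns (the four points are made up for illustration) and bin edges matching the table. It uses `.size()` rather than `.count()`, since `.count()` counts non-null values separately for each remaining column; `observed=False` keeps empty bins so the grid is complete.

```python
import numpy as np
import pandas as pd

# hypothetical raw trips; in practice this would be the full taxi data
trips = pd.DataFrame({
    "dropoff_latitude": [40.72, 40.73, 40.78, 40.61],
    "dropoff_longitude": [-74.00, -74.01, -73.95, -73.80],
})

# bin edges 0.05 degrees apart, matching the intervals in the question
lat_edges = np.linspace(40.60, 40.90, 7)
lon_edges = np.linspace(-74.03, -73.78, 6)

lat_bins = pd.cut(trips["dropoff_latitude"], lat_edges)
lon_bins = pd.cut(trips["dropoff_longitude"], lon_edges)

# count trips per (lat bin, lon bin); observed=False keeps zero-count bins
counts = (trips.groupby([lat_bins, lon_bins], observed=False)
               .size()
               .unstack("dropoff_longitude"))
print(counts)
```

`counts` can then be passed directly to `sns.heatmap(counts)`. Because the row and column labels are ordered interval categories rather than strings, they follow the numeric order of the bins.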


Do you know how I could translate my format into the one you're suggesting? I used 'df.groupby([pd.cut(df.dropoff_latitude, np.arange(40.60, 41.0, 0.05)), pd.cut(df.dropoff_longitude, np.arange(xlim[0], xlim[1], 0.05))]).count()' to produce the format above. – madsthaks


See my updated answer; it should get you from your current form to the best setup for the heatmap. –


Awesome, worked perfectly. – madsthaks