Creating pivot tables in SQL like in SPSS

I have a lot of data in PostgreSQL, but I need to build some pivot tables the way SPSS does. For example, I have a table of cities and states.

create table cities 
(
    city integer, 
    state integer 
); 
insert into cities(city,state) values (1,1); 
insert into cities(city,state) values (2,2); 
insert into cities(city,state) values (3,1); 
insert into cities(city,state) values (4,1);

So this table has 4 cities and 2 states. The pivot table I want looks like this:

city\state | state-1 | state-2 |
city1      | 33%     | 0%      |
city2      | 0%      | 100%    |
city3      | 33%     | 0%      |
city4      | 33%     | 0%      |
totalCount | 3       | 1       |

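I understand how to do this in SQL for this particular case. What I want, though, is to cross one variable with another (just count the distinct values and divide by "count(*) where variable_in_column_names = 1" and so on) using a stored function. I am looking at plpython. Some of my questions are: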

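How can I return a record set whose shape matches the number and types of the output columns without using a temporary table?
Is there perhaps an existing, workable solution?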

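As I see it, the input would be the table name, the column name of the first variable, and the column name of the second variable. The function body would run many count(*) queries (looping over each distinct value of the variables and counting them, and so on), then return a table with percentages.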

A single query returns about 10K rows, so perhaps the best way to do this is in plain Python rather than plpython?
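For about 10K rows, the counting loop described above fits comfortably in plain Python. The following is only a minimal sketch of that idea, not a tested solution for the real data: the function name `column_percentages` and the list-of-dicts input are assumptions made for illustration, and the rows would normally come from a database cursor.

```python
from collections import Counter


def column_percentages(rows, row_key, col_key):
    """Cross-tabulate rows (dicts) and return per-column percentages and totals.

    row_key and col_key are the names of the two variables to cross.
    """
    # Count how often each (row value, column value) pair occurs.
    cell_counts = Counter((r[row_key], r[col_key]) for r in rows)
    # Count how often each column value occurs (the column totals).
    col_totals = Counter(r[col_key] for r in rows)

    row_values = sorted({r[row_key] for r in rows})
    col_values = sorted(col_totals)

    # Each cell is its count as a percentage of its column total.
    table = {
        rv: {cv: 100.0 * cell_counts[(rv, cv)] / col_totals[cv] for cv in col_values}
        for rv in row_values
    }
    return table, dict(col_totals)


# Sample data matching the cities table from the question.
rows = [
    {"city": 1, "state": 1},
    {"city": 2, "state": 2},
    {"city": 3, "state": 1},
    {"city": 4, "state": 1},
]
table, totals = column_percentages(rows, "city", "state")
for city, cells in table.items():
    print("city%d" % city, {state: "%.0f%%" % pct for state, pct in cells.items()})
print("totalCount", totals)
```

Because the result is an ordinary nested dict, the number of output columns does not have to be known in advance, which sidesteps the fixed-shape problem of a stored function.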

Source

2012-12-10 norecces

Check out the 'crosstab' function in the 'tablefunc' module: http://www.postgresql.org/docs/current/static/tablefunc.html –

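I've looked at crosstab before, but it isn't a complete solution; it only simplifies the input. I can't add totals in crosstab or attach labels to the variables. So I think the function would return a table the way crosstab does, but I would still have to do a lot of calculations (totals, percentages, etc.). – norecces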

You might want to give pandas a try; it is an excellent Python data analysis library.

To query the PostgreSQL database:

import psycopg2 
import pandas as pd 

conn_string = "host='localhost' dbname='mydb' user='postgres' password='password'" 
conn = psycopg2.connect(conn_string) 
# Load the whole cities table into a DataFrame
df = pd.read_sql_query('select * from cities', con=conn)

where df is a DataFrame like this:

   city  state
0     1      1
1     2      2
2     3      1
3     4      1

You can then build the pivot table with pivot_table and divide by the totals to get the percentages:

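# Number of cities per state (the column totals)
totals = df.groupby('state').size() 
# Count rows per (city, state) cell, then divide each column by its state total
pivot = pd.pivot_table(df, index='city', columns='state', aggfunc='size', fill_value=0)/totals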

which gives you the result:

state         1  2
city
1      0.333333  0
2      0.000000  1
3      0.333333  0
4      0.333333  0
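As an alternative to counting and dividing in two steps, pandas can normalize a cross-tabulation by column in a single call. This is a sketch under the assumption that the data is already in a DataFrame with `city` and `state` columns; the sample frame below is built by hand rather than read from the database.

```python
import pandas as pd

# Hand-built stand-in for the cities table from the question.
df = pd.DataFrame({"city": [1, 2, 3, 4], "state": [1, 2, 1, 1]})

# normalize="columns" divides every cell by its column (state) total.
shares = pd.crosstab(df["city"], df["state"], normalize="columns")
print(shares)
```

The result holds the same per-state fractions as the pivot above, with cities as the index and states as the columns.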

Finally, to get the layout you want, you just need to rename the index and columns and append the totals:

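# Turn the totals Series into a one-row frame labelled 'totalCount'
totals_frame = pd.DataFrame(totals).T 
totals_frame.index = ['totalCount'] 

pivot.index = ['city%i' % item for item in pivot.index] 
# Stack the totals row underneath the percentage rows
final_result = pd.concat([pivot, totals_frame]) 
final_result.columns = ['state-%i' % item for item in final_result.columns]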

which gives you:

             state-1  state-2
city1       0.333333        0
city2       0.000000        1
city3       0.333333        0
city4       0.333333        0
totalCount  3.000000        1
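The table above holds fractions, while the layout in the question shows whole percentages and integer totals. One possible way to close that gap is to format the share rows and the totals row separately. The sketch below rebuilds the frame by hand so that it runs on its own; the variable name `formatted` is an assumption for illustration.

```python
import pandas as pd

# Hand-built copy of the final table: four share rows plus the totals row.
final_result = pd.DataFrame(
    {
        "state-1": [1 / 3, 0.0, 1 / 3, 1 / 3, 3.0],
        "state-2": [0.0, 1.0, 0.0, 0.0, 1.0],
    },
    index=["city1", "city2", "city3", "city4", "totalCount"],
)

is_total = final_result.index == "totalCount"

# Shares become strings like "33%"; the totals row becomes plain integers.
share_rows = final_result[~is_total].apply(lambda col: col.map("{:.0%}".format))
total_rows = final_result[is_total].apply(lambda col: col.map("{:.0f}".format))

formatted = pd.concat([share_rows, total_rows])
print(formatted)
```

The formatted frame contains strings, so it is suited to display or export only; any further arithmetic should be done on the numeric frame first.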

Source

2012-12-11 21:52:29

Thanks! pandas works for me. The job is nearly done. – norecces

Check out PostgreSQL window functions. They might give you a non-(pl)python solution. http://blog.hashrocket.com/posts/sql-window-functions
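To illustrate the window-function idea, the sketch below computes each city's share of its state with SUM(...) OVER (PARTITION BY state). It makes two assumptions: Python's built-in sqlite3 module stands in for PostgreSQL so the example runs without a server (window functions need SQLite 3.25 or newer), and the query is meant to be portable rather than tuned for PostgreSQL. The result is in long format, one row per city and state pair, so it would still need pivoting to reach the layout in the question.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city INTEGER, state INTEGER)")
conn.executemany(
    "INSERT INTO cities (city, state) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 1), (4, 1)],
)

# Inner query counts rows per (city, state); the window function then
# divides each count by the total for its state.
query = """
SELECT city, state, n,
       100.0 * n / SUM(n) OVER (PARTITION BY state) AS pct
FROM (
    SELECT city, state, COUNT(*) AS n
    FROM cities
    GROUP BY city, state
) AS counts
ORDER BY city
"""
rows = conn.execute(query).fetchall()
for city, state, n, pct in rows:
    print("city%d state-%d: %.0f%%" % (city, state, pct))
```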

Source

2012-12-15 23:21:40 Carlos
