2016-12-15

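PostgreSQL: Enforcing row order in a table

I am using PostgreSQL 9.6.1 on a Windows 7 laptop to compile and analyze large datasets from disparate sources. One of my customers noticed that, in a final report I provided to them, some people from her state were being mixed in with other states.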

For this report, I create the final table:

CREATE UNLOGGED TABLE LPIS_IssuanceDetail (
    ID SERIAL PRIMARY KEY, 
    Zone TEXT DEFAULT NULL, 
    State TEXT DEFAULT NULL, 
    LastName TEXT DEFAULT NULL, 
    FirstName TEXT DEFAULT NULL, 
    Email TEXT DEFAULT NULL, 
    UPN TEXT DEFAULT NULL, 
    LincPassUsed TEXT DEFAULT NULL, 
    EmployeeID TEXT DEFAULT NULL, 
    EmploymentType TEXT DEFAULT NULL, 
    NonEmployeeCategory TEXT DEFAULT NULL, 
    EmploymentStatus TEXT DEFAULT NULL, 
    ISAComplete TEXT DEFAULT NULL, 
    ISACompletionDate TIMESTAMP WITHOUT TIME ZONE, 
    LincPassStatus TEXT DEFAULT NULL, 
    ERO TEXT DEFAULT NULL, 
    Sponsored TEXT DEFAULT NULL, 
    Enrolled TEXT DEFAULT NULL, 
    Adjudicated TEXT DEFAULT NULL, 
    ShipToSite TEXT DEFAULT NULL, 
    ValidSite TEXT DEFAULT NULL, 
    CardExpiration DATE, 
    CertExpiration DATE, 
    LastEnrollment DATE, 
    EnrollmentExpiration DATE, 
    NewEnrollment TEXT DEFAULT NULL, 
    Sponsor TEXT DEFAULT NULL, 
    ContractEnd DATE, 
    ContractID TEXT DEFAULT NULL, 
    ContractPOC TEXT DEFAULT NULL 
); 

I then populate this table with data from the master data table:

INSERT INTO LPIS_IssuanceDetail (
    Zone, State, LastName, FirstName, Email, UPN, LincPassUsed, EmployeeID, 
    EmploymentType, NonEmployeeCategory, EmploymentStatus, ISAComplete, 
    ISACompletionDate, LincPassStatus, ERO, Sponsored, Enrolled, Adjudicated, 
    ShipToSite, ValidSite, CertExpiration, LastEnrollment, EnrollmentExpiration, 
    CardExpiration, NewEnrollment, Sponsor, ContractEnd, ContractID, ContractPOC 
) 
SELECT 
    Zone, StateName, MAS_LastName, MAS_FirstName, MAS_Email, MAS_UPN, 
    LincPassUsed, MAS_EmployeeID, MAS_Category, MAS_OrgRelType, 
    MAS_EmploymentStatus, ISAComplete, ISA_CompletionDate, MAS_IssuanceStatus, 
    MAS_FedEmerResponse, Sponsored, Enrolled, Adjudicated, MAS_ShipToCityState, 
    MAS_ValidShipToSite, MAS_CertExpireDate, MAS_LastEnrollmentDate, MAS_EnrollExpireDate, 
    MAS_CardExpireDate, MAS_NewEnrollment, MAS_Sponsor, MAS_PeriodofPerformanceEndDate, 
    MAS_ContractID, MAS_ContractPOC 
FROM LPIS_MasterData 
ORDER BY Zone, StateName, MAS_LastName, MAS_FirstName; 

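Sure enough, when I scroll down through this table, I find single records interspersed out of sequence, as in this sample, where one record from Maine is out of place: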

id  | zone | state | lastname | firstname 
11849 | 3 | Georgia | Roberts | George 
11850 | 3 | Georgia | Smith | Dan 
11922 | 3 | Maine | Edwards | John 
11851 | 3 | Georgia | Snowden | Ed 
11852 | 3 | Georgia | Williams | Casey 

As a test, I dumped just the first four columns into a separate table, like so:

CREATE UNLOGGED TABLE LPIS_DetailTest (
    ID SERIAL PRIMARY KEY, 
    Zone TEXT DEFAULT NULL, 
    State TEXT DEFAULT NULL, 
    LastName TEXT DEFAULT NULL, 
    FirstName TEXT DEFAULT NULL 
); 

INSERT INTO LPIS_DetailTest (
    Zone, State, LastName, FirstName 
) 
SELECT 
    Zone, State, LastName, FirstName 
    FROM LPIS_IssuanceDetail 
    ORDER BY Zone, State, LastName, FirstName; 

And all of the rows are in the expected order:

id  | zone | state | lastname | firstname 
11849 | 3 | Georgia | Roberts | George 
11850 | 3 | Georgia | Smith | Dan 
11851 | 3 | Georgia | Snowden | Ed 
11852 | 3 | Georgia | Williams | Casey 
11853 | 3 | Georgia | Spaid | Dennis 

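Why would this smaller table come out correctly ordered, using the exact same ORDER BY clause as the larger table, in which some rows are out of order?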

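The database and all tables are set to UTF8.

I have looked at everything I can think of and have no idea why the ORDER BY clause produces such strange results. What else can I check?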

Edit: Additional information

In my script, immediately after the INSERT INTO ... SELECT ... statement, I dump the data to a CSV file using COPY, like so:

-- Export data to CSV files 
COPY LPIS_IssuanceDetail (
    Zone, State, LastName, FirstName, Email, UPN, LincPassUsed, EmployeeID, 
    EmploymentType, NonEmployeeCategory, EmploymentStatus, ISAComplete, 
    ISACompletionDate, LincPassStatus, ERO, Sponsored, Enrolled, Adjudicated, 
    ShipToSite, ValidSite, CertExpiration, LastEnrollment, EnrollmentExpiration, 
    CardExpiration, NewEnrollment, Sponsor, ContractEnd, ContractID, ContractPOC 
) 
TO 'C:/Users/Michael.Sheaver/Documents/LincPass/Datasets/Compiled Reports/LPIS_IssuanceDetail.csv' 
WITH (
    FORMAT CSV, 
    DELIMITER ',', 
    NULL '', 
    HEADER TRUE, 
    QUOTE '"', 
    ENCODING 'UTF8' 
); 

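I then import this CSV file into a spreadsheet for final presentation, where I have to manually sort the data on the ID column and then delete that column.

New question: Is there any option I can use in the INSERT INTO statement that will strictly preserve the row order specified in the ORDER BY clause?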


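"*I scroll down through this table*" - and how was the result that you "scroll down" through produced? If that select has no `order by`, the order of the rows is undefined. Just because you inserted the rows in a particular order does not mean that a `select` will return them in that order. The ***only*** (really: the **only**) way to get a consistent order is to use an `order by` when selecting the rows. The `order by` you used in the source query of the insert statement is essentially useless.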
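
A minimal sketch of what this comment describes, using the LPIS_DetailTest table from the question: the ordering is requested in the query that reads the rows, not in the one that inserted them. The trailing ID tiebreaker is an assumption that is not in the original question; it is there only so that rows with identical names come back in a repeatable order.

```sql
-- Row order is defined only by the ORDER BY of the query that reads the table;
-- without it, PostgreSQL may return the rows in any order.
SELECT ID, Zone, State, LastName, FirstName
FROM LPIS_DetailTest
ORDER BY Zone, State, LastName, FirstName,
         ID;  -- tiebreaker (assumption): keeps duplicate names in a stable order
```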


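@a_horse_with_no_name, I suspected as much, and it leaves me in a bit of a quandary. Immediately after the INSERT INTO ... SELECT ... statement, I use COPY ... TO ... to dump the processed dataset to a CSV file, and the syntax for COPY does not support ORDER BY.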

Answers


If you want sorted data in the CSV file, use copy with a select statement:

COPY (select Zone, State, LastName, FirstName, Email, UPN, LincPassUsed, EmployeeID, 
    EmploymentType, NonEmployeeCategory, EmploymentStatus, ISAComplete, 
    ISACompletionDate, LincPassStatus, ERO, Sponsored, Enrolled, Adjudicated, 
    ShipToSite, ValidSite, CertExpiration, LastEnrollment, EnrollmentExpiration, 
    CardExpiration, NewEnrollment, Sponsor, ContractEnd, ContractID, ContractPOC 
    from LPIS_IssuanceDetail 
    ORDER BY Zone, State, LastName, FirstName 
) 
TO 'C:/Users/Michael.Sheaver/Documents/LincPass/Datasets/Compiled Reports/LPIS_IssuanceDetail.csv' 
WITH (FORMAT CSV, DELIMITER ',', NULL '', HEADER TRUE, QUOTE '"', ENCODING 'UTF8'); 
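
For comparison, the same query form is also accepted by psql's client-side `\copy` meta-command, which writes the file on the machine running psql rather than on the server. This is only a sketch: it assumes the script is run through psql, and the output file name is hypothetical. Note that `\copy` must be written on a single line.

```sql
-- psql meta-command, not server-side SQL; the whole command must stay on one line.
-- 'LPIS_DetailTest.csv' is a hypothetical output path, relative to psql's working directory.
\copy (SELECT Zone, State, LastName, FirstName FROM LPIS_DetailTest ORDER BY Zone, State, LastName, FirstName) TO 'LPIS_DetailTest.csv' WITH (FORMAT CSV, HEADER TRUE)
```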

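I must say, this solution is brilliantly simple! After seeing your answer, I went back to the PostgreSQL documentation page for the COPY statement and, sure enough, buried in the syntax is the (query) entry, which of course I had missed! Your help is most appreciated!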