2016-08-12 32 views
0

Identifying similar string values in two columns

For example, I have the following two columns, Address1 and refAddr.

Some sample data from the table is shown below.

enter image description here

I want to compare the two columns to find matches. Clearly, in this table, 5235 JFK BLVD & 5235 John F Kennedy are a pair, and 424 N 2ND ST & 424 NORTH SECOND are a pair.

Is there any way in SQL or SSIS to get rid of the non-pair rows and keep only the pairs?

+2

Address matching and correction is a job for specialized software that is usually not included with a database. –

+0

Buy master data management software to do this. – dfundako

+0

In SSIS, use a Script Component with regular expressions and flag the rows that match in an additional column; you can then filter on that column. –

Answers

3

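One option is to geocode the addresses with the Google API and parse the JSON result, which gives you a more standardized form of each address. This can be time-consuming, but you will have more confidence in your data.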

The API allows (I believe) 2,500 hits per day, but you can purchase more.

For example, I picked 5232 JFK Blvd and added the ZIP code 72116 to narrow the search. Without the ZIP code it returned multiple addresses (NY, NJ, AR, etc.).

https://maps.googleapis.com/maps/api/geocode/json?address=5232%20JFK%20Blvd&72116sensor=false 
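
A request like this can also be built programmatically. The following is a minimal Python sketch, assuming the ZIP code is passed as part of the `address` parameter so that it is URL-encoded together with the street. The `build_geocode_url` helper and its `api_key` argument are hypothetical names used only for illustration; to my knowledge, current versions of the API expect a `key` parameter.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(street, zip_code=None, api_key=None):
    """Return a geocoding request URL for one address (hypothetical helper)."""
    # Append the ZIP code to the street so both end up in `address`.
    address = f"{street} {zip_code}" if zip_code else street
    params = {"address": address}
    if api_key:
        params["key"] = api_key  # assumption: key-based authentication
    # urlencode escapes spaces and other reserved characters.
    return f"{GEOCODE_ENDPOINT}?{urlencode(params)}"


print(build_geocode_url("5232 JFK Blvd", "72116"))
```

Encoding the parameters with `urlencode` rather than by hand avoids misplaced `&` separators in the query string.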

The key elements could be:

formatted_address: "5232 J.F.K. Blvd, North Little Rock, AR 72116, USA", 
or 
long_name: "John F. Kennedy Boulevard", 

Returns:

{ 
results: [ 
{ 
address_components: [ 
{ 
long_name: "5232", 
short_name: "5232", 
types: [ 
"street_number" 
] 
}, 
{ 
long_name: "J.F.K. Boulevard", 
short_name: "J.F.K. Blvd", 
types: [ 
"route" 
] 
}, 
{ 
long_name: "North Little Rock", 
short_name: "North Little Rock", 
types: [ 
"locality", 
"political" 
] 
}, 
{ 
long_name: "Hill Township", 
short_name: "Hill Township", 
types: [ 
"administrative_area_level_3", 
"political" 
] 
}, 
{ 
long_name: "Pulaski County", 
short_name: "Pulaski County", 
types: [ 
"administrative_area_level_2", 
"political" 
] 
}, 
{ 
long_name: "Arkansas", 
short_name: "AR", 
types: [ 
"administrative_area_level_1", 
"political" 
] 
}, 
{ 
long_name: "United States", 
short_name: "US", 
types: [ 
"country", 
"political" 
] 
}, 
{ 
long_name: "72116", 
short_name: "72116", 
types: [ 
"postal_code" 
] 
} 
], 
formatted_address: "5232 J.F.K. Blvd, North Little Rock, AR 72116, USA", 
geometry: { 
bounds: { 
northeast: { 
lat: 34.8032656, 
lng: -92.2538364 
}, 
southwest: { 
lat: 34.8032599, 
lng: -92.2538538 
} 
}, 
location: { 
lat: 34.8032599, 
lng: -92.2538364 
}, 
location_type: "RANGE_INTERPOLATED", 
viewport: { 
northeast: { 
lat: 34.8046117302915, 
lng: -92.2524961197085 
}, 
southwest: { 
lat: 34.8019137697085, 
lng: -92.2551940802915 
} 
} 
}, 
place_id: "EjI1MjMyIEouRi5LLiBCbHZkLCBOb3J0aCBMaXR0bGUgUm9jaywgQVIgNzIxMTYsIFVTQQ", 
types: [ 
"route", 
"street_address" 
] 
}, 
{ 
address_components: [ 
{ 
long_name: "5232", 
short_name: "5232", 
types: [ 
"street_number" 
] 
}, 
{ 
long_name: "John F. Kennedy Boulevard", 
short_name: "John F. Kennedy Blvd", 
types: [ 
"route" 
] 
}, 
{ 
long_name: "West New York", 
short_name: "West New York", 
types: [ 
"locality", 
"political" 
] 
}, 
{ 
long_name: "Hudson County", 
short_name: "Hudson County", 
types: [ 
"administrative_area_level_2", 
"political" 
] 
}, 
{ 
long_name: "New Jersey", 
short_name: "NJ", 
types: [ 
"administrative_area_level_1", 
"political" 
] 
}, 
{ 
long_name: "United States", 
short_name: "US", 
types: [ 
"country", 
"political" 
] 
}, 
{ 
long_name: "07093", 
short_name: "07093", 
types: [ 
"postal_code" 
] 
} 
], 
formatted_address: "5232 John F. Kennedy Blvd, West New York, NJ 07093, USA", 
geometry: { 
bounds: { 
northeast: { 
lat: 40.78574, 
lng: -74.0231416 
}, 
southwest: { 
lat: 40.7857366, 
lng: -74.0231598 
} 
}, 
location: { 
lat: 40.78574, 
lng: -74.0231416 
}, 
location_type: "RANGE_INTERPOLATED", 
viewport: { 
northeast: { 
lat: 40.78708728029149, 
lng: -74.02180171970849 
}, 
southwest: { 
lat: 40.7843893197085, 
lng: -74.0244996802915 
} 
} 
}, 
place_id: "Ejc1MjMyIEpvaG4gRi4gS2VubmVkeSBCbHZkLCBXZXN0IE5ldyBZb3JrLCBOSiAwNzA5MywgVVNB", 
types: [ 
"route", 
"street_address" 
] 
} 
], 
status: "OK" 
}
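
With responses like the one above, the pairing from the question could be approached by geocoding both columns and keeping a row only when both sides resolve to the same `formatted_address`. This is a rough Python sketch, assuming each response has already been parsed into a dict (for example with `json.loads`) and that the first result is the intended one. `first_formatted_address` and `is_pair` are hypothetical helper names, and the sample dicts are trimmed-down, illustrative stand-ins rather than real API output.

```python
def first_formatted_address(response):
    """Return formatted_address of the first result, or None if there is none."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    return response["results"][0].get("formatted_address")


def is_pair(response_a, response_b):
    """True when both geocoding responses resolve to the same address."""
    a = first_formatted_address(response_a)
    b = first_formatted_address(response_b)
    return a is not None and a == b


# Illustrative responses shaped like the output above (not real API data).
address1_response = {"status": "OK", "results": [
    {"formatted_address": "5232 J.F.K. Blvd, North Little Rock, AR 72116, USA"}]}
ref_addr_response = {"status": "OK", "results": [
    {"formatted_address": "5232 J.F.K. Blvd, North Little Rock, AR 72116, USA"}]}
no_result_response = {"status": "ZERO_RESULTS", "results": []}

print(is_pair(address1_response, ref_addr_response))   # True
print(is_pair(address1_response, no_result_response))  # False
```

Taking the first result blindly is only safe when the query is specific enough (for instance, when it includes the ZIP code); otherwise ambiguous addresses such as the NJ and AR matches above would need extra filtering.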