SQL query to split a comma-separated column into many-to-many relationships

I have been given a 3 GB CSV file that I need to import into SQL Server 2012.

I now have 5 million rows of data in a staging table that looks like this (simplified).

Staging table:

+-------------------+------------+---------------+------------+
| Name              | Thumbnail  | Tags          | Categories |
+-------------------+------------+---------------+------------+
| History           | thumb1.jpg | history,essay | history    |
| Nutricion Lecture | thumb2.jpg | food,essay    | health     |
+-------------------+------------+---------------+------------+

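My question is about the Tags and Categories columns in the staging table.

How can I move the information from my staging table into my actual tables, creating a unique record for each tag and category, and creating the required many-to-many relationships?

Each tag needs to be checked against the existing tags, either to create a new record or to get the Id of the existing tag.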

Programs:

+----+-----------+------------+
| id | Program   | Thumbnail  |
+----+-----------+------------+
| 1  | History   | thumb1.jpg |
| 2  | Nutricion | thumb2.jpg |
+----+-----------+------------+

Tags:

+----+---------+
| Id | Tag     |
+----+---------+
| 1  | history |
| 2  | essay   |
| 3  | food    |
+----+---------+

(The Categories table is omitted because it looks the same as Tags.)

The many-to-many relationships:

Programs_Tags:

+---------+-----+
| program | tag |
+---------+-----+
| 1       | 1   |
| 1       | 2   |
| 2       | 2   |
+---------+-----+

Programs_Categories:

+---------+----------+
| program | category |
+---------+----------+
| 1       | 1        |
| 2       | 2        |
+---------+----------+

I assume that doing this in pure SQL will be faster than writing a tool for it.

Source

2014-04-20 Fred Fickleberry III

I'm not sure that this is faster in SQL. But here is one way to do it.

First, create the five tables that you need for this:

Programs
Tags
Categories
ProgramTags
ProgramCategories

Give them the appropriate structure, including identity columns.
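A minimal sketch of what that structure could look like in T-SQL. The column names, types and lengths are assumptions (the question does not give them); ProgramId and TagId follow the naming used in the join query later in this answer.

```sql
-- Hypothetical DDL: all column names, types and lengths are assumptions.
create table Programs (
    ProgramId int identity(1,1) primary key,
    Program   nvarchar(255) not null,
    Thumbnail nvarchar(255) null
);

create table Tags (
    TagId int identity(1,1) primary key,
    Tag   nvarchar(100) not null unique   -- one row per distinct tag
);

create table Categories (
    CategoryId int identity(1,1) primary key,
    Category   nvarchar(100) not null unique
);

-- Junction tables: one row per (program, tag) or (program, category) pair.
create table ProgramTags (
    ProgramId int not null references Programs (ProgramId),
    TagId     int not null references Tags (TagId),
    primary key (ProgramId, TagId)
);

create table ProgramCategories (
    ProgramId  int not null references Programs (ProgramId),
    CategoryId int not null references Categories (CategoryId),
    primary key (ProgramId, CategoryId)
);
```

The unique constraints on Tag and Category and the composite primary keys are one way to have the database reject duplicates; they are a design choice, not something the question requires.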

Then load the data into Programs. This is easy, just an appropriate select.
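As a rough sketch, assuming the staging table has the Name and Thumbnail columns shown in the question and that Programs has (hypothetical) Program and Thumbnail columns plus an identity key:

```sql
-- Assumed column names; the identity column of Programs is filled in automatically.
insert into Programs (Program, Thumbnail)
select distinct s.Name, s.Thumbnail   -- distinct is an assumption: it collapses exact duplicate rows
from staging s;
```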

Then populate the Tags and Categories tables. Here is how you would load the Tags table:

with cte as (
      -- anchor: peel the first tag off each staging row
      select (case when tags like '%,%'
                   then left(tags, charindex(',', tags) - 1)
                   else tags
              end) as tag,
             (case when tags like '%,%'
                   then substring(tags, charindex(',', tags) + 1, len(tags))
              end) as resttags
      from staging
      where tags is not null and tags <> ''
      union all
      -- recursive step: peel the next tag off whatever is left
      select (case when resttags like '%,%'
                   then left(resttags, charindex(',', resttags) - 1)
                   else resttags
              end) as tag,
             (case when resttags like '%,%'
                   then substring(resttags, charindex(',', resttags) + 1, len(resttags))
              end) as resttags
      from cte
      where resttags is not null and resttags <> ''
     )
select distinct tag
from cte;
-- the default recursion limit is 100 levels; append option (maxrecursion 0)
-- if a single row can hold more than about 100 tags

(Obviously, this needs an insert.)
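One way that insert might look, as a sketch. It assumes a Tags table with a (hypothetical) Tag column and an identity key, and it adds a not exists check so that tags already present are skipped, which is what the question asks for.

```sql
with cte as (
      -- anchor: first tag of each staging row, plus the remainder
      select (case when tags like '%,%'
                   then left(tags, charindex(',', tags) - 1)
                   else tags
              end) as tag,
             (case when tags like '%,%'
                   then substring(tags, charindex(',', tags) + 1, len(tags))
              end) as resttags
      from staging
      where tags is not null and tags <> ''
      union all
      -- recursive step: next tag of the remainder
      select (case when resttags like '%,%'
                   then left(resttags, charindex(',', resttags) - 1)
                   else resttags
              end) as tag,
             (case when resttags like '%,%'
                   then substring(resttags, charindex(',', resttags) + 1, len(resttags))
              end) as resttags
      from cte
      where resttags is not null and resttags <> ''
     )
insert into Tags (Tag)              -- Tag is an assumed column name
select distinct c.tag
from cte c
where c.tag <> ''                   -- skip empty pieces such as "a,,b"
  and not exists (select 1 from Tags t where t.Tag = c.tag);
```

Because of the not exists check, the statement can be re-run after loading more staging data without creating duplicate tags.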

Do the same for Categories.

Then load ProgramTags using:

select p.ProgramId, t.TagId 
from staging s join 
    Programs p 
    on s.<whatever> = p.<whatever> join 
    Tags t 
    on ','+s.tags+',' like '%,'+t.tag+',%';

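The first join is to get the program ID. The second is to get the matching tags. Performance will not be great, but it may be good enough for what you need to do.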
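If the LIKE join turns out to be too slow, one possible variant (not part of the original answer) is to carry the program key through the same recursive split and join on plain equality. The sketch below assumes that staging.Name matches a (hypothetical) Programs.Program column, standing in for the unspecified join key above, and that Tags has a Tag column. Whether it is actually faster on 5 million rows would need to be measured.

```sql
with cte as (
      -- anchor: keep the program name next to each split-off tag
      select s.Name,
             (case when s.tags like '%,%'
                   then left(s.tags, charindex(',', s.tags) - 1)
                   else s.tags
              end) as tag,
             (case when s.tags like '%,%'
                   then substring(s.tags, charindex(',', s.tags) + 1, len(s.tags))
              end) as resttags
      from staging s
      where s.tags is not null and s.tags <> ''
      union all
      select c.Name,
             (case when c.resttags like '%,%'
                   then left(c.resttags, charindex(',', c.resttags) - 1)
                   else c.resttags
              end) as tag,
             (case when c.resttags like '%,%'
                   then substring(c.resttags, charindex(',', c.resttags) + 1, len(c.resttags))
              end) as resttags
      from cte c
      where c.resttags is not null and c.resttags <> ''
     )
insert into ProgramTags (ProgramId, TagId)
select distinct p.ProgramId, t.TagId
from cte c join
     Programs p
     on p.Program = c.Name join      -- assumed join key
     Tags t
     on t.Tag = c.tag;               -- equality instead of LIKE
```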

Source

2014-04-20 13:47:42

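Doesn't run ---> Msg 8116, Level 16, State 1, Line 1: Argument data type int is invalid for argument 1 of substring function. Msg 207, Level 16, State 1, Line 12: Invalid column name 'tags'. Msg 207, Level 16, State 1, Line 15: Invalid column name 'tags'. Msg 207, Level 16, State 1, Line 16: Invalid column name 'testtags'. Msg 207, Level 16, State 1, Line 16: Invalid column name 'testtags'.

@FrankieYale ... I don't know what I was thinking, swapping the arguments in the first argument to 'substr()'.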
