2016-02-29
2

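SQL: parsing out multiple substrings

I have a very long, complex string that contains line breaks, and I am having a hard time parsing it. I need to write a SELECT query that returns one column for each of the fields below.

Ideally it would find each line break and, for every line, treat everything before the colon as the column name and everything between the colon and the next line break as the data for that field.

All of the data comes back as strings, so I am just building a SELECT statement over each of the lines below. I'm not sure whether this is even possible.

The second alternative is to hard-code it: use something like CHARINDEX ('Home Phone:' ,notes, 0) to find where the Home Phone string is, then take everything between the colon after that string and the next line break.

In that case my query would say, for each select item: find the string "Home Phone" and pull whatever comes after the colon, or find the string "School Name", and so on.

This is what the data looks like (all in one string column named notes):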

Home Phone: 1234567890 
Cell Phone: 1234567890 
Date of Birth: 01/01/1971 
School Name: James Jones High School 
Address:123 Main Street 
School City: Queens 
School State: PA 
School Zip: 32112 
Years Teaching: 12 
Grade Levels: Middle School 
Total Students: 120 
Subject: Music: 
How did they hear: Other, provide more info: Former partner teacher in the Middle School 
Type: Public/Charter 
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaa aaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaa aaaaaaa aaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaa aaaaa aaaaaa aaaaaa aaaaaaaaaaaa aaaaaaaaaaaa aaa aaaa aaaaa aaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaaa aaaaaaaaaaa aaaaaaaaa aaaaaaaaaaaa. 
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbb bbbbbbbbb bbbbbbb bbbbbb bbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbb 
Question 3: ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccccccccccccccccc cccccccccccccccccc ccccccccccc ccccccccccccc ccccccccccccccccc cccccccc 

So the output would look like this (with each of the long question answers in its own field as well).

Home Phone Cell Phone Date of Birth: … Type:    Question 1 :    Question 2: Question 3: 
1234567890 1234567890 1/1/1971   Public/Charter  aaaaaaaa aaaaaaaaaaaaa.  bbb bbbbbbbbbb ccccccccccccccccccccccc 

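I'm not sure whether this is reasonable, but any and all suggestions are much appreciated.

Here is code that pulls a substring using the newline char, but it is hard-coded. I can't figure out how to do this dynamically.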

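-- Hard-coded: pulls the text between 'Home Phone:' and the next line feed (a CRLF source leaves a trailing CHAR(13)).
SELECT CHARINDEX('Home Phone:', notes) + LEN('Home Phone:') AS [beginning],
     CHARINDEX(CHAR(10), notes, CHARINDEX('Home Phone:', notes)) AS [ending],
     LTRIM(RTRIM(SUBSTRING(notes,
      CHARINDEX('Home Phone:', notes) + LEN('Home Phone:'),
      CHARINDEX(CHAR(10), notes, CHARINDEX('Home Phone:', notes))
       - CHARINDEX('Home Phone:', notes) - LEN('Home Phone:')))) AS [home phone]
FROM [table] a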

Thanks!

+0

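Is the problem that you don't know how to identify a line break in a SQL string?

+0

I don't know how to do that. – Elizabeth

+0

[Finding a line break in a text string in a SQL table?](http://stackoverflow.com/questions/13075079/finding-a-line-break-in-a-text-string-in-a-sql-table)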
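
As a minimal sketch of what the linked question covers, the query below checks which line-break characters a string actually contains. This matters here because the answers that follow search for different things: CHAR(13)+CHAR(10) in one case and CHAR(10) alone in another. The variable @notes and its contents are illustrative assumptions, not the asker's real data.

```sql
-- Assumption: @notes is a stand-in for one value of the notes column.
DECLARE @notes nvarchar(4000) =
    N'Home Phone: 1234567890' + CHAR(13) + CHAR(10) + N'Cell Phone: 1234567890';

SELECT
    -- Position of the first carriage return + line feed pair (0 if none)
    CHARINDEX(CHAR(13) + CHAR(10), @notes) AS FirstCrLf,
    -- Position of the first bare line feed (0 if none)
    CHARINDEX(CHAR(10), @notes) AS FirstLf,
    -- Number of line feeds; DATALENGTH is used because LEN ignores trailing spaces
    (DATALENGTH(@notes) - DATALENGTH(REPLACE(@notes, CHAR(10), N''))) / 2 AS LineFeedCount;
```

If FirstCrLf is 0 while FirstLf is not, the data uses bare line feeds, and code that searches for the two-character pair will never find a break.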

Answers

1

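A lot of the credit for this (90%) should go to Alex K., who provided an in-depth answer on finding the nth occurrence of a character:

SQL Server - find nth occurrence in a string

I took that answer, tweaked it for your problem, and then applied a PIVOT to break the result out into the desired rows/columns. As long as the notes always follow the same logic (each question/answer pair is separated by a line break), this method should be able to produce the desired output for as many unique question sets as you need.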

--Creates temporary table for testing, ID column and second set of data 
--used to ensure query works for each unique set of questions 
IF OBJECT_ID('tempdb..#Results') IS NOT NULL 
    DROP TABLE #Results 

CREATE TABLE #Results 
    (ID INT IDENTITY(1,1) NOT NULL, 
    Notes NVARCHAR(4000) NOT NULL) 
INSERT INTO #Results 
    (Notes) 
VALUES 
    ('Home Phone: 1234567890 
    Cell Phone: 1234567890 
    Date of Birth: 01/01/1971 
    School Name: James Jones High School 
    Address:123 Main Street 
    School City: Queens 
    School State: PA 
    School Zip: 32112 
    Years Teaching: 12 
    Grade Levels: Middle School 
    Total Students: 120 
    Subject: Music: 
    How did they hear: Other, provide more info: Former partner teacher in the Middle School 
    Type: Public/Charter '), 
    ('Home Phone: test 
    Cell Phone: test 
    Date of Birth: test 
    School Name: test 
    Address:test 
    School City: test 
    School State: test 
    School Zip: test 
    Years Teaching: test 
    Grade Levels: test 
    Total Students: test 
    Subject: test 
    How did they hear: test 
    Type: test '); 

--Recursive CTE to determine the position of each successive line break 
--Used CHARINDEX to search CHAR(13) and CHAR(10) and find line breaks and carriage returns 
WITH cte 
AS 

    (SELECT ID, Notes, 1 AS Starts, CHARINDEX(CHAR(13)+CHAR(10),Notes) AS Pos 
    FROM #Results 
    UNION ALL 
    SELECT ID, Notes, Pos +1, CHARINDEX(CHAR(13)+CHAR(10),Notes,Pos+1) AS Pos 
    FROM cte 
    WHERE 
     pos >0), 

--2nd CTE breaks each question set into its own row 
cte2 
AS 
    (SELECT ID, Notes,Starts, Pos, 
     SUBSTRING(Notes, Starts, 
      CASE 
       WHEN pos > 0 THEN (pos - starts) 
       ELSE LEN(notes) 
      END) AS Token 
    FROM cte), 

--3rd CTE cleans up the data, separating the Questions/Answers into separate columns 
--REPLACE is used to remove Line Break (CHAR(10)), output was then showing a TAB so used 
--double REPLACE and removed CHAR(9) (tab) 
--LTRIM removes leading space 
cte3 
AS 
    (SELECT ID, 
     LTRIM(REPLACE(REPLACE(SUBSTRING(Token,CHARINDEX(CHAR(13)+CHAR(10),Token),CHARINDEX(':',Token)),CHAR(10),''),CHAR(9),'')) AS Question, 
     LTRIM(SUBSTRING(Token,CHARINDEX(':',Token)+1,4000)) AS Answer 
    FROM cte2) 

--Pivot separates each Question/Answer row into its own column 
SELECT * 
FROM 
    (SELECT ID, Question, Answer 
    FROM cte3) AS a 
PIVOT 
    (MAX(Answer) 
    FOR [Question] IN([Address],[Cell Phone],[Date of Birth],[Grade Levels],[Home Phone],[How did they hear], 
         [School City],[School Name],[School State],[School Zip],[Subject],[Total Students],[Type],[Years Teaching])) AS pvt 

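I put comments in each section to hopefully explain my logic, but let me know if you have any questions.

EDIT: Dynamic pivot

It is possible to create the pivot with dynamic SQL, which will automatically pick up all of the "Question" columns and adjust accordingly. Because I had to use multiple CTEs, I don't believe it can be done in one step. What I would do is take the steps above that create cte, cte2, and cte3 (basically everything before the PIVOT query), create a view from those steps, and then run the following against that view (in my example, the view is called "questionaire"):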

DECLARE @columns AS NVARCHAR(MAX) 
DECLARE @query AS NVARCHAR(MAX) 

SET @columns = STUFF((SELECT DISTINCT ',' + QUOTENAME(q.question) 
     FROM questionaire AS q 
     FOR XML PATH(''), TYPE 
     ).value('.','NVARCHAR(MAX)') 
     ,1,1,'') 

SET @query = 'SELECT ID, '+ @columns +' FROM 
     (
      SELECT ID, Answer, Question 
      FROM questionaire 
     ) AS a 
     PIVOT 
     (
      MAX(Answer) 
      FOR Question IN(' + @columns + ') 
     ) AS p' 
EXECUTE(@query) 
+0

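This works great. I wanted to ask about something: I added/modified the logic a bit. At the end I created a temp table with 'SELECT distinct ('[' + Question + ']') as 'Question'' to hold the column headers, and I tried to use it in my final pivot. In the pivot FOR statement I have 'FOR [Question] IN (select Question from #temp)) AS p', and it doesn't seem happy. – Elizabeth

+0

Yes, PIVOT does not allow you to use a SELECT statement like that to declare the columns. It is possible with dynamic SQL; I edited my answer to provide a query showing how I would do it. – Jericho

+0

Does my view have to contain all three CTEs? 'CREATE VIEW questionaire AS WITH CTE1 AS(...), CTE2 AS(...), cte3 AS(...) SELECT ...' – Elizabeth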
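
One possible shape for such a view, offered only as a sketch: it repeats the three CTEs from the answer and ends with the SELECT that the dynamic pivot reads from. The table dbo.Results is a hypothetical permanent stand-in for #Results (a view cannot reference a temporary table), and Notes is assumed to be NVARCHAR(4000) as in the answer's test setup, so that the anchor and recursive members of the CTE have matching types.

```sql
-- Assumption: dbo.Results is a hypothetical permanent copy of the #Results test table.
CREATE TABLE dbo.Results
    (ID INT IDENTITY(1,1) NOT NULL,
     Notes NVARCHAR(4000) NOT NULL);
GO

-- CREATE VIEW must be the first statement in its batch, hence the GO separators.
CREATE VIEW dbo.questionaire
AS
WITH cte AS
    -- Position of each successive CR+LF line break, as in the answer
    (SELECT ID, Notes, 1 AS Starts, CHARINDEX(CHAR(13)+CHAR(10), Notes) AS Pos
     FROM dbo.Results
     UNION ALL
     SELECT ID, Notes, Pos + 1, CHARINDEX(CHAR(13)+CHAR(10), Notes, Pos + 1)
     FROM cte
     WHERE Pos > 0),
cte2 AS
    -- One row per "Question: Answer" line
    (SELECT ID,
            SUBSTRING(Notes, Starts,
                CASE WHEN Pos > 0 THEN Pos - Starts ELSE LEN(Notes) END) AS Token
     FROM cte),
cte3 AS
    -- Split each line at its first colon
    (SELECT ID,
            LTRIM(REPLACE(REPLACE(SUBSTRING(Token, CHARINDEX(CHAR(13)+CHAR(10), Token),
                CHARINDEX(':', Token)), CHAR(10), ''), CHAR(9), '')) AS Question,
            LTRIM(SUBSTRING(Token, CHARINDEX(':', Token) + 1, 4000)) AS Answer
     FROM cte2)
SELECT ID, Question, Answer
FROM cte3;
GO
```

A query hint such as MAXRECURSION cannot be placed inside a view definition, so this relies on the default limit of 100 recursion levels, which is enough for notes with fewer than about 100 lines.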

0

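I know a lot of people around here don't like this splitter, but it is the one I prefer. It can only handle input values of up to 8000 characters, and the delimiter can only be a single character. It does have some advantages that other splitters lack, though, and unless you have huge inputs it will work for almost anything. You can find the code here: http://www.sqlservercentral.com/articles/Tally+Table/72993/ The comments (login required) run to many pages and contain a long discussion of this splitter.

Also, where others prefer PIVOT for this kind of thing, I prefer a crosstab (also known as conditional aggregation) because I find the syntax less obtuse.

I took the liberty of modifying your sample data. I changed the cell phone value so that it is not the same as the home phone. I also shortened the answers to the questions, because they don't need hundreds of characters to demonstrate the technique.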

declare @SomeValue varchar(8000) 

set @SomeValue = 'Home Phone: 1234567890 
Cell Phone: 3344556677 
Date of Birth: 01/01/1971 
School Name: James Jones High School 
Address:123 Main Street 
School City: Queens 
School State: PA 
School Zip: 32112 
Years Teaching: 12 
Grade Levels: Middle School 
Total Students: 120 
Subject: Music: 
How did they hear: Other, provide more info: Former partner teacher in the Middle School 
Type: Public/Charter 
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa. 
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb 
Question 3: ccccccccccccccccccccccc cccccccc'; 

select 
    MAX(case when s.ItemNumber = 1 then x.Item end) as HomePhone 
    , MAX(case when s.ItemNumber = 2 then x.Item end) as CellPhone 
    , MAX(case when s.ItemNumber = 3 then x.Item end) as DOB 
    , MAX(case when s.ItemNumber = 4 then x.Item end) as SchoolName 
    , MAX(case when s.ItemNumber = 5 then x.Item end) as SchoolAddress 
    , MAX(case when s.ItemNumber = 6 then x.Item end) as SchoolCity 
    , MAX(case when s.ItemNumber = 7 then x.Item end) as SchoolState 
    , MAX(case when s.ItemNumber = 8 then x.Item end) as SchoolZip 
    , MAX(case when s.ItemNumber = 9 then x.Item end) as YearsTeaching 
    , MAX(case when s.ItemNumber = 10 then x.Item end) as GradeLevels 
    , MAX(case when s.ItemNumber = 11 then x.Item end) as TotalStudents 
    , MAX(case when s.ItemNumber = 12 then x.Item end) as Subject 
    , MAX(case when s.ItemNumber = 13 then x.Item end) as HowHeard 
    , MAX(case when s.ItemNumber = 14 then x.Item end) as SchoolType 
    , MAX(case when s.ItemNumber = 15 then x.Item end) as Question1 
    , MAX(case when s.ItemNumber = 16 then x.Item end) as Question2 
    , MAX(case when s.ItemNumber = 17 then x.Item end) as Question3 
from dbo.DelimitedSplit8K(@SomeValue, CHAR(10)) s 
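cross apply dbo.DelimitedSplit8K(s.Item, ':') x 
where x.ItemNumber = 2 -- keep the value after the first colon, not the label (text after a second colon is dropped) 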
0

You can try XML like this, but I had to remove the extra ":" after Music and after "provide more info":

DECLARE @string nvarchar(max) = ' 
Home Phone: 1234567890 
Cell Phone: 1234567890 
Date of Birth: 01/01/1971 
School Name: James Jones High School 
Address:123 Main Street 
School City: Queens 
School State: PA 
School Zip: 32112 
Years Teaching: 12 
Grade Levels: Middle School 
Total Students: 120 
Subject: Music 
How did they hear: Other, provide more info, Former partner teacher in the Middle School 
Type: Public/Charter 
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaa aaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaa aaaaaaa aaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaa aaaaa aaaaaa aaaaaa aaaaaaaaaaaa aaaaaaaaaaaa aaa aaaa aaaaa aaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaaa aaaaaaaaaaa aaaaaaaaa aaaaaaaaaaaa. 
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbb bbbbbbbbb bbbbbbb bbbbbb bbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbb 
Question 3: ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccccccccccccccccc cccccccccccccccccc ccccccccccc ccccccccccccc ccccccccccccccccc cccccccc' 
,@xml as xml 

SELECT @xml = REPLACE ('<mystring><fieldname id="'+REPLACE(REPLACE(right(@string,LEN(@string)-2),':','" >'),CHAR(10),'</fieldname><fieldname id="')+'</fieldname></mystring>' ,CHAR(13),'') 

SELECT 
    n.v.value('(fieldname[@id="Home Phone"])[1]','NVARCHAR(11)') AS 'Home Phone', 
    n.v.value('(fieldname[@id="Cell Phone"])[1]','NVARCHAR(11)') AS 'Cell Phone', 
    n.v.value('(fieldname[@id="Date of Birth"])[1]','NVARCHAR(12)') AS 'Date of Birth', 
    n.v.value('(fieldname[@id="School Name"])[1]','NVARCHAR(30)') AS 'School Name', 
    n.v.value('(fieldname[@id="Address"])[1]','NVARCHAR(30)') AS 'Address', 
    n.v.value('(fieldname[@id="School City"])[1]','NVARCHAR(15)') AS 'School City', 
    n.v.value('(fieldname[@id="School State"])[1]','NVARCHAR(10)') AS 'School State', 
    n.v.value('(fieldname[@id="School Zip"])[1]','NVARCHAR(6)') AS 'School Zip', 
    n.v.value('(fieldname[@id="Years Teaching"])[1]','NVARCHAR(5)') AS 'Years Teaching', 
    n.v.value('(fieldname[@id="Grade Levels"])[1]','NVARCHAR(15)') AS 'Grade Levels', 
    n.v.value('(fieldname[@id="Total Students"])[1]','NVARCHAR(5)') AS 'Total Students', 
    n.v.value('(fieldname[@id="How did they hear"])[1]','NVARCHAR(100)') AS 'How did they hear', 
    n.v.value('(fieldname[@id="Type"])[1]','NVARCHAR(25)') AS 'Type', 
    n.v.value('(fieldname[@id="Question 1"])[1]','NVARCHAR(128)') AS 'Question 1', 
    n.v.value('(fieldname[@id="Question 2"])[1]','NVARCHAR(128)') AS 'Question 2', 
    n.v.value('(fieldname[@id="Question 3"])[1]','NVARCHAR(128)') AS 'Question 3' 
FROM @xml.nodes('mystring') as n(v); 

Result:

Home Phone Cell Phone Date of Birth School Name     Address      School City  School State School Zip Years Teaching Grade Levels Total Students How did they hear                     Type      Question 1                              Question 2                              Question 3 

1234567890 1234567890 01/01/1971 James Jones High School  123 Main Street     Queens   PA   32112  12    Middle School 120    Other, provide more info, Former partner teacher in the Middle School        Public/Charter   aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa aaaaaaaaaaaaaaaaa bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccc 

(1 row(s) affected)
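
To make the string-to-XML step in the answer above easier to follow, here is a minimal two-field sketch of the same REPLACE chain. The variable names and the sample values are illustrative assumptions; the element and attribute names (mystring, fieldname, id) are taken from the answer.

```sql
-- Two-field stand-in for the notes string; the values are illustrative only.
DECLARE @s nvarchar(max) = N'Type: Public/Charter' + CHAR(10) + N'Subject: Music';

-- Each "Name: value" line becomes <fieldname id="Name"> value</fieldname>:
-- the colon closes the id attribute, the line feed closes one element and opens the next.
DECLARE @x xml =
    N'<mystring><fieldname id="'
    + REPLACE(REPLACE(@s, N':', N'">'), CHAR(10), N'</fieldname><fieldname id="')
    + N'</fieldname></mystring>';

-- Show the generated XML and read one field back by its id attribute.
SELECT @x AS GeneratedXml,
       LTRIM(@x.value('(/mystring/fieldname[@id="Subject"])[1]', 'nvarchar(30)')) AS Subject;
```

Because every colon is rewritten, a second colon on a line ends up inside the value, which is why the extra colons had to be removed from the sample data. A literal & or < in the data would make the conversion to xml fail unless it is escaped first.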