2012-01-11 40 views
0

I have some HTML content in a SQL Server column, and I want to read data out of that HTML. How can I get data from HTML stored in a SQL Server column?

For example:

<ektdesignns_choices ektdesignns_nodetype="element" title="How many gigs do you play each month?" ektdesignns_caption="How many gigs do you play each month?" name="ektpoll1303074024421" ektdesignns_name="ektpoll1303074024421" id="ektpoll1303074024421"> 
    <ol contenteditable="false" onkeypress="design_validate_choice(1, -1, this, 'Options are required.')" onclick="design_validate_choice(1, -1, this, 'Options are required.')" onblur="design_validate_choice(1, -1, this, 'Options are required.')" ektdesignns_validation="choice-req" ektdesignns_maxoccurs="1" ektdesignns_minoccurs="1" unselectable="on" title="How many gigs do you play each month?" class="design_list_vertical"> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="1 or fewer_1" title="1 or fewer" id="ID2504263" /> 
     <label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label> 
    </li> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="2-4_2" title="2-4" id="ID5115606" /> 
     <label contenteditable="true" unselectable="off" for="ID5115606">2-4</label> 
    </li> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="5-7_3" title="5-7" id="ID477116" /> 
     <label contenteditable="true" unselectable="off" for="ID477116">5-7</label> 
    </li> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="8 or more_4" title="8 or more" id="ID5515606" /> 
     <label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label> 
    </li> 
    </ol> 
</ektdesignns_choices><input type="submit" value="Vote" /> 

I want to read all the labels in this HTML. Does anyone have an idea how I can do this?

Answers

1

If your HTML really is valid XHTML, and it is stored in an XML column in a SQL Server table, then you can retrieve your labels from it with XQuery in T-SQL:

DECLARE @HtmlTbl TABLE (ID INT IDENTITY, Html XML) 

INSERT INTO @HtmlTbl(Html) VALUES('<ektdesignns_choices ektdesignns_nodetype="element" title="How many gigs do you play each month?" ektdesignns_caption="How many gigs do you play each month?" name="ektpoll1303074024421" ektdesignns_name="ektpoll1303074024421" id="ektpoll1303074024421"> 
    <ol contenteditable="false" onkeypress="design_validate_choice(1, -1, this, ''Options are required.'')" onclick="design_validate_choice(1, -1, this, ''Options are required.'')" onblur="design_validate_choice(1, -1, this, ''Options are required.'')" ektdesignns_validation="choice-req" ektdesignns_maxoccurs="1" ektdesignns_minoccurs="1" unselectable="on" title="How many gigs do you play each month?" class="design_list_vertical"> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="1 or fewer_1" title="1 or fewer" id="ID2504263" /> 
     <label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label> 
    </li> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="2-4_2" title="2-4" id="ID5115606" /> 
     <label contenteditable="true" unselectable="off" for="ID5115606">2-4</label> 
    </li> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="5-7_3" title="5-7" id="ID477116" /> 
     <label contenteditable="true" unselectable="off" for="ID477116">5-7</label> 
    </li> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="8 or more_4" title="8 or more" id="ID5515606" /> 
     <label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label> 
    </li> 
    </ol></ektdesignns_choices><input type="submit" value="Vote" />') 

This retrieves all the <label> elements from your (X)HTML as a single XML string:

SELECT 
    Html.query('//label') 
FROM @HtmlTbl 
WHERE ID = 1 

Output:

<label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label> 
<label contenteditable="true" unselectable="off" for="ID5115606">2-4</label> 
<label contenteditable="true" unselectable="off" for="ID477116">5-7</label> 
<label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label> 

Or this selects the contents of every <label> tag, one per row:

SELECT 
    C.value('(.)[1]', 'varchar(1000)') 
FROM @HtmlTbl 
CROSS APPLY Html.nodes('//label') AS T(C) 
WHERE ID = 1 

Output:

1 or fewer 
2-4 
5-7 
8 or more 
0

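Pull the data out of the database, then use an HTML parser to extract the information you need. It will make your life a lot easier.

Whatever you do, please don't try to use regular expressions unless the data you are looking for really does match a regular expression. (HTML is not a regular language, so regexes usually cause more problems than they solve.)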
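As a rough illustration of the parser approach, here is a minimal sketch in Python using only the standard library's `html.parser`. The names `LabelTextParser` and `extract_labels` are made up for this example, and the sample string is a shortened copy of the poll markup from the question; in practice the string would come from the SQL Server column through whatever database driver you use.

```python
from html.parser import HTMLParser


class LabelTextParser(HTMLParser):
    """Collects the text content of every <label> element."""

    def __init__(self):
        super().__init__()
        self.labels = []        # finished label texts
        self._in_label = False  # True while inside a <label>
        self._buffer = []       # text pieces of the current label

    def handle_starttag(self, tag, attrs):
        if tag == "label":
            self._in_label = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_label:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "label" and self._in_label:
            self.labels.append("".join(self._buffer).strip())
            self._in_label = False


def extract_labels(html):
    """Return the text of each <label> in the given HTML string."""
    parser = LabelTextParser()
    parser.feed(html)
    parser.close()
    return parser.labels


# Shortened version of the poll markup from the question.
sample = """
<ol class="design_list_vertical">
  <li><input type="radio" value="1 or fewer_1" id="ID2504263" />
      <label for="ID2504263">1 or fewer</label></li>
  <li><input type="radio" value="2-4_2" id="ID5115606" />
      <label for="ID5115606">2-4</label></li>
</ol>
"""

print(extract_labels(sample))  # ['1 or fewer', '2-4']
```

Unlike the XML route, this does not require the stored markup to be well-formed, so an unclosed `<br>` or similar will not stop it.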

0

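If all the HTML you have is formatted the same way as this, you can convert it to XML and use some XQuery to find the label nodes:

-- YourTable and XMLCol are placeholder names; XMLCol must be of type xml
select T.N.value('.', 'nvarchar(100)') 
from YourTable 
    cross apply XMLCol.nodes('//label') as T(N) 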
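Whether a stored value can be converted to XML at all depends on it being well-formed. One way to check that outside the database is sketched below in Python with `xml.etree.ElementTree`. This is a client-side stand-in for illustration, not the T-SQL conversion itself, and `labels_if_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def labels_if_well_formed(fragment):
    """Return label texts, or None when the fragment is not well-formed XML."""
    try:
        # Wrap in a dummy root because the stored fragment may have several
        # top-level elements (e.g. the poll plus the submit button).
        root = ET.fromstring("<root>" + fragment + "</root>")
    except ET.ParseError:
        return None
    return [(label.text or "").strip() for label in root.iter("label")]


good = '<ol><li><label for="a">2-4</label></li></ol><input type="submit" value="Vote" />'
bad = '<ol><li><label for="a">2-4</label><br></li></ol>'  # <br> is never closed

print(labels_if_well_formed(good))  # ['2-4']
print(labels_if_well_formed(bad))   # None
```

Rows that come back as `None` here are the ones that would likely need cleaning up, or an HTML parser instead, before the XQuery approach can be applied to them.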