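Keep certain text and discard the rest using a selector

Given the HTML element below, how can I select it so that I keep the text Hi there!! and discard the other text, Cat, using a CSS selector? Also, with .text or .text.strip() I get no result, but with .text_content() I do get the text.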

from lxml.html import fromstring 

html=""" 
<div id="item_type" data-attribute="item_type" class="ms-crm-Inline" aria-describe="item_type_c"> 
    <div> 
     <label for="item_type_outer" id="Type_outer"> 
      <div class="NotVisible">Cat</div> 
     Hi there!! 
      <div class="GradientMask"></div> 
     </label> 
    </div> 
</div> 
""" 
root = fromstring(html) 
for item in root.cssselect("#Type_outer"): 
    print(item.text) # doesn't work 
    print(item.text.strip()) # doesn't work 
    print(item.text_content()) # working one

Result:

Cat 
Hi there!!

However, the only result I want is Hi there!!, and this is what I tried:

root.cssselect("#Type_outer:not(.NotVisible)") #it doesn't work either

So, to ask again:

Why does .text_content() work while .text and .text.strip() do not?
How can I get only Hi there!! using a CSS selector?


2017-10-14 SIM

In the lxml tree model, the text you want is in the tail of the div with class "NotVisible":

>>> root = fromstring(html) 
>>> for item in root.cssselect("#Type_outer > div.NotVisible"): 
...     print(item.tail.strip())
... 
Hi there!!

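So, to answer the first question: only a text node that is not preceded by an element ends up in the text attribute of its parent. A text node that has a preceding sibling element (like the one in this question) ends up in the tail attribute of that element. .text_content(), by contrast, concatenates the text of the element and all of its descendants, tails included, which is why it returned both Cat and Hi there!!.

Another way to get the text "Hi there!!" is to query for the non-empty text nodes that are direct children of the label. An XPath expression can query at this level of detail: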
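To make the text/tail split concrete, here is a minimal sketch that walks the label's children and prints what each attribute holds. The HTML is a trimmed copy of the markup from the question, and the names `label_text` and `tails` are introduced only for this illustration.

```python
from lxml.html import fromstring

# Trimmed version of the markup from the question.
html = """
<div id="item_type">
  <label id="Type_outer">
    <div class="NotVisible">Cat</div>
    Hi there!!
    <div class="GradientMask"></div>
  </label>
</div>
"""

root = fromstring(html)
label = root.get_element_by_id("Type_outer")

# .text covers only the text before the first child element,
# which here is nothing but whitespace.
label_text = (label.text or "").strip()
print(repr(label_text))

# Text that follows a child element is stored on that child as .tail.
tails = {}
for child in label:
    tails[child.get("class")] = (child.tail or "").strip()
    print(child.get("class"), repr(child.text), repr(tails[child.get("class")]))
```

The `or ""` guards are there because lxml returns None for .text or .tail when no text is present at that position.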

for item in root.cssselect("#Type_outer"): 
    print(item.xpath("text()[normalize-space()]")[0].strip())


2017-10-14 08:36:58 har07

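No way! You are very helpful. One last thing: can you tell me why root.cssselect("#Type_outer:not(.NotVisible)") fails? Forgive my ignorance. Thanks again. – SIM

That expression selects the element with ID "Type_outer" *that does not have the class "NotVisible"*, so in this case it returns essentially the same element as #Type_outer, because the label with that ID does not have the class "NotVisible" either. – har07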
