2013-10-17

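I want to apply rich snippet data to my web page, following the http://schema.org/Article standard. One of its properties is articleBody, which I expect should contain the entire text that makes up the article. How can I exclude content from a rich snippet element?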

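Unfortunately, the HTML representation of the article occasionally contains buttons, ads, and other prompts whose text should not end up in articleBody.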

For example:

<div itemscope itemtype="http://schema.org/Article"> 
    <div itemtype="articleBody"> 
    <p>1st Paragraph</p> 
    <p>2nd paragraph</p> 
    <a>A few useful links for my users</a> 
    <p>3rd paragraph</p> 
    <div>A few text ads</div> 
    <p>4th paragraph</p> 
    </div> 
</div> 

Is there a way to exclude the ad/link text from the article itself?


Note that you have an error in your code: 'itemtype="articleBody"' should be 'itemprop="articleBody"'. – unor

Answer


No, Microdata does not provide a way to exclude content.

The articleBody value will be the textContent of the element.
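
To see why the ads leak into the value, here is a rough Python sketch that approximates textContent for the question's markup, assuming the attribute is corrected to itemprop="articleBody". The class ArticleBodyText and the function article_body_text are hypothetical names for illustration, not part of any Microdata library, and whitespace is collapsed only to make the result easier to read.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleBodyText(HTMLParser):
    """Gathers every text node inside elements marked itemprop="articleBody"."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # 0 means we are outside any articleBody element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("itemprop") == "articleBody":
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def article_body_text(html):
    parser = ArticleBodyText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs for readability; real textContent keeps them.
    return " ".join("".join(parser.chunks).split())


QUESTION_HTML = """
<div itemscope itemtype="http://schema.org/Article">
  <div itemprop="articleBody">
    <p>1st Paragraph</p>
    <p>2nd paragraph</p>
    <a>A few useful links for my users</a>
    <p>3rd paragraph</p>
    <div>A few text ads</div>
    <p>4th paragraph</p>
  </div>
</div>
"""

print(article_body_text(QUESTION_HTML))
```

The link text and the ad text both show up in the printed value, because every descendant text node counts.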


An ugly "hack" would be to specify several articleBody properties for this item:

<div itemscope itemtype="http://schema.org/Article">
    <div itemprop="articleBody">
    <p>1st Paragraph</p>
    <p>2nd paragraph</p>
    </div>
    <a>A few useful links for my users</a>
    <p itemprop="articleBody">3rd paragraph</p>
    <div>A few text ads</div>
    <p itemprop="articleBody">4th paragraph</p>
</div>

But note that Microdata does not define how those values should be interpreted, so it is up to the consumer.
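
As a sketch of what a consumer might see in that case, the Python below collects one value per element marked itemprop="articleBody" (assuming that attribute, per the comment on the question). The names ArticleBodyValues and article_body_values are hypothetical, and joining the values in document order is only one possible interpretation, not something Microdata prescribes.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleBodyValues(HTMLParser):
    """Keeps a separate text value for each itemprop="articleBody" element."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # 0 means we are outside any articleBody element
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("itemprop") == "articleBody":
            self.depth = 1
            self.values.append("")   # start a new property value

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.values[-1] += data


def article_body_values(html):
    parser = ArticleBodyValues()
    parser.feed(html)
    parser.close()
    # Collapse whitespace inside each value for readability.
    return [" ".join(value.split()) for value in parser.values]


SPLIT_HTML = """
<div itemscope itemtype="http://schema.org/Article">
  <div itemprop="articleBody">
    <p>1st Paragraph</p>
    <p>2nd paragraph</p>
  </div>
  <a>A few useful links for my users</a>
  <p itemprop="articleBody">3rd paragraph</p>
  <div>A few text ads</div>
  <p itemprop="articleBody">4th paragraph</p>
</div>
"""

values = article_body_values(SPLIT_HTML)
joined = " ".join(values)   # one consumer-specific choice among several
print(values)
print(joined)
```

A different consumer could just as well keep only the first value or treat the three values as alternatives, which is why this approach is fragile.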


An even uglier approach:

Duplicate the information and include it in a meta element:

<div itemscope itemtype="http://schema.org/Article"> 
    <div> 
    <p>1st Paragraph</p> 
    <p>2nd paragraph</p> 
    <a>A few useful links for my users</a> 
    <p>3rd paragraph</p> 
    <div>A few text ads</div> 
    <p>4th paragraph</p> 
    </div> 
    <meta itemprop="articleBody" content="1st Paragraph. 2nd paragraph. 3rd paragraph. 4th paragraph." /> 
</div>
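
If the page is generated server-side, the duplicated text does not have to be maintained by hand. The following Python sketch builds such a meta element from a list of paragraph strings; article_body_meta is a hypothetical helper, and it assumes the article paragraphs are already available separately from the ads and links.

```python
from html import escape


def article_body_meta(paragraphs):
    """Build a meta element carrying the article text as its content."""
    # Drop empty entries and join the rest with single spaces.
    text = " ".join(p.strip() for p in paragraphs if p.strip())
    # quote=True also escapes quotation marks so the attribute stays valid.
    return f'<meta itemprop="articleBody" content="{escape(text, quote=True)}" />'


print(article_body_meta([
    "1st Paragraph.",
    "2nd paragraph.",
    "3rd paragraph.",
    "4th paragraph.",
]))
```

Generating the element from the same source as the visible paragraphs keeps the two copies from drifting apart.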