2015-01-07

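Optimizing XSLT code

While searching for profiling tools for XSLT, I came across this post. Since a lot of people there suggested simply posting the code and asking for feedback, I was wondering whether anyone could give me some feedback on mine. I tried this (http://www.saxonica.com/documentation/#!using-xsl/performanceanalysis), but the HTML output is not very detailed.

I am new to XSLT and normally work with Python/Perl, where regular expression support is much better (although I don't rule out that this is just down to my basic understanding of XSLT). For this project, however, I have to use XSLT. It may well be that I am forcing it to do things in a very unnatural way. Any comments are welcome, especially on performance, but also in general, since I want to learn!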

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions"> 

<xsl:template name="my_terms"> 

<xsl:variable name="excludes" select="not (codeblock or draft-comment or filepath or shortdesc or uicontrol or varname)"/> 

<!-- leftover example of how to work with excludes var --> 
<!--<xsl:if test=".//*[$excludes]/text()[contains(.,'access management console')]"><li class="prodterm"><b>PB QA:access management console should be "AppCenter"</b></li></xsl:if>--> 


<!-- Loop through all sentences and check for deprecated stuff --> 
<xsl:for-each select=".//*[$excludes]/text()"> 
    <xsl:variable name="sentenceList" select="tokenize(., '[\.!\?:;]\s+')"/> 
    <xsl:variable name="segment" select="."/> 

    <!-- main sentence loop --> 
    <xsl:for-each select="$sentenceList"> 
     <xsl:variable name="sentence" select="."/> 
     <!-- very rudimentary sentence length check --> 
     <xsl:if test="count(tokenize(., '\W+')) &gt; 30"> <li class="prodterm"><b>Sentence too long:</b> <xsl:value-of select="."/></li></xsl:if> 

     <!-- efforts to flag the shady case of the gerund --> 
     <xsl:if test="matches(., '\w+ \w+ing (the|a)')"> 
      <!-- some extra checks to weed out the false positives --> 
      <xsl:if test="not(matches(., '\b(on|about|for|before|while|when|after|by|a|the|an|some|all|every) \w+ing (the|a)', '!i')) and not(matches(., 'during'))"> 
       <li class="prodterm"><b>Possible unclear usage of gerund. If so, consider rewriting:</b> <xsl:value-of select="."/></li> 
      </xsl:if> 
     </xsl:if> 

     <!-- comma's after certain starting phrases --> 
     <xsl:if test="matches(., '^\s*Therefore[^,]')"><li class="prodterm"><b>Use a comma after starting a sentence with 'Therefore':</b> <xsl:value-of select="."/></li></xsl:if> 
     <xsl:if test="matches(., '^\s*(If you|Before|When)[^,]+$')"><li class="prodterm"><b>Use a comma after starting a sentence with 'Before', 'If you' or 'When':</b> <xsl:value-of select="."/></li></xsl:if> 

     <!-- experimenting with phrasal verbs (if there are a lot of verbs in phrasalVerbs.xml, it will be better to have this as the main loop (and do it outside the sentence loop)) --> 
     <xsl:for-each select="document('phrasalVerbs.xml')/verbs/verb[matches($sentence, concat('.* ', ./@text, ' .*'))]"> 
      <xsl:variable name="verbPart" select="."/> 
      <xsl:for-each select="$verbPart/particles/particle/@text[matches($sentence, .) and not(matches($sentence, concat($verbPart/@text, ' ', .)))]"> 
       <xsl:variable name="particle" select="."/> 
       <li class="prodterm"><b>Separated phrasal verb found in:</b> <xsl:value-of select="$sentence"/></li>  
      </xsl:for-each> 
     </xsl:for-each> 


     <!-- checking if conditionals (should be followed by then) --> 
     <xsl:if test="matches($sentence, '^\s*If\b', '!i') and not(matches($sentence, '\bthen\b', '!i'))"><li class="prodterm"><b>Conditional If found, but no then:</b> <xsl:value-of select="."/></li></xsl:if> 


     <!-- very dodgy way of detecting passive voice --> 
     <!--<xsl:if test="matches($sentence, '\b(are|can be|must be) \w+ed\b', '!i')"><li class="prodterm"><b>PB QA:Possible passive voice. If so, consider using active voice for:</b> <xsl:value-of select="."/></li></xsl:if>--> 


     <xsl:for-each select='document("generalDeprecatedTermsAndPhrases.xml")/terms/dt'> 
      <xsl:variable name="pattern" select="./@pattern"/> 
      <xsl:variable name="message" select="./@message"/> 
      <xsl:variable name="regexFlag" select="./@regexFlag"/> 

      <!-- <xsl:if test="matches($sentence, $pattern, $regexFlag)"> --> 
      <xsl:if test="matches($sentence, concat('(^|\W)', $pattern, '($|\W)'), $regexFlag)"> <!-- This is the work around for not being able to use \b when variable is passed on inside matches() --> 
       <li class="prodterm"><b><xsl:value-of select="$message"/> in: </b> <xsl:value-of select="$sentence"/> </li> 
      </xsl:if> 
     </xsl:for-each> 


    </xsl:for-each> 
</xsl:for-each> 
</xsl:template> 
</xsl:stylesheet> 

To give an idea, a stripped-down version of my "generalDeprecatedTermsAndPhrases.xml" looks like this:

<terms>

<dt pattern='to be able to' message="Use 'to' instead of 'to be able to'" regexFlag="i"></dt> 

</terms> 

Answer


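The reason the Saxon profile is not very detailed is that your code is so monolithic: it is all in one big template rule.

However, being monolithic is not in itself the cause of any performance problem.

The first observation is a functional one: your variable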
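If a more detailed profile is wanted, one option is to move each check into its own named template, so that time can be attributed per template. The following is only a sketch under stated assumptions: the template names `check_length` and `check_therefore` are hypothetical, only two of the original checks are shown, and the element exclusion is left out for brevity.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Driver: splits each text node into sentences and delegates to one template per check -->
  <xsl:template name="my_terms">
    <xsl:for-each select=".//text()">
      <xsl:for-each select="tokenize(., '[\.!\?:;]\s+')">
        <xsl:call-template name="check_length">
          <xsl:with-param name="sentence" select="."/>
        </xsl:call-template>
        <xsl:call-template name="check_therefore">
          <xsl:with-param name="sentence" select="."/>
        </xsl:call-template>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

  <!-- Hypothetical name: rudimentary sentence length check, taken from the original code -->
  <xsl:template name="check_length">
    <xsl:param name="sentence" as="xs:string"/>
    <xsl:if test="count(tokenize($sentence, '\W+')) &gt; 30">
      <li class="prodterm"><b>Sentence too long:</b> <xsl:value-of select="$sentence"/></li>
    </xsl:if>
  </xsl:template>

  <!-- Hypothetical name: comma check after 'Therefore', taken from the original code -->
  <xsl:template name="check_therefore">
    <xsl:param name="sentence" as="xs:string"/>
    <xsl:if test="matches($sentence, '^\s*Therefore[^,]')">
      <li class="prodterm"><b>Use a comma after starting a sentence with 'Therefore':</b> <xsl:value-of select="$sentence"/></li>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

The output should be the same as with the inline `xsl:if` tests; the only intended difference is that each check becomes a separately named unit that a profiler can report on.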

<xsl:variable name="excludes" select="not (codeblock or draft-comment or filepath or shortdesc or uicontrol or varname)"/> 

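won't do what you think. It is evaluated with the root document node as the context item, and its value is a boolean, which is true if the outermost element has a name other than one of those listed. So I think your xsl:for-each, using [$excludes] as a predicate, applies to all elements, whereas I suspect you intended it to apply to selected elements. I don't know how much effect this has on performance.

The main impact on performance will be the cost of evaluating the regular expressions. The best way to find out which ones are causing problems is to measure the effect of removing them one by one. Once you have isolated the problem, it may be possible to rewrite the regular expression so that it performs better (for example, by avoiding backtracking).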
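A possible rewrite, assuming the intent was to skip text whose parent element is one of the listed names, is to put the test directly into the predicate, so that it is evaluated once per candidate element, and to use the `self::` axis so that it tests the element's own name. This is a minimal sketch rather than a drop-in replacement: the surrounding `match="/"` template and the `normalize-space()` filter are additions for illustration only.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <ul>
      <!-- The predicate is evaluated for each element; self:: tests that element's own name.
           [normalize-space()] drops whitespace-only text nodes (an added assumption). -->
      <xsl:for-each select=".//*[not(self::codeblock or self::draft-comment or self::filepath
                                    or self::shortdesc or self::uicontrol or self::varname)]
                            /text()[normalize-space()]">
        <li><xsl:value-of select="."/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```

Note that this only looks at the immediate parent of each text node. If text nested more deeply inside, say, a `codeblock` should also be skipped, a test on `ancestor-or-self::codeblock` (and likewise for the other names) would be needed instead.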
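As one example of the kind of rewrite meant here: `matches()` returns true when any substring of the input matches, so the `'.* '` and `' .*'` wrappers in the phrasal verb pattern give the regex engine extra work without changing the result. The sketch below drops them. It is simplified and partly hypothetical: the template name `check_phrasal_verbs` and the message text are made up, the inner particle loop is omitted, and it still assumes a `phrasalVerbs.xml` file with `verbs/verb/@text` as in the question.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical helper: $sentence is one sentence produced by tokenize() -->
  <xsl:template name="check_phrasal_verbs">
    <xsl:param name="sentence" as="xs:string"/>
    <!-- Original pattern: concat('.* ', @text, ' .*')
         matches() is unanchored, so ' verb ' selects the same sentences -->
    <xsl:for-each select="document('phrasalVerbs.xml')/verbs/verb
                          [matches($sentence, concat(' ', @text, ' '))]">
      <li class="prodterm"><b>Phrasal verb found in:</b> <xsl:value-of select="$sentence"/></li>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

If the `@text` values are plain words with no regex syntax in them (an assumption about the data), `contains($sentence, concat(' ', @text, ' '))` would avoid regex evaluation for this check altogether. Whether either change is measurable is best confirmed by timing, as suggested above.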


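Thanks, good to know how excludes actually works. I did consider rewriting the code so that the profiler gives a more detailed picture, but I will check the regexes first. Thanks for the pointers! – Igor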


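An interesting detail: when I read your name under your post it did ring a bell, but I didn't think about it any further. Until now, when I realized that I have your book ("XSLT 2nd Edition, Programmer's Reference") lying on my desk. So I have been looking at your face all along (well, I assume that is your face on the cover). Great book, it has already helped me on many occasions! – Igor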