2011-10-29 36 views

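Finding regular expression literals in a string of JavaScript code

I'm doing a sort of rough parsing of JavaScript code, using JavaScript. I'll spare you the details of why I need to do this, but suffice it to say that I don't want to integrate a big chunk of library code: it's unnecessary for my purposes, and it's important that I keep this very lightweight and relatively simple. So please don't suggest that I use JsLint or anything like it. If the answer takes more code than can be pasted into an answer, it's probably more than I want.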

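My code currently does a good job of detecting quoted sections and comments, and then matching braces, brackets and parens (without being confused by quotes and comments, or by escapes inside quotes, of course). That's all I need it to do, and it does it well... with one exception: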

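It can be confused by regular expression literals. So I'm hoping for some help detecting regular expression literals in a string of JavaScript, so that I can handle them appropriately.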
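To make the failure mode concrete, here is a minimal sketch. The function name `naiveBraceDepth` and the sample input are made up for illustration, and unlike the real scanner described above it does not skip quotes or comments; it only shows how a brace inside a regular expression literal throws the count off.

```javascript
// Hypothetical helper: counts "{" minus "}" with no knowledge of regexp literals.
function naiveBraceDepth(code) {
    var depth = 0;
    for (var i = 0; i < code.length; i++) {
        var ch = code.charAt(i);
        if (ch === "{") { depth++; }
        else if (ch === "}") { depth--; }
    }
    return depth;
}

// The "{" inside /[{]/ is not a real brace, but the counter cannot tell.
var confusingSnippet = "var re = /[{]/; if (re.test(s)) { run(); }";
console.log(naiveBraceDepth(confusingSnippet)); // 1, although the real braces balance
```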

What I'm after is something like this:

function getRegExpLiterals (stringOfJavascriptCode) { 
    var output = []; 
    // todo! 
    return output; 
} 

var jsString = "var regexp1 = /abcd/g, regexp1 = /efg/;";
console.log (getRegExpLiterals (jsString));

// should print (0-based indices of the opening slashes):
// [{startIndex: 14, length: 7}, {startIndex: 33, length: 5}]

Doesn't every regular expression literal start with a slash? If you just want those, that's easy to do. – FailedDev


I need to be sure that it's a regular expression, so just looking for slashes won't do. – rob

Answers


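es5-lexer is a JS lexer that uses a very accurate heuristic to tell regular expressions apart from division expressions in JS code. It also provides a token-level transformation that you can use to make sure the resulting program will be interpreted by a full JS parser the same way the lexer interprets it.

The bit that determines whether a / starts a regular expression is in guess_is_regexp.js, and the tests start at scanner_test.js line 401.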

var REGEXP_PRECEDER_TOKEN_RE = new RegExp(
    "^(?:" // Match the whole tokens below 
    + "break" 
    + "|case" 
    + "|continue" 
    + "|delete" 
    + "|do" 
    + "|else" 
    + "|finally" 
    + "|in" 
    + "|instanceof" 
    + "|return" 
    + "|throw" 
    + "|try" 
    + "|typeof" 
    + "|void" 
    // Binary operators which cannot be followed by a division operator. 
    + "|[+]" // Match + but not ++. += is handled below. 
    + "|-" // Match - but not --. -= is handled below. 
    + "|[.]" // Match . but not a number with a trailing decimal. 
    + "|[/]" // Match /, but not a regexp. /= is handled below. 
    + "|," // Second binary operand cannot start a division. 
    + "|[*]" // Ditto binary operand. 
    + ")$" 
    // Or match a token that ends with one of the characters below to match 
    // a variety of punctuation tokens. 
    // Some of the single char tokens could go above, but putting them below 
    // allows closure-compiler's regex optimizer to do a better job. 
    // The right column explains why the terminal character to the left can only 
    // precede a regexp. 
    + "|[" 
    + "!" // !   prefix operator operand cannot start with a division 
    + "%" // %   second binary operand cannot start with a division 
    + "&" // &, &&  ditto binary operand 
    + "(" // (   expression cannot start with a division 
    + ":" // :   property value, labelled statement, and operand of ?: 
      //    cannot start with a division 
    + ";" // ;   statement & for condition cannot start with division 
    + "<" // <, <<, << ditto binary operand 
    // !=, !==, %=, &&=, &=, *=, +=, -=, /=, <<=, <=, =, ==, ===, >=, >>=, >>>=, 
    // ^=, |=, ||= 
    // All are binary operands (assignment ops or comparisons) whose right 
    // operand cannot start with a division operator 
    + "=" 
    + ">" // >, >>, >>> ditto binary operand 
    + "?" // ?   expression in ?: cannot start with a division operator 
    + "[" // [   first array value & key expression cannot start with 
      //    a division 
    + "^" //^   ditto binary operand 
    + "{" // {   statement in block and object property key cannot start 
      //    with a division 
    + "|" // |, ||  ditto binary operand 
    + "}" // }   PROBLEMATIC: could be an object literal divided or 
      //    a block. More likely to be start of a statement after 
      //    a block which cannot start with a /. 
    + "~" // ~   ditto binary operand 
    + "]$" 
    // The exclusion of ++ and -- from the above is also problematic. 
    // Both are prefix and postfix operators. 
    // Given that there is rarely a good reason to increment a regular expression 
    // and good reason to have a post-increment operator as the left operand of 
    // a division (x++/y) this pattern treats ++ and -- as division preceders. 
); 
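As a rough illustration of how a test like this could drive the `getRegExpLiterals` function from the question, here is a minimal sketch. It is not es5-lexer's actual scanner: the tokenizer, the helper `matchAt`, and the names `SKIP_RE`, `TOKEN_RE`, `REGEXP_LITERAL_RE` and `PRECEDER_RE` are assumptions made for this example, and `PRECEDER_RE` is just a condensed copy of the pattern above. It assumes an engine that supports the sticky (`y`) flag, and it ignores template literals and other post-ES5 syntax.

```javascript
// Condensed copy of the preceder pattern above (named differently to avoid a clash).
var PRECEDER_RE = new RegExp(
    "^(?:break|case|continue|delete|do|else|finally|in|instanceof|return"
    + "|throw|try|typeof|void|[+]|-|[.]|[/]|,|[*])$"
    + "|[!%&(:;<=>?[^{|}~]$");

// Sticky patterns: each one can only match exactly at lastIndex.
var SKIP_RE = /\s+|\/\/[^\n]*|\/\*[\s\S]*?\*\//y;   // whitespace and comments
var REGEXP_LITERAL_RE =
    /\/(?![*\/])(?:[^\/\\\[\n]|\\.|\[(?:[^\]\\\n]|\\.)*\])+\/[A-Za-z]*/y;
// Strings, numbers, identifiers, ++ and --, then any single character.
var TOKEN_RE =
    /"(?:[^"\\\n]|\\[\s\S])*"|'(?:[^'\\\n]|\\[\s\S])*'|\d[\w.]*|[A-Za-z_$][\w$]*|\+\+|--|[\s\S]/y;

// Returns the text matched by a sticky regexp at `index`, or null.
function matchAt(re, text, index) {
    re.lastIndex = index;
    var m = re.exec(text);
    return m ? m[0] : null;
}

function getRegExpLiterals(stringOfJavascriptCode) {
    var code = stringOfJavascriptCode;
    var output = [];
    var last = null;   // last significant token; null at the start of input
    var i = 0;
    while (i < code.length) {
        var skipped = matchAt(SKIP_RE, code, i);
        if (skipped) { i += skipped.length; continue; }

        // A slash starts a regexp only if the previous token allows it.
        if (code.charAt(i) === "/" && (last === null || PRECEDER_RE.test(last))) {
            var literal = matchAt(REGEXP_LITERAL_RE, code, i);
            if (literal) {
                output.push({ startIndex: i, length: literal.length });
                last = "regexp";   // placeholder token that is not a preceder
                i += literal.length;
                continue;
            }
        }

        // Otherwise consume one ordinary token; TOKEN_RE always matches here.
        last = matchAt(TOKEN_RE, code, i);
        i += last.length;
    }
    return output;
}

var jsString = "var regexp1 = /abcd/g, regexp1 = /efg/;";
console.log(getRegExpLiterals(jsString));
// [ { startIndex: 14, length: 7 }, { startIndex: 33, length: 5 } ]  (0-based)
```

Because the decision depends only on the previous token, input such as `a / b / c` or `x++ / y / z` yields no literals, while the `}` case flagged as PROBLEMATIC above remains a guess in this sketch as well.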

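Thanks Mike, I may well use the full lexer in the future. It's an impressive piece of work (as is the prettifier you also wrote, which I have used extensively). – rob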


@rob, you're welcome. Have fun. –
