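Find the first printf format sequence in a C++ string

I am looking for the most concise and efficient way to find the first printf format sequence (conversion specification) in a C++ string. I cannot use std::regex, because it is not yet implemented in most compilers.

So the problem is to write an optimized function that, given an input string str, returns the start pos of the first printf format sequence and its length n: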

inline void detect(const std::string& str, int& pos, int& n);

For example:

%d -> pos = 0 and n = 2
the answer is: %05d -> pos = 15 and n = 4
the answer is: %% %4.2f haha -> pos = 18 and n = 5

How can this be done (clever and tricky approaches are welcome)?

Source

2013-07-19 Vincent

Why not just grab an open-source 'printf' implementation and pull the parser bit out of it?

You don't need regular expressions. A printf format specifier can be parsed from left to right, one character at a time.

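If you look at the full POSIX specification for 'printf()' formats (http://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html), there are a lot of characters that can appear in a format specification. For example, "%100$#+-0'*101$.*102$llX" is probably 'valid', though some combinations of the flags make no sense.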
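To make the shape of such a specification concrete, here is a rough sketch of a matcher for the POSIX layout %[n$][flags][width][.precision][length]conversion. The names spec_length, skip_digits and skip_amount, and the exact character sets, are assumptions made for illustration and do not come from any library. It checks only the syntax at one offset, not whether the combination of flags makes sense, and it treats "%%" as not being a conversion specification.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: advance i past a run of decimal digits; return how many.
static std::size_t skip_digits(const std::string& s, std::size_t& i)
{
    const std::size_t start = i;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
        ++i;
    return i - start;
}

// Hypothetical helper: skip a width or precision value: digits, '*' or '*m$'.
static void skip_amount(const std::string& s, std::size_t& i)
{
    if (i < s.size() && s[i] == '*')
    {
        ++i;
        const std::size_t save = i;
        if (skip_digits(s, i) > 0 && i < s.size() && s[i] == '$')
            ++i;        // '*m$' form
        else
            i = save;   // plain '*'
        return;
    }
    skip_digits(s, i);  // may be empty
}

// Returns the length of the conversion specification that starts at s[i],
// or 0 if the text there does not match the layout described above.
std::size_t spec_length(const std::string& s, std::size_t i)
{
    const std::size_t start = i;
    if (i >= s.size() || s[i] != '%')
        return 0;
    ++i;

    // Optional 'n$' argument position.
    const std::size_t save = i;
    if (skip_digits(s, i) > 0 && i < s.size() && s[i] == '$')
        ++i;
    else
        i = save;

    // Flags (character set is an assumption based on POSIX 2008).
    while (i < s.size() && std::string("-+ #0'").find(s[i]) != std::string::npos)
        ++i;

    skip_amount(s, i);                      // width
    if (i < s.size() && s[i] == '.')        // precision
    {
        ++i;
        skip_amount(s, i);
    }

    // Length modifier: hh, ll, or a single letter.
    if (s.compare(i, 2, "hh") == 0 || s.compare(i, 2, "ll") == 0)
        i += 2;
    else if (i < s.size() && std::string("hljztL").find(s[i]) != std::string::npos)
        ++i;

    // Conversion character.
    if (i < s.size() && std::string("diouxXfFeEgGaAcspn").find(s[i]) != std::string::npos)
        return i + 1 - start;
    return 0;
}
```

A caller would scan for '%' with std::string::find, skip over "%%", and call spec_length at each remaining candidate until it returns a non-zero length.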

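Scan forward for '%', then parse what follows from there. There are a few quirks, but it's not that bad (not sure you want to make it inline, though). The general principle is below (I'm typing this as I go, so it may not be the best code, and I haven't tried to compile it).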

#include <cctype>
#include <string>

inline void detect(const std::string& str, int& pos, int& n)
{
    pos = -1;  // Stays -1 (with n == 0) if no conversion specification is found.
    n = 0;
    std::string::size_type last_pos = 0;
    for (;;)
    {
        last_pos = str.find('%', last_pos);
        if (last_pos == std::string::npos)
            break;  // Nothing found.
        if (last_pos == str.length() - 1)
            break;  // Stray '%' at the end of the string.
        char ch = str[last_pos + 1];

        if (ch == '%')  // Double percent -> escaped %. Go on to the next.
        {
            last_pos += 2;
            continue;
        }
        pos = static_cast<int>(last_pos);
        do
        {
            // Flags, width, precision and length modifiers are skipped.
            if (std::isdigit(static_cast<unsigned char>(ch)) ||
                ch == '.' || ch == '-' || ch == '*' ||
                ch == '+' || ch == 'l' || ch == 'L' || ch == 'z' ||
                ch == 'h' || ch == 't' || ch == 'j' || ch == ' ' ||
                ch == '#' || ch == '\'')
            {
                last_pos++;
                // Reading str[str.length()] is valid and yields '\0'.
                ch = str[last_pos + 1];
            }
            else
            {
                // The string below may need extending, depending on the
                // version of printf.
                if (std::string("AacdeEfFgGiopusxX").find(ch) == std::string::npos)
                {
                    // Not a known conversion character.
                    // Do something about the invalid specification?
                }
                // The length includes the '%' and the conversion character.
                n = static_cast<int>(last_pos + 2) - pos;
                return;
            }
        } while (last_pos < str.length());
    }
}

EDIT2: This bit could probably be better written. Instead of:

   if (std::isdigit(static_cast<unsigned char>(ch)) || ch == '.' || ch == '-' || ch == '*' ||
       ch == '+' || ch == 'l' || ch == 'L' || ch == 'z' ||
       ch == 'h' || ch == 't' || ch == 'j' || ch == ' ' ||
       ch == '#' || ch == '\'') ...

use:

   if (std::isdigit(static_cast<unsigned char>(ch)) ||
       std::string(".-*+lLzhtj #'").find(ch) != std::string::npos) ...

Now, that's your homework done. Please report back what grade you get.

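Edit: It should be noted that the code above accepts some things that a regular printf would "reject", e.g. "%.......5......6f", "%-5-6d" or "%-----09---5555555555555555llllld". If you want the code to reject these, it is not a huge amount of extra work: the "check for special characters or digits" step just needs a little logic to ask "have we seen this character before", since most of the special characters may appear only once. As the comments say, I may have missed some valid format specifiers. If you also need to cope with rules such as "this character is not allowed with 'c'", it gets trickier. But if the input isn't "malicious" (e.g. you want to annotate which lines contain format specifiers in a working C source file), the above should work reasonably well.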
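The "have we seen this before" check could look roughly like the following sketch. detect_strict and skip_number are hypothetical names. The sketch assumes that each flag may appear at most once and that the parts must come in the order flags, width, precision, length modifier, conversion character. Positional arguments (n$) are not handled. It reports pos = -1 and n = 0 when nothing acceptable is found, and it skips a rejected candidate and keeps scanning.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: skip a width or precision value ('*' or digits).
static void skip_number(const std::string& s, std::string::size_type& i)
{
    if (i < s.size() && s[i] == '*')
    {
        ++i;
        return;
    }
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
        ++i;
}

// Hypothetical stricter variant of detect(): same signature, but the parts
// of a specification must appear in order and a flag may not repeat.
inline void detect_strict(const std::string& str, int& pos, int& n)
{
    const std::string flags = "-+ #0'";
    const std::string conversions = "AacdeEfFgGiopusxX";
    pos = -1;
    n = 0;
    std::string::size_type start = 0;
    while ((start = str.find('%', start)) != std::string::npos)
    {
        std::string::size_type i = start + 1;
        if (i < str.size() && str[i] == '%')
        {
            start += 2;  // escaped "%%"
            continue;
        }

        // Flags: stop at the first one that has already been seen.
        std::string seen;
        while (i < str.size() && flags.find(str[i]) != std::string::npos &&
               seen.find(str[i]) == std::string::npos)
            seen += str[i++];

        skip_number(str, i);                      // width
        if (i < str.size() && str[i] == '.')      // precision
        {
            ++i;
            skip_number(str, i);
        }

        // Length modifier: hh, ll, or a single letter.
        if (str.compare(i, 2, "hh") == 0 || str.compare(i, 2, "ll") == 0)
            i += 2;
        else if (i < str.size() && std::string("hljztL").find(str[i]) != std::string::npos)
            ++i;

        if (i < str.size() && conversions.find(str[i]) != std::string::npos)
        {
            pos = static_cast<int>(start);
            n = static_cast<int>(i - start + 1);
            return;
        }
        start = i;  // rejected: resume scanning after what was consumed
    }
}
```

With this ordering, "%-5-6d" is rejected because a '-' follows the width, and "%.......5......6f" is rejected because a second '.' follows the first.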

Source

2013-07-19 23:58:18

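'h' is a length modifier (like 'L' and 'l'), as are 'j' and 't'. Space and '#' are both flags; in POSIX 2008, the apostrophe (') is also a flag. You seem to have missed 'A', 'a', 'E', 'F', 'g', 'G' and 'i' as conversion specifiers. 'z' is a length modifier rather than a conversion specifier, so a conversion specifier is needed after the 'z'. POSIX supports '5$' and the like to specify arguments by position. Actually validating this stuff, as opposed to accepting character sequences that could be legitimate, is quite hard work. Whether it is necessary depends on what you are going to do.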
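The classification in this comment can be written down as a small lookup. The enum SpecChar and the function classify are hypothetical names used only for this sketch, and the character sets are the ones mentioned in this thread rather than an exhaustive list. Note that '0' is reported as a flag here, although inside a width or precision it is just a digit; deciding which one it is depends on position, which this sketch does not track.

```cpp
#include <string>

// Hypothetical classification of a single character inside a
// conversion specification.
enum class SpecChar { Flag, LengthModifier, Conversion, Other };

inline SpecChar classify(char ch)
{
    // Flags; the apostrophe is a flag in POSIX 2008.
    if (std::string("-+ #0'").find(ch) != std::string::npos)
        return SpecChar::Flag;
    // Length modifiers; "hh" and "ll" simply repeat the same letter.
    if (std::string("hlLjzt").find(ch) != std::string::npos)
        return SpecChar::LengthModifier;
    // Conversion characters, as listed in the answer's code.
    if (std::string("AacdeEfFgGiopusxX").find(ch) != std::string::npos)
        return SpecChar::Conversion;
    // Digits 1-9, '.', '*', '$', or something that is not part of a spec.
    return SpecChar::Other;
}
```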

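OK, I simply hadn't looked it up, so I guess it could have been worse. I've updated the answer with the extras you mentioned and moved the 'z'. Yes, full validation does seem a bit tricky, so I just settled for "could this possibly be a valid format specifier".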

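I have just written a little over 400 lines of code (excluding comments and the test program) that parses a 'printf()' format string into a structure, or converts one of those structures back into a format string. I haven't shown it because it's too long for SO. It is C code, not C++ code.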
