Extracting the class from demangled symbols


I'm trying to use boost::regex to extract the (full) class names from the demangled symbol output of nm. This sample program

#include <vector> 

namespace Ns1 
{ 
namespace Ns2 
{ 
    template<typename T, class Cont> 
    class A 
    { 
    public: 
     A() {} 
     ~A() {} 
     void foo(const Cont& c) {} 
     void bar(const A<T,Cont>& x) {} 

    private: 
     Cont cont; 
    }; 
} 
} 

int main() 
{ 
    Ns1::Ns2::A<int,std::vector<int> > a; 
    Ns1::Ns2::A<int,std::vector<int> > b; 
    std::vector<int> v; 

    a.foo(v); 
    a.bar(b); 
}

produces the following symbols for class A:

Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::A() 
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::bar(Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > > const&) 
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::foo(std::vector<int, std::allocator<int> > const&) 
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::~A()

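I would prefer to use a single regular expression pattern to extract the class (instance) name Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >, but I'm having trouble parsing the class specifiers that occur recursively inside the <> pairs.

Does anyone know how to do this with a regex pattern (one supported by boost::regex)?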

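My solution (based on David Hammen's answer, hence accepted):

I don't use a (single) regex to extract the class and namespace symbols. Instead I've written a simple function that strips an enclosing character pair (e.g. <> or ()) from the tail of the symbol string: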

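#include <string>

// Strips a trailing, balanced bracket pair (e.g. "<...>" or "(...)") from symbol.
// The stripped part, brackets included, is returned in strippedPart.
std::string stripBracketPair(char openingBracket,char closingBracket,const std::string& symbol, std::string& strippedPart) 
{ 
    std::string result = symbol; 

    if(!result.empty() && 
     result[result.length() -1] == closingBracket) 
    { 
     // Walk back from the final closing bracket to its *matching* opening bracket,
     // so earlier pairs (e.g. "A<int>::B<char>") and nested pairs are handled.
     int depth = 0; 
     for(std::string::size_type i = result.length(); i-- > 0;) 
     { 
      if(result[i] == closingBracket) 
      { 
       ++depth; 
      } 
      else if(result[i] == openingBracket && --depth == 0) 
      { 
       strippedPart = result.substr(i); 
       result = result.substr(0,i); 
       break; 
      } 
     } 
    } 
    return result; 
}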

This is used in two other methods that extract the namespace/class from a symbol:

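#include <vector>
#include <boost/algorithm/string.hpp>

// Forward declarations: these helpers are defined further below.
std::string extractClass(const std::string& symbol);
std::string buildNamespaceFromSymbolPath(const std::vector<std::string>& symbolPathParts);
bool isClass(const std::string& classSymbol);

std::string extractNamespace(const std::string& symbol) 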
{ 
    std::string ns; 
    std::string strippedPart; 
    std::string cls = extractClass(symbol); 
    if(!cls.empty()) 
    { 
     cls = stripBracketPair('<','>',cls,strippedPart); 
     std::vector<std::string> classPathParts; 

     boost::split(classPathParts,cls,boost::is_any_of("::"),boost::token_compress_on); 
     ns = buildNamespaceFromSymbolPath(classPathParts); 
    } 
    else 
    { 
     // Assume this symbol is a namespace global function/variable 
     std::string globalSymbolName = stripBracketPair('(',')',symbol,strippedPart); 
     globalSymbolName = stripBracketPair('<','>',globalSymbolName,strippedPart); 
     std::vector<std::string> symbolPathParts; 

     boost::split(symbolPathParts,globalSymbolName,boost::is_any_of("::"),boost::token_compress_on); 
     ns = buildNamespaceFromSymbolPath(symbolPathParts); 
     std::vector<std::string> wsSplitted; 
     boost::split(wsSplitted,ns,boost::is_any_of(" \t"),boost::token_compress_on); 
     if(wsSplitted.size() > 1) 
     { 
      ns = wsSplitted[wsSplitted.size() - 1]; 
     } 
    } 

    if(isClass(ns)) 
    { 
     ns = ""; 
    } 
    return ns; 
}

std::string extractClass(const std::string& symbol) 
{ 
    std::string cls; 
    std::string strippedPart; 
    std::string fullSymbol = symbol; 
    boost::trim(fullSymbol); 
    fullSymbol = stripBracketPair('(',')',fullSymbol,strippedPart); // use the trimmed copy, not symbol
    fullSymbol = stripBracketPair('<','>',fullSymbol,strippedPart); 

    size_t pos = fullSymbol.find_last_of(':'); 
    if(pos != std::string::npos) 
    { 
     --pos; 
     cls = fullSymbol.substr(0,pos); 
     std::string untemplatedClassName = stripBracketPair('<','>',cls,strippedPart); 
     if(untemplatedClassName.find('<') == std::string::npos && 
     untemplatedClassName.find(' ') != std::string::npos) 
     { 
      cls = ""; 
     } 
    } 

    if(!cls.empty() && !isClass(cls)) 
    { 
     cls = ""; 
    } 
    return cls; 
}

The buildNamespaceFromSymbolPath() method simply concatenates the valid namespace parts:

#include <sstream>
#include <string>
#include <vector>

std::string buildNamespaceFromSymbolPath(const std::vector<std::string>& symbolPathParts) 
{ 
    if(symbolPathParts.size() >= 2) 
    { 
     std::ostringstream oss; 
     bool firstItem = true; 
     for(unsigned int i = 0;i < symbolPathParts.size() - 1;++i) 
     { 
      if((symbolPathParts[i].find('<') != std::string::npos) || 
       (symbolPathParts[i].find('(') != std::string::npos)) 
      { 
       break; 
      } 
      if(!firstItem) 
      { 
       oss << "::"; 
      } 
      else 
      { 
       firstItem = false; 
      } 
      oss << symbolPathParts[i]; 
     } 
     return oss.str(); 
    } 
    return ""; 
}

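Finally, the isClass() method uses a regex to scan all symbols for a constructor method (unfortunately this doesn't seem to work for classes that contain only member functions):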

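#include <set>
#include <sstream>
#include <boost/algorithm/string.hpp>
#include <boost/regex.hpp>

// Cache of symbols already confirmed to be classes.
// allRecords (a std::vector<NmRecord> holding the parsed nm output) is defined elsewhere in the program.
std::set<std::string> allClasses; 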

bool isClass(const std::string& classSymbol) 
{ 
    std::set<std::string>::iterator foundClass = allClasses.find(classSymbol); 
    if(foundClass != allClasses.end()) 
    { 
     return true; 
    } 

    std::string strippedPart; 
    std::string constructorName = stripBracketPair('<','>',classSymbol,strippedPart); 
    std::vector<std::string> constructorPathParts; 

    boost::split(constructorPathParts,constructorName,boost::is_any_of("::"),boost::token_compress_on); 
    if(constructorPathParts.size() > 1) 
    { 
     constructorName = constructorPathParts.back(); 
    } 
    boost::replace_all(constructorName,"(","[\\(]"); 
    boost::replace_all(constructorName,")","[\\)]"); 
    boost::replace_all(constructorName,"*","[\\*]"); 

    std::ostringstream constructorPattern; 
    std::string symbolPattern = classSymbol; 
    boost::replace_all(symbolPattern,"(","[\\(]"); 
    boost::replace_all(symbolPattern,")","[\\)]"); 
    boost::replace_all(symbolPattern,"*","[\\*]"); 
    constructorPattern << "^" << symbolPattern << "::" << constructorName << "[\\(].+$"; 
    boost::regex reConstructor(constructorPattern.str()); 

    for(std::vector<NmRecord>::iterator it = allRecords.begin(); 
     it != allRecords.end(); 
     ++it) 
    { 
     if(boost::regex_match(it->symbolName,reConstructor)) 
     { 
      allClasses.insert(classSymbol); 
      return true; 
     } 
    } 
    return false; 
}

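As mentioned, if a class doesn't provide any constructor, this last method can't reliably find the class name, and it is slow on large symbol tables. But at least this seems to cover what you can get out of nm's symbol information.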
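One possible way to tackle the speed problem, sketched here under assumptions (this is not the code used above): instead of building a regex for every candidate and rescanning all records each time, walk all demangled symbols once and record every scope that owns a constructor or destructor; isClass(sym) then reduces to a set lookup. The names collectClasses and lastTopLevelSeparator are hypothetical. Operator names containing <, >, ( or ) are not handled, and a namespace Foo containing a free function Foo would be misreported as a class, the same caveat David Hammen raises about constructors.

```cpp
#include <set>
#include <string>
#include <vector>

// Position of the last "::" that is not nested inside <...> or (...), or npos.
std::string::size_type lastTopLevelSeparator(const std::string& s)
{
    std::string::size_type found = std::string::npos;
    int depth = 0;
    for (std::string::size_type i = 0; i + 1 < s.size(); ++i) {
        const char c = s[i];
        if (c == '<' || c == '(') ++depth;
        else if ((c == '>' || c == ')') && depth > 0) --depth;
        else if (depth == 0 && c == ':' && s[i + 1] == ':') { found = i; ++i; }
    }
    return found;
}

// One pass over all demangled symbols: every scope that owns a constructor
// ("Scope::Name(...)") or a destructor ("Scope::~Name(...)") is recorded as a
// class, keeping its template arguments, e.g. "Ns1::Ns2::A<int, ...>".
std::set<std::string> collectClasses(const std::vector<std::string>& symbols)
{
    std::set<std::string> classes;
    for (const std::string& sym : symbols) {
        const std::string::size_type sep = lastTopLevelSeparator(sym);
        if (sep == std::string::npos) continue;            // global symbol
        const std::string scope = sym.substr(0, sep);
        const std::string member = sym.substr(sep + 2);
        const std::string memberName = member.substr(0, member.find('('));
        // Unqualified class name: last component of the scope, template args removed.
        const std::string::size_type prev = lastTopLevelSeparator(scope);
        std::string className = prev == std::string::npos ? scope : scope.substr(prev + 2);
        className = className.substr(0, className.find('<'));
        if (!className.empty() &&
            (memberName == className || memberName == "~" + className))
            classes.insert(scope);
    }
    return classes;
}
```

Building the set once costs a single linear pass over the symbol table, after which each class check is a logarithmic lookup instead of a full regex scan.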

I've left the regex tag on this question, so other users may find out that regex isn't the right approach.


2012-09-16 πάντα ῥεῖ

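Doesn't 'nm' come with a '--demangle' option? Why reinvent the wheel? –

@KerrekSB I'm already using the demangled symbols; I want to extract the class names from them. –

Oh, I see. Still, it looks like template syntax doesn't describe a regular language. It's more like XML ([we all know how that goes](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)). –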

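That would be hard to do even with Perl's extended regular expressions, which are far more powerful than anything in C++. I suggest a different tack:

First get rid of the things that look like data rather than functions (look for the D designator). Things like "virtual thunk to this", "virtual table for that", etc. will also get in your way; get rid of them before you do the main parsing. This filtering is where a regex can help. What you should be left with are functions. For each function,

Get rid of the stuff after the final closing parenthesis. For example, Foo::Bar(int,double) const becomes Foo::Bar(int,double).
Strip the function arguments. The problem here is that you can have parentheses inside parentheses, e.g. functions that take function pointers as arguments, which in turn might take function pointers as arguments. Don't use a regex. Use the fact that parentheses match. After this step, Foo::Bar(int,double) becomes Foo::Bar, while a::b::Baz<lots<of<template>, stuff>>::Baz(int, void (*)(int, void (*)(int))) becomes a::b::Baz<lots<of<template>, stuff>>::Baz.
Now work on the front end. Use a similar scheme to parse out the template stuff. With this, that messy a::b::Baz<lots<of<template>, stuff>>::Baz becomes a::b::Baz::Baz.
At this stage, your functions look like a::b:: ... ::ClassName::function_name. There is a slight problem here with free functions in some namespaces. Destructors are a dead giveaway of a class: if the function name starts with a tilde, you definitely have a class name. Constructors are a near giveaway, as long as you don't have a namespace Foo in which you define a function Foo.
Finally, you may want to re-insert the template stuff you cut out.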
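The filtering step and the per-function steps above can be sketched in C++ roughly as follows. This is a minimal sketch under stated assumptions, not David Hammen's code: all function names are hypothetical; input lines are assumed to be nm -C output of the form "<address> <type> <name>"; only text ('T'/'t') and weak ('W'/'w') symbols are treated as functions; and the list of special prefixes is an assumption based on how the GNU demangler spells them. Operator names containing <, >, ( or ) (e.g. operator<) are not handled, and the last step, re-inserting template arguments, is left out.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Filtering: keep code symbols, drop data ('D', 'B', 'R', ...), undefined symbols
// (which have no address column) and compiler-generated entries.
std::vector<std::string> keepFunctionSymbols(const std::vector<std::string>& nmLines)
{
    // Spellings used by the GNU demangler (assumed, not exhaustive).
    static const char* const specialPrefixes[] = {
        "vtable for ", "VTT for ", "construction vtable for ", "typeinfo for ",
        "typeinfo name for ", "guard variable for ", "virtual thunk to ",
        "non-virtual thunk to ", "covariant return thunk to "};
    std::vector<std::string> names;
    for (const std::string& line : nmLines) {
        std::istringstream in(line);
        std::string address, type, name;
        // Undefined symbols start with the one-letter type instead of an address.
        if (!(in >> address) || address.size() == 1) continue;
        if (!(in >> type) || type.size() != 1 ||
            std::string("TtWw").find(type[0]) == std::string::npos) continue;
        std::getline(in >> std::ws, name);      // the rest of the line is the name
        bool special = false;
        for (const char* prefix : specialPrefixes)
            if (name.rfind(prefix, 0) == 0) { special = true; break; }
        if (!special && !name.empty()) names.push_back(name);
    }
    return names;
}

// Step 1: drop everything after the last ')' (e.g. a trailing " const").
std::string dropTrailingQualifiers(const std::string& s)
{
    const std::string::size_type pos = s.rfind(')');
    return pos == std::string::npos ? s : s.substr(0, pos + 1);
}

// Step 2: strip the argument list by walking back from the final ')' to its
// matching '(' with a depth counter, so nested function-pointer types are handled.
std::string dropArguments(const std::string& s)
{
    if (s.empty() || s.back() != ')') return s;
    int depth = 0;
    for (std::string::size_type i = s.size(); i-- > 0;) {
        if (s[i] == ')') ++depth;
        else if (s[i] == '(' && --depth == 0) return s.substr(0, i);
    }
    return s;                                   // unbalanced: leave unchanged
}

// Step 3: remove all template argument lists, keeping only depth-0 characters.
std::string dropTemplateArguments(const std::string& s)
{
    std::string out;
    int depth = 0;
    for (char c : s) {
        if (c == '<') ++depth;
        else if (c == '>') { if (depth > 0) --depth; }
        else if (depth == 0) out += c;
    }
    return out;
}

// Step 4: a destructor ("~Name") or a function named like its enclosing scope
// (constructor) marks that scope as a class. Returns "" when undecided.
std::string classFromFunction(const std::string& name)
{
    const std::string::size_type sep = name.rfind("::");
    if (sep == std::string::npos) return std::string();
    const std::string scope = name.substr(0, sep);
    const std::string func = name.substr(sep + 2);
    const std::string::size_type prev = scope.rfind("::");
    const std::string last = prev == std::string::npos ? scope : scope.substr(prev + 2);
    if (!func.empty() && func[0] == '~') return scope;
    return func == last ? scope : std::string();
}
```

Chained together, dropTemplateArguments(dropArguments(dropTrailingQualifiers(sym))) turns the Baz example into a::b::Baz::Baz, and classFromFunction then reports a::b::Baz.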


2012-09-16 13:17:18

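Thanks for your answer. I've now followed this direction, but I'm still using REs. I was trying to avoid writing a parser for matching '()' and '<>' pairs, but it seems to be the better approach after all. –

It really is. Take a look at the Perl module Text::Balanced. It has the full power of Perl regexes at its disposal, yet it still uses a counting mechanism. –

Thanks for the hint, David. Simply stripping enclosing character pairs looks more promising for the analysis. I'll post my solution as soon as I'm happy with the results. I'm also trying to extract namespaces, so I'll at least have to deal with looking up class constructor methods to tell (nested) classes apart from namespaces. –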

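I did the extraction with a simple C++ function.

See the link for the complete code. The idea behind it is:

Base-level tokens are separated by ::.
If there are N base-level tokens, the first N-1 describe the className and the last one is the function.
On ( or < we go up a level (+1).
On the closing ) or > we go down a level (-1).
Base level, of course, means level == 0.

I have a strong feeling this can't be done with regular expressions, because brackets can nest to unlimited depth. My function supports 255 levels; it could be switched to std::stack<char> for unlimited levels.

The function: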

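#include <stdexcept>
#include <string>
#include <vector>

// Splits a demangled C++ name at the top-level "::" separators, i.e. those not
// nested inside <...> or (...). The last part is the function, the others its scope.
std::vector<std::string> parseCppName(const std::string& line) 
{ 
    const int maxLevel = 256; 
    std::vector<std::string> retVal; 
    int level = 0; 
    char closeChars[maxLevel]; // closing bracket expected at each nesting level

    std::string::size_type startPart = 0; 
    for (std::string::size_type i = 0; i < line.length(); ++i) 
    { 
     if (line[i] == ':' && level == 0) 
     { 
      if (i + 1 >= line.length() || line[i + 1] != ':') 
       throw std::runtime_error("missing :"); 
      retVal.push_back(line.substr(startPart, i - startPart)); 
      startPart = ++i + 1; // skip the second ':'
     } 
     else if (line[i] == '(' || line[i] == '<') { 
      if (level == maxLevel) // guard against overflowing closeChars
       throw std::runtime_error("nesting too deep"); 
      closeChars[level++] = (line[i] == '(') ? ')' : '>'; 
     } 
     else if (level > 0 && line[i] == closeChars[level - 1]) { 
      --level; 
     } 
     else if (line[i] == '>' || line[i] == ')') { 
      throw std::runtime_error("Extra)>"); 
     } 
    } 
    if (level > 0) 
     throw std::runtime_error("Missing)>"); 
    retVal.push_back(line.substr(startPart)); 
    return retVal; 
}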
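The std::stack<char> variant mentioned above could look like the following sketch. parseCppNameUnbounded is a hypothetical name; the scanning logic is the same as in parseCppName, only the fixed-size closeChars array is replaced by a stack, so nesting depth is limited only by available memory.

```cpp
#include <stack>
#include <stdexcept>
#include <string>
#include <vector>

// Same algorithm as parseCppName(), but the expected closing brackets live on a
// std::stack<char> instead of a fixed array, so there is no depth limit.
std::vector<std::string> parseCppNameUnbounded(const std::string& line)
{
    std::vector<std::string> parts;
    std::stack<char> expectedClose;    // closing bracket expected at each open level
    std::string::size_type startPart = 0;
    for (std::string::size_type i = 0; i < line.length(); ++i) {
        const char c = line[i];
        if (c == ':' && expectedClose.empty()) {
            if (i + 1 >= line.length() || line[i + 1] != ':')
                throw std::runtime_error("missing :");
            parts.push_back(line.substr(startPart, i - startPart));
            ++i;                       // skip the second ':'
            startPart = i + 1;
        } else if (c == '(') {
            expectedClose.push(')');
        } else if (c == '<') {
            expectedClose.push('>');
        } else if (!expectedClose.empty() && c == expectedClose.top()) {
            expectedClose.pop();
        } else if (c == '>' || c == ')') {
            throw std::runtime_error("Extra)>");
        }
    }
    if (!expectedClose.empty())
        throw std::runtime_error("Missing)>");
    parts.push_back(line.substr(startPart));
    return parts;
}
```

For the bar() symbol from the question this yields four parts: "Ns1", "Ns2", the full A<...> template-id, and "bar(...)" with its arguments intact.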


2012-09-16 17:52:26 PiotrNycz

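I share your feeling that this can't be achieved with regular expressions. So far I've developed a parser based on @David Hammen's hints. I may come back to your proposal when I see a need to improve the current solution. Please also note my comment about the namespace extraction use case. –

I think it will be hard to distinguish nested classes from namespaces without remembering all the lines. After parsing all lines, each N-1 part (as returned by my function) names a class; the others are namespaces. But this will be broken by empty classes, I mean classes without functions, c-tors or d-tors. – PiotrNycz

Actually I'll store all input lines and look for constructor symbols to find the "real" classes among them. That won't cover classes that contain only static functions and have no constructor (not even a default one). But it's still good enough for my purposes. –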
