Compiling a function at runtime from a number of functions compiled at compile time in C++

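I'm creating a scripting language. It first parses the code, then copies the functions that execute that code into a single buffer/block of memory, so the buffer is the parsed code.

Is there a way to copy a function's binary code into a buffer and then execute the whole buffer? I need to execute all the functions in one go to get better performance.

To make the question clearer, I want to do something like this: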

#include <vector> 
using namespace std; 

class RuntimeFunction; //The buffer to my runtime function 

enum ByteCodeType { 
    Return, 
    None 
}; 

class ByteCode { 
public: 
    ByteCodeType type; 
}; 

void ReturnRuntime() { 
    return; 
} 

RuntimeFunction GetExecutableData(vector<ByteCode> function) { 
    RuntimeFunction runtimeFunction=RuntimeFunction(sizeof(int)); //Returns int 
    for (size_t i = 0; i < function.size(); i++) { 
     const ByteCode &current = function[i]; 
     if (current.type == Return) { 
      runtimeFunction.Append(&ReturnRuntime); 
     } //etc. 
    } 
    return runtimeFunction; 
} 

void* CallFunc(RuntimeFunction runtimeFunction,vector<void*> custom_parameters) { 
    for (int i = (int)custom_parameters.size() - 1; i >= 0; --i) { //Push parameters in reverse order 
     __asm { 
      push custom_parameters[i] 
     } 
    } 
    __asm { 
     call runtimeFunction.pHandle 
    } 
}


2012-07-29 Super File

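Mike Pall's (of LuaJIT2) comments (http://article.gmane.org/gmane.comp.lang.lua.general/75426) are well worth reading. – ephemient 2012-07-29 01:29:57

Thanks, but I don't need to generate code at runtime. I need to combine some functions (also with const parameters). I just don't want to see assembly; I hate it. – 2012-07-29 01:32:20

Your goals (better performance, not learning assembly) contradict each other. In fact, to get good performance you need to go lower-level than assembly and understand cache behavior, pipelining, data dependencies and so on. – 2012-07-29 03:41:42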

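There are many ways of doing this, depending on how deep you want to get into generating code at runtime, but one relatively simple way is to use threaded code and a threaded-code interpreter.

Basically, threaded code consists of an array of function pointers, and the interpreter walks through the array calling each pointed-to function. The tricky part is that you generally have each function return the address of the array element that holds the pointer to the next function to call. That lets you implement things like branches and calls with no extra effort in the interpreter.

Usually you have something like: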

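struct runtime_state_t; // defined elsewhere; holds the runtime state 
typedef void *(*tc_func_t)(void *, runtime_state_t *); 

void *interp(tc_func_t **entry, runtime_state_t *state) { 
    tc_func_t *pc = *entry; 
    // Each function returns the address of the next element to execute; 
    // a null element (or a null return value) ends the run. 
    while (pc && *pc) pc = static_cast<tc_func_t *>((*pc)(pc + 1, state)); 
    return entry + 1; 
}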

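That's the entire interpreter. runtime_state_t is some kind of data structure holding the runtime state (usually one or more stacks). You use it by creating an array of tc_func_t function pointers, filling it with function pointers (and possibly data), ending it with a null pointer, and then calling interp with the address of a variable that holds the start of the array. So you might have something like: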

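#include <cstdint>   // intptr_t 
#include <iostream>  // std::cout 

// state->data is assumed to be a stack whose pop() returns the removed int. 
void *add(tc_func_t *pc, runtime_state_t *state) { 
    int v1 = state->data.pop(); 
    int v2 = state->data.pop(); 
    state->data.push(v1 + v2); 
    return pc; 
} 
// The operand sits in the slot right after push_int: read it, then skip it. 
void *push_int(tc_func_t *pc, runtime_state_t *state) { 
    state->data.push((int)(intptr_t)*pc); 
    return pc + 1; 
} 
void *print(tc_func_t *pc, runtime_state_t *state) { 
    std::cout << state->data.pop(); 
    return pc; 
} 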

tc_func_t program[] = { 
    (tc_func_t)push_int, 
    (tc_func_t)2, 
    (tc_func_t)push_int, 
    (tc_func_t)2, 
    (tc_func_t)add, 
    (tc_func_t)print, 
    0 
}; 

void run_program() { 
    runtime_state_t state; 
    tc_func_t *entry = program; 
    interp(&entry, &state); 
}

Calling run_program runs the little program, which adds 2 + 2 and prints the result.
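To try the snippets above end to end, here is a minimal self-contained sketch. Several things in it are my assumptions rather than part of the answer: the layout of runtime_state_t (a std::vector-backed stack plus a std::ostringstream so the output can be inspected), the operand/operand_value helpers, and having run_program return the printed text instead of writing to cout. The operations take void * so that they match tc_func_t exactly, and the loop checks *pc so that the null terminator stops it. Storing an integer in a function-pointer slot relies on reinterpret_cast round-tripping through std::intptr_t, which mainstream compilers accept but which is not guaranteed to be portable.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical runtime state: one data stack plus an output sink.
struct runtime_state_t {
    std::vector<int> data;
    std::ostringstream out;

    void push(int v) { data.push_back(v); }
    int pop() {
        assert(!data.empty());
        int v = data.back();
        data.pop_back();
        return v;
    }
};

// A threaded-code slot holds either a function pointer or an inline operand.
typedef void *(*tc_func_t)(void *, runtime_state_t *);

// Hypothetical helpers to store a small integer in a slot and read it back.
tc_func_t operand(std::intptr_t v) { return reinterpret_cast<tc_func_t>(v); }
int operand_value(tc_func_t slot) {
    return static_cast<int>(reinterpret_cast<std::intptr_t>(slot));
}

void *interp(tc_func_t **entry, runtime_state_t *state) {
    tc_func_t *pc = *entry;
    // Stop at the null terminator, or if an operation returns null.
    while (pc && *pc) pc = static_cast<tc_func_t *>((*pc)(pc + 1, state));
    return entry + 1;
}

void *push_int(void *next, runtime_state_t *state) {
    tc_func_t *pc = static_cast<tc_func_t *>(next);
    state->push(operand_value(*pc));  // the operand follows the opcode
    return pc + 1;                    // skip over the operand
}

void *add(void *next, runtime_state_t *state) {
    int v1 = state->pop();
    int v2 = state->pop();
    state->push(v1 + v2);
    return next;
}

void *print(void *next, runtime_state_t *state) {
    state->out << state->pop();
    return next;
}

// Builds and runs the 2 + 2 program, returning whatever it printed.
std::string run_program() {
    tc_func_t program[] = {
        push_int, operand(2),
        push_int, operand(2),
        add,
        print,
        nullptr,
    };
    runtime_state_t state;
    tc_func_t *entry = program;
    interp(&entry, &state);
    return state.out.str();
}
```

Calling run_program() from main and writing its result to std::cout gives the same behavior as the snippets above.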

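Now, you may be puzzled by the slightly odd calling setup for interp, with an extra level of indirection on the entry argument. That is so you can use interp itself as a function in a threaded-code array, followed by a pointer to another array, and it will perform a threaded-code call.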
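A rough sketch of such a threaded-code call follows. It deliberately departs from the answer's types: instead of casting everything to tc_func_t, it assumes a hypothetical union cell_t that can hold an operation, an integer operand, or a pointer to another array, so that interp has exactly the operation signature and can be placed in an array without casts. The names cell_t, op_t, op, lit, ref, stop, dup_top and run_with_call are all made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical runtime state: a single data stack.
struct runtime_state_t {
    std::vector<int> data;
    void push(int v) { data.push_back(v); }
    int pop() {
        assert(!data.empty());
        int v = data.back();
        data.pop_back();
        return v;
    }
};

union cell_t;                                          // one slot of threaded code
typedef cell_t *(*op_t)(cell_t *, runtime_state_t *);  // returns the next slot to run

union cell_t {
    op_t fn;            // an operation to call
    std::intptr_t imm;  // an inline integer operand
    cell_t *target;     // the start of another threaded-code array
};

// Small builders, so each slot is written through the member it is later read from.
cell_t op(op_t f) { cell_t c; c.fn = f; return c; }
cell_t lit(std::intptr_t v) { cell_t c; c.imm = v; return c; }
cell_t ref(cell_t *t) { cell_t c; c.target = t; return c; }
cell_t stop() { return op(nullptr); }

// `entry` points at a slot holding the address of the array to run.
// Returning entry + 1 is what lets interp double as a "call" operation:
// the caller resumes at the slot after the one naming the callee.
cell_t *interp(cell_t *entry, runtime_state_t *state) {
    cell_t *pc = entry->target;
    while (pc && pc->fn) pc = pc->fn(pc + 1, state);
    return entry + 1;
}

cell_t *push_int(cell_t *pc, runtime_state_t *state) {
    state->push(static_cast<int>(pc->imm));
    return pc + 1;  // skip the operand
}

cell_t *dup_top(cell_t *pc, runtime_state_t *state) {
    int v = state->pop();
    state->push(v);
    state->push(v);
    return pc;
}

cell_t *add(cell_t *pc, runtime_state_t *state) {
    int v1 = state->pop();
    int v2 = state->pop();
    state->push(v1 + v2);
    return pc;
}

// Pushes 21, calls the `doubler` subroutine through interp, returns the top of stack.
int run_with_call() {
    cell_t doubler[] = { op(dup_top), op(add), stop() };
    cell_t program[] = {
        op(push_int), lit(21),
        op(interp), ref(doubler),  // threaded-code call
        stop(),
    };
    runtime_state_t state;
    cell_t entry = ref(program);
    interp(&entry, &state);
    return state.pop();
}
```

The outer call to interp and the nested one go through the same function, so subroutines need no dedicated call or return operation in this layout.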

Edit

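The biggest problem with threaded code like this is performance. A threaded-code interpreter is extremely unfriendly to branch predictors, so throughput is pretty much capped at one threaded-instruction call per branch-misprediction recovery time.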

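If you want more performance, you pretty much have to go to full runtime code generation. LLVM provides a good, machine-independent interface for doing that, along with pretty good optimizers for common platforms that will produce pretty good code at runtime.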


2012-07-29 03:15:54

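Thanks Chris, but performance-wise that is similar to how I currently execute code. When I parse the code, it gives me an address and an object; I then push the object and call the address. By the way, you've given me a cool idea here. – 2012-07-29 08:39:35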
