Ordering and barriers: what is the x86 equivalent of the 'lwsync' instruction?

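My code is simple, as shown below. I found that rmb and wmb exist for reads and writes, but I could not find a general-purpose barrier. lwsync is available on PowerPC, but what is the replacement on x86? Thanks in advance.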

#define barrier() __asm__ volatile ("lwsync") 
... 
    lock(); 
    if (!pInst) 
    { 
     T* temp=new T; 
     barrier(); 
     pInst=temp; 
    } 
    unlock();

Source

2010-08-14 schemacs

What does 'lwsync' do? – 2010-08-17 10:52:48

rmb() and wmb() are Linux kernel functions. There is also mb().

The x86 instructions are lfence, sfence and mfence, IIRC.
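The mapping described above could be sketched roughly as follows. This is a minimal illustration, assuming GCC or Clang inline assembly; the macro names mirror the kernel's, but these definitions are not the kernel's own, and `publish`, `consume` and `demo` are hypothetical helpers that run on a single thread only to show where each barrier would sit.

```cpp
#include <cassert>

#if defined(__x86_64__) || defined(__i386__)
// Hardware fence plus a "memory" clobber, so the compiler also
// refrains from moving memory accesses across the barrier.
#define mb()  __asm__ __volatile__("mfence" ::: "memory")  // loads and stores
#define rmb() __asm__ __volatile__("lfence" ::: "memory")  // loads
#define wmb() __asm__ __volatile__("sfence" ::: "memory")  // stores
#else
// Assumption: on other targets fall back to GCC's full-barrier builtin.
#define mb()  __sync_synchronize()
#define rmb() __sync_synchronize()
#define wmb() __sync_synchronize()
#endif

static int payload = 0;
static int ready   = 0;

// Writer side: make the payload visible before the flag.
void publish(int value) {
    payload = value;
    wmb();
    ready = 1;
}

// Reader side: check the flag, then read the payload.
int consume() {
    if (!ready) return -1;
    rmb();
    return payload;
}

// Single-threaded walk through both sides.
int demo() {
    publish(42);
    mb();
    int seen = consume();
    assert(seen == 42);
    return seen;
}
```

For ordinary cacheable memory, x86 already keeps stores in order with other stores and loads in order with other loads, so in practice sfence and lfence matter mostly for non-temporal stores and write-combining memory; mfence is the one that also orders a store against a later load.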

Source

2010-08-15 22:46:04

rmb() and wmb() are macros in asm code, not functions. I just want to see how gcc will optimize it if the barrier is not set. – schemacs 2010-08-16 09:10:14

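If you want to be extra paranoid, you can use 'asm volatile ("whatever" ::: "memory");', which tells GCC that arbitrary memory addresses may have been clobbered. If a load has been cached in a register by GCC, I don't think issuing these instructions alone is enough. – 2010-08-16 21:44:58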

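There is a particular file in the Cilk runtime you might find interesting, namely cilk-sysdep.h, which contains the system-specific mappings w.r.t. memory barriers. I have extracted a small section w.r.t. your question on x86, i.e. i386: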

 
    file:-- cilk-sysdep.h (the numbers on the LHS are actually line numbers) 

    252  * We use an xchg instruction to serialize memory accesses, as can 
    253  * be done according to the Intel Architecture Software Developer's 
    254  * Manual, Volume 3: System Programming Guide 
    255  * (http://www.intel.com/design/pro/manuals/243192.htm), page 7-6, 
    256  * "For the P6 family processors, locked operations serialize all 
    257  * outstanding load and store operations (that is, wait for them to 
    258  * complete)." The xchg instruction is a locked operation by 
    259  * default. Note that the recommended memory barrier is the cpuid 
    260  * instruction, which is really slow (~70 cycles). In contrast, 
    261  * xchg is only about 23 cycles (plus a few per write buffer 
    262  * entry?). Still slow, but the best I can find. -KHR 
    263  * 
    264  * Bradley also timed "mfence", and on a Pentium IV xchgl is still quite a bit faster 
    265  * mfence appears to take about 125 ns on a 2.5GHZ P4 
    266  * xchgl apears to take about 90 ns on a 2.5GHZ P4 
    267  * However on an opteron, the performance of mfence and xchgl are both *MUCH MUCH BETTER*. 
    268  * mfence takes 8ns on a 1.5GHZ AMD64 (maybe this is an 801) 
    269  * sfence takes 5ns 
    270  * lfence takes 3ns 
    271  * xchgl takes 14ns 
    272  * see mfence-benchmark.c 
    273  */ 
    274  int x=0, y; 
    275  __asm__ volatile ("xchgl %0,%1" :"=r" (x) :"m" (y), "0" (x) :"memory"); 
    276 }

What I like about this is that xchgl appears to be faster :) though you should really implement it and check it out for yourself.
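A rough way to check this on your own machine is sketched below. It is only an illustration, assuming GCC or Clang on x86; `fence_mfence`, `fence_xchg` and `ns_per_call` are hypothetical helper names, not part of Cilk, and the figures it reports will vary with the CPU, the compiler flags and the system load.

```cpp
#include <cassert>
#include <chrono>

#if defined(__x86_64__) || defined(__i386__)
// Full barrier via the dedicated fence instruction.
static inline void fence_mfence() {
    __asm__ __volatile__("mfence" ::: "memory");
}
// Full barrier via xchg with a memory operand, which is implicitly locked.
static inline void fence_xchg() {
    int x = 0, y = 0;
    __asm__ __volatile__("xchgl %0,%1" : "+r"(x), "+m"(y) : : "memory");
}
#else
// Assumption: on non-x86 targets both helpers fall back to the GCC builtin.
static inline void fence_mfence() { __sync_synchronize(); }
static inline void fence_xchg()   { __sync_synchronize(); }
#endif

// Average wall-clock cost of one call to f, in nanoseconds.
template <typename F>
double ns_per_call(F f, long iters) {
    assert(iters > 0);
    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < iters; ++i) f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / iters;
}
```

Calling `ns_per_call(fence_mfence, 10000000)` and `ns_per_call(fence_xchg, 10000000)` and comparing the two results gives a first impression; the loop overhead is included in both numbers, so only the difference between them is meaningful.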

Source

2011-08-24 03:48:28

Faster on P6? mfence is faster than xchg on AMD64. – doug65536 2012-09-05 13:12:00

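You don't say what lock and unlock are in this code. I'm presuming they are mutex operations. On PowerPC a mutex acquire function will use an isync (without it the hardware may evaluate your if (!pInst) before the lock()), and there will be an lwsync (or a sync, if your mutex implementation is ancient) in the unlock().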

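So, presuming that all your accesses to pInst (both reads and writes) are guarded by your lock and unlock methods, your use of the barrier is redundant. The unlock will contain a sufficient barrier to ensure that the pInst store is visible before the unlock operation completes (so that it will be visible after any subsequent lock acquire, presuming the same lock is used).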

On x86 and x64 your lock() will use some form of LOCK-prefixed instruction, which automatically has bidirectional fencing behavior.

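Your unlock on x86 and x64 only has to be a store instruction (unless you use some of the special string instructions within your critical section, in which case you'll need an SFENCE).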

The manual:

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

has good information on all the fences as well as on the effects of the LOCK prefix (and on when it is implied).

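P.S. In your unlock code you'll also need something that enforces compiler ordering (so if it is just a store of zero, you'll also need something like the GCC-style __asm__ __volatile__("" ::: "memory")).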
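The lock and unlock behavior described in this answer could look roughly like the sketch below. It assumes GCC or Clang builtins and an x86 or x64 target; `SpinLock`, `Widget`, `g_lock` and `instance` are hypothetical names standing in for the asker's lock(), T and pInst, and the plain-store unlock relies on x86 ordering rules, so it is not a portable implementation.

```cpp
#include <cassert>

// Hypothetical spinlock for x86/x64 only.
struct SpinLock {
    volatile int held = 0;

    void lock() {
        // On x86 this builtin is emitted as xchg with a memory operand,
        // which is implicitly LOCKed and fences in both directions.
        while (__sync_lock_test_and_set(&held, 1)) {
            while (held) { /* spin until the lock looks free */ }
        }
    }

    void unlock() {
        // Compiler barrier: keeps the critical section's accesses
        // from being moved below the releasing store.
        __asm__ __volatile__("" ::: "memory");
        // Plain store: x86 does not reorder it with earlier loads or stores.
        held = 0;
    }
};

struct Widget { int value = 7; };

static SpinLock g_lock;
static Widget*  pInst = nullptr;

// Lazy initialization with every access to pInst made under the lock,
// so no extra barrier is placed between construction and publication.
Widget* instance() {
    g_lock.lock();
    if (!pInst) {
        pInst = new Widget;
    }
    Widget* p = pInst;
    g_lock.unlock();
    assert(p != nullptr);
    return p;
}
```

On PowerPC the same unlock would need an lwsync before the store, which is exactly the barrier the answer says a mutex implementation there already provides.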

Source

2012-01-27 19:36:33
