The pause instruction and rep;nop

beyes · posted 2011-4-14 11:37:42

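Does rep;nop execute several nops, or just one?
Normally a rep prefix repeats the string instruction that follows it, decrementing ECX each time, until ECX reaches 0. Kernel code, for example the spin_lock implementation, contains statements like rep;nop, so it is natural to assume that several nops get executed. That is not what happens. Consider the following demo program: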


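#include <stdio.h>

/* rep;nop: load ECX with times, then read ECX back into result. */
#define nops(times) __asm__ __volatile__("rep;nop" : "=c"(result) : "c"(times))

/*
 * rep;movsb: copy ECX bytes from [ESI] to [EDI]. The instruction advances
 * ESI and EDI, so they are passed as read-write operands through temporaries,
 * and the "memory" clobber tells the compiler that des[] is written.
 */
#define movstr(src, des)                                                \
        do {                                                            \
                const char *s_ = (src);                                 \
                char *d_ = (des);                                       \
                __asm__ __volatile__("cld\n\t"                          \
                                     "rep;movsb"                        \
                                     : "=c"(result), "+S"(s_), "+D"(d_) \
                                     : "c"(times)                       \
                                     : "memory");                       \
        } while (0)

int main()
{
        unsigned int times, result;
        char src[5] = {'a', 'b', 'c', 'd', 'e'};
        char des[5];
        unsigned int i;

        times = 5;
        result = 5;

        /* Ordinary rep usage: ECX counts down to 0 while 5 bytes are copied. */
        movstr(src, des);
        printf ("result = %u\n", result);

        for (i = 0; i < times; i++)
                printf ("%c  ", des[i]);

        printf ("\n");

        /* If rep;nop repeated ECX times, result would be 0 here as well. */
        times = 5;
        nops(times);
        printf ("result = %u\n", result);

        return (0);
}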

Output:

[beyes@SLinux C]$ ./rep
result = 0
a b c d e
result = 5

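In the program above, the movstr() macro demonstrates the usual way of using the rep prefix: a piece of inline assembly copies all 5 characters of the src array into des. result starts out as 5, but after movstr() it is 0. That value comes back through the ecx register: because of the rep prefix, ecx is decremented once per byte copied and is 0 when the copy finishes. Next we run the nops() macro, which tests whether rep;nop executes 5 times. If it did, result would end up as 0, but the final value is still 5. So rep;nop is not the same as executing 5 nops. What is it, then? Disassembling the program shows that rep;nop is decoded as the pause instruction, and both have the opcode bytes f3 90. And what does pause do? The Intel manual explains: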

PAUSE—Spin Loop Hint
Description
Improves the performance of spin-wait loops. When executing a “spin-wait loop,” a Pentium 4
processor suffers a severe performance penalty when exiting the loop because it detects a
possible memory order violation. The PAUSE instruction provides a hint to the processor that
the code sequence is a spin-wait loop. The processor uses this hint to bypass the memory order
violation in most situations, which greatly improves processor performance. For this reason, it
is recommended that a PAUSE instruction be placed in all spin-wait loops.
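In short: PAUSE improves the performance of spin-wait loops. On a Pentium 4, leaving a spin-wait loop carries a severe performance penalty, because the processor detects a possible memory order violation. PAUSE hints to the processor that the code being executed is a spin-wait loop, and the processor uses that hint to avoid the memory order violation in most cases (roughly, it does not speculatively run ahead on the loop's loads), which greatly improves performance. This is why placing a pause instruction in spin-wait loops is recommended.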
An additional function of the PAUSE instruction is to reduce the power consumed by a Pentium
4 processor while executing a spin loop. The Pentium 4 processor can execute a spin-wait loop
extremely quickly, causing the processor to consume a lot of power while it waits for the
resource it is spinning on to become available. Inserting a pause instruction in a spin-wait loop
greatly reduces the processor’s power consumption.
In short: PAUSE also lets a Pentium 4 use less power while it executes a spin-wait loop. Spinning at full speed while waiting for a resource burns a lot of power, and a pause instruction in the loop cuts that consumption substantially.
This instruction was introduced in the Pentium 4 processors, but is backward compatible with
all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction operates like a NOP
instruction.
In short: PAUSE was introduced with the Pentium 4 but is backward compatible. On earlier IA-32 processors it simply behaves like a NOP.
The Pentium 4 processor implements the PAUSE instruction as a pre-defined delay. The delay
is finite and can be zero for some processors. This instruction does not change the architectural
state of the processor (that is, it performs essentially a delaying no-op operation).
In short: the Pentium 4 implements PAUSE as a pre-defined delay. The delay is finite and may be zero on some processors. The instruction does not change the processor's architectural state.

The kernel's rep_nop() function wraps the rep;nop instruction:


/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
static inline void rep_nop(void)
{
    __asm__ __volatile__("rep;nop": : :"memory");
}

The kernel also contains examples that use rep_nop():


static void delay_tsc(unsigned long loops)
{
    unsigned long bclock, now;

    preempt_disable();        /* TSC's are per-cpu */
    rdtscl(bclock);
    do {
        rep_nop();
        rdtscl(now);
    } while ((now-bclock) < loops);
    preempt_enable();
}

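This function implements the delay by repeatedly reading the TSC and comparing the elapsed count against the requested loops, calling rep_nop() on every iteration to pause while it waits.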
