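An Assembly Example Using CPUID

beyes · posted 2009-11-27 11:24:45

This example uses the CPUID instruction to read the vendor ID string that it produces. The goal is to learn the basic skeleton of an assembly program and how to assemble, link, and debug it. The program: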

#cpuid.s Sample program to extract the processor Vendor ID

.section .data
output:
    .ascii    "The processor Vendor ID is 'xxxxxxxxxxxx'\n"

.section .text
.global    _start

_start:
    movl $0, %eax
    cpuid

    movl    $output, %edi
    movl     %ebx,    28(%edi)
    movl    %edx,    32(%edi)
    movl    %ecx,    36(%edi)
    movl    $4,     %eax
    movl    $1,     %ebx
    movl    $output, %ecx
    movl    $42,     %edx
    int    $0x80
    movl    $1,     %eax
    movl    $0,     %ebx
    int    $0x80

Assemble:

as -o cpuid.o cpuid.s

Link:

ld -o cpuid cpuid.o

Run it:

$ ./cpuid
The processor Vendor ID is 'GenuineIntel'

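If you build with gcc instead, rename _start to main in cpuid.s. gcc links in the C startup code, which supplies its own _start and then calls main, whereas a plain ld link uses _start as the default entry point. After the change, the executable can be built in one step: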

beyes@beyes-groad:~/programming/assembly/cpuid$ gcc -o cpuid cpuid.s
beyes@beyes-groad:~/programming/assembly/cpuid$ ./cpuid
The processor Vendor ID is 'GenuineIntel'

Debugging the program with gdb
To debug an assembly program with gdb, first reassemble the source with the -gstabs option:

$ as -gstabs -o cpuid.o cpuid.s
ld -o cpuid cpuid.o

Note that once debugging is finished, you should rebuild the program without -gstabs. The option adds debugging information to the program, which makes the executable larger. For the program above, the sizes with and without -gstabs are:

-rwxr-xr-x 1 beyes beyes 991 2009-11-27 10:51 cpuid # built with -gstabs
-rwxr-xr-x 1 beyes beyes 663 2009-11-27 10:53 cpuid # built without -gstabs

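So the -gstabs build is roughly half again as large as the plain build (991 bytes versus 663 bytes). The larger the program, the more debugging information is generated.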
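The "half again as large" figure can be checked from the two sizes in the listing. The snippet below is only a quick arithmetic sketch in Python (not part of the original program); the function name size_overhead is made up for illustration.

```python
def size_overhead(with_debug, without_debug):
    """Return the relative size increase caused by the debug information."""
    return (with_debug - without_debug) / without_debug


if __name__ == "__main__":
    # Sizes taken from the ls -l listing above.
    overhead = size_overhead(991, 663)
    print(f"-gstabs adds {overhead:.1%} to the file size")
```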

After assembling with -gstabs, the program can be debugged with gdb:

beyes@beyes-groad:~/programming/assembly/cpuid$ gdb cpuid
GNU gdb (GDB) 7.0-ubuntu
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/beyes/programming/assembly/cpuid/cpuid...done.
(gdb)

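Set a breakpoint at line 10, that is, at _start (break *_start), and then issue the run command. The program does not stop at the breakpoint; it runs straight through to the end. This is a long-standing quirk that may or may not count as a bug. The workaround is to add a NOP instruction right after the _start label: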

_start:
nop
movl $0, %eax
cpuid

With the NOP in place, the breakpoint can be set with break *_start+1:

(gdb) b *_start+1
Breakpoint 1 at 0x8048075: file cpuid.s, line 12.
(gdb) r
Starting program: /home/beyes/programming/assembly/cpuid/cpuid

Breakpoint 1, _start () at cpuid.s:12
12 movl $0, %eax

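From here you can trace the program with next or step.

To inspect data, three commands are used most often:

Command	Description
info registers	Show the values of all registers
print	Show the value of a specific register or program variable
x	Show the contents of a specific memory location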

The info registers command is a convenient way to see how instructions affect all the registers:

(gdb) info registers
eax            0xa    10
ecx            0x6c65746e    1818588270
edx            0x49656e69    1231384169
ebx            0x756e6547    1970169159
esp            0xbffff480    0xbffff480
ebp            0x0    0x0
esi            0x0    0
edi            0x80490ac    134516908
eip            0x8048081    0x8048081 <_start+13>
eflags         0x212    [ AF IF ]
cs             0x73    115
ss             0x7b    123
ds             0x7b    123
es             0x7b    123
fs             0x0    0
gs             0x0    0

To look at a single register:

(gdb) info registers edi #show only the edi register
edi 0x80490ac 134516908

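The print command can also display individual registers, and a modifier after print changes the output format:

print/d shows the value in decimal

print/t shows the value in binary

print/x shows the value in hexadecimal

For example: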

(gdb) print/x $ebx
$1 = 0x756e6547
(gdb) print/x $edx
$2 = 0x49656e69
(gdb) print/x $ecx
$3 = 0x6c65746e
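The three hexadecimal values already contain the vendor string, four ASCII bytes per register, stored least significant byte first. As a rough cross-check outside gdb, the Python sketch below decodes them; the helper names vendor_string and show_formats are invented for this illustration and are not gdb or kernel APIs.

```python
def vendor_string(ebx, edx, ecx):
    """Join the three registers in EBX, EDX, ECX order, little-endian."""
    raw = b"".join(reg.to_bytes(4, "little") for reg in (ebx, edx, ecx))
    return raw.decode("ascii")


def show_formats(value):
    """Render one value in decimal, binary and hex, like print/d, /t, /x."""
    return {"d": format(value, "d"), "t": format(value, "b"), "x": format(value, "x")}


if __name__ == "__main__":
    # Register values copied from the gdb session above.
    print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))
    print(show_formats(0x756E6547))
```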

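The x command displays the contents of a specific memory location. As with print, a modifier changes its output. The format of the x command is:

x/nyz

Here n is the number of fields to display and y is the output format, which can be:

c for characters

d for decimal

x for hexadecimal

z is the size of each field:

b for a byte

h for a 16-bit word (halfword)

w for a 32-bit word

The following uses the x command to display the memory at the output label: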

(gdb) x/42cb &output
0x80490ac <output>:    84 'T'    104 'h'    101 'e'    32 ' '    112 'p'    114 'r'    111 'o'    99 'c'
0x80490b4 <output+8>:    101 'e'    115 's'    115 's'    111 'o'    114 'r'    32 ' '    86 'V'    101 'e'
0x80490bc <output+16>:    110 'n'    100 'd'    111 'o'    114 'r'    32 ' '    73 'I'    68 'D'    32 ' '
0x80490c4 <output+24>:    105 'i'    115 's'    32 ' '    39 '\''    71 'G'    101 'e'    110 'n'    117 'u'
0x80490cc <output+32>:    105 'i'    110 'n'    101 'e'    73 'I'    110 'n'    116 't'    101 'e'    108 'l'
0x80490d4 <output+40>:    39 '\''    10 '\n'

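Here 42 means display the first 42 bytes of the output variable (the & indicates that it is a memory location), c displays each field as a character, and b sets the field size to one byte. In the output, the number before each character is its ASCII code. This command is invaluable when tracing instructions that operate on memory.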
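To make the layout of that dump easier to read, here is a small Python sketch that loosely imitates x/42cb: one byte per field, eight fields per row, each shown as its ASCII code followed by the character. The function dump_chars is hypothetical, the quoting of special characters differs slightly from gdb, and the base address is simply the one from the session above.

```python
MESSAGE = b"The processor Vendor ID is 'GenuineIntel'\n"


def dump_chars(data, base, label, per_row=8):
    """Format bytes as rows of "code 'char'" pairs, roughly like gdb's x/cb."""
    rows = []
    for offset in range(0, len(data), per_row):
        chunk = data[offset:offset + per_row]
        name = f"<{label}>" if offset == 0 else f"<{label}+{offset}>"
        cells = "  ".join(f"{byte} {chr(byte)!r}" for byte in chunk)
        rows.append(f"{base + offset:#x} {name}:  {cells}")
    return rows


if __name__ == "__main__":
    for row in dump_chars(MESSAGE, 0x80490AC, "output"):
        print(row)
```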

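beyes · posted 2009-11-27 19:03:50

movl $0, %eax
Loads 0 into eax. With this input value, cpuid returns the vendor ID string.

cpuid
Executes the CPUID instruction.

After cpuid executes, the three output registers ebx, edx and ecx hold the result, which is copied into the string: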
movl    $output, %edi
movl     %ebx,    28(%edi)
movl    %edx,    32(%edi)
movl    %ecx,    36(%edi)

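The offsets 28, 32 and 36 are positions inside "The processor Vendor ID is 'xxxxxxxxxxxx'\n": together they cover the twelve x placeholders, four bytes per register.
movl    $output, %edi creates a pointer: the memory address of the output label is loaded into EDI. Note the order EBX, EDX, ECX. It is not the B, C, D order one might expect, which looks odd, but it is the order in which CPUID returns the pieces of the string.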
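The offsets and the length of 42 used later can be verified by counting characters in the template. The following Python sketch does that counting; placeholder_offsets is a made-up helper name, and the 4-byte width assumes one 32-bit register per store, as in the program.

```python
TEMPLATE = b"The processor Vendor ID is 'xxxxxxxxxxxx'\n"


def placeholder_offsets(template, width=4):
    """Return the byte offsets at which each 4-byte register is stored."""
    start = template.index(b"x")      # first placeholder byte
    end = template.rindex(b"x") + 1   # one past the last placeholder byte
    return list(range(start, end, width))


if __name__ == "__main__":
    print(placeholder_offsets(TEMPLATE))  # offsets for ebx, edx, ecx
    print(len(TEMPLATE))                  # length passed to write in edx
```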

After these instructions the vendor ID string is in place in memory. The next few instructions display it:
movl    $4,    %eax
movl    $1,    %ebx
movl    $output, %ecx
movl    $42,    %edx
int    $0x80

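The first instruction puts 4 into the eax register. This 4 is the system call number. In newer kernels, these numbers are defined in the file:
arch/x86/include/asm/unistd_32.h
For example, system call 4 is:

#define __NR_write 4

Each system call is defined there as a name (prefixed with __NR_) together with its system call number.

The second instruction puts 1 into the ebx register. This 1 is a file descriptor, here standard output (STDOUT).
Taken together, the first two instructions mean: invoke the write() system call and send the text to standard output.

The third instruction loads the address of the start of the string into ecx.

The fourth instruction loads the length of the string into edx.

Compare this with the prototype of write():
ssize_t write(int fd, const void *buf, size_t count);
From left to right, the three parameters correspond to the values in ebx, ecx and edx. In summary:

EAX contains the system call number

EBX contains the file descriptor to write to

ECX contains the start of the string

EDX contains the length of the string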
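The same three arguments appear when write() is called from a higher-level language. As an illustrative sketch only (Python rather than the assembly of this thread), os.write takes the file descriptor and the buffer, derives the count from the buffer length, and returns the number of bytes written. The wrapper name write_message is hypothetical.

```python
import os

MESSAGE = b"The processor Vendor ID is 'GenuineIntel'\n"


def write_message(fd, message):
    """Write the bytes to the given file descriptor, as write(fd, buf, count)."""
    # fd plays the role of ebx, message of ecx, len(message) of edx.
    return os.write(fd, message)


if __name__ == "__main__":
    written = write_message(1, MESSAGE)  # 1 is standard output
```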

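Finally, once the vendor ID has been displayed, the program exits cleanly. A Linux system call handles this as well: system call 1 (#define __NR_exit 1) terminates the program properly and returns control to the command prompt. exit() also takes a parameter, the exit code, which is passed in the ebx register.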
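The exit code placed in ebx is what the parent process later sees as the exit status (in a shell, echo $? prints it). The Python sketch below shows the same idea with a child Python process instead of the assembly program; run_and_get_status is a made-up helper, and os._exit is used because it ends the process immediately with the given status.

```python
import subprocess
import sys


def run_and_get_status(code):
    """Start a child process that exits with the given code and return its status."""
    child_source = f"import os; os._exit({code})"
    completed = subprocess.run([sys.executable, "-c", child_source])
    return completed.returncode


if __name__ == "__main__":
    print(run_and_get_status(0))  # the value the assembly program passes in ebx
```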

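The Linux kernel provides many ready-made functions (system calls) that are easy to reach from an assembly program. To invoke them, use the int instruction, which raises a software interrupt with the value 0x80. The function that runs is selected by the value in the EAX register. Without these kernel functions you would have to send every output character to the correct display I/O address yourself, which would take a great deal of time.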

beyes · posted 2009-12-1 13:44:25

Calling C library functions from assembly keeps the program short and saves development time. The code:

#cpuid2.s View the CPUID Vendor ID string using C library calls
.section .data
output:
        .asciz "The processor Vendor ID is '%s'\n"

.section .bss
        .lcomm buffer, 13       # 12 bytes of vendor ID plus a zero terminator for %s

.section .text
.global _start

_start:
        movl $0, %eax
        cpuid
        movl $buffer, %edi
        movl %ebx, (%edi)
        movl %edx, 4(%edi)
        movl %ecx, 8(%edi)
        pushl $buffer
        pushl $output
        call printf
        addl $8, %esp
        pushl $0
        call exit

The program uses the C library functions printf and exit. If it is assembled and linked in the usual way, assembling succeeds but linking fails:

beyes@beyes-groad:~/programming/assembly/cpuid$ ld -o cpuid2 cpuid2.o
cpuid2.o: In function `_start':
(.text+0x1f): undefined reference to `printf'
cpuid2.o: In function `_start':
(.text+0x29): undefined reference to `exit'

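To link against the C library, first make sure the library is available on the system. There are two ways to link C functions into an assembly program.

The first is static linking. Static linking copies the object code of the functions directly into the executable file, which makes the executable large. In addition, if several instances of the program run at the same time, memory is wasted, because each instance carries its own copy of the same functions.

The second is dynamic linking. With dynamic linking, the library is loaded when the program runs, and several programs can share the same library. The standard C shared library lives in a libc.so.x file. My system is Ubuntu 9.10 and uses libc.so.6, which is actually a symbolic link to the real library file, libc-2.10.1.so.

If the program is built with gcc (with _start renamed to main), the C library is linked in automatically.

When linking the object file with ld, the -l option is needed to link against libc.so. The full library name does not have to be given: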

$ ld -o cpuid2 -lc cpuid2.o

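Here the -l option is followed directly by c. -lc tells the linker to search its library directories (/lib among them) for a file named libc.so: the lib prefix and the .so suffix are added automatically to the name given after -l.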
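The naming rule can be written down in a couple of lines. This Python sketch only illustrates how the file names are derived from the name after -l (shared library first, then the static archive); it does not search any directories, and candidate_filenames is a hypothetical helper, not part of ld.

```python
def candidate_filenames(namespec):
    """File names the linker looks for when given -l<namespec>."""
    return [f"lib{namespec}.so", f"lib{namespec}.a"]


if __name__ == "__main__":
    print(candidate_filenames("c"))  # names tried for -lc
```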

The link succeeds and produces an executable, but the executable will not run:

$ ./cpuid2
bash: ./cpuid2: No such file or directory

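The reason is that the linker resolved the C functions, but the functions themselves are not contained in the final executable (remember that this is a shared library). The linker assumes that the program which loads shared libraries at run time, the dynamic linker, can be found at a default path, but on this system nothing exists at that path, hence the "No such file or directory" error. The fix is to name the dynamic linker explicitly at link time. Here it is ld-linux.so.2 in the /lib directory. That file is itself a symbolic link, but to the loader shipped with glibc (normally ld-2.10.1.so on this system), not to libc-2.10.1.so. The linker option for this is -dynamic-linker: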

beyes@beyes-groad:~/programming/assembly/cpuid$ ld -dynamic-linker /lib/ld-linux.so.2 -o cpuid2 -lc cpuid2.o
beyes@beyes-groad:~/programming/assembly/cpuid$ ./cpuid2
The processor Vendor ID is 'GenuineIntel'

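As shown, the program now runs correctly.

With gcc, the required C library is linked automatically and no special steps are needed. gcc also links dynamically by default, but it adds the C startup code and other supporting sections, so its executable is larger than the one produced by ld with -dynamic-linker. The two listings below show the gcc build first and the ld build second: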

beyes@beyes-groad:~/programming/assembly/cpuid/temp$ ls -l cpuid2
-rwxr-xr-x 1 beyes beyes 8394 2009-11-28 21:27 cpuid2

beyes@beyes-groad:~/programming/assembly/cpuid$ ls -l cpuid2
-rwxr-xr-x 1 beyes beyes 2007 2009-12-01 13:39 cpuid2

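Alternatively, dump both files with objdump -d cpuid2. The file built by gcc clearly contains much more.

Program notes:
Before calling printf, its arguments must be pushed onto the stack, in right-to-left order. That is why $buffer is pushed before $output, and why addl $8, %esp afterwards removes the two 4-byte arguments from the stack.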
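The push order and the stack cleanup can be modelled with a plain list. This is a simplified Python sketch of the 32-bit calling convention used above, not real stack manipulation; push_args_right_to_left is an invented name, and the 4-byte word size is the assumption that matches pushl.

```python
def push_args_right_to_left(args, word_size=4):
    """Model pushing C arguments: the last argument is pushed first."""
    stack = []
    for arg in reversed(args):
        stack.append(arg)  # each append stands for one pushl
    cleanup_bytes = word_size * len(args)  # amount added back to esp after the call
    return stack, cleanup_bytes


if __name__ == "__main__":
    # printf(output, buffer): arguments listed in C order, left to right.
    stack, cleanup = push_args_right_to_left(["$output", "$buffer"])
    print(stack, cleanup)
```

The last element of the list is the top of the stack, so printf finds its first argument, the format string, on top.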
