ELF转二进制(2/4)：允许把 Binary 文件加载到任意位置

Wu Zhangjin 创作于 2020/02/09

By Falcon of TinyLab.org Dec 09, 2019

背景简介

有一天，某位同学在讨论群聊起来：

除了直接把 C 语言程序编译成 ELF 运行以外，是否可以转成二进制，然后通过第三方程序加载到内存后再运行。

带着这样的问题，我们写了四篇文章，这是其二。

上篇介绍了如何把 ELF 文件转成二进制文件，并作为一个新的 Section 加入到另外一个程序中执行。

这个代码包括两个段，一个 text 段，一个 data 段，默认链接完以后，text 中是通过绝对地址访问 data 的，ELF 转成 Binary 后，这个地址也写死在 ELF 中，如果要作为新的 Seciton 加入到另外一个程序，那么链接时必须确保 Binary 文件的加载地址跟之前的 ELF 加载地址一致，否则数据存放的位置就偏移了，访问不到，所以上篇文章用了一个客制化的 ld script，在里头把 Binary Seciton 的加载地址（运行时地址）写死的。

让数据地址与加载地址无关

本篇来讨论一个有意思的话题，那就是，是否可以把这个绝对地址给去掉，只要把这个 Binary 插入到新程序的 Text 中，不关心加载地址，也能运行？

想法是这样：data 应该跟 text 关联起来，也就是说，用相对 .text 的地址，因为 Binary 里头的 .rodata 是跟在 .text 后面，在文件中的相对位置其实是固定的，是否可以在运行时用一个偏移来访问呢？也就是在运行过程中，获取到 .text 中的某个位置，然后通过距离来访问这个数据？

在运行时获取 eip

由于加载地址是任意的，用 .text 中的符号也不行，因为在链接时也一样是写死的（用动态链接又把问题复杂度提升了），所以，唯一可能的办法是 eip，即程序地址计数器。

但是 eip 是没有办法直接通过寄存器获取的，得通过一定技巧来，下面这个函数就可以：

eip2ecx:
    movl   (%esp), %ecx
    ret

这个函数能够把 eip 放到 ecx 中。

原理很简单，那就是调用它的 call 指令会把 next eip 放到 stack，并跳到 eip2ecx。所以 stack 顶部就是 eip。这里也可以直接用 pop %ecx。

所以这条指令能够拿到 .here 的地址，并且存放在 ecx 中：

    call   eip2ecx
.here:
    ...

    .section .rodata
.LC0:
    .string "Hello World\xa\x0"

通过 eip 与数据偏移计算数据地址

然后接下来，由于汇编器能够算出 .here 离 .LC0（数据段起始位置）：.LC0 - .here，对汇编器而言，这个差值就是一个立即数。如果在 ecx 上加上（addl）这个差值，是不是就是数据在运行时的位置？

我们在 .here 放上下面这条指令：

  call   eip2ecx
.here:
    addl   $(.LC0 - .here), %ecx
    ...

    .section .rodata
.LC0:
    .string "Hello World\xa\x0"

同样能够拿到数据的地址，等同于：

movl   $.LC0, %ecx              # ecx = $.LC0, the addr of string

下面几个综合一起回顾：

addl 这条指令的位置正好是运行时的 next eip （call 指令的下一条）
.here 在汇编时确定，指向 next eip
.LC0 也是汇编时确定，指向数据开始位置
.LC0 - .here 刚好是 addl 这条指令跟数据段的距离/差值
call eip2ecx 返回以后，ecx 中存了 eip
addl 这条指令把 ecx 加上差值，刚好让 ecx 指向了数据在内存中的位置

完整代码如下：

# hello.s
#
# as --32 -o hello.o hello.s
# ld -melf_i386 -o hello hello.o
# objcopy -O binary hello hello.bin
#

    .text
.global _start
_start:
    xorl   %eax, %eax
    movb   $4, %al                  # eax = 4, sys_write(fd, addr, len)
    xorl   %ebx, %ebx
    incl   %ebx                     # ebx = 1, standard output

    call   eip2ecx
.here:
    addl   $(.LC0 - .here), %ecx    # ecx = $.LC0, the addr of string
                                    # equals to: movl   $.LC0, %ecx

    xorl   %edx, %edx
    movb   $13, %dl                 # edx = 13, the length of .string
    int    $0x80
    xorl   %eax, %eax
    movl   %eax, %ebx               # ebx = 0
    incl   %eax                     # eax = 1, sys_exit
    int    $0x80

eip2ecx:
    movl   (%esp), %ecx
    ret

    .section .rodata
.LC0:
    .string "Hello World\xa\x0"

链接脚本简化

这个生成的 hello.bin 链接到 run-bin，就不需要写死加载地址了，随便放，而且不需要调整 run-bin 本身的加载地址，所以 ld.script 的改动可以非常简单：

$ git diff ld.script ld.script.new
diff --git a/ld.script b/ld.script.new
index 91f8c5c..e14b586 100644
--- a/ld.script
+++ b/ld.script.new
@@ -60,6 +60,11 @@ SECTIONS
     /* .gnu.warning sections are handled specially by elf32.em.  */
     *(.gnu.warning)
   }
+  .bin          :
+  {
+    bin_entry = .;
+    *(.bin)
+  }
   .fini           :
   {
     KEEP (*(SORT_NONE(.fini)))

直接用内联汇编嵌入二进制文件

在这个基础上，可以做一个简化，直接用 .pushsection 和 .incbin 指令把 hello.bin 插入到 run-bin 即可，无需额外修改链接脚本：

$ cat run-bin.c
#include <stdio.h>

asm (".pushsection .text, \"ax\" \n"
     ".globl bin_entry \n"
     "bin_entry: \n"
     ".incbin \"./hello.bin\" \n"
     ".popsection"
);

extern void bin_entry(void);

int main(int argc, char *argv[])
{
	bin_entry();
	return 0;
}

这个内联汇编的效果跟上面的链接脚本完全等价。

把数据直接嵌入代码中

进一步简化汇编代码把 eip2ecx 函数去掉：

# hello.s
#
# as --32 -o hello.o hello.s
# ld -melf_i386 -o hello hello.o
# objcopy -O binary hello hello.bin
#

    .text
.global _start
_start:
    xorl   %eax, %eax
    movb   $4, %al                  # eax = 4, sys_write(fd, addr, len)
    xorl   %ebx, %ebx
    incl   %ebx                     # ebx = 1, standard output

    call   eip2ecx
eip2ecx:
    pop    %ecx
    addl   $(.LC0 - eip2ecx), %ecx  # ecx = $.LC0, the addr of string
                                    # equals to: movl   $.LC0, %ecx

    xorl   %edx, %edx
    movb   $13, %dl                 # edx = 13, the length of .string
    int    $0x80
    xorl   %eax, %eax
    movl   %eax, %ebx               # ebx = 0
    incl   %eax                     # eax = 1, sys_exit
    int    $0x80

.LC0:
    .string "Hello World\xa\x0"

再进一步，直接把数据搬到 next eip 所在位置：

# hello.s
#
# as --32 -o hello.o hello.s
# ld -melf_i386 -o hello hello.o
# objcopy -O binary hello.o hello
#

    .text
.global _start
_start:
    xorl   %eax, %eax
    movb   $4, %al                  # eax = 4, sys_write(fd, addr, len)
    xorl   %ebx, %ebx
    incl   %ebx                     # ebx = 1, standard output
    call   next                     # push eip; jmp next
.LC0:
    .string "Hello World\xa\x0"
next:
    pop    %ecx                     # ecx = $.LC0, the addr of string
                                    # eip is just the addr of string, `call` helped us
    xorl   %edx, %edx
    movb   $13, %dl                 # edx = 13, the length of .string
    int    $0x80
    xorl   %eax, %eax
    movl   %eax, %ebx               # ebx = 0
    incl   %eax                     # eax = 1, sys_exit
    int    $0x80

小结

本文通过 eip + 偏移地址实现了运行时计算数据地址，不再需要把 Binary 文件装载到固定的位置。

另外，也讨论到了如何用 .pushsection/.popsection 替代 ld script 来添加新的 Section，还讨论了如何把数据直接嵌入到代码中。

更多用法欢迎订阅吴老师的 10 小时 C 语言进阶视频课：《360° 剖析 Linux ELF》，课程提供了超过 70 多份实验材料，其中 15 个例子演示了 15 种程序执行的方法。

[置顶] 泰晓 RISC-V 实验箱，配套 30+ 讲嵌入式 Linux 系统开发公开课