上手 9 套工具，玩转二进制文件

Wu Zhangjin 创作于 2019/09/11

By Falcon of TinyLab.org Sep 05, 2019

前言

文件的终极存储方式是一堆二进制（01）串，在这个基础上，如果内容都能按照 8 位的 ASCII 文本表达，那就是纯文本文件，用各种文本编辑工具处理即可，如果内容是结构化的程序数据，比如可执行文件，那么得从二进制层面去操作。

对于特定的结构化数据，一般都有配套的操作 API，比如说 ELF，有专属的 binutils, elfutils 等，本文主要介绍通用的二进制操作工具，并以 ELF 为例，比照介绍相应的专属工具，这些工具在 Ubuntu 中都可以直接安装。

准备工作

先准备一个具体的小汇编程序，这个作为本文二进制操作演示的材料。

# hello.s
#
# as --32 -o hello.o hello.s
# ld -melf_i386 -o hello hello.o
# objcopy -O binary hello hello.bin
#

    .text
.global _start
_start:
    xorl   %eax, %eax
    movb   $4, %al                  # eax = 4, sys_write(fd, addr, len)
    xorl   %ebx, %ebx
    incl   %ebx                     # ebx = 1, standard output
    movl   $.LC0, %ecx              # ecx = $.LC0, the addr of string
    xorl   %edx, %edx
    movb   $13, %dl                 # edx = 13, the length of .string
    int    $0x80
    xorl   %eax, %eax
    movl   %eax, %ebx               # ebx = 0
    incl   %eax                     # eax = 1, sys_exit
    int    $0x80

    .section .rodata
.LC0:
    .string "Hello World\xa\x0"

把上面这份代码汇编、链接，并执行：

$ as --32 -o hello.o hello.s
$ ld -melf_i386 -o hello hello.o
$ ./hello
Hello World

仅保留代码和数据：

$ objcopy -O binary hello hello.bin

经过上面两步，得到了两个二进制文件，一个是 ELF 可执行文件 hello，另外一个是只包含了代码和数据的二进制文件 hello.bin。

二进制查看

比较常用的二进制读取工具有：

hexdump
xxd
od

查看 hello.bin

以读取 hello.bin 为例，三者可以输出类似的数据样式：

$ hexdump -C hello.bin
00000000  31 c0 b0 04 31 db 43 b9  6d 80 04 08 31 d2 b2 0d  |1...1.C.m...1...|
00000010  cd 80 31 c0 89 c3 40 cd  80 48 65 6c 6c 6f 20 57  |..1...@..Hello W|
00000020  6f 72 6c 64 0a 00 00                              |orld...|
00000027

$ xxd -g 1 hello.bin
00000000: 31 c0 b0 04 31 db 43 b9 6d 80 04 08 31 d2 b2 0d  1...1.C.m...1...
00000010: cd 80 31 c0 89 c3 40 cd 80 48 65 6c 6c 6f 20 57  ..1...@..Hello W
00000020: 6f 72 6c 64 0a 00 00                             orld...

$ od -A x -t x1z hello.bin
000000 31 c0 b0 04 31 db 43 b9 6d 80 04 08 31 d2 b2 0d  >1...1.C.m...1...<
000010 cd 80 31 c0 89 c3 40 cd 80 48 65 6c 6c 6f 20 57  >..1...@..Hello W<
000020 6f 72 6c 64 0a 00 00                             >orld...<
000027

上面以字节流的方式一个一个打印出来，不用关心字节序，直接按照文件的字节存储顺序打印出来即可。

查看 hello ELF 的 .text 节区

下面通过上述工具从 hello ELF 中直接打印代码和数据。

首先，需要通过 binutils 提供的 readelf -S 先获取到代码段和数据段在文件中的偏移：

$ readelf -S hello
There are 6 section headers, starting at offset 0x130:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        08048054 000054 000019 00  AX  0   0  1
  [ 2] .rodata           PROGBITS        0804806d 00006d 00000e 00   A  0   0  1
  [ 3] .shstrtab         STRTAB          00000000 000105 000029 00      0   0  1
  [ 4] .symtab           SYMTAB          00000000 00007c 000070 10      5   3  4
  [ 5] .strtab           STRTAB          00000000 0000ec 000019 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

上面拿到了 .text 和 .rodata 两个 Section 在文件中的偏移（Off）和大小（Size）。

.text 0x54 0x19
.rodata 0x6d 0x0e

以 .text 为例，告知起始位置和长度就行了：

$ hexdump -C -s $((0x54)) -n $((0x19)) hello
00000054  31 c0 b0 04 31 db 43 b9  6d 80 04 08 31 d2 b2 0d  |1...1.C.m...1...|
00000064  cd 80 31 c0 89 c3 40 cd  80                       |..1...@..|
0000006d

$ xxd -g 1 -seek $((0x54)) -l $((0x19)) hello
00000054: 31 c0 b0 04 31 db 43 b9 6d 80 04 08 31 d2 b2 0d  1...1.C.m...1...
00000064: cd 80 31 c0 89 c3 40 cd 80                       ..1...@..

$ od -A x -t x1z -j $((0x54)) -N $((0x19)) hello
000054 31 c0 b0 04 31 db 43 b9 6d 80 04 08 31 d2 b2 0d  >1...1.C.m...1...<
000064 cd 80 31 c0 89 c3 40 cd 80                       >..1...@..<
00006d

使用 ELF 专属工具 objdump

针对 ELF，objdump 可以实现同样功能，并且更有针对性，其中 -d 为反汇编，-j 指定目标 Section：

$ objdump -d -j .text hello
hello:     file format elf32-i386


Disassembly of section .text:

08048054 <_start>:
 8048054:	31 c0                	xor    %eax,%eax
 8048056:	b0 04                	mov    $0x4,%al
 8048058:	31 db                	xor    %ebx,%ebx
 804805a:	43                   	inc    %ebx
 804805b:	b9 6d 80 04 08       	mov    $0x804806d,%ecx
 8048060:	31 d2                	xor    %edx,%edx
 8048062:	b2 0d                	mov    $0xd,%dl
 8048064:	cd 80                	int    $0x80
 8048066:	31 c0                	xor    %eax,%eax
 8048068:	89 c3                	mov    %eax,%ebx
 804806a:	40                   	inc    %eax
 804806b:	cd 80                	int    $0x80

而 -s 则能输出类似 hexdump 等三样工具的输出格式：

$ objdump -s -j .text hello

hello:     file format elf32-i386

Contents of section .text:
 8048054 31c0b004 31db43b9 6d800408 31d2b20d  1...1.C.m...1...
 8048064 cd8031c0 89c340cd 80                 ..1...@..

hexdump 输出样式客制化

最后补充 hexdump 的更复杂功能，这个功能允许灵活调整数据显示样式，下面两个样式分别对齐 xxd -g 1 和 od -A x -t x1z。

$ hexdump -v -e '"%08.8_ax: " 16/1 "%02x " "  "' -e '16/1 "%_p""\n"' hello.bin
00000000: 31 c0 b0 04 31 db 43 b9 6d 80 04 08 31 d2 b2 0d  1...1.C.m...1...
00000010: cd 80 31 c0 89 c3 40 cd 80 48 65 6c 6c 6f 20 57  ..1...@..Hello W
00000020: 6f 72 6c 64 0a 00 00                             orld...

$ hexdump -v -e '"%6.6_ax " 16/1 "%02x ""  "' -e '">"16/1 "%_p""<""\n"'  hello.bin
000000 31 c0 b0 04 31 db 43 b9 6d 80 04 08 31 d2 b2 0d  >1...1.C.m...1...<
000010 cd 80 31 c0 89 c3 40 cd 80 48 65 6c 6c 6f 20 57  >..1...@..Hello W<
000020 6f 72 6c 64 0a 00 00                             >orld...<

这意味着 hexdump 在数据展示上相比 xxd 和 od 更为灵活强大，很适合需要丰富样式的数据分析场景。

二进制编辑

大家推荐的 3 种二进制编辑工具：

hexedit
vim + xxd/xxd -r
echo + dd

下面以一个具体例子来演示三者的用法，那就是直接在二进制中改掉要打印的字符串，把 ‘Hello World’ 改为 ‘nihao world’。需要确保两者长度一样，不然就会把其他内容覆盖了。

使用 hexedit：支持 Hex/ASCII 两种模式

hexedit 支持十六进制和 ASCII 两种编辑模式，通过下面几步完成修改：

$ hexedit hello

1. TAB：切换 Hex 到 Ascii 模式
2. CTRL+S：搜索 Hello
3. 然后直接输入 `nihao world`，覆盖掉 Hello World
4. CTRL+X：保存并退出
5. CTRL+C：退出不保存

$ ./hello
nihao world

hexedit 的用法不是很复杂，可以看看 man hexedit。不过对于 vim 用户来说，有一个适应过程。

使用 vim + xxd：完美兼容 vim

接下来，用 vim 和 xxd 配合做编辑，这个是在 vim 中，调用 xxd 把文件转换为十六进制，然后编辑，之后再转换为二进制。遗憾地是，这种方式不支持直接编辑文本，需要编辑十六进制，当然，好处是可以直接在 vim 中使用。

这里演示把 ‘nihao world’ 改回 ‘Hello World’，先要搜索到 ‘nihao world’ 的十六进制：

$ echo "nihao world" | hexdump -C
00000000  6e 69 68 61 6f 20 61 62  63 64 65 0a              |nihao world.|
0000000c
$ echo "Hello World" | hexdump -C
00000000  48 65 6c 6c 6f 20 57 6f  72 6c 64 0a              |Hello World.|
0000000c

然后，开启编辑过程，先切换为十六进制，找到 ‘6e 69 68 …’ 所在位置，替换为 ‘48 65 6c …‘：

$ vim hello
:%!xxd -g 1

编辑完，用 xxd -r 转换回二进制，之后，保存退出即可完成编辑。

:%!xxd -r

保存完执行：

$ ./hello
Hello World

使用 echo + dd：方便自动化，无需交互

下面，使用更为直观的非交互式方式改写这个字符串，也就是用 echo + dd 来完成：

$ readelf -S hello | egrep ".rodata|Name"
[Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
[ 2] .rodata           PROGBITS        0804806d 00006d 00000e 00   A  0   0  1
$ echo 'nihao world' | dd of=hello bs=1 seek=$((0x6d)) count=$((0x0e)) conv=notrunc status=none
$ ./hello
nihao world

对于非纯字符，需要先获取十六进制编码，也可以对照 man ascii 查表：

$ echo -n "nihao world" | hexdump -v -e '"\\""x"1/1 "%02x"' ; echo
\x6e\x69\x68\x61\x6f\x20\x61\x62\x63\x64\x65
$ echo "\x6e\x69\x68\x61\x6f\x20\x61\x62\x63\x64\x65" | dd of=hello bs=1 seek=$((0x6d)) count=$((0x0e)) conv=notrunc status=none
$ ./hello
nihao world

使用 ELF 专属工具 objcopy

最后，针对 ELF，也可以用专属工具 objcopy 来完成 .rodata section 的直接更新：

$ echo 'nihao world' > nihao.txt
$ objcopy --update-section .rodata=nihao.txt hello
$ ./hello
nihao world

二进制补丁

这里介绍 3 种二进制补丁制作和应用工具，分别是：

rdiff
bsdiff / bspatch
git diff / apply

首先推出 git diff/apply --binary，不过这个仅限 git 仓库中使用，这里不做深入介绍。下面简单演示另外两组工具。

准备两个二进制文件

以上面用到的 hello 和 hello.nihao 为例，制作 patch 文件并打上 patch。

准备两个不同的二进制文件：

$ as --32 -o hello.o hello.s
$ ld -melf_i386 -o hello hello.o
$ cp hello hello.nihao

$ echo 'nihao world' > nihao.txt
$ objcopy --update-section .rodata=nihao.txt hello.nihao

$ ./hello
Hello World
$ ./hello.nihao
nihao world

rdiff

制作差分 patch：

$ rdiff signature hello hello.sig
$ rdiff delta hello.sig hello.nihao hello.patch

打上差分 patch：

$ rdiff patch hello hello.patch hello.new

验证：

$ chmod a+x hello.new
$ ./hello.new
nihao world

bsdiff / bspatch