泰晓科技 -- 聚焦 Linux - 追本溯源,见微知著!
网站地址:https://tinylab.org

儿童Linux系统,可打字编程学数理化
请稍侯

LWN 91829: 重新组织地址空间(address space)的布局

Wang Chen 创作于 2018/10/13

原文:Reorganizing the address space 原创:By corbet @ June 30, 2004 翻译:By unicornx of TinyLab.org 校对:By Xiaojie Yuan

The traditional organization of the virtual address space (as seen from user space, on x86 systems) is as shown in the diagram to the right. The very bottom part of the address space is unused; it is there to catch NULL pointers and such. Starting at 0x8000000 is the program text - the read-only, executable code. The text is followed by the heap region, being the memory obtainable via the brk() system call. Typically functions like malloc() obtain their memory from this area; non-automatic program data is also stored there.

虚拟地址空间(virtual address space,以 x86 系统为例)的传统布局方式(从用户空间的角度来看)如下图所示(译者注,原文的图在本段文字的右边,这里由于排版的原因,放在本段文字的下方)。地址空间的最低部分未使用;目的是利用该段地址捕获 NULL 指针等异常地址。从 0x8000000 开始存放的是程序的代码(text)段,即只读的可执行指令。代码段之上是堆(heap),这部分区域的内存通过系统调用 brk() 进行分配。应用程序经常使用的函数中,像 malloc() 这样的函数也是从这个区域申请内存(译者注,实际上内部调用的也是 brk() );静态(non-automatic)程序数据也存储在该区域中(译者注,即程序的数据(data)段部分,应该是在程序的 text 段之上、我们通常理解的 heap 区之下,但在本文的图中并没有特别列出。另外对于下文中的 heap,stack 等专有名词,为了行文简洁,除了第一次会给出中文翻译,后面出现的地方均直接以英文原单词列出,不再翻译)。

The heap differs from the first two regions in that it grows in response to program needs. A program like cat will not make a lot of demands on the heap (one hopes), while running a yum update can grow the heap in a truly disturbing way. The heap can expand up to 1GB (0x40000000), at which point it runs into the mmap area; this is where shared libraries and other regions created by the mmap() system call live. The mmap area, too, grows upward to accommodate new mappings.

heap 与前两个区域(译者注,指程序的 text 段和 data 段)的不同之处在于它的范围会随着程序运行中不断执行内存分配而增长。虽然像 cat 这样的程序不会对 heap 提出太多的内存分配要求,而运行 yum update 则会以一种真正 “令人不安” 的方式向 heap 申请内存。heap 区可以向上(高地址方向)扩展,最高到 1GB(0x40000000)处,从此地址开始往上是内存映射(mmap)区; 之所以叫 mmap 区是因为该区域的内存映射了共享库(shared libraries)和其他内容,而映射行为是通过 mmap() 系统调用实现的(译者注,具体映射方式请参考 man 2 mmap)。随着映射内容的增加,mmap 区向上(高地址方向)增长。

Meanwhile, the kernel owns the last 1GB of address space, up at 0xc0000000. The kernel is inaccessible to user space, but it occupies that portion of the address space regardless. Immediately below the kernel is the stack region, where things like automatic variables live. The stack grows downward. On a really bad day, the stack and the mmap area can run into each other, at which point things start to fail.

从 0xc0000000 开始往上,最后的 1GB 地址空间由内核使用,用户态指令无法访问这段区域的地址范围。紧挨在内核地址空间之下的是栈(stack)区,用于存放自动变量(automatic variables)。stack 区向下(低地址方向)增长。最糟糕的情形下,stack 区和 mmap 区可能会相互重叠,导致程序运行失败。

This organization has worked for some time, but it does have a couple of disadvantages. It fragments the address space, such that neither the heap nor the mmap area can make use of the entire space. If one program makes heavy use of the heap, it could run out of memory, even though a large chunk of space is available between the mmap area and the stack. Normally, not even yum can occupy that much heap, but there are other applications out there which are up to that challenge.

这样的地址空间布局已经施行有一段时间了,但它确实存在一些缺点。特别地它对地址空间的划分方式会造成一定程度的碎片化,使得 heap 区和 mmap 区都不能充分利用整个地址空间。譬如,当一个程序向 heap 区大量申请内存时很可能会耗尽 heap 区的地址空间,即使此时 mmap 区和 stack 区之间还有大量地址空间可用。通常情况下,即使是 yum 也不会从 heap 区申请如此多的内存,但是不能保证其他的应用程序不会这么做。

As a way of making life safer for the true memory hogs out there, Ingo Molnar has posted a patch which rearranges user space along the lines of the revised diagram on the left. The mmap area has been moved up to the top of the address space, and it now grows downward toward the heap. As a result, the bulk of the address space is preserved in a single, contiguous chunk which can be allocated to either the heap or mmap, as the application requires.

为了避免上述地址空间耗尽的问题,Ingo Molnar 提交了一个补丁,该补丁按照下图方式重新安排用户地址空间的布局(译者注,原文的图在本段文字的左边,这里由于排版的原因,放在本段文字的下方)。mmap 区被移动到地址空间的顶部,其区域的扩展方向改为向下,往 heap 区的方向增长。采用这种方式后,地址空间中间的那块空闲的连续区域,既可以分配给 heap 也可以分配给 mmap。

revised memory layout

As an added bonus, this organization reduces the amount of kernel memory required to hold each process’s page tables, since the fragment at 0x40000000 is no longer present.

作为额外的好处,该方式下还减少了每个进程存储页表所需的内存,因为 0x40000000 地址处不会存在一段独立的地址空间(译者注,指对应原方式下的 mmap 区域)。

There are a couple of disadvantages to this approach. One is that the stack area is rather more confined than it used to be. The actual size of the stack area is determined by the process’s stack size resource limit, with a sizable cushion added, so problems should be rare. The other problem is that, apparently, a very small number of applications get confused by the new layout. Any application which is sensitive to how virtual memory is laid out is buggy to begin with; according to Arjan van de Ven, the most common case is applications which store pointers in integer variables and then do the wrong thing when they see a “negative” value.

可是这种新方法也存在一些问题。一个是 stack 区的大小比以前受到更大的限制。但 stack 区的实际大小受进程的 stack 资源限制所限定,考虑到实际配置时还包含了相当大的冗余量,所以出问题(译者注,指 stack 耗尽)的可能性几乎没有。另外一个问题是,存在很少的一些应用程序会受到新布局的影响。对于一个应用程序来说,代码中存在对虚拟内存布局的任何假设都是有问题的;根据 Arjan van de Ven 的说法,最常见的一种错误就是应用程序使用一个整形(int)变量存放申请内存后返回的指针值,然后判断如果该值小于等于 0 则认为内存分配失败(译者注,采用新方法后 malloc() 返回的地址可能会高于 0x80000000,而该值对于 int 变量来说将是一个负数)。

The fact is that most users will never notice the change; for a demonstration, consider that Fedora kernels have been shipping with this patch for some time. Even a vanilla Fedora Core 1 system has it; a command like “cat /proc/self/maps” will show the new layout at work. The patch is currently part of the -mm kernel, and will probably find its way into the mainline before too long.

事实上,该改动对于大多数用户来说是无感的;要知道 Fedora 所使用的内核中已经合入该补丁有一段时间了,并没有造成什么问题。基于内核主线版本(vanilla)制作的 Fedora Core 1 系统中也包含了这个改动;我们可以通过运行 “cat /proc/self/maps” 这个命令查看地址空间的新布局。该补丁目前已经合入 -mm 代码库,相信在不久的将来就会进入内核主线(译者注:该补丁随 v2.6.9 合入主线)。



Read Album:

Read Related:

Read Latest: