泰晓科技 -- 聚焦 Linux - 追本溯源,见微知著!

泰晓 Linux 实验盘全系 6 大 Linux 发行版已全部支持出厂系统恢复功能

LWN 387083: 针对 x86 平台移植 LMB(Logical Memory Block)内存分配器

Wang Chen 创作于 2018/09/08

原文:Moving x86 to LMB 原创:By corbet @ May. 11, 2010 翻译:By unicornx 校对:By Wu Zhangjin

The early days of the 2.6.34 development cycle were made more difficult for some testers by difficulties in the NO_BOOTMEM patches which came in during the merge window. The kinks in that code were eventually ironed out, but things might just get interesting again in 2.6.35 - Yinghai Lu is back with another set of patches which continues the process of completely reworking how early memory allocation is done on the x86 architecture. The potential for trouble with this kind of work is always there, but the end result does indeed seem worth aiming for.

早些日子,在内核 2.6.34 版本开发的集成期间,引入了一个 NO_BOOTMEM 补丁 ,该补丁存在的问题给部分测试人员造成了很大的麻烦。庆幸的是该问题最终被解决了,但事情可能会在下一个版本 2.6.35 中再次变得有趣,因为 Yinghai Lu (译者注:NO_BOOTMEM 补丁的作者)又带来了另一组补丁,这个新补丁将继续针对 x86 架构重构内核初始化期间内存分配的处理过程。这种工作潜在地总会带来一些问题,但改造最终所带来的好处仍然值得大家期待。

Some review: in a running kernel, memory management is handled by the buddy allocator (at the page level), with the slab allocator on top. These allocators are complex pieces of code which cannot run in the absence of a mostly functional kernel, so they cannot be used in the early stages of the bootstrap process. What is used, instead, is an architecture-specific chain of simple allocators. For x86, things start with a brk()-like mechanism which yields to the “e820” early reservation code, which, in turn, gives way to the bootmem allocator. Once the bootstrap has gotten far enough, the slab allocator can take over from the bootmem code. Yinghai’s 2.6.34 changes were meant to short out the bootmem stage, allowing the system to use the early reservation code until slab can run.

让我们先来回顾一下内核初始化期间的内存管理运行机制:我们知道,内核在正常运行期间,内存管理由伙伴(buddy)分配器(在页面级别)进行处理,slab 分配器基于伙伴系统之上运行。这些分配器都十分复杂,在大部分内核的功能还未就绪的情况下是无法运行的,因此它们不能在早期阶段,譬如初始化过程中被使用。相反,初始化期间内核使用的是一套特定于体系架构,相对简单的内存分配系统。对于x86,其情形类似于执行系统调用 brk() 一样,内核先通过 “e820” 机制(译者注:参考维基百科的 e820 词条)从系统中申请预留一段内存进行早期的内存管理(early reservation),然后让位于 bootmem 分配器。初始化过程进展到一定程度后,接下来再由 slab 分配器从 bootmem 手中接管内存管理。Yinghai 在 2.6.34 中所提交的补丁的目的就是希望略去 bootmem 阶段,允许系统在 slab 运行之前仅依靠早期预留(early reservation)机制对内存进行管理。

During the review process for that code, some reviewers asked why x86 did not use the “logical memory block” (LMB) allocator instead of its own early reservation code. LMB is currently used by the Microblaze, PowerPC, SuperH, and SPARC architectures, so it has the look of a generic solution. There are obvious advantages to using generic code over architecture-specific variants; there are more eyes to look at the code and the overall maintenance cost is reduced. So the idea of moving to LMB made obvious sense.

在对该补丁的代码审查过程中,一些人询问为什么 x86 没有使用 “逻辑内存块”(logical memory block,简称 LMB)分配器,而是使用了它自己的所谓早期预留机制。LMB 目前在 Microblaze,PowerPC,SuperH 和 SPARC 架构上被使用,因此它很有可能成为一套通用的解决方案。一般来说,通用的代码相对于特定于体系架构的解决方法会具有明显的优势;通用,意味着会有更多的人共同审阅同一份代码,整体的维护成本也会显著降低。因此,转向使用 LMB 的想法显而易见。

LMB is, as might be expected, a truly simplistic memory manager. Low-level architecture code gives it blocks of memory to manage as it discovers them with:

正如读者您所想的那样,LMB 的确是一个非常简单的内存管理器。底层体系架构相关代码可以通过调用以下接口向 LMB 报告内存块以供其管理:

long lmb_add(u64 base, u64 size);

The LMB allocator will duly store that region into a fixed-length array of known memory blocks, coalescing it with existing blocks if need be. Memory may then be allocated with:

LMB 分配器维护一个固定长度的数组,用于管理其发现的内存块,它会将底层报告的内存块区域适当地保存到该数组中,如果需要,还会将其与现有块进行合并。使用者如果需要内存块可以通过以下接口向 LMB 发起申请:

u64 lmb_alloc(u64 size, u64 align);

Allocated blocks are tracked in a second array which looks just like the first; an allocation is satisfied by iterating through the available blocks, trying to find a sufficiently large chunk which is not already reserved by somebody else. There are other functions for reserving specific regions of memory, allocating memory on specific NUMA nodes, etc. But, at its core, LMB is a simple allocator which is meant to do a good-enough job until something more sophisticated can take over.

LMB 通过另外一个相同类型的数组记录哪些内存块已经被分配(译者注,具体数据结构参考struct lmb);一旦有人申请内存,LMB 会遍历所有已知的块,从而试图找到并返回一个尚未被其他人使用(预留)的足够大的块。LMB 还提供其他的函数,包括可以申请特定的内存区域,在特定的 NUMA 节点上分配内存等等。但是,LMB 从本质上来说是如此的简单,它要做的仅仅是确保足够够用,直到更复杂的内存管理机制有能力接管它。

Yinghai’s patch set makes a number of changes to the LMB code itself, starting with a move from the lib directory over to mm with the rest of the memory-management code. Some new functions are added to match the different semantics supported by the early reservation code, which works in a two-step, “find a memory block, then reserve it” mode. There is also a new function to transfer LMB reservations into the bootmem allocator for configurations where bootmem is still in use. The 22-part series culminates with a switch to LMB calls for early allocations and the removal of the now-unused early reservation code.

Yinghai 的补丁集对 LMB 代码本身也进行了一些更改,譬如将代码从 lib 目录转移到 mm,和其他内存管理代码放在一起。他还添加了一些新函数以便适配 x86 平台上的早期预留(early reservation)部分的代码以支持其特定的场景需求,这部分代码基本上分两步工作,即 “先找到一个内存块,然后预留它”。另外还定义了一个新函数用于将 LMB 预留的内存转换给 bootmem 分配器,以便用于支持那些仍然使用 bootmem 的内核。这组由 22 个补丁修改组成的补丁集将原来使用早期分配(early reservation)管理内存的方式完全改为通过 LMB 方式,同时清除了现在不再使用的早期预留(early reservation)机制部分的代码。

There has been surprisingly little discussion for a patch series which makes such fundamental changes. It seems that most kernel developers pay relatively little attention to what happens at the architecture-specific levels. One exception is Ben Herrenschmidt, who keeps an eye on LMB from the PowerPC perspective. Ben disagrees with a number of the LMB-level changes, feeling that they complicate the API and potentially introduce problems. Instead, it looks like Ben would like to fix up the LMB code himself, letting Yinghai work on the x86-specific side of things.

令人惊讶的是,对于这次关键性的改动,社区竟然出奇地冷静。看起来大多数内核开发人员对特定于体系结构级别的改动的关注度相对较少。但有个例外,Ben Herrenschmidt,他从对 PowerPC 架构的影响的角度出发,提出了对改进 LMB 的看法。Ben 表示他并不同意对 LMB 自身代码的那些修改,他认为这些修改使得 API 变得更复杂并有可能引入问题。看起来 Ben 想要亲自操刀,来修复 LMB 部分的代码,同时好让 Yinghai 更关注于特定于 x86 架构方面的改造工作。

To that end, Ben has posted a patch series of his own, saying:

My aim is still to replace the bottom part of Yinghai's patch series rather than build on top of it, and from there, add whatever he needs to successfully port x86 over and turn `NO_BOOTMEM` into something half decent without adding a ton of unneeded crap to the core lmb.

为此,Ben 发布了自己的补丁系列,他说:

对 LMB 核心部分添加的大量修改(译者注,特指 Yinghai 的补丁集对 LMB 部分的修改)是不必要的。所以我的目标是重写这部分,而不是基于 Yinghai 的修改继续进行开发,Yinghai 可以基于我的修改添加他所需要的东西,他可以更关注于针对 x86 架构的移植以及改进 `NO_BOOTMEM` 功能。

Some of the changes simply clean up the LMB code, adding, for example, a for_each_lmb() macro for iterating through the array of memory blocks. The fixed-length arrays are made variable, phys_addr_t is used to represent physical addresses, and the code is substantially reorganized. There is much that Ben still plans to do, including, happily, the addition of actual documentation to the API, but even without all that, it’s a significant cleanup for the LMB code.

Ben 的一部分更改只是对 LMB 代码进行清理,例如添加一个 for_each_lmb() 宏来遍历内存块的数组。其余部分的改动基本上是对原有代码的重新组织,包括将原来固定长度数组修改为支持可变长,增加了一个新的结构体类型 phys_addr_t 用于表示物理地址等。Ben 的计划很多,甚至包括,给 API 添加文档说明,虽然还没有实现所有的计划,但目前的修改对于 LMB 代码来说已经是一次很重要的清理了。

As with Yinghai’s patches there has been little in the way of discussion. It may be that these changes will remain below the radar while the two patch sets are integrated and - maybe - merged for 2.6.35. With luck, they’ll remain below the radar thereafter as well, with few people even noticing the difference.

与 Yinghai 的补丁一样,对于 Ben 提交的补丁,社区的反应寥寥。看起来这方面的修改并没有引起大家的兴趣。很快这两个个补丁集会被集成并且很有可能被合入 2.6.35。只要不引发问题,估计即使合入主线后,也很少会有人注意到内核内部已经发生了如此巨大的改变。

Read Album:

Read Related:

Read Latest: