泰晓科技 -- 聚焦 Linux - 追本溯源,见微知著!


LWN 116810: 对四级页表设计的再思考

Wang Chen 创作于 2018/09/22

原文:Rethinking four-level page tables 原创:By corbet @ Dec. 22, 2004 翻译:By unicornx of TinyLab.org 校对:By Yang Hubin

Andi Kleen’s four-level page table patch has been in the -mm tree for some time; it is widely understood to be one of the first things in the queue to be merged once 2.6.10 is out. For those who are not familiar with this patch, why it matters, and how it works, a look at this LWN Kernel Page article from last October might be helpful in understanding the following discussion.

Andi Kleen 的四级页表补丁在 “-mm” 树中已经有一段时间了;大部分人都同意将该补丁作为下一个发布版本 2.6.10 中首先合入的第一批补丁之一。如果读者还不熟悉这个补丁,以及为什么它如此重要,它又是如何工作的,请先浏览一下10 月份发表的这篇文章,可能有助于您理解以下的讨论内容。

The three levels of page table currently implemented by the kernel are, from top to bottom, the PGD, PMD, and PTE. Andi’s patch extends the hierarchy by adding a new top-level directory called PML4 (from the x86-64 specification). A system which currently has a single PGD (per virtual address space) will have, instead, a single PML4 directory which may contain pointers to many PGD directories. In the current implementation, the PMD vanishes transparently on systems which only have two-level page tables; as a result, the kernel can treat all systems as if they had three-level page tables. Andi’s four-level patch works in a similar way, causing the new PML4 level to be optimized out on hardware which does not support it.

目前内核实现的三级页表,从上到下依次为 PGD,PMD 和 PTE。Andi 的补丁通过在 PGD 之上新增加一个名为 PML4 的新级别(该名词来自 x86-64 体系架构的专业术语,译者注,即 Page Map Level 4 的简称)扩展了原来的层次结构。也就是说,原三级模型中,每一个进程所拥有的那个 PGD 表被替换为一个 PML4 表,该 PML4 表中包含了指向多个 PGD 表的指针。在当前的实现(译者注,指原先支持三级页表模型的内核版本)中,当一个具体的体系架构只支持两级页表时,三级中的 PMD 那一级页表会在内部被 “透明” 地优化掉;通过这种方式,内核可以统一对待所有的体系架构,就好像它们采用的都是三级页表模式一样。在 Andi 的四级页表补丁中以类似的方式工作,所以如果一个硬件平台不支持第四级,则内核中的 PML4 级页表同样会被优化掉。

Nick Piggin has recently posted a new, alternative four-level patch. Nick is not hugely upset by Andi’s patch set, but he thinks he has a better way. Essentially, Nick thinks that it would be better to keep the PGD as the top-level page directory, and to insert the new level in the middle, next to the PMD. With this organization, all architectures would have an active PGD at the top of the hierarchy, and active PTEs at the bottom, but the PMD and the PUD (Nick’s name for the new level) would be optimized out on systems which do not use them.

Nick Piggin 最近发布了一个新的四级补丁实现。Nick 认为 Andi 的补丁方式并没有什么本质上的问题,而他的补丁则提供了一种更好的解决方案。Nick 认为最好将 PGD 依然保留为顶层的页表,同时将新级别插入在中间,即 PMD 级别的左边或者右边。基于该设计,对于所有的体系结构来说,在层次结构上所看到的最顶层都是 PGD,最底层也都是 PTE,而中间的 PMD 和 PUD(Nick 给新级别起的名字,即 Page Upper Directory)可以根据体系架构所支持的实际页表层级情况进行优化。

Andi would prefer to stick with the current patches; he sees Nick’s approach as being mainly an exercise in renaming which could delay the merging of the four-level capability. The current patches have been shaken down well in the -mm tree and seem to work; thrashing them up now would require a new round of testing before they had the same level of confidence. Andi has other work which is waiting for the four-level patch to be merged, so he would rather not see the whole process slowed down.

Andi 更倾向于坚持他自己提交的补丁方案;他认为 Nick 的方法看上去只是在层级的命名上做文章,没有必要为此推迟合入他的补丁。况且他的补丁已经在 “-mm” 代码仓库中经过了大量的测试并证明其可以有效地工作;如果此时引入新的修改将需要重新测试并验证改动的正确性。Andi 还有很多工作需要等到其补丁合入内核主线之后才能继续,所以他不希望为此耽误整个集成的进度。

Others are in less of a hurry, however, and see merit in Nick’s patches. In particular, Linus prefers placing the new level below the PGD as the least intrusive way of extending the page table hierarchy.

然而,其他似乎人并不着急,而且看上去他们更欣赏 Nick 的补丁。尤其是,Linus 先生也倾向于将新的层级放在 PGD 的下面,因为他认为这么做对整个内核页表层级设计的影响更小。

Basically, by doing the new folded table in the middle, it only affects code that actually walks the page tables. Basically, what I wanted in the original 2->3 level expansion was that people who don’t use the new level should be able to conceptually totally ignore it. I think that is even more true in the 3->4 level expansion.

如果只是在层级模型的中间增加层级,则受到影响的 “仅仅” 局限于实际遍历页表的代码。当初内核从二级页表扩展为三级时,我们修改的初衷就是希望,如果用户不使用新层级,则设计上最好能够让他完全感受不到其存在。我认为这个原则对于当前内核从三级页表向四级页表扩展的修改中也是适用的。

Andi has not yet given in, but there seems to be a strong wind blowing in favor of Nick’s page table arrangement. So four-level page tables might not be the first thing to go into 2.6.11 after all.

Andi 还没有最终放弃他的观点,但貌似 “选择的天平” 正在向有利于 Nick 的方向倾斜。看起来还需要一段时间 2.6.11 版本中才可能增加对四级页表的支持。

Read Album:

Read Related:

Read Latest: