泰晓科技 -- 聚焦 Linux - 追本溯源,见微知著!

泰晓 Linux 实验盘全系 6 大 Linux 发行版已全部支持出厂系统恢复功能

LWN 235164: 按需预读(On-demand readahead)

Wang Chen 创作于 2019/01/20

请点击 LWN 中文翻译计划,了解更多详情。

原文:On-demand readahead 原创:By corbet @ May. 21, 2007 翻译:By unicornx of TinyLab.org 校对:By Xiaojie Yuan

“Readahead” is the act of speculatively reading a portion of a file’s contents into memory in the expectation that a process working with that file will soon want that data. When readahead works well, a data-consuming process will find that the information it needs is available to it when it asks, and that waiting for disk I/O is not necessary. The Linux kernel has done readahead for a long time, but that does not mean that it cannot be done better. To that end, Fengguang Wu has been working on a set of “adaptive readahead” patches for a couple of years.

所谓 “预读”(“Readahead”,译者注,下文直接使用不再翻译)是指基于 “打开了一个文件的进程将马上读取该文件内容” 这样一个假设,于是提前将文件内容的一部分读入内存。当 readahead 运行良好时,读取数据的进程会发现它获取到数据的速度是非常快的,几乎不存在磁盘读写延迟。Linux 内核对 readahead 的支持已经有很长一段时间了,但这并不意味着针对该特性已经不存在优化的可能。为此,Fengguang Wu 几年来一直在开发一套 “自适应预读”(”adaptive readahead”,译者注,下文直接使用 “adaptive readahead” 不再翻译)补丁(试图对该特性加以改进)。

Adaptive readahead was covered here in 2005. The patches have been languishing in the -mm tree for one simple reason: their complexity is at such a level that few people are able to review them in any useful way. The new on-demand readahead patch is a response to a request from Andrew Morton for a simpler patch to help get the merge process going. The new code is indeed simpler, having dispensed with much of the logic found in the full adaptive readahead mechanism.

我们曾经在 2005 年 给大家介绍过 “Adaptive readahead”。这个补丁一直逗留在 -mm 代码仓库中(没有被合入内核主线),原因很简单:就是它实在是太复杂了,以至于很少有人能够完全理解。由于 Andrew Morton 希望对原补丁进行简化后再考虑将其合入主线,所以 Fengguang 新提了一个 “按需预读(on-demand readahead)” 补丁(译者注,下文直接使用 “on-demand readahead” 不再翻译)。新的代码看上去确实更简单些,略去了原 “adaptive readahead” 机制中的大部分逻辑。

To a great extent, the on-demand patch reimplements what Linux readahead does currently, but in a simpler and more flexible way. Like the current code, the on-demand patch maintains a “readahead window” consisting of a portion of the file starting with the application’s last read. Pages inside the readahead window should already be in the page cache - or, at least, under I/O to get there as soon as possible. The window moves forward as the application reads data from the file.

在很大程度上,“on-demand readahead” 补丁重新实现了 Linux 中 的 readahead 功能,只是实现的方式更简单,更灵活。与当前代码类似(译者注,不是相同,具体原因见下文描述),“on-demand readahead” 补丁维护了一个 “预读窗口”(“readahead window”,译者注,这个窗口应该被理解成一个逻辑上的区间,下文直接使用不再翻译成中文),该 “readahead window” 覆盖了文件中从上次应用读取结束的位置开始的一部分长度。“readahead window” 内的数据应该已经被提前读取到缓存页(page cache)中,或者正在被从磁盘读入缓存。这个 “readahead window” 随着上层应用不断从文件中读取数据而向文件末尾滑动。

The current code actually implements two windows, being the “current window” (a set of in-cache pages which includes the application’s current position) and the “ahead window,” being the pages most recently read in by the kernel. Once the application’s position crosses from the current window into the ahead window, a new I/O operation is started to make a new ahead window. In this way, the kernel tries to always keep sufficiently far ahead of the application that the file data will be available when requested.

当前内核的 readhead 代码中实际上实现了两个窗口,即 “当前窗口”(”current window”,对应着用于存放应用当前正在读取数据的一组缓存页)和 “提前窗口”(”ahead window”),对应着内核最近预读取的页面。一旦应用程序读取的位置从 “current window” 跨越到 “ahead window”,内核就会启动新的磁盘读写并创建一个新的 “ahead window”。通过这种方式,内核尝试始终为应用程序维护一个足够的提前读取量,以便在其请求读取时可以立即得到数据。

The on-demand patch, instead, has a single readahead window. Rather than maintain a separate “ahead window,” the new readahead code marks a page (using a flag in the page structure) as being at the “lookahead index.” When an application reads its way into the marked page, the readahead window is extended and a new I/O operation is started. There is some resistance to the idea of using a page flag, since those bits are perennially in short supply. Andrew Morton has suggested using some more approximate heuristics instead. That approach might occasionally make the wrong decision, but the penalty is low and does not affect the correctness of the system’s operation as a whole.

不同的是,“on-demand readahead” 补丁只实现了一个 “readahead window”。新的预读代码并没有维护一个单独的 “ahead window”,而是将 “readahead window” 所对应的缓存区中的某个物理页框标记为 “前瞻索引” (“lookahead index”,标记的方法是使用 struct page 中的 flags 字段)。当应用程序执行读操作过程中一旦遇到该标记页,则 readahead 算法就扩展 “readahead window” 并启动新的磁盘读写对数据进行预读。使用页标记的想法遭到了一些人的反对,因为 struct page 中的 flags 字段的比特位实在太过于珍贵。Andrew Morton 建议 使用一些更粗略的预测方法。这种方法可能偶尔会预测失败,但影响很小,并不会影响整个系统运作的正确性。(译者注,最终提交到主线的代码对该 “前瞻索引”(即 PG_readahead)复用了另一个已有的比特位 PG_reclaim,具体参考 mm: share PG_readahead and PG_reclaim。)

While the on-demand patch appears to do relatively little, it does have the advantage of removing a bunch of complexity from the current readahead code. It is able to make its decisions without the overhead of trying to track events like an attempted readahead of pages which are already in the cache. The checks for sequential access are made less strict as well, causing readahead to stay active in situations where the current code would turn it off. The result, according to some benchmarks posted with the patch, is improvements in application speed between 0.1% and 8% or so - with some performance regressions in some cases. Interestingly, some of the best results come with a benchmark running on a MySQL database, which is not where one would normally expect to see a lot of sequential activity.

“on-demand readahead” 补丁采用相对简单的实现代码代替了原先 readahead 逻辑中的一大堆复杂处理逻辑。新补丁的优势在于其在对预读进行判断时无需依赖于跟踪很多事件,例如跟踪缓存中已经被尝试预读过的页框,从而节省了开销。对顺序访问的检查也没有原先严格,所以相比于当前的代码逻辑,加入补丁后的 readahead 活动更加积极,甚至在一些对于当前逻辑下并不会执行 readahead 的地方加入补丁后也会进行 readahead 。从随补丁发布的一些 基准测试结果可以看出,虽然在某些情况下仍然会出现一些性能衰退,但应用程序运行的速度却提高了 0.1% 到 8% 不等。有趣的是,一些最好的结果是出现在使用 MySQL 数据库运行基准测试的情况下,而通常人们认为 MySQL 运行过程中的顺序读取操作并不多。

This patch set is clearly simple enough to be reviewed; in the absence of any strong objections, it could conceivably be ready for 2.6.23. Then, perhaps, Fengguang can start working on adding some of the more complex logic which makes up the full adaptive readahead mechanism.

这个补丁集很简单,非常方便审查;如果没有什么非常强烈的反对意见的话,相信它可以被合入 2.6.23 版本。如果是这样的话,也许 Fenghuang 可以开始着手添加一些更复杂的逻辑,进一步完善他的自适应预读(“adaptive readahead”)机制。(译者注,“on-demand readahead” 补丁随 2.6.23 版本合入内核主线,具体可以参考这里的介绍。)

请点击 LWN 中文翻译计划,了解更多详情。

Read Album:

Read Related:

Read Latest: