泰晓科技 -- 聚焦 Linux - 追本溯源,见微知著!
网站地址:https://tinylab.org

泰晓Linux实验盘,即刻上手内核与嵌入式开发
请稍侯

LWN 83588: 内核 2.6 版本的交换(swapping)行为

Wang Chen 创作于 2019/03/06

请点击 LWN 中文翻译计划,了解更多详情。

原文:2.6 swapping behavior 原创:By corbet @ May. 5, 2004 翻译:By unicornx 校对:By Shaolin Deng

There has, recently, been a new round of complaints about how the 2.6 kernel swaps out memory. Some users have been very vocal in their belief that, if they have sufficient physical memory, their applications should never be swapped out. These people get annoyed when they sit down at their display in the morning and find that their office suite or web browser is unresponsive, and stays that way for some time. They get even more annoyed when they look and see how much memory the kernel is using for caching file contents rather than process memory. The obvious question to ask is: couldn’t the kernel cut back a bit on the file caches and keep applications in memory?

最近,社区中又有人在抱怨 2.6 版本的内核中和 “交换”(swap,译者注,下文直接使用不再翻译)有关的问题。一些用户坚称,如果物理内存足够,他们的应用程序(所占用的内存)应该永远不被换出。最令他们恼火的事情莫过于,一早打开显示器,却发现他们的办公软件或网络浏览器毫无动静,过了好一会才会有所反应。更让他们生气的是,通过查看内核的内存使用情况,他们会发现内核将很多内存都用于缓存文件数据,而没有留给应用进程。他们不禁会问,内核为何不能减少文件缓存呢,这样我们不就会有更多的内存可以将应用程序驻留在内存中了吗?

The answer is that the kernel can be made to behave that way by tweaking a runtime parameter, but it is not necessarily a good idea. Before getting into that, however, it’s worth noting that recent 2.6 kernels have a memory management problem which can cause serious problems after an application which reads through entire filesystems (updatedb, say, or a backup) has run. The problem is the slab cache’s tendency to request allocations of multiple, contiguous pages; these allocations, when done at the behest of filesystem code, can bring the system to a halt. A patch has been merged which fixes this particular problem for 2.6.6.

对以上问题的回答是:内核实际上是支持通过调整运行时参数来满足以上需求的,但一般来说这么做并不是个好主意。另外,在开始详细介绍这个参数之前,提请大家注意的是,最近的 2.6 内核中存在另一个内存管理上的问题,上层应用一旦执行读取整个文件系统的操作后(譬如运行 updatedb 命令,或者执行备份),可能会导致一个严重的问题。具体原因有关 slab 缓存,它会倾向于请求分配多个连续的物理页;一旦通过文件系统相关代码触发其执行后,就有可能会导致系统被挂起。针对该问题的 补丁 已经解决并被合入 2.6.6。

The bigger issue remains, however: should the kernel swap out user applications in order to cache more file contents? There are plenty of arguments in favor of this behavior. Quite a few large applications set up big areas of memory which they rarely, if ever use. If application memory is occasionally forced to disk, the unused parts will remain there, and that much physical memory will be freed for more useful contents. Without swapping application memory to disk and seeing what gets faulted back in, it is almost impossible to figure out which pages are not really needed. A large file cache is also a performance enhancer. The speedups that come from having frequently-accessed data in memory are harder to see than the slowdowns caused by having to fault in a large application, but they can lead to better system throughput overall.

但这个补丁并没有解决本文一开始所提到的用户提出的问题:内核换出用户的应用程序以便缓存更多文件内容,这么做究竟对不对?关于这一点存在很多的争论。相当多的大型应用程序分配了很大的内存区域,但却很少使用它们。如果我们不将这些应用程序的内存强制换出,则未使用的部分将驻留在内存中,导致我们为了加载其他有用的内容而不得不额外释放大量物理内存。换句话说,我们只有先将应用程序的内存交换到磁盘,然后通过实际发生的缺页调入,才可以确定哪些内存页才是真正需要的。大的文件缓存对改善性能也是有好处的。针对一个大型应用程序的缺页异常处理所导致的系统变慢会很容易被我们感受到,而相较起来。将频繁访问的数据缓存在内存中所带来的速度上的提升却没有那么明显(译者注,其实和没有缓存相比还是很明显的,只不过我们已经习以为常),但它们的确从整体上给系统带来了更好的吞吐性能。

Still, there are users who insist that, for example, a system backup should never force OpenOffice out to disk. They don’t care how quickly a system maintenance application runs at 3:00 in the morning, but they care a lot about how the system responds when they are at the keyboard. This wish was expressed repeatedly until Andrew Morton exclaimed:

I'm gonna stick my fingers in my ears and sing "la la la" until people tell me "I set swappiness to zero and it didn't do what I wanted it to do".

但是,仍然有些用户坚持认为文件缓存应该为应用进程的内存让路,例如,系统执行备份操作时不应该导致 OpenOffice (译者注,一种 Linux 上常用的办公软件)被强制换出到磁盘。他们只关心桌面系统的响应速度,而不关心那些系统维护应用在凌晨 3 点运行时的速度。这样的声音此起彼伏,直到 Andrew Morton 先生(译者注,当时 Linux 内核社区的关键维护者之一)忍不住出面 发话说

我现在唯一想做的就是把手指塞进我的耳朵里然后唱 “啦 啦 啦”,除非有人告诉我 “我已经把 swappiness 这个控制参数设置为零,但我的问题依然没有解决”。(译者注,看上去 Andrew 对这类抱怨已经很不耐烦,而且他认为内核现有的机制已经可以满足他们的要求,下文将详细介绍的就是这个调节参数。)

This helped quiet the debate as the parties involved looked more closely at this particular parameter. Or, perhaps, it was just fear of Andrew’s singing. Either way, it has become clear that most people are unaware of what the “swappiness” parameter does; the fact that it has never been documented may have something to do with that.

终于有关各方不再无休止地争论,大家开始进一步关注这个特定参数。也或许,他们只是有些害怕 Andrew 唱(发)歌(飙)。但无论怎样,很明显大多数人都还不知道 “swappiness” 这个参数的作用;这可能也和内核中对这个参数并没有正式的文档说明有关。

So… swappiness, which is exported to /proc/sys/vm/swappiness, is a parameter which sets the kernel’s balance between reclaiming pages from the page cache and swapping out process memory. The reclaim code works (in a very simplified way) by calculating a few numbers:

  • The “distress” value is a measure of how much trouble the kernel is having freeing memory. The first time the kernel decides it needs to start reclaiming pages, distress will be zero; if more attempts are required, that value goes up, approaching a high value of 100.
  • mapped_ratio is an approximate percentage of how much of the system’s total memory is mapped (i.e. is part of a process’s address space) within a given memory zone.
  • vm_swappiness is the swappiness parameter, which is set to 60 by default.

具体的方法是通过 /proc/sys/vm/swappiness 所导出的 swappiness 这个参数,可以设置内核如何在回收页缓存中的内存数量和换出进程内存数量之间进行比例调节,从而保持一定的平衡。回收代码通过以下几个参数(以非常简单的方式)计算对内存执行换出操作的力度:

  • distress ” 这个值用于衡量内核释放内存时的困难程度。内核第一次尝试回收页面时,distress 的值为零;随着尝试次数变多,该值会变大,最大不超过 100。
  • mapped_ratio 是一个近似的百分比值,用于表示在给定内存域(zone)中系统总内存有多少被映射到进程的地址空间。
  • vm_swappiness 就是 swappiness 参数,默认设置为 60。

With those numbers in hand, the kernel calculates its “swap tendency”:

基于这些参数,内核采用以下公式计算出它的 “交换趋势(swap tendency)”(译者注,具体代码参考 内核 2.6 系列早期版本的 refill_inactive_zone() 函数。从 2.6.28 开始,修改采用的新的处理方式,具体参考 这里的说明。但对于设置 swappiness 参数来说,取值依然是在 0 到 100 之间。):

swap_tendency = mapped_ratio/2 + distress + vm_swappiness;

If swap_tendency is below 100, the kernel will only reclaim page cache pages. Once it goes above that value, however, pages which are part of some process’s address space will also be considered for reclaim. So, if life is easy, swappiness is set to 60, and distress is zero, the system will not swap process memory until it reaches 80% of the total. Users who would like to never see application memory swapped out can set swappiness to zero; that setting will cause the kernel to ignore process memory until the distress value gets quite high.

如果 swap_tendency 的值低于 100,则内核将仅回收页缓存中的页框。否则,一旦该值超过 100,则部分映射为进程地址空间的页框也将被考虑回收。因此,缺省情况下,swappiness 的值设置为 60,由于 distress 初始值为零,在 mapped_ratio 达到总数的 80% 之前,系统是不会交换进程的内存的。如果永远不想看到应用程序的内存被换出,用户可以将 swappiness 设置为零;这样内核就会忽略进程的内存,除非 distress 的值变得非常高。

The swappiness parameter should do what a lot of users want, but it does not solve the whole problem. Swappiness is a global parameter; it affects every process on the system in the same way. What a number of people would like to see, however, is a way to single out individual applications for special treatment. Possible approaches include using the process’s “nice” value to control memory behavior; a low-priority process would not be able to push out significant amounts of a high-priority process’s memory. Alternatively, the VM subsystem and the scheduler could become more tightly integrated. The scheduler already makes an effort to detect “interactive” processes; those processes could be given the benefit of a larger working set in memory. That sort of thing is 2.7 work, however; in the mean time, people who are unhappy with the kernel’s swap behavior may want to try playing with the knobs which have been provided.

swappiness 参数应该基本上解决了许多用户的需求,但它还不能解决所有问题。作为一个全局参数,swappiness 会以相同的方式影响系统上的每个进程。但有些人希望看到的是对某些单个应用进程采取特殊的处理。为此,可能的方法包括设置进程的 “nice” 值来控制其使用内存的行为;高优先级的进程相对于低优先级的进程,其所使用的内存更不容易被换出。或者,虚拟内存子系统和调度器可以更紧密地合作。调度器可以针对那些 “交互式(interactive)” 的进程进行特殊处理;确保它们获得更多的内存。当然,这些工作要留到 2.7 版本的内核中才会被考虑;所以,如果当前的用户不满意内核的 swap 行为,可以尝试利用前文介绍的 swappiness 控制参数进行调节。

请点击 LWN 中文翻译计划,了解更多详情。



Read Album:

Read Related:

Read Latest: