LWN 520076: 软中断对实时性的影响

Wang Chen 创作于 2019/12/04

了解更多有关 “LWN 中文翻译计划”，请点击这里

原文：Software interrupts and realtime 原创：By Jonathan Corbet @ Oct. 17, 2012 翻译：By unicornx 校对：By Lei Li

The Linux kernel’s software interrupt (“softirq”) mechanism is a bit of a strange beast. It is an obscure holdover from the earliest days of Linux and a mechanism that few kernel developers ever deal with directly. Yet it is at the core of much of the kernel’s most important processing. Occasionally softirqs make their presence known in undesired ways; it is not surprising that the kernel’s frequent problem child — the realtime preemption patch set — has often run afoul of them. Recent versions of that patch set embody a new approach to the software interrupt problem that merits a look.

Linux 内核中的 “软中断（software interrupt，简称 softirq）” 机制就好比是一头怪兽。虽然该机制很早就被引入内核，但一直以来，大家对它的运行机制其实并不是很清楚，也很少有内核开发人员会直接使用它。作为内核中最重要的组成部分之一，软中断所带来的问题偶尔还是会给某些开发人员带来困扰；譬如内核社区中开发和使用实时抢占（PREEMPT_RT）补丁的人（这个补丁项目经常在性能问题上对内核提出更高的要求）。该补丁集最新的版本中提出了一个处理软中断的新方法，值得在这里给大家介绍一下。

软中断的简短回顾（A softirq introduction）

In the announcement for the 3.6.1-rt1 patch set, Thomas Gleixner described software interrupts this way:

First of all, it's a conglomerate of mostly unrelated jobs, which run in the context of a randomly chosen victim w/o the ability to put any control on them.

在 3.6.1-rt1 版本实时补丁集的公告中，Thomas Gleixner 用以下方式描述了他对软中断的看法：

首先，它将很多毫不相关的工作纠缠在一起，实际运行时会随机抢占一个任务（译者注，原文称其为一个 “被随机选中的受害者（randomly chosen victim）” 并借用其上下文执行软中断自身的工作，而且我们也无法对它的这种行为施加控制。

The softirq mechanism is meant to handle processing that is almost — but not quite — as important as the handling of hardware interrupts. Softirqs run at a high priority (though with an interesting exception, described below), but with hardware interrupts enabled. They thus will normally preempt any work except the response to a “real” hardware interrupt.

之所以要提供软中断这种工作机制，其目的是为了处理那些虽然不及硬中断重要，但比其他任务都重要的事件。软中断在开中断的状态下以高优先级运行（但有一个有趣的例外，详见下文有关 ksoftirqd 内核线程的描述）。因此，通常情况下，只有 “真正的” 硬中断会抢占它的运行，除此之外，软中断则可以抢占其他任何任务的运行。

Once upon a time, there were 32 hardwired software interrupt vectors, one assigned to each device driver or related task. Drivers have, for the most part, been detached from software interrupts for a long time — they still use softirqs, but that access has been laundered through intermediate APIs like tasklets and timers. In current kernels there are ten softirq vectors defined; two for tasklet processing, two for networking, two for the block layer, two for timers, and one each for the scheduler and read-copy-update processing. The kernel maintains a per-CPU bitmask indicating which softirqs need processing at any given time. So, for example, when a kernel subsystem calls tasklet_schedule(), the TASKLET_SOFTIRQ bit is set on the corresponding CPU and, when softirqs are processed, the tasklet will be run.

内核中曾经静态定义了 32 个软中断向量，每个对应一个设备驱动程序或相关任务。时至今日，大部分的驱动程序已经不再直接使用软中断，虽然它们仍然会利用该机制，但是对软中断的使用已经通过诸如 tasklet 和计时器之类的机制进行了封装。在当前的内核中，静态定义的软中断向量还剩下 10 个（译者注，具体参考内核对软中断向量枚举值的定义。另、为对应本译文的实际发表时间，所有参考代码皆基于内核 3.6 版本）：两个用于 tasklet 处理，两个用于网络处理，两个用于块（block）层，两个用于定时器，一个用于调度器，一个用于 RCU（read-copy-update）。内核为每一个处理器维护一个位掩码，指示在任何给定时间有哪些软中断向量正等待处理。例如，一旦某个内核子系统调用了 tasklet_schedule()，则相应处理器上的 TASKLET_SOFTIRQ 软中断标志位被设置，当实际处理软中断时，将执行 tasklet（译者注，在软中断处理术语中，设置某个软中断向量的标志位称之为 “raise”，下文翻译为 “提交”，而真正运行对应该软中断向量的处理函数被称之为 “execution”，下文翻译为 “执行”）。

There are two places where software interrupts can “fire” and preempt the current thread. One of them is at the end of the processing for a hardware interrupt; it is common for interrupt handlers to raise softirqs, so it makes sense (for latency and optimal cache use) to process them as soon as hardware interrupts can be re-enabled. The other possibility is anytime that kernel code re-enables softirq processing (via a call to functions like local_bh_enable() or spin_unlock_bh()). The end result is that the accumulated softirq work (which can be substantial) is executed in the context of whichever process happens to be running at the wrong time; that is the “randomly chosen victim” aspect that Thomas was talking about.

内核中存在两个地方会触发软中断 “执行” 并抢占当前线程。其中一处是在硬中断处理结束时；大部分软中断会在硬中断处理函数中被 “提交（raise）”，其想法是一旦恢复开启硬件中断，就可以立即检查并 “执行” 软中断（这有助于最优化 “缓存（cache）” 使用从而改善延迟）。内核代码中另一处可能 “执行” 软中断的地方在所有重新启用软中断处理时（通过调用像 local_bh_enable() 或者 spin_unlock_bh() 这类函数，译者注，这些函数最终都是进入 _local_bh_enable_ip() 函数并 “执行” 软中断）。总而言之，累积的软中断（可能会很多）会在我们无法确定的某个任务的上下文中被 “执行（execute）”；而这个任务就是 Thomas 所指的 “被随机选中的受害者”。（译者注，我们无法预期这个任务是谁，因为上述两处可能触发软中断执行的情况都是不可控的，中断随时会发生；local_bh_enable() 或者 spin_unlock_bh() 何时会被调用也同样和系统实际运行上下文有关，同样无法确定其具体发生的时间。）

Readers who have looked at the process mix on their systems may be wondering where the ksoftirqd processes fit into the picture. These processes exist to offload softirq processing when the load gets too heavy. If the regular, inline softirq processing code loops ten times and still finds more softirqs to process (because they continue to be raised), it will wake the appropriate ksoftirqd process (there is one per CPU) and exit; that process will eventually be scheduled and pick up running softirq handlers. Ksoftirqd will also be poked if a softirq is raised outside of (hardware or software) interrupt context; that is necessary because, otherwise, an arbitrary amount of time might pass before softirqs are processed again. In older kernels, the ksoftirqd processes ran at the lowest possible priority, meaning that softirq processing was, depending on where it is being run, either the highest priority or the lowest priority work on the system. Since 2.6.23, ksoftirqd runs at normal user-level priority by default.

在实际运行的系统上查看过进程状态的用户可能想知道 ksoftirqd 内核线程是如何参与软中断的实际运行的。这些线程负责在软中断处理负载过重时分担其压力。大多数处理过程（译者注，指上文介绍的内核中存在两处会触发软中断 “执行”）中如果发现连续处理了十个软中断向量后还有更多的向量需要处理（因为它们可能会被不停地被提交），内核将唤醒相应的 ksoftirqd 线程（每个 CPU 有一个）并退出当前软中断的处理（译者注，以上描述可以参考 __do_softirq() 函数中调用 wakeup_softirqd() 的处理）；该线程最终将被调度并继续对软中断进行处理。如果软中断是在（硬中断或软中断）的上下文之外被 “提交” 的，内核也会唤醒 ksoftirqd 来执行相关处理（译者注，以上描述可以参考 raise_softirq_irqoff() 函数中调用 wakeup_softirqd() 的逻辑）；这么做是必要的，否则我们在该软中断被实际 “执行” 之前可能会被延误一段很不确定的时间。在老版本的内核中，ksoftirqd 线程（默认）以最低优先级运行，这意味着软中断是否能够及时处理取决于它被执行的时机，有可能以系统的最高优先级运行（译者注，譬如在硬中断处理结束时被执行），也有可能以最低优先级工作（译者注，即由 ksoftirqd 运行）。从 2.6.23 开始，ksoftirqd 默认情况下按照普通用户优先级运行（译者注，参考提交 commit “sched: do not set softirqs to nice +19”）。

实时 Linux 中的软中断（Softirqs in the realtime setting）

On normal systems, the softirq mechanism works well enough that there has not been much motivation to change it, though, as described in “The new visibility of RCU processing,” read-copy-update work has been moved into its own helper threads for the 3.7 kernel. In the realtime world, though, the concept of forcing arbitrary processes to do random work tends to be unpopular, so the realtime patches have traditionally pushed all softirq processing into separate threads, each with its own priority. That allowed, for example, the priority of network softirq handling to be raised on systems where networking needed realtime response; conversely, it could be lowered on systems where response to network events was less critical.

在普通的系统（译者注，指非实时系统）上，当前的软中断机制运行良好，所以大家并没有很强的意愿对其进行修改，但是，正如 “RCU 处理的新方法” 一文中所述，从 3.7 版本的内核开始，（出于改进实时延迟的目的）RCU 将原先利用软中断机制实现的部分功能替换为采用线程化方式加以改进。在实时世界中，类似于像随机选择一个任意的任务执行一些不确定量的计算这样的做法是非常不受大家待见的，实时补丁传统上的做法是将所有的软中断处理都转移到单独的线程中去，每个线程都有自己的优先级。这么做的好处在于，举个例子来说，假设系统针对网络处理时需要实时响应，则我们可以相应提高网络软中断处理任务的优先级；相反，在对网络事件的响应不太敏感的系统上我们则可以降低相应软中断任务的优先级。

Starting with the 3.0 realtime patch set, though, that capability went away. It worked less well with the new approach to per-CPU data adopted then, and, as Thomas said, the per-softirq threads posed configuration problems:

It's extremely hard to get the parameters right for a RT system in general. Adding something which is obscure as soft interrupts to the system designers todo list is a bad idea.

但是，从对应 3.0 版本的实时补丁集开始，处理发生了变化。其原因是该做法与当时采用的针对 Per-CPU 数据的新方法一起工作时的效果不太好，而且正如 Thomas 所说的，针对每个软中断线程进行配置会有很多问题：

针对一个实时系统，要找到从整体上来说合适的参数是相当困难的。特别地，软中断相关的配置在概念上又显得模糊不清，这对系统管理员来说相当麻烦。

So, in 3.0, softirq handling looked very similar to how things are done in the mainline kernel. That improved the code and increased performance on untuned systems (by eliminating the context switch to the softirq thread), but took away the ability to finely tweak things for those who were inclined to do so. And realtime developers tend to be highly inclined to do just that. The result, naturally, is that some users complained about the changes.

因此，在 3.0 版本对应的实时补丁中，软中断的处理变得与主线内核中的操作非常相似。代码进过这番改进后提高了系统缺省情况下的性能（不再唤醒软中断线程从而避免了上下文切换），代价是失去了（通过线程化和调整优先级）对内核进行精细化调优的灵活性，而这种灵活性对于实时开发人员来说却是非常看重的。不出所料，改动的结果招致了部分用户的反对。

In response, in 3.6.1-rt1, the handling of softirqs has changed again. Now, when a thread raises a softirq, the specific interrupt in question (network receive processing, say) is remembered by the kernel. As soon as the thread exits the context where software interrupts are disabled, that one softirq (and no others) will be run. That has the effect of minimizing softirq latency (since softirqs are run as soon as possible); just as importantly, it also ties processing of softirqs to the processes that generate them. A process raising networking softirqs will not be bogged down processing some other process’s timers. That keeps the work local, avoids nondeterministic behavior caused by running another process’s softirqs, and causes softirq processing to naturally run with the priority of the process creating the work in the first place.

为了平息大家的抱怨，在实时补丁的 3.6.1-rt1 版本中，软中断的处理再次发生了变化。现在，当一个线程 “提交（raise）” 软中断时，内核会记住该特定中断向量的类型（比如该类型是网络接收处理）。一旦该线程退出禁用软中断的上下文，就会立刻 “执行” 该软中断向量（而不是其他向量）。这么做可以达到最小化软中断执行延迟的效果（因为软中断会被尽快执行）；同样重要的是，它还赋予 “提交” 一个软中断的线程优先 “执行” 自己所 “提交” 的软中断的权利。一个 “提交” 了网络软中断的线程不会因为需要处理其他线程提交的（譬如和定时器相关的）软中断而被拖累。由于软中断处理直接在提交该中断的线程上下文中执行（其执行优先级即该线程的优先级），避免了因为需要 “执行” 其他线程 “提交” 的软中断而可能引入的 “不确定性（nondeterministic）” 。

There is an exception, of course: softirqs raised in hardware interrupt context cannot be handled in this way. There is no general way to associate a hardware interrupt with a specific thread, so it is not possible to force the responsible thread to do the necessary processing. The answer in this case is to just hand those softirqs to the ksoftirqd process and be done with it.

当然有一个例外：硬件中断上下文中 “提交” 的软中断无法以这种方式处理。不存在通用的方法建立硬中断与特定线程的关联，因此无法强制选择一个线程来执行软中断处理。在这种情况下的解决方法依然是唤醒 ksoftirqd 线程进行处理。

A logical next step, hinted at by Thomas, is to move from an environment where all softirqs are disabled to one where only specific softirqs are. Most code that disables softirq handling is only concerned with one specific handler; all the others could be allowed to run as usual. Going further, he adds: “the nicest solution would be to get rid of them completely.” The elimination of the softirq mechanism has been on the “todo” list for a long time, but nobody has, yet, felt the pain strongly enough to actually do that work.

Thomas 暗示，合乎逻辑的下一步是减少（任务上下文）代码中对禁用软中断的使用，不需要对所有类型的软中断向量都实施禁用，只需要针对一些特定的软中断向量进行禁用就好了。大多数禁用软中断处理的代码只涉及一个特定的软中断处理函数；其他的地方完全可以按正常方式处理。更进一步，他补充说：“最好的解决办法就是完全摆脱软中断。” 长期以来，移除软中断机制这个想法一直被列在 “待办事项” 清单上，但是还没有人有足够的动力去完成实际的修改工作。

The nature of the realtime patch set has often been that its users feel the pain of mainline kernel shortcomings before the rest of us do. That has caused a great many mainline fixes and improvements to come from the realtime community. Perhaps that will eventually happen again for softirqs. For the time being, though, realtime users have an improved softirq mechanism that should give the desired results without the need for difficult low-level tuning. Naturally, Thomas is looking for people to test this change and report back on how well it works with their workloads.

使用实时补丁的那些用户总会先于其他人之前感受到主线内核中的某些缺陷所带来的痛苦。这也导致大量来自实时内核团队的问题修复被内核主线所接纳。也许这次同样的事情会再次发生在软中断上。更何况，当前实时补丁的用户们已经拥有了一个经过改进的软中断机制，不仅可以提供所需的效果，而且也无需对内核的底层机制进行复杂的配置。当然，Thomas 正在寻找志愿者来测试这一修改，并报告该补丁在实际运行环境中的使用效果。

了解更多有关 “LWN 中文翻译计划”，请点击这里