泰晓科技 -- 聚焦 Linux - 追本溯源,见微知著!


LWN 167897: 高精度定时器编程接口

Wang Chen 创作于 2018/06/05

原文:The high-resolution timer API 原创:By corbet @ Jan 16, 2006 翻译:By unicornx of TinyLab.org 校对:By guojian-at-wowo

Last September, this page featured an article on the ktimers patch by Thomas Gleixner. The new timer abstraction was designed to enable the provision of high-resolution timers in the kernel and to address some of the inefficiencies encountered when the current timer code is used in this mode. Since then, there has been a large amount of discussion, and the code has seen significant work. The end product of that work, now called “hrtimers,” was merged for the 2.6.16 release.

去年 9 月,LWN 刊登了一篇文章,介绍了由 Thomas Gleixner 开发的 ktimers 补丁。该补丁在内核中实现了高分辨率定时器功能,解决了当前定时器代码对高精度模式支持的不足。从那以后,围绕该补丁展开了大量的讨论,代码也被重新改写了很多。最终的代码被称为 “hrtimers”(译者注:即 high-resolution timers 的缩写),并合入了内核的 2.6.16 版本。

At its core, the hrtimer mechanism remains the same. Rather than using the “timer wheel” data structure, hrtimers live on a time-sorted linked list, with the next timer to expire being at the head of the list. A separate red/black tree is also used to enable the insertion and removal of timer events without scanning through the list. But while the core remains the same, just about everything else has changed, at least superficially.

补丁的核心部分,即实现高精度定时器的机制仍保持不变。对定时器对象的组织并没有使用当前的 “timer wheel” 数据结构,而是以按时间排序的方式保存在一个链表中,下一个即将到期的定时器位于表的头部。为了在插入和删除定时器对象时避免扫描整个链表,算法上辅助利用了一个单独的红黑树结构对定时器进行管理。尽管算法的核心部分保持不变,但在代码的其他方面都有修改,至少在表面上看起来是这样。

There is a new type, ktime_t, which is used to store a time value in nanoseconds. This type, found in <linux/ktime.h>, is meant to be used as an opaque structure. And, interestingly, its definition changes depending on the underlying architecture. On 64-bit systems, a ktime_t is really just a 64-bit integer value in nanoseconds. On 32-bit machines, however, it is a two-field structure: one 32-bit value holds the number of seconds, and the other holds nanoseconds. The order of the two fields depends on whether the host architecture is big-endian or not; they are always arranged so that the two values can, when needed, be treated as a single, 64-bit value. Doing things this way complicates the header files, but it provides for efficient time value manipulation on all architectures.

新代码中提供了一个新的结构体类型,ktime_t,用于保存以纳秒为单位的时间值。该类型定义在头文件 <linux / ktime.h> 中。定义该结构体类型的目的完全是为了封装,使时间值的表达和体系架构无关,其实际定义会根据底层架构的不同而变化。在 64 位系统上,该 ktime_t 是一个以纳秒为单位的 64 位整数值。然而,在 32 位机器上,该结构体包含两个字段:一个 32 位的值保存秒数,另一个保存纳秒数。这两个字段的定义顺序取决于主机的字节序类型;从而确保这两个值在需要时可以被作为一个 64 位值整体进行读取。这样做使得头文件的实现变得复杂,但优化了针对不同体系结构下时间值的访问效率。(译者注:从 3.17 版本开始, ktime_t 已经最终被统一为一个单纯的 64 位的数了。)

A whole set of functions and macros has been provided for working with ktime_t values, starting with the traditional two ways to declare and initialize them:

为了方便处理 ktime_t 类型的值,补丁提供了一整套相应的函数和宏,先看一下内核编码中常用的对象定义和初始化的方法:

DEFINE_KTIME(name);   /* Initialize to zero */

ktime_t kt;
kt = ktime_set(long secs, long nanosecs);

Various other functions exist for changing ktime_t values; all of these treat their arguments as read-only and return a ktime_t value as their result:

补丁还提供了一些其他的函数用于对 ktime_t 类型的变量进行操作; 所有这些函数的参数都是只读的并通过返回类型为 ktime_t 的值作为结果:

ktime_t ktime_add(ktime_t kt1, ktime_t kt2);
ktime_t ktime_sub(ktime_t kt1, ktime_t kt2);  /* kt1 - kt2 */
ktime_t ktime_add_ns(ktime_t kt, u64 nanoseconds);

Finally, there are some type conversion functions:


ktime_t timespec_to_ktime(struct timespec tspec);
ktime_t timeval_to_ktime(struct timeval tval);
struct timespec ktime_to_timespec(ktime_t kt);
struct timeval ktime_to_timeval(ktime_t kt);
clock_t ktime_to_clock_t(ktime_t kt);
u64 ktime_to_ns(ktime_t kt);

The interface for hrtimers can be found in <linux/hrtimer.h>. A timer is represented by struct hrtimer, which must be initialized with:

高精度定时器的操作接口定义在头文件 <linux / hrtimer.h> 中。定时器对象由 struct hrtimer 表示,必须使用以下函数进行初始化:

void hrtimer_init(struct hrtimer *timer, clockid_t which_clock);

Every hrtimer is bound to a specific clock. The system currently supports two clocks, being:

  • CLOCK_MONOTONIC: a clock which is guaranteed always to move forward in time, but which does not reflect “wall clock time” in any specific way. In the current implementation, CLOCK_MONOTONIC resembles the jiffies tick count in that it starts at zero when the system boots and increases monotonically from there.
  • CLOCK_REALTIME which matches the current real-world time.

每个高精度定时器对象都绑定到一个特定的时钟。补丁目前支持两种时钟(译者注,相关概念在LWN 152436: 一种实现内核定时器的新方法有介绍,可以参考),即:

  • CLOCK_MONOTONIC:保证时间值总是单调递增,和 “墙上时间”(”wall clock time”)无关。在当前的实现中,CLOCK_MONOTONIC 的实现类似于系统维护的 jiffies 计数,系统启动时从零开始一直单调增加。
  • CLOCK_REALTIME 会与真实世界的实际时间保持一致。

The difference between the two clocks can be seen when the system time is adjusted, perhaps as a result of administrator action, tweaking by the network time protocol code, or suspending and resuming the system. In any of these situations, CLOCK_MONOTONIC will tick forward as if nothing had happened, while CLOCK_REALTIME may see discontinuous changes. Which clock should be used will depend mainly on whether the timer needs to be tied to time as the rest of the world sees it or not. The call to hrtimer_init() will tie an hrtimer to a specific clock, but that clock can be changed with:

这两种时钟之间的区别在系统时间发生调整时会显现出来,系统时间的变化发生在如下场景,譬如系统管理员的人为调整,或者是系统基于网络时间协议栈(network time protocol)的运行对系统时间进行同步,或者由于系统的挂起和恢复导致时间发生改变。在以上例子中的任何一种情况下,基于 CLOCK_MONOTONIC 模式的时钟都会严格地单调递增,而 CLOCK_REALTIME 模式的时钟则可能会发生不连续的更改。应该使用哪种时钟主要取决于定时器的使用场景。可以通过调用 hrtimer_init() 将高精度定时器绑定到指定的时钟,但也可以通过以下方式对绑定进行更改:

void hrtimer_rebase(struct hrtimer *timer, clockid_t new_clock);

Most of the hrtimer fields should not be touched. Two of them, however, must be set by the user:

struct hrtimer 的大部分字段我们不用关心。但是,其中有两个必须由用户进行设置:

int  (*function)(void *);
void *data;

As one might expect, function() will be called when the timer expires, with data as its parameter.

显而易见,当定时器到期时,function() 将被内核回调,data 是其参数。

Actually setting a timer is accomplished with:


int hrtimer_start(struct hrtimer *timer, ktime_t time,
                  enum hrtimer_mode mode);

The mode parameter describes how the time parameter should be interpreted. A mode of HRTIMER_ABS indicates that time is an absolute value, while HRTIMER_REL indicates that time should be interpreted relative to the current time.

mode 参数描述了如何解释 time 参数。其取值为 HRTIMER_ABS 时表示 time 是一个绝对值,而取 HRTIMER_REL 则表示 time 应被解释为一个相对于当前时间的相对值。

Under normal operation, function() will be called after (at least) the requested expiration time. The hrtimer code implements a shortcut for situations where the sole purpose of a timer is to wake up a process on expiration: if function() is NULL, the process whose task structure is pointed to by data will be awakened. In most cases, however, code which uses hrtimers will provide a callback function(). That function has an integer return value, which should be either HRTIMER_NORESTART (for a one-shot timer which should not be started again) or HRTIMER_RESTART for a recurring timer.

正常情况下,function() (最快)将在所请求的时间到期之后被调用。hrtimer 补丁针对超时情况下唤醒某个进程的情况实现了一个特殊处理:如果 function() 为 NULL,则会根据 data 所指向的 task 结构唤醒所对应的进程。当然,在大多数情况下,回调函数 function() 都会被提供。该函数具有一个整数类型的返回值,该值可以是 HRTIMER_NORESTART(表明该定时器是只执行一次的单次定时器)或 HRTIMER_RESTART(表明该定时器会重复启动)。

In the restart case, the callback must set a new expiration time before returning. Usually, restarting timers are used by kernel subsystems which need a callback at a regular interval. The hrtimer code provides a function for advancing the expiration time to the next such interval:

对于重复启动的定时器,实现回调函数时必须在返回之前设置新的到期时间。通常情况下,该类型的定时器用于内核子系统中需要以固定的间隔回调函数的场景。hrtimer 补丁提供了一个函数可以对指定的定时器在已经到期的时间值上增加一个指定的间隔(interval):

unsigned long hrtimer_forward(struct hrtimer *timer, ktime_t interval);

This function will advance the timer’s expiration time by the given interval. If necessary, the interval will be added more than once to yield an expiration time in the future. Generally, the need to add the interval more than once means that the system has overrun its timer period, perhaps as a result of high system load. The return value from hrtimer_forward() is the number of missed intervals, allowing code which cares to detect and respond to the situation.

该函数将根据指定的 interval 参数计算定时器新的超时时间。如有必要,该 interval 值将被累加多次以确保到期时间一定发生在将来的某个时刻。通常,之所以要这么做的原因是由于系统的高负载造成定时器没有被及时触发(译者注:该现象在内核术语上称之为 overrun,譬如定时器的回调函数原本设定在系统时刻 7 被触发,但实际在系统时刻 12 才被触发。之所以会发生这种情况是由于高精度定时器的回调函数是在软中断中被执行的(本文背景为 v2.6.16),系统在高负载情况下,软中断的执行的不确定性可能导致定时器的回调函数无法被及时触发。在新的内核中(v4.2以后),高精度定时器的回调函数改为在硬中断上下文中执行,但也还是会存在没有及时触发(overrun)的情况,这往往是和关中断时间太长有关)。hrtimer_forward() 的返回值是内核实际累加的 interval 的次数,调用者可以通过它来检查是否发生了 overrun 并根据自己的需要做出相应处理。

Outstanding timers can be canceled with either of:


int hrtimer_cancel(struct hrtimer *timer);
int hrtimer_try_to_cancel(struct hrtimer *timer);

When hrtimer_cancel() returns, the caller can be sure that the timer is no longer active, and that its expiration function is not running anywhere in the system. The return value will be zero if the timer was not active (meaning it had already expired, normally), or one if the timer was successfully canceled. hrtimer_try_to_cancel() does the same, but will not wait if the timer function is running; it will, instead, return -1 in that situation.

hrtimer_cancel() 返回时,内核保证该定时器已经终止,而且其回调函数也不再运行。如果取消时该定时器已经过期,则返回值为 0;如果定时器取消成功,则返回 1。 hrtimer_try_to_cancel() 的功能相同,但如果调用该函数时定时器的回调函数正在运行,则该函数不会等待并立即返回,且返回值为 -1。

A canceled timer can be restarted by passing it to hrtimer_restart().

已经取消的定时器可以通过 hrtimer_restart() 重新启动。

Finally, there is a small set of query functions. hrtimer_get_remaining() returns the amount of time left before a timer expires. A call to hrtimer_active() returns nonzero if the timer is currently on the queue. And a call to:

最后,还有几个用于查询的函数。 hrtimer_get_remaining() 返回定时器到期之前剩余的时间。如果定时器还未到期,则 hrtimer_active() 返回非零值。调用如下函数:

int hrtimer_get_res(clockid_t which_clock, struct timespec *tp);

will return the true resolution of the given clock, in nanoseconds.


Read Album:

Read Related:

Read Latest: