An Introduction to the Real-Time Analysis Tool rtla timerlat (Part 1): Cross-Compilation and Usage

Written by 王杰迅 on 2024/11/04

Author: 王杰迅 wangjiexun@foxmail.com; Date: 2024/09/03; Revisor: falcon falcon@tinylab.org; Project: RISC-V Linux 内核剖析; Sponsor: PLCT Lab, ISCAS

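Preface

rtla timerlat is a tool for measuring and analyzing Linux scheduling latency.

Traditional Linux scheduling latency test tools such as cyclictest have one limitation: they report only the overall latency and cannot break it down any further. A typical result looks like this: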

# cyclictest --mlockall --priority=97
policy: fifo: loadavg: 0.33 0.16 0.09 1/51 102
T: 0 (  102) P:97 I:1000 C:  10000 Min:    114 Act:  181 Avg:  185 Max:     417

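rtla timerlat, by contrast, splits the scheduling latency into several parts, which helps developers analyze it and find the cause of large latencies. A typical result looks like this: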

# rtla timerlat top -T 500 -s 500 -t -k -P f:95
  0 00:00:01   |          IRQ Timer Latency (us)        |         Thread Timer Latency (us)
CPU COUNT      |      cur       min       avg       max |      cur       min       avg       max
  0 #1015      |      236        37        42       236 |      594       213       227       594
---------------|----------------------------------------|---------------------------------------
ALL #1015   e0 |                 37        42       236 |                213       227       594
rtla timerlat hit stop tracing
## CPU 0 hit stop tracing, analyzing it ##
  IRQ handler delay:                                       157.68 us (26.52 %)
  IRQ latency:                                             236.24 us
  Timerlat IRQ duration:                                   242.08 us (40.72 %)
  Blocking thread:                                         101.84 us (17.13 %)
                            rtla:145                       101.84 us
    Blocking thread stack trace
                -> stack_trace_save
                -> timerlat_save_stack.constprop.0
                -> timerlat_irq
                -> trace_event_buffer_commit
                -> __hrtimer_run_queues.constprop.0
                -> hrtimer_interrupt
                -> riscv_timer_interrupt
                -> handle_percpu_devid_irq
                -> ring_buffer_map
                -> handle_irq_desc
                -> riscv_intc_irq
                -> handle_riscv_irq
------------------------------------------------------------------------
  Thread latency:                                          594.48 us (100%)

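As shown, besides the overall latency (Thread latency), rtla timerlat also reports the latency contributed by each stage. Later articles will explain and analyze what each of these latencies means.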
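As a quick sanity check on the sample output, the percentages in the analysis are consistent with each stage's latency divided by the total Thread latency. The sketch below recomputes them with awk; the helper name latency_share is made up for this illustration, and the numbers are copied from the output above.

```shell
#!/bin/sh
# Recompute a stage's share of the total thread latency, in percent.
# latency_share is a hypothetical helper, not part of rtla.
latency_share() {
    # $1: stage latency in us, $2: total thread latency in us
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", part * 100 / total }'
}

# Values copied from the auto-analysis output above.
latency_share 157.68 594.48   # IRQ handler delay
latency_share 242.08 594.48   # Timerlat IRQ duration
latency_share 101.84 594.48   # Blocking thread
```

The three listed stages add up to 501.60 us, so part of the 594.48 us total is not attributed to any of the stages printed here.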

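Cross-Compilation

This article focuses on cross-compiling rtla. Building and running it natively follows essentially the same steps; simply omit the cross-compiler settings when building.

The rtla source code lives in the tools/tracing/rtla directory of the Linux kernel source tree. It depends on libtracefs and libtraceevent, so these two libraries must be cross-compiled first: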

$ git clone git://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git
$ cd libtraceevent/
$ make CROSS_COMPILE=/path/to/cross/compile/riscv64-xx- CC=/path/to/cross/compile/riscv64-xx-gcc LD=/path/to/cross/compile/riscv64-xx-ld
$ sudo make install
$ cd ..
$ git clone git://git.kernel.org/pub/scm/libs/libtrace/libtracefs.git
$ cd libtracefs/
$ make CROSS_COMPILE=/path/to/cross/compile/riscv64-xx- CC=/path/to/cross/compile/riscv64-xx-gcc LD=/path/to/cross/compile/riscv64-xx-ld
$ sudo make install

Then enter the rtla source directory and build it:

$ make CROSS_COMPILE=/path/to/cross/compile/riscv64-xx- CC=/path/to/cross/compile/riscv64-xx-gcc

Once the build finishes, copy the resulting binary to the root filesystem of the development board.
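Because the same toolchain prefix is repeated in every make invocation, it can help to derive the arguments from a single variable. The following is only a sketch: TOOLCHAIN_PREFIX and cross_make_args are hypothetical names, the default path is the placeholder used in this article, and the script prints the commands for review rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: derive the make arguments used above from one prefix.
# TOOLCHAIN_PREFIX is an assumed variable name, not something rtla defines.
TOOLCHAIN_PREFIX="${TOOLCHAIN_PREFIX:-/path/to/cross/compile/riscv64-xx-}"

cross_make_args() {
    # $1: toolchain prefix, ending just before "gcc" / "ld"
    printf 'CROSS_COMPILE=%s CC=%sgcc LD=%sld\n' "$1" "$1" "$1"
}

# Print the library build commands instead of executing them.
for dir in libtraceevent libtracefs; do
    echo "make -C $dir $(cross_make_args "$TOOLCHAIN_PREFIX")"
done
```

Setting TOOLCHAIN_PREFIX in the environment before running the script replaces the placeholder path in both printed commands.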

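Build Issues

Before building rtla, the Linux build system first checks whether its dependencies libtracefs and libtraceevent are installed. In source trees older than 6.11, the detection code has a minor bug that makes the check fail even when the libraries are installed. 6.11-rc1 contains a partial fix; see commit1 and commit2.

In addition, in tools/build/features/test-libtraceevent.c, change #include <traceevent/trace-seq.h> to #include <trace-seq.h> (similar to commit1).

If the build fails with an unknown-type error such as: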

tracefs.h:55:7: error: unknown type name 'cpu_set_t'

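the type is probably defined in the cross toolchain's C library, which typically exposes cpu_set_t only when _GNU_SOURCE is defined. In that case, add the following to the affected rtla source file, before its first #include: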

#define _GNU_SOURCE

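Usage

Kernel Configuration

rtla timerlat can collect more detailed latency data because it is not just a user-space application: it also relies on the tracer facility provided by the kernel. The kernel's timerlat tracer therefore has to be enabled: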

Kernel hacking  --->
  Tracers(FTRACE [=y])  --->
    Timerlat tracer(TIMERLAT_TRACER [=y])
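To confirm on the board that the running kernel was actually built with this option, one approach is to look for timerlat in the list of available tracers. The sketch below assumes tracefs is mounted at /sys/kernel/tracing; has_tracer is a hypothetical helper name.

```shell
#!/bin/sh
# Check whether a tracer name appears in an available_tracers file.
# has_tracer is a hypothetical helper written for this illustration.
has_tracer() {
    # $1: tracer name, $2: path to the available_tracers file
    [ -r "$2" ] && tr ' ' '\n' < "$2" | grep -qx "$1"
}

# Assumption: tracefs is mounted at /sys/kernel/tracing on the board.
if has_tracer timerlat /sys/kernel/tracing/available_tracers; then
    echo "timerlat tracer available"
else
    echo "timerlat tracer missing: check CONFIG_TIMERLAT_TRACER"
fi
```

Reading that file usually requires root, so an unreadable file is reported the same way as a missing tracer.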

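Common Options

rtla timerlat has two modes, top and hist. The top mode records the maximum latency seen during the test together with the function call stack, while the hist mode records a histogram of all latencies seen during the test. The commonly used options and their meanings are:

-p, --period us: set the measurement period, in microseconds.

-i, --irq us: stop the test when the IRQ latency exceeds this value.

-T, --thread us: stop the test when the Thread latency exceeds this value.

-s, --stack us: save the function call stack when the Thread latency exceeds this value.

-t, --trace [file]: save the test result to the given file (default: timerlat_trace.txt in the current directory).

-q, --quiet: print only the summary once the test ends.

-d, --duration time[s|m|h|d]: set how long the test runs.

-P, --priority o:prio|r:prio|f:prio|d:runtime:period: set the scheduling parameters of the test threads. For example, -P f:95 selects the SCHED_FIFO policy with priority 95. o stands for SCHED_OTHER, r for SCHED_RR, f for SCHED_FIFO, and d for SCHED_DEADLINE.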

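See the rtla timerlat documentation for a more complete description.

Usage Example

A typical command is rtla timerlat top -T 500 -s 500 -t -k -P f:95. It runs the test threads under the SCHED_FIFO policy with priority 95, stops the test once the Thread latency exceeds 500 us, and prints the function call stack.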

  0 00:00:01   |          IRQ Timer Latency (us)        |         Thread Timer Latency (us)
CPU COUNT      |      cur       min       avg       max |      cur       min       avg       max
  0 #1015      |      236        37        42       236 |      594       213       227       594
---------------|----------------------------------------|---------------------------------------
ALL #1015   e0 |                 37        42       236 |                213       227       594
rtla timerlat hit stop tracing
## CPU 0 hit stop tracing, analyzing it ##
  IRQ handler delay:                                       157.68 us (26.52 %)
  IRQ latency:                                             236.24 us
  Timerlat IRQ duration:                                   242.08 us (40.72 %)
  Blocking thread:                                         101.84 us (17.13 %)
                            rtla:145                       101.84 us
    Blocking thread stack trace
                -> stack_trace_save
                -> timerlat_save_stack.constprop.0
                -> timerlat_irq
                -> trace_event_buffer_commit
                -> __hrtimer_run_queues.constprop.0
                -> hrtimer_interrupt
                -> riscv_timer_interrupt
                -> handle_percpu_devid_irq
                -> ring_buffer_map
                -> handle_irq_desc
                -> riscv_intc_irq
                -> handle_riscv_irq
------------------------------------------------------------------------
  Thread latency:                                          594.48 us (100%)

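The output above was produced on a single-core RISC-V development board. It consists of two parts: the test data and the latency analysis.

The test data is shown as a table. Because this board has only one core, there is a single row of data; on a multi-core board, each CPU gets its own row. The latency figures fall into two groups, IRQ Timer Latency and Thread Timer Latency. IRQ Timer Latency is the delay before the timer interrupt handler starts running, and Thread Timer Latency is the delay before the test thread starts running, so the latter numerically includes the former. cur, min, avg, and max are the current, minimum, average, and maximum values of the measurements.

The latency analysis breaks the latency down into finer-grained stages and prints the name of the thread that may have caused the blocking along with the function call stack, which makes latency analysis easier for developers.

Summary

This article introduced the basic features of rtla timerlat, how to build it, and how to use it. Follow-up articles will cover how it works and what each latency means.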
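When collecting results from several runs, the per-CPU rows of this table can be reduced to the two maximum values with a small awk filter. This is a sketch that assumes the column layout shown in the sample output; timerlat_max is a hypothetical helper name.

```shell
#!/bin/sh
# Print "cpu irq_max thread_max" for every per-CPU row of a timerlat top table.
# timerlat_max is a hypothetical helper; it assumes the layout shown above.
timerlat_max() {
    awk -F'|' '
        # Per-CPU rows start with a CPU number followed by "#<count>".
        $1 ~ /^ *[0-9]+ +#[0-9]+ *$/ {
            split($1, id, " ")    # id[1]  = CPU number
            split($2, irq, " ")   # irq[4] = max IRQ timer latency
            split($3, thr, " ")   # thr[4] = max thread timer latency
            print id[1], irq[4], thr[4]
        }'
}

# Row copied from the sample output above.
printf '%s\n' \
    '  0 #1015      |      236        37        42       236 |      594       213       227       594' \
    | timerlat_max
```

The header line and the ALL summary row do not match the pattern, so only per-CPU rows are printed.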
