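RISC-V Linux Kernel and Related Technology Updates, Issue 84
Date: 20240324
Editor: 晓怡
Repository: RISC-V Linux 内核技术调研活动 (RISC-V Linux Kernel Technology Research Activity)
Sponsor: PLCT Lab, ISCAS
Kernel Updates
RISC-V Architecture Support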
v2: arch/riscv: Enable kprobes when CONFIG_MODULES=n
Tracing with kprobes while running a monolithic kernel is currently impossible due to the kernel module allocator dependency.
v6: riscv: add initial support for Canaan Kendryte K230
K230 is an ideal chip for RISC-V Vector 1.0 evaluation now. Add initial support for it to allow more people to participate in building drivers to mainline for it.
v1: riscv: merge two if-blocks for KBUILD_IMAGE
In arch/riscv/Makefile, KBUILD_IMAGE is assigned in two separate if-blocks.
v2: bpf: verifier: reject addr_space_cast insn without arena
The verifier allows using the addr_space_cast instruction in a program that doesn’t have an associated arena. This was caught in the form of an invalid memory access in do_misc_fixups(): while converting addr_space_cast to a normal 32-bit mov, env->prog->aux->arena was dereferenced to check for the BPF_F_NO_USER_CONV flag.
GIT PULL: RISC-V Patches for the 6.9 Merge Window
The following changes since commit e0fe5ab4192c171c111976dbe90bbd37d3976be0:
riscv: Fix pte_leaf_size() for NAPOT (2024-02-29 10:21:23 -0800)
v1: RISC-V: selftests: cbo: Ensure asm operands match constraints, take 2
Commit 0de65288d75f (“RISC-V: selftests: cbo: Ensure asm operands match constraints”) attempted to ensure MK_CBO() would always provide a compile-time constant when given a constant, but cpu_to_le32() isn’t necessarily going to do that. Switch to manually shifting the bytes, when needed, to finally get this right.
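As a rough illustration of what “manually shifting the bytes” can look like, the sketch below models a 32-bit byte swap in Python and checks it against the standard `struct` module. The helper names `swab32` and `to_le32` and the sample value are assumptions made for this example; they are not taken from the patch.

```python
import struct

def swab32(value):
    """Reverse the byte order of a 32-bit value using only shifts and masks."""
    value &= 0xFFFFFFFF
    return (
        ((value & 0x000000FF) << 24)
        | ((value & 0x0000FF00) << 8)
        | ((value & 0x00FF0000) >> 8)
        | ((value & 0xFF000000) >> 24)
    )

def to_le32(value, host_is_big_endian):
    """Return the integer that, stored natively on the host, has little-endian byte order."""
    return swab32(value) if host_is_big_endian else value & 0xFFFFFFFF

sample = 0x12345678  # arbitrary example value
# A big-endian host storing the swapped value produces the little-endian byte sequence.
assert struct.pack(">I", to_le32(sample, True)) == struct.pack("<I", sample)
# A little-endian host needs no swap at all.
assert struct.pack("<I", to_le32(sample, False)) == struct.pack("<I", sample)
```

Because the swap is built from shifts and masks on a constant, the result stays a constant expression, which is the property the asm operand constraint needs.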
v1: i2c: reword i2c_algorithm according to newest specification
Start changing the wording of the I2C main header wrt. the newest I2C v7, SMBus 3.2, I3C specifications and replace “master/slave” with more appropriate terms. This first step renames the members of struct i2c_algorithm. Once all in-tree users are converted, the anonymous union will go away again. All this work will also pave the way for finally separating the monolithic header into more fine-grained headers like “i2c/clients.h” etc.
v1: riscv: Improve sbi_ecall() code generation by reordering arguments
The sbi_ecall() function arguments are not in the same order as the ecall arguments, so we end up re-ordering the registers before the ecall which is useless and costly.
v2: riscv: Add tracepoints for SBI calls and returns
These are useful for measuring the latency of SBI calls. The SBI HSM extension is excluded because those functions are called from contexts such as cpuidle where instrumentation is not allowed.
v1: scripts/package: buildtar: Output as vmlinuz for riscv
This matches the behavior for arm64 [1] and prevents clobbering of vmlinux-${KERNELRELEASE}.
v1: RISC-V: selftests: cbo: Use exported __cpu_to_le32() with uapi header
cpu_to_le32() is not defined in uapi headers, which can cause an “impossible constraint in ‘asm’” error during compilation. The underlying cause is the undefined reference to cpu_to_le32(); __cpu_to_le32(), defined in byteorder.h, should be used instead.
v2: clk: starfive: jh7100: Use clk_hw for external input clocks
The Starfive JH7100 clock driver does not use the DT “clocks” property to find the external main input clock, but instead relies on the name of the actual clock provider (“osc_sys”). This is fragile, and caused breakage when sanitizing clock node names in DTS.
v1: riscv: Userspace pointer masking and tagged address ABI
RISC-V defines three extensions for pointer masking[1]:
- Smmpm: configured in M-mode, affects M-mode
- Smnpm: configured in M-mode, affects the next lower mode (S or U-mode)
- Ssnpm: configured in S-mode, affects the next lower mode (U-mode)
v2: riscv: use KERN_INFO in do_trap
Print the instruction dump with info instead of emergency level. The unhandled signal message is only for informational purpose.
v3: Support Zve32[xf] and Zve64[xfd] Vector subextensions
The series consists of two parts. The first part provides a quick fix for the issue on a recent thread[1]. The issue happens when a platform has non-uniform vector register lengths across multiple cores. Specifically, patch 1 adds a comment at a callsite of riscv_setup_vsize to clarify how vlenb is observed by the system. Patch 2 fixes the issue by failing the boot process of a secondary core if vlenb mismatches.
v4: riscv: sophgo: add dmamux support for Sophgo CV1800/SG2000 SoCs
Add dma multiplexer support for the Sophgo CV1800/SG2000 SoCs.
This patch includes the following patch: http://lore.kernel.org/linux-riscv/PH7PR20MB4962F822A64CB127911978AABB4E2@PH7PR20MB4962.namprd20.prod.outlook.com/
v9: Add timer driver for StarFive JH7110 RISC-V SoC
This patch series adds a timer driver for the StarFive JH7110 RISC-V SoC. The first patch adds documentation describing the device tree bindings. The subsequent patch adds the timer driver with JH7110 SoC support. The last patch adds the timer device node to the JH7110 dts.
v2: riscv: dmi: Add SMBIOS/DMI support
Enable the dmi driver for riscv, which allows access to the SMBIOS info through userspace files (/sys/firmware/dmi/*).
Process Scheduling
v8: sched: Don’t trigger misfit if affinity is restricted
There was a discussion on handling a hotplug operation that removes a capacity level and leads to an unnecessary misfit lb being triggered again. I opted not to handle it now, but a working patch is available in [1]. I don’t feel strongly about it and would leave it up to the maintainers to decide which direction they prefer. Patch 4 will make sure that balance interval and nr_failed won’t grow unnecessarily due to unnecessary misfit lb. It will lead to some sub-optimality, but no incorrect behavior.
v1: sched: Improve the accuracy of sched_stat_wait statistics for rt and dl
Commit b9c88f752268 (“sched/fair: Improve the accuracy of sched_stat_wait statistics”) fixed a wrong scenario for cfs schedstat; as the title indicates, this patch targets the same statistics for rt and dl.
[RESEND]v2: sched: Add trace_sched_waking() tracepoint to sched_ttwu_pending()
Zimuzo reported seeing occasional cases in perfetto traces where tasks went from sleeping directly to trace_sched_wakeup() without always seeing a trace_sched_waking().
Memory Management
v1: mm: get_mm_counter() get the total memory usage of the process
Currently, the get_mm_counter() function returns only the value recorded in the percpu_counter->count field of the process memory counter, ignoring the memory usage counts maintained by each CPU in the percpu_counter->counters array. This leads to errors in the reported memory usage of a process, especially when there are many CPU cores.
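The gap between the shared count and the true total can be modeled with a toy per-CPU counter. This is a simplified Python sketch, not kernel code: the class name, the batch size of 32, and the CPU count are assumptions chosen only to make the error visible.

```python
class PercpuCounterModel:
    """Toy model of a per-CPU counter: a shared count plus per-CPU deltas."""

    def __init__(self, nr_cpus, batch):
        self.count = 0                  # shared value, cheap to read
        self.counters = [0] * nr_cpus   # per-CPU deltas not yet folded in
        self.batch = batch

    def add(self, cpu, amount):
        self.counters[cpu] += amount
        # Fold the local delta into the shared count once it reaches the batch size.
        if abs(self.counters[cpu]) >= self.batch:
            self.count += self.counters[cpu]
            self.counters[cpu] = 0

    def read(self):
        """Fast read: shared count only, may miss up to (batch - 1) per CPU."""
        return self.count

    def sum(self):
        """Precise read: shared count plus every per-CPU delta."""
        return self.count + sum(self.counters)


counter = PercpuCounterModel(nr_cpus=4, batch=32)
for cpu in range(4):
    counter.add(cpu, 31)   # each CPU stays just below the batch threshold

assert counter.read() == 0      # the fast read sees nothing yet
assert counter.sum() == 124     # the precise sum sees all 4 * 31 units
```

In this model the worst-case error of the fast read grows with the number of CPUs, which matches the observation that the problem is most visible on machines with many cores.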
v1: mm/filemap: set folio->mapping to NULL before xas_store()
Functions such as __filemap_get_folio() check the truncation of folios based on the mapping field. Therefore setting this field to NULL earlier prevents unnecessary operations on already removed folios.
v5: mm/migrate: split source folio if it is on deferred split list
If the source folio is on deferred split list, it is likely some subpages are not used. Split it before migration to avoid migrating unused subpages.
v1: mm: add folio in swapcache if swapin from zswap
There is a report of data corruption caused by double swapin, which is only possible in the skip swapcache path on SWP_SYNCHRONOUS_IO backends.
v1: exec: Don’t disable perf events for setuid root executables
Al Grant reported that the ‘perf record’ command terminates abnormally after setting the setuid bit for the executable. Reproducing this issue additionally requires that the binary file is owned by the root user but is run by a non-privileged user.
v1: A Summary of VMA scanning improvements explored
I am posting a summary of the NUMA balancing improvements tried out.
(The intention is an RFC, to revisit these in the future when someone sees potential benefits with PATCH1 and PATCH2.)
v1: selftests/mm: Parse VMA range in one go
Use sscanf() to directly parse the VMA range. No functional change is intended.
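For readers unfamiliar with the format being parsed, the following Python sketch extracts the start and end addresses from one /proc/<pid>/maps style line in a single step, analogous to a one-shot sscanf(). The function name and the sample line are illustrative assumptions, not taken from the selftest.

```python
import re

# Start and end addresses are hexadecimal, separated by '-' and followed by whitespace.
_VMA_RANGE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+)\s")

def parse_vma_range(line):
    """Return (start, end) as integers from a maps-style line, parsed in one go."""
    match = _VMA_RANGE.match(line)
    if match is None:
        raise ValueError(f"not a VMA line: {line!r}")
    return int(match.group(1), 16), int(match.group(2), 16)


# Hypothetical sample line in the /proc/<pid>/maps layout.
sample = "7f1c2a000000-7f1c2a021000 rw-p 00000000 00:00 0"
start, end = parse_vma_range(sample)
assert end - start == 0x21000
```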
v1: THP_SWAP support for ARM64 SoC with MTE
The patch has been extracted from the larger folios swap-in series [1], incorporating some new modifications.
v2: transfer page to folio in KSM
This is the first part of the page to folio conversion in KSM. Since only single pages can be stored in KSM, stable tree pages can safely be converted to folios.
v4: Improved Memory Tier Creation for CPUless NUMA Nodes
When a memory device, such as CXL1.1 type3 memory, is emulated as normal memory (E820_TYPE_RAM), the memory device is indistinguishable from normal DRAM in terms of memory tiering with the current implementation. The current memory tiering assigns all detected normal memory nodes to the same DRAM tier. This results in normal memory devices with
v2: binfmt: replace deprecated strncpy
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces.
v1: Various significant MM patches
These patches all interact in annoying ways which make it tricky to send them out in any way other than a big batch, even though there’s not really an overarching theme to connect them.
v1: selftests/mm: Confirm VA exhaustion without reliance on correctness of mmap()
Currently, VA exhaustion is checked by passing a hint to mmap() and expecting it to fail. This patch makes the test stricter by relying on successful write() calls from /proc/self/maps to a dump file, confirming that a free chunk is indeed not available.
v2: mm/slub: mark racy accesses on slab->slabs
The reads of slab->slabs are racy because it may be changed by put_cpu_partial concurrently. In slabs_cpu_partial_show() and show_slab_objects(), slab->slabs is only used for showing information.
v1: mm: migrate: support poison recover from migrate folio
Folio migration is widely used in the kernel (memory compaction, memory hotplug, soft offline page, NUMA balancing, memory demotion/promotion, etc.), but if a poisoned source folio is accessed during migration, the kernel will panic.
v2: mm/page-flags: make __PageMovable return bool
Make __PageMovable() return bool, like __folio_test_movable().
v1: Improve visibility of writeback
This series tries to improve visibility of writeback. Patch 1 makes /sys/kernel/debug/bdi/xxx/stats show writeback info of the whole bdi instead of only writeback info in the root cgroup. Patch 2 adds a new debug file /sys/kernel/debug/bdi/xxx/wb_stats to show per-wb writeback info. Patch 4 adds wb_monitor.
File Systems
v2: sysctl: move sysctl type to ctl_table_header
Preparation series to enable constification of struct ctl_table further down the line. No functional changes are intended.
Introduce the LANDLOCK_ACCESS_FS_IOCTL_DEV right, which restricts the use of ioctl(2) on block and character devices.
We attach this access right to opened file descriptors, as we already do for LANDLOCK_ACCESS_FS_TRUNCATE.
v1: hfsplus: refactor copy_name to not use strncpy
strncpy() is deprecated with NUL-terminated destination strings [1].
The copy_name() method does a lot of manual buffer manipulation to eventually arrive with its desired string. If we don’t know the namespace this attr has or belongs to we want to prepend “osx.” to our final string. Following this, we’re copying xattr_name and doing a bizarre manual NUL-byte assignment with a memset where n=1.
v2: fs: aio: more folio conversion
Convert to use folio throughout aio.
v2: RFC: eventpoll: try to reuse eppoll_entry allocations
Instead of unconditionally allocating and deallocating pwq objects, try to reuse them by storing the entry in the eventpoll struct at deallocation request, and consuming that entry at allocation request. This way every EPOLL_CTL_ADD operation immediately following an EPOLL_CTL_DEL operation effectively cancels out its pwq allocation with the preceding deallocation.
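The reuse idea can be pictured as a one-entry cache sitting in front of the allocator. The Python sketch below is a simplified model under stated assumptions: the class name `PwqCacheModel` and its counters are hypothetical, and the real series may differ in how and where the entry is stored.

```python
class PwqCacheModel:
    """One-slot cache: a freed entry is parked and handed back on the next allocation."""

    def __init__(self):
        self.slot = None        # at most one parked entry
        self.real_allocs = 0    # allocations that had to hit the allocator
        self.real_frees = 0     # frees that had to hit the allocator

    def alloc(self):
        if self.slot is not None:
            entry, self.slot = self.slot, None   # consume the parked entry
            return entry
        self.real_allocs += 1
        return object()                          # stand-in for a fresh allocation

    def free(self, entry):
        if self.slot is None:
            self.slot = entry                    # park instead of freeing
        else:
            self.real_frees += 1                 # slot already occupied, really free


cache = PwqCacheModel()
entry = cache.alloc()              # first EPOLL_CTL_ADD: one real allocation
for _ in range(1000):              # DEL immediately followed by ADD, repeated
    cache.free(entry)
    entry = cache.alloc()

assert cache.real_allocs == 1      # every later ADD reused the parked entry
assert cache.real_frees == 0
```

A single slot is enough for the DEL-then-ADD pattern described above; workloads that delete many entries before adding again would still fall through to the allocator.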
v1: fuse: require FUSE drivers to opt-in for local file leases
Traditionally, we’ve allowed people to set leases on FUSE inodes. Some FUSE drivers are effectively local filesystems and should be fine with kernel-internal lease support. But others are backed by a network server that may have multiple clients, or may be backed by something non-file like entirely.
v1: Further reduce overhead of fsnotify permission hooks
The main motivation for this work was to avoid the overhead that was reported by kernel test robot on the patch that adds the upcoming per-content event hooks (i.e. FS_PRE_ACCESS/FS_PRE_MODIFY).
Networking Devices
v5: net-next: net/smc: SMC intra-OS shortcut with loopback-ism
This patch set is the second part of the new version of [1] (for the first part, see [2]); the changes in this version are listed at the end.
v1: dns_resolver: correct sysfs path name in dns resolver documentation
Fix an incorrect sysfs path in dns resolver documentation
v1: bpf-next: BPF: support mark in bpf_fib_lookup
This patch series adds policy routing support in bpf_fib_lookup. This is a useful functionality which was missing for a long time, as without it some networking setups can’t be implemented in BPF. One example can be found here [1].
v1: net: tcp: properly terminate timers for kernel sockets
We had various syzbot reports about tcp timers firing after the corresponding netns has been dismantled.
v1: ipv6: fib: hide unused ‘pn’ variable
When CONFIG_IPV6_SUBTREES is disabled, the only user is hidden, causing a ‘make W=1’ warning:
net/ipv6/ip6_fib.c: In function ‘fib6_add’: net/ipv6/ip6_fib.c:1388:32: error: variable ‘pn’ set but not used [-Werror=unused-but-set-variable]
v2: iproute2-next: bridge: vlan: add compressvlans manpage
I followed Nikolay and Jiri’s comment and updated the patch to v2. Please check it.
Based on recent discussions on LKML, provide preliminary bits of tpm_tis_core dependent drivers. Includes only bare essentials but can be extended later on case by case. This way some people may even want to read it later on.
v1: net: bpf: Don’t redirect too small packets
Some drivers’ ndo_start_xmit() implementations expect a minimal size, as shown by various syzbot reports [1].
v1: net: dpll: indent DPLL option type by a tab
Indent config option type by a tab. It helps Kconfig parsers to read file without error.
v2: r8169: skip DASH fw status checks when DASH is disabled
On devices that support DASH, the current code in the “rtl_loop_wait” function raises false alarms when DASH is disabled. This occurs because the function attempts to wait for the DASH firmware to be ready, even though it’s not relevant in this case.
v1: net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips
PCI11x1x Rev B0 devices might drop packets when receiving back to back frames at 2.5G link speed. Change the B0 Rev device’s Receive filtering Engine FIFO threshold parameter from its hardware default of 4 to 3 dwords to prevent the problem. Rev C0 and later hardware already defaults to 3 dwords.
v1: net-next: devlink: use kvzalloc() to allocate devlink instance resources
During live migration of a virtual machine, the SR-IOV VFs need to be re-registered. This may fail when the memory is badly fragmented.
v1: net: gve: Add counter adminq_get_ptype_map_cnt to stats report
This counter counts the number of times get_ptype_map is executed on the admin queue, and was previously missing from the stats report.
GIT PULL: Networking for v6.9-rc1
I’d like to highlight Florian W stepping down as a netfilter maintainer due to constant stream of bug reports. Not sure what we can do but IIUC this is not the first such case.
v1: virtio_net: Do not send RSS key if it is not supported
There is a bug when setting the RSS options in virtio_net that can break the whole machine, getting the kernel into an infinite loop.
v1: net/netlink: how to deal with the problem of exceeding the maximum reach of nlattr’s nla_len
RTM_GETLINK for more than about 220 VFs truncates IFLA_VFINFO_LIST because the maximum reach of nlattr’s nla_len is exceeded. As a result, the value of nla_len overflows in nla_nest_end(). According to [1], changing the type of nla_len is not possible, so how can this overflow be dealt with? Should nla_len be clamped to the maximum value when it overflows, or are there better ways?
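The overflow follows from nla_len being a 16-bit field. The arithmetic can be sketched in Python as below; the per-VF payload size of 296 bytes is a made-up figure chosen so that the limit falls near the reported 220 VFs, and the function name is hypothetical.

```python
NLA_HDRLEN = 4      # struct nlattr: 16-bit nla_len plus 16-bit nla_type
U16_MAX = 0xFFFF    # largest value a 16-bit nla_len can hold

def nested_len(nr_vfs, per_vf_bytes):
    """Return (true_length, length_as_stored) for a nest holding nr_vfs entries."""
    true_len = NLA_HDRLEN + nr_vfs * per_vf_bytes
    return true_len, true_len & U16_MAX   # storing into a u16 keeps only the low 16 bits


PER_VF_BYTES = 296  # hypothetical size of one VF's nested attributes

true_len, stored = nested_len(220, PER_VF_BYTES)
assert true_len == stored == 65124        # still fits in 16 bits

true_len, stored = nested_len(222, PER_VF_BYTES)
assert true_len == 65716                  # exceeds U16_MAX
assert stored == 180                      # wrapped value a receiver would see
```

A receiver that trusts the wrapped length would stop parsing after a few bytes, which is consistent with the list appearing truncated.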
v2: bpf-next: Selftests/xsk: Test with maximum and minimum HW ring size configurations
Please find enclosed a patch set that introduces enhancements and new test cases to the selftests/xsk framework. These test the robustness and reliability of AF_XDP across both minimal and maximal ring size configurations.
v1: net: stmmac: Do not enable/disable runtime PM for PCI devices
The common function stmmac_dvr_probe is called for both PCI and non-PCI devices. For PCI devices, pm_runtime_enable/disable are called by the framework and should not be called by the driver.
v4: bpf: verifier: prevent userspace memory access
With BPF_PROBE_MEM, BPF allows de-referencing an untrusted pointer. To thwart invalid memory accesses, the JITs add an exception table entry for all such accesses. But in case the src_reg + offset overflows and turns into a userspace address, the BPF program might read that memory if the user has mapped it.
v1: net: devlink: use kvzalloc() to allocate devlink instance resources
During live migration of a virtual machine, the SR-IOV VFs need to be re-registered. This may fail when the memory is badly fragmented.
v2: flow_dissector: prevent NULL pointer dereference in __skb_flow_dissect
skb is an optional parameter, so it may be NULL. Add a check before the dereference in eth_hdr.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
v3: net: s390/qeth: handle deferred cc1
The IO subsystem expects a driver to retry a ccw_device_start, when the subsequent interrupt response block (irb) contains a deferred condition code 1.
v2: net: mark racy access on sk->sk_rcvbuf
sk->sk_rcvbuf in __sock_queue_rcv_skb() and __sk_receive_skb() can be changed by other threads. Mark this as benign using READ_ONCE().
v6: ice: Add get/set hw address for VFs using devlink commands
Changing the MAC address of the VFs is currently unsupported via devlink. Add the function handlers to set and get the HW address for the VFs.
v3: resend: net/ipv4: add tracepoint for icmp_send
Introduce a tracepoint for icmp_send, which can help users conveniently get more detailed information when abnormal icmp events happen.
v1: net-next: net: Rename mono_delivery_time to tstamp_type for scalibilty
mono_delivery_time was added to check if skb->tstamp has delivery time in mono clock base (i.e. EDT) otherwise skb->tstamp has timestamp in ingress and delivery_time at egress.
v1: net: mlxbf_gige: stop PHY during open() error paths
The mlxbf_gige_open() routine starts the PHY as part of normal initialization. The mlxbf_gige_open() routine must stop the PHY during its error paths.
v1: ipv6: delay procfs initialization after the ipv6 structs are ready
procfs files are created before the structures they reference are initialized. For example, if6_proc_init() creates procfs files that access structures initialized by addrconf_init().
v5: net-next: Support ICSSG-based Ethernet on AM65x SR1.0 devices
This series extends the current ICSSG-based Ethernet driver to support AM65x Silicon Revision 1.0 devices.
Notable differences between the Silicon Revisions are that there is no TX core in SR1.0 with this being handled by the firmware, requiring extra DMA channels to manage communication with the firmware (with the firmware being different as well) and in the packet classifier.
v2: net: ll_temac: platform_get_resource replaced by wrong function
Hope I am resubmitting this correctly, I’ve fixed the issues in the original submission.
v1: net: can: kvaser_pciefd: Add additional Xilinx interrupts
Since Xilinx-based adapters now support up to eight CAN channels, the TX interrupt mask array must have eight elements.
v1: net: pull-request: can 2024-03-20
Martin Jocić contributes a fix for the kvaser_pciefd driver, so that up to 8 channels on the Xilinx-based adapters can be used. This issue has been introduced in net-next for v6.9.
v3: vhost/vdpa: Add MSI translation tables to iommu for software-managed MSI
Once an iommu domain is enabled for a device, the MSI translation tables have to be in place for software-managed MSI. Otherwise, a platform with software-managed MSI and no irq bypass function cannot get a correct memory write event from PCIe and will not receive irqs.
v1: net: asix: Add check for usbnet_get_endpoints
Add check for usbnet_get_endpoints() and return the error if it fails in order to transfer the error.
v5: net: Report RCU QS for busy network kthreads
We observed this being a problem in production, since it can block RCU tasks from making progress under heavy load. Investigation indicates that just calling cond_resched() is insufficient for RCU tasks to reach quiescent states. This also has the side effect of frequently clearing the TIF_NEED_RESCHED flag on voluntary preempt kernels.
v1: iwl-net: i40e: Report MFS in decimal base instead of hex
If the MFS is set below the default (0x2600), a warning message is reported like the following :
MFS for port 1 has been set below the default: 600
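The ambiguity of the hex output is easy to reproduce. In this small Python sketch the message text follows the log line quoted above, while the function name and the `decimal` switch are assumptions added for the comparison.

```python
DEFAULT_MFS = 0x2600   # default quoted above, 9728 in decimal

def mfs_warning(port, mfs, decimal=True):
    """Format the warning with the MFS value printed in decimal or bare hex."""
    shown = f"{mfs:d}" if decimal else f"{mfs:x}"
    return f"MFS for port {port} has been set below the default: {shown}"


mfs = 0x600
assert mfs < DEFAULT_MFS
# Bare hex without a prefix reads like the decimal number 600.
assert mfs_warning(1, mfs, decimal=False).endswith(": 600")
# Decimal output shows the real frame size in bytes.
assert mfs_warning(1, mfs, decimal=True).endswith(": 1536")
```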
v5: Add support for Intel PPS Generator
The goal of the PPS(Pulse Per Second) hardware/software is to generate a signal from the system on a wire so that some third-party hardware can observe that signal and judge how close the system’s time is to another system or piece of hardware.
v2: net/ipv4: add tracepoint for icmp_send
Introduce a tracepoint for icmp_send, which can help users conveniently get more detailed information when abnormal icmp events happen.
v1: net: inet: inet_defrag: prevent sk release while still in use
ip_local_out() and other functions can pass skb->sk as function argument.
If the skb is a fragment and reassembly happens before such function call returns, the sk must not be released.
v1: pull request (net): ipsec 2024-03-19
1) Fix possible page_pool leak triggered by esp_output. From Dragos Tatulea.
2) Fix UDP encapsulation in software GSO path. From Leon Romanovsky.
Please pull or let me know if there are problems.
v1: dt-bindings: net: rfkill-gpio: add reset-gpio property
rfkill-gpio driver supports management of two gpios: reset, shutdown. Reset seems to have been missed when bindings were added.
Security Hardening
v1: video: fbdev: au1200fb: replace deprecated strncpy with strscpy
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces.
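The difference in termination behavior can be modeled on byte buffers. The Python sketch below imitates the documented semantics of C strncpy() and the kernel’s strscpy(); the `_model` function names and the sample string are assumptions for illustration, and the real functions operate on C memory rather than Python objects.

```python
E2BIG = 7  # errno value that strscpy() negates to signal truncation

def strncpy_model(dest, src, n):
    """Like C strncpy(): copy at most n bytes, NUL-pad, never force termination."""
    chunk = src[:n]
    dest[:n] = chunk + b"\x00" * (n - len(chunk))

def strscpy_model(dest, src, n):
    """Like kernel strscpy(): always NUL-terminate, return length or -E2BIG."""
    if n == 0:
        return -E2BIG
    chunk = src[: n - 1]                      # leave room for the terminator
    dest[: len(chunk) + 1] = chunk + b"\x00"
    return len(chunk) if len(src) <= n - 1 else -E2BIG


src = b"fbdev"                                # 5 bytes, larger than the 4-byte buffer

buf = bytearray(b"\xff" * 4)
strncpy_model(buf, src, len(buf))
assert bytes(buf) == b"fbde"                  # buffer full, no NUL terminator

buf = bytearray(b"\xff" * 4)
ret = strscpy_model(buf, src, len(buf))
assert bytes(buf) == b"fbd\x00"               # truncated but terminated
assert ret == -E2BIG                          # truncation is reported to the caller
```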
v2: soc: qcom: cmd-db: replace deprecated strncpy with strtomem
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces.
v1: kspp-next: compiler_types: add Endianness-dependent __counted_by_{le,be}
Some structures contain flexible arrays at the end and the counter for them, but the counter has explicit Endianness and thus __counted_by() can’t be used directly.
v1: perf/x86/rapl: Prefer struct_size over open coded arithmetic
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2].
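Why open-coded size arithmetic is risky can be shown with modular arithmetic. This Python sketch models a 64-bit size_t: the open-coded form wraps around silently, while a struct_size()-style helper saturates at SIZE_MAX so the allocation fails instead. The function names and the header and element sizes are assumptions for the example.

```python
SIZE_MAX = (1 << 64) - 1   # assumes a 64-bit size_t

def open_coded_size(header, elem, count):
    """header + elem * count as size_t arithmetic: wraps modulo 2**64."""
    return (header + elem * count) & SIZE_MAX

def struct_size_model(header, elem, count):
    """Saturating variant: any overflow yields SIZE_MAX, which no allocator can satisfy."""
    total = header + elem * count
    return SIZE_MAX if total > SIZE_MAX else total


header, elem = 16, 8       # hypothetical struct header and element sizes
count = 1 << 61            # attacker-sized count: elem * count == 2**64

assert open_coded_size(header, elem, count) == 16          # wrapped to a tiny size
assert struct_size_model(header, elem, count) == SIZE_MAX  # saturated, allocation fails

# For sane inputs both forms agree.
assert open_coded_size(header, elem, 4) == struct_size_model(header, elem, 4) == 48
```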
v1: x86, relocs: Ignore relocations in .notes section on walk_relocs
Commit aaa8736370db (“x86, relocs: Ignore relocations in .notes section”) only ignores the .notes section in print_absolute_relocs(), but the same is also needed in walk_relocs() to avoid relocations in the .notes section.
Asynchronous I/O
v1: Read/Write with meta buffer
This patchset is aimed at getting the feedback on a new io_uring interface that userspace can use to exchange meta buffer along with read/write.
v1: io_uring/alloc_cache: shrink default max entries from 512 to 128
In practice, we just need to recycle a few elements for (by far) most use cases. Shrink the total size down from 512 to 128, which should be more than plenty.
Rust For Linux
v1: WIP: Rust bindings for KMS + RVKMS
Porting vkms over to rust so that we could come up with a set of rust KMS bindings for the nova driver to be able to have a modesetting driver written in rust. This driver currently doesn’t really do much, but it does load and register a modesetting device!
Introduce a wrapper around ktime_t with a few different useful methods. Rust Binder will use these bindings to compute how many milliseconds a transaction has been active for when dumping the current state of the Binder driver. This replicates the logic in C Binder [1].
BPF
v1: ftrace: make extra rcu_is_watching() validation check optional
Introduce CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING config option to control whether ftrace low-level code performs additional rcu_is_watching()-based validation logic in an attempt to catch noinstr violations.
v5: bpf-next: sleepable bpf_timer (was: allow HID-BPF to do device IOs)
New version of the sleepable bpf_timer code, without the HID changes, as they can now go through the HID tree independently.
v1: leds: trigger: Add led trigger for bpf
This patch set adds a new led trigger that uses the bpf subsystem for triggering leds. It is designed to be used in conjunction with one or more bpf programs that can modify led state through the use of bpf kfuncs. This is useful for providing a physical indication that some event has occurred. In the context of bpf this could range from handling a packet to hitting a tracepoint.
v1: bpf-next: bpf: support resilient split BTF
Split BPF Type Format (BTF) provides huge advantages in that kernel modules only have to provide type information for types that they do not share with the core kernel; for core kernel types, split BTF refers to core kernel BTF type ids.
v1: dwarves: btf_encoder: add base_ref BTF feature to generate split BTF with base refs
Adding “base_ref” to –btf_features when generating split BTF will generate split and base reference BTF - the latter allows us to map references from split BTF to base BTF, even if that base BTF has changed. It does this by providing just enough information about the base types in the .BTF.base_ref section.
It appears support for the gettid() wrapper is variable across glibc versions, so may be safer to use syscall(SYS_gettid) instead.
v1: kbuild: disable pahole multithreading for reproducible builds
A BTF type_id is a numeric identifier allocated by pahole through libbpfd. Ids are incremented for each allocation. Running pahole multithreaded makes the sequence of allocations non-deterministic, which also makes the type_id itself non-deterministic. As the type_ids end up in the binary, this breaks reproducibility.
v3: bpf-next: bpftool: Mount bpffs on provided dir instead of parent dir
When pinning programs/objects under PATH (eg: during “bpftool prog loadall”) the bpffs is mounted on the parent dir of PATH in the following situations:
- the given dir exists but it is not bpffs.
- the given dir doesn’t exist and the parent dir is not bpffs.
v1: bpf-next: Inline two LBR-related helpers
Implement inlining of bpf_get_branch_snapshot() BPF helper using generic BPF assembly approach.
v1: libbpf: add specific btf name info when do core
No logic change; just add the specific BTF name when printing CO-RE info, which may make the output easier to understand.
v1: uprobes: reduce contention on uprobes_tree access
Active uprobes are stored in an RB tree and accesses to this tree are dominated by read operations. Currently these accesses are serialized by a spinlock but this leads to enormous contention when large numbers of threads are executing active probes.
v1: bpf-next: Avoid goto in regs_refine_cond_op()
In case of GE/GT/SGE/SGT instructions, regs_refine_cond_op() reuses the logic that does analysis of LE/LT/SLE/SLT instructions. This commit avoids the use of a goto to perform the reuse.
v1: bpf-next: bpf: mark kprobe_multi_link_prog_run as always inlined function
kprobe_multi_link_prog_run() is called both for multi-kprobe and multi-kretprobe BPF programs from kprobe_multi_link_handler() and kprobe_multi_link_exit_handler(), respectively.
v1: bpf-next: bpftool: Enable libbpf logs when loading pid_iter in debug mode
When trying to load the pid_iter BPF program used to iterate over the PIDs of the processes holding file descriptors to BPF links, we would unconditionally silence libbpf in order to keep the output clean if the kernel does not support iterators and loading fails.
v3: bpf-next: BPF raw tracepoint support for BPF cookie
Add ability to specify and retrieve BPF cookie for raw tracepoint programs. Both BTF-aware (SEC(“tp_btf”)) and non-BTF-aware (SEC(“raw_tp”)) are supported, as they are exactly the same at runtime.
v1: bpf-next: perf, amd: support capturing LBR from software events
[0] added the ability to capture LBR (Last Branch Records) on Intel CPUs from inside a BPF program at pretty much any arbitrary point. This is an extremely useful capability that helps figure out otherwise hard-to-debug problems, because LBR is now available based on some application-defined conditions, not just hardware-supported events.
v1: bpf-next: bpf: avoid get_kernel_nofault() to fetch kprobe entry IP
get_kernel_nofault() (or, rather, underlying copy_from_kernel_nofault()) is not free and it does pop up in performance profiles when kprobes are heavily utilized with CONFIG_X86_KERNEL_IBT=y config.
v2: uprobes: two common case speed ups
This patch set implements two speed ups for uprobe/uretprobe runtime execution path for some common scenarios: BPF-only uprobes (patches #1 and #2) and system-wide (non-PID-specific) uprobes (patch #3). Please see individual patches for details.
v1: bpf-next: xsk: Don’t assume metadata is always requested in TX completion
compl->tx_timestamp != NULL means that the user has explicitly requested the metadata via XDP_TX_METADATA+XDP_TX_METADATA_TIMESTAMP.
v1: bpf-next: bpf: check bpf_map/bpf_program fd validity
libbpf creates bpf_program/bpf_map structs for each program/map that the user defines, but it allows disabling the creation/loading of those objects in the kernel; in that case they won’t have an associated file descriptor (fd < 0). Such functionality is used for backward compatibility with some older kernels.
v1: bpf-next: uprobe: uretprobe speed up
The speed up depends on instruction type that uprobe is installed and depends on specific HW type, please check patch 1 for details.
Related Technology Updates
Qemu
The following changes since commit fea445e8fe9acea4f775a832815ee22bdf2b0222:
Merge tag ‘pull-maintainer-final-for-real-this-time-200324-1’ of https://gitlab.com/stsquad/qemu into staging (2024-03-21 10:31:56 +0000)
v1: for-9.0: target/riscv/debug: set tval=pc in breakpoint exceptions
We’re not setting (s/m)tval when triggering breakpoints of type 2 (mcontrol) and 6 (mcontrol6). According to the debug spec section 5.7.12, “Match Control Type 6”:
“The Privileged Spec says that breakpoint exceptions that occur on instruction fetches, loads, or stores update the tval CSR with either zero or the faulting virtual address. The faulting virtual address for an mcontrol6 trigger with action = 0 is the address being accessed and which caused that trigger to fire.”
v1: target/riscv: rvv: Check single width operator for vector fp widen instructions
The require_scale_rvf function only checks the double width operator for the vector floating point widen instructions, so most of the widen checking functions need to add require_rvf for single width operator.
v1: target/riscv: rvv: Check single width operator for vfncvt.rod.f.f.w
The opfv_narrow_check needs to check the single width float operator by require_rvf.
U-Boot
v2: riscv: add support for Milk-V Mars board
The Milk-V Mars board is technically very close to the StarFive VisionFive 2 board.
With this patch series the VisionFive 2 U-Boot SPL will detect that it is running on a Milk-V board and patch the device-tree accordingly. This is the same approach that has been taken to handle the differences between the VisionFive 2 1.2B and 1.3A revisions.
v1: cmd: bootm: add ELF file support
Some operating systems (e.g. seL4) and embedded applications are ELF images. It is convenient to use FIT-images to implement trusted boot. Added “elf” image type for booting using bootm command.
v1: Support new RISC-V ISA extension properties
This would have just been a single patch (the second one), but as I reported a while back there’s a problem with extension detection when the ISA string exceeds 32 characters: https://lore.kernel.org/u-boot/20240221-daycare-reliably-8ec86f95fe71@spud/ The first patch here fixes what I see as a bit of a misuse of cpu_get_desc() in supports_extension() as a preparatory patch for adding the new properties. Or more accurately, new property, as U-Boot barely makes use of extension detection as-is in s-mode and only one of the two new properties is even needed.