RISC-V Linux Kernel and Related Technology News, Issue 91

Written by 呀呀呀 on 2024/05/13

Date: 20240512
Editor: 晓瑜
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support

v1: riscv: do not select MODULE_SECTIONS by default

Handling of those relocations is unnecessary. Only select MODULE_SECTIONS when RELOCATABLE.

v4: Define _GNU_SOURCE for sources using

Centralizes the definition of _GNU_SOURCE into KHDR_INCLUDES and removes redefinitions of _GNU_SOURCE from source code. asprintf into kselftest_harness.h.

v5: Support Zve32[xf] and Zve64[xfd] Vector subextensions

The series is tested on QEMU, and we verified that booting, Vector program context switching, signals, ptrace, and prctl interfaces work when we only report partial V from the ISA. This patch should apply to the risc-v for-next branch on top of commit 0a16a1728790.

v4: of: property: Add fw_devlink support for interrupt-map property

Supplier (interrupt controller) based on “interrupt-map” DT property.

GIT PULL: RISC-V Devicetrees for v6.10 Take 2

Microchip: A simple addition of a power-monitor on the Icicle dev board, as the binding for it is now in mainline. Support for the Milk-V Mars. This board is incredibly similar to the VisionFive v2 that is already supported, with really only the ethernet configuration being slightly different. Re-ordering of some nodes to match the DTS coding style on the th1520.

v1: riscv: change XIP’s kernel_map.size to be size of the entire kernel

Change XIP’s kernel_map.size to be the size of the entire kernel.

v1: riscv: Don’t use hugepage mappings for vmemmap if it’s not supported

Only use hugepage mapping if it is supported.

v1: riscv: dts: starfive: Enable Bluetooth on JH7100 boards

This series enables the in-kernel Bluetooth driver to work with the Broadcom Wifi/Bluetooth module on the BeagleV Starlight and StarFive VisionFive V1 boards.

v3: riscv: set trap vector earlier

So fix that by setting the exception vector earlier.

v2: riscv: Support compiling the kernel with more extensions

This series introduces Kconfig options that allow the kernel to be compiled with additional extensions. The motivation for this patch is the performance improvements that come along with compiling the kernel with these extra instructions. Additionally, alternatives that check if an extension is supported can be eliminated when the Kconfig options to assume hardware support are enabled.

Process Scheduling

v4: time/tick-sched: idle load balancing when nohz_full cpu becomes idle.

Change tick_nohz_idle_stop_tick() to call nohz_balance_enter_idle() without checking !was_stopped so that nohz_full cpu can be chosen to perform idle load balancing when it enters idle state.

[net PATCH] net/sched: Get stab before calling ops->change()

ops->change() depends on stab. Consider the following situation: when no parameters are passed in the first time, stab is omitted, as in configuration 1 below, and the warning “Warning: sch_taprio: Size table not specified, frame length estimates may be inaccurate” is received. When stab is added the second time, as in configuration 2 below, the same warning is received again because stab is still empty while ops->change() is running.

v1: sched/fair: prevent unbounded task iteration in load balance

This patch now separates the number of tasks we migrate from the number of tasks we can search. Now, the search limit can be raised while keeping the nr_migrate fixed.

v3: net/sched: adjust device watchdog timer to detect stopped queue at right time

Modify the next watchdog timeout to be shorter than the one the device specifies. Compute the next timeout as the device watchdog timeout less the time elapsed since the queue was stopped. At the next watchdog timeout, the tx timeout handler is called if the queue is still stopped. Whether or not the handler is called, restore the watchdog timeout to the device-specified value.
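
The timeout arithmetic described above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the patch's code: `next_watchdog_delay` and its parameter names are hypothetical, and all times are assumed to be in the same tick unit.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the patch's code): given the device's watchdog
 * period, the time at which the queue was stopped, and the current time,
 * return how long the watchdog should wait before its next check.
 */
uint64_t next_watchdog_delay(uint64_t watchdog_timeo,
                             uint64_t queue_stopped_at,
                             uint64_t now)
{
    uint64_t elapsed = now - queue_stopped_at;

    /* The queue has already been stopped for a full period: check at once. */
    if (elapsed >= watchdog_timeo)
        return 0;

    /* Otherwise wait only for the remainder of the period. */
    return watchdog_timeo - elapsed;
}
```

With a period of 500 ticks and a queue stopped 200 ticks ago, the next check would be scheduled 300 ticks out rather than a full 500.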

v1: perf sched: Introduce schedstat tool

The existing `perf sched` is quite exhaustive and provides a lot of insight into scheduler behavior, but it quickly becomes impractical to use for long-running or scheduler-intensive workloads. Overall, `perf sched schedstat record` is much more lightweight than `perf sched record`. It is very useful for analysing the impact of any scheduler code change.

v5: sched/fair: allow disabling sched_balance_newidle with sched_relax_domain_level

v1: sched: Clear user_cpus_ptr only when no intersection with the new mask

Commit 851a723e45d1c (“sched: Always clear user_cpus_ptr in do_set_cpus_allowed()”) causes CPU online/offline to produce different results for a !top-cpuset task. So add a check for whether the two masks intersect, and clear user_cpus_ptr only when it has no intersection with the new mask.

v1: time/tick-sched: enable idle load balancing when nohz_full cpu becomes idle.

So, nohz_balance_enter_idle() could be called safely without the !was_stopped check.

v1: sched: Introduce task_struct::latency_sensi_flag.

So latency_sensi_flag is introduced in task_struct. When it is set to 1, the task only wakes up the softirq daemon in __local_bh_enable_ip().

Memory Management

v12: mm: report per-page metadata information

Today, we do not have any observability of per-page metadata and how much it takes away from the machine capacity. Thus, we want to describe the amount of memory that is going towards per-page metadata, which can vary depending on build configuration, machine architecture, and system use.
This patch adds 2 fields to /proc/vmstat.

v1: linux-next: mm/huge_memory: mark racy access on huge_anon_orders_always

huge_anon_orders_always is accessed locklessly, so it is better to use the READ_ONCE() wrapper. This does not fix any visible bug, but hopefully it can silence some KCSAN complaints in the future. Also do that for huge_anon_orders_madvise.
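
As a rough userspace illustration of what marking a lockless read looks like, the sketch below uses a simplified stand-in for READ_ONCE(). The macro `READ_ONCE_SKETCH`, the variable `orders_always`, and the function `order_enabled` are all hypothetical names; the kernel's real macro does more than this, and `__typeof__` is a GCC/Clang extension.

```c
/*
 * Simplified userspace stand-in for the kernel's READ_ONCE(). The volatile
 * cast forces the compiler to emit exactly one load at this point.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical bitmask of enabled orders, updated elsewhere without a lock. */
unsigned long orders_always;

/* Returns 1 when the given order is set in the shared mask, else 0. */
int order_enabled(unsigned int order)
{
    /* Take one snapshot and reason only about that snapshot. */
    unsigned long snapshot = READ_ONCE_SKETCH(orders_always);

    return (int)((snapshot >> order) & 1UL);
}
```

Working from a single snapshot avoids the compiler reloading the variable between checks, which is also what lets tools such as KCSAN treat the access as intentionally racy.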

v1: -rc7: mm/huge_memory: mark huge_zero_page reserved

When I did memory failure tests recently, below panic occurs:

v1: mm/huge_memory: mark huge_zero_folio reserved

When I did memory failure tests recently, below panic occurs:

v2: arch/fault: don’t print logs for simulated poison errors

This patch is based on mm-unstable as of 2024-05-10. In particular it needs this somewhat related fix to apply cleanly.

v10: LUF(Lazy Unmap Flush) reducing tlb numbers over 90%

I’m suggesting a new mechanism, LUF (Lazy Unmap Flush), that defers the tlb flush until folios that have been unmapped and freed eventually get allocated again. The tlb flush can be deferred when folios get unmapped as long as the needed tlb flush is guaranteed to be performed before the folios actually become used and, of course, only if none of the corresponding ptes have write permission. Otherwise, the system will get messed up.

v2: Enhance soft hwpoison handling and injection

This series aims at the following enhancement:
Let one hwpoison injector, that is, madvise(MADV_HWPOISON), behave more as if a real UE occurred.

v1: selftests/mm: hugetlb_madv_vs_map: Avoid test skipping by querying hugepage size at runtime

Since we are using a simple mmap() with MAP_HUGETLB, make the test fail instead of skipping it.

v1: rfc: mm: memcg: separate legacy cgroup v1 code and put under config option

Cgroup v1-specific code in memcontrol.c is close to 4k lines in size and it’s interleaved with generic and cgroup v2-specific code. It’s a burden on developers and maintainers.
This is an RFC version, which is not 100% polished yet, but it would be great to discuss and agree on the overall approach.

v1: introduce budgt control in readahead

This patch series introduces a helper function that provides the bytes limit and applies it to readahead.

v4: large folios swap-in: handle refault cases first

This patch primarily addresses the handling of scenarios involving large folios in the swap cache. Currently, it is particularly focused on addressing the refaulting of mTHP, which is still undergoing reclamation. This approach aims to streamline code review and expedite the integration of this segment into the MM tree.
It relies on Ryan’s swap-out series, leveraging the helper function swap_pte_batch() introduced by that series.

v1: Make riscv use THP contpte support for arm64

This allows riscv to support napot (the riscv equivalent of contpte) THPs by moving arm64 contpte support into mm; the previous series only merged the riscv and arm64 implementations of hugetlbfs contpte.

v1: binfmt_elf: Honor PT_LOAD alignment for static PIE

This attempts to implement PT_LOAD p_align support for static PIE builds.

v5: selftests: cgroup: add tests to verify the zswap writeback path

Attempt writeback with the below steps and check using memory.stat.zswpwb if zswap writeback occurred.

v1: mm/ksm: optimize unstable_tree_search_insert()

We use unstable_tree_search_insert() to find a matched page or insert our rmap_item into the unstable tree if no match is found.

v2: RESEND: Merge arm64/riscv hugetlbfs contpte support

This patchset intends to merge the contiguous ptes hugetlbfs implementation of arm64 and riscv.

v2: Merge arm64/riscv hugetlbfs contpte support

This patchset intends to merge the contiguous ptes hugetlbfs implementation of arm64 and riscv.

v1: tools/mm: allow filtering and culling by module in page_owner_sort

Extend page_owner_sort filtering and culling features to work with module names as well. The topmost module is used. Fix regex error handling, where failure labels were shifted by one step.

v1: iomap: use huge zero folio in iomap_dio_zero

Instead of looping with ZERO_PAGE, use a huge zero folio to zero pad the block. Fallback to ZERO_PAGE if mm_get_huge_zero_folio() fails.

v3: -next: mm: memcg: make alloc_mem_cgroup_per_node_info() return bool

So change the function to return bool (true on success) because this is slightly less confusing and more consistent with the other code.

v2: Add XSAVE layout description to Core files for debuggers to support varying XSAVE layouts

This patch proposes to add an extra .note section in the corefile to dump the CPUID information of a machine.

v1: x86/fault: speed up uffd-unit-test by 10x: rate-limit “MCE: Killing” logs

If a system experiences a lot of memory failures, then any associated printk() output really needs to be rate-limited. With this patch, all but 10 lines are suppressed, thus speeding up that particular selftest by 90% (runtime drops from 107 seconds to 10.6 seconds).
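
The idea of letting a small burst through and suppressing the rest can be sketched as follows. This is a minimal userspace model under stated assumptions, not the kernel's ratelimit implementation: `struct ratelimit_sketch`, `ratelimit_ok`, and the interval and burst values used with them are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter: at most `burst` messages per `interval` time units. */
struct ratelimit_sketch {
    uint64_t interval;
    unsigned int burst;
    uint64_t window_start;
    unsigned int printed;
    unsigned int missed;
};

/* Returns true if the caller may emit a message at time `now`. */
bool ratelimit_ok(struct ratelimit_sketch *rs, uint64_t now)
{
    /* Start a fresh window once the interval has elapsed. */
    if (now - rs->window_start >= rs->interval) {
        rs->window_start = now;
        rs->printed = 0;
    }

    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }

    /* Suppressed; a fuller limiter would report this count later. */
    rs->missed++;
    return false;
}
```

A caller would guard each log statement with `if (ratelimit_ok(&rs, now))`, so a storm of identical errors costs a counter increment instead of a console write.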

v1: mm-unstable: mm: rmap: abstract updating per-node and per-memcg stats

The folio struct should already be in the cache at this point, so it shouldn’t cause any noticeable overhead.

File Systems

v2: vfs: move dentry shrinking outside the inode lock in ‘rmdir()’

There seems to be no actual reason for holding the inode lock any more by the time we get rid of the now uninteresting negative dentries, and it’s an effect of the calling convention.

v4: ext4: support adding multi-delalloc blocks

v1: -next: fs: fsconfig: intercept for non-new mount API in advance for FSCONFIG_CMD_CREATE_EXCL

fsconfig with the FSCONFIG_CMD_CREATE_EXCL command requires the new mount API, so return -EOPNOTSUPP in advance to avoid extra processing.

v1: -next: fsconfig: intercept for non-new mount API in advance for FSCONFIG_CMD_CREATE_EXCL

fsconfig with the FSCONFIG_CMD_CREATE_EXCL command requires the new mount API, so return -EOPNOTSUPP in advance to avoid extra processing.

v1: fsnotify: clear PARENT_WATCHED flags lazily

The underlying issue I was trying to resolve was that when directories have many dentries (frequently, a ton of negative dentries), the __fsnotify_update_child_dentry_flags() operation can take a while, and it happens under a spinlock.

v1: fuse: add simple request tracepoints

I’ve been timing various fuse operations and it’s quite annoying to do with kprobes. Add two tracepoints for sending and ending fuse requests to make it easier to debug and time various operations.

GIT PULL: vfs rw

The core fs signalfd, userfaultfd, and timerfd subsystems did still use f_op->read() instead of f_op->read_iter(). Convert them over since we should aim to get rid of f_op->read() at some point.
Aside from that io_uring and others want to mark files as FMODE_NOWAIT so it can make use of per-IO nonblocking hints to enable more efficient IO. Converting those users to f_op->read_iter() allows them to be marked with FMODE_NOWAIT.

v3: ext4: Don’t reduce symlink i_mode by umask if no ACL support

If CONFIG_EXT4_FS_POSIX_ACL=n then the fallback version of ext4_init_acl() will mask off the umask bits from the new inode’s i_mode. This should not be done if the inode is a symlink. If CONFIG_EXT4_FS_POSIX_ACL=y, then we go through posix_acl_create() instead which does the right thing with symlinks.
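
The rule described above (apply the umask to new inodes, but not to symlinks) can be sketched in a few lines. This is an illustration only, not the ext4 code: `init_mode_no_acl` and the `SK_` constants are hypothetical names, with the constants mirroring the Linux values of S_IFMT and S_IFLNK so the example stays self-contained.

```c
/* File-type bits, mirroring the Linux values of S_IFMT and S_IFLNK. */
#define SK_IFMT  0170000u
#define SK_IFLNK 0120000u
#define SK_IFREG 0100000u

/*
 * Hypothetical helper illustrating the rule: apply the umask to a new
 * inode's mode unless the inode is a symlink.
 */
unsigned int init_mode_no_acl(unsigned int mode, unsigned int mask)
{
    if ((mode & SK_IFMT) == SK_IFLNK)
        return mode;        /* symlink permissions are left untouched */

    return mode & ~mask;    /* everything else loses the umask bits */
}
```

With a umask of 022, a regular file requested as 0666 ends up 0644, while a symlink keeps its 0777 mode.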

v2: fscrypt: try to avoid refing parent dentry in fscrypt_file_open

Merely checking if the directory is encrypted happens for every open when using ext4, at the moment refing and unrefing the parent, costing 2 atomics and serializing opens of different files.
The most common case of encryption not being used can be checked for with RCU instead.

v4: fs/coredump: Enable dynamic configuration of max file note size

Introduce the capability to dynamically configure the maximum file note size for ELF core dumps via sysctl. This enhancement removes the previous static limit of 4MB, allowing system administrators to adjust the size based on system-specific requirements or constraints.

v2: virtiofs: use string format specifier for sysfs tag

The existing emit call is a vector for format string injection. Use the string format specifier to avoid this problem.
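
To illustrate the class of bug, here is a userspace sketch that uses snprintf() as a stand-in for the sysfs emit helper. The function `show_tag` is hypothetical; it only shows why passing caller-controlled text as an argument to "%s" is safe while passing it as the format string is not.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a sysfs show() callback; `tag` is untrusted. */
int show_tag(char *buf, size_t size, const char *tag)
{
    /*
     * The unsafe form would be snprintf(buf, size, tag): any '%' in the tag
     * would be interpreted as a conversion. Passing the tag as the argument
     * of "%s" copies it verbatim instead.
     */
    return snprintf(buf, size, "%s\n", tag);
}
```

A tag such as "%x%s" is then emitted literally, followed by a newline, instead of reading arguments that were never supplied.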

v2: epoll: be better about file lifetimes

epoll can call out to vfs_poll() with a file pointer that may race with the last ‘fput()’. That would make f_count go down to zero, and while the ep->mtx locking means that the resulting file pointer tear-down will be blocked until the poll returns, it means that f_count is already dead, and any use of it won’t actually get a reference to the file any more: it’s dead regardless.

v1: blk: optimization for classic polling

This removes the dependency on interrupts to wake up the task. Set the task state to TASK_RUNNING if need_resched() returns true while polling for IO completion. Earlier, the polling task used to sleep, relying on an interrupt to wake it up. This made some IO take very long when interrupt coalescing is enabled in NVMe.

Networking

v2: net-next: ENA driver changes May 2024

This patchset contains several misc and minor changes to the ENA driver.

v1: net-next: mlx5 misc patches

This series includes patches for the mlx5 driver.

v2: tty: rfcomm: prefer struct_size over open coded arithmetic

This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows.
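
The motivation can be shown with a userspace sketch of an overflow-checked size calculation for a structure that ends in a flexible array. `struct dev_list` and `dev_list_size` are hypothetical names, and the `__builtin_*_overflow` functions are GCC/Clang builtins; the kernel's struct_size() helper similarly saturates to SIZE_MAX on overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure ending in a flexible array member. */
struct dev_list {
    uint16_t count;
    uint32_t entries[];
};

/*
 * Overflow-checked equivalent of
 *     sizeof(struct dev_list) + n * sizeof(uint32_t)
 * Saturates to SIZE_MAX on overflow so that a following allocation fails
 * instead of returning a buffer that is too small.
 */
size_t dev_list_size(size_t n)
{
    size_t bytes;

    if (__builtin_mul_overflow(n, sizeof(uint32_t), &bytes) ||
        __builtin_add_overflow(bytes, sizeof(struct dev_list), &bytes))
        return SIZE_MAX;

    return bytes;
}
```

With open-coded arithmetic, a huge `n` would wrap around to a small byte count and the allocation would succeed with too little memory; the saturating form turns that into an allocation failure.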

v6: net-next: add ethernet driver for Tehuti Networks TN40xx chips

This patchset adds a new 10G ethernet driver for Tehuti Networks TN40xx chips. Note in mainline, there is a driver for Tehuti Networks (drivers/net/ethernet/tehuti/tehuti.[hc]), which supports TN30xx chips. To make reviewing easier, this patchset has only basic functions. Once merged, I’ll submit features like ethtool support.

v1: bpf, sockmap: defer sk_psock_free_link() using RCU

Defer kfree() using RCU so that the attached BPF program runs without holding psock->link_lock.

v1: lib80211: Constify struct lib80211_crypto_ops

This series constifies struct lib80211_crypto_ops. This structure is mostly some function pointers, so having it in a read-only section when possible is safer.

[PATCH net RFC] net: ethernet: mtk_eth_soc: ppe: add source port comparison

Resolve a packet loss issue under the following conditions:
- utilizing multiple GMACs
- device has more than 4GB DRAM
- using PPE

v4: net-next: net: ethernet: mtk_eth_soc: ppe: add support for multiple PPEs

Add the missing pieces to allow multiple PPE units, one for each GMAC. mtk_gdm_config has been modified to work on the targeted mac ID, and the inner loop moved outside of the function to allow unrelated operations like setting the MAC’s PPE index.

v2: net-next: net: qede: flower: validate control flags

Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower … ip_flags frag`.
In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP.

v1: net-next: selftests: netfilter: nft_flowtable.sh: bump socat timeout to 1m

Looks like socat gets zapped too quickly, so increase timeout to 1m.
Could also reduce tx file size for KSFT_MACHINE_SLOW, but it’s preferable to have the same test for both debug and nondebug.

v5: net-next: virtio_net: rx enable premapped mode by default

This patch set makes the big mode of virtio-net support premapped mode, and enables premapped mode for rx by default.

v2: net-next: net: fec: Convert fec driver to use lock guards

Convert the fec driver to use guard() and scoped_guard() defined in linux/cleanup.h to automate lock lifetime control in the fec driver.
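
The mechanism behind such guards is the compiler's cleanup attribute, which runs a function when a variable goes out of scope. The sketch below shows the pattern in userspace with a toy lock; `struct toy_lock`, `TOY_GUARD`, and `update_counter` are hypothetical names and this is not the kernel's guard() implementation. The attribute is a GCC/Clang extension.

```c
/* Hypothetical lock type; a real driver would use spinlock_t or a mutex. */
struct toy_lock {
    int held;
};

void toy_lock_acquire(struct toy_lock *l) { l->held = 1; }
void toy_lock_release(struct toy_lock *l) { l->held = 0; }

/* Cleanup handler: receives a pointer to the guard variable leaving scope. */
void toy_guard_exit(struct toy_lock **guard)
{
    toy_lock_release(*guard);
}

/* Declare a guard that takes the lock now and drops it at end of scope. */
#define TOY_GUARD(name, lock)                                   \
    struct toy_lock *name                                       \
        __attribute__((cleanup(toy_guard_exit), unused)) =      \
        (toy_lock_acquire(lock), (lock))

/* Early returns no longer need a matching unlock on every path. */
int update_counter(struct toy_lock *lock, int *counter, int delta)
{
    TOY_GUARD(g, lock);

    if (delta == 0)
        return -1;          /* lock released automatically */

    *counter += delta;
    return 0;               /* lock released automatically */
}
```

The benefit is most visible in functions with several error exits, where a forgotten unlock on one path is a common source of bugs.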

v2: net-next: selftests: net: local_termination: annotate the expected failures

The bridge driver fares particularly badly […] mainly because it does not implement IFF_UNICAST_FLT.
We don’t want to hide the known gaps, but having a test which always fails prevents us from catching regressions. Report the cases we know may fail as XFAIL.

v1: ynl: ensure exact-len value is resolved

For type String and Binary we are currently using the exact-len limit value as is without attempting any name resolution. However, the spec may specify the name of a constant rather than an actual value, which would result in using the constant name as is and thus break the policy.
Ensure the limit value is passed to get_limit(), which will always attempt resolving the name before printing the policy rule.

v9: net-next: Device Memory TCP

v2: net-next: net: ethernet: cortina: TSO and pause param

This restores the TSO support as we put it on the back burner a while back.

v1: pci: Add ACS quirk for Broadcom BCM5760X NIC

Add an ACS quirk for this device so the functions can be in independent IOMMU groups and attached individually to userspace applications using VFIO.

v1: net-next: Add TX stop/wake counters

Several drivers provide TX stop and wake counters via ethtool stats. Add those to the netdev queue stats, and use them in virtio_net.

v8: bpf qdisc

This is the v8 of bpf qdisc patchset. While I would like to do more testing and performance evaluation, I think posting it now may help discussions in the upcoming LSF/MM/BPF.

[PATCH net-next 14/15 v2] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.

The XDP redirect process is two staged:
bpf_prog_run_xdp() is invoked to run an eBPF program which inspects the packet and makes decisions. While doing that, the per-CPU variable bpf_redirect_info is used.
Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info and it may also access other per-CPU variables like xskmap_flush_list.

v3: net-next: net: A lightweight zero-copy notification

While making maximum reuse of the existing MSG_ZEROCOPY related code, this patch set introduces a new zerocopy socket notification mechanism. Users of sendmsg pass a control message as a placeholder for the incoming notifications. Upon returning, kernel embeds notifications directly into user arguments passed in. By doing so, we can significantly reduce the complexity and overhead for managing notifications. In an ideal pattern, the user will keep calling sendmsg with SCM_ZC_NOTIFICATION msg_control, and the notification will be delivered as soon as possible.

v1: iwl-next: idpf: XDP chapter I: convert Rx to libeth

Applies on top of “idpf: don’t enable NAPI and interrupts prior to allocating Rx buffers” from Tony’s tree. Sent as RFC as we’re at the end of the development cycle and several kdocs are messed up. I’ll fix them when sending non-RFC after the window opens.

[net-next PATCH] test: hsr: Extend the hsr_redbox.sh to have more SAN devices connected

After this change the single SAN device (ns3eth1) is now replaced with two SAN devices - respectively ns4eth1 and ns5eth1.
It is possible to extend this script to have more SAN devices connected by adding them to ns3br1 bridge.

v1: bpf-next: netfilter: Add the capability to offload flowtable in XDP layer

Introduce bpf_xdp_flow_offload_lookup kfunc in order to perform the lookup of a given flowtable entry based on the fib tuple of incoming traffic. bpf_xdp_flow_offload_lookup can be used as building block to offload in XDP the sw flowtable processing when the hw support is not available.

v1: dt-bindings: mfd: syscon: Add more simple compatibles

Add another batch of various “simple” syscon compatibles which were undocumented or still documented with old text bindings. Remove the old text binding docs for the ones which were documented.

v2: net-next: tcp: support rstreasons in the passive logic

In this series, I split all kinds of reasons into five parts which, I think, can be easily reviewed. I respectively implement corresponding rstreasons in those functions. After this, we can trace the whole tcp passive reset with clear reasons.

v1: net-next: selftests: net: use upstream mtools

Check that the deployed mtools version is 3.0 or above. Note that the version check breaks compatibility with my fork where I didn’t bump the version, but I assume that won’t be a problem.

v2: net: ptp: ocp: adjust serial port symlink creation

The commit b286f4e87e32 (“serial: core: Move tty and serdev to be children of serial core port device”) changed the hierarchy of serial port devices and device_find_child_by_name cannot find ttyS* devices because they are no longer directly attached. Add some logic to restore symlinks creation to the driver for OCP TimeCard.

v1: net-next: netconsole: Do not shutdown dynamic configuration if cmdline is invalid

If a user provides an invalid netconsole configuration during boot time (e.g., specifying an invalid ethX interface), netconsole will be entirely disabled. Consequently, the user won’t be able to create new entries in /sys/kernel/config/netconsole/ as that directory does not exist. Create /sys/kernel/config/netconsole/ even if the command line arguments are invalid, so, users can create dynamic entries in netconsole.

v3: Add Bananapi R3 Mini

Add mt7986 based BananaPi R3 Mini SBC.

v6: net-next: net: stmmac: Add support for RZN1 GMAC devices

This series consists of a devicetree binding describing the RZN1 GMAC controller IP, a node for the GMAC1 device in the r9a06g032 SoC device tree, and the GMAC driver itself which is a glue layer in stmmac.

v2: iwl-next: ice:Support to dump PHY config, FEC

Implementation to dump PHY configuration and FEC statistics to facilitate link level debugging of customer issues. Implementation has two parts

v2: net-next: mlx5: Add netdev-genl queue stats

This change adds support for the per queue netdev-genl API to mlx5, which seems to output stats

v1: net-next: Introduce IPPROTO_SMC

This patch allows to create smc socket via AF_INET, similar to the following code.

v1: net-next: add basic PSP encryption for TCP connections

Add support for PSP encryption of TCP connections.

Security Hardening

v1: perf/x86/amd/uncore: Add flex array to struct amd_uncore_ctx

This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows.

v3: perf/ring_buffer: Prefer struct_size over open coded arithmetic

This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows.

v1: seccomp: Constify sysctl subhelpers

The read_actions_logged() and write_actions_logged() helpers called by the sysctl proc handler seccomp_actions_logged_handler() are already expecting their sysctl table argument to be read-only. Actually mark the argument as const in preparation[1] for global constification of the sysctl tables.

v1: Mitigating unexpected arithmetic overflow

Over the last decade or so, our work hardening against weaknesses in various kernel APIs and eliminating the ambiguities in C language semantics have traditionally been somewhat off in one corner or another of the Linux codebase. This topic is going to be much different as it is ultimately about the C type system, which is rather front and center. So, hold on to your hats while I try to explain what’s desired here. Please try to reserve judgement until the end; as we’ve explored the topic we’ve found a lot of nuances, which I’ve tried to touch on below.

v2: uapi: stddef.h: Provide UAPI macros for __counted_by_{le, be}

This commit only provides UAPI macros for UAPI structs that will gain annotations for __counted_by_{le, be} attributes. It is the preliminary step to being able to use these attributes in UAPI.

v2: Introduce STM32 DMA3 support

STM32 DMA3 is a direct memory access controller with different features depending on its hardware configuration. It is either called LPDMA (Low Power), GPDMA (General Purpose) or HPDMA (High Performance), and it can be found in new STM32 MCUs and MPUs.

v1: net-next: netdevice: define and allocate &net_device properly

In fact, this structure contains a flexible array at the end, but historically its size, alignment etc., is calculated manually. There are several instances of the structure embedded into other structures, but also there’s ongoing effort to remove them and we could in the meantime declare &net_device properly. Declare the array explicitly, use struct_size() and store the array size inside the structure, so that __counted_by() can be applied. Don’t use PTR_ALIGN(), as SLUB itself tries its best to ensure the allocated buffer is aligned to what the user expects. Also, change its alignment from %NETDEV_ALIGN to the cacheline size as per several suggestions on the netdev ML.

v1: cdrom: rearrange last_media_change check to avoid unintentional overflow

When running syzkaller with the newly reintroduced signed integer wrap sanitizer we encounter this splat.

Asynchronous I/O

v2: io_uring/rsrc: coalescing multi-hugepage registered buffers

This patch series enables coalescing registered buffers with more than one hugepage. It optimizes the DMA-mapping time and saves memory for these kinds of buffers.

v3: io_uring: support sqe group and provide group kbuf

When running a 64KB block size test on ublk-loop (‘ublk add -t loop --buffered_io -f $backing’), it is observed that perf is doubled.

v2: io_uring: support to inject result for NOP

The two patches add nop_flags to support injecting a result on NOP.

v1: Propagate back queue status on accept

This series starts by changing the proto/proto_ops accept prototypes to eliminate flags/errp/kern and replace it with a structure that encompasses all of them.

v1: io_uring: add IORING_OP_NOP_FAIL

Add IORING_OP_NOP_FAIL so that it is easy to inject failure from userspace.
Like IORING_OP_NOP, the main use case is testing, and it is very helpful for covering failure handling code in io_uring core changes.

v1: io_uring/filetable: don’t unnecessarily clear/reset bitmap

If we’re updating an existing slot, we clear the slot bitmap only to set it again right after. Just leave the bit set rather than toggle it off and on, and move the unused slot setting into the branch of not already having a file occupy this slot.

v3: io_uring/io-wq: Use set_bit() and test_bit() at worker->flags

Utilize set_bit() and test_bit() on worker->flags within io_uring/io-wq to address potential data races. These races involve writes and reads to the same memory location by
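
As a userspace analogue of atomic bit operations on a shared flags word, the sketch below uses C11 atomics. `struct worker_sketch`, the bit numbers, and the `worker_*_bit` helpers are hypothetical; they model the semantics of the kernel's set_bit(), clear_bit(), and test_bit() rather than reproduce them.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit numbers, standing in for the worker flag bits. */
enum { WORKER_UP_BIT = 0, WORKER_RUNNING_BIT = 1 };

struct worker_sketch {
    atomic_ulong flags;
};

/* Atomically set one bit; concurrent updates to other bits are not lost. */
void worker_set_bit(struct worker_sketch *w, unsigned int bit)
{
    atomic_fetch_or(&w->flags, 1UL << bit);
}

/* Atomically clear one bit. */
void worker_clear_bit(struct worker_sketch *w, unsigned int bit)
{
    atomic_fetch_and(&w->flags, ~(1UL << bit));
}

/* Read the flags word once and test a single bit of it. */
bool worker_test_bit(struct worker_sketch *w, unsigned int bit)
{
    return (atomic_load(&w->flags) >> bit) & 1UL;
}
```

A plain `flags |= mask` is a separate load and store, so two threads updating different bits can overwrite each other; the atomic read-modify-write form avoids that.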

v1: io_uring/rsrc: Add support for multi-folio buffer coalescing

Currently fixed buffers consisting of pages in the same folio (huge page) can be coalesced into a single bvec entry at registration. This patch expands it to support coalescing fixed buffers with multiple folios.

Rust For Linux

v2: kbuild: rust: split up helpers.c

Each helper file is listed explicitly and thus conflicts in the file list are still likely. However, they should be simpler to resolve than the conflicts usually seen in helpers.c.

**[v1: rust: alloc: use `if` instead of `match` in VecExt::reserve()](http://lore.kernel.org/rust-for-linux/20240507201709.105693-1-dakr@redhat.com/)**

In commit 1161057f53f6 (“rust: alloc: fix dangling pointer in VecExt::reserve()”) the check for zero of a vector’s capacity has been implemented using a `match` statement. Using an `if` statement is the preferred style, hence change that.

BPF

v2: bpf-next: bpf: make list_for_each_entry portable

This patch adds a new macro can_loop to bpf_experimental that implements the same logic as cond_break but evaluates to a boolean expression. The patch also changes all the current instances of usage of cond_break within the header of a loop accordingly.

v1: bpf-next: bpf: disable strict aliasing in test_global_func9.c

The BPF selftest test_global_func9.c performs type punning and breaks strict-aliasing rules. In this case the warning is not emitted, because s-> is initialized.
This patch disables strict aliasing in this test when building with GCC. clang seems to not optimize this particular code even when strict aliasing is enabled.

v1: bpf-next: selftests/bpf: Free strdup memory in xdp_hw_metadata

This patch adds this missing “free(saved_hwtstamp_ifname)” in cleanup() to avoid a potential memory leak in xdp_hw_metadata.c.

v1: bpf-next: use network helpers, part 5

This patchset uses post_socket_cb and post_connect_cb callbacks of struct network_helper_opts to refactor do_test() in bpf_tcp_ca.c to move dctcp test dedicated code out of do_test() into test_dctcp().

v2: riscv, bpf: Optimize zextw insn with Zba extension

The Zba extension provides add.uw insn which can be used to implement zext.w with rs2 set as ZERO.
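
The equivalence can be checked with a small software model. This is a sketch in C, not JIT code: `add_uw` models the Zba instruction (rd = rs2 + zero-extended low 32 bits of rs1), and `zext_w_zba` and `zext_w_shifts` are hypothetical helper names for the one-instruction and two-instruction forms.

```c
#include <stdint.h>

/* Software model of Zba's add.uw: rd = rs2 + zero_extend(rs1[31:0]). */
uint64_t add_uw(uint64_t rs1, uint64_t rs2)
{
    return rs2 + (uint64_t)(uint32_t)rs1;
}

/* zext.w as a single instruction: add.uw rd, rs, zero. */
uint64_t zext_w_zba(uint64_t rs)
{
    return add_uw(rs, 0);
}

/* Two-instruction fallback without Zba: slli rd, rs, 32; srli rd, rd, 32. */
uint64_t zext_w_shifts(uint64_t rs)
{
    return (rs << 32) >> 32;
}
```

Both forms clear the upper 32 bits; the Zba form does it in one instruction instead of a shift pair.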

v4: perf/core: Check sample_type in sample data saving helper functions

We use helper functions to save raw data, callchain and branch stack in perf_sample_data. These functions update perf_sample_data->dyn_size without checking event->attr.sample_type, which may result in unused space allocated in sample records. To prevent this from happening, this patchset enforces checking sample_type of an event in these helper functions.

v1: bpf-next: Retire progs/test_sock_addr.c

This patch series migrates remaining tests from bpf/test_sock_addr.c to prog_tests/sock_addr.c and progs/verifier_sock_addr.c in order to fully retire the old-style test program and expands test coverage to test previously untested scenarios related to sockaddr hooks.

[PATCH net-next 14/15 v2] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.

At the very end of the NAPI callback, xdp_do_flush() is invoked which does not access bpf_redirect_info but will touch the individual per-CPU lists.
On PREEMPT_RT the pointer to bpf_net_context is saved in the task’s task_struct. On non-PREEMPT_RT builds the pointer is saved in a per-CPU variable (which is always NODE-local memory). Using always the bpf_net_context approach has the advantage that there is almost zero

v2: bpf-next: bpf: make trusted args nullable

The current verifier checks for the arg to be nullable after checking for certain pointer types. This prevents programs from passing NULL to kfunc args even if they are marked as nullable. This patchset adjusts the verifier and changes bpf crypto kfuncs to allow null for the IV parameter, which is optional for some ciphers. Benchmark shows 4% improvements when there is no need to initialise a 0-sized dynptr.

v1: dwarves: btf_encoder: add “distilled_base” BTF feature to split BTF generation

Adding “distilled_base” to --btf_features when generating split BTF will create split and .BTF.base BTF - the latter allows us to map references from split BTF to base BTF, even if that base BTF has changed. It does this by providing just enough information about the base types in the .BTF.base section.
Patch is applicable on the “next” branch of dwarves, and requires the libbpf from the series in

v3: bpf-next: bpf: support resilient split BTF

For a STRUCT sk_buff, a module that uses that structure (or a pointer to it) simply needs to refer to the core kernel type id, saving the need to define the structure and its many dependents. This cuts down on duplication and makes BTF as compact as possible.

v5: bpf-next: Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.

The patch set aims to enable the use of these specific types in arrays and struct fields, providing flexibility. It examines the types of global variables or the value types of maps, such as arrays and struct types, recursively to identify these special types and generate field information for them.

v3: bpf-next: Notify user space when a struct_ops object is detached/unregistered

This patch set enables the detach feature for struct_ops links and sends an event to epoll when a link is detached. Subsystems could call link->ops->detach() to detach a link and notify user space programs through epoll.

v3: perf:core: Save raw sample data

v8: bpf-next: Replace mono_delivery_time with tstamp_type

Related Technology News

Qemu

v1: target/riscv: Support RISC-V privilege 1.13 spec

Based on the change log for the RISC-V privilege 1.13 spec, add the support for ss1p13.

v5: target/riscv: Implement dynamic establishment of custom decoder

In this patch, we modify the decoder to be a freely composable data structure instead of a hardcoded one. It can be dynamically built up according to the extensions.