RISC-V Linux Kernel and Related Technology News, Issue 93

Written by 呀呀呀 on 2024/05/27

Date: 20240526
Editor: 晓瑜
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support

This patch series introduces support for RISC-V IOMMU architected hardware into the Linux kernel.

v1: Documentation: RISC-V: uabi: Only scalar misaligned loads are supported

We’re stuck supporting scalar misaligned loads in userspace because they were part of the ISA at the time we froze the uABI. That wasn’t the case for vector misaligned accesses, so depending on them unconditionally is a userspace bug. All extant vector hardware traps on these misaligned accesses.

GIT PULL: RISC-V Patches for the 6.10 Merge Window, Part 2

merged tag ‘riscv-for-linus-6.10-mw1’ The following changes since commit 0bfbc914d9433d8ac2763a9ce99ce7721ee5c8e0:

v4: Add Svadu Extension Support

Svadu is a RISC-V extension for hardware updating of PTE A/D bits. This patch set adds support to enable Svadu extension for both host and guest OS.

v4: bpf-next: riscv, bpf: Introduce Zba optimization

The RISC-V Zba extension provides instructions to accelerate the generation of addresses that index into arrays of basic data types. The instruction count generated by the BPF JIT could be reduced by leveraging Zba for address calculation.

v1: riscv: hweight: relax assembly constraints

rd and rs don’t have to be the same.

v2: riscv: prevent pt_regs corruption for secondary idle threads

Top of the kernel thread stack should be reserved for pt_regs. However this is not the case for the idle threads of the secondary boot harts. Their stacks overlap with their pt_regs, so both may get corrupted.

v2: riscv, bpf: Use STACK_ALIGN macro for size rounding up

Use the macro STACK_ALIGN that is defined in asm/processor.h for stack size rounding up, just like bpf_jit_comp32.c does.
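To make the rounding concrete, here is a minimal sketch of rounding a stack size up to a power-of-two alignment. The value 16 for STACK_ALIGN and the helper name round_up_stack() are assumptions made for this example; they are not taken from the patch itself.

```c
#include <stddef.h>

/* Assumed value for illustration; the kernel defines STACK_ALIGN in asm/processor.h. */
#define STACK_ALIGN 16

/* Hypothetical helper: round size up to the next multiple of STACK_ALIGN.
 * The mask trick only works because STACK_ALIGN is a power of two. */
static size_t round_up_stack(size_t size)
{
    return (size + STACK_ALIGN - 1) & ~((size_t)STACK_ALIGN - 1);
}
```

With these assumptions, a 17-byte request would be rounded up to 32 bytes, while a request that is already a multiple of 16 is left unchanged.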

GIT PULL: RISC-V Patches for the 6.10 Merge Window, Part 1

There’s a pair of driver build fixes that are already on the lists and a report of an ftrace failure that might be triggered by the ftrace/AIA fix, but seems like we’re better off with these than without. This first one isn’t showing up in an in-flight merge `git diff`, but it looks pretty straight-forward.

v2: dt-bindings: interrupt-controller: riscv,cpu-intc

This series of patches converts the RISC-V CPU interrupt controller to the newer dt-schema binding.

v2: KVM: Fold kvm_arch_sched_in() into kvm_arch_vcpu_load()

The other motivation for this is to avoid yet another arch hook, and more arbitrary ordering, if there’s a future need to hook kvm_sched_out() (we’ve come close on the x86 side several times). E.g. kvm_arch_vcpu_put() can simply check kvm_vcpu.scheduled_out if it needs to do something specific for the vCPU being scheduled out.

v5: Define _GNU_SOURCE for sources using

Centralizes the definition of _GNU_SOURCE into KHDR_INCLUDES and removes redefinitions of _GNU_SOURCE from source code.

v3: riscv: Memory Hot(Un)Plug support

Memory Hot(Un)Plug support (and ZONE_DEVICE) for the RISC-V port

v9: Add support for Allwinner PWM on D1/T113s/R329 SoCs

v1: riscv, bpf: Introduce shift add helper with Zba optimization

The Zba extension is very useful for generating addresses that index into arrays of basic data types. This patch introduces sh2add and sh3add helpers for RV32 and RV64 respectively, to accelerate pointer array addressing.
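For readers unfamiliar with Zba, the following is a small software model of what the two instructions compute. It is only an illustration of the instruction semantics using 64-bit registers, not the JIT helper code from the patch.

```c
#include <stdint.h>

/* Software model of Zba sh2add: rd = rs2 + (rs1 << 2).
 * Typical use: address of element rs1 in an array of 4-byte items based at rs2. */
static uint64_t sh2add(uint64_t rs1, uint64_t rs2)
{
    return rs2 + (rs1 << 2);
}

/* Software model of Zba sh3add: rd = rs2 + (rs1 << 3).
 * Typical use: address of element rs1 in an array of 8-byte items based at rs2. */
static uint64_t sh3add(uint64_t rs1, uint64_t rs2)
{
    return rs2 + (rs1 << 3);
}
```

Each call replaces a separate shift followed by an add, which is where the saving in emitted instructions comes from.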

v1: riscv, bpf: try RVC for reg move within BPF_CMPXCHG JIT

We could try to emit a compressed insn for the reg move operation during CMPXCHG JIT. The instruction compression has no impact on the jump offsets of the following forward and backward jump instructions.

v2: bpf-next: Use bpf_prog_pack for RV64 bpf trampoline

We used bpf_prog_pack to aggregate bpf programs into huge pages to relieve the iTLB pressure on the system. We can apply it to the bpf trampoline, as Song has implemented it in core and x86. This patch uses bpf_prog_pack for the RV64 bpf trampoline.

Process Scheduling

v1: net: net/sched: Add xmit_recursion level in sch_direct_xmit()

A packet from a PF_PACKET socket on top of an IPv6-backed ipvlan device will hit WARN_ON_ONCE() in sk_mc_loop() through the sch_direct_xmit() path while the ipvlan device has a qdisc queue.

v1: sched/numa: Correct NUMA imbalance calculation

When performing load balancing, a NUMA imbalance is allowed if the number of busy CPUs is less than the maximum threshold. This keeps a pair of communicating tasks on the current node when the source domain is lightly loaded. In many cases, this prevents communicating tasks from being pulled apart.

Memory Management

v1: kmsan: introduce test_unpoison_memory()

Add a regression test to ensure that kmsan_unpoison_memory() works the same as an unpoisoning operation added by the instrumentation. (Of course, please correct me if I’m misunderstanding how these should work).

v4: Enhance soft hwpoison handling and injection

v1: Huge remap_pfn_range for vfio-pci

Changing remap_pfn_range to install PUDs or PMDs is straightforward. The hairy part is the fault / follow side of things:

v1: mm: swap: mTHP swap allocator base on swap cluster order

This is the short-term solution “swap cluster order” listed on slide 8 of my “Swap Abstraction” discussion at the recent LSF/MM conference.

**[v2: ioctl()-based API to query VMAs from /proc//maps](http://lore.kernel.org/linux-mm/20240524041032.1048094-1-andrii@kernel.org/)**

The new PROCMAP_QUERY ioctl() API added in this patch set was motivated by the former pattern of usage. Patch #9 adds a tool that faithfully reproduces an efficient VMA matching pass of a symbolizer, collecting a subset of covering VMAs for a given set of addresses as efficiently as possible. This tool is serving both as a testing ground, as well as a benchmarking tool. It implements everything both for currently existing text-based /proc//maps interface, as well as for newly-added PROCMAP_QUERY ioctl().

v1: Modified XArray entry bit flags as macro constants

It would be better to modify the operation on the last two bits of the entry with a macro constant name rather than using a numeric constant.
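As a sketch of the idea, the low-bit tagging of XArray entries can be written with named constants instead of bare numbers. The macro and function names below are hypothetical and chosen for this example; the names used in the actual patch may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the tag bits kept in the low two bits of an entry. */
#define XA_VALUE_TAG    1UL  /* bit 0 set: entry encodes an integer value */
#define XA_INTERNAL_TAG 2UL  /* low two bits are 10: internal entry */
#define XA_TAG_MASK     3UL  /* mask covering both tag bits */

/* Encode an integer as a value entry: shift left by one and set bit 0. */
static uintptr_t mk_value(uintptr_t v)
{
    return (v << 1) | XA_VALUE_TAG;
}

static bool is_value(uintptr_t entry)
{
    return (entry & XA_VALUE_TAG) != 0;
}

/* Encode an internal entry: shift left by two and set the low bits to 10. */
static uintptr_t mk_internal(uintptr_t v)
{
    return (v << 2) | XA_INTERNAL_TAG;
}

static bool is_internal(uintptr_t entry)
{
    return (entry & XA_TAG_MASK) == XA_INTERNAL_TAG;
}
```

Reading `(entry & XA_TAG_MASK) == XA_INTERNAL_TAG` makes the intent clearer than the equivalent `(entry & 3) == 2`.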

v1: mm: let kswapd work again for node that used to be hopeless but may not now

From then on, the system runs without the background reclaim service provided by kswapd until direct reclaim does the job. Even worse, the tiering mechanism is no longer able to work because kswapd, which the mechanism relies on, has stopped.

v1: memcg: rearrange fields of mem_cgroup_per_node

Kernel test robot reported performance regression for will-it-scale test suite’s page_fault2 test case for the commit 70a64b7919cb (“memcg: dynamically allocate lruvec_stats”). After inspection it seems like the commit has unintentionally introduced false cache sharing.

v1: 9p: Enable multipage folios

Enable support for multipage folios on the 9P filesystem. This is all handled through netfslib and is already enabled on AFS and CIFS also.

v1: mm: page_type, zsmalloc and page_mapcount_reset()

Wanting to remove the remaining abuser of _mapcount/page_type along with page_mapcount_reset(), I stumbled over zsmalloc, which is yet to be converted away from “struct page”.

v2: mm/memory: cleanly support zeropage in vm_insert_page(), vm_map_pages() and vmf_insert_mixed()

There is interest in mapping zeropages via vm_insert_pages() into MAP_SHARED mappings.

v1: mm, slab: don’t wrap internal functions with alloc_hooks()

The functions __kmalloc_noprof(), kmalloc_large_noprof(), kmalloc_trace_noprof() and their _node variants are all internal to the implementations of kmalloc_noprof() and kmalloc_node_noprof() and are only declared in the “public” slab.h and exported so that those implementations can be static inline and distinguish the build-time constant size variants.

v1: Improve dmesg output for swapfile+hibernation

While trying to use a swapfile for hibernation, I noticed that the suspend process was failing when it tried to search for the swap to use for snapshot. I had created the swapfile on ext4 and got the starting physical block offset using the filefrag command.

v1: Improve dump_page() output for slab pages

While using dump_page() on a range of pages, I noticed that there were some PG_slab pages that were also showing as PG_anon pages, according to the function output.

v1: Restructure va_high_addr_switch

The va_high_addr_switch memory selftest tests out some corner cases related to allocation and page/hugepage faulting around the switch boundary.

v2: mm: batch unlink_file_vma calls in free_pgd_range

Execs of dynamically linked binaries at 20-ish cores are bottlenecked on the i_mmap_rwsem semaphore, while the biggest singular contributor is free_pgd_range inducing the lock acquire back-to-back for all consecutive mappings of a given file.

v3: percpu_counter: add a cmpxchg-based _add_batch variant

Interrupt disable/enable trips are quite expensive on x86-64 compared to a mere cmpxchg (note: no lock prefix!) and percpu counters are used quite often.
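The control flow of such a fast path can be sketched in user space with C11 atomics. This is only an illustration under stated assumptions: the struct and function names are hypothetical, a single `local` field stands in for one CPU's per-CPU slot, and C11 compare-exchange is a locked operation, unlike the per-CPU cmpxchg the patch description refers to.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for a batched counter. */
struct counter {
    _Atomic int64_t global; /* shared total */
    _Atomic int64_t local;  /* models one CPU's pending delta */
};

/* Add amount to the local delta with a compare-exchange loop.
 * Once the delta would reach the batch size, fold it into the shared total. */
static void counter_add_batch(struct counter *c, int64_t amount, int64_t batch)
{
    int64_t old = atomic_load(&c->local);

    for (;;) {
        int64_t sum = old + amount;

        if (sum >= batch || sum <= -batch) {
            /* Slow path: drain the local delta into the shared total. */
            int64_t drained = atomic_exchange(&c->local, 0);

            atomic_fetch_add(&c->global, drained + amount);
            return;
        }
        /* Fast path: on failure, old is refreshed and the loop retries. */
        if (atomic_compare_exchange_weak(&c->local, &old, sum))
            return;
    }
}
```

The point of the design is that the common case touches only the local slot and needs neither a lock nor an interrupt disable/enable pair.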

v2: mm: refactor folio_undo_large_rmappable()

There is a repeated check for small folio (order 0) during each call of the folio_undo_large_rmappable(), so only keep folio_order() check inside the function.

v1: support large folio swap-out and swap-in for shmem

Shmem will support large folio allocation to get better performance. However, memory reclaim still splits the precious large folios when trying to swap out shmem, which may lead to memory fragmentation and cannot take advantage of large folios for shmem.

v1: mm/hugetlb: Move vmf_anon_prepare upfront in hugetlb_wp

hugetlb_wp calls vmf_anon_prepare() after having allocated a page, which means that we might need to call restore_reserve_on_error() upon error. vmf_anon_prepare() releases the vma lock before returning, but restore_reserve_on_error() expects the vma lock to be held by the caller.

v6: Reclaim lazyfree THP without splitting

This series adds support for reclaiming PMD-mapped THP marked as lazyfree without needing to first split the large folio via split_huge_pmd_address().

v1: memblock: introduce memsize showing reserved memory

This patch introduces a debugfs node, memblock/memsize, to see reserved memory easily.

v1: selftests: mm: check return values

Check return value and return error/skip the tests.

v1: Convert __unmap_hugepage_range() to folios

Replaces 4 calls to compound_head() with one. Also converts unmap_hugepage_range() and unmap_ref_private() to take in folios.

v2: Add NUMA-aware DAMOS watermarks

These patches allow DAMON to select either total memory or a specific NUMA memory node as the monitoring target.

File Systems

v1: [CFT][experimental] net/socket.c: use straight fdget/fdput

Checking the theory that the important part in sockfd_lookup_light() is avoiding needless file refcount operations, not the marginal reduction of the register pressure from not keeping a struct file pointer in the caller.

v1: netfs: if extracting pages from user iterator fails return 0

When extracting the pages from a user iterator fails, netfs_extract_user_iter() will return 0. This situation results in an abnormal and oversized return value for netfs_unbuffered_writer_locked() (for example, 9223372036854775807). Therefore, when the number of extracted pages is 0, set ret to 0 and jump to out.

v2: fhandle: expose u64 mount id to name_to_handle_at(2)

Now that we provide a unique 64-bit mount ID interface in statx, we can now provide a race-free way for name_to_handle_at(2) to provide a file handle and corresponding mount without needing to worry about racing with /proc/mountinfo parsing.

v3: zonefs: move super block reading from page to folio

Move reading of the on-disk superblock from page to folios.

v1: fs/adfs: add MODULE_DESCRIPTION

Fix the ‘make W=1’ issue: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/adfs/adfs.o

v1: exfat: handle idmapped mounts

Pass the idmapped mount information to the different helper functions. Adapt the uid/gid checks in exfat_setattr to use the vfsuid/vfsgid helpers.

v2: fs: fsconfig: intercept non-new mount API in advance for FSCONFIG_CMD_CREATE_EXCL command

fsconfig with the FSCONFIG_CMD_CREATE_EXCL command requires the new mount API, so we should return -EOPNOTSUPP in advance to avoid extra processing.

git pull: vfs.git misc stuff

Stuff that should've been pushed in the last merge window, but had fallen through the cracks ;-/

git pull: vfs.git last bdev series

We can easily have up to 24 flags with sane atomicity, without pushing anything out of the first cacheline of struct block_device.

git pull: vfs bdev pile 2

Next block device series - replacing ->bd_inode (me and Yu Kuai). Two trivial conflicts (block/ioctl.c and fs/btrfs/disk-io.c); proposed resolution in #merge-candidate (or in linux-next, for that matter).

git pull: vfs.git set_blocksize() (bdev pile 1)

First bdev-related pile - set_blocksize() stuff getting rid of bogus set_blocksize() uses, switching it to struct file * and verifying that caller has device opened exclusively.

v1: udf: Correct lock ordering in udf_setsize()

Syzbot has reported a lockdep warning in udf_setsize(). After some analysis this is actually harmless but this series fixes the lockdep false positive error. I plan to merge these patches through my tree.

v2: jbd2: speed up jbd2_transaction_committed()

We already store the sequence number of the most recently committed transaction in journal_t->j_commit_sequence, so we can do this check by comparing it with the given tid instead. If the given tid isn’t larger than j_commit_sequence, we can be sure that the given transaction has been committed. That way we can drop the expensive lock and achieve about 10% to 20% performance gains in concurrent DIOs on my virtual machine with a 100G ramdisk.
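A lock-free check of this kind boils down to a wraparound-safe comparison of sequence numbers: the transaction is committed once the commit sequence has caught up with its tid. The sketch below models that comparison in plain C. The function transaction_committed() is a hypothetical name for this example, and the comparison helper is written in the style of the kernel's tid_geq().

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tid_t; /* transaction IDs wrap around like sequence numbers */

/* Wraparound-safe "x >= y": compare via the signed difference. */
static bool tid_geq(tid_t x, tid_t y)
{
    int32_t difference = (int32_t)(x - y);

    return difference >= 0;
}

/* Hypothetical check: committed once the commit sequence has reached tid. */
static bool transaction_committed(tid_t commit_sequence, tid_t tid)
{
    return tid_geq(commit_sequence, tid);
}
```

Comparing the signed difference rather than the raw values keeps the answer correct when the counter wraps from 0xffffffff back to 0.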

v20: Implement copy offload support

The patch series covers the points discussed in the past and most recently in LSFMM’24. We have covered the initial agreed requirements in this patch set and further additional features suggested by the community.

GIT PULL: isofs, udf, quota, ext2, reiserfs changes for 6.10-rc1

GIT PULL: fsnotify changes for 6.10-rc1

v3: iomap: avoid redundant fault_in_iov_iter_readable() judgement when use larger chunks

Since commit (5d8edfb900d5 “iomap: Copy larger chunks from userspace”), iomap will try to copy in larger chunks than PAGE_SIZE. However, if the mapping doesn’t support large folios, only one page of at most 4KB will be created and 4KB of data will be written to the pagecache each time. Then, the next 4KB will be handled in the next iteration. This will cause a potential write performance problem. With this change, the write speed will be stable. Tested on an ARM64 device.

v1: exec: Add KUnit test for bprm_stack_limits()

This adds a first KUnit test to the core exec code. With the ability to manipulate userspace memory from KUnit coming[1], I wanted to at least get the KUnit framework in place in exec.c. Most of the coming tests will likely be to binfmt_elf.c, but still, this serves as a reasonable first step.

v1: eventfd: introduce ratelimited wakeup for non-semaphore eventfd

For a non-semaphore eventfd, a write(2) call adds the 8-byte integer value provided in its buffer to the counter, while a read(2) returns the 8-byte counter value and resets the counter to 0. Therefore, the accumulated value of multiple writes can be retrieved by a single read.
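The accumulate-then-drain behaviour is easy to observe from user space. The sketch below is Linux-only and uses the standard eventfd(2), write(2) and read(2) calls; the wrapper name eventfd_accumulate() is hypothetical and exists only for this example.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Write two values to a non-semaphore eventfd, then read once.
 * Returns the value read, or 0 if any step fails. */
static uint64_t eventfd_accumulate(uint64_t a, uint64_t b)
{
    uint64_t out = 0;
    int fd = eventfd(0, 0); /* initial counter 0, flags 0: non-semaphore mode */

    if (fd < 0)
        return 0;

    if (write(fd, &a, sizeof(a)) != (ssize_t)sizeof(a) ||
        write(fd, &b, sizeof(b)) != (ssize_t)sizeof(b) ||
        read(fd, &out, sizeof(out)) != (ssize_t)sizeof(out))
        out = 0;

    close(fd);
    return out;
}
```

Calling `eventfd_accumulate(3, 4)` performs two writes but only one read, and the single read returns the sum of both writes.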

v1: blk: optimization for classic polling

This removes the dependency on interrupts to wake up task. Set task state as TASK_RUNNING, if need_resched() returns true, while polling for IO completion.

Network Devices

v1: net: ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound

A raw packet from a PF_PACKET socket on top of an IPv6-backed ipvlan device will hit WARN_ON_ONCE() in sk_mc_loop() through the sch_direct_xmit() path.

v1: net: selftests: mptcp: mark unstable subtests as flaky

Some subtests can be unstable, failing once every X runs. Fixing them can take time: there could be an issue in the kernel or in the subtest, and it is then important to do a proper analysis, not to hide real bugs.

v6: iwl-next: ixgbe: Add support for Intel(R) E610 device

This patch series adds low level support for the following features and enables link management.

v2: net: sock_map: avoid race between sock_map_close and sk_psock_put

This can be reproduced with a thread deleting an element from the sock map, while the second one creates a socket, adds it to the map and closes it.

v1: net: phy: microchip_t1s: lan865x rev.b1 support

This has been tested with a lan8650 rev.b1 chip on one end and a lan8670 usb eval board on the other end. Performance is rather lacking, the rev.b0 reaches close to the 10Mbit/s limit, but b.1 only gets about 4Mbit/s, with the same results when PLCA enabled or disabled.

v3: iwl-next: ice:Support to dump PHY config, FEC

Implementation to dump PHY configuration and FEC statistics to facilitate link level debugging of customer issues.

v1: net-next: net: stmmac: Add 2500BASEX support for integrated PCS

Qcom mac supports both SGMII and 2500BASEX with integrated PCS. Add changes to enable 2500BASEX along with SGMII.

v2: Socket type control for Landlock

It is based on the landlock’s mic-next branch on top of v6.9 kernel version.

[net v3 PATCH] net:fec: Add fec_enet_deinit()

When fec_probe() fails or fec_drv_remove() runs, the fec queue needs to be released and the NAPI context removed. Therefore, add fec_enet_deinit(), the counterpart of fec_enet_init(), which releases the memory and removes the NAPI context.

v3: Bluetooth: Add vendor-specific packet classification for ISO data

To avoid additional lookups, this patch introduces vendor-specific packet classification for Intel BT controllers to distinguish ISO data packets from ACL data packets.

v10: VMware hypercalls enhancements

VMware hypercall invocations were all spread out across the kernel, implementing the same ABI as in-place asm-inline. With encrypted memory and confidential computing it became harder to maintain every change in these hypercall implementations.

v1: net-next: net: mana: Allow variable size indirection table

Allow variable size indirection table allocation in MANA instead of using a constant value MANA_INDIRECT_TABLE_SIZE. The size is now derived from the MANA_QUERY_VPORT_CONFIG and the indirection table is allocated dynamically.

v1: ipvs: Avoid unnecessary calls to skb_is_gso_sctp

In the context of the SCTP SNAT/DNAT handler, these calls can only return true.

GIT PULL: Networking for v6.10-rc1

Quite smaller than usual. Notably it includes the fix for the unix regression you have been notified of in the past weeks. The TCP window fix will require some follow-up, already queued.

v1: iproute-next: Add support for xfrm state direction attribute

This patchset adds support for setting the new xfrm state direction attribute.

v1: net: gro: initialize network_offset in network layer

Syzkaller was able to trigger

v1: net: tcp: reduce accepted window in NEW_SYN_RECV state

Jason’s commit made checks against the ACK sequence less strict, which can be exploited by attackers to establish spoofed flows with fewer probes.

v3: bpf-next: netfilter: Add the capability to offload flowtable in XDP layer

This series has been tested running the xdp_flowtable_offload eBPF program on an ixgbe 10Gbps NIC (eno2) in order to XDP_REDIRECT the TCP traffic to a veth pair (veth0-veth1) based on the content of the nf_flowtable as soon as the TCP connection is in the established state:

v1: can: m_can: Add am62 wakeup support

To support mcu_mcan0 and mcu_mcan1 wakeup for the mentioned SoCs, the series introduces a notion of wake-on-lan for m_can. If the user decides to enable wake-on-lan for a m_can device, the device is set to wakeup enabled. A ‘wakeup’ pinctrl state is selected to enable wakeup flags for the relevant pins. If wake-on-lan is disabled the default pinctrl is selected.

v1: net: filter: use DEV_STAT_INC()

syzbot/KCSAN reported that races happen when multiple CPUs update dev->stats.tx_error concurrently. Adopt the SMP-safe DEV_STATS_INC() to update dev->stats fields.

v3: iproute2: color: default to dark background

Since the COLORFGBG environment variable isn’t always there, and anyway it seems that terminals and consoles more commonly default to dark backgrounds, make that assumption here.

v1: [resend] color: default to dark background

v2: net: af_unix: Read sk->sk_hash under bindlock during bind().

There could be a chance that sk->sk_hash changes after the lockless read. However, in such a case, non-NULL unix_sk(sk)->addr is visible under unix_sk(sk)->bindlock, and bind() returns -EINVAL without using the prefetched value.

v2: net: af_unix: Annotate data-race around unix_sk(sk)->addr.

In other functions, we still read unix_sk(sk)->addr locklessly to check if the socket is bound, and KCSAN complains about it.

v3: RESEND: can: mcp251xfd: add gpio functionality

The mcp251xfd allows two pins to be configured as GPIOs. This series adds support for this feature. The GPIO functionality is controlled with the IOCON register which has an erratum.

v1: net: usb: smsc95xx: configure external LEDs function for EVB-LAN8670-USB

By default, LAN9500A configures the external LEDs to the below function. But, EVB-LAN8670-USB uses the below external LEDs function which can be enabled by writing 1 to the LED Select (LED_SEL) bit in the LAN9500A.

v3: rds: rdma: Add ability to force GFP_NOIO

This series enables RDS and the RDMA stack to be used as a block I/O device. This is to support a filesystem on top of a raw block device which uses RDS and the RDMA stack as the network transport layer.

v1: bpf-next: selftests/bpf: test_sockmap, use section names understood by libbpf

libbpf can deduce program type and attach type from the ELF section name. We don’t need to pass it out-of-band if we switch to libbpf convention

v2: net: enic: Validate length of nl attributes in enic_set_vf_port

These attributes are validated (in the function do_setlink in rtnetlink.c) using the nla_policy ifla_port_policy. The policy defines IFLA_PORT_PROFILE as NLA_STRING, IFLA_PORT_INSTANCE_UUID as NLA_BINARY and IFLA_PORT_HOST_UUID as NLA_STRING. That means that the length validation using the policy checks the maximum size of the attributes and not the exact size, so the length of these attributes might be less than the sizes that enic_set_vf_port expects. This might cause an out-of-bounds read access in the memcpys of the data of these attributes in enic_set_vf_port.
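The difference between a maximum-length check and an exact-length check can be shown with a small stand-alone sketch. The constant PORT_UUID_LEN and the function copy_uuid_attr() are hypothetical names for this example and do not reproduce the driver code.

```c
#include <stddef.h>
#include <string.h>

#define PORT_UUID_LEN 16 /* hypothetical fixed size the consumer expects */

/* Copy a fixed-size attribute only if the payload has exactly that size.
 * A policy that only caps the maximum would let a shorter payload through,
 * and the fixed-size memcpy would then read past the end of the payload.
 * Returns 0 on success, -1 if the length does not match. */
static int copy_uuid_attr(unsigned char dst[PORT_UUID_LEN],
                          const void *data, size_t len)
{
    if (len != PORT_UUID_LEN)
        return -1;

    memcpy(dst, data, PORT_UUID_LEN);
    return 0;
}
```

Rejecting short payloads before the copy is what prevents the out-of-bounds read described above.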

v1: net-next: ila: avoid genlmsg_reply when not ila_map found

The current ila_xlat_nl_cmd_get_mapping will call genlmsg_reply even if no ila_map is found with the user-provided parameters. Then an empty netlink message will be sent and cause a WARNING like below.

[net PATCH] net: fec: free fec queue when fec_enet_mii_init() fails

In commit 63e3cc2b87c2 (“arm64: dts: imx93-11x11-evk: add reset gpios for ethernet PHYs”), the reset-gpios attribute was added, but the pcal6524 is loaded later, which causes the fec driver to defer, and the following memory leak occurs.

Security Enhancements

v1: Input: keyboard - use sizeof(*pointer) instead of sizeof(type)

It is preferred to use sizeof(*pointer) instead of sizeof(type) because the type of the variable can change, and then the former needs no change (unlike the latter). This patch has no effect on runtime behavior.
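A minimal user-space illustration of the idiom follows. The struct key_state and the function alloc_key_state() are hypothetical and exist only for this example.

```c
#include <stdlib.h>

/* Hypothetical type for illustration. */
struct key_state {
    int code;
    int pressed;
};

/* Allocate one zeroed key_state.
 * sizeof(*ks) follows the declared type of ks, so the allocation stays
 * correct if the type of ks is changed later. With sizeof(struct key_state)
 * the size expression would have to be updated by hand. */
static struct key_state *alloc_key_state(void)
{
    struct key_state *ks;

    ks = calloc(1, sizeof(*ks));
    return ks;
}
```

Both spellings compile to the same size today, which is why such a conversion has no effect on runtime behavior.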

v1: Bluetooth: Use sizeof(*pointer) instead of sizeof(type)

It is preferred to use sizeof(*pointer) instead of sizeof(type) because the type of the variable can change, and then the former needs no change (unlike the latter). This patch has no effect on runtime behavior.

v1: ext4: Use memtostr_pad() for s_volume_name

As with the other strings in struct ext4_super_block, s_volume_name is not NUL terminated. The other strings were marked in commit 072ebb3bffe6 (“ext4: add nonstring annotations to ext4.h”). Using strscpy() isn’t the right replacement for strncpy(); it should use memtostr_pad() instead.

v1: clocksource/drivers/sprd: Enable register for timer counter from 32 bit to 64 bit

Using 32 bit for suspend compensation, the max compensation time is 36 hours (working clock is 32k). In some IoT devices, the suspend time may be long, even exceeding 36 hours. Therefore, a 64 bit timer counter is needed for counting.
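The 36-hour figure can be checked with a short calculation, assuming that "32k" means a 32768 Hz clock (an assumption; the summary does not state the exact frequency). The helper name wrap_seconds() is hypothetical.

```c
#include <stdint.h>

/* Seconds until a counter of the given width wraps at the given frequency.
 * Only valid for bits < 64, since 1 << 64 does not fit in uint64_t. */
static uint64_t wrap_seconds(unsigned int bits, uint64_t hz)
{
    return ((uint64_t)1 << bits) / hz;
}
```

Under that assumption, a 32-bit counter wraps after 2^32 / 2^15 = 2^17 = 131072 seconds, which is about 36.4 hours and matches the limit quoted above.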

v3: Introduce STM32 DMA3 support

STM32 DMA3 is a direct memory access controller with different features depending on its hardware configuration. It is either called LPDMA (Low Power), GPDMA (General Purpose) or HPDMA (High Performance), and it can be found in new STM32 MCUs and MPUs.

v1: usercopy: Convert test_user_copy to KUnit test

This builds on the proposal[1] from Mark and lets me convert the existing usercopy selftest to KUnit. Besides adding this basic test to the KUnit collection, it also opens the door for execve testing (which depends on having a functional current->mm), and should provide the basic infrastructure for adding Mark’s much more complete usercopy tests.

v1: efi: pstore: Return proper errors on UEFI failures

Right now efi-pstore either returns 0 (success) or -EIO; but we do have a function to convert UEFI errors in different standard error codes, helping to narrow down potential issues more accurately.

v1: RDMA/irdma: Annotate flexible array with __counted_by() in struct irdma_qvlist_info

Annotate the flexible array with __counted_by() to make its element count explicit and enable some additional checks. The allocation is done in irdma_save_msix_info().

v1: dma-buf/fence-array: Add flex array to struct dma_fence_array

This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows.
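The risk with an open-coded `sizeof(header) + n * sizeof(element)` is that the multiplication can wrap and produce a too-small allocation. In the kernel, helpers such as struct_size() guard against this; the user-space sketch below shows the same idea with an explicit check. The struct and function names are hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a struct that ends in a flexible array. */
struct fence_array {
    size_t num_fences;
    void *fences[]; /* flexible array member */
};

/* Allocate room for n elements.
 * Returns NULL if the size computation would overflow or malloc fails. */
static struct fence_array *fence_array_alloc(size_t n)
{
    struct fence_array *a;

    /* Check before multiplying, so the product can never wrap around. */
    if (n > (SIZE_MAX - sizeof(*a)) / sizeof(a->fences[0]))
        return NULL;

    a = malloc(sizeof(*a) + n * sizeof(a->fences[0]));
    if (a)
        a->num_fences = n;
    return a;
}
```

Placing the array inside the struct also means one allocation covers both the header and the elements.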

v2: Bluetooth: hci_core: Refactor hci_get_dev_list() function

This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows.

Asynchronous IO

v2: io_uring/sqpoll: ensure that normal task_work is also run timely

With the move to private task_work, SQPOLL neglected to also run the normal task_work, if any is pending. This will eventually get run, but we should run it with the private task_work to ensure that things like a final fput() is processed in a timely fashion.

Rust For Linux

v2: rust: kernel: make impl_has_work compatible with more generics

Re: v1: rust: kernel: make impl_has_work compatible with more complex generics

v2: Rust block device driver API and null block driver

Kernel robot found a few issues with the first iteration of this patch [1]. I also rebased the patch on the Rust PR for 6.10 [2], because we have some changes to allocation going in, and this patch needs updates for those changes.
This is a resend to correct those issues.

v1: Device / Driver and PCI Rust abstractions

This patch series implements basic generic device / driver Rust abstractions, as well as some basic PCI abstractions.

v1: DRM Rust abstractions and Nova

This patch series implements some basic DRM Rust abstractions and a stub implementation of the Nova GPU driver.

BPF

v5: bpf-next: use network helpers, part 5

This patchset uses post_socket_cb and post_connect_cb callbacks of struct network_helper_opts to refactor do_test() in bpf_tcp_ca.c to move dctcp test dedicated code out of do_test() into test_dctcp().

v1: function_graph: Allow multiple users for function graph tracing

This is a continuation of the function graph multi user code. I wrote a proof of concept back in 2019 of this code[1] and Masami started cleaning it up.

v1: bpf-next: libbpf: configure log verbosity with env variable

Configure logging verbosity by setting the LIBBPF_LOG_LEVEL environment variable, which is applied only to the default logger. Once users set their custom logging callback, it is up to them to handle filtering.

v5: bpf-next: Notify user space when a struct_ops object is detached/unregistered

This patch set enables the detach feature for struct_ops links and sends an event to epoll when a link is detached. Subsystems could call link->ops->detach() to detach a link and notify user space programs through epoll.

v7: bpf-next: Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.

The patch set aims to enable the use of these specific types in arrays and struct fields, providing flexibility. It examines the types of global variables or the value types of maps, such as arrays and struct types, recursively to identify these special types and generate field information for them.

v1: bpf-next: guard against access_size overflow?

Looking at commit ecc6a2101840 (“bpf: Protect against int overflow for stack access size”) and the associated syzbot report (linked below), it seems that the underlying issue is that the access_size argument of check_helper_mem_access() can be overflowed.

v2: bpf-next: bpf: Relax precision marking in open coded iters and may_goto loop.

Skipping precision mark at if (i > 1000) keeps ‘i’ imprecise, but arr[i] will mark ‘i’ as precise anyway, because ‘arr’ is a map. On the next iteration of the loop the patch does copy_precision() that copies precision markings for top of the loop into next state of the loop. So on the next iteration ‘i’ will be seen as precise.

v1: perf record: Use pinned BPF program for filter (v1)

This is to support the unprivileged BPF filter for profiling per-task events. Until now only root (or any user with CAP_BPF) can use the filter and we cannot add a new unprivileged BPF program type. After talking with the BPF folks at LSF/MM/BPF 2024, I was told that this is the way to go. Finally I managed to make it work with pinned BPF objects. :)

v1: bpf-next: selftests/bpf: Use prog_attach_type to attach in test_sockmap

Since prog_attach_type[] array is defined, it makes sense to use it paired with prog_fd[] array for bpf_prog_attach() and bpf_prog_detach2() instead of open-coding.

Related Technology News

Qemu

v2: Add support for RISC-V ACPI tests

Currently, bios-table-test doesn’t support RISC-V. This series enables the framework changes required and basic testing. Things like NUMA related test cases will be added later.

v3: riscv: QEMU RISC-V IOMMU Support

This series was tested using an emulated QEMU RISC-V host booting a QEMU KVM guest, passing through an emulated e1000 network card from the host to the guest. I can provide more details (e.g. QEMU command lines) if required, just let me know. For now this cover-letter is too much of an essay as is.

v1: target/riscv: Support Zabha extension

Zabha adds support for AMO operations on bytes and halfwords. If Zacas has been implemented, Zabha also adds support for amocas.b and amocas.h.

v1: target/riscv: Implement May-Be-Operations(zimop) extension

The may be operation means that it has an initial behavior which can be redefined by later extensions to perform some other action.

v2: RISC-V virt MHP support

The RISC-V “virt” machine is currently missing memory hotplugging support (MHP). This series adds the missing virtio-md, and PC-DIMM support.

U-Boot

v1: LoongArch initial support

So far this series has implemented general support for initializing CPU, exceptions, kernel booting, CPU and timer drivers, QEMU LoongArch virt machine support and UEFI standard compliant EFI booting support.
