TinyLab -- Focused on Linux - Trace things back to their source, see the whole from the details!
Website: https://tinylab.org

RISC-V Linux Kernel and Related Technology News, Issue 48

Written by Zhangjin Wu on 2023/06/13

Date: 20230604
Editor: Xiaoyi (晓依)
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsors: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support

v1: tools/nolibc: add two new syscall helpers

While adding new syscalls and the related library routines, I noticed that most of the library routines share the same syscall call and return logic. This patchset adds two macros to simplify and shrink them.
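
As background for the shared "call and return" logic, the sketch below models, in Python rather than nolibc's C, the usual Linux convention such helpers encapsulate: the raw syscall returns a negative errno value on failure, and the library routine stores the positive code in errno and returns -1. The `Errno` class and the `sysret()` name are hypothetical; the summary above does not say what the two new macros are called.

```python
import errno


class Errno:
    """Hypothetical stand-in for libc's (thread-local) errno variable."""

    def __init__(self) -> None:
        self.value = 0


def sysret(state: Errno, raw: int) -> int:
    """Convert a raw kernel return value into the libc convention.

    The kernel reports failure as a negative errno value; the library
    routine stores the positive code in errno and returns -1 instead.
    """
    if raw < 0:
        state.value = -raw
        return -1
    return raw
```

For example, `sysret(e, -errno.ENOENT)` returns -1 and leaves `e.value == errno.ENOENT`, while a non-negative result such as a file descriptor passes through unchanged.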

v3: nolibc: add part2 of support for rv32

This is the v3 part2 of support for rv32. Unlike the v2 part2 [1], this patchset only fixes compile issues.

With the v3 generic part1 [2] and this patchset, we can compile nolibc for rv32 now.

This follows Arnd's suggestion [3]: instead of an ‘#error’ on a syscall that is unsupported on the target platform, a ‘return -ENOSYS’ lets us compile nolibc first and then fix the test failures reported by nolibc-test one by one.
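
The ‘return -ENOSYS’ approach can be pictured with a toy model. The availability table, the architecture names and the `foo` syscall are all hypothetical; in nolibc the choice is presumably made at compile time (an `#ifdef` on the syscall number) rather than by a runtime lookup.

```python
import errno

# Hypothetical table of which syscalls each target provides.
AVAILABLE = {
    "arch_with_foo": {"foo"},
    "arch_without_foo": set(),
}


def sys_foo(arch: str) -> int:
    """Hypothetical wrapper for a syscall named "foo"."""
    if "foo" not in AVAILABLE[arch]:
        # Instead of refusing to build (#error), report "not implemented",
        # so the rest of the library still builds and the failing test
        # shows up in the test report instead.
        return -errno.ENOSYS
    return 0  # placeholder for issuing the real syscall
```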

v3: nolibc: add generic part1 of prepare for rv32

This is the v3 generic part1 for rv32. All of the issues found in v2 part1 [1] have been fixed, several generic patches from v2 part2 [2] have been fixed and merged into this series, and the standalone test_fork patch [4] has also been merged into this series with a Reviewed-by line.

v2: Use MMU read lock for clear-dirty-log

This series is on top of kvmarm/next as I needed to also modify Eager page splitting logic in clear-dirty-log API. Eager page splitting is not present in Linux 6.4-rc4.

v2: Add initialization of clock for StarFive JH7110 SoC

This patchset adds initial rudimentary support for the StarFive Quad SPI controller driver. This driver will be used on StarFive's VisionFive 2 board. In 6.4, the QSPI_AHB and QSPI_APB clocks changed from default ON to default OFF, so these clocks need to be enabled in the driver. A dts patch is also added to this series.

v1: Add DRM driver for StarFive SoC JH7110

This series adds a DRM driver for the StarFive SoC JH7110, which includes a display controller driver for the Verisilicon DC8200 and an HDMI driver.

We use GEM framework for buffer management and allocate memory by using DMA APIs.

v2: gpio: sifive: Add missing check for platform_get_irq

Add the missing check for platform_get_irq and return error code if it fails.

v2: Add support for Allwinner GPADC on D1/T113s/R329/T507 SoCs

This series adds support for general purpose ADC (GPADC) on new Allwinner’s SoCs, such as D1, T113s, T507 and R329. The implemented driver provides basic functionality for getting ADC channels data.

v2: riscv/purgatory: Do not use fortified string functions

This means that the memcpy() calls with “buf” as a destination in sha256.c’s code will attempt to perform run-time bounds checking, which could lead to calling missing functions, specifically a potential WARN_ONCE, which isn’t callable from purgatory.

v1: mm: jit/text allocator

module_alloc() is used everywhere as a means to allocate memory for code.

Besides being semantically wrong, this unnecessarily ties all subsystems that need to allocate code, such as ftrace, kprobes and BPF, to modules and puts the burden of code allocation on the modules code.

v4: StarFive’s Pulse Width Modulation driver support

This patchset adds initial rudimentary support for the StarFive Pulse Width Modulation controller driver. This driver will be used on StarFive's VisionFive 2 board. The first patch adds documentation for the device and patch 2 adds the device probe for the module.

v3: Split ptdesc from struct page

The MM subsystem is trying to shrink struct page. This patchset introduces a memory descriptor for page table tracking - struct ptdesc.

This patchset introduces ptdesc, splits ptdesc from struct page, and converts many callers of page table constructor/destructors to use ptdescs.

v2: riscv: mm: Pre-allocate PGD entries for vmalloc/modules area

The RISC-V port requires that kernel PGD entries are to be synchronized between MMs. This is done via the vmalloc_fault() function, that simply copies the PGD entries from init_mm to the faulting one.
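
The lazy synchronization described above, and why pre-allocating helps, can be shown with a toy model in which a PGD is just a dict from index to a shared next-level table. Index values and table names are hypothetical. Pre-allocation, as we read the patch title, means populating every PGD entry for the vmalloc/modules range at boot, so the copy each new MM takes at creation can never go stale.

```python
from dataclasses import dataclass, field


@dataclass
class MM:
    # Maps a PGD index to the (shared) next-level table it points to.
    pgd: dict = field(default_factory=dict)


def new_mm(init_mm: MM) -> MM:
    """A new address space starts with a copy of init_mm's kernel PGD entries."""
    return MM(pgd=dict(init_mm.pgd))


def vmalloc_fault(mm: MM, init_mm: MM, index: int) -> bool:
    """Lazy sync: copy a missing kernel PGD entry from init_mm.

    Returns True if the fault was resolved by copying an entry.
    """
    if index not in mm.pgd and index in init_mm.pgd:
        mm.pgd[index] = init_mm.pgd[index]
        return True
    return False
```

If init_mm gains a new top-level entry after an MM was created, that MM only sees it through `vmalloc_fault()`; when all entries for the range exist from boot, no such fault is ever needed.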

v3: RISC-V: KVM: Ensure SBI extension is enabled

Ensure guests can’t attempt to invoke SBI extension functions when the SBI extension’s probe function has stated that the extension is not available.

v1: selftests/nolibc: add user-space ‘efault’ handler

This is not really intended for merging; it only serves as demo code to test whether it is possible to resume with the next test when there is a bad pointer access in user space [1].

v1: fdt: Mark “/reserved-memory” nodes as nosave if !reusable

In the RISC-V kernel, the firmware does not mark the region it uses as “no-map” so that the kernel can avoid having holes in the linear mapping and then use larger pages.

v2: nolibc: add part3 of support for rv32

These two patches are based on part2 of support for rv32 [1]; I forgot to send them together with it.

v1: riscv: mm: Pre-allocate PGD entries vmalloc/modules area

The RISC-V port requires that kernel PGD entries are to be synchronized between MMs. This is done via the vmalloc_fault() function, that simply copies the PGD entries from init_mm to the faulting one.

v5: Add JH7110 MIPI DPHY RX support

This patchset adds a MIPI D-PHY RX driver for the StarFive JH7110 SoC. It is used to transfer CSI camera data. The series has been tested on the VisionFive 2 board.

v1: riscv: Enable ARCH_SUSPEND_POSSIBLE for s2idle

With this configuration enabled, basic platform-independent s2idle is provided, exposed as the sole “s2idle” entry in /sys/power/mem_sleep.

At the end of s2idle, harts will hit the wfi instruction or enter the SUSPENDED state through the sbi_cpuidle driver. Interrupts of possible wakeup devices remain enabled so that they can wake the system up.

v12: -next: riscv: Add independent irq/softirq stacks

This patch series adds independent irq/softirq stacks to reduce pressure on the thread stack. It also adds a thread STACK_SIZE config option so users can adjust the stack size at compile time.

Process Scheduling

v1: net: sched: wrap tc_skip_wrapper with CONFIG_RETPOLINE

This patch fixes the following sparse warning:

net/sched/sch_api.c:2305:1: sparse: warning: symbol ‘tc_skip_wrapper’ was not declared. Should it be static?

v1: net-next: net/sched: introduce pretty printers for Qdiscs

Sometimes when debugging Qdiscs it may be confusing to know exactly what you’re looking at, especially since they’re hierarchical. Pretty printing the handle, parent handle and netdev is a bit cumbersome, so this patch proposes a set of wrappers around __qdisc_printk() which are heavily inspired from __net_printk().
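
The patch's exact output format is not shown here. As an illustration of what pretty printing a handle involves, the sketch decodes a 32-bit tc handle with the TC_H_MAJ_MASK/TC_H_MIN_MASK split from the tc uapi header into the familiar major:minor hex form; the `format_handle()` name is hypothetical.

```python
TC_H_MAJ_MASK = 0xFFFF0000
TC_H_MIN_MASK = 0x0000FFFF
TC_H_ROOT = 0xFFFFFFFF


def format_handle(handle: int) -> str:
    """Render a 32-bit tc handle as "major:minor" in hexadecimal."""
    if handle == TC_H_ROOT:
        return "root"
    major = (handle & TC_H_MAJ_MASK) >> 16
    minor = handle & TC_H_MIN_MASK
    return f"{major:x}:{minor:x}"
```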

v1: sched: EEVDF and latency-nice and/or slice-attr

Latest version of the EEVDF [1] patches.

The only real change since last time is the fix for tick-preemption [2], and a simple safe-guard for the mixed slice heuristic.

Other than that, I’ve re-arranged the patches to make EEVDF come first and have the latency-nice or slice-attribute patches on top.

v2: sched/fair: Don’t balance task to its current running CPU

The new_dst_cpu is chosen from env->dst_grpmask. Currently it contains the CPUs in sched_group_span(), and if we have overlapped groups it's possible to run into this case. This patch makes env->dst_grpmask use group_balance_mask(), which excludes any CPUs from the busiest group and solves the issue. For balancing in a domain with no overlapped groups, the behaviour stays the same as before.

v8: sched/fair: Scan cluster before scanning LLC in wake-up path

This is the follow-up work to support the cluster scheduler. Previously we added the cluster level in the scheduler for both ARM64[1] and X86[2] to support load balancing between clusters, to bring more memory bandwidth and decrease cache contention. This patchset, on the other hand, takes care of the wake-up path by giving CPUs within the same cluster a try before scanning the whole LLC, to benefit tasks that communicate with each other.
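
The two-level scan order can be sketched as follows. This is a simplified model: it ignores scan-cost limits, SMT siblings and the other heuristics in the real wake-up path, and the function and parameter names are hypothetical.

```python
def select_idle_cpu(target, cluster, llc, idle):
    """Pick an idle CPU for a waking task, preferring target's cluster.

    target  -- CPU the wakeup is aimed at
    cluster -- CPUs sharing target's cluster (a subset of llc), in scan order
    llc     -- all CPUs sharing target's last-level cache, in scan order
    idle    -- set of currently idle CPUs
    """
    if target in idle:
        return target
    # First pass: CPUs in the same cluster (cheaper to share data with).
    for cpu in cluster:
        if cpu in idle:
            return cpu
    # Second pass: the rest of the LLC domain.
    for cpu in llc:
        if cpu not in cluster and cpu in idle:
            return cpu
    return target  # nothing idle: stay on target
```

With `cluster=[0, 1, 2, 3]` and an 8-CPU LLC, a wakeup on busy CPU 0 picks idle CPU 2 over idle CPU 5, and only falls back to CPU 5 when nothing in the cluster is idle.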

v1: sched: deadline: Simplify pick_earliest_pushable_dl_task()

Using the while statement instead of the if and goto statements is more concise and efficient.

Memory Management

v6: mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8

Here’s version 6 of the series reducing the kmalloc() minimum alignment on arm64 to 8 (from 128). There are patches already to do the same for riscv (pretty straight-forward after this series).

The first 11 patches decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN and, for arm64, limit the kmalloc() caches to those aligned to the run-time probed cache_line_size(). On arm64 we gain the kmalloc-{64,192} caches.
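
Where the extra caches come from can be seen with a simplified model (an illustration, not the slab allocator's code): a kmalloc size class is only usable if objects packed back to back in a slab stay aligned, i.e. its size is a multiple of the minimum alignment. The size list is an illustrative subset, and 64 bytes stands in for a probed cache_line_size().

```python
# Illustrative subset of kmalloc size classes; 96 and 192 are the odd sizes.
KMALLOC_SIZES = [8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096]


def usable_kmalloc_caches(minalign: int) -> list:
    """Size classes whose objects stay aligned to `minalign` bytes.

    Objects of size s packed back to back keep minalign alignment only if
    s is a multiple of minalign (assumes minalign is a power of two).
    """
    return [size for size in KMALLOC_SIZES if size % minalign == 0]


# Before: a static minimum alignment of 128 bytes.
before = usable_kmalloc_caches(128)
# After: alignment follows a probed cache line size of 64 bytes.
after = usable_kmalloc_caches(64)
gained = sorted(set(after) - set(before))  # [64, 192]
```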

v1: Introduce cmpxchg128() – aka. the demise of cmpxchg_double()

After much breaking of things, find here the improved version.

v2: net-next: splice, net: Handle MSG_SPLICE_PAGES in AF_TLS

Here are patches to make AF_TLS handle the MSG_SPLICE_PAGES internal sendmsg flag. MSG_SPLICE_PAGES is an internal hint that tells the protocol that it should splice the pages supplied if it can. Its sendpage implementations are then turned into wrappers around that.

v7: bio: check return values of bio_add_page

We have two functions for adding a page to a bio, __bio_add_page() which is used to add a single page to a freshly created bio and bio_add_page() which is used to add a page to an existing bio.

While __bio_add_page() is expected to succeed, bio_add_page() can fail.

v2: net-next: splice, net: Handle MSG_SPLICE_PAGES in AF_KCM

Here are patches to make AF_KCM handle the MSG_SPLICE_PAGES internal sendmsg flag. MSG_SPLICE_PAGES is an internal hint that tells the protocol that it should splice the pages supplied if it can. Its sendpage implementation is then turned into a wrapper around that.

v2: net-next: splice, net: Handle MSG_SPLICE_PAGES in Chelsio-TLS

Here are patches to make Chelsio-TLS handle the MSG_SPLICE_PAGES internal sendmsg flag. MSG_SPLICE_PAGES is an internal hint that tells the protocol that it should splice the pages supplied if it can. Its sendpage implementation is then turned into a wrapper around that.

v1: make unregistration of super_block shrinker more faster

The kernel test robot noticed a -88.8% regression of stress-ng.ramfs.ops_per_sec on commit f95bdb700bc6 (“mm: vmscan: make global slab shrink lockless”). More details can be seen from the link[1] below.

v2: mm/migrate_device: Try to handle swapcache pages

Migrating file pages and swapcache pages into device memory is not supported. The decision is made based on page_mapping(). For now, swapcache pages are not migrated.

v1: mm: zswap: multiple zpool support

Support using multiple zpools of the same type in zswap, for concurrency purposes. Add CONFIG_ZSWAP_NR_ZPOOLS_ORDER to control the number of zpools. The config specifies the order rather than the absolute number, which guarantees a power of 2. This is useful so that we can deterministically link each entry to a zpool by hashing the zswap_entry pointer.
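
Why a power-of-2 pool count helps: with 2^order pools, the top `order` bits of a multiplicative hash of the entry address give a deterministic pool index without any modulo. The sketch mirrors the kernel's hash_64() from include/linux/hash.h; that the patch uses exactly this hash (for example via hash_ptr()) is an assumption, and `zpool_index()` is a hypothetical name.

```python
GOLDEN_RATIO_64 = 0x61C8864680B583EB
U64_MASK = (1 << 64) - 1


def hash_64(val: int, bits: int) -> int:
    """Multiplicative hash: the top `bits` bits of a 64-bit product."""
    return ((val * GOLDEN_RATIO_64) & U64_MASK) >> (64 - bits)


def zpool_index(entry_addr: int, nr_zpools_order: int) -> int:
    """Map an entry address to one of 2**order zpools deterministically."""
    return hash_64(entry_addr, nr_zpools_order)
```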

v3: zswap: do not shrink if cgroup may not zswap

Before storing a page, zswap first checks if the number of stored pages exceeds the limit specified by memory.zswap.max, for each cgroup in the hierarchy. If this limit is reached or exceeded, then zswap shrinking is triggered and short-circuits the store attempt.
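
A simplified model of the per-hierarchy check, including the special case of a cgroup whose limit is 0 and may therefore never use zswap, where shrinking cannot help. Function and return-value names are hypothetical, and usage and limit are just numbers in the same unit.

```python
def zswap_store_decision(hierarchy):
    """Decide what happens to a page zswap is asked to store.

    hierarchy -- list of (usage, limit) pairs, one per cgroup, from the
                 page's cgroup up to the root; limit None means unlimited.
    """
    for usage, limit in hierarchy:
        if limit is None:
            continue
        if limit == 0:
            # This cgroup may not use zswap at all: shrinking cannot help.
            return "reject"
        if usage >= limit:
            # Limit reached: trigger shrinking and fail this store attempt.
            return "shrink_and_reject"
    return "store"
```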

v1: mm: zswap: support exclusive loads

Commit 71024cb4a0bf (“frontswap: remove frontswap_tmem_exclusive_gets”) removed support for exclusive loads from frontswap as it was not used.

Bring back exclusive loads support to frontswap by adding an exclusive_loads argument to frontswap_ops. Add support for exclusive loads to zswap behind CONFIG_ZSWAP_EXCLUSIVE_LOADS.
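
The difference between normal and exclusive loads can be modeled with a toy in-memory store (a hypothetical class, not the zswap or frontswap API). The likely trade-off is that exclusive loads free the compressed copy immediately, at the cost of compressing the page again if it is swapped out later.

```python
class ToyZswap:
    """Hypothetical in-memory model of a compressed swap cache."""

    def __init__(self, exclusive_loads: bool) -> None:
        self.exclusive_loads = exclusive_loads
        self.entries = {}  # swap offset -> stored (compressed) data

    def store(self, offset: int, data: bytes) -> None:
        self.entries[offset] = data

    def load(self, offset: int) -> bytes:
        if self.exclusive_loads:
            # Exclusive load: the page now lives in memory, drop our copy.
            return self.entries.pop(offset)
        # Non-exclusive load: keep the copy for a possible later swap-out.
        return self.entries[offset]
```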

v1: zswap: do not shrink when memory.zswap.max is 0

Before storing a page, zswap first checks if the number of stored pages exceeds the limit specified by memory.zswap.max, for each cgroup in the hierarchy. If this limit is reached or exceeded, then zswap shrinking is triggered and short-circuits the store attempt.

v2: net-next: crypto, splice, net: Make AF_ALG handle sendmsg(MSG_SPLICE_PAGES)

Here’s the fourth tranche of patches towards providing a MSG_SPLICE_PAGES internal sendmsg flag that is intended to replace the ->sendpage() op with calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol that it should splice the pages supplied if it can.

v4: sock: Improve condition on sockmem pressure

Currently the memcg’s status is also accounted into the socket’s memory pressure to alleviate the memcg’s memstall. But there are still cases that can be improved. Please check the patches for detailed info.

v2: string: use __builtin_memcpy() in strlcpy/strlcat

lib/string.c is built with -ffreestanding, which prevents the compiler from replacing certain functions with calls to their library versions.

v1: -next: mm: page_alloc: simplify has_managed_dma()

ZONE_DMA should only exist on Node 0, so checking NODE_DATA(0) is enough; simplify has_managed_dma() and make it inline.

v1: mm: free retracted page table by RCU

Here is the third series of patches to mm (and a few architectures), based on v6.4-rc3 with the preceding two series applied: in which khugepaged takes advantage of pte_offset_map_lock allowing for pmd transitions.

v1: Do not print page type when the page has no type

It is confusing and unnecessary to print the page type when the page has no type.

File Systems

v2: Create large folios in iomap buffered write path

Commit ebb7fb1557b1 limited the length of ioend chains to 4096 entries to improve worst-case latency. Unfortunately, this had the effect of limiting the performance of:

    fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30 \
        -iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 \
        -numjobs=4 -directory=/mnt/test

The problem ends up being lock contention on the i_pages spinlock as we clear the writeback bit on each folio (and propagate that up through the tree). By using larger folios, we decrease the number of folios to be processed by a factor of 256 for this benchmark, eliminating the lock contention.
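
The factor of 256 matches simple arithmetic, assuming 4 KiB base pages and folios as large as the 1 MiB blocks the benchmark writes (-bs=1024Ki), for the per-job -size=32Gi:

```python
KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB


def folios_needed(total_bytes: int, folio_bytes: int) -> int:
    """Number of folios covering total_bytes (ceiling division)."""
    return -(-total_bytes // folio_bytes)


small = folios_needed(32 * GiB, 4 * KiB)  # 8388608 single-page folios
large = folios_needed(32 * GiB, 1 * MiB)  # 32768 folios of 1 MiB
ratio = small // large                    # 256
```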

v1: highmem: Rename put_and_unmap_page() to unmap_and_put_page()

With commit 849ad04cf562a (“new helper: put_and_unmap_page()”), Al Viro introduced the put_and_unmap_page() to use in those many places where we have a common pattern consisting of calls to kunmap_local() + put_page().

v1: fs: Rename put_and_unmap_page() to unmap_and_put_page()

With commit 849ad04cf562a (“new helper: put_and_unmap_page()”), Al Viro introduced the put_and_unmap_page() to use in those many places where we have a common pattern consisting of calls to kunmap_local() + put_page().

v2: zonefs: use iomap for synchronous direct writes

Remove the function zonefs_file_dio_append() that is used to manually issue REQ_OP_ZONE_APPEND BIOs for processing synchronous direct writes and use iomap instead.

v1: fs.h: Optimize file struct to prevent false sharing

In the syscall test of UnixBench, a performance regression occurred due to false sharing.

v1: fuse: Abort the requests under processing queue with a spin_lock

There is a potential race/timing issue when aborting the requests on the processing list between fuse_dev_release() and fuse_abort_conn(). This results in warnings and can even lead to use-after-free (UAF) issues.

v3: NFSD: recall write delegation on GETATTR conflict

This patch series adds the recall of write delegation when there is conflict with a GETATTR and a counter in /proc/net/rpc/nfsd to keep count of this recall.

v4: fs/sysv: Null check to prevent null-ptr-deref bug

sb_getblk(inode->i_sb, parent) returns a NULL pointer, and taking a lock on it leads to the null-ptr-deref bug.

Reported-by: syzbot+aad58150cbc64ba41bdc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=aad58150cbc64ba41bdc

v1: sysctl: move umh and keys sysctls

In kernel/sysctl.c there are two sysctl arrays that are declared in header files but, for no good reason, are still registered in kernel/sysctl.c instead of where they belong. So just do the registration where it belongs.

v2: multiblock allocator improvements

So this patch was intended to remove a dead if-condition, but it was not actually dead code, and removing it caused a performance regression. Unfortunately I somehow missed that when I was reviewing his patchset, and it had already gone in, so I had to revert the commit. I've added details of the regression and root cause in the revert commit. Also attaching the performance numbers I observed:

v1: fs/buffer: using __bio_add_page in submit_bh_wbc()

In submit_bh_wbc(), bio is newly allocated, so it does not need any merging logic.

And using bio_add_page() here would execute ‘bio_flagged(bio, BIO_CLONED)’ and ‘bio_full’ twice, which is unnecessary.

v1: FUSE: dev: Change the posiion of spin_lock

Only list_del() needs the spin_lock, so the spin_lock should be taken right around “list_del(&req->list)”; this may bring a small benefit.

v1: Null check to prevent null-ptr-deref bug

sb_getblk(inode->i_sb, parent) returns a NULL pointer, and taking a lock on it leads to the null-ptr-deref bug.

Network Devices

v5: net-next: net: flower: add cfm support

The first patch adds cfm support to the flow dissector. The second adds the flower classifier support. The third adds a selftest for the flower cfm functionality.

iproute2 changes will come in follow up patches.

v4: vsock: MSG_ZEROCOPY flag support

The difference from the copy path is not significant. During packet allocation, a non-linear skb is created and filled with pinned user pages. There are also some updates for the vhost and guest parts of the transport: in both cases I've added handling of non-linear skbs for the virtio part. vhost copies data from such an skb to the guest's rx virtio buffers. In the guest, the virtio transport fills the tx virtio queue with pages from the skb.

v1: Add support for sam9x7 SoC family

This patch series adds support for the new SoC family - sam9x7.

  • The device tree, configs and drivers are added
  • Clock driver for sam9x7 is added
  • Support for basic peripherals is added

v1: RDMA/siw: Fabricate a GID on tun and loopback devices

LOOPBACK and NONE (tunnel) devices have all-zero MAC addresses. Currently, siw_device_create() falls back to copying the IB device’s name in those cases, because an all-zero MAC address breaks the RDMA core address resolution mechanism.

v1: net-next: Move KSZ9477 errata handling to PHY driver

Patches to move handling for KSZ9477 PHY errata register fixes from the DSA switch driver into the corresponding PHY driver, for more proper layering and ordering.

v1: net: dsa: realtek: rtl8365mb: add missing case for digital interface 0

When bringing up the switch on a Netgear WNDAP660, I observed that no traffic was passed from the RTL8363 to the Ethernet interface.

It turns out this was because the case for RTL8365MB_DIGITAL_INTERFACE_SELECT_REG(0) had been deleted by accident.

v10: vfio: pds_vfio driver

This is a patchset for a new vendor specific VFIO driver (pds_vfio) for use with the AMD/Pensando Distributed Services Card (DSC). This driver makes use of the pds_core driver.

v1: RDMA/core: Handle ARPHRD_NONE devices

We would like to enable the use of siw on top of a VPN that is constructed and managed via a tun device. That hasn’t worked up until now because ARPHRD_NONE devices (such as tun devices) have no GID for the RDMA/core to look up.

v1: net: dsa: realtek: rtl8365mb: use mdio passthrough to access PHYs

When bringing up the PHYs on a Netgear WNDAP660, I observed that none of the PHYs were enumerated and the rtl8365mb driver failed to load.

v1: net: rfs: annotate lockless accesses

RFS runs without locks held, so we should annotate reads and writes to shared variables.

v5: net-next: net: ioctl: Use kernel memory on protocol ioctl callbacks

Most of the ioctls to net protocols operate directly on the userspace argument (arg), usually doing get_user()/put_user() directly in the ioctl callback. This is not flexible, because it is hard to reuse these functions without passing userspace buffers.

v1: iproute2: ipaddress: accept symbolic names

The function rtnl_addproto_a2n() was defined but never used. Use it to allow for symbolic names, and fix the function signatures so protocol value is consistently __u8.

v1: net-next: complete Lynx mdio device handling

This series completes the mdio device lifetime handling for Lynx PCS users which do not create their own mdio device, but instead fetch it using a firmware description - namely the DPAA2 and FMAN_MEMAC drivers.

v1: net: net/sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM values

We got multiple syzbot reports, all duplicates of the following [1]

syzbot managed to install fq_pie with a zero TCA_FQ_PIE_QUANTUM, thus triggering infinite loops.
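
Why a zero quantum hangs: fq_pie, like fq_codel, schedules flows in a deficit round robin style, so a flow may dequeue only once its deficit, refilled by one quantum per round, covers the packet length. With a quantum of 0 the deficit never grows. The sketch adds a round cap so it terminates; the function names and the accepted range [1, 2^20] in `validate_quantum()` are assumptions, not necessarily the bounds the patch chose.

```python
def validate_quantum(quantum: int, lo: int = 1, hi: int = 1 << 20) -> int:
    """Reject a quantum outside [lo, hi] (the bounds are assumptions)."""
    if not lo <= quantum <= hi:
        raise ValueError(f"quantum {quantum} out of range [{lo}, {hi}]")
    return quantum


def drr_rounds_to_send(pkt_len: int, quantum: int, max_rounds: int = 1000):
    """Rounds a deficit round robin flow needs before sending pkt_len bytes.

    Returns None if the packet is still blocked after max_rounds; with
    quantum == 0 the deficit never grows, so an unbounded loop never ends.
    """
    deficit = 0
    for rounds in range(1, max_rounds + 1):
        deficit += quantum
        if deficit >= pkt_len:
            return rounds
    return None
```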

[PATCH RESEND net-next 0/5] Improve the taprio qdisc’s relationship with its children

[ Original patch set was lost due to an apparent transient problem with kernel.org’s DNSBL setup. This is an identical resend. ]

Prompted by Vinicius’ request to consolidate some child Qdisc dereferences in taprio: https://lore.kernel.org/netdev/87edmxv7x2.fsf@intel.com/

v1: net: enetc: correct the statistics of rx bytes

The purpose of this patch set is to fix the issue of rx bytes statistics. The first patch corrects the rx bytes statistics of normal kernel protocol stack path, and the second patch is used to correct the rx bytes statistics of XDP.

v1: net-next: ipv6: lower “link become ready”’s level message

The following message is printed on the console each time a network device configured with IPv6 addresses is ready to be used:

v5: net-next: sock: Improve condition on sockmem pressure

Currently the memcg’s status is also accounted into the socket’s memory pressure to alleviate the memcg’s memstall. But there are still cases that can be improved. Please check the patches for detailed info.

v2: bpf-next: bpf, x86: allow function arguments up to 14 for TRACING

For now, BPF programs of type BPF_PROG_TYPE_TRACING can only be used on kernel functions with fewer than 6 arguments. This is not friendly at all, as many functions take more than 6 arguments.

Therefore, let's enhance it by increasing the number of function arguments allowed in arch_prepare_bpf_trampoline(); for now, only x86_64 is supported.
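
The limit of six follows the x86-64 System V calling convention: the first six integer or pointer arguments arrive in registers and the rest on the stack, so supporting more arguments presumably means the trampoline must also copy stack-passed arguments. The hypothetical helper below gives each argument's location at function entry, assuming every argument is at most 8 bytes and no structs are passed by value.

```python
# System V AMD64 ABI: the first six integer/pointer arguments go in registers.
ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def arg_location(n: int) -> str:
    """Where the n-th (1-based) integer argument lives at function entry."""
    if n < 1:
        raise ValueError("arguments are numbered from 1")
    if n <= len(ARG_REGS):
        return ARG_REGS[n - 1]
    # The return address sits at 0(%rsp); stack arguments follow, 8 bytes each.
    return f"{8 * (n - len(ARG_REGS))}(%rsp)"
```

For the 14 arguments in the patch title, arguments 7 through 14 occupy 8(%rsp) through 64(%rsp).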

v4: Introduce a vringh accessor for IO memory

Vringh is a host-side implementation of virtio rings, and supports vrings located in three kinds of memory: userspace, kernel space, and space translated by an IOTLB.

v1: net-next: tools: ynl-gen: dust off the user space code

Every now and then I wish I had finished the user space part of the netlink specs. Python scripts kind of stole the show, but C is useful for selftests and for things that need to be fast. Recently someone asked me how to access devlink and ethtool from C++, which pushed me over the edge.

v6: net-next: net: dsa: mv88e6xxx: implement USXGMII mode for mv88e6393x

Changes from previous version:

  • use phylink_decode_usxgmii_word() to decode USXGMII link state
  • use existing include/uapi/linux/mdio.h defines when parsing status bits

v6: net-next: Brcm ASP 2.0 Ethernet Controller

Add support for the Broadcom ASP 2.0 Ethernet controller, which was first introduced with the 72165.

v1: net: dsa: qca8k: add CONFIG_LEDS_TRIGGERS dependency

There is a mix of ‘depends on’ and ‘select’ for LEDS_TRIGGERS, so it’s not clear what we should use here, but in general using ‘depends on’ causes fewer problems, so use that.

v1: net: tcp: gso: really support BIG TCP

oldlen name is a bit misleading, as it is the contribution of skb->len on the input skb TCP checksum. I added a comment to clarify this point.

v3: net/sctp: Make sha1 as default algorithm if fips is enabled

MD5 is not FIPS compliant, but md5 was still used as the default algorithm for sctp when fips was enabled. Due to this, the listen() system call in ltp tests was failing for sctp in a fips environment, with the error message below.

GIT PULL: Networking for v6.4-rc5
http://lore.kernel.org/netdev/20230601180906.238637-1-

Additional napi fields such as PID association for napi thread etc. can be supported in a follow-on patch set.

This series only supports ‘get’ ability for retrieving napi fields (specifically, napi ids and queue[s]). The ‘set’ ability for setting queue[s] associated with a napi instance via netdev-genl will be submitted as a separate patch series.

Security Enhancements

v5: checkpatch: Check for 0-length and 1-element arrays

Fake flexible arrays have been deprecated since last millennium. Proper C99 flexible arrays must be used throughout the kernel so CONFIG_FORTIFY_SOURCE and CONFIG_UBSAN_BOUNDS can provide proper array bounds checking.

Fixed-by: Joe Perches <joe@perches.com>
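
The sizing pitfall behind the deprecation can be reproduced with Python's ctypes. ctypes has no syntax for a C99 `data[]` member, so a zero-length array stands in for it here; its sizeof behaviour matches a flexible array member for this layout. With a one-element array, the common formula sizeof(struct) + n * sizeof(element) over-allocates by one element, and code that subtracts 1 to compensate is easy to get wrong. The struct and helper names are hypothetical.

```python
import ctypes


class FakeFlex(ctypes.Structure):
    # Old style: a one-element array standing in for a variable-length tail.
    _fields_ = [("count", ctypes.c_uint32), ("data", ctypes.c_uint32 * 1)]


class RealFlex(ctypes.Structure):
    # Zero-length tail: same sizeof as a C99 flexible array member here.
    _fields_ = [("count", ctypes.c_uint32), ("data", ctypes.c_uint32 * 0)]


def naive_alloc_size(struct_type, n: int) -> int:
    """sizeof(struct) + n * sizeof(element), the usual hand-written formula."""
    return ctypes.sizeof(struct_type) + n * ctypes.sizeof(ctypes.c_uint32)
```

For three elements the formula yields 20 bytes with the one-element array but the expected 16 bytes with the flexible-style tail.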

v1: s390/purgatory: Do not use fortified string functions

This means that the memcpy() calls with “buf” as a destination in sha256.c’s code will attempt to perform run-time bounds checking, which could lead to calling missing functions, specifically a potential WARN_ONCE, which isn’t callable from purgatory.

v1: x86/purgatory: Do not use fortified string functions

This means that the memcpy() calls with “buf” as a destination in sha256.c’s code will attempt to perform run-time bounds checking, which could lead to calling missing functions, specifically a potential WARN_ONCE, which isn’t callable from purgatory.

v1: next: firewire: Replace zero-length array with flexible-array member

Zero-length and one-element arrays are deprecated, and we are moving towards adopting C99 flexible-array members, instead.

v1: next: drm/amdgpu/discovery: Replace fake flex-arrays with flexible-array members

Zero-length and one-element arrays are deprecated, and we are moving towards adopting C99 flexible-array members, instead.

Use the DECLARE_FLEX_ARRAY() helper macro to transform zero-length arrays in a union into flexible-array members. And replace a one-element array with a C99 flexible-array member.

Rust For Linux

v1: add abstractions for network device drivers

This patchset adds minimal abstractions for network device drivers and a Rust dummy network device driver, a simpler version of drivers/net/dummy.c.

The dummy network device driver doesn't attach to any bus such as PCI, so the dependencies are minimal. Hopefully, this makes reviewing easier.

v2: Rust scatterlist abstractions

This is a version of scatterlist abstractions for Rust drivers.

Scatterlist is used for efficient management of memory buffers, which is essential for many kernel-level operations such as Direct Memory Access (DMA) transfers and crypto APIs.

v2: rust: workqueue: add bindings for the workqueue

This patchset contains bindings for the kernel workqueue.

One of the primary goals behind the design used in this patch is that we must support embedding the work_struct as a field in user-provided types, because this allows you to submit things to the workqueue without having to allocate, making the submission infallible. If we didn’t have to support this, then the patch would be much simpler. One of the main things that make it complicated is that we must ensure that the function pointer in the work_struct is compatible with the struct it is contained within.

v1: rust: error: integrate Rust error type with errname

This integrates the Error type with errname, making the error name accessible via the name method or via the Debug trait.

BPF

v11: evm: Do HMAC of multiple per LSM xattrs for new inodes

One of the major goals of LSM stacking is to run multiple LSMs side by side without interfering with each other. The ultimate decision will depend on individual LSM decision.

[PATCH RESEND bpf-next 00/18] BPF token

Resending with trimmed CC list because original version didn’t make it to the mailing list.

This patch set introduces a new BPF object, the BPF token, which allows delegating a subset of BPF functionality from a privileged system-wide daemon (e.g., systemd or any other container manager) to a trusted unprivileged application. Trust is the key here. This functionality is not about allowing unconditional unprivileged BPF usage. Establishing trust, though, is completely up to the discretion of the respective privileged application that would create a BPF token.

v1: selftests/bpf: Add missing selftests kconfig options

Our selftests of course rely on the kernel being built with CONFIG_DEBUG_INFO_BTF=y, yet neither this option nor its dependencies (CONFIG_DEBUG_INFO=y and CONFIG_DEBUG_INFO_DWARF4=y) are specified. This causes the wrong kernel to be built, and the selftests then fail to build as well.

v10: vhost: virtio core prepares for AF_XDP

Currently, virtio may not work with the DMA APIs when the virtio features do not include VIRTIO_F_ACCESS_PLATFORM.

  1. I tried to let the DMA APIs return the physical address for virtio devices, but the DMA APIs only work with “real” devices.
  2. I tried to let xsk support callbacks to get the physical address from the virtio-net driver as the DMA address, but the xsk maintainers may want to use dma-buf to replace the DMA APIs. I think that may be a larger effort, and we would wait too long.

v1: bpf-next: bpf: Support ->fill_link_info for kprobe prog

Currently, it is not easy to determine which functions are probed by a kprobe_multi program. This patchset supports ->fill_link_info for it, allowing the user to easily obtain the probed functions.

Although the user can retrieve the functions probed by a perf_event program using bpftool perf show, it would be beneficial to also support ->fill_link_info. This way, the user can obtain it in the same manner as other bpf links.

v2: bpf-next: bpf_refcount followups (part 1)

This series is the first of two (or more) followups to address issues in the bpf_refcount shared ownership implementation discovered by Kumar. Specifically, this series addresses the “bpf_refcount_acquire on non-owning ref in another tree” scenario described in [0], and does not address issues raised in [1]. Further followups will address the other issues.

v2: bpf-next: bpf/xdp: optimize bpf_xdp_pointer to avoid reading sinfo

We currently observe a significant performance degradation in samples/bpf xdp1 and xdp2 due to XDP multi-buffer “xdp.frags” handling, added in commit 772251742262 (“samples/bpf: fixup some tools to be able to support xdp multibuffer”).

v1: bpf-next: bpf: getsockopt hook to get optval without checking kernel retval

Remove the check on retval and pass the bpf ctx by default. The advantage is more flexibility: bpf getsockopt can support new optnames without needing a module that calls nf_register_sockopt() to register them.

v1: bpf-next: bpf: support BTF kind metadata to separate

BTF kind metadata provides information to parse BTF kinds. By separating parsing BTF from using all the information it provides, we allow BTF to encode new features even if they cannot be used. This is helpful in particular for cases where newer tools for BTF generation run on an older kernel; BTF kinds may be present that the kernel cannot yet use, but at least it can parse the BTF provided. Meanwhile userspace tools with newer libbpf may be able to use the newer information.

v1: net: ice: recycle/free all of the fragments from multi-buffer frame

The ice driver caches next_to_clean value at the beginning of ice_clean_rx_irq() in order to remember the first buffer that has to be freed/recycled after main Rx processing loop. The end boundary is indicated by first descriptor of frame that Rx processing loop has ended its duties. Note that if mentioned loop ended in the middle of gathering multi-buffer frame, next_to_clean would be pointing to the descriptor in the middle of the frame BUT freeing/recycling stage will stop at the first descriptor. This means that next iteration of ice_clean_rx_irq() will miss the (first_desc, next_to_clean - 1) entries.

v1: bpf/tests: Use struct_size()

Use struct_size() instead of hand writing it. This is less verbose and more informative.
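
What the helper buys beyond brevity: struct_size() from include/linux/overflow.h computes header size plus count times element size with saturating arithmetic, so an overflow yields SIZE_MAX (and the allocation fails) instead of a small wrapped-around size. The Python model below takes sizes explicitly, unlike the real macro, which takes a pointer and a member name; that signature is hypothetical.

```python
SIZE_MAX = (1 << 64) - 1  # assuming a 64-bit size_t


def size_mul(a: int, b: int) -> int:
    """Multiply, saturating at SIZE_MAX instead of wrapping around."""
    return min(a * b, SIZE_MAX)


def size_add(a: int, b: int) -> int:
    """Add, saturating at SIZE_MAX instead of wrapping around."""
    return min(a + b, SIZE_MAX)


def struct_size(header_size: int, elem_size: int, count: int) -> int:
    """Bytes for a header followed by `count` trailing elements."""
    return size_add(header_size, size_mul(elem_size, count))
```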

v1: net: bpf, sockmap: avoid potential NULL dereference in sk_psock_verdict_data_ready()

syzbot found sk_psock(sk) could return NULL when called from sk_psock_verdict_data_ready().

Just make sure to handle this case.

v2: bpf-next: verify scalar ids mapping in regsafe()

To represent this set I use a u32_hashset data structure derived from tools/lib/bpf/hashmap.h. I tested it locally (see [1]), but I think that ideally it should be tested using KUnit. However, AFAIK, this would be the first use of KUnit in the context of the BPF verifier. If people are OK with this, I will prepare the tests and the necessary CI integration.

v1: bpf-next: samples/bpf: xdp1 and xdp2 reduce XDPBUFSIZE to 60

The default samples/pktgen scripts send 60-byte packets because the hardware appends a 4-byte FCS checksum, which brings the frame up to the 64-byte Ethernet minimum frame size.

The XDP layer does not necessarily have access to the 4-byte FCS checksum.

v2: bpf-next: xsk: multi-buffer support

This series of patches adds multi-buffer support for AF_XDP. XDP and various NIC drivers already support multi-buffer packets. With this patch set, programs using AF_XDP sockets can now also receive and transmit multi-buffer packets in both copy and zero-copy mode. The ZC multi-buffer implementation is based on the ice driver.

v1: net: tcp: introduce a compack timer handler in sack compression

We’ve got some issues when sending a compressed ack is deferred to the release phase because the socket is owned by another user:

  1. a compressed ack would not be sent because the ICSK_ACK_TIMER flag is not set.
  2. the tp->compressed_ack counter should be decremented by 1.
  3. we cannot pass the timeout check and reset the delack timer in tcp_delack_timer_handler().
  4. we are not supposed to increment the LINUX_MIB_DELAYEDACKS counter. …

v1: bpf-next: multi-buffer support for XDP_REDIRECT samples

This series adds multi-buffer support for two XDP_REDIRECT sample programs. It follows the pattern from xdp1 and xdp2.

v2: net-next: support non-frag page for page_pool_alloc_frag()

In [1] & [2], there are use cases in which veth and virtio_net use frag support in page pool to reduce memory usage, and they may request different frag sizes depending on the head/tail room needed for xdp_frame/shinfo and on the MTU/packet size. When the requested frag size is so large that a single page cannot be split into more than one frag, frag support only adds a performance penalty because of the extra frag count handling.

v1: bpf-next: bpf: Support ->show_fdinfo and ->fill_link_info for kprobe prog

Currently, it is not easy to determine which functions are probed by a kprobe_multi program. This patchset supports ->show_fdinfo and ->fill_link_info for it, allowing the user to easily obtain the probed functions.

Related Technology Updates

Qemu

RFC: target/riscv: Add support for Zacas extension

The Zacas[1] extension is a proposed unprivileged ISA extension for adding support for atomic compare-and-swap. Since this extension is not yet frozen (although no significant changes are expected) these patches are RFC/informational.

v2: linux-user/riscv: Add syscall riscv_hwprobe

This patch adds the new syscall for the “RISC-V Hardware Probing Interface” (https://docs.kernel.org/riscv/hwprobe.html).

v7: hw/riscv/virt: pflash improvements

This series improves pflash usage in the RISC-V virt machine by addressing the issues below.

1) Currently the first pflash is reserved for ROM/M-mode firmware code. But S-mode payload firmware like EDK2 needs both pflash devices, with separate code and variable stores, so that OS distros can keep the FW code read-only.

v1: disas/riscv: Add vendor extension support

This series adds vendor extension support to the QEMU disassembler for RISC-V. The following vendor extensions are covered:

  • XThead{Ba,Bb,Bs,Cmo,CondMov,FMemIdx,Fmv,Mac,MemIdx,MemPair,Sync}
  • XVentanaCondOps

Buildroot

package/openjdk{-bin}: security bump versions to 11.0.19+7 and 17.0.7+7

For details, see the announcements:

  • https://mail.openjdk.org/pipermail/jdk-updates-dev/2023-April/021899.html
  • https://mail.openjdk.org/pipermail/jdk-updates-dev/2023-April/021900.html

U-Boot

v4: SPL NVMe support

This patchset adds support for loading the images of the SPL’s next boot stage from an NVMe device.

v1: riscv: JH7110: move pll clocks to their own device node (Was: The latest U-boot…) visionfive2 1.3B board

Here is the revert, along with a work-in-progress attempt to make the DT match the hardware. Conor had asked me to share it despite its early stage. It compiles and boots Linux kernels, but I cannot currently find a PLL driver, so the clocks are still stuck in PROBE_DEFER.
