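RISC-V Linux Kernel and Related Technology News, Issue 51

Written by 呀呀呀 on 2023/06/28

Date: 20230625
Editor: 晓依
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support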

v1: Allwinner R329/D1/R528/T113s Dual/Quad SPI modes support

This series extends the previous series at https://lore.kernel.org/all/20230510081121.3463710-1-bigunclemax@gmail.com and adds support for Dual and Quad SPI modes for the listed SoCs. Both modes have been tested on the T113s and should work on other Allwinner SoCs that have a similar SPI controller. It may also work for previous SoCs that support Dual/Quad modes, among them the H6 and H616.

v1: Add support to handle misaligned accesses in S-mode

Since commit 61cadb9 (“Provide new description of misaligned load/store behavior compatible with privileged architecture.”) in the RISC-V ISA manual, it is stated that misaligned load/store might not be supported. However, the RISC-V kernel uABI describes that misaligned accesses are supported. In order to support that, this series adds support for S-mode handling of misaligned accesses, an SBI call for misaligned trap delegation, as well as prctl support for PR_SET_UNALIGN.

v1: riscv: Select HAVE_ARCH_USERFAULTFD_MINOR

This allocates the VM flag needed to support the userfaultfd minor fault functionality. Because the flag bit is >= bit 32, it can only be enabled for 64-bit kernels. See commit 7677f7fd8be7 (“userfaultfd: add minor fault registration mode”) for more information.
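
The 64-bit restriction follows from the width of the VM flags word. Below is a minimal Python sketch of that arithmetic. It assumes the flags word is one machine word wide, and it uses bit 37 purely as an illustrative position, since the actual bit number is not stated above.

```python
def flag_fits(bit: int, word_bits: int) -> bool:
    """Return True if a flag at position `bit` is representable in a
    flags word that is `word_bits` wide."""
    mask = 1 << bit                      # the flag value itself
    word_max = (1 << word_bits) - 1      # largest value the word can hold
    return mask <= word_max


ILLUSTRATIVE_BIT = 37  # assumption: any bit >= 32 behaves the same way

print(flag_fits(ILLUSTRATIVE_BIT, 64))  # 64-bit kernel: flag is representable
print(flag_fits(ILLUSTRATIVE_BIT, 32))  # 32-bit kernel: flag does not fit
```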

v2: Add support for Allwinner PWM on D1/T113s/R329 SoCs

This series adds support for the PWM controller on new Allwinner SoCs, such as D1, T113s and R329. The implemented driver provides basic functionality for controlling PWM channels.

v5: Risc-V Svinval support

This patch adds support for the Svinval extension as defined in the RISC-V Privileged specification.

v4: RISCV: Add KVM_GET_REG_LIST API

KVM_GET_REG_LIST dumps all register IDs that are available to KVM_GET/SET_ONE_REG. It is very useful for identifying platform regression issues during VM migration.

v2: RISC-V: T-Head vector handling

As is widely known, the T-Head C9xx cores, used for example in the Allwinner D1, implement an older non-ratified variant of the vector spec.
While userspace will probably have a lot more problems implementing support for both, on the kernel side the needed changes are actually fairly small and can be handled via alternatives reasonably nicely.

v5: Split ptdesc from struct page

The MM subsystem is trying to shrink struct page. This patchset introduces a memory descriptor for page table tracking - struct ptdesc.
This patchset introduces ptdesc, splits ptdesc from struct page, and converts many callers of page table constructor/destructors to use ptdescs.

v1: riscv: Discard vector state on syscalls

The RISC-V vector specification states: “Executing a system call causes all caller-saved vector registers (v0-v31, vl, vtype) and vstart to become unspecified.”

GIT PULL: KVM/riscv changes for 6.5

We have the following KVM RISC-V changes for 6.5: 1) Redirect AMO load/store misaligned traps to KVM guest 2) Trap-n-emulate AIA in-kernel irqchip for KVM guest 3) Svnapot support for KVM Guest

Patch “riscv: Link with ‘-z norelro’” has been added to the 6.3-stable tree

This is a note to let you know that I’ve just added the patch titled
riscv: Link with '-z norelro'
to the 6.3-stable tree, which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is riscv-link-with-z-norelro.patch, and it can be found in the queue-6.3 subdirectory.

v1: RISC-V: make ARCH_THEAD preclude XIP_KERNEL

Randy reported build errors in linux-next where XIP_KERNEL was enabled. ARCH_THEAD requires alternatives to support the non-standard ISA extensions used by the THEAD cores, which are mutually exclusive with XIP kernels. Clone the dependency list from the Allwinner entry, since Allwinner’s D1 uses T-Head cores with the same non-standard extensions.

v1: 6.3: riscv: Link with ‘-z norelro’

This patch fixes a stable only patch, so it has no direct upstream equivalent.
After a stable only patch to explicitly handle the ‘.got’ section to handle an orphan section warning from the linker, certain configurations error when linking with ld.lld, which enables relro by default:
ld.lld: error: section: .got is not contiguous with other relro sections

GIT PULL: RISC-V Devicetrees for v6.5 Part 2

Please pull a second part, if it is not too late for v6.5. This lot is based on top of v6.4-rc2, because Randy & Linus did a rejig of the MAINTAINERS file. As a result, the diff below includes what was in the previous PR. Wasn’t sure if there was a request-pull incantation to exclude what was in PR #1 (I guess I’d have to do a local merge of my first PR & then use that as the base for the request-pull command?)

v3: RISC-V: Document that V registers are clobbered on syscalls

This is included in the ISA manual, but it’s pretty common for bits of the ISA manual that are actually ABI to change. So let’s document it explicitly.

v8: Add support for Allwinner GPADC on D1/T113s/R329/T507 SoCs

This series adds support for the general purpose ADC (GPADC) on new Allwinner SoCs, such as D1, T113s, T507 and R329. The implemented driver provides basic functionality for reading ADC channel data.

v4: tools/nolibc: add a new syscall helper

Thanks very much for your kind review.
This is a revision of v3 “tools/nolibc: add a new syscall helper” [1]. It mainly applies the suggestion from David in this reply [2] and rebases everything on the dev.2023.06.14a branch of linux-rcu [3].

v5: nolibc: add part2 of support for rv32

This is a revision of the v4 part 2 of rv32 support [1]. It further splits the generic KARCH code out of the old rv32 compile patch and also adds the kernel-specific KARCH and the nolibc-specific NARCH for tools/include/nolibc/Makefile.
This is rebased on the dev.2023.06.14a branch of the linux-rcu repo [2], with basic run-user and run tests.

v7: Add JH7110 USB PHY driver support

This patchset adds USB and PCIe PHY for the StarFive JH7110 SoC. The patch has been tested on the VisionFive 2 board.

v3: Add initialization of clock for StarFive JH7110 SoC

This patchset adds initial rudimentary support for the StarFive Quad SPI controller driver. This driver will be used on StarFive’s VisionFive 2 board. In 6.4, the QSPI_AHB and QSPI_APB clocks changed from the default ON state to the default OFF state, so these clocks need to be enabled in the driver. A dts patch is also added to this series.

v1: kdump: add generic functions to simplify crashkernel in architecture

On arm64, crashkernel=,high support has been finished after several rounds of posting and careful review. The arm64 code, which first parses the crashkernel kernel parameters and then reserves memory, can be a good example for other architectures to refer to.

v1: riscv: dts: sort makefile entries by directory

New additions to the list have tried to respect alphanumeric ordering, but the thing was out of order to start with. Sort it.

v3: Add Sipeed Lichee Pi 4A RISC-V board support

Sipeed’s Lichee Pi 4A development board uses Lichee Module 4A core module which is powered by T-HEAD’s TH1520 SoC. Add minimal device tree files for the core module and the development board.

Process Scheduling

v1: Sched/fair: Block nohz tick_stop when cfs bandwidth in use

CFS bandwidth limits and NOHZ full don’t play well together. Tasks can easily run well past their quotas before a remote tick does accounting. This leads to long, multi-period stalls before such tasks can run again. Currently, when presented with these conflicting requirements, the scheduler favors nohz_full and lets the tick be stopped. However, nohz tick stopping is already best-effort, and there are a number of conditions that can prevent it, whereas cfs runtime bandwidth is expected to be enforced.
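
To make the “multi-period stall” concrete, here is a deliberately simplified Python model rather than scheduler code. It assumes that runtime consumed beyond the quota must be paid back out of the quota of later periods, and the quota and run times are hypothetical numbers.

```python
import math


def stall_periods(run_ms: float, quota_ms: float) -> int:
    """Number of whole periods a task must sit throttled to pay back
    runtime consumed beyond its quota (simplified model)."""
    overrun = max(0.0, run_ms - quota_ms)
    return math.ceil(overrun / quota_ms)


# Hypothetical numbers: a 10 ms quota per period.
QUOTA_MS = 10
print(stall_periods(run_ms=10, quota_ms=QUOTA_MS))    # accounting happened on time
print(stall_periods(run_ms=1000, quota_ms=QUOTA_MS))  # accounting happened about 1 s late
```

In this model a task whose accounting is delayed by a second owes 99 periods of quota, which is the kind of stall the series tries to avoid by keeping the tick running while bandwidth limits are in use.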

v3: sched/isolation: add a workqueue parameter onto isolcpus to constrain unbound CPUs

The motivation is to improve boot times for devices when we want to prevent our workqueue works from running on some specific CPUs, i.e., CPUs that are busy with interrupts.

v2: sched/cputime: Make IRQ time accounting configurable at boot time

IRQ time accounting reduces performance by 40% for some block storage workloads on Android. Despite this some producers of Android devices want to keep IRQ time accounting enabled.

Memory Management

Re: v1: mm: vmscan: export func:shrink_slab

On 16.06.23 11:21, lipeifeng@oppo.com wrote:
Some of the shrinkers run during shrink_slab may enter a synchronous wait due to locks or other reasons, which would cause kswapd or direct_reclaim to be blocked.
This patch exports shrink_slab so that it can be called from drivers, which can then shrink memory independently.

v1: memblock: report failures when memblock_can_resize is not set

The callers of memblock_reserve() do not check the return value presuming that memblock_reserve() always succeeds, but there are cases where it may fail.
Having numerous memblock reservations at early boot where memblock_can_resize is unset may exhaust the INIT_MEMBLOCK_REGIONS sized memblock.reserved regions array and an attempt to double this array via memblock_double_array() will fail and will return -1 to the caller.
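
The failure mode can be illustrated with a toy Python model of a fixed-capacity region array. This is only a sketch: the class and method names are hypothetical, the capacity constant stands in for INIT_MEMBLOCK_REGIONS, and the merging of adjacent regions that a real allocator performs is left out.

```python
INIT_REGIONS = 128  # assumption: stands in for INIT_MEMBLOCK_REGIONS


class ReservedRegions:
    """Toy model of a fixed-capacity region array that may only grow
    once resizing has been allowed."""

    def __init__(self, capacity: int = INIT_REGIONS):
        self.capacity = capacity
        self.regions = []          # list of (base, size) tuples
        self.can_resize = False    # models memblock_can_resize

    def _double_array(self) -> int:
        if not self.can_resize:
            return -1              # cannot allocate a bigger array yet
        self.capacity *= 2
        return 0

    def reserve(self, base: int, size: int) -> int:
        """Return 0 on success, -1 if the array is full and cannot grow."""
        if len(self.regions) == self.capacity and self._double_array() < 0:
            return -1
        self.regions.append((base, size))
        return 0


r = ReservedRegions(capacity=4)
# Non-adjacent regions, so merging would not help here.
results = [r.reserve(i * 0x2000, 0x1000) for i in range(5)]
print(results)  # the fifth reservation fails while resizing is disabled
```

A caller that ignores the return value never notices the final -1, which is why the series reports the failure explicitly.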

v1: memblock: Introduce memblock_reserve_node()

memblock_find_in_range_node() currently returns only the address. We can add a parameter pointing to an integer for the node id of the range, which can then be used to pass the node id to the new reserved region.

v2: seqlock,mm: lockdep annotation + write_seqlock_irqsave()

This has been a single patch (2/2), but then it was pointed out that the lockdep annotation in seqlock needs to be adjusted to fully close the printk window, so that there is no printing after the seq-lock has been acquired and before printk_deferred_enter() takes effect.

v2: Improve hugetlbfs read on HWPOISON hugepages

Today, when hardware memory is corrupted in a hugetlb hugepage, the kernel leaves the hugepage in the pagecache [1]; otherwise future mmap or read would be subject to silent data corruption. This is implemented by returning -EIO from hugetlb_read_iter immediately if the hugepage has the HWPOISON flag set.

v2: elf: correct note name comment

NT_PRFPREG note is named “CORE”. Correct the comment accordingly.

v1: zsmalloc: small compaction improvements

A tiny series that can reduce the number of find_alloced_obj() invocations (which perform a linear scan of sub-page) during compaction. Inspired by Alexey Romanov's findings.

v1: Transparent Contiguous PTEs for User Mappings

This is a series to opportunistically and transparently use contpte mappings (set the contiguous bit in ptes) for user memory when those mappings meet the requirements. It is part of a wider effort to improve performance of the 4K kernel with the aim of approaching the performance of the 16K kernel, but without breaking compatibility and without the associated increase in memory. It also benefits the 16K and 64K kernels by enabling 2M THP, since this is the contpte size for those kernels.

v1: use refcount+RCU method to implement lockless slab shrink

We used to implement the lockless slab shrink with SRCU [1], but then kernel test robot reported -88.8% regression in stress-ng.ramfs.ops_per_sec test case [2], so we reverted it [3].
This patch series aims to re-implement the lockless slab shrink using the refcount+RCU method proposed by Dave Chinner [4].

v1: udmabuf: Add back support for mapping hugetlb pages

The first patch ensures that the mappings needed for handling mmap operation would be managed by using the pfn instead of struct page. The second patch restores support for mapping hugetlb pages where subpages of a hugepage are not directly used anymore (main reason for revert) and instead the hugetlb pages and the relevant offsets are used to populate the scatterlist for dma-buf export and for mmap operation.

v1: RESEND: elf: correct note name comment

Only the NT_PRFPREG note is named “LINUX”. Correct the comment accordingly.

v2: mm: working set reporting

RFC v1: https://lore.kernel.org/linux-mm/20230509185419.1088297-1-yuanchu@google.com/ For background and interfaces, see the RFC v1 posting.

v1: mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().

__build_all_zonelists() acquires zonelist_update_seq by first disabling interrupts via local_irq_save() and then acquiring the seqlock with write_seqlock(). This is troublesome and leads to problems on PREEMPT_RT because the inner spinlock_t is now acquired with disabled interrupts. The API provides write_seqlock_irqsave() which does the right thing in one step. printk_deferred_enter() has to be invoked in non-migrate-able context to ensure that deferred printing is enabled and disabled on the same CPU. This is the case after zonelist_update_seq has been acquired.

v3: mm/min_free_kbytes: modify min_free_kbytes calculation rules

The current calculation of min_free_kbytes only uses ZONE_DMA and ZONE_NORMAL pages, but the ZONE_MOVABLE zone->_watermark[WMARK_MIN] also takes a share of min_free_kbytes. This causes the min watermark of ZONE_NORMAL to be too small in the presence of ZONE_MOVABLE.

v1: mm: page_alloc: use the correct type of list for free pages

Commit bf75f200569d (“mm/page_alloc: add page->buddy_list and page->pcp_list”) introduced page->buddy_list and page->pcp_list as a union with page->lru, but did not change get_page_from_free_area() to use page->buddy_list, which would clarify the correct type of list for a free page.

File Systems

v1: proc: proc_setattr for /proc/$PID/net

/proc/$PID/net currently allows the setting of file attributes, in contrast to other /proc/$PID/ files and directories.
This would break the nolibc testsuite, so the first patch in the series removes the offending testcase. The “fix” for nolibc-test is intentionally kept trivial, as the series will most likely go through the filesystem tree, and if conflicts arise it is obvious how to resolve them.

v1: pipe: Make a partially-satisfied blocking read wait for more

Can you consider merging something like the attached patch? Unfortunately, there are applications out there that depend on a read from pipe() waiting until the buffer is full under some circumstances. Patch a28c8b9db8a1 removed the conditionality on there being an attached writer.

GIT PULL: vfs: mount

/* Summary */ This contains the work to extend move_mount() to allow adding a mount beneath the topmost mount of a mount stack.
There are two LWN articles about this. One covers the original patch series in [1]. The other in [2] summarizes the session and roughly the discussion between Al and me at LSFMM. The second article also goes into some good questions from attendees.

GIT PULL: vfs: file

/* Summary */ This contains Amir’s work to fix a long-standing problem where an unprivileged overlayfs mount can be used to avoid fanotify permission events that were requested for an inode or superblock on the underlying filesystem.

GIT PULL: vfs: rename

/* Summary */ This contains the work from Jan to fix problems with cross-directory renames originally reported in [1].
To quickly sum it up, some filesystems (so far we know at least about ext4, udf, f2fs, ocfs2, likely also reiserfs, gfs2 and others) need to lock the directory when it is being renamed into another directory.

GIT PULL: vfs: misc

Use mode 0600 for file created by cachefilesd so it can be run by unprivileged users. This aligns them with directories which are already created with mode 0700 by cachefilesd.
Reorder a few members in struct file to prevent some false sharing scenarios.
Indicate that an eventfd is used as a semaphore in the eventfd’s fdinfo procfs file.
Add a missing uapi header for eventfd exposing relevant uapi defines.
Let the VFS protect transitions of a superblock from read-only to read-write in addition to the protection it already provides for transitions from read-write to read-only. Protecting read-only to read-write transitions allows filesystems such as ext4 to perform internal writes, keeping writers away until the transition is completed.

GIT PULL: fs: ntfs

/* Summary */ This contains a pile of various smaller fixes for ntfs. There’s really not a lot to say about them. I’m just the messenger, so this is an unusually short pull request.
/* Testing */ clang: Ubuntu clang version 15.0.7
All patches are based on v6.4-rc2 and have been sitting in linux-next. No build failures or warnings were observed.

v1: fcntl.2: document F_UNLCK F_OFD_GETLK extension

F_UNLCK has a special meaning when used as a lock type on input. It returns information about any lock found in the specified region on that particular file descriptor. Locks on other file descriptors are ignored by F_UNLCK.

v3: F_OFD_GETLK extension to read lock info

This extension allows F_UNLCK to be used in a query, which currently returns EINVAL. Instead it can be used to query the locks on a particular fd, something that is not currently possible. The basic idea is that on F_OFD_GETLK, F_UNLCK would “conflict” with (or query) any type of lock on the same fd, and ignore any locks on other fds.
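
The query semantics can be sketched with a small in-memory Python model. This is not the fcntl() interface itself: the Lock record, the owner field (standing in for an open file description) and the ofd_getlk() function are all hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

F_RDLCK, F_WRLCK, F_UNLCK = "F_RDLCK", "F_WRLCK", "F_UNLCK"


@dataclass
class Lock:
    owner: int   # identifies the open file description holding the lock
    kind: str    # F_RDLCK or F_WRLCK
    start: int
    end: int     # inclusive


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start <= b_end and b_start <= a_end


def ofd_getlk(locks: List[Lock], owner: int, kind: str,
              start: int, end: int) -> Optional[Lock]:
    """Return the first lock that 'conflicts' with the query, or None."""
    for lk in locks:
        if not overlaps(lk.start, lk.end, start, end):
            continue
        if kind == F_UNLCK:
            # Proposed extension: report any lock held via the same fd.
            if lk.owner == owner:
                return lk
        else:
            # Existing behaviour: only locks of other owners can conflict,
            # and two read locks never conflict.
            if lk.owner != owner and (kind == F_WRLCK or lk.kind == F_WRLCK):
                return lk
    return None


locks = [Lock(owner=1, kind=F_RDLCK, start=0, end=99),
         Lock(owner=2, kind=F_WRLCK, start=200, end=299)]

print(ofd_getlk(locks, owner=1, kind=F_UNLCK, start=0, end=999))  # own lock
print(ofd_getlk(locks, owner=1, kind=F_WRLCK, start=0, end=999))  # other owner's lock
```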

v1: iomap regression for aio dio 4k writes

There has been a standing performance regression involving AIO DIO 4k-aligned writes on ext4 backed by a fast local SSD since the switch to iomap. I think it was originally reported and investigated in this thread: https://lore.kernel.org/all/87lf7rkffv.fsf@collabora.com/

v1: minimum folio order support in filemap

There has been a lot of discussion recently about supporting devices and filesystems with bs > ps. One of the main pieces of plumbing needed to support buffered IO is a minimum order when allocating folios in the page cache.
Hannes recently sent a series [1] where he deduces the minimum folio order based on i_blkbits in struct inode. This takes a different approach, based on the discussion in that thread, where the minimum and maximum folio order can be set individually per inode.

v20: Implement IOCTL to get and optionally clear info about PTEs

The motivation is to emulate the Windows GetWriteWatch() syscall, which is used in Windows applications and games. It is currently emulated in a fairly slow manner in userspace. Our purpose is to enhance the kernel so that it can be emulated efficiently. Currently some out-of-tree hack patches are being used to emulate it efficiently in some kernels. We intend to replace those with these patches, so that gaming on Linux as a whole can benefit. It means there would be tons of users of this code.

v4: Add support for Vendor Defined Error Types in Einj Module

This patchset adds support for Vendor Defined Error types in the einj module by exporting a binary blob file in the module’s debugfs directory. Userspace tools can write OEM Defined Structures into the blob file as part of injecting Vendor Defined errors.

v1: next: readdir: Replace one-element arrays with flexible-array members

One-element arrays are deprecated, and we are replacing them with flexible array members instead. So, replace one-element arrays with flexible-array members in multiple structures.

v1: Support negative dentry cache for FUSE and virtiofs

This patch series adds a new mount option called negative_dentry_timeout for FUSE and virtio-fs filesystems. This option allows the kernel to cache negative dentries, which are dentries that represent a non-existent file. When this option is enabled, the kernel will skip FUSE_LOOKUP requests for second and subsequent lookups to a non-existent file.
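
The caching behaviour described above can be modelled in a few lines of Python. This is a toy sketch, not FUSE code: the class name, the explicit `now` argument and the `lookups_sent` counter are illustrative assumptions.

```python
class NegativeDentryCache:
    """Toy model: remember names that were looked up and not found, and
    skip the (expensive) lookup request until the entry times out."""

    def __init__(self, backend: set, timeout: float):
        self.backend = backend      # names that exist on the "server"
        self.timeout = timeout      # models negative_dentry_timeout (seconds)
        self.negative = {}          # name -> expiry time
        self.lookups_sent = 0       # models FUSE_LOOKUP requests

    def lookup(self, name: str, now: float) -> bool:
        expiry = self.negative.get(name)
        if expiry is not None and now < expiry:
            return False            # cached "does not exist"; no request sent
        self.lookups_sent += 1
        found = name in self.backend
        if found:
            self.negative.pop(name, None)
        else:
            self.negative[name] = now + self.timeout
        return found


cache = NegativeDentryCache(backend={"a.txt"}, timeout=5.0)
cache.lookup("missing", now=0.0)   # sends a request, caches the miss
cache.lookup("missing", now=1.0)   # served from the negative cache
cache.lookup("missing", now=6.0)   # entry expired, request sent again
print(cache.lookups_sent)
```

The timeout bounds how long a file created behind the kernel's back (for example on the server side) stays invisible, which is the usual trade-off for this kind of cache.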

v1: ovl: reserve ability to reconfigure mount options with new mount api

We don’t need to carry this issue into the new mount api port. Similar to FUSE we can use the fs_context::oldapi member to figure out that this is a request coming through the legacy mount api. If we detect it we continue silently ignoring all mount options.

v1: RFC: F_OFD_GETLK should provide more info

This patch-set implements 2 small extensions to the current F_OFD_GETLK, allowing it to gather more information than it currently returns.
The first extension allows F_UNLCK to be used in a query, which currently returns EINVAL. Instead it can be used to query the locks on a particular fd, something that is not currently possible. The basic idea is that on F_OFD_GETLK, F_UNLCK would “conflict” with (or query) any type of lock on the same fd, and ignore any locks on other fds.

v2: fs: Provide helpers for manipulating sb->s_readonly_remount

Provide helpers to set and clear sb->s_readonly_remount including appropriate memory barriers. Also use this opportunity to document what the barriers pair with and why they are needed.

v1: blk: optimization for classic polling

This removes the dependency on interrupts to wake up the task. Set the task state to TASK_RUNNING if need_resched() returns true while polling for IO completion. Earlier, the polling task used to sleep, relying on an interrupt to wake it up. This made some IO take very long when interrupt coalescing is enabled in NVMe.

Network Devices

v2: net-next: net: dsa: vsc73xx: Make vsc73xx usable

This patch series is focused on getting vsc73xx usable.
The first patch was added in v2; it switches from a poll loop to read_poll_timeout.
The second patch is a simple conversion to phylink, because adjust_link won’t work anymore.

tc.8: some remarks and a patch for the manual

Mark a full stop (.) with “\&”, if it does not mean an end of a sentence. This is a preventive action, the paragraph could be reshaped, e.g., after changes.
When typing, one does not always notice when the line wraps after the period. There are too many examples of input lines in manual pages, that end with an abbreviation point.

v2: net-next: Support offload LED blinking to PHY.

Allow offloading of the LED trigger netdev to PHY drivers and implement it for the Marvell PHY driver. Additionally, correct the handling of when the initial state of the LED cannot be represented by the trigger, and so an error is returned.

v1: net: lan743x: Don’t sleep in atomic context

dev_set_rx_mode() grabs a spin_lock, and the lan743x implementation proceeds subsequently to go to sleep using readx_poll_timeout().
Introduce a helper wrapping the readx_poll_timeout_atomic() function and use it to replace the calls to readx_poll_timeout().

v1: use array_size

Use array_size to protect against multiplication overflows.
This follows up on the following patches by Kees Cook from 2018.
42bc47b35320 (“treewide: Use array_size() in vmalloc()”)
fad953ce0b22 (“treewide: Use array_size() in vzalloc()”)
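
Why an overflow-checked multiplication matters for allocation sizes can be shown with a small Python model. It is a sketch that assumes a 64-bit size_t; the Python function named array_size() only imitates the saturating behaviour of the kernel helper and is not the kernel implementation.

```python
SIZE_MAX = (1 << 64) - 1  # assumption: 64-bit size_t


def wrapping_mul(a: int, b: int) -> int:
    """What a plain `a * b` on size_t does: wrap modulo 2**64."""
    return (a * b) & SIZE_MAX


def array_size(a: int, b: int) -> int:
    """Model of the overflow-checked helper: saturate at SIZE_MAX so the
    following allocation fails instead of being too small."""
    product = a * b
    return SIZE_MAX if product > SIZE_MAX else product


n, elem = (1 << 61) + 1, 8
print(wrapping_mul(n, elem))             # wraps to a tiny size
print(array_size(n, elem) == SIZE_MAX)   # saturates instead
```

With the wrapped result an allocator would happily return a buffer of a few bytes for what the caller believes is a huge array; the saturated value makes the allocation fail.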

v2: Add support for sam9x7 SoC family

This patch series adds support for the new SoC family - sam9x7.
The device tree, configs and drivers are added
Clock driver for sam9x7 is added
Support for basic peripherals is added
Target board SAM9X75 Curiosity is added

v1: net-next: netlink: add display-hint to ynl

Add a display-hint property to the netlink schema, to be used by generic netlink clients as hints about how to display attribute values.
A display-hint on an attribute definition is intended for letting a client such as ynl know that, for example, a u32 should be rendered as an ipv4 address. The display-hint enumeration includes a small number of networking domain-specific value types.
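
As an illustration of what a client might do with such a hint, here is a short Python sketch. The render() function is hypothetical, the hint names “ipv4”, “mac” and “hex” are assumed examples of the enumeration, and the big-endian integer fallback is an assumption of this sketch rather than documented client behaviour.

```python
import ipaddress


def render(raw: bytes, display_hint: str = "") -> str:
    """Format a raw attribute payload according to an optional hint."""
    if display_hint == "ipv4":
        return str(ipaddress.IPv4Address(raw))      # expects 4 bytes
    if display_hint == "mac":
        return ":".join(f"{b:02x}" for b in raw)
    if display_hint == "hex":
        return raw.hex()
    # No hint: fall back to showing the payload as an unsigned integer.
    return str(int.from_bytes(raw, "big"))


payload = bytes([192, 168, 0, 1])
print(render(payload))           # 3232235521
print(render(payload, "ipv4"))   # 192.168.0.1
```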

v3: io_uring: Add io_uring command support for sockets

Enable io_uring commands on network sockets. Create two new SOCKET_URING_OP commands that will operate on sockets.
In order to call ioctl on sockets, use the file_operations->io_uring_cmd callbacks, and map it to a uring socket function, which handles the SOCKET_URING_OP accordingly, and calls socket ioctls.

v1: net-next: dsa/88e6xxx/phylink changes after the next merge window

This patch series contains the minimum set of patches that I would like to get in for the following merge window.
The first four patches are laying the groundwork for converting the mv88e6xxx driver to use phylink PCS support. Patches 5 through 11 perform that conversion.

v2: net-next: net/tcp: optimise locking for blocking splice

Even when tcp_splice_read() reads all it was asked for, for blocking sockets it’ll release and immediately regrab the socket lock, loop around and break on the while check.
Check tss.len right after we adjust it, and return if we’re done. That saves us one release_sock(); lock_sock(); pair per successful blocking splice read.
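
The saved lock round trip can be counted with a simplified Python model of the loop. This is a sketch of the control flow as described above, not the kernel function: the parameter names are hypothetical and error handling, timeouts and signals are left out.

```python
def splice_read(chunks, want, check_len_early):
    """Model the blocking splice loop.

    chunks: bytes returned by each successive low-level read
    want:   number of bytes requested (models tss.len)
    Returns (bytes_spliced, relock_pairs), where relock_pairs counts
    release_sock()/lock_sock() pairs taken inside the loop.
    """
    spliced = 0
    relock_pairs = 0
    reads = iter(chunks)
    remaining = want
    while remaining:
        ret = min(next(reads, 0), remaining)
        if ret == 0:
            break                      # nothing more to read in this model
        remaining -= ret
        spliced += ret
        if check_len_early and remaining == 0:
            break                      # done: skip the pointless relock
        relock_pairs += 1              # release_sock(); lock_sock();
    return spliced, relock_pairs


print(splice_read([4096], want=4096, check_len_early=False))  # (4096, 1)
print(splice_read([4096], want=4096, check_len_early=True))   # (4096, 0)
```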

[net-next PATCH RFC] net: dsa: qca8k: make learning configurable and keep off if standalone

Address learning should initially be turned off by the driver for port operation in standalone mode, then the DSA core handles changes to it via ds->ops->port_bridge_flags().
Currently this is not the case for qca8k where learning is enabled unconditionally in qca8k_setup for every user port.

v2: net-next: net: phy: C45-over-C22 access

[Sorry for the very late follow-up on this series, I simply haven’t had time to look into it. Should be better now.]
The goal here is to get the GPY215 and LAN8814 running on the Microchip LAN9668 SoC. The LAN9668 supports one external bus and unfortunately, the LAN8814 has a bug which makes it impossible to use C45 on that bus. Fortunately, it was the intention of the GPY215 driver to be used on a C22 bus. But I think this could never really have worked, because phy_get_c45_ids() will always do C45 accesses and thus gpy_probe() will fail.

Security Hardening

v1: next: openprom: Use struct_size() helper

Prefer struct_size() over open-coded versions.
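
The difference between the open-coded size computation and the helper can be sketched in Python with ctypes. This is only a model: the Header layout is hypothetical, a 64-bit size_t is assumed, and the Python struct_size() merely imitates the saturating behaviour of the kernel macro for a header followed by a flexible array.

```python
import ctypes

SIZE_MAX = (1 << 64) - 1  # assumption: 64-bit size_t


class Header(ctypes.Structure):
    """Hypothetical header that is followed by a flexible array of u32."""
    _fields_ = [("count", ctypes.c_uint32), ("flags", ctypes.c_uint32)]


ELEM = ctypes.sizeof(ctypes.c_uint32)


def open_coded(count: int) -> int:
    """sizeof(hdr) + count * sizeof(elem), wrapping like size_t."""
    return (ctypes.sizeof(Header) + count * ELEM) & SIZE_MAX


def struct_size(count: int) -> int:
    """Same computation, but saturating at SIZE_MAX on overflow."""
    total = ctypes.sizeof(Header) + count * ELEM
    return SIZE_MAX if total > SIZE_MAX else total


print(struct_size(3))                 # 8-byte header + 3 * 4 bytes
huge = 1 << 62
print(open_coded(huge), struct_size(huge) == SIZE_MAX)
```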

v1: ACPI: APEI: Use ERST timeout for slow devices

Slow devices such as flash may not meet the default 1ms timeout value, so use the ERST max execution time value that they provide as the timeout if it is larger.

v1: pstore/ram: Add support for dynamically allocated ramoops memory regions

The reserved memory region for ramoops is assumed to be at a fixed and known location when read from the devicetree. This is not desirable in environments where it is preferred for the region to be dynamically allocated early during boot (i.e. the memory region is defined with the “alloc-ranges” property instead of the “reg” property).

v1: next: reiserfs: Replace one-element array with flexible-array member

One-element arrays are deprecated, and we are replacing them with flexible array members instead. So, replace one-element array with flexible-array member in direntry_uarea structure, and refactor the rest of the code, accordingly.

v1: next: ksmbd: Use struct_size() helper in ksmbd_negotiate_smb_dialect()

Prefer struct_size() over open-coded versions.

v1: next: smb: Replace one-element array with flexible-array member

One-element arrays are deprecated, and we are replacing them with flexible array members instead. So, replace one-element array with flexible-array member in struct smb_negotiate_req.
This results in no differences in binary output.

v1: next: scsi: smartpqi: Replace one-element arrays with flexible-array members

One-element arrays are deprecated, and we are replacing them with flexible array members instead. So, replace one-element arrays with flexible-array members in a couple of structures, and refactor the rest of the code, accordingly.
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy().
This results in no differences in binary output.

v1: scsi: Replace strlcpy with strscpy

This patch series replaces strlcpy in the scsi subsystem wherever a trivial replacement is possible, i.e. where the return value from strlcpy is unused. The patches themselves are independent of each other and are included as a series for ease of review.

v1: net: wwan: iosm: Convert single instance struct member to flexible array

Adjust the struct mux_adth definition and associated sizeof() math; no binary output differences are observed in the resulting object file.
Closes: https://lore.kernel.org/lkml/dbfa25f5-64c8-5574-4f5d-0151ba95d232@gmail.com/

v1: igc: Ignore AER reset when device is suspended

The issue is that the PTM requests are sent before the driver resumes the device. Since the issue can also be observed on Windows, it’s quite likely a firmware/hardware limitation.
So avoid resetting the device if it’s not resumed. Once the device is fully resumed, the device can work normally.

Asynchronous I/O

v1: liburing: Introduce ‘--use-libc’ option

This is an RFC patch series to introduce the ‘--use-libc’ option to the configure script.
Currently, when compiling liburing on x86, x86-64, and aarch64 architectures, the resulting binary lacks the linkage with the standard C library (libc).

v2: io_uring/net: disable partial retries for recvmsg with cmsg

We cannot sanely handle partial retries for recvmsg if we have cmsg attached. If we don’t, then we’d just be overwriting the initial cmsg header on retries. Alternatively we could increment and handle this appropriately, but it doesn’t seem worth the complication.

v2: io_uring/net: clear msg_controllen on partial sendmsg retry

If we have cmsg attached AND we transferred partial data at least, clear msg_controllen on retry so we don’t attempt to send that again.

Rust For Linux

v1: Rust device mapper abstractions

Additionally, there is some dummy code used to wrap the block layer structs, i.e., bio and request, which seem to be in the review process, so I just placed it in the same file.

v1: rust: alloc: Add realloc and alloc_zeroed to the GlobalAlloc impl

While there are default impls for these methods, using the respective C APIs is faster. Currently neither the existing nor these new GlobalAlloc method implementations are actually called. Instead the __rust_* functions defined below the GlobalAlloc impl are used. With rustc 1.71 these functions will be gone and all allocation calls will go through the GlobalAlloc implementation.

BPF

v2: libbpf: kprobe.multi: Filter with available_filter_functions_addrs

When using regular expression matching with “kprobe multi”, it scans all the functions under “/proc/kallsyms” that can be matched. However, not all of them can be traced by kprobe.multi. If any one of the functions fails to be traced, it will result in the failure of all functions. The best approach is to filter out the functions that cannot be traced to ensure proper tracking of the functions.
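
The filtering idea can be sketched in Python as an intersection of the pattern matches with the list of traceable functions. This is an illustration, not libbpf code: the function name and the sample symbol lists are hypothetical, and glob-style matching via fnmatch is an assumption about the pattern syntax.

```python
from fnmatch import fnmatchcase


def select_targets(pattern, kallsyms, traceable):
    """Keep symbols that match `pattern` and are also listed as traceable."""
    allowed = set(traceable)
    return [sym for sym in kallsyms
            if fnmatchcase(sym, pattern) and sym in allowed]


# Hypothetical sample data standing in for the two kernel-provided lists.
kallsyms = ["vfs_read", "vfs_write", "vfs_notrace_helper", "tcp_sendmsg"]
traceable = ["vfs_read", "vfs_write", "tcp_sendmsg"]

print(select_targets("vfs_*", kallsyms, traceable))
```

Without the second condition the untraceable symbol would be included in the attach request, and, as described above, a single failure would make the whole attach fail.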

v1: perf: Replace deprecated -target with --target= for Clang

-target has been deprecated since Clang 3.4 in 2013. Use the preferred --target=bpf form instead. This matches how we use --target= in scripts/Makefile.clang.

v2: bpf: Replace deprecated -target with --target= for Clang

-target has been deprecated since Clang 3.4 in 2013. Use the preferred --target=bpf form instead. This matches how we use --target= in scripts/Makefile.clang.

v4: lib/test_bpf: Call page_address() on page acquired with GFP_KERNEL flag

generate_test_data() acquires a page with alloc_page(GFP_KERNEL). The GFP_KERNEL is typical for kernel-internal allocations. The caller requires ZONE_NORMAL or a lower zone for direct access.
Therefore the page cannot come from ZONE_HIGHMEM. Thus there’s no need to map it with kmap().

v5: bpf-next: bpf: Support ->fill_link_info for kprobe_multi and perf_event links

This patchset enhances the usability of kprobe_multi program by introducing support for ->fill_link_info. This allows users to easily determine the probed functions associated with a kprobe_multi program. While bpftool perf show already provides information about functions probed by perf_event programs, supporting ->fill_link_info ensures consistent access to this information across all bpf links.

v4: Bring back vmlinux.h generation

Commit 760ebc45746b (“perf lock contention: Add empty ‘struct rq’ to satisfy libbpf ‘runqueue’ type verification”) inadvertently created a declaration of ‘struct rq’ that conflicted with a generated vmlinux.h’s:

[RFC v2 PATCH bpf-next 0/4] bpf: add percpu stats for bpf_map

This series adds a mechanism for maps to populate per-cpu counters of elements on insertions/deletions. The sum of these counters can be accessed by a new kfunc from a map iterator program.
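
The counting scheme can be modelled in Python as one counter slot per CPU, summed on demand. This is a toy sketch with hypothetical names; it does not show the kfunc or the iterator program mentioned above.

```python
class PerCpuElemCount:
    """Toy model of per-CPU element counters for a map."""

    def __init__(self, nr_cpus: int):
        self.counters = [0] * nr_cpus   # one slot per CPU, no shared counter

    def on_insert(self, cpu: int) -> None:
        self.counters[cpu] += 1

    def on_delete(self, cpu: int) -> None:
        # An element may be deleted on a different CPU than it was inserted
        # on, so an individual slot can go negative; only the sum is meaningful.
        self.counters[cpu] -= 1

    def total(self) -> int:
        return sum(self.counters)


stats = PerCpuElemCount(nr_cpus=4)
stats.on_insert(cpu=0)
stats.on_insert(cpu=1)
stats.on_insert(cpu=1)
stats.on_delete(cpu=3)
print(stats.counters, stats.total())
```

Keeping a slot per CPU avoids contention on a single shared counter in the insert and delete paths, at the cost of a slightly more expensive read.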

v7: bpf-next: bpf, x86: allow function arguments up to 12 for TRACING

This series raises the number of function arguments allowed in arch_prepare_bpf_trampoline(); for now, only on x86_64.
In the 1st patch, we save/restore regs with BPF_DW size to make the code in save_regs()/restore_regs() simpler.
In the 2nd patch, we make arch_prepare_bpf_trampoline() support copying function arguments that are passed on the stack for the x86 arch. As a result, the maximum number of arguments can be up to MAX_BPF_FUNC_ARGS for FENTRY, FEXIT and MODIFY_RETURN. Meanwhile, we clear potential garbage values when copying the on-stack arguments.
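
To illustrate why arguments beyond the sixth need separate handling, here is a rough sketch of where the x86_64 System V calling convention places integer arguments. It assumes every argument is an 8-byte integer-class value and describes the view of an ordinary callee with a standard frame-pointer prologue, not the trampoline's own frame layout. MAX_BPF_FUNC_ARGS is taken to be 12, matching the series title, and arg_location is a hypothetical helper.

```python
# Integer argument registers in the x86_64 System V ABI, in order.
ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")
MAX_BPF_FUNC_ARGS = 12  # assumed from the series title


def arg_location(index):
    """Return ('reg', name) or ('stack', offset from rbp) for a 0-based argument index."""
    if not 0 <= index < MAX_BPF_FUNC_ARGS:
        raise ValueError("argument index out of range")
    if index < len(ARG_REGS):
        return ("reg", ARG_REGS[index])
    # [rbp] holds the saved rbp and [rbp + 8] the return address,
    # so the first stack argument sits at [rbp + 16].
    return ("stack", 16 + 8 * (index - len(ARG_REGS)))
```

With 12 arguments, six arrive in registers and six on the stack, and the latter are the ones that must be copied.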

v1: net-next: TSN auto negotiation between 1G and 2.5G

Intel platforms’ integrated Gigabit Ethernet controllers support 2.5Gbps mode statically using BIOS programming. In the current implementation, the BIOS menu provides an option to select between the speed modes, and the BIOS programs the Phase Lock Loop (PLL) registers accordingly. The BIOS also reads the TSN lane registers from the Flexible I/O Adapter (FIA) block. Auto-negotiation between 10/100/1000Mbps and 2.5Gbps is not allowed.

v3: bpf-next: BPF token

This patch set introduces a new BPF object, the BPF token, which allows a privileged system-wide daemon (e.g., systemd or any other container manager) to delegate a subset of BPF functionality to a trusted unprivileged application. Trust is the key here. This functionality is not about allowing unconditional unprivileged BPF usage. Establishing trust, though, is entirely at the discretion of the privileged application that creates a BPF token, as different production setups can and do achieve it through a combination of different means (signing, LSM, code reviews, etc.), and it is undesirable and infeasible for the kernel to enforce any particular way of validating the trustworthiness of a particular process.

v2: bpf-next: bpf: Netdev TX metadata

Support passing metadata via XSK
Showcase how to consume this metadata at TX in the selftests
Sample untested mlx5 implementation
Simplify attach/detach story with simple global fentry (Alexei)
Add ‘return 0’ in xdp_metadata selftest (Willem)
Add missing ‘sizeof(*ip6h)’ in xdp_hw_metadata selftest (Willem)
Document ‘timestamp’ argument of kfunc (Simon)
Not relevant due to attach/detach rework:
s/devtx_sb/devtx_submit/ in netdev (Willem)
s/devtx_cp/devtx_complete/ in netdev (Willem)
Document ‘devtx_complete’ and ‘devtx_submit’ in netdev (Simon)
Add devtx_sb/devtx_cp forward declaration (Simon)
Add missing __rcu/rcu_dereference annotations (Simon)

v1: fs: new accessors for inode->i_ctime

I’ve been working on a patchset to change how the inode->i_ctime is accessed in order to give us conditional, high-res timestamps for the ctime and mtime. struct timespec64 has unused bits in it that we can use to implement this. In order to do that however, we need to wrap all accesses of inode->i_ctime to ensure that bits used as flags are appropriately handled.
This patchset first adds some new inode_ctime_* accessor functions. It then converts all in-tree accesses of inode->i_ctime to use those new functions, and finally renames the i_ctime field to __i_ctime to help ensure that the accessors are used.
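
The "unused bits" idea can be sketched as follows. A valid nanosecond value is always below 10^9, which is less than 2^30, so bit 30 and above are never set by a real timestamp. The choice of bit 30 as the flag, and all function names below, are assumptions for illustration only. The source confirms only that the accessors follow an inode_ctime_* naming pattern.

```python
NSEC_PER_SEC = 1_000_000_000
CTIME_FLAG = 1 << 30  # hypothetical flag bit; 10**9 < 2**30, so a valid nsec never sets it


class Inode:
    """Toy inode that stores the ctime as (seconds, raw nanoseconds-with-flag)."""

    def __init__(self):
        self._ctime_sec = 0
        self._ctime_nsec = 0  # raw field: may carry CTIME_FLAG


def inode_ctime_set(inode, sec, nsec):
    """Store a new ctime; writing a fresh value also clears the flag."""
    if not 0 <= nsec < NSEC_PER_SEC:
        raise ValueError("nsec out of range")
    inode._ctime_sec = sec
    inode._ctime_nsec = nsec


def inode_ctime_get(inode):
    """Return (sec, nsec) with the flag bit masked out."""
    return inode._ctime_sec, inode._ctime_nsec & ~CTIME_FLAG


def inode_ctime_mark(inode):
    """Set the flag without disturbing the stored time."""
    inode._ctime_nsec |= CTIME_FLAG


def inode_ctime_is_marked(inode):
    return bool(inode._ctime_nsec & CTIME_FLAG)
```

Code that read the raw field directly would see a corrupted nanosecond value once the flag is set, which is why every access has to go through the accessors.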

v1: bpf-next: bpf: Add two new bpf helpers bpf_perf_type_[uk]probe()

We are utilizing BPF LSM to monitor BPF operations within our container environment. Our goal is to examine the program type and perform the respective audits in our LSM program.
When it comes to the perf_event BPF program, there are no specific definitions for the perf types of kprobe or uprobe. In other words, there is no PERF_TYPE_[UK]PROBE. It appears that defining them as UAPI at this stage would be impractical.

v5: bpf-next: Handle immediate reuse in bpf memory allocator

V5 incorporates suggestions from Alexei and Paul (big thanks for that). The main changes include:
*) Use per-cpu lists for the reusable list and the freeing list to reduce lock contention and retain the NUMA-aware attribute
*) Use multiple RCU callbacks for reuse as v3 did
*) Use rcu_momentary_dyntick_idle() to reduce the peak memory footprint

v1: net-next: virtio-net: avoid XDP and _F_GUEST_CSUM

virtio-net needs to clear the VIRTIO_NET_F_GUEST_CSUM feature when loading XDP. The main reason is that VIRTIO_NET_F_GUEST_CSUM allows the driver to receive packets marked as VIRTIO_NET_HDR_F_NEEDS_CSUM. Such packets are not compatible with XDP programs, because we cannot guarantee that the csum_{start, offset} fields are correct after XDP modifies the packets.

v3: bpf-next: bpf, arm64: use BPF prog pack allocator in BPF JIT

BPF programs currently consume a page each on ARM64. For systems with many BPF programs, this adds significant pressure to the instruction TLB. High iTLB pressure usually causes a slowdown for the whole system.
Song Liu introduced the BPF prog pack allocator[1] to mitigate the above issue. It packs multiple BPF programs into a single huge page. It is currently only enabled for the x86_64 BPF JIT.
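
A back-of-the-envelope model of the saving is sketched below. The page size (4 KiB), pack size (2 MiB), and allocation granularity (64 bytes) are assumptions chosen for illustration, and the first-fit placement is a simplification rather than the kernel's actual allocator.

```python
PAGE_SIZE = 4096              # assumed base page size
PACK_SIZE = 2 * 1024 * 1024   # assumed huge page size used for one pack
CHUNK_SIZE = 64               # assumed allocation granularity inside a pack


def pages_without_packing(sizes):
    """Each program gets its own page(s): round every size up to whole pages."""
    return sum(-(-size // PAGE_SIZE) for size in sizes)


def packs_with_packing(sizes):
    """First-fit placement of chunk-aligned programs into shared packs."""
    free_bytes = []  # remaining space in each pack allocated so far
    for size in sizes:
        need = -(-size // CHUNK_SIZE) * CHUNK_SIZE
        if need > PACK_SIZE:
            raise ValueError("program larger than one pack")
        for i, free in enumerate(free_bytes):
            if free >= need:
                free_bytes[i] -= need
                break
        else:
            # No existing pack has room, so start a new one.
            free_bytes.append(PACK_SIZE - need)
    return len(free_bytes)
```

Under these assumptions, 1000 programs of 200 bytes each occupy 1000 separate pages without packing but fit into a single pack, so they can be covered by one huge-page iTLB entry instead of 1000 small-page entries.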

周边技术动态

Qemu

QEMU RISC-V

hello, I built RISC-V toolchain and QEMU as follows:
Install prerequisites:
https://github.com/riscv-collab/riscv-gnu-toolchain#prerequisites
Install additional prerequisites:
https://github.com/riscv-collab/riscv-gnu-toolchain/issues/1251
git clone https://github.com/riscv-collab/riscv-gnu-toolchain
cd riscv-gnu-toolchain
./configure --prefix=/home/RISCV-installed-Tools --with-arch=rv32i_zicsr --with-abi=ilp32
make
make build-qemu

v2: target/riscv: Restrict KVM-specific fields from ArchCPU

These fields shouldn’t be accessed when KVM is not available.
Restrict the KVM timer migration state. Rename the KVM timer post_load() handler accordingly, because cpu_post_load() is too generic.

v4: Add RISC-V KVM AIA Support

This series adds support for KVM AIA in RISC-V architecture.
In order to test these patches, we require Linux with KVM AIA support which can be found in the riscv_kvm_aia_hwaccel_v1 branch at https://github.com/avpatel/linux.git

v1: linux-user/riscv: Add syscall riscv_hwprobe

This patch adds the new syscall for the “RISC-V Hardware Probing Interface” (https://docs.kernel.org/riscv/hwprobe.html).

U-Boot

v2: riscv: Add ACLINT mtimer and mswi devices support

The RISC-V ACLINT specification [1] defines a set of memory-mapped devices which provide inter-processor interrupt (IPI) and timer functionality for each HART on a multi-HART RISC-V platform.
This series updates U-Boot's existing SiFive CLINT driver to handle the ACLINT changes, so that it supports both CLINT and ACLINT.

CFP open for RISC-V MC at Linux Plumbers Conference 2023

The CFP for topic proposals for the RISC-V micro conference[1] 2023 is open now. Please submit your proposal before it’s too late!
The Linux Plumbers event will be both in person and remote (hybrid) this year. More details can be found here [2].
