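RISC-V Linux Kernel and Related Technology News, Issue 63

Written by 呀呀呀 on 2023/10/13

Date: 20231010
Editor: 晓怡
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support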

Huashan Pi board is an embedded development platform based on the CV1812H chip. Add minimal device tree files for this board. Currently, it can boot to a basic shell.
NOTE: this series is based on Jisheng’s Milk-V Duo patch.

v3: riscv,isa-extensions additions

Now with the RFC tag dropped. There are no changes here from “RFC v2”, other than the addition of tags that were provided along the way. I have not added “Zfh” to the T-Head based stuff, as I can’t actually read the documentation that would show that they’re encoding-for-encoding compatible with the standard extension, since it is apparently only in Chinese.

v1: soc: renesas: select ERRATA_ANDES for R9A07G043 only when alternatives are present

Randy reported a randconfig build issue against linux-next:
WARNING: unmet direct dependencies detected for ERRATA_ANDES
Depends on [n]: RISCV_ALTERNATIVE [=n] && RISCV_SBI [=y]
Selected by [y]:
ARCH_R9A07G043 [=y] && SOC_RENESAS [=y] && RISCV [=y] && NONPORTABLE [=y] && RISCV_SBI [=y]

v2: riscv: Add remaining module relocations and tests

A handful of module relocations were missing; this patch includes the remaining ones. I also wrote some test cases to ensure that module loading works properly. Some relocations cannot be supported in the kernel, including the ones that rely on thread local storage and dynamic linking.

v2: Add Milk-V Duo board support

Milk-V Duo[1] board is an embedded development platform based on the CV1800B[2] chip. Add minimal device tree files for the development board. Currently, it can boot to a basic shell.

v1: soc: renesas: make ARCH_R9A07G043 (riscv version) depend on NONPORTABLE

Drew found that “CONFIG_DMA_GLOBAL_POOL=y causes ADMA buffer alloc to fail”; the log looks like: [ 3.741083] mmc0: Unable to allocate ADMA buffers - falling back to standard DMA
The logic is: the generic riscv defconfig selects ARCH_RENESAS, then ARCH_R9A07G043, which selects DMA_GLOBAL_POOL. This assumes that all non-dma-coherent riscv platforms have a dma global pool, which does not seem correct. And I believe DMA_GLOBAL_POOL should not be selected by ARCH_SOCFAMILIY; instead, only ARCH under some specific conditions can select it globally, for example NOMMU ARM and so on, because it’s designed for special cases such as “nommu cases where non-cacheable memory lives in a fixed place in the physical address map” as pointed out by Robin.

v2: Add support to handle misaligned accesses in S-mode

Since commit 61cadb9 (“Provide new description of misaligned load/store behavior compatible with privileged architecture.”) in the RISC-V ISA manual, it is stated that misaligned load/store might not be supported. However, the RISC-V kernel uABI describes that misaligned accesses are supported. In order to support that, this series adds support for S-mode handling of misaligned accesses as well as support for prctl(PR_UNALIGN).

v1: riscv: blacklist assembly symbols for kprobe

Adding kprobes on some assembly functions (mainly exception handling) will result in crashes (either recursive trap or panic). To avoid such errors, add an ASM_NOKPROBE() macro which allows adding specific symbols into the __kprobe_blacklist section, and use it to blacklist the following symbols that were shown to be problematic:
handle_exception()
ret_from_exception()
handle_kernel_stack_overflow()

v1: bpf-next: selftest/bpf, riscv: Improved cross-building support

Yet another “more cross-building support for RISC-V” series.
An example of how to invoke a gen_tar build:
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- CC=riscv64-linux-gnu-gcc \
HOSTCC=gcc O=/workspace/kbuild FORMAT= \
SKIP_TARGETS="arm64 ia64 powerpc sparc64 x86 sgx" -j $(($(nproc)-1)) \
-C tools/testing/selftests gen_tar

v1: bpf: riscv, bpf: Properly sign-extend return values

The RISC-V architecture does not expose sub-registers, and holds all 32-bit values in a sign-extended format [1] [2]:
The compiler and calling convention maintain an invariant that all
32-bit values are held in a sign-extended format in 64-bit
registers. Even 32-bit unsigned integers extend bit 31 into bits
63 through 32. Consequently, conversion between unsigned and
signed 32-bit integers is a no-op, as is conversion from a signed
32-bit integer to a signed 64-bit integer.
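
The invariant quoted above can be illustrated in plain C. This is only a userspace sketch of the register image, not the BPF JIT code from the series; the helper names sext32(), zext32() and check_rv64_word_invariant() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the 64-bit register image RV64 keeps for a 32-bit
 * value. Bit 31 is copied into bits 63..32, whatever the C-level signedness. */
static inline uint64_t sext32(uint32_t v)
{
    uint64_t r = v;

    if (v & UINT32_C(0x80000000))
        r |= UINT64_C(0xffffffff00000000);
    return r;
}

/* Hypothetical helper: the image a consumer would expect if the upper half
 * were cleared (zero-extended) instead. */
static inline uint64_t zext32(uint32_t v)
{
    return (uint64_t)v;
}

/* Self-check of the properties stated in the quoted paragraph. */
void check_rv64_word_invariant(void)
{
    /* A 32-bit unsigned value with bit 31 set is still held sign-extended. */
    assert(sext32(UINT32_C(0x80000000)) == UINT64_C(0xffffffff80000000));
    /* The two extensions only differ when bit 31 is set. */
    assert(sext32(UINT32_C(0x7fffffff)) == zext32(UINT32_C(0x7fffffff)));
    assert(sext32(UINT32_C(0x80000000)) != zext32(UINT32_C(0x80000000)));
}
```

A consumer that expects a zero-extended 32-bit result would misread the first value, which is the kind of mismatch a "properly sign-extend return values" fix has to account for.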

v2: pwm: add driver for T-HEAD TH1520 SoC

T-HEAD SoCs such as the TH1520 contain a PWM controller used to control the LCD backlight, fan and so on. Add the PWM driver support for it.

v3: RISC-V: build: Allow LTO to be selected

Allow LTO to be selected for RISC-V only when LLD >= 14, since there is an issue [1] in prior LLD versions that prevents LLD from generating proper machine code for RISC-V when writing nops.

v10: Linux RISC-V AIA Support

The RISC-V AIA specification is ratified as per the RISC-V international process. The latest ratified AIA specification can be found at: https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf

v3: KVM RISC-V Conditional Operations

This series extends KVM RISC-V to allow a Guest/VM to discover and use conditional operations related ISA extensions (namely XVentanaCondOps and Zicond).

Process Scheduling

v1: sched/numa: Complete scanning of partial and inactive VMAs

NUMA Balancing currently uses PID fault activity within a VMA to determine if it is worth updating PTEs to trap NUMA hinting faults. While this reduces overhead, it misses two important corner cases. The first is that if Task A partially scans a VMA that is active and Task B resumes the scan but is inactive, then the remainder of the VMA may be missed. Similarly, if a VMA is inactive for a period of time then it may never be scanned again.

v3: linux-next: sched/psi: Optimize the process of updating triggers and rtpoll_total

When psimon wakes up and there are no state changes for rtpoll_states, it’s unnecessary to update triggers and rtpoll_total because the pressures being monitored by user have not changed. This will help to slightly reduce unnecessary computations of psi.

v2: drm-misc-next: drm/sched: implement dynamic job-flow control

Currently, job flow control is implemented simply by limiting the number of jobs in flight. Therefore, a scheduler is initialized with a submission limit that corresponds to the number of jobs which can be sent to the hardware.

v7: sched/rt: Move sched_rt_entity::back to CONFIG_RT_GROUP_SCHED

The member back in struct sched_rt_entity is only related to RT_GROUP_SCHED, so it should not be placed outside of RT_GROUP_SCHED; move it under RT_GROUP_SCHED. It will save a few bytes.

v1: linux-next: sched/psi: Optimize the process of updating triggers and rtpoll_total

When psimon wakes up and there are no state changes for rtpoll_states, it’s unnecessary to update triggers and rtpoll_total because the pressures being monitored by user have not changed. This will help to slightly reduce unnecessary computations of psi.

v1: sched/rt: cast sysctl_sched_rt_period to integer

proc_dointvec_minmax is for integers, but sysctl_sched_rt_period is an unsigned integer. Since sysctl_sched_rt_period takes values from 1 to INT_MAX, it doesn’t have to be an unsigned integer.

v1: sched/fair: Avoid unnecessary IPIs for ILB

Whenever a CPU stops its tick, it now requires another idle CPU to handle the balancing for it because it can’t perform its own periodic load balancing. This means it might need to update ‘nohz.next_balance’ to ‘rq->next_balance’ if the upcoming nohz-idle load balancing is too distant in the future. This update process is done by triggering an ILB, as the general ILB handler (_nohz_idle_balance) that manages regular nohz balancing also refreshes ‘nohz.next_balance’ by looking at the ‘rq->next_balance’ of all other idle CPUs and selecting the smallest value.
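
The "select the smallest rq->next_balance" step described above can be sketched as a minimum search over tick values. This is a minimal userspace illustration, not the scheduler code: the array stands in for the per-runqueue values of the idle CPUs, and both function names are hypothetical. The comparison is wrap-safe in the same spirit as the kernel's time_before().

```c
#include <assert.h>
#include <stddef.h>

/* Wrap-safe "a is earlier than b" for free-running tick counters.
 * Relies on the usual two's-complement conversion to long. */
static int tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Hypothetical helper: earliest next-balance time among n idle CPUs. */
unsigned long earliest_next_balance(const unsigned long *rq_next, size_t n)
{
    unsigned long best;
    size_t i;

    assert(rq_next != NULL && n > 0);
    best = rq_next[0];
    for (i = 1; i < n; i++) {
        if (tick_before(rq_next[i], best))
            best = rq_next[i];
    }
    return best;
}
```

The result is what a global value such as 'nohz.next_balance' would be refreshed to in this simplified model.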

v1: perf bench sched pipe: Add -G/--cgroups option

The -G/--cgroups option is to put sender and receiver in different cgroups in order to measure cgroup context switch overheads.

v2: sched/fair: Preserve PLACE_DEADLINE_INITIAL deadline

An entity is supposed to get an earlier deadline with PLACE_DEADLINE_INITIAL when it’s forked, but the deadline gets overwritten soon after in enqueue_entity() the first time a forked entity is woken so that PLACE_DEADLINE_INITIAL is effectively a no-op.

v4: sched/core: Use zero length to reset cpumasks in sched_setaffinity()

Since commit 8f9ea86fdf99 (“sched: Always preserve the user requested cpumask”), user provided CPU affinity via sched_setaffinity(2) is preserved even if the task is being moved to a different cpuset. However, that affinity is also being inherited by any subsequently created child processes which may not want or be aware of that affinity.

Memory Management

v1: Swap-out small-sized THP without splitting

This is an RFC for a small series to add support for swapping out small-sized THP without needing to first split the large folio via __split_huge_page(). It closely follows the approach already used by PMD-sized THP.

v1: maple_tree: Add GFP_KERNEL to allocations in mas_expected_entries()

Users complained about OOM errors during fork without triggering compaction. This can be fixed by modifying the flags used in mas_expected_entries() so that the compaction will be triggered in low memory situations. Since mas_expected_entries() is only used during fork, the extra argument does not need to be passed through.

v1: exec: allow executing block devices

As far as I can tell, the S_ISREG() check is there to prevent executing files where that would be nonsensical, like directories, fifos, or sockets. But the semantics for executing a block device are quite obvious — the block device acts just like a regular file.
My use case is having a common VM image that takes a configurable payload to run. The payload will always be a single ELF file.

v2: mm/mprotect: allow unfaulted VMAs to be unaccounted on mprotect()

When mprotect() is used to make unwritable VMAs writable, they have the VM_ACCOUNT flag applied and memory accounted accordingly.
If the VMA has had no pages faulted in and is then made unwritable once again, it will remain accounted for, despite not being capable of extending memory usage.

v1: -next: mm: convert page cpupid functions to folios

The cpupid (or access time) used by NUMA balancing is stored in the flags or in _last_cpupid (if LAST_CPUPID_NOT_IN_PAGE_FLAGS) of the page. This series converts page cpupid to folio cpupid: a new _last_cpupid is added into folio, which lets us use folio->_last_cpupid directly, and page_cpupid_xchg_last(), xchg_page_access_time() and page_cpupid_last() are converted to folio versions.

v1: sched/wait: introduce endmark in __wake_up_common

Without this patch applied, it can cause the waker to fall into an infinite loop in some cases. The commit 2554db916586 (“sched/wait: Break up long wake list walk”) introduces WQ_FLAG_BOOKMARK to break up long wake list walk. When the number of walked entries reaches 64, the waker will record scan position and release the queue lock, which reduces interrupts and rescheduling latency.

v2: mm: memcg: subtree stats flushing and thresholds

This series attempts to address shortages in today’s approach for memcg stats flushing, namely occasionally stale or expensive stat reads. The series does so by changing the threshold that we use to decide whether to trigger a flush to be per memcg instead of global (patch 3), and then changing flushing to be per memcg (i.e. subtree flushes) instead of global (patch 5).

v2: mm: improve performance of accounted kernel memory allocations

This patchset improves the performance of accounted kernel memory allocations by 30% as measured by a micro-benchmark [1]. The benchmark is very straightforward: 1M kmalloc() allocations of 64 bytes each.

v2: Abstract vma_merge() and split_vma()

The vma_merge() interface is very confusing and its implementation has led to numerous bugs as a result of that confusion.
In addition there is duplication both in invocation of vma_merge(), but also in the common mprotect()-style pattern of attempting a merge, then if this fails, splitting the portion of a VMA about to have its attributes changed.

v1: align maple tree write paths

This series modifies the store paths in mas_store_gfp() and mas_erase() to use the newly refined preallocation calculation before their calls to mas_wr_store_entry(). This will avoid having to do worst case calculations.

v1: mm: hugetlb_vmemmap: use folio argument for hugetlb_vmemmap_* functions

Most function calls in hugetlb.c are made with folio arguments. This brings hugetlb_vmemmap calls in line with them by using folio instead of head struct page. Head struct page is still needed within these functions.

v1: mm: hugetlb: Only prep and add allocated folios for non-gigantic pages

Calling prep_and_add_allocated_folios when allocating gigantic pages at boot time causes the kernel to crash as folio_list is empty and iterating it causes a NULL pointer dereference. Call this only for non-gigantic pages when folio_list has entries.

v6: arm64/gcs: Provide support for GCS in userspace

The arm64 Guarded Control Stack (GCS) feature provides support for hardware protected stacks of return addresses, intended to provide hardening against return oriented programming (ROP) attacks and to make it easier to gather call stacks for applications such as profiling.

v2: kasan:print the original fault addr when access invalid shadow

Generic KASAN also has a similar oops.
It only reports the shadow address that caused the oops, not the original address.
Commit 2f004eea0fc8 (“x86/kasan: Print original address on #GP”) introduced kasan_non_canonical_hook but limited it to KASAN_INLINE.

v3: userfaultfd move option

This patch series introduces UFFDIO_MOVE feature to userfaultfd, which has long been implemented and maintained by Andrea in his local tree [1], but was not upstreamed due to lack of use cases where this approach would be better than allocating a new page and copying the contents. Previous upstreaming attempts could be found at [6] and [7].

v1: hot page swap to zram, cold page swap to swapfile directly

Our team developed a feature in Android Linux v4.19 that can directly swap out cold pages to the swapfile device and hot pages to the ZRAM device. This can reduce the lag when writing back cold pages to backing-dev through ZRAM when there is a lot of memory pressure, saving the ZRAM compression/decompression process. This helps especially on low-end Android devices with low CPU frequency and small memory.

v3: permit write-sealed memfd read-only shared mappings

The man page for fcntl() describing memfd file seals states the following about F_SEAL_WRITE:
Furthermore, trying to create new shared, writable memory-mappings via
mmap(2) will also fail with EPERM.
With emphasis on ‘writable’. It turns out in fact that currently the kernel simply disallows all new shared memory mappings for a memfd with F_SEAL_WRITE applied, rendering this documentation inaccurate.

v1: memcg: add interface to force disable swap

Global reclaim will swap even if swappiness is set to 0. In particular cases, users wish to be able to completely disable swap for specific processes. One scenario is that if JVM memory pages fall into swap, performance is noticeably reduced and GC pauses tend to increase to levels not tolerable by most applications. If it’s possible to only disable swap out for specific processes, it can address the JVM GC pauses issues, and at the same time, memory reclaim pressure is also manageable.

v9: ACPI: APEI: handle synchronous errors in task work with proper si_code

I have rewritten the cover letter with the hope that the maintainer will truly understand the necessity of this patch. Both Alibaba and Huawei met the same issue in products, and we hope it could be fixed ASAP.

v1: mm: add printf attribute to shrinker_debugfs_name_alloc

This fixes a compiler warning when compiling an allyesconfig with W=1:
mm/internal.h:1235:9: error: function might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format]

v2: Handle more faults under the VMA lock

At this point, we’re handling the majority of file-backed page faults under the VMA lock, using the ->map_pages entry point. This patch set attempts to expand that for the following situations:
There is no support in this patch set for drivers to mark themselves as being VMA lock friendly; they could implement the ->map_pages vm_operation, but if they do, they would be the first. This is probably something we want to change at some point in the future, and I’ve marked where to make that change in the code.

v4: hugetlb memcg accounting

Currently, hugetlb memory usage is not accounted for in the memory controller, which could lead to memory overprotection for cgroups with hugetlb-backed memory. This has been observed in our production system.

v1: mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign()

Commit b035f5a6d852 (“mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible”) allows architectures with non-coherent DMA to define a small ARCH_KMALLOC_MINALIGN (e.g. sizeof(unsigned long long)) and this has been enabled on arm64. With KASAN_HW_TAGS enabled, however, ARCH_SLAB_MINALIGN becomes 16 on arm64 (arch_slab_minalign() dynamically selects it since commit d949a8155d13 (“mm: make minimum slab alignment a runtime property”)).

v1: zsmalloc: use copy_page for full page copy

Some architectures have implemented optimized copy_page for full page copying, such as arm.
On my arm platform, using the copy_page helper for single page copying is about 10 percent faster than memcpy.

v7: Batch hugetlb vmemmap modification operations

When hugetlb vmemmap optimization was introduced, the overhead of enabling the option was measured as described in commit 426e5c429d16 [1]. The summary states that allocating a hugetlb page should be 2x slower with optimization and freeing a hugetlb page should be 2-3x slower. Such overhead was deemed an acceptable trade off for the memory savings obtained by freeing vmemmap pages.

File Systems

v8: Introduce provisioning primitives

This patch series is version 8 of the patch series to introduce block-level provisioning mechanism (original [1]), which is useful for provisioning space across thinly provisioned storage architectures (loop devices backed by sparse files, dm-thin devices, virtio-blk). This series has minimal changes over v7[2].

v3: fs-verity support for XFS

This patchset introduces fs-verity [6] support in XFS. This implementation uses extended attributes to store fs-verity metadata. The Merkle tree blocks are stored in the remote extended attributes. The names are offsets into the tree.

v1: filemap: call filemap_get_folios_tag() from filemap_get_folios()

filemap_get_folios() is filemap_get_folios_tag() with XA_PRESENT as the tag that is being matched. Return filemap_get_folios_tag() with XA_PRESENT as the tag instead of duplicating the code in filemap_get_folios().

v1: virtiofs: Export filesystem tags through sysfs

virtiofs filesystem is mounted using a “tag” which is exported by the virtiofs device. virtiofs driver knows about all the available tags but these are not exported to user space.

v2: Pass data temperature information to UFS devices

UFS vendors need the data lifetime information to achieve good performance. Without this information there is significantly higher write amplification due to garbage collection. Hence this patch series, which adds support in F2FS and also in the block layer for data lifetime information. The SCSI disk (sd) driver is modified such that it passes write hint information to SCSI devices via the GROUP NUMBER field.

v1: bootconfig: Expose boot-loader kernel command-line arguments

This series contains bootconfig updates that make the kernel command-line arguments that came from the bootloader (excluding those from bootconfig) visible as a comment in the existing /proc/bootconfig file. It also updates documentation.

v1: backing file: free directly

Backing files as used by overlayfs are never installed into file descriptor tables and are explicitly documented as such. They aren’t subject to rcu access conditions like regular files are.

Networking Devices

v1: net-next: selftests: netdevsim: use suitable existing dummy file for flash test

The file name used in flash test was “dummy” because at the time test was written, drivers were responsible for file request and as netdevsim didn’t do that, name was unused. However, the file load request is now done in devlink code and therefore the file has to exist. Use first random file from /lib/firmware for this purpose.

v1: net-next: net: dsa: microchip: enable setting rmii reference

KSZ88X3 devices can select between internal and external RMII reference clock. This patch series introduces new device tree property for setting reference clock to internal.

v6: Introduce STM32 Firewall framework

Introduce STM32 Firewall framework for STM32MP1x and STM32MP2x platforms. STM32MP1x(ETZPC) and STM32MP2x(RIFSC) Firewall controllers register to the framework to offer firewall services such as access granting.

v5: Add MCTP-over-KCS transport binding

This change adds an MCTP KCS transport binding, as defined by the DMTF specification DSP0254 - “MCTP KCS Transport Binding”. An MCTP protocol network device is created for each KCS channel found in the system. The interrupt code for the KCS state machine is based on the current IPMI KCS driver. Since the KCS subsystem code is now used both in IPMI and MCTP drivers the separate patchsets move KCS subsystem includes to a common folder.

v1: net-next: devlink: finish conversion to generated split_ops

This patchset converts the remaining genetlink commands to generated split_ops and removes the existing small_ops arrays entirely, along with the shared netlink attribute policy.

v1: net-next: devlink: retain error in struct devlink_fmsg

Extend devlink fmsg to retain error, and return it at each subsequent call (patch 1), so drivers could omit all but last error checks (the rest of the patches).
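
The "retain the error and return it at each subsequent call" idea is a general error-latching pattern. The sketch below shows it with a toy message buffer; it is not the devlink fmsg API, and struct msg, msg_put() and msg_finish() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a formatted-message object. */
struct msg {
    char   buf[32];
    size_t len;
    int    err;     /* first error seen, 0 if none */
};

/* Append a string. Once an error is latched, every later call is a no-op,
 * so callers do not need to check each individual put. */
void msg_put(struct msg *m, const char *s)
{
    size_t n;

    assert(m != NULL && s != NULL);
    if (m->err)
        return;
    n = strlen(s);
    if (n > sizeof(m->buf) - m->len) {
        m->err = -EMSGSIZE;
        return;
    }
    memcpy(m->buf + m->len, s, n);
    m->len += n;
}

/* The single check a caller performs at the end of a sequence of puts. */
int msg_finish(const struct msg *m)
{
    assert(m != NULL);
    return m->err;
}
```

With this shape, a driver-style caller can issue a run of puts and test only the final return value, which matches the "omit all but last error checks" goal stated in the summary.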

v1: iproute2: bridge: fdb: add an error print for unknown command

Commit 6e1ca489c5a2 (“bridge: fdb: add new flush command”) added support for “bridge fdb flush” command. This commit did not handle unsupported keywords, they are just ignored.

v3: net-next: net: netconsole: configfs entries for boot target

There is a limitation in netconsole, where it is impossible to disable or modify the target created from the command line parameter (netconsole=…).

v2: net-next: devlink: don’t take instance lock for nested handle put

To fix this, don’t take the devlink instance lock when putting nested handle. Instead, rely on devlink reference to access relevant pointers within devlink structure. Also, make sure that the device does not disappear by taking a reference in devlink_alloc_ns().

v2: iproute2-next: rdma: Support dumping SRQ resource in raw format

This patchset adds support to dump SRQ resource in raw format with rdmatool. The corresponding kernel commit is aebf8145e11a (“RDMA/core: Add support to dump SRQ resource in RAW format”)

v2: appletalk: make localtalk and ppp support conditional

The last localtalk driver is gone now, and ppp support was never fully merged, but the code to support them for phase1 networking still calls the deprecated .ndo_do_ioctl() helper.

v2: net-next: netlink: specs: don’t allow version to be specified for genetlink

There is no good reason to specify the version for new protocols. Forbid it in genetlink schema.

v1: drivers: net: wwan: wwan_core.c: resolved spelling mistake

resolved typing mistake from devce to device

v1: e100: replace deprecated strncpy with strscpy

strncpy is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces.
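
To show why a bounded copy that always terminates is preferred, here is a minimal userspace sketch modelled on the documented behaviour of the kernel's strscpy() (always NUL-terminate, report truncation with -E2BIG). The function name bounded_copy() is hypothetical and this is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a strscpy()-style copy.
 * Returns the number of characters copied (excluding the terminator),
 * or -E2BIG if src did not fit or size is 0. Unlike strncpy(), dst is
 * always NUL-terminated when size > 0 and is never padded with zeros. */
long bounded_copy(char *dst, const char *src, size_t size)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    if (size == 0)
        return -E2BIG;
    for (len = 0; len < size - 1 && src[len] != '\0'; len++)
        dst[len] = src[len];
    dst[len] = '\0';
    return src[len] == '\0' ? (long)len : -E2BIG;
}
```

By contrast, strncpy(dst, "abcdef", 4) fills all four bytes with "abcd" and writes no terminator, which is the ambiguity the patch descriptions refer to.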

v1: net: fec: replace deprecated strncpy with ethtool_sprintf

strncpy is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces.

v1: next: net: wwan: t7xx: Add __counted_by for struct t7xx_fsm_event and use struct_size()

Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).
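
A rough userspace sketch of the two ingredients named in these patches follows: a flexible array member whose length lives in a sibling field, and an overflow-checked size computation in the spirit of struct_size(). The struct and function names are hypothetical; in the kernel the array would carry the __counted_by(length) annotation and the bounds check would be inserted by the compiler rather than written by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical event type; field names are illustrative only. */
struct event {
    unsigned int  length;   /* number of valid bytes in data[] */
    unsigned char data[];   /* kernel version: __counted_by(length) */
};

/* Header size plus n array elements, saturating to SIZE_MAX on overflow
 * so that a subsequent allocation fails instead of being undersized. */
size_t event_size(size_t n)
{
    if (n > SIZE_MAX - sizeof(struct event))
        return SIZE_MAX;
    return sizeof(struct event) + n * sizeof(unsigned char);
}

struct event *event_alloc(unsigned int n)
{
    struct event *ev = malloc(event_size(n));

    if (ev)
        ev->length = n;     /* set the counter before touching data[] */
    return ev;
}

/* Manual form of the run-time check that the annotation enables. */
unsigned char event_get(const struct event *ev, unsigned int i)
{
    assert(ev != NULL && i < ev->length);
    return ev->data[i];
}
```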

v2: net: nfc: nci: assert requested protocol is valid

The protocol is used in a bit mask to determine if the protocol is supported. Assert the provided protocol is less than the maximum defined so it doesn’t potentially perform a shift-out-of-bounds and provide a clearer error for undefined protocols vs unsupported ones.
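
The check described above can be sketched as a range test placed before the shift. This is an illustration under assumptions, not the NCI code: PROTO_MAX, the function name and the specific error codes are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PROTO_MAX 8u    /* hypothetical count of defined protocols */

static_assert(PROTO_MAX <= 32, "the supported mask is 32 bits wide");

/* Returns 0 if the protocol is supported, -EINVAL if the protocol number
 * is not defined at all, -EPROTONOSUPPORT if defined but not in the mask. */
int check_protocol(uint32_t supported_mask, uint32_t protocol)
{
    if (protocol >= PROTO_MAX)
        return -EINVAL;     /* rejected before any shift happens */
    if (!(supported_mask & (UINT32_C(1) << protocol)))
        return -EPROTONOSUPPORT;
    return 0;
}
```

Without the first test, a protocol value of 32 or more would shift a 32-bit value by at least its width, which is undefined behaviour in C and is what a shift-out-of-bounds report flags.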

v1: net: dsa: qca8k: replace deprecated strncpy with ethtool_sprintf

strncpy is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces.

v1: net-next: net: can: Use device_get_match_data()

Use preferred device_get_match_data() instead of of_match_device() to get the driver match data. With this, adjust the includes to explicitly include the correct headers.

v5: net-next: net: Make timestamping selectable

Up until now, there was no way to let the user select the layer at which time stamping occurs. The stack assumed that PHY time stamping is always preferred, but some MAC/PHY combinations were buggy.

v1: ip_tunnel: convert __be16 tunnel flags to bitmaps

Derived from the PFCP support series[0] as this grew bigger (2 -> 14 commits) and involved more core bitmap changes. Only commits 10 and 11 are from the mentioned tree, the rest is new. PFCP itself still depends on this series.

v1: net-next: net: stmmac: dwmac-stm32: refactor clock config

Currently, clock configuration is spread throughout the driver and partially duplicated for the STM32MP1 and STM32 MCU variants. This makes it difficult to keep track of which clocks need to be enabled or disabled in various scenarios.

v3: net-next: add skb_segment kunit coverage

As discussed at netconf last week. Some kernel code is exercised in many different ways. skb_segment is a prime example. This 350 line function has 49 different patches in git blame with 28 different authors.

Security Hardening

v1: next: atags_proc: Add __counted_by for struct buffer and use struct_size()

Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).

v1: next: wifi: brcmfmac: fweh: Add __counted_by for struct brcmf_fweh_queue_item and use struct_size()

Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).

v1: next: media: venus: hfi_cmds: Replace one-element array with flex-array member and use __counted_by

Array data in struct hfi_sfr is being used as a fake flexible array at run-time:
drivers/media/platform/qcom/venus/hfi_venus.c:

v1: arm64: dts: exynos: Add reserved memory for pstore on E850-96

Reserve a 2 MiB memory region to record kmsg dumps, console, ftrace and userspace messages. The implemented memory split allows capturing and reading corresponding ring buffers:
dmesg: 6 dumps, 128 KiB each
console: 128 KiB
ftrace: 128 KiB for each of 8 CPUs (1 MiB total)
userspace messages: 128 KiB
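
As a quick arithmetic check, the split listed above adds up to exactly the reserved 2 MiB: 6 × 128 + 128 + 8 × 128 + 128 = 2048 KiB. The same sum as a small C sketch, with hypothetical constant and function names:

```c
#include <assert.h>

/* Values taken from the split listed above; the names are illustrative. */
enum {
    RECORD_KIB  = 128,  /* size of one record or buffer, in KiB */
    DMESG_DUMPS = 6,
    FTRACE_CPUS = 8,
    REGION_KIB  = 2 * 1024
};

static_assert(DMESG_DUMPS * RECORD_KIB + RECORD_KIB +
              FTRACE_CPUS * RECORD_KIB + RECORD_KIB == REGION_KIB,
              "the four areas must fill the 2 MiB region exactly");

/* Total size of the four ring-buffer areas, in KiB. */
unsigned int pstore_total_kib(void)
{
    unsigned int dmesg   = DMESG_DUMPS * RECORD_KIB;    /* 768 KiB  */
    unsigned int console = RECORD_KIB;                  /* 128 KiB  */
    unsigned int ftrace  = FTRACE_CPUS * RECORD_KIB;    /* 1024 KiB */
    unsigned int pmsg    = RECORD_KIB;                  /* 128 KiB  */

    return dmesg + console + ftrace + pmsg;
}
```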

v2: drivers: misc: ti-st: replace deprecated strncpy with strscpy

strncpy is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces.

Asynchronous I/O

v1: for-6.7/io_uring: ublk: simplify abort with cancelable uring_cmd

Simplify ublk request & io command aborting handling with the newly added cancelable uring_cmd. With this change, the aborting logic becomes simpler and more reliable, and it becomes easy to add new features, such as relaxing queue/ublk daemon association.

v1: liburing: tests: Add test for bid limits of provided_buffer commands

A couple of tests for bid limits when allocating/freeing buffers with provide_buffers command.

v1: io-wq: fully initialize wqe before calling cpuhp_state_add_instance_nocalls()

A cpu hotplug callback was issued before wq->all_list was initialized. This results in a null pointer dereference. The fix is to fully setup the io_wq before calling cpuhp_state_add_instance_nocalls().

Rust For Linux

v3: net-next: Rust abstractions for network PHY drivers

This patchset adds Rust abstractions for phylib. It doesn’t fully cover the C APIs yet but I think that it’s already useful. I implement two PHY drivers (Asix AX88772A PHYs and Realtek Generic FE-GE). Seems they work well with real hardware.

v1: rust: crates in other kernel directories

This RFC makes it possible to have bindings for kernel subsystems that are compiled as modules.
Previously, if you wanted to have Rust bindings for a subsystem, like AMBA for example, you had to put it under rust/kernel/ so it became part of the kernel crate, but this came with many downsides. Namely, if you compiled said subsystem as a module, you had a dependency on it from kernel, which is linked directly into vmlinux.

v2: Rust abstractions for network PHY drivers

This patchset adds Rust abstractions for network PHY drivers. It doesn’t fully cover the C APIs for PHY drivers yet but I think that it’s already useful. I implement two PHY drivers (Asix AX88772A PHYs and Realtek Generic FE-GE). Seems they work well with real hardware.

v4: rust: Respect HOSTCC when linking for host

Matthew Maurer <mmaurer@google.com> writes:
Currently, rustc defaults to invoking cc, even if HOSTCC is defined, resulting in build failures in hermetic environments where cc does not exist. This includes both hostprogs and proc-macros.

v1: Rust 1.73.0 upgrade

This is the next upgrade to the Rust toolchain since the initial Rust merge, from 1.72.1 to 1.73.0 (i.e. the latest, released today).

BPF

v2: bpf-next: bpf: Detect jumping to reserved code during check_cfg()

Here the verifier rejects the program because it thinks insn at 7 is an invalid BPF_LD_IMM, but such an error log is not accurate, since the issue is jumping to reserved code, not that the program contains an invalid insn. Therefore, make the verifier check the jump target during check_cfg(). For the same program, the verifier reports the following log:

v1: bpf-next: selftests/bpf: add options and frags to xdp_hw_metadata

This is a follow-up to the commit 9b2b86332a9b (“bpf: Allow to use kfunc XDP hints and frags together”).
There are some possible implementation problems that may arise when providing metadata specifically for multi-buffer packets, therefore there must be a possibility to test such option separately.

v1: bpf-next: Detect jumping to reserved code during check_cfg()

Here the verifier rejects the program because it thinks insn at 7 is an invalid BPF_LD_IMM, but such an error log is not accurate, since the issue is jumping to reserved code, not that the program contains an invalid insn. Therefore, make the verifier check the jump target during check_cfg(). For the same program, the verifier reports the following log:
func#0 @0 jump to reserved code from insn 8 to 7

v1: bpf-next: bpf, cgroup: Add BPF support for cgroup1 hierarchy

Currently, BPF is primarily confined to cgroup2, with the exception of cgroup_iter, which supports cgroup1 fds. Unfortunately, this limitation prevents us from harnessing the full potential of BPF within cgroup1 environments.

v6: Reduce overhead of LSMs with static calls

Background
LSM hooks (callbacks) are currently invoked as indirect function calls. These callbacks are registered into a linked list at boot time as the order of the LSMs can be configured on the kernel command line with the “lsm=” command line parameter.
Indirect function calls have a high overhead due to retpoline mitigation for various speculative execution attacks.

v3: bpf-next: selftests/bpf: Add pairs_redir_to_connected helper

Extract duplicate code from these four functions:
unix_redir_to_connected(), udp_redir_to_connected(), inet_unix_redir_to_connected(), unix_inet_redir_to_connected()
to generate a new helper pairs_redir_to_connected(). Create the

v2: clang-tools support in tools

Allow the clang-tools scripts to work with builds in tools such as tools/perf and tools/lib/perf. An example use looks like:
Fix a number of the more serious low-hanging issues in perf found by clang-tidy.

v3: bpf-next: bpf: Avoid unnecessary -EBUSY from htab_lock_bucket

However, if an IRQ hits between steps 2 and 3 of the existing sequence (after the per-CPU counter check and before the lock is taken with IRQs disabled), BPF programs attached to the IRQ logic will not be able to access the same hash of the hashtab and will get -EBUSY. This -EBUSY is not really necessary. Fix it by disabling IRQs before checking map_locked:
1. preempt_disable();
2. local_irq_save();
3. check percpu counter htab->map_locked[hash] for recursion;
3.1. if map_locked[hash] is already taken, return -EBUSY;
4. raw_spin_lock().
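
The effect of the reordering can be sketched with a toy single-CPU model. Everything here is hypothetical (ToyCpu, lock_bucket(), unlock_bucket() and run() are illustration names, not kernel functions), and the model only tracks one bucket, one IRQ-enabled flag and one pending interrupt. It assumes the old order checks the counter first and disables IRQs only when taking the lock.

```python
import errno


class ToyCpu:
    """Hypothetical single-CPU model with one per-CPU recursion counter."""

    def __init__(self):
        self.map_locked = 0
        self.irqs_enabled = True
        self.pending_irq = None   # handler to run when IRQs allow it
        self.irq_results = []     # return codes seen by the IRQ handler

    def maybe_take_irq(self):
        # An interrupt is delivered only while IRQs are enabled.
        if self.irqs_enabled and self.pending_irq is not None:
            handler, self.pending_irq = self.pending_irq, None
            self.irq_results.append(handler(self))


def lock_bucket(cpu, irq_first):
    """Return (0, flags) on success or (-EBUSY, flags) on recursion.

    irq_first=False models the old order: counter check, then lock + IRQs off.
    irq_first=True models the new order: IRQs off, counter check, then lock.
    """
    flags = cpu.irqs_enabled
    if irq_first:
        cpu.irqs_enabled = False      # local_irq_save()
    cpu.map_locked += 1               # per-CPU recursion counter
    if cpu.map_locked != 1:
        cpu.map_locked -= 1
        cpu.irqs_enabled = flags
        return -errno.EBUSY, flags
    cpu.maybe_take_irq()              # window between counter check and lock
    cpu.irqs_enabled = False          # lock taken; IRQs are off from here on
    return 0, flags


def unlock_bucket(cpu, flags):
    cpu.map_locked -= 1
    cpu.irqs_enabled = flags          # restore the saved IRQ state
    cpu.maybe_take_irq()              # a deferred interrupt runs now


def run(irq_first):
    """Lock and unlock one bucket while an IRQ handler wants the same bucket."""
    cpu = ToyCpu()

    def handler(c):
        ret, flags = lock_bucket(c, irq_first)
        if ret == 0:
            unlock_bucket(c, flags)
        return ret

    cpu.pending_irq = handler
    ret, flags = lock_bucket(cpu, irq_first)
    assert ret == 0
    unlock_bucket(cpu, flags)
    return cpu.irq_results


print("old order:", run(irq_first=False))
print("new order:", run(irq_first=True))
```

In the old order the handler runs inside the window and sees the counter already raised, so it gets -EBUSY. In the new order the interrupt is simply deferred until the bucket is released, and the handler succeeds.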

v1: bpf-next: bpf: Inherit system settings for CPU security mitigations

Currently, there exists a system-wide setting related to CPU security mitigations, denoted as ‘mitigations=’. When set to ‘mitigations=off’, it deactivates all optional CPU mitigations. Therefore, if we implement a system-wide ‘mitigations=off’ setting, it should inherently bypass Spectre v1 and Spectre v4 in the BPF subsystem.

v2: bpf-next: bpf: Add ability to pin bpf timer to calling CPU

BPF supports creating high resolution timers using bpf_timer_* helper functions. Currently, only the BPF_F_TIMER_ABS flag is supported, which specifies that the timeout should be interpreted as absolute time. It would also be useful to be able to pin that timer to a core. For example, you might want a subset of cores to run without timer interrupts and have the timer invoked only on a single core.

v1: kbuild: kselftest-merge target improvements

Two minor changes to the kselftest-merge target:
Let builtin have precedence over modules when merging configs
Merge per-arch configs, if available
Björn

v1: net: i40e: sync next_to_clean and next_to_process for programming status desc

When a programming status desc is encountered on the rx_ring, next_to_process is bumped along with cleaned_count but next_to_clean is not. This causes the I40E_DESC_UNUSED() macro to misbehave, which results in the whole ring being overwritten with new buffers.
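
To see why the free-descriptor count depends on next_to_clean, here is a minimal sketch of the usual ring accounting. The function desc_unused() is a hypothetical stand-in, written on the assumption that I40E_DESC_UNUSED() computes the free slots between next_to_use and next_to_clean with one slot kept empty. The ring size and indices are made-up values, not taken from the patch.

```python
def desc_unused(next_to_clean: int, next_to_use: int, count: int) -> int:
    """Free descriptors in a ring of `count` entries (one slot stays empty).

    Assumed shape of the accounting; not copied from the driver.
    """
    base = 0 if next_to_clean > next_to_use else count
    return base + next_to_clean - next_to_use - 1


RING_SIZE = 8      # hypothetical ring size
next_to_use = 3    # hypothetical producer index

# With next_to_clean kept in sync, only a small gap is reported as free.
print(desc_unused(next_to_clean=5, next_to_use=next_to_use, count=RING_SIZE))

# With a stale next_to_clean equal to next_to_use, almost the whole ring
# is reported as free and would be refilled.
print(desc_unused(next_to_clean=3, next_to_use=next_to_use, count=RING_SIZE))
```

The result is only as good as the index fed into it, which is why the patch keeps next_to_clean and next_to_process in step for programming status descriptors.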

Related Technology Updates

Qemu

v3: target/riscv: Use env_archcpu for better performance

RISCV_CPU(cs) uses a checked cast. When QOM cast debugging is enabled, this adds about 5% total overhead when emulating RV64 on an x86-64 host.
The measurement used a RISC-V guest with 16 vCPUs, 16 GB of guest RAM and a virtio-blk disk. The guest has a copy of the qemu source tree, and the test compiles that tree with ‘make clean; time make -j16’.

v2: riscv: deprecate capital ‘Z’ CPU properties

This second version of the patch fixes a ‘with with’ typo in the deprecated.rst document. No other changes made.

v1: target/riscv: deprecate capital ‘Z’ CPU properties

At this moment there are eleven CPU extension properties that start with a capital ‘Z’: Zifencei, Zicsr, Zihintntl, Zihintpause, Zawrs, Zfa, Zfh, Zfhmin, Zve32f, Zve64f and Zve64d. All other extensions are named with lower-case letters.

v2: riscv: RVA22U64 profile support

Several design changes were made in this version after the reviews and feedback in the v1 [1]. The high-level summary is:
we’ll no longer allow users to set profile flags for vendor CPUs. If we’re to adhere to the current policy of not allowing users to enable extensions for vendor CPUs, the profile support would become a glorified way of checking if the vendor CPU happens to support a specific profile. If a future vendor CPU supports a profile the CPU can declare it manually in its cpu_init() function, the flag will still be set, but users can’t change it;

v2: riscv, kvm: support KVM_GET_REG_LIST

In this new version all instances of “error_setg(&error_fatal, …” were replaced with error_report() and exit(1), as suggested by Phil in v1.
No other changes made.

v4: Risc-V/gdb: replace exit calls with proper shutdown

This series replaces some of the calls to exit in hardware used by RISC-V boards. Otherwise, the gdb connection can be abruptly disconnected, resulting in the last gdb packet “Wxx” not being sent.

U-Boot

v2: Tidy up use of CONFIG_CMDLINE

It should be possible to disable CONFIG_CMDLINE and have all commands and related functionality dropped from U-Boot. This is useful when trying to reduce the size of U-Boot.
Recent changes have stopped this from working.

Pull request: u-boot-rockchip-20231007

Please pull the updates for rockchip platform:
Add Board: rk3568 Bananapi R2Pro;
Update pcie bifurcation support;
dwc_eth_qos controller support for rk3568 and rk3588;
Compressed binary support for U-Boot on rockchip platform;
dts and config updates for different board and soc;
CI: https://source.denx.de/u-boot/custodians/u-boot-rockchip/-/pipelines/18047

v1: RESEND: riscv: spl: OpenSBI OS boot mode

Introduce a shortcut boot mode for RISC-V.
As we know, the ARM architecture has Falcon mode, which provides a shortcut boot to the Linux kernel (enabled via CONFIG_SPL_OS_BOOT). The ARM Falcon mode boot flow is as follows: u-boot SPL -> Linux kernel
