泰晓科技 (TinyLab) -- Focused on Linux: tracing things back to the source and seeing the big picture in small details!
Website: https://tinylab.org

RISC-V Linux Kernel and Related Technology News, Issue 72

Written by 呀呀呀 on 2024/01/05

Date: 20231231
Editor: 晓怡
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support

v2: riscv: Add Zicbop & prefetchw support

This patch series adds Zicbop support and then enables the Linux prefetchw feature. It’s based on v6.7-rc7.
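As a rough user-space illustration of what a write-intent prefetch looks like from C, the sketch below uses the GCC/Clang builtin `__builtin_prefetch` with its second argument set to 1 (prefetch for write). Whether a compiler lowers this to a Zicbop `prefetch.w` instruction depends on the target and flags, which is an assumption here. The function name `fill_with_prefetch` and the prefetch distance are hypothetical and are not taken from the patch series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance, in elements; tune per platform. */
#define PREFETCH_AHEAD 16

/*
 * Fill buf[0..n) with value, hinting that upcoming elements will be written.
 * __builtin_prefetch(addr, 1) is only a hint: it never faults and may
 * compile to nothing on targets without a prefetch instruction.
 */
static void fill_with_prefetch(uint32_t *buf, size_t n, uint32_t value)
{
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&buf[i + PREFETCH_AHEAD], 1);
        buf[i] = value;
    }
}
```

The result is identical with or without the hint; only the memory access timing may differ.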

GIT PULL: KVM/riscv changes for 6.8 part #1

We have the following KVM RISC-V changes for 6.8:

1) KVM_GET_REG_LIST improvement for vector registers
2) Generate ISA extension reg_list using macros in get-reg-list selftest
3) Steal time accounting support along with selftest

v1: Enable SPCR table for console output on RISC-V

This patch will enable the SPCR table for RISC-V.

Vendors will enable or disable the SPCR table in the firmware based on the platform design. However, in cases where the SPCR table is not usable, a kernel parameter can be used to specify the preferred console.

v1: riscv: dts: sophgo: add watchdog dt node for CV1800

Add the watchdog device tree node to the CV1800 SoC. This patch depends on the clk driver and reset driver.

Clk driver link: https://lore.kernel.org/all/IA1PR20MB49539CDAD9A268CBF6CA184BBB9FA@IA1PR20MB4953.namprd20.prod.outlook.com/

Reset driver link: https://lore.kernel.org/all/20231113005503.2423-1-jszhang@kernel.org/

v1: riscv: dts: sophgo: add timer dt node for CV1800

Add the timer device tree node to the CV1800 SoC. This patch depends on the clk driver and reset driver.

Clk driver link: https://lore.kernel.org/all/IA1PR20MB49539CDAD9A268CBF6CA184BBB9FA@IA1PR20MB4953.namprd20.prod.outlook.com/

Reset driver link: https://lore.kernel.org/all/20231113005503.2423-1-jszhang@kernel.org/

v1: riscv: tlb: avoid tlb flushing on exit & execve

The mmu_gather code sets fullmm=1 when tearing down the entire address space for an mm_struct on exit or execve. So if the underlying platform supports ASID, the tlb flushing can be avoided because the ASID allocator will never re-allocate a dirty ASID.

v1: Add driver for Cadence SD6HC SD/eMMC controller

The StarFive JH8100 SoC includes a Cadence SD/eMMC host controller (Version 6) with a Combo PHY, which provides a DFI interface to removable or embedded SD/eMMC devices. This patch adds initial SD/eMMC support for the JH8100 SoC by providing device drivers for the Cadence SD/eMMC Version 6 host controller and the Combo PHY. This patch series depends on the JH8100 base patch series in [1], [2], and [3]. The relevant dt-bindings documentation has been updated accordingly.

v2: Unified cross-architecture kernel-mode FPU API

This series unifies the kernel-mode FPU API across several architectures by wrapping the existing functions (where needed) in consistently-named functions placed in a consistent header location, with mostly the same semantics: they can be called from preemptible or non-preemptible task context, and are not assumed to be reentrant. Architectures are also expected to provide CFLAGS adjustments for compiling FPU-dependent code. For the moment, SIMD/vector units are out of scope for this common API.

v1: dt-bindings: riscv: cpus: Clarify mmu-type interpretation

The current description implies that only a single address translation mode is available to the operating system. However, some implementations support multiple address translation modes, and the operating system is free to choose between them.

v14: riscv: Add fine-tuned checksum functions

Each architecture generally implements fine-tuned checksum functions to leverage the instruction set. This patch adds the main checksum functions that are used in networking. Tested on QEMU, this series allows the CHECKSUM_KUNIT tests to complete an average of 50.9% faster.
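For context, the checksum used throughout networking is the 16-bit ones' complement Internet checksum described in RFC 1071. The following is a minimal portable reference sketch of that algorithm, not the optimized RISC-V code from the series; the function name `inet_checksum` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Portable reference for the 16-bit ones' complement Internet checksum
 * (RFC 1071). Bytes are paired into big-endian 16-bit words.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;  /* pad the odd trailing byte with zero */

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Architecture-specific versions typically compute the same value while consuming wider words per iteration, which is where the speedup comes from.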

v5: riscv: sophgo: add clock support for Sophgo CV1800 SoCs

Add clock controller support for the Sophgo CV1800B and CV1812H.

This patch follows this patch series: https://lore.kernel.org/all/IA1PR20MB495399CAF2EEECC206ADA7ABBBD5A@IA1PR20MB4953.namprd20.prod.outlook.com/

v1: irqchip/sifive-plic: One function call less in __plic_init() after error detection

Date: Tue, 26 Dec 2023 21:34:47 +0100

The kfree() function was called in one case by the __plic_init() function during error handling even if the passed data structure member contained a null pointer. This issue was detected by using the Coccinelle software.
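The cleanup described here follows a common C idiom: one error label per acquired resource, so that a failure path only releases what was actually allocated. Below is a minimal user-space sketch of that idiom using malloc()/free() in place of the kernel allocators. The `struct ctx` type and the function names are hypothetical and do not reflect the PLIC driver's code. Note that free(NULL), like kfree(NULL), is a harmless no-op, so this kind of change trims a redundant call rather than fixing a crash.

```c
#include <stdlib.h>

struct ctx {            /* hypothetical structure, for illustration only */
    int *table;
    int *scratch;
};

/* Returns 0 on success, -1 on allocation failure. */
static int ctx_init(struct ctx *c, size_t n)
{
    c->table = malloc(n * sizeof(*c->table));
    if (!c->table)
        goto err;               /* nothing allocated yet: skip every free */

    c->scratch = malloc(n * sizeof(*c->scratch));
    if (!c->scratch)
        goto err_free_table;    /* only table needs releasing */

    return 0;

err_free_table:
    free(c->table);
err:
    c->table = NULL;
    c->scratch = NULL;
    return -1;
}

/* Release everything a successful ctx_init() allocated. */
static void ctx_destroy(struct ctx *c)
{
    free(c->scratch);
    free(c->table);
}
```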

v1: Basic clock and reset support for StarFive JH8100 RISC-V SoC

This patch series enables basic clock & reset support for the StarFive JH8100 SoC.

This patch series depends on the Initial device tree support for StarFive JH8100 SoC patch series which can be found at [1].

v6: Support Andes PMU extension

This patch series introduces the Andes PMU extension, which serves the same purpose as Sscofpmf. To use FDT-based probing for hardware support of the PMU extensions, we first convert T-Head’s PMU to CPU feature alternative, then add Andes PMU alternatives.

v4: riscv: enable EFFICIENT_UNALIGNED_ACCESS and DCACHE_WORD_ACCESS

Some RISC-V implementations, such as T-HEAD’s C906, C908, C910 and C920, support efficient unaligned access. For performance reasons, we want to enable HAVE_EFFICIENT_UNALIGNED_ACCESS on these platforms. To avoid performance regressions on platforms without efficient unaligned access, HAVE_EFFICIENT_UNALIGNED_ACCESS can’t be selected globally.
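To make the notion of an unaligned access concrete, here is a small user-space sketch of the portable way to load a 32-bit value from an arbitrary address. The helper name `load_u32_unaligned` is hypothetical. On hardware with efficient unaligned access a compiler will typically reduce the memcpy() to a single load, while on other targets it may expand to several byte loads, which is the cost difference the option is about.

```c
#include <stdint.h>
#include <string.h>

/*
 * Load a 32-bit value from a possibly unaligned address.
 * memcpy() is the portable idiom; dereferencing a misaligned uint32_t *
 * directly would be undefined behaviour in C.
 */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}
```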

v1: riscv: Improve exception and system call latency

Many CPUs implement return address branch prediction as a stack. The RISC-V architecture refers to this as a return address stack (RAS). If this gets corrupted, then the CPU will mispredict at least one but potentially many function returns.

Process Scheduling

v2: net-next: net/sched: cls_api: complement tcf_tfilter_dump_policy

In function tc_dump_tfilter, the attributes array is parsed via tcf_tfilter_dump_policy which only describes TCA_DUMP_FLAGS. However, the NLA TCA_CHAIN is also accessed with nla_get_u32.

v1: drm/sched: Adjustments for drm_sched_init()

Date: Tue, 26 Dec 2023 16:48:48 +0100

A few update suggestions were taken into account from static source code analysis.

v2: sched/fair: Do not scan non-movable tasks several times

If the busiest rq is small (nr_running < SCHED_NR_MIGRATE_BREAK) and none of its tasks is movable, detach_tasks() should not iterate over more tasks than are available in the busiest rq.

v1: net: net/sched: cls_api: complement tcf_tfilter_dump_policy

In function tc_dump_tfilter, the attributes array is parsed via tcf_tfilter_dump_policy which only describes TCA_DUMP_FLAGS. However, the NLA TCA_CHAIN is also accessed with nla_get_u32. According to the commit 5e2424708da7 (“xfrm: add forgotten nla_policy for XFRMA_MTIMER_THRESH”), such a missing piece could lead to a potential heap data leak.

Memory Management

v11: Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

This patchset is also available at:

https://github.com/amdese/linux/commits/snp-host-v11

and is based on top of the following series:

“v1: Add AMD Secure Nested Paging (SEV-SNP) Initialization Support” https://lore.kernel.org/kvm/20231230161954.569267-1-michael.roth@amd.com/

v1: mm: memory: use nth_page() in clear/copy_subpage()

Clearing and copying of gigantic pages has already been converted to use nth_page() to handle possibly discontiguous struct pages (SPARSEMEM without VMEMMAP), but the non-gigantic path was not changed. Fix it too.

[mm-stable PATCH] mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING

Demotion can work well without CONFIG_NUMA_BALANCING. But the commit 23e9f0138963 (“mm/vmstat: move pgdemote_* to per-node stats”) wrongly hid it behind CONFIG_NUMA_BALANCING.

v1: x86 NUMA-aware kernel replication

This patchset implements initial support of kernel text and rodata replication for x86_64 platform. Linux kernel 6.5.5 is used as a baseline.

There was a work previously published for ARM64 platform by Russell King (arm64 kernel text replication). We hope that it will be possible to push this technology forward together.

v1: mm: ratelimit stat flush from workingset shrinker

One of our internal workloads regressed on a newer upstream kernel. On further investigation, the cause appears to be the always-synchronous rstat flush in count_shadow_nodes() added by commit f82e6bf9bb9b (“mm: memcg: use rstat for non-hierarchical stats”). On further inspection, this function does not really need accurate stats, as it was already approximating the number of shadow entries to keep for maintaining the refault information. Since there is already a 2 sec periodic rstat flush, we don’t need exact stats here. Let’s ratelimit the rstat flush in this code path.
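The general idea of rate-limiting an expensive flush can be sketched in a few lines of user-space C. This is a toy model under stated assumptions, not the kernel's memcg code: `struct stat_cache`, `flush_ratelimited` and the millisecond clock passed in by the caller are all hypothetical, and the 2000 ms interval simply mirrors the 2 sec periodic flush mentioned in the cover letter.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLUSH_INTERVAL_MS 2000  /* mirrors the 2 sec periodic flush */

struct stat_cache {             /* hypothetical, for illustration only */
    uint64_t last_flush_ms;     /* time of the most recent flush */
    unsigned int flush_count;   /* how many flushes were performed */
};

/*
 * Flush only if the previous flush is at least FLUSH_INTERVAL_MS old.
 * Returns true when a flush was performed. The current time is passed in
 * so the policy can be exercised without a real clock.
 */
static bool flush_ratelimited(struct stat_cache *c, uint64_t now_ms)
{
    if (now_ms - c->last_flush_ms < FLUSH_INTERVAL_MS)
        return false;           /* stats are recent enough; skip the flush */

    c->last_flush_ms = now_ms;
    c->flush_count++;
    return true;
}
```

Callers that can tolerate slightly stale numbers use the rate-limited path, while callers that need exact values would still flush unconditionally.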

v9: mm/gup: Introduce memfd_pin_folios() for pinning memfd folios (v9)

The first two patches were previously reviewed but not yet merged. These ones need to be merged first as the fourth patch depends on the changes introduced in them and they also fix bugs seen in very specific scenarios (running Qemu with hugetlb=on, blob=true and rebooting guest VM).

v1: mm: kasan: stop leaking stack trace handles

Commit 773688a6cb24 (“kasan: use stack_depot_put for Generic mode”) added support for stack trace eviction for Generic KASAN.

However, that commit didn’t evict stack traces when the object is not put into quarantine. As a result, some stack traces are never evicted from the stack depot.

v2: vhost-vdpa: account iommu allocations

iommu allocations should be accounted in order to allow admins to monitor and limit the amount of iommu memory.

v1: mm: xtensa, kasan: define KASAN_SHADOW_END

Common KASAN code might rely on the definitions of the shadow mapping start, end, and size. Define KASAN_SHADOW_END in addition to KASAN_SHADOW_START and KASAN_SHADOW_SIZE.

v1: kernel: Introduce a write lock/unlock wrapper for tasklist_lock

tasklist_lock is an rwlock, and in many scenarios a writer has to wait for readers that already hold it. In freeze_processes/thaw_processes, walking and freezing or thawing tasks can hold the read lock of tasklist_lock for 200+ ms on commercial devices. Meanwhile, write_lock_irq spins with preemption and local interrupts disabled until tasklist_lock can be acquired. This makes the system noticeably less responsive.

File Systems

v1: virtiofs: Adjustments for two function implementations

Date: Fri, 29 Dec 2023 09:28:09 +0100

A few update suggestions were taken into account from static source code analysis.

v1: fuse: Improve error handling in two functions

Date: Thu, 28 Dec 2023 21:57:00 +0100

The kfree() function was called in two cases during error handling even if the passed variable contained a null pointer. This issue was detected by using the Coccinelle software.

v1: fuse: use page cache pages for writeback io when virtio_fs is in use

This patch just shows the idea, to see if I’m heading in the right direction 😊 A quick prototype shows the performance improvement. If there are no obvious concerns, I’ll try to make a formal patch and run the fstests.

v1: fs: extract include/linux/fs_type.h

struct file_system_type is one of the things which could be extracted out of include/linux/fs.h easily.

Drop some useless forward declarations and externs too.

v2: Move fscrypt keyring destruction to after ->put_super

This series moves the fscrypt keyring destruction to after ->put_super, as this will be needed by the btrfs fscrypt support. To make this possible, it also changes f2fs to release its block devices after generic_shutdown_super() rather than before.

v1: sysctl: treewide: constify ctl_table_root::set_ownership

The set_ownership callback is not supposed to modify the ctl_table. Enforce this expectation via the typesystem.

This change also is a step to put “struct ctl_table” into .rodata throughout the kernel.

v1: blk: optimization for classic polling

This removes the dependency on interrupts to wake up task. Set task state as TASK_RUNNING, if need_resched() returns true, while polling for IO completion. Earlier, polling task used to sleep, relying on interrupt to wake it up. This made some IO take very long when interrupt-coalescing is enabled in NVMe.

Network Devices

v1: sunrpc: Improve exception handling in krb5_etm_checksum()

Date: Sun, 31 Dec 2023 14:43:05 +0100

The kfree() function was called in one case by the krb5_etm_checksum() function during error handling even if the passed variable contained a null pointer. This issue was detected by using the Coccinelle software.

[net-next: PATCH] net: mvpp2: initialize port fwnode pointer

Update the port’s device structure also with its fwnode pointer with a recommended device_set_node() helper routine.

v1: tipc: Improve exception handling in tipc_bcast_init()

Date: Sun, 31 Dec 2023 12:20:06 +0100

The kfree() function was called in two cases by the tipc_bcast_init() function during error handling even if the passed variable contained a null pointer. This issue was detected by using the Coccinelle software.

v1: wifi: cfg80211: Replace a label in cfg80211_parse_ml_sta_data()

Date: Sun, 31 Dec 2023 11:22:42 +0100

The kfree() function was called in one case by the cfg80211_parse_ml_sta_data() function during error handling even if the passed variable contained a null pointer. This issue was detected by using the Coccinelle software.

v1: bpf: Adjustments for four function implementations

Date: Sat, 30 Dec 2023 20:51:23 +0100

A few update suggestions were taken into account from static source code analysis.

v1: Revert “net: ipv6/addrconf: clamp preferred_lft to the minimum required”

The commit had a bug and might not have been the right approach anyway.

v1: net-next: net/sched: sch_api: conditional netlink notifications

Implement conditional netlink notifications for Qdiscs and classes, which were missing in the initial patches that targeted tc filters and actions. Notifications will only be built after passing a check for ‘rtnl_notify_needed()’.

v1: net-next: net/sched: introduce ACT_P_BOUND return code

Bound actions always return ‘0’ and as of today we rely on ‘0’ being returned in order to properly skip bound actions in tcf_idr_insert_many. In order to further improve maintainability, introduce the ACT_P_BOUND return code.

v2: net-next: selftests/net: change shebang to bash to support “source”

The patch set [1] added a general lib.sh in net selftests, and converted several test scripts to source the lib.sh.

unicast_extensions.sh (converted in [1]) and pmtu.sh (converted in [2]) have a /bin/sh shebang, which may point to different shells in different distributions, but “source” is only available in some of them. For example, “source” is a built-in command in bash, but it cannot be used in dash.

v1: net: rtnetlink: allow to set iface down before enslaving it

The below commit adds support for:

ip link set dummy0 down
ip link set dummy0 master bond0 up

but breaks the opposite:

ip link set dummy0 up
ip link set dummy0 master bond0 down

v1: bpf-next: bpf: add csum/ip_summed fields to __sk_buff

For now, we have to call some helpers when we need to update the csum, such as bpf_l4_csum_replace, bpf_l3_csum_replace, etc. These helpers are not inlined, which causes poor performance.

v3: net-next: virtio-net: support AF_XDP zero copy

XDP socket (AF_XDP) is an excellent kernel-bypass network framework. The zero-copy feature of xsk (XDP socket) needs to be supported by the driver, and its performance is very good. mlx5 and Intel ixgbe already support this feature. This patch set allows virtio-net to support xsk’s zero-copy xmit feature.

v4: VMware hypercalls enhancements

VMware hypercall invocations were spread out across the kernel, each implementing the same ABI as in-place inline asm. With encrypted memory and confidential computing, it became harder to maintain every change in these hypercall implementations.

v3: posix-timers: add multi_clock_gettime system call

Some user space applications need to read several clocks. Each read requires a transition from user space to kernel space. The syscall overhead causes an unpredictable delay between the N clock reads. Removing this delay gives better synchronization between the N clocks.
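For comparison, this is roughly what an application has to do today with the existing POSIX clock_gettime() call: one call per clock, so the samples are taken at slightly different instants. The helper name `read_clocks` is hypothetical, and the sketch does not show the proposed multi_clock_gettime interface, whose signature is not given here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stddef.h>
#include <time.h>

/*
 * Read n clocks one after another with the existing clock_gettime() call.
 * Every iteration is a separate call, so out[i] and out[i + 1] are not
 * sampled at the same instant.
 * Returns 0 on success, -1 if any read fails.
 */
static int read_clocks(const clockid_t *ids, struct timespec *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (clock_gettime(ids[i], &out[i]) != 0)
            return -1;
    }
    return 0;
}
```

A caller might pass CLOCK_REALTIME and CLOCK_MONOTONIC and then compare the two timestamps; the skew between them includes the cost of the second call.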

v8: GenieZone hypervisor drivers

This series is based on linux-next, tag: next-20231222.

GenieZone hypervisor(gzvm) is a type-1 hypervisor that supports various virtual machine types and provides security features such as TEE-like scenarios and secure boot. It can create guest VMs for security use cases and has virtualization capabilities for both platform and interrupt. Although the hypervisor can be booted independently, it requires the assistance of GenieZone hypervisor kernel driver(gzvm-ko) to leverage the ability of Linux kernel for vCPU scheduling, memory management, inter-VM communication and virtio backend support.

v1: net-next: net: mctp: use deprecated parser in mctp_set_link_af

The mctp set_link_af implementation, mctp_set_link_af, uses the strict parser nla_parse_nested to parse the nested attribute. This is fine in most cases but not here, as rtnetlink uses bad magic in the setlink code; see the code snippet in function do_setlink.

v5: net-next: netdevsim: link and forward skbs between ports

This patchset adds the ability to link two netdevsim ports together and forward skbs between them, similar to veth. The goal is to use netdevsim for testing features e.g. zero copy Rx using io_uring.

v1: iwl-net: idpf: avoid compiler padding in virtchnl2_ptype struct

Config option in arm random config file is causing the compiler to add padding. Avoid it by using “__packed” structure attribute for virtchnl2_ptype struct.
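The effect of a packed attribute can be seen with a tiny user-space sketch. The two structures below are hypothetical and do not reproduce the fields of virtchnl2_ptype; they only show how the GCC/Clang `__attribute__((packed))` attribute (which the kernel spells `__packed`) removes compiler-inserted padding.

```c
#include <stdint.h>

/* Hypothetical layouts; these are not the fields of virtchnl2_ptype. */
struct ptype_padded {
    uint16_t id;
    uint8_t  count;
    uint16_t proto;     /* most ABIs insert one padding byte before this */
};

struct ptype_packed {
    uint16_t id;
    uint8_t  count;
    uint16_t proto;
} __attribute__((packed));  /* the kernel spells this __packed */

/* With packing, the size is exactly the sum of the member sizes. */
_Static_assert(sizeof(struct ptype_packed) == 5,
               "packed layout has no padding");
```

For structures that describe a wire or firmware format, fixing the layout this way keeps it independent of configuration-dependent alignment rules.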

v1: nfc: mei_phy: Adjustments for two function implementations

Date: Wed, 27 Dec 2023 16:53:21 +0100

A few update suggestions were taken into account from static source code analysis.

v1: ss: add option to suppress queue columns

Add a new option -Q/--no-queues to ss(8) to suppress the two standard columns Send-Q and Recv-Q. This helps to keep the output steady for monitoring purposes (like listening sockets).

[net-next PATCH 0/3] net: phy: at803x: even more generalization

This is part 3 of the at803x patches required to split the PHY driver into more specific PHY family drivers.

While adding support for a new PHY family, qca807x, many similarities with the qca808x cdt function were noticed. Hence this series is meant to make things easier in the future, when the qca807x PHY is submitted.

v2: net-next: MT7530 DSA Subdriver Improvements Act I

Hello!

This patch series simplifies the MT7530 DSA subdriver and improves the logic of the support for MT7530, MT7531, and the switch on the MT7988 SoC.

I have done a simple ping test to confirm basic communication on all switch ports on MCM and standalone MT7530, and MT7531 switch with this patch series applied.

v1: iproute2-next: bridge: mdb: Add flush support

Implement MDB flush functionality, allowing user space to flush MDB entries from the kernel according to provided parameters.

v1: net-next: virtio-net: support device stats

As described in the spec:

https://github.com/oasis-tcs/virtio-spec/commit/42f389989823039724f95bbbd243291ab0064f82

virtio-net supports retrieving device stats.

v2: net: wwan: t7xx: Add fastboot interface

To support cases such as firmware update or core dump, the t7xx device is capable of signaling the host that a special port needs to be created before the handshake phase.

v1: net-next: sockptr: Change sockptr_t to be a struct

The original commit for sockptr_t tried to use the pointer value to determine whether a pointer was user or kernel. This can’t work on some architectures and was buggy on x86. So the is_kernel discriminator was added after the union of pointers.
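The union-plus-discriminator layout that this paragraph describes can be modelled in user space as follows. This is a toy sketch of that described layout only, and not necessarily the layout the patch ends up with. The type name `tagged_ptr_t` and the helpers are hypothetical, and a plain `void *` stands in for the kernel's `__user` pointer so the example builds outside the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of a pointer tagged with its origin. The flag, not the
 * pointer value, decides which union member is valid.
 */
typedef struct {
    union {
        void *kernel;
        void *user;     /* would be a __user pointer in the kernel */
    };
    bool is_kernel;
} tagged_ptr_t;         /* hypothetical name */

static tagged_ptr_t make_kernel_ptr(void *p)
{
    return (tagged_ptr_t){ .kernel = p, .is_kernel = true };
}

static tagged_ptr_t make_user_ptr(void *p)
{
    return (tagged_ptr_t){ .user = p, .is_kernel = false };
}

/* Check for NULL through whichever member the flag says is valid. */
static bool tagged_ptr_is_null(tagged_ptr_t sp)
{
    return sp.is_kernel ? sp.kernel == NULL : sp.user == NULL;
}
```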

Security Enhancements

v6: shrink lib/string.i via IWYU

This patch series changes the include list of string.c to minimize the preprocessing size. The patch series intends to remove REPEAT_BYTE from kernel.h and move it into its own header file, because word-at-a-time.h has an implicit dependency on it while it is declared in the bloated kernel.h.
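For readers unfamiliar with the macro, REPEAT_BYTE replicates one byte value into every byte of an unsigned long, which is what word-at-a-time string routines build their masks from. The definition below is believed to match the kernel's, but treat it as an assumption; the helper `byte_of` is hypothetical and exists only to inspect the result.

```c
#include <stddef.h>

/* Replicate byte x into every byte of an unsigned long. */
#define REPEAT_BYTE(x) ((~0ul / 0xff) * (x))

/* Return byte i (0 = least significant) of word. */
static unsigned char byte_of(unsigned long word, size_t i)
{
    return (unsigned char)(word >> (8 * i));
}
```

The trick is that ~0ul / 0xff yields a word whose every byte is 0x01, so multiplying by a byte value copies that value into each position, whatever the word size.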

BPF

v1: bpf-next: bpf: introduce BPF_MAP_TYPE_RELAY

The patch set introduces a new type of map, BPF_MAP_TYPE_RELAY, based on the relay interface [0]. It provides a way to transfer data that is persistent and overwritable.

Related Technology News

Qemu

v2: target/riscv: SMBIOS support for RISC-V virt machine

Generate SMBIOS tables for the RISC-V mach-virt. Add CONFIG_SMBIOS=y to the RISC-V default config.

v1: target/riscv/tcg: do not set defaults for non-generic

riscv_cpu_options[] are exported using qdev and some of them are defined with default values. This is unfortunate since riscv_cpu_add_user_properties() is called after CPU instance init and there is no clear way to disable MMU/PMP for some CPUs.

v1: RISC-V: ACPI: Enable SPCR

This series focuses on enabling the Serial Port Console Redirection (SPCR) table for the RISC-V virt platform. Considering that ARM utilizes the same function, the initial patch involves migrating the build_spcr function to common code. This consolidation ensures that RISC-V avoids duplicating the function.

Buildroot

package/gdb: add support for GDB 14.1

commit: https://git.buildroot.net/buildroot/commit/?id=a9a56ab6fd98125ca09078bdeb7c8d55d53aa35e

branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master

All patches are still relevant, and have been rebased on top of GDB 14.1.

configs/qemu_riscv64_virt_efi: new defconfig

commit: https://git.buildroot.net/buildroot/commit/?id=8219955118fee56ccd3ca8a13a6350d0e15de418

branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master

boot/grub2: add RISC-V 64bit EFI support

commit: https://git.buildroot.net/buildroot/commit/?id=f439b47ed6e987306c7de6d9c3be11de04935377

branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master

Grub can be built as a RISC-V UEFI application since commit [1]. This commit was first included in grub version 2.04.

U-Boot

v1: rtc: driver for Goldfish RTC

The Goldfish RTC is a virtual device which may be supplied by QEMU. It is enabled by default on QEMU’s RISC-V virt machine.

Provide a driver and enable it by default on RISC-V QEMU.

v1: smbios: riscv: set correct SMBIOS processor family value

Many processor type values exceed 0xff and have to be stored as u16 values. In the type 4 table, set processor_family = 0xfe to signal that the processor_family2 field is used, and write the actual value into the processor_family2 field.
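The encoding rule described above can be sketched as a small helper. This is a minimal illustration, not U-Boot's code: `struct cpu_info` is a hypothetical two-field subset of a type 4 table, the function and macro names are made up, and the value 0x201 used in the usage note is simply an example above 0xff (we believe it is the RV64 family code, but treat that as an assumption).

```c
#include <stdint.h>

/* Marker meaning "the real value is in processor_family2". */
#define FAMILY_USE_FAMILY2 0xfe

struct cpu_info {               /* hypothetical subset of a type 4 table */
    uint8_t  processor_family;
    uint16_t processor_family2;
};

/*
 * Store a processor family value. Values that do not fit in one byte are
 * written to processor_family2 and flagged with 0xfe in processor_family.
 * The reserved values 0xfe and 0xff are not treated specially here.
 */
static void set_processor_family(struct cpu_info *t, uint16_t family)
{
    if (family > 0xff)
        t->processor_family = FAMILY_USE_FAMILY2;
    else
        t->processor_family = (uint8_t)family;

    t->processor_family2 = family;
}
```

Calling set_processor_family(&t, 0x201) leaves 0xfe in processor_family and 0x201 in processor_family2, so a reader that sees 0xfe knows to consult the second field.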

GIT PULL: u-boot-riscv/next

The following changes since commit 4b151562bb8e54160adedbc6a1c0c749c00a2f84:

bootmeth: pass size to efi_binary_run() (2023-12-22 10:36:50 -0500)

are available in the Git repository at:

https://source.denx.de/u-boot/custodians/u-boot-riscv.git next


