泰晓科技 -- Focusing on Linux: tracing things to their source, seeing the whole from the details!
Website: https://tinylab.org

RISC-V Linux Kernel and Related Technology News, Issue 96

Written by 呀呀呀 on 2024/06/18

Date: 20240616
Editor: 晓瑜
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support

v1: clk: thead: Add support for TH1520 AP_SUBSYS clock controller

This series adds support for the AP sub-system clock controller in the T-Head TH1520.

v8: Linux RISC-V IOMMU Support

This patch series introduces support for RISC-V IOMMU architected hardware into the Linux kernel.

v2: RISC-V IOMMU HPM and nested IOMMU support

This series includes RISC-V IOMMU hardware performance monitor and nested IOMMU support.

v1: RISC-V: Dynamically allocate cpumasks and further increase range and default value of NR_CPUS

Currently the default NR_CPUS is 64 for riscv64. Since the latest QEMU virt machine supports up to 512 CPUs, set the default NR_CPUS to 512 for riscv64.

v2: RISC-V: Detect and report speed of unaligned vector accesses

Adds support for detecting and reporting the speed of unaligned vector accesses on RISC-V CPUs.

v2: riscv: Per-thread envcfg CSR support

This series (or equivalent) is a prerequisite for both user-mode pointer masking and CFI support, as those per-thread features are controlled by fields in the envcfg CSR.

v1: riscv: ftrace: atomic patching and preempt improvements

This series makes atomic code patching possible in riscv ftrace.

v3: riscv: dmi: Add SMBIOS/DMI support

Enable the DMI driver for riscv, which allows access to the SMBIOS info through userspace files (/sys/firmware/dmi/*).

v1: mmc-spi - support controllers incapable of getting as low as 400KHz

RFC for some stuff that I’ve got in-progress for a customer’s board where they want to use mmc-spi-slot with a QSPI controller that is incapable of getting as low as 400KHz with the way clocks have been configured on the system.

v2: PCI: microchip: support using either instance 1 or 2

This series splits the second reg property in two, with dedicated “control” and “bridge” entries so that either instance can be used.

v2: Add board support for Sipeed LicheeRV Nano

The LicheeRV Nano is a RISC-V SBC based on the Sophgo SG2002 chip. Adds minimal device tree files for this board to make it boot to a basic shell.

v5: vmalloc: Modify the alloc_vmap_area() error message for better diagnostics

With the update, the output gets modified to include the function parameters along with the start and end of the virtual memory range allowed.

v1: riscv: vdso: do not strip debugging info for vdso.so.dbg

vdso.so.dbg is a debug version of the vDSO and can be used for debugging purposes.

v1: function_graph: ftrace_graph_ret_addr(); there can be only one!

Looking for an architecture that did not have it defined, I couldn’t find any. So I removed it.

v2: riscv: Add support for xtheadvector

All of the vector routines have been modified to support this alternative vector version based upon whether xtheadvector was determined to be supported at boot.

v2: riscv: Separate vendor extensions from standard extensions

This also allows each vendor to be conditionally enabled through Kconfig.

v6: Risc-V Svinval support

This patch adds support for the Svinval extension as defined in the RISC-V Privileged specification.

LoongArch Architecture Support

v6: LoongArch: KVM: Add PMU support

On LoongArch, the host and guest each have their own PMU CSRs, and they share the PMU hardware resources. A set of PMU CSRs consists of a CTRL register and a CNTR register.

v1: LoongArch: Add Loongson-3 CPUFreq driver support

This series adds architectural preparation and a CPUFreq driver for Loongson-3 (based on LoongArch).

v2: LoongArch: KVM: Implement feature passing from user space

Currently, the features defined in cpucfg CPUCFG_KVM_FEATURE come from KVM kernel mode only.

Process Scheduling

v1: sched/numa: scan the vma if it has not been scanned for a while

This patch is mainly meant to raise this question and to seek suggestions from the community on how to handle it properly. Thanks in advance for any suggestions.

v3: sched/fair: Preempt if the current process is ineligible

This will increase the scheduling delay of other processes.

v1: sched/fair: prefer available idle cpu in select_idle_core

When an idle core cannot be found, the first sched-idle CPU or the first available idle CPU will be used, if one exists.

v2: perf sched map: Add command-name, fuzzy-name options to filter the output map

By default, perf sched map prints sched-in events for all tasks, which may not always be needed, as it prints a lot of symbols and rows to the terminal.

Memory Management

v2: mm: swap: mTHP swap allocator base on swap cluster order

This is the short-term solution, “swap cluster order”, listed on slide 8 of my “Swap Abstraction” discussion at the recent LSF/MM conference.

v2: add mseal to /proc/pid/smaps

Add mseal information in /proc/pid/smaps to indicate the VMA is sealed.

v1: Enhancements to Page Migration with Batch Offloading via DMA

This series introduces enhancements to the page migration code to optimize the “folio move” operations by batching them and enable offloading on DMA hardware accelerators.

v1: mm: truncate: flush lru cache for evicted inode

Flush lru cache to avoid folio->mapping uaf in case of inode teardown.

v5: mm: store zero pages to be swapped out in a bitmap

As shown in the patch series that introduced the zswap same-filled optimization [1], 10-20% of the pages stored in zswap are same-filled. This is also observed across Meta’s server fleet.
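
The idea in the title can be illustrated with a small userspace sketch: if a page is entirely zero, only a bit is recorded for its swap slot, and the page is rebuilt with memset() on the way back in. This is a sketch under assumptions, not the code from the series; the names `zero_map`, `swap_out_needs_io()` and `swap_in_from_bitmap()`, the slot count, and the 4096-byte page size are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_BYTES 4096u   /* assumed page size */
#define NR_SLOTS        1024u   /* hypothetical number of swap slots */
#define BITS_PER_WORD   (8u * sizeof(unsigned long))

/* One bit per swap slot: a set bit means "this slot holds an all-zero page". */
static unsigned long zero_map[(NR_SLOTS + BITS_PER_WORD - 1) / BITS_PER_WORD];

/* Return true if every byte of the page is zero. */
bool page_is_zero(const uint8_t *page)
{
    for (size_t i = 0; i < PAGE_SIZE_BYTES; i++)
        if (page[i] != 0)
            return false;
    return true;
}

/* "Swap out": record zero pages in the bitmap; return whether real I/O is needed. */
bool swap_out_needs_io(size_t slot, const uint8_t *page)
{
    assert(slot < NR_SLOTS);
    if (page_is_zero(page)) {
        zero_map[slot / BITS_PER_WORD] |= 1UL << (slot % BITS_PER_WORD);
        return false;
    }
    zero_map[slot / BITS_PER_WORD] &= ~(1UL << (slot % BITS_PER_WORD));
    return true;
}

/* "Swap in": if the bit is set, rebuild the page in memory and skip the read. */
bool swap_in_from_bitmap(size_t slot, uint8_t *page)
{
    assert(slot < NR_SLOTS);
    if (zero_map[slot / BITS_PER_WORD] & (1UL << (slot % BITS_PER_WORD))) {
        memset(page, 0, PAGE_SIZE_BYTES);
        return true;
    }
    return false;
}
```

A caller would try `swap_out_needs_io()` first and fall back to the normal write path only when it returns true. The cost is one bit per slot plus a scan of the page.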

v4: maple_tree: modified return type of mas_wr_store_entry()

Since the return value of mas_wr_store_entry() is not used, the return type can be changed to void.

v6: DAMON based tiered memory management for CXL memory

An RFC idea, “DAMOS-based Tiered-Memory Management”, was posted previously.

v1: um/mm: get max_low_pfn from memblock

It is intended to set max_low_pfn to the same value as max_pfn.

v8: Reclaim lazyfree THP without splitting

This series adds support for reclaiming PMD-mapped THP marked as lazyfree without needing to first split the large folio via split_huge_pmd_address().

v1: mm: memcontrol: add VM_BUG_ON_FOLIO() to catch lru folio in mem_cgroup_migrate()

mem_cgroup_migrate() clears the memcg data of the old folio, so callers must make sure the old folio is no longer on the LRU list. Otherwise the old folio cannot get the correct lruvec object without the memcg data, which could lead to problems.

v15: mm/gup: Introduce memfd_pin_folios() for pinning memfd folios

This is not desirable because the pages/folios may reside in Movable zone or CMA block.

**[v2: mm/mm_init.c: simplify logic of deferred_[init|free]_pages](http://lore.kernel.org/linux-mm/20240613114525.27528-1-richard.weiyang@gmail.com/)**

The functions deferred_[init|free]_pages() are only used in deferred_init_maxorder(), which makes sure the range to init/free is within MAX_ORDER_NR_PAGES in size.

v1: asynchronously scan and free empty user PTE pages

This series aims to asynchronously scan and free empty user PTE pages.

v2: Improve the copy of task comm

Using {memcpy,strncpy,strcpy,kstrdup} to copy the task comm relies on the length of task comm.

v5: mm/memblock: Add “reserve_mem” to reserved named memory at boot up

Reserve an unspecified location of physical memory from the kernel command line.

v1: mm: Do not start/end writeback for pages stored in zswap

Most of the work done in folio_start_writeback() is reversed in folio_end_writeback(). There is some extra work done in folio_end_writeback(); however, it is incorrect or not applicable to zswap.

v3: -next: mm/hugetlb_cgroup: rework on cftypes

This patchset provides an intuitive view of the control files through static templates of cftypes, improving the readability of the code.

v2: Supports to use the default CMA when the device-specified CMA memory is not enough.

This patch will use the default cma region when the device’s specified CMA is not enough.

v1: Introduce tracepoint for hugetlbfs

Here we add some basic tracepoints for debugging hugetlbfs: {alloc, free, evict}_inode, setattr and fallocate.

v2: Enable P2PDMA in Userspace RDMA

This patch series enables P2PDMA memory to be used in userspace RDMA transfers.

File Systems

v2: fs: modify the annotation of vfs_mkdir() in fs/namei.c

Modify the annotation of @dir and @dentry.

v1: fs/file.c: optimize the critical section of

These 3 patches are created to reduce the critical section of file_lock in alloc_fd() and close_fd().

v1: KVM: PPC: Book3S HV: Prevent UAF in kvm_spapr_tce_attach_iommu_group()

It looks up `stt` from tablefd, but then continues to use it after doing fdput() on the returned fd.

v1: stop lockref from degrading to locked-only ops

Speed up parallel lookups of the same terminal inode.

v1: Initial LKMM atomics support in Rust

This is a follow-up of [1]. Thanks for all the inputs from that thread.

v4: rcu-based inode lookup for iget*

Revamped the commit message for patch 1, explicitly spelling out a bunch of things and adding bpftrace output.

v2: inode_init_always zeroing i_state

I diffed this against fs-next + my inode hash patch v3 as it adds one i_state = 0 case.

**[v4: ioctl()-based API to query VMAs from /proc/<pid>/maps](http://lore.kernel.org/linux-fsdevel/20240611110058.3444968-1-andrii@kernel.org/)**

Implement a binary ioctl()-based interface to the /proc/<pid>/maps file to allow applications to query VMA information more efficiently than reading *all* VMAs nonselectively through the text-based interface of the /proc/<pid>/maps file.

v1: vfs: partially sanitize i_state zeroing on inode creation

Additionally iget5_locked performs i_state = 0 assignment without any locks to begin with and the two combined look confusing at best.

v1: -mm: nilfs2: eliminate the call to inode_attach_wb()

This series eliminates the inode_attach_wb() call from nilfs2, which was introduced as a workaround for a kernel bug but is suspected of layer violation (in fact, it is undesirable since it exposes a reference to the backing device).

v8: block atomic writes

This series introduces a proposal to implement atomic writes in the kernel for torn-write protection.

v1: UAF in acrn_irqfd_assign() and vfio_virqfd_enable()

I’m not familiar with the area, though, so that might be unfeasible for any number of reasons.

v2: Introduce user namespace capabilities

This patch series introduces a new user namespace capability set, as well as some plumbing around it (i.e. sysctl, secbit, lsm support).

v1: netfs: Switch debug logging to pr_debug()

Instead of inventing a custom way to conditionally enable debugging, just make use of pr_debug(), which also has dynamic debugging facilities and is more likely known to someone who hunts a problem in the netfs code.

v1: blk: optimization for classic polling

This removes the dependency on interrupts to wake up the task. While polling for IO completion, set the task state to TASK_RUNNING if need_resched() returns true.

Network Devices

v1: s390/lcs: add missing MODULE_DESCRIPTION() macro

Add the missing invocation of the MODULE_DESCRIPTION() macro.

v1: net: tipc: force a dst refcount before doing decryption

On TIPC decryption path it has the same problem, and skb_dst_force() should be called before doing decryption to avoid a possible crash.

v1: net: wifi: cfg80211: restrict NL80211_ATTR_TXQ_QUANTUM values

We had a similar issue in sch_fq, fixed with commit d9e15a273306 (“pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM”).

v1: net: ipv6: prevent possible NULL dereference in rt6_probe()

syzbot caught a NULL dereference in rt6_probe(). Bail out if __in6_dev_get() returns NULL.

v2: net-next: Introduce PHY mode 10G-QXGMII

This patch series adds 10G-QXGMII mode for PHY driver.

v2: net: neighbour: add RTNL_FLAG_DUMP_SPLIT_NLM_DONE to RTM_GETNEIGH

v4: net-next: net: stmmac: Enable TSO on VLANs

The TSO engine works well when the frames are not VLAN tagged, but it produces broken segments when frames are VLAN tagged.

v1: virtio_net: Eliminate OOO packets during switching

Disable the network device & turn off carrier before modifying the number of queue pairs. Process all the in-flight packets and then turn on carrier, followed by waking up all the queues on the network device.

v2: net-next: net: mana: Add support for page sizes other than 4KB on ARM64

As defined by the MANA hardware spec, the queue size for DMA must be at least 4KB and a power of 2. In addition, the HWC queue size has to be exactly 4KB.
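
The two constraints in this summary can be written down as simple predicates. The following is only an illustrative sketch, not code from the MANA driver; the function names and the `MIN_QUEUE_SIZE` constant are hypothetical, and the rounding helper is an addition for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_QUEUE_SIZE 4096u  /* the 4KB minimum stated in the summary */

/* A DMA queue size must be at least 4KB and a power of two. */
bool dma_queue_size_ok(uint32_t size)
{
    return size >= MIN_QUEUE_SIZE && (size & (size - 1)) == 0;
}

/* The HWC queue size must be exactly 4KB. */
bool hwc_queue_size_ok(uint32_t size)
{
    return size == MIN_QUEUE_SIZE;
}

/* Round a requested size up to the next valid DMA queue size. */
uint32_t dma_queue_size_round_up(uint32_t size)
{
    uint32_t valid = MIN_QUEUE_SIZE;

    assert(size <= (1u << 31)); /* larger requests cannot be rounded up in 32 bits */
    while (valid < size)
        valid <<= 1;
    return valid;
}
```

Under these rules a 64KB queue is acceptable for DMA but not for the HWC, which is why a kernel page size larger than 4KB cannot simply be used as the queue size everywhere.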

v5: bpf-next: netfilter: Add the capability to offload flowtable in XDP layer

This series has been tested by running the xdp_flowtable_offload eBPF program on an ixgbe 10Gbps NIC (eno2) in order to XDP_REDIRECT the TCP traffic to a veth pair (veth0-veth1) based on the content of the nf_flowtable as soon as the TCP connection is in the established state.

v1: qca_spi: Make interrupt remembering atomic

The whole mechanism to remember occurred SPI interrupts is not atomic, which could lead to unexpected behavior. So fix this by using atomic bit operations instead.
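
As a rough userspace analogy for "remembering" an interrupt atomically, the sketch below uses C11 atomics: one side sets a flag bit with an atomic OR, the other tests and clears it in a single atomic step, so an event cannot be lost between the test and the clear. The kernel has its own atomic bit operations for this; the names `pending_flags`, `remember_event()`, `take_event()` and `FLAG_SPI_INTR` here are hypothetical and do not come from the qca_spi driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_SPI_INTR 0x1u  /* hypothetical flag bit: "an interrupt occurred" */

static atomic_uint pending_flags;  /* static storage, so it starts at zero */

/* Interrupt side: atomically remember that the event happened. */
void remember_event(unsigned int flag)
{
    assert(flag != 0);
    atomic_fetch_or(&pending_flags, flag);
}

/* Worker side: atomically test and clear the flag in one step. */
bool take_event(unsigned int flag)
{
    unsigned int old;

    assert(flag != 0);
    old = atomic_fetch_and(&pending_flags, ~flag);
    return (old & flag) != 0;
}
```

With a plain read followed by a separate write, an interrupt arriving between the two could be overwritten; the single read-modify-write operation closes that window.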

v3: net-next: net: pse-pd: Add new PSE c33 features

This patch series adds new c33 features to the PSE API.

v1: iproute2: Multiple Spanning Tree (MST) Support

This series adds Multiple Spanning Tree (MST) support to iproute2.

v7: af_packet: Handle outgoing VLAN packets without hardware offloading

The issue initially stems from libpcap. The ethertype will be overwritten as the VLAN TPID if the network interface lacks hardware VLAN offloading.

[net-next,PATCH 0/2] Series to deliver Ethernet for STM32MP25

STM32MP25 is an STM32 SoC with two GMAC instances.

v1: netns: Make get_net_ns() handle zero refcount net

Syzkaller hit a warning: refcount_t: addition on 0; use-after-free.

v1: net: tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack()

Some applications were reporting ETIMEDOUT errors on apparently good looking flows, according to packet dumps.

v1: wifi: mt76: un-embedd netdev from mt76_dev

Embedding net_device into structures prohibits the usage of flexible arrays in the net_device structure.

v2: net: missing check virtio

But this code is new, it complements what is done.

v6: net: Handle new Microchip KSZ 9897 Errata

These patches implement some suggested workarounds from the Microchip KSZ 9897 Errata.

[net PATCH] net: stmmac: No need to calculate speed divider when offload is disabled

Commit be27b8965297 (“net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters”) introduced a problem. The speed divider needs to be calculated only when offload is enabled.

v1: net: ipv6: prevent possible NULL deref in fib6_nh_init()

syzbot reminds us that in6_dev_get() can return NULL.

v3: net/mlx5: Reclaim max 50K pages at once

In non FLR context, at times CX-5 requests release of 8 million FW pages.
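
The title suggests bounding how much is reclaimed per request instead of handling millions of pages in one go. A minimal sketch of that batching pattern follows; it is not the mlx5 code. `reclaim_once()` is a hypothetical stand-in for a single firmware reclaim command, and "50K" is assumed here to mean 50,000 pages.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RECLAIM_PER_REQ 50000u  /* assumption: "50K" read as 50,000 pages */

/* Hypothetical stand-in for one reclaim command; returns the pages reclaimed. */
static uint32_t reclaim_once(uint32_t npages)
{
    return npages;
}

/* Reclaim `total` pages in bounded batches; return the number of commands issued. */
uint32_t reclaim_in_batches(uint32_t total)
{
    uint32_t commands = 0;

    while (total > 0) {
        uint32_t batch = total < MAX_RECLAIM_PER_REQ ? total : MAX_RECLAIM_PER_REQ;
        uint32_t done = reclaim_once(batch);

        assert(done > 0 && done <= batch); /* each command must make progress */
        total -= done;
        commands++;
    }
    return commands;
}
```

Under these assumptions, a request for 8 million pages becomes 160 bounded commands, and other work can be scheduled between them.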

v5: net-next: virtio-net: support AF_XDP zero copy

v5: net-next: net: A lightweight zero-copy notification

The original notification mechanism needs poll + recvmmsg, which is not easy for applications to accommodate. It also incurs non-negligible overhead, including extra system calls and usage of socket optmem.

v1: net-next: net: make for_each_netdev_dump() a little more bug-proof

I find the behavior of xa_for_each_start() slightly counter-intuitive.

v1: net-next: mlx5 misc patches 2023-06-13

This patchset contains small code cleanups and enhancements from the team to the mlx5 core and Eth drivers.

v11: VMware hypercalls enhancements

VMware hypercall invocations were spread out across the kernel, each implementing the same ABI as in-place inline asm. With encrypted memory and confidential computing, it became harder to maintain every change in these hypercall implementations.

v3: nfsd/sunrpc: allow starting/stopping pooled NFS server via netlink

This is a resend of the patchset I sent a little over a week ago, with a couple of new patches that allow setting the pool-mode via netlink.

v1: mlx5-next: RDMA/mlx5: Add Qcounters req_transport_retries_exceeded/req_rnr_retries_exceeded

The req_transport_retries_exceeded counter shows the number of times the requester detected a transport-retries-exceeded error.

GIT PULL: Networking for v6.10-rc4

Slim pickings this time, probably a combination of summer, DevConf.cz, and the end of first half of the year at corporations.

v6: net-next: Introduce auxiliary bus IRQs sysfs

Today, PCI PFs and VFs, which are anchored on the PCI bus, display their IRQ information in the /msi_irqs/ sysfs files.

v1: net-next: net: phy: dp83867: add cable diag support

This series adds more diagnostics of the physical medium to the DP83867.

v1: fec_main: Register net device before initializing the MDIO bus

Registration of the FEC MDIO bus triggers a probe of all devices connected to that bus.

v1: net: add RTNL_FLAG_DUMP_SPLIT_NLM_DONE to RTM_GET(RULE/ROUTE)

v1: net-next: mlxsw: Handle MTU values

The driver uses two values for maximum MTU, but neither is accurate. Add test cases to check that the exposed values are really supported.

Security Hardening

v1: powerpc/pseries: Whitelist dtl slub object for copying to userspace

Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-* results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as shown below.

v1: pstore: platform: add missing MODULE_DESCRIPTION() macro

Add the missing invocation of the MODULE_DESCRIPTION() macro.

v1: efi/arm: Disable LPAE PAN when calling EFI runtime services

EFI runtime services are remapped into the lower 1 GiB of virtual address space at boot, so they are guaranteed to be able to co-exist with the kernel virtual mappings without the need to allocate space for them in the kernel’s vmalloc region, which is rather small.

v4: net-next: net: mana: Allow variable size indirection table

Allow variable size indirection table allocation in MANA instead of using a constant value MANA_INDIRECT_TABLE_SIZE.

v1: Add per-core RAPL energy counter support for AMD CPUs

This patchset adds a new “power_per_core” PMU alongside the existing “power” PMU, which will be responsible for collecting the new “energy-per-core” event. This patchset applies cleanly on top of v6.10-rc3 as well as latest tip/master.

v1: can: treewide: decorate flexible array members with __counted_by()

A new __counted_by() attribute was introduced in [1].
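
For readers unfamiliar with the attribute, the sketch below shows the general shape of annotating a flexible array member with the field that holds its element count. It is a userspace illustration, not code from the CAN series: the `COUNTED_BY()` macro, `struct sample_buf` and `sample_buf_alloc()` are hypothetical, and the macro expands to nothing on compilers that lack the `counted_by` attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute only where the compiler supports it; otherwise expand to nothing. */
#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define COUNTED_BY(member) __attribute__((counted_by(member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member)
#endif

/* Hypothetical structure: `len` records how many elements `data` holds. */
struct sample_buf {
    size_t len;
    unsigned char data[] COUNTED_BY(len);
};

/* Allocate a zeroed buffer and set the count before the array is ever accessed. */
struct sample_buf *sample_buf_alloc(size_t len)
{
    struct sample_buf *buf;

    assert(len <= (size_t)-1 - sizeof(*buf)); /* guard the size computation */
    buf = calloc(1, sizeof(*buf) + len);
    if (buf)
        buf->len = len;
    return buf;
}
```

The annotation gives bounds-checking tooling a way to relate the array to its length; the count field therefore has to be initialized before the first access to the array.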

Asynchronous I/O

v1: for-next: io_uring/io-wq: make io_wq_work flags atomic

The work flags can be set/accessed from different tasks, both the originator of the request, and the io-wq workers.

Rust For Linux

v3: Refactor perf python module build

Refactor the perf python module build so that, instead of building C files, it links libraries. To support this, make static libraries for tests, ui, util and pmu-events. Doing this allows fewer functions to be stubbed out; importantly, parse_events is no longer stubbed out, which will improve the ability to work with heterogeneous cores.

By not building .c files for the python module and for the build of perf, this should also help build times.

v6: Rust block device driver API and null block driver

This series provides an initial Rust block layer device driver API, and a very minimal null block driver to exercise the API. The driver has only one mode of operation and cannot be configured.

v2: Rust abstractions for Device & Firmware

As agreed in [1], this is the separate series for the device and firmware abstractions to unblock the inclusion of Fujita’s PHY driver.

Originally, those patches were part of the patch series [2][3].

v2: Tracepoints and static branch in Rust

This patch series adds support for calling tracepoints declared in C from Rust.

BPF

v1: bpf, devmap: Add .map_alloc_check

Use the .map_alloc_check callback to perform allocation checks before allocating memory for the devmap.

v1: perf trace: Augment enum syscall arguments with BTF

In this patch, BTF is used to turn an enum value into the corresponding enum variable name. There is only one system call that takes an enum value as its argument: `landlock_add_rule()`.

[PATCH RESEND bpf-next v3 0/5] bpf: make trusted args nullable

Current verifier checks for the arg to be nullable after checking for certain pointer types.

v4: bpf-next: Regular expression support for test output matching

This version fixes v3 review comments from Eduard.

v6: bpf-next: bpf: support resilient split BTF

Split BPF Type Format (BTF) provides huge advantages in that kernel modules only have to provide type information for types that they do not share with the core kernel.

v4: perf trace: BTF-based enum pretty printing

v3: bpf-next: bpf: Track delta between “linked” registers.

v12: net-next: Device Memory TCP

v5: bpf-next: bpf: Support dumping kfunc prototypes from BTF

This patchset enables both detecting as well as dumping compilable prototypes for kfuncs.

v6: bpf-next: Support kCFI + BPF on arm64

Adds CFI checks to BPF dispatchers on aarch64.

v2: bpf-next: use network helpers, part 7

Related Technology News

Qemu

v4: Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions

Sorry for the quick version update; this version includes the fix for the cross-page probe checking bug that I forgot to apply to v3.

v1: Support RISC-V CSR read/write in Qtest environment

These patches add functionality for unit testing RISC-V-specific registers. The first patch adds a Qtest backend, and the second implements a simple test.

v1: Implements RISC-V WorldGuard extension v0.4

This patchset implements the Smwg/Smwgd/Sswg CPU extensions and the wgChecker device defined in WorldGuard spec v0.4.

v7: Support RISC-V IOPMP

This series implements basic functions of IOPMP specification v0.9.1 rapid-k model.

U-Boot

v2: clk: sophgo: milkv_duo: Add and enable clock controller driver

This series of patches introduces the clock controller driver for the Sophgo CV1800B SoC, updates the device tree sources to use the new clock controller, and enables the clock controller in the configuration for the Milk-V Duo board.


