泰晓科技 (TinyLab) -- Focused on Linux: trace things to their source, see the whole from small details!
Website: https://tinylab.org

RISC-V Linux Kernel and Related Technology News, Issue 54

Written by 呀呀呀 on 2023/07/26

Date: 20230721
Editor: 晓依
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support

v1: bpf-next: bpf, riscv: use BPF prog pack allocator in BPF JIT

BPF programs currently consume a page each on RISC-V. For systems with many BPF programs, this adds significant pressure to the instruction TLB. High iTLB pressure usually causes a slowdown for the whole system.
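
As a rough illustration of the page-count argument, the sketch below compares one page per program with packing several programs into shared pages. The 4096-byte page size, the 64-byte chunk granularity, and the program sizes are assumptions chosen for the example, not values taken from the patch, and the packed model ignores fragmentation.

```python
PAGE_SIZE = 4096   # assumed page size
CHUNK_SIZE = 64    # assumed packing granularity (hypothetical)


def round_up(value, unit):
    """Round value up to the next multiple of unit."""
    return -(-value // unit) * unit


def pages_one_per_prog(sizes):
    """Each program occupies its own page(s), rounded up to whole pages."""
    return sum(round_up(size, PAGE_SIZE) // PAGE_SIZE for size in sizes)


def pages_packed(sizes):
    """Programs share pages; each one is rounded up to a chunk boundary."""
    total = sum(round_up(size, CHUNK_SIZE) for size in sizes)
    return round_up(total, PAGE_SIZE) // PAGE_SIZE


prog_sizes = [200, 600, 1500, 300]  # made-up JIT image sizes in bytes
print(pages_one_per_prog(prog_sizes), pages_packed(prog_sizes))
```

Fewer executable pages means fewer iTLB entries are needed to cover the same set of programs, which is the effect the series is after.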

v4: riscv: entry: set a0 = -ENOSYS only when syscall != -1

When we tested seccomp with the 6.4 kernel, we found that errno has the wrong value. If we deny NETLINK_AUDIT with EAFNOSUPPORT, after f0bddf50586d, we will get ENOSYS instead. We got the same result with commit 9c2598d43510 (“riscv: entry: Save a0 prior syscall_enter_from_user_mode()”).
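
The rule in the patch title can be modelled in a few lines. This is a minimal sketch, not the kernel's entry code: `seccomp_filter()` is a hypothetical stand-in for syscall_enter_from_user_mode(), the register file is a plain dict, and the table size is made up.

```python
import errno

NR_SYSCALLS = 4  # tiny hypothetical syscall table size


def seccomp_filter(regs, syscall, denied):
    """Hypothetical stand-in for syscall_enter_from_user_mode().

    On denial it stores -errno in a0 and returns -1, meaning "skip the call".
    """
    if syscall in denied:
        regs["a0"] = -denied[syscall]
        return -1
    return syscall


def do_syscall(regs, denied):
    syscall = seccomp_filter(regs, regs["a7"], denied)
    if 0 <= syscall < NR_SYSCALLS:
        regs["a0"] = 0                  # pretend the handler succeeded
    elif syscall != -1:
        regs["a0"] = -errno.ENOSYS      # only for genuinely invalid numbers
    # syscall == -1: leave a0 alone so the filter's errno survives
    return regs["a0"]


print(do_syscall({"a0": 0, "a7": 2}, {2: errno.EAFNOSUPPORT}))
```

Without the `syscall != -1` guard, the denied call would have its a0 overwritten with -ENOSYS, which is the symptom described above.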

v2: Add SiFive Private L2 cache and PMU driver

This patch series adds the SiFive Private L2 cache controller driver and Performance Monitoring Unit (PMU) driver.

v1: riscv: add SBI SUSP extension support

RISC-V SBI spec 2.0 [1] introduces System Suspend Extension which can be used to suspend the platform via SBI firmware.

v1: Linux RISC-V IOMMU Support

The RISC-V IOMMU specification is now ratified as-per the RISC-V international process [1]. The latest frozen specification can be found at: https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf

GIT PULL: StarFive clock driver additions for v6.6

Please pull some clock driver additions for StarFive. I’ve had these commits, other than a rebase to pick up R-b tags from Emil, out for LKP to have a look at for a few days and they’ve gotten a clean bill of health. Some of the dt-binding stuff “only” has a review from me, but since I am a dt-binding maintainer that’s fine, although maybe not common knowledge yet.

v2: gpio: sifive: Module support

With the call to of_irq_count() removed, the SiFive GPIO driver can be built as a module. This helps to minimize the size of a multiplatform kernel, and is required by some downstream distributions (Android GKI).

v1: Risc-V Kvm Smstateen

This series adds support to detect the Smstateen extension for both the host and the guest vcpu. It also adds senvcfg and sstateen0 to the ONE_REG interface and the vcpu context save/restore.

v6: Linux RISC-V AIA Support

The RISC-V AIA specification is now frozen as-per the RISC-V international process. The latest frozen specification can be found at: https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf

v1: Refactoring Microchip PolarFire PCIe driver

The final purpose of this patchset is to add a PCIe driver for the StarFive JH7110 SoC. The JH7110 uses the PLDA XpressRICH PCIe IP. Microchip PolarFire uses the same IP and has already committed its code, which mixes PLDA controller code with Microchip platform code.

v5: Add initialization of clock for StarFive JH7110 SoC

This patchset adds initial rudimentary support for the StarFive Quad SPI controller driver. This driver will be used in StarFive’s VisionFive 2 board. In 6.4, the QSPI_AHB and QSPI_APB clocks changed from the default ON state to the default OFF state, so these clocks need to be enabled in the driver. At the same time, a dts patch is added to this series.

v3: riscv: Reduce ARCH_KMALLOC_MINALIGN to 8

Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, i.e. 64 bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support a unified kernel Image, we usually have to enable CONFIG_RISCV_DMA_NONCOHERENT, which brings some bad effects to coherent platforms:

Firstly, it wastes memory: the kmalloc-96, kmalloc-32, kmalloc-16 and kmalloc-8 slab caches don’t exist any more; they are replaced with either kmalloc-128 or kmalloc-64.

v1: asm-generic: ticket-lock: Optimize arch_spin_value_unlocked

Using arch_spinlock_is_locked would cause another unnecessary memory access to the contended value. Although it won’t cause a significant performance gap in most architectures, the arch_spin_value_unlocked argument contains enough information. Thus, remove unnecessary atomic_read in arch_spin_value_unlocked().
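
The point that the lock value already "contains enough information" can be sketched as follows. The layout assumed here (a 32-bit word with the `next` ticket in the high 16 bits and the `owner` ticket in the low 16 bits) is the usual generic ticket-lock layout and is stated as an assumption; the helper names mirror the kernel's but this is only a model.

```python
def arch_spin_value_unlocked(val):
    """The lock is free when the 'next' and 'owner' tickets are equal.

    val is an already-read 32-bit lock word, so no further memory access
    is needed to answer the question.
    """
    return (val >> 16) == (val & 0xFFFF)


def ticket_lock(val):
    """Take a ticket: increment 'next' (high half)."""
    return (val + (1 << 16)) & 0xFFFFFFFF


def ticket_unlock(val):
    """Serve the next ticket: increment 'owner' (low half, wrapping)."""
    return (val & 0xFFFF0000) | ((val + 1) & 0xFFFF)


word = 0
word = ticket_lock(word)
print(arch_spin_value_unlocked(word))
word = ticket_unlock(word)
print(arch_spin_value_unlocked(word))
```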

v2: riscv: entry: set a0 prior to syscall_enter_from_user_mode

When we tested seccomp with the 6.4 kernel, we found that errno has the wrong value. If we deny NETLINK_AUDIT with EAFNOSUPPORT, after f0bddf50586d, we will get ENOSYS instead. We got the same result with 9c2598d43510 (“riscv: entry: Save a0 prior syscall_enter_from_user_mode()”).

v1: riscv: Move the “Call Trace” to dump_backtrace().

It would be appropriate to show “Call Trace” within the dump_backtrace function to ensure that some kernel dumps include this information.

v2: usb: Explicitly include correct DT includes

The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it was merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they “temporarily” include each other. They also include platform_device.h and of.h. As a result, there’s a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes.

v1: irqchip/sifive-plic: Avoid clearing the per-hart enable bits

Writes to the PLIC completion register are ignored if the enable bit for that (interrupt, hart) combination is cleared. This leaves the interrupt in a claimed state, preventing it from being triggered again.
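
The failure mode can be reproduced with a toy state machine. This is a minimal sketch of one (interrupt, hart) pair under the behaviour described above, not a model of a real PLIC; the class and method names are hypothetical.

```python
class PlicModel:
    """Toy model of a single (interrupt, hart) pair."""

    def __init__(self):
        self.enabled = True
        self.claimed = False

    def claim(self):
        """The hart claims the interrupt; it stays claimed until completed."""
        if self.enabled and not self.claimed:
            self.claimed = True
            return True
        return False

    def complete(self):
        """Completion writes are ignored while the enable bit is clear."""
        if self.enabled:
            self.claimed = False

    def can_trigger(self):
        return self.enabled and not self.claimed


plic = PlicModel()
plic.claim()
plic.enabled = False   # enable bit cleared while the interrupt is claimed
plic.complete()        # ignored: the claim is never released
plic.enabled = True
print(plic.can_trigger())
```

Keeping the enable bit set across the claim/complete window avoids the stuck state, which is the direction the patch title describes.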

v11: KVM: guest_memfd() and per-page attributes

This is the next iteration of implementing fd-based (instead of vma-based) memory for KVM guests. If you want the full background of why we are doing this, please go read the v10 cover letter[1].

The biggest change from v10 is to implement the backing storage in KVM itself, and expose it via a KVM ioctl() instead of a “generic” syscall. See link[2] for details on why we pivoted to a KVM-specific approach.

v1: riscv: dts: starfive: jh71x0: Add temperature sensor nodes and thermal-zones

These patches add temperature sensor nodes and thermal-zones for the StarFive JH71X0 SoC. I have tested them on the BeagleV Starlight board and StarFive VisionFive 1 / 2 board. Thanks.

v1: clk: Explicitly include correct DT includes

The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it was merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they “temporarily” include each other. They also include platform_device.h and of.h. As a result, there’s a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes.

v1: riscv: kernel: insert space before the open parenthesis ‘(‘

Fix below checkpatch error:

/riscv/kernel/smp.c:93:ERROR: space required before the open parenthesis ‘(‘

v7: Add PLL clocks driver and syscon for StarFive JH7110 SoC

This patch series adds a PLL clocks driver and providers that work by writing and reading syscon registers for the StarFive JH7110 RISC-V SoC. It also adds documentation and nodes to describe the StarFive System Controller (syscon) registers. This patch series is based on Linux 6.4.

v2: riscv: support PREEMPT_DYNAMIC with static keys

Currently, each architecture can support PREEMPT_DYNAMIC through either static calls or static keys. To support PREEMPT_DYNAMIC on riscv, we face three choices:

1. Only add static calls support to riscv. As Mark pointed out in commit 99cf983cc8bc (“sched/preempt: Add PREEMPT_DYNAMIC using static keys”), static keys “…should have slightly lower overhead than non-inline static calls, as this effectively inlines each trampoline into the start of its callee. This may avoid redundant work, and may integrate better with CFI schemes.” So even if we add static calls (without inline static calls) to riscv, static keys are still a better choice.

v1: riscv: Add HAVE_IOREMAP_PROT support

Add the pte_pgprot macro so that riscv can have HAVE_IOREMAP_PROT, which enables the generic_access_phys() code; this is useful for debugging, e.g. with gdb.

Because generic_access_phys() calls ioremap_prot() -> pgprot_nx() to disable the executable attribute, add a definition of pgprot_nx() for riscv.

v1: Add support for Allwinner D1 CAN controllers

This patch series adds support for the Allwinner D1 CAN controllers. It requires adding a new device tree compatible and driver support to work around some hardware quirks.

v9: Add support for Allwinner GPADC on D1/T113s/R329/T507 SoCs

This series adds support for general purpose ADC (GPADC) on new Allwinner’s SoCs, such as D1, T113s, T507 and R329. The implemented driver provides basic functionality for getting ADC channels data.

v1: bpf: riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework

Commit 6724a76cff85 (“riscv: ftrace: Reduce the detour code size to half”) optimizes the detour code size of kernel functions to half with T0 register and the upcoming DYNAMIC_FTRACE_WITH_DIRECT_CALLS of riscv is based on this optimization, we need to adapt riscv bpf trampoline based on this. One thing to do is to reduce detour code size of bpf programs, and the second is to deal with the return address after the execution of bpf trampoline. Meanwhile, add more comments and rename some variables to make more sense. The related tests have passed.

v1: pwm: Consistently name pwm_chip variables “chip”

The first offenders I found were the core and the atmel-hlcdc driver. After I found these I optimistically assumed these were the only ones with the unusual names and sent patches for these out individually before checking systematically.

v1: soc: microchip: Explicitly include correct DT includes

The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it was merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they “temporarily” include each other. They also include platform_device.h and of.h. As a result, there’s a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes.

v1: reset: Explicitly include correct DT includes

The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it was merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they “temporarily” include each other. They also include platform_device.h and of.h. As a result, there’s a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes.

v6: RISC-V: mm: Make SV48 the default address space

Make sv48 the default address space for mmap as some applications currently depend on this assumption. Users can now select a desired address space using a non-zero hint address to mmap. Previously, requesting the default address space from mmap by passing zero as the hint address would result in using the largest address space possible. Some applications depend on empty bits in the virtual address space, like Go and Java, so this patch provides more flexibility for application developers.

v1: Add ethernet nodes for StarFive JH7110 SoC

This series adds ethernet nodes for StarFive JH7110 RISC-V SoC, and has been tested on StarFive VisionFive-2 v1.2A and v1.3B SBC boards.

The first patch adds ethernet nodes for jh7110 SoC, the second patch adds ethernet nodes for visionfive 2 SBCs.

v4: RESEND: riscv: Introduce KASLR

The following KASLR implementation allows the kernel mapping to be randomized:

  • virtually: we expect the bootloader to provide a seed in the device-tree
  • physically: only implemented in the EFI stub, it relies on the firmware to provide a seed using EFI_RNG_PROTOCOL. arm64 has a similar implementation hence the patch 3 factorizes KASLR related functions for riscv to take advantage.

v4: riscv: Introduce KASLR

The following KASLR implementation allows the kernel mapping to be randomized:

  • virtually: we expect the bootloader to provide a seed in the device-tree
  • physically: only implemented in the EFI stub, it relies on the firmware to provide a seed using EFI_RNG_PROTOCOL. arm64 has a similar implementation hence the patch 3 factorizes KASLR related functions for riscv to take advantage.

Process Scheduling

v1: sched/debug: Print tgid in sched_show_task()

Multiple blocked tasks are printed when the system hangs. They may have the same parent pid, but belong to different task groups.

Printing tgid lets users better know whether these tasks are from the same task group or not.

v9: sched/fair: Scan cluster before scanning LLC in wake-up path

This is the follow-up work to support cluster scheduler. Previously we have added cluster level in the scheduler for both ARM64[1] and X86[2] to support load balance between clusters to bring more memory bandwidth and decrease cache contention. This patchset, on the other hand, takes care of wake-up path by giving CPUs within the same cluster a try before scanning the whole LLC to benefit those tasks communicating with each other.

v2: sched: Optimize in_task() and in_interrupt() a bit

Except on x86, preempt_count is always accessed with READ_ONCE. Repeated invocations in macros like irq_count() produce repeated loads. These redundant instructions appear in various fast paths. In the one shown below, for example, irq_count() is evaluated during kernel entry if !tick_nohz_full_cpu(smp_processor_id()).
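
The cost being described is easy to see in a small model. In this sketch `PreemptCount.read()` stands in for a READ_ONCE() load of preempt_count and simply counts how often it is called; the mask values are assumptions used for illustration.

```python
SOFTIRQ_MASK = 0x0000FF00   # assumed bit layout, for illustration only
HARDIRQ_MASK = 0x000F0000
NMI_MASK = 0x00F00000


class PreemptCount:
    """Counts how many times the value is loaded."""

    def __init__(self, value):
        self.value = value
        self.loads = 0

    def read(self):
        # Stands in for READ_ONCE(preempt_count): never merged by a compiler.
        self.loads += 1
        return self.value


def irq_count_repeated(pc):
    """One load per sub-count, as produced by nested macros."""
    return ((pc.read() & NMI_MASK)
            | (pc.read() & HARDIRQ_MASK)
            | (pc.read() & SOFTIRQ_MASK))


def irq_count_single(pc):
    """Load once, then apply the combined mask."""
    return pc.read() & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_MASK)


a, b = PreemptCount(0x00010101), PreemptCount(0x00010101)
print(irq_count_repeated(a) == irq_count_single(b), a.loads, b.loads)
```

Both variants compute the same value; only the number of loads differs.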

v2: sched/core: Use empty mask to reset cpumasks in sched_setaffinity()

Since commit 8f9ea86fdf99 (“sched: Always preserve the user requested cpumask”), user provided CPU affinity via sched_setaffinity(2) is preserved even if the task is being moved to a different cpuset. However, that affinity is also being inherited by any subsequently created child processes which may not want or be aware of that affinity.

v1: sched/fair: Add SMT4 group_smt_balance handling

For SMT4, any group with more than 2 tasks will be marked as group_smt_balance. Retain the behaviour of group_has_spare by marking the busiest group as the group which has the least number of idle_cpus.

GIT PULL: sched/urgent for v6.5-rc2

Please pull two urgent scheduler fixes for 6.5.

v1: sched: Rename DIE domain

Thomas just tripped over the x86 topology setup creating a ‘DIE’ domain for the package mask :-)

Since these names are SCHED_DEBUG only, rename them. I don’t think anybody should be relying on this, but who knows.

v2: sched: Implement shared runqueue in CFS

This is v2 of the shared wakequeue (now called shared runqueue) patchset. The following are changes from the RFC v1 patchset (https://lore.kernel.org/lkml/20230613052004.2836135-1-void@manifault.com/).

v1: net: sched: Replace strlcpy with strscpy

strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy().
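
The difference in how much of the source each function touches can be modelled on byte strings. This is a minimal sketch of the documented semantics, not the kernel implementations: each model returns the copied bytes, the C return value, and the number of source bytes it had to read. The `_model` names are hypothetical.

```python
E2BIG = 7  # errno value returned (negated) by strscpy() on truncation


def strlcpy_model(src, size):
    """strlcpy() returns strlen(src), so it scans to the NUL regardless of size."""
    length = src.index(0)   # raises ValueError if src is not NUL-terminated
    copied = src[:min(length, size - 1)] if size > 0 else b""
    return copied, length, length + 1


def strscpy_model(src, size):
    """strscpy() reads at most size bytes and reports truncation as -E2BIG."""
    out = bytearray()
    read = 0
    for i in range(size):
        c = src[i]
        read += 1
        if c == 0:
            return bytes(out), len(out), read
        if i == size - 1:
            break            # no room left: the last slot holds the NUL
        out.append(c)
    return bytes(out), -E2BIG, read


source = b"hello world\0"
print(strlcpy_model(source, 6))
print(strscpy_model(source, 6))
```

With a 6-byte destination both copy the same five characters, but the strlcpy model reads all 12 source bytes while the strscpy model stops after 6.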

[PATCH AUTOSEL 4.14] sched/fair: Don’t balance task to its current running CPU

The new_dst_cpu is chosen from env->dst_grpmask. Currently it contains the CPUs in sched_group_span(), and if we have overlapped groups it’s possible to run into this case. This patch sets env->dst_grpmask to group_balance_mask(), which excludes any CPUs from the busiest group, and solves the issue. For balancing in a domain with no overlapped groups, the behaviour stays the same as before.

Memory Management

v2: context_tracking,x86: Defer some IPIs until a user->kernel transition

The heart of this series is the thought that while we cannot remove NOHZ_FULL CPUs from the list of CPUs targeted by these IPIs, they may not have to execute the callbacks immediately. Anything that only affects kernelspace can wait until the next user->kernel transition, providing it can be executed “early enough” in the entry code.

v3: Convert several functions in page_io.c to use a folio

This patch series converts several functions in page_io.c to use a folio, which can remove several implicit calls to compound_head().

v3: Optimize large folio interaction with deferred split

[Sending v3 to replace yesterday’s v2 after Yu Zhou’s feedback]

This is v3 of a small series in support of my work to enable the use of large folios for anonymous memory (known as “FLEXIBLE_THP” or “LARGE_ANON_FOLIO”) [1]. It first makes it possible to add large, non-pmd-mappable folios to the deferred split queue. Then it modifies zap_pte_range() to batch-remove spans of physically contiguous pages from the rmap, which means that in the common case, we elide the need to ever put the folio on the deferred split queue, thus reducing lock contention and improving performance.

v4: mm/slub: Optimize slub memory usage

In the current implementation of the slub memory allocator, the slab order selection process follows these criteria:

1) Determine the minimum order required to serve the minimum number of objects (min_objects). This calculation is based on the formula (order = min_objects * object_size / PAGE_SIZE).
2) If the minimum order is greater than the maximum allowed order (slub_max_order), set slub_max_order as the order for this slab.
3) If the minimum order is less than the slub_max_order, iterate through a loop from minimum order to slub_max_order and check if the condition (rem <= slab_size / fract_leftover) holds true. Here, slab_size is calculated as (PAGE_SIZE << order), rem is (slab_size % object_size), and fract_leftover can have values of 16, 8, or 4. If the condition is true, select that order for the slab.
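
The three steps above can be sketched as follows. This is an illustrative model of the criteria as described, not the kernel's code: the 4096-byte page size, the `get_order()` helper, the function names, and the fallback to max_order when no order satisfies the waste condition are all assumptions.

```python
PAGE_SIZE = 4096  # assumed page size


def get_order(size):
    """Smallest order such that (PAGE_SIZE << order) >= size."""
    order = 0
    while (PAGE_SIZE << order) < size:
        order += 1
    return order


def select_order(object_size, min_objects, max_order):
    # Step 1: minimum order able to hold min_objects objects.
    min_order = get_order(min_objects * object_size)
    # Step 2: clamp to the maximum allowed order.
    if min_order > max_order:
        return max_order
    # Step 3: accept the first order whose leftover space is small enough,
    # relaxing the allowed waste from 1/16 to 1/8 to 1/4 of the slab.
    for fract_leftover in (16, 8, 4):
        for order in range(min_order, max_order + 1):
            slab_size = PAGE_SIZE << order
            rem = slab_size % object_size
            if rem <= slab_size // fract_leftover:
                return order
    return max_order  # assumed fallback, not taken from the text


print(select_order(700, 4, 3))
```

For 700-byte objects, an order-0 slab would waste 596 bytes (more than 4096/16), while an order-1 slab wastes 492 bytes (within 8192/16), so the sketch settles on order 1.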

v3: Invalidate secondary IOMMU TLB on permission upgrade

The main change is to move secondary TLB invalidation mmu notifier callbacks into the architecture specific TLB flushing functions. This makes secondary TLB invalidation mostly match CPU invalidation while still allowing efficient range based invalidations based on the existing TLB batching code.

v2: mm: use memmap_on_memory semantics for dax/kmem

The dax/kmem driver can potentially hot-add large amounts of memory originating from CXL memory expanders, or NVDIMMs, or other ‘device memories’. There is a chance there isn’t enough regular system memory available to fit the memmap for this new memory. It’s therefore desirable, if all other conditions are met, for the kmem managed memory to place its memmap on the newly added memory itself.

v1: memory recharging for offline memcgs

This patch series implements the proposal in LSF/MM/BPF 2023 conference for reducing offline/zombie memcgs by memory recharging [1]. The main

v1: shmem: add support for user extended attributes

User extended attributes are not enabled in tmpfs because the size of the value is not limited and the memory allocated for it is not counted against any limit. A malicious non-privileged user can exhaust kernel memory by creating a user.* extended attribute with a very large value.

v1: mm,memblock: reset memblock.reserved to system init state to prevent UAF

The memblock_discard function frees the memblock.reserved.regions array, which is good.

However, if a subsequent memblock_free (or memblock_phys_free) comes in later, from for example ima_free_kexec_buffer, that will result in a use after free bug in memblock_isolate_range.

v2: mm/hugetlb: get rid of page_hstate()

Converts the last page_hstate() user to use folio_hstate() so page_hstate() can be safely removed.

v1: mm: memcg: use rstat for non-hierarchical stats

Currently, memcg uses rstat to maintain hierarchical stats. The rstat framework keeps track of which cgroups have updates on which cpus.

For non-hierarchical stats, as memcg moved to rstat, they are no longer readily available as counters. Instead, the percpu counters for a given stat need to be summed to get the non-hierarchical stat value. This causes a performance regression when reading non-hierarchical stats on kernels where memcg moved to using rstat. This is especially visible when reading memory.stat on cgroup v1. There are also some code paths internal to the kernel that read such non-hierarchical stats.

v2: mm: convert to vma_is_initial_heap/stack()

Add vma_is_initial_stack() and vma_is_initial_heap() helper and use them to simplify code.

v1: mm: hugetlb_vmemmap: use PageCompound() instead of PageReserved()

The PageReserved() check could easily be broken in the future; PageCompound() is a more stable way to check whether the page should be split.

v1: udmabuf: Replace pages when there is FALLOC_FL_PUNCH_HOLE in memfd

This patch series attempts to solve the coherency problem seen when a hole is punched in the region(s) of the mapping (associated with the memfd) that overlaps with pages registered with a udmabuf fd.

v2: udmabuf: Add back support for mapping hugetlb pages (v2)

The first patch ensures that the mappings needed for handling mmap operation would be managed by using the pfn instead of struct page. The second patch restores support for mapping hugetlb pages where subpages of a hugepage are not directly used anymore (main reason for revert) and instead the hugetlb pages and the relevant offsets are used to populate the scatterlist for dma-buf export and for mmap operation.

v3: mm: kfence: allocate kfence_metadata at runtime

kfence_metadata is currently a static array. For the purpose of allocating scalable __kfence_pool, we first change it to runtime allocation of metadata. Since the size of an object of kfence_metadata is 1160 bytes, we can save at least 72 pages (with default 256 objects) without enabling kfence.
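
The "at least 72 pages" figure follows directly from the numbers quoted above; the only assumption in this quick check is a 4096-byte page size.

```python
METADATA_SIZE = 1160   # bytes per kfence_metadata object (from the text)
NUM_OBJECTS = 256      # default object count quoted in the text
PAGE_SIZE = 4096       # assumed page size

total_bytes = METADATA_SIZE * NUM_OBJECTS
whole_pages = total_bytes // PAGE_SIZE

print(total_bytes, whole_pages)
```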

v1: add page_ext_data to get client data in page_ext

Currently, clients get their data from page_ext by adding an offset that is auto-generated in the page_ext core, which exposes the data layout design inside the page_ext core. This series adds a page_ext_data helper to hide the offset from clients. Thanks!

v1: mm/damon/core-test: Initialise context before test in damon_test_set_attrs()

Running kunit test for 6.5-rc1 hits one bug:

    ok 10 damon_test_update_monitoring_result

v4: Add support for memmap on memory feature on ppc64

This patch series updates the memmap on memory feature to fall back to memmap allocation outside the memory block if the alignment rules are not met. This makes the feature more useful on architectures like ppc64 where alignment rules are different with 64K page size.

v5: Add support for DAX vmemmap optimization for ppc64

This patch series implements changes required to support DAX vmemmap optimization for ppc64. The vmemmap optimization is only enabled with radix MMU translation and 1GB PUD mapping with 64K page size. The patch series also split hugetlb vmemmap optimization as a separate Kconfig variable so that architectures can enable DAX vmemmap optimization without enabling hugetlb vmemmap optimization. This should enable architectures like arm64 to enable DAX vmemmap optimization while they can’t enable hugetlb vmemmap optimization. More details of the same are in patch “mm/vmemmap optimization: Split hugetlb and devdax vmemmap optimization”

v1: 5.15.y: mm/damon/ops-common: atomically test and clear young on ptes and pmds

commit c11d34fa139e4b0fb4249a30f37b178353533fa1 upstream.

It is racy to non-atomically read a pte, then clear the young bit, then write it back as this could discard dirty information. Further, it is bad practice to directly set a pte entry within a table. Instead clearing young must go through the arch-provided helper, ptep_test_and_clear_young() to ensure it is modified atomically and to give the arch code visibility and allow it to check (and potentially modify) the operation.
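
The lost-update hazard can be replayed deterministically. In this sketch the page table is a plain list, the bit positions are hypothetical, and the "concurrent" dirty-bit update is injected by hand between the read and the write-back; it only illustrates why a single atomic read-modify-write is needed.

```python
YOUNG = 0x1   # hypothetical bit positions, for illustration only
DIRTY = 0x2


def set_dirty(table, idx):
    """Models another agent (e.g. hardware) marking the page dirty."""
    table[idx] |= DIRTY


def racy_clear_young(table, idx, concurrent_update):
    pte = table[idx]                  # 1. read the entry
    concurrent_update(table, idx)     # 2. someone else modifies it
    table[idx] = pte & ~YOUNG         # 3. stale write-back discards step 2
    return bool(pte & YOUNG)


def atomic_test_and_clear_young(table, idx):
    """Models one indivisible step: nothing can slip in between."""
    old = table[idx]
    table[idx] = old & ~YOUNG
    return bool(old & YOUNG)


racy = [YOUNG]
racy_clear_young(racy, 0, set_dirty)
safe = [YOUNG]
atomic_test_and_clear_young(safe, 0)
set_dirty(safe, 0)
print(bool(racy[0] & DIRTY), bool(safe[0] & DIRTY))
```

In the racy variant the dirty bit set in step 2 is gone after step 3; with the atomic helper the update lands either before or after the clear and is preserved.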

File Systems

v1: Various Rust bindings for files

This contains bindings for various file related things that binder needs to use.

I would especially like feedback on the SAFETY comments. Particularly, the safety comments in patch 4 and 5 are non-trivial. For example:

v1: vboxsf: Use flexible arrays for trailing string member

The declaration of struct shfl_string used trailing fake flexible arrays for the string member. This was tripping FORTIFY_SOURCE since commit df8fc4e934c1 (“kbuild: Enable -fstrict-flex-arrays=3”). Replace the utf8 and utf16 members with actual flexible arrays, drop the unused ucs2 member, and retain a 2 byte padding to keep the structure size the same.

v1: fs/nls: make load_nls() take a const parameter

load_nls() takes a char * parameter and uses it to find the nls module in the list or to construct the module name to load.

This change makes load_nls() take a const parameter, so we don’t need a cast like this:

    ses->local_nls = load_nls((char *)ctx->local_nls->charset);

Also remove the cast in cifs code.

v1: fstests: add helper to canonicalize devices used to enable persistent disks

The filesystem configuration file does not allow you to use symlinks to devices given the existing sanity checks verify that the target end device matches the source.

Using a symlink is desirable if you want to enable persistent tests across reboots. For example, you may want to use /dev/disk/by-id/nvme-eui.* to ensure that the same drives are used even after reboot. This is very useful if, for example, you are testing in a virtualized environment and are using PCIe passthrough with other qemu NVMe drives with one or many NVMe drives.

v3: Support negative dentries on case-insensitive ext4 and f2fs

V3 applies the fixes suggested by Eric Biggers (thank you for your review!). Changelog inlined in the patches.

Retested with xfstests for ext4 and f2fs.

cover letter from v1.

This patchset enables negative dentries for case-insensitive directories in ext4/f2fs. It solves the corner cases for this feature, including those already tested by fstests (generic/556). It also solves an existing bug with the existing implementation where old negative dentries are left behind after a directory conversion to case-insensitive.

v1: nfsd: inherit required unset default acls from effective set

A well-formed NFSv4 ACL will always contain OWNER@/GROUP@/EVERYONE@ ACEs, but there is no requirement for inheritable entries for those entities. POSIX ACLs must always have owner/group/other entries, even for a default ACL.

v1: fs: export emergency_sync

emergency_sync forces a filesystem sync in emergency situations. Export this function so it can be used by modules.

v4: io_uring getdents

This series introduces getdents64 to io_uring; the code logic is similar to that of the synchronous version. It first tries a nowait issue, and offloads the request to io-wq threads if the first try fails.

v2: xarray: Document necessary flag in alloc functions

Adds a new line to the docstrings of functions wrapping __xa_alloc() and __xa_alloc_cyclic(), informing about the necessity of flag XA_FLAGS_ALLOC being set previously.

The documentation so far says that functions wrapping __xa_alloc() and __xa_alloc_cyclic() are supposed to return either -ENOMEM or -EBUSY in case of an error. If the xarray has been initialized without the flag XA_FLAGS_ALLOC, however, they fail with a different, undocumented error code.

GIT PULL: Create large folios in iomap buffered write path

The following changes since commit 5b8d6e8539498e8b2fa67fbcce3fe87834d44a7a:

Merge tag ‘xtensa-20230716’ of https://github.com/jcmvbkbc/linux-xtensa (2023-07-16 14:12:49 -0700)

v1: fs/filesystems.c: ERROR: “(foo*)” should be “(foo *)”

Fix five occurrences of the checkpatch.pl error: ERROR: “(foo*)” should be “(foo *)”

v2: fs/address_space: add alignment padding for i_mmap and i_mmap_rwsem to mitigate a false sharing.

When running UnixBench/Shell Scripts, we observed high false sharing for accessing i_mmap against i_mmap_rwsem.

v1: fs: inode: return proper errno on bmap()

It is better to return -EOPNOTSUPP instead of -EINVAL, which means that an argument has an inappropriate value. That does not make sense when a file system doesn’t support the bmap operation.

-EINVAL could cause confusion from the userspace perspective.

v1: exfat: release s_lock before calling dir_emit()

    WARNING: possible circular locking dependency detected
    6.4.0-next-20230707-syzkaller #0 Not tainted
    syz-executor330/5073 is trying to acquire lock:
    ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock_killable include/linux/mmap_lock.h:151 [inline]
    ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: get_mmap_lock_carefully mm/memory.c:5293 [inline]
    ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: lock_mm_and_find_vma+0x369/0x510 mm/memory.c:5344
    but task is already holding lock:
    ffff888019f760e0 (&sbi->s_lock){+.+.}-{3:3}, at: exfat_iterate+0x117/0xb50 fs/exfat/dir.c:232

[fstests PATCH] generic: add a test for multigrain timestamps

Ensure that the mtime and ctime apparently change, even when there are multiple writes in quick succession. Older kernels didn’t do this, but there are patches in flight that should help ensure it in the future.

v5: fs: implement multigrain timestamps

The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes.

v2: procfs: block chmod on /proc/thread-self/comm

Due to an oversight in commit 1b3044e39a89 (“procfs: fix pthread cross-thread naming if !PR_DUMPABLE”) in switching from REG to NOD, chmod operations on /proc/thread-self/comm were no longer blocked as they are on almost all other procfs files.

v4: RESEND: shmem: Add user and group quota support for tmpfs

This is a resend of the quota support for tmpfs. It has been rebased on Linus’s TOT as of today. These patches conflicted with Luis Chamberlain’s series adding the ‘noswap’ mount option to tmpfs; there was no code change since the previous version, other than moving the implementation of the quota options ‘after’ ‘noswap’.

v1: exfat: check if filename entries exceeds max filename length

exfat_extract_uni_name copies characters from a given file name entry into the ‘uniname’ variable. This variable is actually defined on the stack of the exfat_readdir() function. According to the definition of the ‘exfat_uni_name’ type, the file name should be limited to 255 characters (+ null terminator space), but the exfat_get_uniname_from_ext_entry() function can write more characters because there is no check for whether the filename entries exceed the max filename length. This patch adds a check so that filename characters are not copied once the max filename length is exceeded.
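
The missing bound can be illustrated with a small model of the copy loop. This is a sketch only: `collect_name()` is a hypothetical helper, name entries are represented as Python strings, and the 15-characters-per-entry figure is an assumption about the on-disk layout rather than something stated above.

```python
MAX_NAME_LENGTH = 255   # limit quoted in the text
CHARS_PER_ENTRY = 15    # assumed characters per file-name entry


def collect_name(entries):
    """Concatenate name-entry fragments, never exceeding MAX_NAME_LENGTH."""
    name = []
    for entry in entries:
        for ch in entry[:CHARS_PER_ENTRY]:
            if ch == "\0":
                return "".join(name)
            if len(name) >= MAX_NAME_LENGTH:
                # The bound check: stop copying instead of writing past
                # the end of the fixed-size destination.
                return "".join(name)
            name.append(ch)
    return "".join(name)


oversized = ["a" * CHARS_PER_ENTRY] * 18   # 270 characters in total
print(len(collect_name(oversized)))
```

Without the length check, a crafted directory with too many name entries would make the loop write 270 characters into a buffer sized for 255.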

v1: fs: proc: Add error checking for d_hash_and_lookup()

In case of failure, d_hash_and_lookup() returns NULL or an error pointer. The proc_fill_cache() needs to add the handling of the error pointer returned by d_hash_and_lookup().

v25: Implement IOCTL to get and optionally clear info about PTEs

Changes in v25:

  • Do proper filtering on hole as well (hole got missed earlier)

v1: eventfd: simplify signal helpers

This simplifies the eventfd_signal() and eventfd_signal_mask() helpers by removing the count argument which is effectively unused.

v1: More filesystem folio conversions for 6.6

Remove the only spots in affs which actually use a struct page; there are a few places where one is mentioned, but it’s part of the interface.

Network Devices

v3: net-next: vsock/virtio/vhost: MSG_ZEROCOPY preparations

This patchset is the first of three parts of another big patchset for MSG_ZEROCOPY flag support: https://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/

GIT PULL: Networking for v6.5-rc3

The following changes since commit b1983d427a53911ea71ba621d4bf994ae22b1536:

Merge tag ‘net-6.5-rc2’ of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2023-07-13 14:21:22 -0700)

**[v5: bpf-next: Support defragmenting IPv(46) packets in BPF](http://lore.kernel.org/netdev/cover.1689884827.git.dxu@dxuuu.xyz/)**

In the context of a middlebox, fragmented packets are tricky to handle. The full 5-tuple of a packet is often only available in the first fragment which makes enforcing consistent policy difficult. There are really only two stateless options, neither of which are very nice:

Enforce policy on first fragment and accept all subsequent fragments. This works but may let in certain attacks or allow data exfiltration.

v1: iproute2: bridge/mdb.c: include limits.h

Include limits.h in bridge/mdb.c to fix this issue. This change is based on one in Alpine Linux, but the author there had no plans to submit: https://git.alpinelinux.org/aports/commit/main/iproute2/include.patch?id=bd46efb8a8da54948639cebcfa5b37bd608f1069

v4: net-next: ionic: add FLR support

Add support for handling and recovering from a PCI FLR event. This patchset first moves some code around to make it usable from multiple paths, then adds the PCI error handler callbacks for reset_prepare and reset_done.

v1: net-next: page_pool: add a lockdep check for recycling in hardirq

Page pool use in hardirq is prohibited, add debug checks to catch misuses. IIRC we previously discussed using DEBUG_NET_WARN_ON_ONCE() for this, but there were concerns that people will have DEBUG_NET enabled in perf testing. I don’t think anyone enables lockdep in perf testing, so use lockdep to avoid pushback and arguing :)

v3: bpf-next: bpf, xdp: Add tracepoint to xdp attaching failure

This series introduces a new tracepoint in bpf_xdp_link_attach(). With this tracepoint, the error message is captured when an error happens in dev_xdp_attach(), e.g. invalid attach flags.

v6: bpf-next: Add SO_REUSEPORT support for TC bpf_sk_assign

We want to replace iptables TPROXY with a BPF program at TC ingress. To make this work in all cases we need to assign a SO_REUSEPORT socket to an skb, which is currently prohibited. This series adds support for such sockets to bpf_sk_assign.

v1: net-next: net: dsa: microchip: provide Wake on LAN support

This series of patches provides Wake on LAN support for the KSZ9477 family of switches. It was tested on KSZ8565 Switch with PME pin attached to an external PMIC.

v2: net-next: devlink: introduce dump selector attr and use it for per-instance dumps

For SFs, one devlink instance per SF is created. There might be thousands of these on a single host. When a user needs to know the port handle for a specific SF, they need to dump all devlink ports on the host, which does not scale well.

v5: Add motorcomm phy pad-driver-strength-cfg support

The motorcomm phy (YT8531) supports adjusting the drive strength of rx_clk/rx_data, and the default strength may not be suitable for all boards. So add configurable options to better match the boards (e.g. StarFive VisionFive 2).

v1: net-next: genetlink: add explicit ordering break check for split ops

Currently, if cmd in the split ops array is of lower value than the previous one, genl_validate_ops() continues to do the checks as if the values are equal. This may result in a non-obvious WARN_ON() hit in these checks.
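The explicit ordering check described here amounts to detecting the first position where a cmd value goes down. A minimal sketch, assuming the ops array is reduced to a plain list of cmd numbers (the function name is hypothetical and not taken from the patch):

```python
def first_ordering_break(cmds):
    """Return the index of the first cmd lower than its predecessor, or None."""
    for i in range(1, len(cmds)):
        if cmds[i] < cmds[i - 1]:
            return i
    return None
```

Equal neighbouring values are allowed here, since split ops may legitimately list the same cmd more than once; only a decrease is reported.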

v2: net: vxlan: calculate correct header length for GPE

VXLAN-GPE does not add an extra inner Ethernet header. Take that into account when calculating header length.

This causes problems in skb_tunnel_check_pmtu, where incorrect PMTU is cached.
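The arithmetic behind the fix can be written out explicitly. The sketch below is a user-space illustration, not the kernel's code: the function name is made up, and it assumes option-free outer IP headers.

```python
IPV4_HLEN = 20   # IPv4 header without options
IPV6_HLEN = 40   # fixed IPv6 header
UDP_HLEN = 8
VXLAN_HLEN = 8   # VXLAN and VXLAN-GPE headers are both 8 bytes
ETH_HLEN = 14    # inner Ethernet header, absent with VXLAN-GPE


def vxlan_headroom(ipv6=False, gpe=False):
    """Encapsulation overhead in bytes for one VXLAN or VXLAN-GPE packet."""
    outer = IPV6_HLEN if ipv6 else IPV4_HLEN
    inner_eth = 0 if gpe else ETH_HLEN
    return outer + UDP_HLEN + VXLAN_HLEN + inner_eth
```

Counting the 14-byte inner Ethernet header for a GPE tunnel overstates the overhead, which is how a too-small PMTU ends up being cached.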

v4: net-next: virtio-net: don’t busy poll for cvq command

The code used to busy poll for cvq command which turns out to have several side effects:

  1. infinite poll for buggy devices
  2. bad interaction with scheduler

So this series tries to use cond_resched() in the waiting loop. Before doing this, we first need to make sure the cvq command is not executed in an atomic environment, so we first need to convert rx mode handling to a workqueue.

v2: net-next: eth: bnxt: handle invalid Tx completions more gracefully

bnxt trusts the events generated by the device which may lead to kernel crashes. These are extremely rare but they do happen. For a while I thought crashing may be intentional, because device reporting invalid completions should never happen, and having a core dump could be useful if it does. But in practice I haven’t found any clues in the core dumps, and panic_on_warn exists.

v1: net-next: net: Use sockaddr_storage for getsockopt(SO_PEERNAME).

Commit df8fc4e934c1 (“kbuild: Enable -fstrict-flex-arrays=3”) started applying strict rules to standard string functions.

v1: net: phy: prevent stale pointer dereference in phy_init()

mdio_bus_init() and phy_driver_register() both have error paths, and if those are ever hit, ethtool will have a stale pointer to the phy_ethtool_phy_ops stub structure, which references memory from a module that failed to load (phylib).

v1: net: tcp: add missing annotations

This series was inspired by one syzbot (KCSAN) report.

do_tcp_getsockopt() does not lock the socket, we need to annotate most of the reads there (and other places as well).

v8: net/tcp: Add TCP-AO support

This is version 8 of TCP-AO support. I base it on master and there weren’t any conflicts on my tentative merge to linux-next.

The good news is that all pre-required patches have been merged to Torvalds’ master. Thanks to Herbert, crypto clone-tfm just works on master for all TCP-AO supported algorithms.

Security hardening

v1: kunit: Add test attributes API

This patch series adds a test attributes framework to KUnit.

There has been interest in filtering out “slow” KUnit tests. Most notably, a new config, CONFIG_MEMCPY_SLOW_KUNIT_TEST, has been added to exclude a particularly slow memcpy test (https://lore.kernel.org/all/20230118200653.give.574-kees@kernel.org/).

v1: HotBPF: Prevent Kernel Heap-based Exploitation

Request for Comments, a hot eBPF patch to prevent kernel heap exploitation.

SLUB exploitation poses a significant threat to kernel security. The exploitation takes advantage of the fact that kernel objects share kmalloc slub caches. This sharing makes it possible to create an overlap between vulnerable objects that introduce corruption and other objects that contain sensitive data. To mitigate this, we introduce HotBPF.

v1: next: fs: omfs: Use flexible-array member in struct omfs_extent

Memory for ‘struct omfs_extent’ and a ‘e_extent_count’ number of extent entries is indirectly allocated through ‘bh->b_data’, which is a pointer to data within the page. This implies that the member ‘e_entry’ (which is the start of extent entries) functions more like an array than a single object of type ‘struct omfs_extent_entry’.

v5: Randomized slab caches for kmalloc()

When exploiting memory vulnerabilities, “heap spraying” is a common technique targeting those related to dynamic memory allocation (i.e. the “heap”), and it plays an important role in a successful exploitation. Basically, it overwrites the memory area of a vulnerable object by triggering allocations in other subsystems or modules, thereby obtaining a reference to the targeted memory location. It applies to various types of vulnerability, including use after free (UAF), heap out-of-bounds write, etc.

v2: igc: Ignore AER reset when device is suspended

The issue is that the PTM requests are sent before the driver resumes the device. Since the issue can also be observed on Windows, it’s quite likely a firmware/hardware limitation.

So avoid resetting the device if it’s not resumed. Once the device is fully resumed, the device can work normally.

v1: tracing: Add back FORTIFY_SOURCE logic to kernel_stack event structure

For backward compatibility, older tooling expects to see the kernel_stack event with a “caller” field that is a fixed size array of 8 addresses. The code now supports more than 8 with an added “size” field that states the real number of entries. But the “caller” field still just looks like a fixed size to user space.

v2: ACPI: APEI: Use ERST timeout for slow devices

Slow devices such as flash may not meet the default 1ms timeout value, so use the ERST max execution time value that they provide as the timeout if it is larger.
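The selection rule stated here is simply "take the larger of the two values". A minimal sketch, assuming both values are expressed in nanoseconds and that a device may not report any maximum at all (the function name and the `None` convention are hypothetical):

```python
DEFAULT_TIMEOUT_NS = 1_000_000  # the 1 ms default mentioned above


def erst_timeout_ns(device_max_ns=None):
    """Pick the timeout: the device-provided maximum wins only if it is larger."""
    if device_max_ns is None:
        return DEFAULT_TIMEOUT_NS
    return max(DEFAULT_TIMEOUT_NS, device_max_ns)
```

Fast devices therefore keep the existing 1 ms behaviour, and only slow ones get a longer wait.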

v3: pstore: Replace crypto API compression with zlib calls

The pstore layer implements support for compression of kernel log output, using a variety of compression algorithms provided by the [deprecated] crypto API ‘comp’ interface.

This appears to have been somebody’s pet project rather than a solution to a real problem: the original deflate compression is reasonably fast, compresses well and is comparatively small in terms of code footprint, and so the flexibility that the crypto API integration provides does little more than complicate the code for no reason.

v1: libxfs: Redefine 1-element arrays as flexible arrays

To allow for code bases that include libxfs (e.g. the Linux kernel) and build with strict flexible array handling (-fstrict-flex-arrays=3), FORTIFY_SOURCE, and/or UBSAN bounds checking, redefine the remaining 1-element trailing arrays as true flexible arrays, but without changing their structure sizes. This is done via a union to retain a single element (named “legacy_padding”). As not all distro headers may yet have the UAPI stddef.h __DECLARE_FLEX_ARRAY macro, include it explicitly in platform_defs.h.in.

v1: wifi: mwifiex: Replace strlcpy with strscpy

strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy().
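The difference between the two helpers can be modelled in user space. The sketch below is not the kernel implementation: it works on Python `bytes`, assumes `size >= 1`, and uses a `ValueError` to stand in for the overread that happens when the source has no NUL terminator.

```python
import errno


def strlcpy_model(src: bytes, size: int):
    """Model of strlcpy(): scans the whole source to compute its length."""
    length = src.index(b"\0")  # ValueError stands in for the overread
    copied = src[:min(length, size - 1)]
    return copied + b"\0", length


def strscpy_model(src: bytes, size: int):
    """Model of strscpy(): never looks past `size` bytes of the source."""
    window = src[:size]
    nul = window.find(b"\0")
    if nul != -1:
        return window[:nul] + b"\0", nul
    # Truncated: the destination is still terminated, and the caller is told.
    return window[:size - 1] + b"\0", -errno.E2BIG
```

With a six-byte unterminated source and a four-byte destination, `strscpy_model` returns a terminated prefix plus `-E2BIG`, while `strlcpy_model` fails while searching for the terminator.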

Asynchronous IO

v1: io_uring: treat -EAGAIN for REQ_F_NOWAIT as final for io-wq

io-wq assumes that an issue is blocking, but it may not be if the request type has asked for a non-blocking attempt. If we get -EAGAIN for that case, then we need to treat it as a final result and not retry or arm poll for it.

v1: io_uring: Use io_schedule* in cqring wait

I observed poor performance of io_uring compared to synchronous IO. That turns out to be caused by deeper CPU idle states entered with io_uring, due to io_uring using plain schedule(), whereas synchronous IO uses io_schedule().

v1: io_uring: don’t audit the capability check in io_uring_create()

The check being unconditional may lead to unwanted denials reported by LSMs when a process has the capability granted by DAC, but denied by an LSM. In the case of SELinux such denials are a problem, since they can’t be effectively filtered out via the policy and when not silenced, they produce noise that may hide a true problem or an attack.

v1: io_uring: Redefined the meaning of io_alloc_async_data’s return value

Usually, successful memory allocation returns true and failure returns false, which is more in line with the intuitive perception of most people. So it is necessary to redefine the meaning of io_alloc_async_data’s return value.

Rust For Linux

v1: rust: kunit: Support KUnit tests with a user-space like syntax

This series was originally written by José Expósito, and can be found here: https://github.com/Rust-for-Linux/linux/pull/950

Add support for writing KUnit tests in Rust. While Rust doctests are already converted to KUnit tests and run, they’re really better suited for examples, rather than as first-class unit tests.

v1: rust: doctests: Use tabs for indentation in generated C code

While Rust uses 4 spaces for indentation, we should use tabs in the generated C code. This does result in some scary-looking tab characters in a .rs file, but they’re in a string literal, so shouldn’t make anything complain too much.

v2: Quality of life improvements for pin-init

This patch series adds several improvements to the pin-init api:

  • a derive macro for the Zeroable trait,
  • makes hygiene of fields in initializers behave like normal struct initializers would behave,
  • prevent stackoverflow without optimizations
  • add ..Zeroable::zeroed() syntax to zero missing fields.
  • support arbitrary paths in initializer macros

v1: kbuild: rust: avoid creating temporary files

rustc outputs by default the temporary files (i.e. the ones saved by -Csave-temps, such as *.rcgu* files) in the current working directory when -o and --out-dir are not given (even if --emit=x=path is given, i.e. it does not use those for temporaries).

v1: rust: init: Implement Zeroable::zeroed()

By analogy to Default::default(), this just returns the zeroed representation of the type directly. init::zeroed() is the version that returns an initializer.

v2: Rust abstractions for Crypto API

This patchset adds minimum Rust abstractions for Crypto API; message digest and random number generator.

v1: rust: add improved version of ForeignOwnable::borrow_mut

Previously, the ForeignOwnable trait had a method called borrow_mut that was intended to provide mutable access to the inner value. However, the method accidentally made it possible to change the address of the object being modified, which usually isn’t what we want. (And when we want that, it can be done by calling from_foreign and into_foreign, like how the old borrow_mut was implemented.)

v2: Rust abstractions for network device drivers

This patchset adds minimum Rust abstractions for network device drivers and an example of a Rust network device driver, a simpler version of drivers/net/dummy.c.

BPF

v1: bpf: bpf/memalloc: Allow non-atomic alloc_bulk

This series attempts to add ways where the allocation could occur non-atomically, allowing the allocator to take mutexes, perform IO, and/or sleep.

v1: dwarves: dwarves: detect BTF kinds supported by kernel

When a newer pahole is run on an older kernel, it often knows about BTF kinds that the kernel does not support, and adds them to the BTF representation. This is a problem because the BTF generated is then embedded in the kernel image. When it is later read - possibly by a different older toolchain or by the kernel directly - it is not usable.

v3: bpf-next: bpf: Support new insns from cpu v4

This patch set added kernel support for insns proposed in [1] except BPF_ST which already has full kernel support. Beside the above proposed insns, LLVM will generate BPF_ST insn as well under -mcpu=v4 ([2]).

v2: bpf-next: selftests/bpf: improve ringbuf benchmark output

The ringbuf benchmarks print headers for each section of benchmarks. The naming conventions lead a user of the benchmarks to some confusion. This change is a cosmetic update to the output of that benchmark; no changes were made to what the script actually executes.

v3: bpf-next: XDP metadata via kfuncs for ice

This series introduces XDP hints via kfuncs [0] to the ice driver.

Series brings the following existing hints to the ice driver:

  • HW timestamp
  • RX hash with type

v1: bpf-next: bpf: sync tools/ uapi header with

Seeing the following:

Warning: Kernel ABI header at ‘tools/include/uapi/linux/bpf.h’ differs from latest version at ‘include/uapi/linux/bpf.h’

…so sync tools version missing some list_node/rb_tree fields.

v6: bpf-next: BPF link support for tc BPF programs

This series adds BPF link support for tc BPF programs. We initially presented the motivation, related work and design at last year’s LPC conference in the networking & BPF track [0], and a recent update on our progress of the rework during this year’s LSF/MM/BPF summit [1]. The main changes are in first two patches and the last two have an extensive batch of test cases we developed along with it, please see individual patches for details. We tested this series with tc-testing selftest suite as well as BPF CI/selftests. Thanks!

v7: bpf-next: xsk: multi-buffer support

This series of patches adds multi-buffer support for AF_XDP. XDP and various NIC drivers already have support for multi-buffer packets. With this patch set, programs using AF_XDP sockets can now also receive and transmit multi-buffer packets both in copy as well as zero-copy mode. The ZC multi-buffer implementation is based on the ice driver.

v1: net-next: page_pool: split types and declarations from page_pool.h

Split types and pure function declarations from page_pool.h and add them in page_pool_types.h, so that C sources can include page_pool.h and headers should generally only include page_pool_types.h, as suggested by Jakub.

v1: bpf-next: bpf, x86: initialize the variable “first_off” in save_args()

As Dan Carpenter reported, the variable “first_off” which is passed to clean_stack_garbage() in save_args() can be uninitialized, which can cause runtime warnings with KMSAN. Therefore, initialize it to 0.

v2: bpf-next: allow bpf_map_sum_elem_count for all program types

This series is a follow up to the recent change [1] which added per-cpu insert/delete statistics for maps. The bpf_map_sum_elem_count kfunc presented in the original series was only available to tracing programs, so let’s make it available to all.

v12: vhost: virtio core prepares for AF_XDP

So rethinking this, firstly, we can support premapped-dma only for devices with VIRTIO_F_ACCESS_PLATFORM. In the case of af-xdp, if the users want to use it, they have to update the device to support VIRTIO_F_RING_RESET, and they can also enable the device’s VIRTIO_F_ACCESS_PLATFORM feature.

v2: net: bpf: do not return NET_XMIT_xxx values on bpf_redirect

skb_do_redirect returns error codes from both the rx and tx paths. The tx path codes are special, e.g. NET_XMIT_CN: they are non-negative, and can conflict with LWTUNNEL_XMIT_xxx values. Directly returning such a code can cause unexpected behavior. We found at least one bug that will panic the kernel through a KASAN report when we are redirecting packets to a down or carrier-down device at the lwt xmit hook:

https://gist.github.com/zhaiyan920/8fbac245b261fe316a7ef04c9b1eba48
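The collision is easy to see once the numeric values are placed side by side. In the sketch below the NET_XMIT_* values and the `net_xmit_errno` mapping mirror the kernel's definitions, while the LWTUNNEL_XMIT_* values are an assumption about the enum at the time the series was posted, and `normalize_redirect_result` is a hypothetical helper rather than the code from the patch.

```python
import errno

NET_XMIT_SUCCESS = 0x00
NET_XMIT_DROP = 0x01
NET_XMIT_CN = 0x02

# Assumed values of the lwtunnel enum when the series was posted.
LWTUNNEL_XMIT_DONE = 0
LWTUNNEL_XMIT_CONTINUE = 1


def net_xmit_errno(code):
    """Congestion notification is not an error; other codes map to -ENOBUFS."""
    return 0 if code == NET_XMIT_CN else -errno.ENOBUFS


def normalize_redirect_result(ret):
    """Hypothetical: keep 0 and negative errnos, convert positive NET_XMIT_* codes."""
    if ret > 0:
        return net_xmit_errno(ret)
    return ret
```

Under these assumptions a raw NET_XMIT_DROP is indistinguishable from LWTUNNEL_XMIT_CONTINUE, so a caller would keep processing a packet that has already been dropped. Converting positive codes to a negative errno (or 0) before returning removes the ambiguity.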

v5: net-next: virtio/vsock: support datagrams

This series introduces support for datagrams to virtio/vsock.

It is a spin-off (and smaller version) of this series from the summer: https://lore.kernel.org/all/cover.1660362668.git.bobby.eshleman@bytedance.com/

Please note that this is an RFC and should not be merged until associated changes are made to the virtio specification, which will follow after discussion from this series.

v1: bpf-next: bpf, net: Introduce skb_pointer_if_linear().

Network drivers always call skb_header_pointer() with non-null buffer. Remove !buffer check to prevent accidental misuse of skb_header_pointer(). Introduce skb_pointer_if_linear() instead.

v1: V2,net-next: net: mana: Add page pool for RX buffers

Add page pool for RX buffers for faster buffer cycle and reduce CPU usage.

The standard page pool API is used.

v1: bpf: lwt: do not return NET_XMIT_xxx values on bpf_redirect

skb_do_redirect returns error codes from both the rx and tx paths. The tx path codes are special, e.g. NET_XMIT_CN: they are non-negative, and can conflict with LWTUNNEL_XMIT_xxx values. Directly returning such a code can cause unexpected behavior.

v5: bpf-next: bpf: Force to MPTCP

As is described in the “How to use MPTCP?” section in MPTCP wiki [1]:

“Your app can create sockets with IPPROTO_MPTCP as the proto: ( socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); ). Legacy apps can be forced to create and use MPTCP sockets instead of TCP ones via the mptcpize command bundled with the mptcpd daemon.”
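The explicit opt-in quoted above looks like this from a user-space program. This is a sketch under assumptions: Python 3.10+ exposes `socket.IPPROTO_MPTCP`, the value 262 is the Linux protocol number used as a fallback on older interpreters, and the fall-back-to-TCP policy and function name are illustrative choices, not part of the series.

```python
import socket

# Python 3.10+ exposes socket.IPPROTO_MPTCP; 262 is the Linux value used as a fallback.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def open_stream_socket():
    """Try to create an MPTCP socket, falling back to plain TCP if unsupported."""
    try:
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        # Kernel built without MPTCP, MPTCP disabled, or a non-Linux platform.
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
```

Legacy applications that cannot be changed this way are the ones the "force to MPTCP" approach is aimed at.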

v2: bpf-next: BPF Refcount followups 2: owner field

This series adds an ‘owner’ field to bpf_{list,rb}_node structs, to be used by the runtime to determine whether insertion or removal operations are valid in shared ownership scenarios. Both the races which the series fixes and the fix itself are inspired by Kumar’s suggestions in [0].
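The role of the ‘owner’ field can be modelled with a few lines of user-space code. This is a conceptual sketch only: the class names are made up, and it ignores the concurrency that the real runtime has to handle.

```python
class Node:
    """A node that records which collection currently holds it."""

    def __init__(self, value):
        self.value = value
        self.owner = None


class OwnedList:
    """A list that refuses operations on nodes it does not own."""

    def __init__(self):
        self.items = []

    def push(self, node):
        if node.owner is not None:
            return False  # already inserted somewhere: reject
        node.owner = self
        self.items.append(node)
        return True

    def remove(self, node):
        if node.owner is not self:
            return False  # not in this list: reject
        self.items.remove(node)
        node.owner = None
        return True
```

When several references to one node exist, the owner check is what lets the runtime tell a valid insertion or removal from one that would corrupt another collection.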

v1: net: igc: Prevent garbled TX queue with XDP ZEROCOPY

In normal operation, each populated queue item has next_to_watch pointing to the last TX desc of the packet, while each cleaned item has it set to 0. In particular, next_to_use that points to the next (necessarily clean) item to use has next_to_watch set to 0.

v2: net-next: virtio_net: add per queue interrupt coalescing support

Currently, coalescing parameters are grouped for all transmit and receive virtqueues. This patch series adds support for setting or getting the parameters for a specified virtqueue.

When the traffic between virtqueues is unbalanced, for example, one virtqueue is busy and another virtqueue is idle, then it will be very useful to control coalescing parameters at the virtqueue granularity.

Related technology updates

Qemu

v5: for-8.2: riscv: add ‘max’ CPU, deprecate ‘any’

I’m sending this new version based on another observation I made during another follow-up work (I’ll post it shortly).

‘mmu’ and ‘pmp’ aren’t really extensions in the most traditional sense, they’re more like features. So, in patch 1, I moved both to the new riscv_cpu_options array.

v1: target/riscv: add missing riscv,isa strings

Found these 2 instances while working on more 8.2 material.

I believe both are safe for freeze but I won’t lose my sleep if we decide to postpone it.

v1: QEMU RISC-V IOMMU Support

This series introduces a RISC-V IOMMU device emulation implementation with two stage address translation logic, device and process translation context mapping and queue interfaces, along with riscv/virt machine bindings (patch 5) and memory attributes extensions for PASID support (patch 3,4).

This series is based on incremental patches created during the RISC-V International IOMMU Task Group discussions and specification development process, with the original series available in the maintainer’s repository branch [2].

v1: riscv-to-apply queue

The following changes since commit 361d5397355276e3007825cc17217c1e4d4320f7:

Merge tag ‘block-pull-request’ of https://gitlab.com/stefanha/qemu into staging (2023-07-17 15:49:27 +0100)

are available in the Git repository at:

https://github.com/alistair23/qemu.git tags/pull-riscv-to-apply-20230719-1

for you to fetch changes up to 32be32509987fbe42cf5c2fd3cea3c2ad6eae179:

target/riscv: Fix LMUL check to use VLEN (2023-07-19 14:37:26 +1000)

v1: risc-v: Add ISA extension smcntrpmf support

This patch series adds support for the RISC-V ISA extension smcntrpmf (cycle and privilege mode filtering) [1]. QEMU only calculates dummy cycles and instructions, so there is no actual means to stop the icount in QEMU. Therefore, this series only adds the read/write behavior of the relevant CSRs such that the implemented firmware support [2] can work without causing unnecessary illegal instruction exceptions.

[1] https://github.com/riscv/riscv-smcntrpmf
[2] https://github.com/rivosinc/opensbi/tree/dev/kaiwenx/smcntrpmf_upstream

v1: target/riscv: Clearing the CSR values at reset and syncing the MPSTATE with the host

Fix the guest reboot error when using KVM. There are two issues when rebooting a guest using KVM:

  1. When the guest initiates a reboot the host is unable to stop the vcpu
  2. When running a SMP guest the qemu monitor system_reset causes a vcpu crash

This can be fixed by clearing the CSR values at reset and syncing the MPSTATE with the host.

v1: for-8.2: target/riscv: add zicntr and zihpm flags

I decided to include flags for both timer/counter extensions to make it easier for us later on when dealing with the RVA22 profile (which includes both).

The features were already implemented by Atish Patra some time ago, but back then these 2 extensions weren’t introduced yet. This means that, aside from extra stuff in riscv,isa FDT no other functional changes were made.

v1: target/riscv/cpu.c: check priv_ver before auto-enable zca/zcd/zcf

Commit bd30559568 made changes in how we’re checking and disabling extensions based on env->priv_ver. One of the changes was to move the extension disablement code to the end of realize(), being able to disable extensions after we’ve auto-enabled some of them.

v3: for-8.2: target/riscv: add ‘max’ CPU, deprecate

This version has changes suggested in v2. The most significant change is the deprecation of the ‘any’ CPU in patch 8.

The reasoning behind it is that Alistair mentioned that the ‘any’ CPU was intended to work like the newly added ‘max’ CPU, so we’re better off removing the ‘any’ CPU since it’ll be out of place. We can’t just remove the CPU right away, so we’ll have to make do with deprecation first.

v6: Add RISC-V KVM AIA Support

This series adds support for KVM AIA in RISC-V architecture.

In order to test these patches, we require Linux with KVM AIA support which can be found in the riscv_kvm_aia_hwaccel_v1 branch at https://github.com/avpatel/linux.git

v2: for-8.2: target/riscv: add ‘max’ CPU type

This second version has a small tweak in patch 6 that I found was missing while chatting with Conor in the v1 review.

riscv kvm breakage

This breakage crept in while cross-riscv64-system was otherwise broken in configure:

https://gitlab.com/qemu-project/qemu/-/jobs/4633277557#L4165

v3: target/riscv: Add Zihintntl extension ISA string to DTS

In v2, I rebased the patch on https://github.com/alistair23/qemu/tree/riscv-to-apply.next but forgot to add the “Reviewed-by” tags, so I add them in v3.

v8: riscv: Add support for the Zfa extension

Since QEMU does not support the RISC-V quad-precision floating-point ISA extension (Q), this patch does not include the instructions that depend on this extension. All other instructions are included in this patch.

Buildroot

boot/edk2: bump to version edk2-stable202305

The main motivation of this bump is the RISC-V QEMU Virt support introduced in edk2-stable202302 (not yet supported in Buildroot).

U-Boot

v7: Add StarFive JH7110 PCIe driver support

These PCIe series patches are based on the JH7110 RISC-V SoC and VisionFive V2 board.

The PCIe driver depends on the gpio, pinctrl, clk and reset drivers for initialization. The PCIe dts configuration includes all these settings.

The PCIe driver code has been tested on VisionFive V2 boards. The test devices include an M.2 NVMe SSD and a Realtek 8169 Ethernet adapter.

Pull request: u-boot-spi/master

The following changes since commit bf5152d0108683bbaabf9d7a7988f61649fc33f4:

Merge branch ‘master’ of https://source.denx.de/u-boot/custodians/u-boot-riscv (2023-07-12 13:10:04 -0400)

are available in the Git repository at:

https://source.denx.de/u-boot/custodians/u-boot-spi master

for you to fetch changes up to 4a31e145217cecc3d421f96eafcd2cfd9c670929:

mtd: spi-nor: Add support for w25q256jwm (2023-07-13 14:17:40 +0530)

Please pull u-boot-marvell/master

please pull the following Marvell MVEBU related patches into master:

  • mvebu: Thecus: Misc enhancement and cleanup (Tony)
  • mvebu: Add AC5X Allied Telesis x240 board support incl NAND controller enhancements for this SoC (Chris)

Here is the Azure build, without any issues:

https://dev.azure.com/sr0718/u-boot/_build/results?buildId=305&view=results

v2: riscv: Initial support for Lichee PI 4A board

Sipeed’s Lichee PI 4A board is based on T-HEAD’s TH1520 SoC, which consists of a quad-core XuanTie C910 CPU, plus one C906 CPU and one E902 CPU.

In this series, we add a basic device tree, including UART, CPU, and PLIC, making it capable of booting to a serial console.


