TinyLab (泰晓科技): focused on Linux, tracing things to their source and seeing the whole from small details
Website: https://tinylab.org


RISC-V Linux Kernel and Related Technology News, Issue 82

Written by 呀呀呀 on 2024/03/13

Date: 20240310
Editor: 晓怡
Repository: RISC-V Linux Kernel Technology Research Activity (RISC-V Linux 内核技术调研活动)
Sponsor: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support

v9: riscv: sophgo: add clock support for Sophgo CV1800/SG2000 SoCs

Add clock controller support for the Sophgo CV1800B, CV1812H and SG2000.

v9: riscv: Use Kconfig to set unaligned access speed

If the hardware unaligned access speed is known at compile time, it is possible to skip the unaligned access speed probe and speed up boot time.

v16: Linux RISC-V AIA Support

The RISC-V AIA specification is ratified as per the RISC-V International process. The latest ratified AIA specification can be found at: https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf

v1: riscv: dmi: Add SMBIOS/DMI support

Enable the dmi driver for riscv, which allows access to the SMBIOS info through userspace files (/sys/firmware/dmi/*).

The change is based on that of arm64 and has been verified with the dmidecode tool.

GIT PULL: KVM/riscv changes for 6.9

The following changes since commit d206a76d7d2726f3b096037f2079ce0bd3ba329b:

Linux 6.8-rc6 (2024-02-25 15:46:06 -0800)

are available in the Git repository at:

https://github.com/kvm-riscv/linux.git tags/kvm-riscv-6.9-1

v3: clocksource: timer-riscv: Clear timer interrupt on timer initialization

In the RISC-V specification, the stimecmp register doesn’t have a default value. To prevent the timer interrupt from being triggered during timer initialization, clear the timer interrupt by writing stimecmp with a maximum value.
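The effect of writing the maximum value can be sketched with a small user-space model. This is not the driver code: the `model_hart` struct and both helper names are hypothetical, and the only behaviour assumed is the Sstc rule that a supervisor timer interrupt is pending while `time >= stimecmp`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one hart's timer state (not kernel code). */
struct model_hart {
    uint64_t time;      /* free-running counter */
    uint64_t stimecmp;  /* compare register; no defined value after reset */
};

/* Assumed Sstc rule: the timer interrupt is pending while time >= stimecmp. */
static bool timer_irq_pending(const struct model_hart *h)
{
    return h->time >= h->stimecmp;
}

/* Writing the largest value moves the compare point out of practical reach. */
static void timer_init_clear(struct model_hart *h)
{
    h->stimecmp = UINT64_MAX;
}
```

With an arbitrary leftover `stimecmp` the model may report a pending interrupt immediately; after `timer_init_clear()` it does not until the counter itself reaches the maximum.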

v1: riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS

This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V. This allows each ftrace callsite to provide an ftrace_ops to the common ftrace trampoline, allowing each callsite to invoke distinct tracer functions without the need to fall back to list processing or to allocate custom trampolines for each callsite. This significantly speeds up cases where multiple distinct trace functions are used and callsites are mostly traced by a single tracer.

[v2 PATCH 0/3] arch: mm, vdso: consolidate PAGE_SIZE definition

Naresh noticed that the newly added usage of the PAGE_SIZE macro in include/vdso/datapage.h introduced a build regression. I had an older patch that I revived to have this defined through Kconfig rather than through including asm/page.h, which is not allowed in vdso code.

v5: riscv: add initial support for Canaan Kendryte K230

K230 is an ideal chip for RISC-V Vector 1.0 evaluation now. Add initial support for it to allow more people to participate in building drivers to mainline for it.

GIT PULL: RISC-V firmware drivers for v6.9

Feel free to cherry pick this either if it makes more sense (or even send it to fixes instead). It’s a fix for an oversize allocation, so given how late it is in the merge window I opted to send it as 6.9 material.

v1: drivers/perf: riscv: Disable PERF_SAMPLE_BRANCH_* while not supported

RISC-V perf does not yet support branch sampling. Two riscv bpf testcases get_branch_snapshot and perf_branches/perf_branches_hw failed due to not disabling such sampling.

v1: Inconsistent sifive,fu540-c000-uart binding.

Note that the driver has a trailing 0 in the binding while the yaml description and the DT part do not. The ‘sifive,uart’ has a trailing 0, where the 0 denotes the version of the UART IP.

v4: riscv: pwm: sophgo: add pwm support for CV1800

The Sophgo CV1800 chip provides a set of four independent PWM channel outputs. This series adds PWM controller support for Sophgo cv1800.

Process Scheduling

v1: sched: Add missing memory barrier in switch_mm_cid

Many architectures’ switch_mm() (e.g. arm64) do not have an smp_mb() which the core scheduler code has depended upon since commit:

commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid")

v1: -v1: sched/balancing: Standardize the naming of scheduler load-balancing functions

Over the years we’ve grown a colorful zoo of scheduler load-balancing function names - both following random, idiosyncratic patterns, and gaining historic misnomers that are not accurate anymore.

v1: sched: Deprecate DOUBLE_TICK feature

Upon examining commit 5e963f2bd465, titled “sched/fair: Commit to EEVDF,”

v1: RESEND: kernel/sched: use seq_putc instead of seq_puts

Using seq_putc for newline characters is faster and more appropriate than seq_puts, since only one character is passed and there is no need for a more powerful but slower function.

v1: sched/fair: simplify __calc_delta()

Based on how __calc_delta() is called now, the input parameter, weight is always NICE_0_LOAD. I think we don’t need it as an input parameter now?

Also, when weight is always NICE_0_LOAD, the initial fact value is always 2^10, and the first fact_hi will always be 0. Thus, we can get rid of the first if block.
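The arithmetic behind that observation can be checked in isolation. The sketch below is a user-space illustration, not the scheduler code: `MODEL_NICE_0_LOAD` and `initial_fact_hi()` are hypothetical names, and the value 2^10 is taken from the text above.

```c
#include <stdint.h>

/* 2^10, the initial factor value cited in the patch description. */
#define MODEL_NICE_0_LOAD 1024u

/*
 * Upper 32 bits of the initial factor. A branch guarded by this value
 * can only run when the weight needs more than 32 bits.
 */
static uint32_t initial_fact_hi(uint64_t weight)
{
    uint64_t fact = weight;

    return (uint32_t)(fact >> 32);
}
```

For a weight of 2^10 the upper half is zero, so a branch that tests it is dead code; it would only be taken for weights of 2^32 or more.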

Memory Management

v1: mm/slub: mark racy accesses on slab->slabs

The reads of slab->slabs are racy because it may be changed by put_cpu_partial concurrently. And in slabs_cpu_partial_show ->slabs is only used for output. Data-racy reads from shared variables that are used only for diagnostic purposes should typically use data_race(), since it is normally not a problem if the values are off by a little.

v1: mm/vmalloc.c: optimize to reduce arguments of alloc_vmap_area()

If called by __get_vm_area_node(), by open coding the field assignments of ‘struct vm_struct *vm’, and move the vm->flags and vm->caller assignments into __get_vm_area_node(), the passed in arguments ‘flags’ and ‘caller’ can be removed.

v2: Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy

This patchset is to optimize the cross-socket memory access with MPOL_PREFERRED_MANY policy.

v2: reclaim contended folios asynchronously instead of promoting them

Commit 6d4675e60135 (“mm: don’t be stuck to rmap lock on reclaim path”) prevents the reclaim path from becoming stuck on the rmap lock. However, it reinserts those folios at the head of the LRU during shrink_folio_list, even if those folios are very cold.

v3: bpf-next: bpf: Introduce BPF arena.

The work on bpf_arena was inspired by Barret’s work: https://github.com/google/ghost-userspace/blob/main/lib/queue.bpf.h that implements queues, lists and AVL trees completely as bpf programs using giant bpf array map and integer indices instead of pointers.

v2: mm/kmemleak: Don’t hold kmemleak_lock when calling printk()

When some error conditions happen (like OOM), some kmemleak functions call printk() to dump out some useful debugging information while holding the kmemleak_lock. This may cause deadlock as the printk() function may need to allocate additional memory leading to a create_object() call acquiring kmemleak_lock again.

v2: mm: Kill ->launder_folio()

invalidate_inode_pages2_range() and its wrappers are frequently used to invalidate overlapping folios prior to and after doing direct I/O. This calls ->launder_folio() to flush dirty folios out to the backing store, keeping the folio lock across the I/O - presumably to prevent the folio from being redirtied and thereby prevent it from being removed.

v5: filemap: avoid unnecessary major faults in filemap_fault()

The major fault occurred when using mlockall(MCL_CURRENT | MCL_FUTURE) in an application, which led to an unexpected issue[1].

This is caused by a temporarily cleared PTE during a read+clear/modify/write update of the PTE, e.g., do_numa_page()/change_pte_range().

v3: ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512

Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere Computing) are planning to ship systems with 512 CPUs. So that all CPUs on these systems can be used with defconfig, we’d like to bump NR_CPUS to 512. Therefore this patch increases the default NR_CPUS from 256 to 512.

v2: mm: Add an explicit smp_wmb() to UFFDIO_CONTINUE

Users of UFFDIO_CONTINUE may reasonably assume that a write memory barrier is included as part of UFFDIO_CONTINUE. That is, a user may believe that all writes it has done to a page that it is now UFFDIO_CONTINUE’ing are guaranteed to be visible to anyone subsequently reading the page through the newly mapped virtual memory region.

v1: mm: Replace ->launder_folio() with flush and wait

Here’s a patch to have a go at getting rid of ->launder_folio(). Since it’s failable and cannot guarantee that pages in the range are removed, I’ve tried to replace laundering with just flush-and-wait, dropping the folio lock around the I/O.

v5: Memory allocation profiling

Rebased over mm-unstable.

Overview: Low overhead [1] per-callsite memory allocation profiling. Not just for debug kernels, overhead low enough to be deployed in production.

v6: STABLE: mm/migrate: set swap entry values of THP tail pages properly.

The tail pages in a THP can have swap entry information stored in their private field. When migrating to a new page, all tail pages of the new page need to update ->private to avoid future data corruption.

v3: device backed vmemmap crash dump support

Hello folks,

Compared with the V2[1] I posted a long time ago, this time it is a completely new proposal design.

Background and motivation overview

v2: make the hugetlb migration strategy consistent

As discussed in previous thread [1], there is an inconsistency when handling hugetlb migration. When handling the migration of freed hugetlb, it prevents fallback to other NUMA nodes in alloc_and_dissolve_hugetlb_folio().

v2: mm: hold PTL from the first PTE while reclaiming a large folio

Within try_to_unmap_one(), page_vma_mapped_walk() races with other PTE modifications preceded by pte clear. While iterating over PTEs of a large folio, it only starts acquiring PTL from the first valid (present) PTE. PTE modifications can temporarily set PTEs to pte_none.

v13: DEPT(Dependency Tracker)

I added a document describing DEPT, that would help you understand what DEPT is and how DEPT works. You can use DEPT just with CONFIG_DEPT on and by checking dmesg in runtime.

v1: RESEND: Split IOMMU DMA mapping operation to two steps

This is posted as RFC to get a feedback on proposed split, but RDMA, VFIO and DMA patches are ready for review and inclusion, the NVMe patches are still in progress as they require agreement on API first.

File Systems

v8: rust: xarray: Add an abstraction for XArray

This abstraction is part of the set of dependencies I need to upstream rustgem, a virtual GEM provider driver in the DRM [1]. Also, this abstraction will be useful for the upstreaming process of the drm/asahi driver.

v10: Landlock: IOCTL support

Introduce the LANDLOCK_ACCESS_FS_IOCTL_DEV right, which restricts the use of ioctl(2) on block and character devices.

v1: ext4: Add support for ext4_map_blocks_atomic()

Currently ext4 exposes [fsawu_min, fsawu_max] size as [blocksize, clustersize] (given the hw block device constraints are larger than FS atomic write units).

v1: unicode: make utf8 test count static

The variables failed_tests and total_tests are not used outside of the utf8-selftest.c file so make them static to avoid the following warnings:

v1: fiemap extension to add physical extent length

For many years, various btrfs users have written programs to discover the actual disk space used by files, using root-only interfaces. However, this information is a great fit for fiemap: it is inherently

GIT PULL: vfs uuid

This adds two new ioctl()s for getting the filesystem uuid and retrieving the sysfs path based on the path of a mounted filesystem. The bcachefs pull request should include a merge of this as well, as it depends on the two new ioctls. Getting the filesystem uuid has been implemented in filesystem-specific code for a while; it is now lifted into a generic ioctl.

v1: eventpoll: record task that adds to monitor list.

Recording task_struct in involved eppoll_entry’s wait_queue_entry, allows us to check this using a probe (say dtrace) at this function. We could also achieve this by checking wait_queue_entry on eventpoll’s wait_queue_head itself, but that would involve more indirections.

v2: statx: stx_subvol

Add a new statx field for (sub)volume identifiers, as implemented by btrfs and bcachefs.

This includes bcachefs support; we’ll definitely want btrfs support as well.

v2: isofs: convert isofs to use the new mount API

This also renames iso9660_options to isofs_options, for consistency.

v2: fs_parser: handle parameters that can be empty and don’t have a value

While investigating an ext4/053 fstest failure, I realised that there was an issue when the flag ‘fs_param_can_be_empty’ is set in a parameter and it doesn’t have a value

v2: minix: convert minix to use the new mount api

Convert the minix filesystem to use the new mount API.

Tested using mount and remount on minix device.

v2: bpf-next: add new acquire/release BPF kfuncs

The original cover letter providing background context and motivating factors around the needs for the BPF kfuncs introduced within this patch series can be found here [0], so please do reference that if need be.

v1: vfs: convert debugfs & tracefs to the new mount API

Since debugfs and tracefs are cut & pasted one way or the other, do these at the same time.

Both of these patches originated in dhowells’ tree at https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=mount-api-viro

v1: reiserfs: Convert to writepages

Use buffer_migrate_folio to handle folio migration instead of writing out dirty pages and reading them back in again. Use writepages to write out folios more efficiently. We now only do that wait_on_write_block check once per call to writepages instead of once per page.

v1: coredump: get machine check errors early rather than during iov_iter

The commit f1982740f5e7 (“iov_iter: Convert iterate*() to inline funcs”) leads to deadloop in generic_perform_write()[1], due to return value of copy_page_from_iter_atomic() changed from non-zero value to zero.

v1: xattr: restrict vfs_getxattr_alloc() allocation size

The vfs_getxattr_alloc() interface is a special-purpose in-kernel api that does a racy query-size+allocate-buffer+retrieve-data. It is used by EVM, IMA, and fscaps to retrieve xattrs.

v1: fanotify: allow freeze when waiting response for permission events

This is a long-standing issue: uninterruptible sleep in fanotify could make system hibernation fail if the userspace server gets frozen before the process waiting for the response (as reported e.g. [1][2]).

v1: fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion

The first kiocb_set_cancel_fn() argument may point at a struct kiocb that is not embedded inside struct aio_kiocb. With the current code, depending on the compiler.

v5: fs-verity support for XFS

Here’s v5 of my patchset of adding fs-verity support to XFS.

v1: tmpfs: don’t interrupt fallocate with EINTR

I have a program that sets up a periodic timer with 10ms interval. When the program attempts to call fallocate on tmpfs, it goes into an infinite loop.

v1: Revert “fs/aio: Make io_cancel() generate completions again”

Patch “fs/aio: Make io_cancel() generate completions again” is based on the assumption that calling kiocb->ki_cancel() does not complete R/W requests.

v2: block atomic writes for XFS

This series expands atomic write support to filesystems, specifically XFS. Extent alignment is based on new feature forcealign (again), and we do not rely on XFS rtvol extent alignment this time.

Networking

v1: net-next: trace: use TP_STORE_ADDRS macro

Use the macro in other tracepoints to make them more concise. No functional change.

v5: net-next: net: Provide SMP threads for backlog NAPI

The RPS code and “deferred skb free” both send an IPI/function call to a remote CPU in which a softirq is raised. This leads to a warning on PREEMPT_RT because raising softirqs from a function call led to undesired behaviour in the past. I had duct tape in RT for the “deferred skb free” and Wander Lairson Costa reported the RPS case.

v2: ethtool HW timestamping statistics

The goal of this patch series is to introduce a common set of ethtool statistics for hardware timestamping that a driver implementer can hook into. The statistics counters added are based on what I believe are common patterns/behaviors found across various hardware timestamping implementations seen in the kernel tree today. The mlx5 family of devices is used as the PoC for this patch series. Other vendors are more than welcome to chime in on this series.

v1: dt-bindings: net: wireless: brcm,bcm4329-fmac: Add CYW43439 DT binding

CYW43439 is a Wi-Fi + Bluetooth combo device from Infineon. The WiFi part is capable of 802.11 b/g/n. This chip is present e.g. on muRata 1YN module. Extend the binding with its DT compatible.

v1: net-next: netlink: specs: support generating code for genl socket priv

The family struct is auto-generated for new families, support use of the sock_priv_* mechanism added in commit a731132424ad (“genetlink: introduce per-sock family private storage”).

v2: ip link: hsr: Add support for passing information about INTERLINK device

The HSR capable device can operate in two modes of operations - Doubly Attached Node for HSR (DANH) and RedBOX.

The latter one allows connection of non-HSR aware device to HSR network. This node is called SAN (Singly Attached Network) and is connected via INTERLINK network device.

v2: nfp: flower: handle acti_netdevs allocation failure

The kmalloc_array() in nfp_fl_lag_do_work() will return null, if the physical memory has run out. As a result, if we dereference the acti_netdevs, the null pointer dereference bugs will happen.

v8: net-next: net: intel: start The Great Code Dedup + Page Pool for iavf

Here’s a two-shot: introduce {,Intel} Ethernet common library (libeth and libie) and switch iavf to Page Pool. Details are in the commit messages; here’s a summary:

v3: net-next: net/packet: Add getsockopt support for PACKET_COPY_THRESH

Currently getsockopt does not support PACKET_COPY_THRESH, and we are unable to get the value of PACKET_COPY_THRESH socket option through getsockopt.

v1: net-next: mlxsw: Support for nexthop group statistics

ECMP is a fundamental component in L3 designs. However, it’s fragile. Many factors influence whether an ECMP group will operate as intended: hash policy (i.e. the set of fields that contribute to ECMP hash calculation), neighbor validity, hash seed (which might lead to polarization) or the type of ECMP group used (hash-threshold or resilient).

[net PATCH] octeontx2-pf: Do not use HW TSO when gso_size < 16bytes

Hardware doesn’t support packet segmentation when segment size is < 16 bytes. Hence add an additional check and use SW segmentation in such case.

[Intel-wired-lan] v7: iwl-next: ice: Support 5 layer Tx scheduler topology

For performance reasons there is a need to have support for selectable Tx scheduler topology. Currently firmware supports only the default 9-layer and 5-layer topology. This patch series enables switch from default to 5-layer topology, if user decides to opt-in.

v1: net-next: net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID

Currently getsockopt does not support NETLINK_LISTEN_ALL_NSID, and we are unable to get the value of NETLINK_LISTEN_ALL_NSID socket option through getsockopt.

This patch adds getsockopt support for NETLINK_LISTEN_ALL_NSID.

v1: net-next: annotate data-races around sysctl_tcp_wmem[0]

Adding simple READ_ONCE() can avoid reading the sysctl knob meanwhile someone is trying to change it.

v2: net :mana: Add per-cpu stats for MANA device

Extend ‘ethtool -S’ output for mana devices to include per-CPU packet stats

v1: net-next: net: gro: move two declarations to include/net/gro.h

Move gro_find_receive_by_type() and gro_find_complete_by_type() to include/net/gro.h where they belong.

v1: net-next: doc/netlink/specs: Add vlan attr in rt_link spec

With command:
# ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_link.yaml --do getlink --json '{"ifname": "eno1.2"}'

v1: net-next: devlink: Add comments to use netlink gen tool

Add the comment to remind people not to manually modify the net/devlink/netlink_gen.c, but to use tools/net/ynl/ynl-regen.sh to generate it.

v1: net-next: Documentation: Add documentation for eswitch attribute

Provide devlink documentation for three eswitch attributes: mode, inline-mode, and encap-mode.

v3: net-next: ice: lighter locking for PTP time reading

This series removes the use of the heavy-weight PTP hardware semaphore in the gettimex64 path. Instead, serialization of access to the time register is done using a host-side spinlock. The timer hardware is shared between PFs on the PCI adapter, so the spinlock must be shared between ice_pf instances too.

v1: net-next: udp: no longer touch sk->sk_refcnt in early demux

After commits ca065d0cf80f (“udp: no longer use SLAB_DESTROY_BY_RCU”) and 7ae215d23c12 (“bpf: Don’t refcount LISTEN sockets in sk_assign()”) UDP early demux no longer needs to grab a refcount on the UDP socket.

v7: VMware hypercalls enhancements

VMware hypercall invocations were spread out across the kernel, each implementing the same ABI as in-place inline asm. With encrypted memory and confidential computing it became harder to maintain every change in these hypercall implementations.

v1: net-next: r8169: switch to new function phy_support_eee

Switch to the new function phy_support_eee. This simplifies the code because data->tx_lpi_enabled is now populated by phy_ethtool_get_eee().

v1: net-next: net: phy: simplify a check in phy_check_link_status

Handling the err == 0 case in the other branch simplifies the code. In addition, I assume “err & phydev->eee_cfg.tx_lpi_enabled” should have used a logical AND operator. It also works as expected with the bitwise AND, but using a bitwise AND with a bool value looks ugly to me.
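The difference between the two operators only shows up when the integer operand can be something other than 0 or 1. The following stand-alone sketch uses hypothetical helper names and plain values rather than the phylib code; that the integer in the patch is limited to 0 or 1 is an assumption inferred from the remark that both forms work there.

```c
#include <stdbool.h>

/* Bitwise AND: the bool is promoted to int 0/1, so only bit 0 of err matters. */
static bool bitwise_and(int err, bool enabled)
{
    return err & enabled;
}

/* Logical AND: any non-zero err counts as true. */
static bool logical_and(int err, bool enabled)
{
    return err && enabled;
}
```

For `err` equal to 0 or 1 both helpers agree. For `err == 2` the bitwise form yields false while the logical form yields true, which is why the logical operator states the intent more clearly.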

v1: ipvs: allow netlink configuration from non-initial user namespace

Configuring ipvs in a non-initial user namespace using the genl netlink interface, e.g., by ‘ipvsadm’ is currently resulting in an ‘-EPERM’. This is due to the use of GENL_ADMIN_PERM flag in ‘ip_vs_ctl.c’.

[PATCH net-next -v5] net/core/dev.c: enable timestamp static key if CPU isolation is configured

For systems that use CPU isolation (via nohz_full), creating or destroying a socket with SO_TIMESTAMP, SO_TIMESTAMPNS or SO_TIMESTAMPING with flag SOF_TIMESTAMPING_RX_SOFTWARE will cause a static key to be enabled/disabled. This in turn causes undesired IPIs to isolated CPUs.

v1: net-next: ipv4: raw: check sk->sk_rcvbuf earlier

There is no point cloning an skb and having to free the clone if the receive queue of the raw socket is full.

v1: net-next: nexthop: Simplify dump error handling

The only error that can happen during a nexthop dump is insufficient space in the skb carrying the netlink messages (EMSGSIZE). If this happens and some messages were already filled in, the nexthop code returns the skb length to signal the netlink core that more objects need to be dumped.

v1: net-next: net: phy: Don’t suspend/resume device not in use

When an MDIO bus contains a PHY device that is not attached to any netdev, or is attached to an external netdev controlled by another driver that is disabled, the bus also tries to suspend/resume the unattached phydev when PM suspend occurs.

v1: net: openvswitch: Add sample multicasting.

** Background ** Currently, OVS supports several packet sampling mechanisms (sFlow, per-bridge IPFIX, per-flow IPFIX). These end up being translated into a userspace action that needs to be handled by ovs-vswitchd’s handler threads only to be forwarded to some third party application that will somehow process the sample and provide observability on the datapath.

Security Hardening

v1: randomize_kstack: Improve entropy diffusion

The kstack_offset variable was really only ever using the low bits for kernel stack offset entropy. Add a ror32() to increase bit diffusion.
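A rotate moves previously accumulated bits into new positions before fresh entropy is mixed in, so the low bits are not the only ones that change over time. The sketch below shows the idea in user space. The `ror32()` helper has the usual rotate-right semantics; `mix_offset()` and the rotation distance of 5 are assumptions for illustration and are not taken from the text above.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by `shift` bits. */
static uint32_t ror32(uint32_t word, unsigned int shift)
{
    return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/*
 * Hypothetical mixing step: rotate the running offset, then XOR in the
 * new random value. The distance 5 is an illustrative assumption.
 */
static uint32_t mix_offset(uint32_t offset, uint32_t rand)
{
    return ror32(offset, 5) ^ rand;
}
```

Without the rotate, repeatedly XOR-ing small random values would only ever disturb the bits those values cover. With it, each call carries earlier entropy into other bit positions.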

v1: pstore/zone: Don’t clear memory twice

There is no need to call memset(…, 0, …) on memory allocated by kcalloc(). It is already zeroed.

Remove the redundant call.
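The same reasoning holds for the user-space counterpart `calloc()`, which also returns zero-filled memory. The helper below is a hypothetical stand-alone check, not pstore code, showing that an extra `memset()` after such an allocation would do no work.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Returns 1 if every byte of a fresh calloc() buffer is zero,
 * 0 if a non-zero byte is found, and -1 if the allocation fails.
 */
static int calloc_is_zeroed(size_t n, size_t size)
{
    unsigned char *p = calloc(n, size);
    int ok = 1;

    if (!p)
        return -1;

    for (size_t i = 0; i < n * size; i++) {
        if (p[i] != 0) {
            ok = 0;
            break;
        }
    }

    free(p);
    return ok;
}
```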

v6: arm64: qcom: add AIM300 AIoT board support

Add AIM300 AIoT support along with usb, ufs, regulators, serial, PCIe, and PMIC functions. AIM300 Series is a highly optimized family of modules designed to support AIoT applications. It integrates QCS8550 SoC, UFS and PMIC chip etc.

v4: pstore: add multi-backend support

This is the 4th version of the patch set. In this patchset we aim to add pstore multi-backend support then user can register more than one pstore backend.

v2: overflow: Change DEFINE_FLEX to take __counted_by member

The norm should be flexible array structures with __counted_by annotations, so DEFINE_FLEX() is updated to expect that. Rename the non-annotated version to DEFINE_RAW_FLEX(), and update the few existing users.

v3: Add support for QoS configuration

This series adds QoS support for QNOC type devices, which can be found on the SC7280 platform. It adds support for programming priority, priority forward disable and urgency forwarding. This helps in prioritizing the traffic originating from different interconnect masters at the NOC (Network On Chip).

v3: scsi: replace deprecated strncpy

This series contains multiple replacements of strncpy throughout the scsi subsystem.

v1: Bring kstack randomized perf closer to unrandomized

Currently with kstack randomization there is somewhere on the order of 5x worse variation in response latencies vs unrandomized syscalls. This is down from 10x on pre 6.2 kernels where the RNG reseeding was moved out of the syscall path, but get_random_uXX() still contains a fair amount of additional global state manipulation which is problematic.

v2: slab: Introduce dedicated bucket allocator

Repeating the commit logs for patch 4 here:

Dedicated caches are available for fixed size allocations via
kmem_cache_alloc(), but for dynamically sized allocations there is only
the global kmalloc API's set of buckets available. This means it isn't
possible to separate specific sets of dynamically sized allocations into
a separate collection of caches.

v3: sock: Use unsafe_memcpy() for sock_copy()

While testing for places where zero-sized destinations were still showing up in the kernel, sock_copy() and inet_reqsk_clone() were found, which are using very specific memcpy() offsets for both avoiding a portion of struct sock, and copying beyond the end of it (since struct sock is really just a common header before the protocol-specific allocation). Instead of trying to unravel this historical lack of container_of(), just switch to unsafe_memcpy(), since that’s effectively what was happening already (memcpy() wasn’t checking 0-sized destinations while the code base was being converted away from fake flexible arrays).

v2: greybus: Avoid fake flexible array for response data

FORTIFY_SOURCE has been ignoring 0-sized destinations while the kernel code base has been converted to flexible arrays. In order to enforce the 0-sized destinations (e.g. with __counted_by), the remaining 0-sized destinations need to be handled. Instead of converting an empty struct into using a flexible array, just directly use a pointer without any additional indirection. Remove struct gb_bootrom_get_firmware_response and struct gb_fw_download_fetch_firmware_response.

Asynchronous I/O

v1: Send and receive bundles

I went back to the drawing board a bit on the send multishot, and this is what came out.

First support was added for provided buffers for send. This works like provided buffers for recv/recvmsg, and the intent here to use the buffer ring queue as an outgoing sequence for sending.

v1: io_uring/net: correctly handle multishot recvmsg retry setup

If we loop for multishot receive on the initial attempt, and then abort later on to wait for more, we miss a case where we should be copying the io_async_msghdr from the stack to stable storage. This leads to the next retry potentially failing, if the application had the msghdr on the stack.

Rust For Linux

GIT PULL: Rust for v6.9

This is the next round of the Rust support.

All the commits have been in linux-next for more than a week.

v1: rust: don’t select CONSTRUCTORS

This was originally part of commit 4b9a68f2e59a0 (“rust: add support for static synchronisation primitives”) from the old Rust branch, which used module constructors to initialize globals containing various synchronisation primitives with pin-init. That commit has never been upstreamed, but the select CONSTRUCTORS statement ended up being included in the patch that initially added Rust support to the Linux Kernel.

v2: rust: add flags for shadow call stack sanitizer

Add flags to support the shadow call stack sanitizer, both in the dynamic and non-dynamic modes.

BPF

v1: bpf-next: selftests/bpf: add fexit and kretprobe triggering benchmarks

We already have kprobe and fentry benchmarks. Let’s add kretprobe and fexit ones for completeness.

v2: bpf-next: bpf: move sleepable flag from bpf_prog_aux to bpf_prog

prog->aux->sleepable is checked very frequently as part of (some) BPF program run hot paths. So this extra aux indirection seems wasteful and on busy systems might cause unnecessary memory cache misses.

v2: bpf-next: bpftool: Mount bpffs on provided dir instead of parent dir

When pinning programs/objects under PATH (eg: during “bpftool prog loadall”) the bpffs is mounted on the parent dir of PATH in the following situations:

  • the given dir exists but it is not bpffs.
  • the given dir doesn’t exist and the parent dir is not bpffs.

v1: bpf-next: bpf: cap BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()

On some architectures like ARM64, PMD_SIZE can be really large in some configurations. Like with CONFIG_ARM64_64K_PAGES=y the PMD_SIZE is 512MB.
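The 512MB figure follows from the page-table geometry, and the cap in the patch title is simple arithmetic. The sketch below works both out in user space. The helper names are hypothetical, `SZ_2M` is defined locally, and the 8-byte table entry size is an assumption that matches 64-bit page tables.

```c
#include <stdint.h>

#define SZ_2M (2ULL << 20)

/*
 * Size mapped by one PMD entry: a table occupies one page and, assuming
 * 8-byte entries, holds 2^(page_shift - 3) entries, each mapping one page.
 */
static uint64_t pmd_size(unsigned int page_shift)
{
    return 1ULL << (page_shift + (page_shift - 3));
}

/* Cap from the patch title: 2MB per possible NUMA node. */
static uint64_t capped_pack_size(unsigned int nodes)
{
    return SZ_2M * nodes;
}
```

With 4K pages (`page_shift` 12) the PMD size is 2MB, so the cap changes nothing on such configurations. With 64K pages (`page_shift` 16) it is 512MB, and the cap brings a 4-node system down to 8MB per pack.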

v1: bpf-next: bpf: Allow helper bpf_get_ns_current_pid_tgid() in cgroup/sk_msg programs

Currently bpf_get_current_pid_tgid() is allowed in tracing, cgroup and sk_msg progs while bpf_get_ns_current_pid_tgid() is only allowed in tracing progs.

v4: bpf-next: bpf: arena prerequisites

These are bpf_arena prerequisite patches. Useful on its own.

v1: bpf: cpumap: Zero-initialise xdp_rxq_info struct before running XDP program

When running an XDP program that is attached to a cpumap entry, we don’t initialise the xdp_rxq_info data structure being used in the xdp_buff that backs the XDP program invocation. Tobias noticed that this leads to random values being returned as the xdp_md->rx_queue_index value for XDP programs running in a cpumap.

v1: bpf-next: Add bpf_link support for sk_msg prog

One of our internal services started to use an sk_msg program, and currently it uses the existing prog attach/detach2 as demonstrated in selftests. But attach/detach of all other bpf programs is based on bpf_link. Consistent attach/detach APIs for all programs will make things easier to understand and less error prone. So this patch adds bpf_link support for BPF_PROG_TYPE_SK_MSG.

v1: drivers/perf: riscv: Disable PERF_SAMPLE_BRANCH_* while not supported

RISC-V perf does not yet support branch sampling. Two riscv bpf testcases get_branch_snapshot and perf_branches/perf_branches_hw failed due to not disabling such sampling.

v3: DONOTMERGE: Add minimal XDP support to TI AM65 CPSW Ethernet driver

This patch adds XDP support to TI AM65 CPSW Ethernet driver.

v3: bpf-next: bpf: Add a generic bits iterator

Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been added for the new bpf_iter_bits functionality. These kfuncs enable the iteration of the bits from a given address and a given number of bits.
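The new/next/destroy pattern can be illustrated with a plain user-space iterator. This is not the kfunc API: the struct and function names below are hypothetical, and the sketch assumes that iterating "the bits" means visiting the indices of the set bits in a buffer of 64-bit words.

```c
#include <stdint.h>

/* Hypothetical user-space iterator state; names are illustrative only. */
struct bits_iter {
    const uint64_t *words;
    unsigned int nr_bits;
    unsigned int pos;   /* next bit index to examine */
};

static void bits_iter_new(struct bits_iter *it, const uint64_t *words,
                          unsigned int nr_bits)
{
    it->words = words;
    it->nr_bits = nr_bits;
    it->pos = 0;
}

/* Returns the index of the next set bit, or -1 once all bits are visited. */
static int bits_iter_next(struct bits_iter *it)
{
    while (it->pos < it->nr_bits) {
        unsigned int i = it->pos++;

        if (it->words[i / 64] & (1ULL << (i % 64)))
            return (int)i;
    }
    return -1;
}

/* Nothing to release in this sketch; kept for symmetry with new/next. */
static void bits_iter_destroy(struct bits_iter *it)
{
    it->words = 0;
    it->nr_bits = 0;
}
```

A caller would initialise the iterator once, call `bits_iter_next()` in a loop until it returns -1, and then call `bits_iter_destroy()`.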

v4: bpf-next: mm: Enforce ioremap address space and introduce sparse vm_area

There are various users of kernel virtual address space: vmalloc, vmap, ioremap, xen.

  • vmalloc use case dominates the usage. Such vm areas have VM_ALLOC flag and these areas are treated differently by KASAN.

v6: net-next: Device Memory TCP

This revision largely rebases on top of net-next and addresses the little feedback RFCv5 received.

v1: bpf-next: arm64, bpf: Use bpf_prog_pack for arm64 bpf trampoline

We used bpf_prog_pack to aggregate bpf programs into huge pages to relieve the iTLB pressure on the system. This was merged for ARM64[1]. We can apply it to the bpf trampoline as well. This would increase the performance of fentry and struct_ops programs.

Peripheral Technology Updates

Qemu

v10: riscv: set vstart_eq_zero on mark_vs_dirty

This version changes the wording of the patch 9 subject and commit message. The previous subject, “target/riscv: Clear vstart_qe_zero flag”, isn’t accurate. We’re not clearing (i.e. setting to false/zero) the flag, we’re setting the flag to ‘true’ at the end of each insn.

v1: target/riscv: raise an exception when CSRRS/CSRRC writes a read-only CSR

Both CSRRS and CSRRC always read the addressed CSR and cause any read side effects regardless of rs1 and rd fields. Note that if rs1 specifies a register holding a zero value other than x0, the instruction will still attempt to write the unmodified value back to the CSR and will cause any attendant side effects.
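The rule above can be written as a small decision function. This is a sketch of the architectural rule, not QEMU's code: the function names are hypothetical. It relies on two points from the RISC-V privileged specification: a CSR is read-only when address bits [11:10] are 0b11, and CSRRS/CSRRC skip the write only when the rs1 field encodes x0.

```c
#include <stdbool.h>
#include <stdint.h>

/* A CSR is read-only when bits [11:10] of its 12-bit address are 0b11. */
bool csr_is_read_only(uint16_t csr_addr)
{
    return ((csr_addr >> 10) & 0x3) == 0x3;
}

/*
 * CSRRS/CSRRC attempt a write whenever the rs1 *field* is not x0,
 * even if the register it names currently holds zero.
 */
bool csrrs_csrrc_attempts_write(unsigned rs1_field)
{
    return rs1_field != 0;
}

/* True when the instruction must raise an illegal-instruction exception. */
bool csrrs_csrrc_is_illegal(uint16_t csr_addr, unsigned rs1_field)
{
    return csr_is_read_only(csr_addr) && csrrs_csrrc_attempts_write(rs1_field);
}
```

For example, reading `cycle` (0xC00) with rs1 = x0 is legal, while the same instruction with rs1 = x5 is not, regardless of the value held in x5.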

v1: riscv-to-apply queue

The following changes since commit 8f6330a807f2642dc2a3cdf33347aa28a4c00a87:

Merge tag ‘pull-maintainer-updates-060324-1’ of https://gitlab.com/stsquad/qemu into staging (2024-03-06 16:56:20 +0000)

v2: riscv: QEMU RISC-V IOMMU Support

This is the second version of the work Tomasz sent in July 2023 [1]. I’ll be helping Tomasz upstream it.

v1: target/riscv: Support Zve32x and Zve64x extensions

This patch series adds support for Zve32x and Zve64x and makes vector registers visible in GDB if any of the V/Zve/Zvk extensions is enabled.

v1: target/riscv/vector_helper.c: Avoid shifting negative in fractional LMUL checking

When vlmul is larger than 5, the original fractional LMUL checking may get an unexpected result.

v1: target/riscv: Implement dynamic establishment of custom decoder

In this patch, we modify the decoder to be a freely composable data structure instead of a hardcoded one. It can be dynamically built up according to the extensions. This approach has several benefits:

  1. Provides support for heterogeneous CPU architectures. As we add the decoder to CPUArchState, each CPU can have its own decoder, and the decoders can differ according to each CPU’s features.
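The idea of a per-CPU decoder list assembled from the enabled extensions can be sketched as follows. This is an illustration only, not the QEMU patch: every type and function name here is hypothetical, and the two "decoders" merely match a major opcode (0x33 for OP, 0x0b for custom-0) rather than decoding real instructions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU configuration: one optional vendor extension. */
struct cpu_cfg_sketch {
    bool ext_vendor_a;
};

typedef bool (*decode_fn)(uint32_t insn);
typedef bool (*guard_fn)(const struct cpu_cfg_sketch *cfg);

/* One candidate decoder plus the condition under which it is enabled. */
struct decoder_entry {
    guard_fn guard;
    decode_fn decode;
};

#define MAX_DECODERS 4

struct cpu_sketch {
    struct cpu_cfg_sketch cfg;
    decode_fn decoders[MAX_DECODERS]; /* built per CPU, not hardcoded */
    size_t nr_decoders;
};

static bool guard_always(const struct cpu_cfg_sketch *cfg) { (void)cfg; return true; }
static bool guard_vendor_a(const struct cpu_cfg_sketch *cfg) { return cfg->ext_vendor_a; }

/* Toy decoders: accept an instruction based on its major opcode only. */
static bool decode_base(uint32_t insn) { return (insn & 0x7f) == 0x33; }
static bool decode_vendor_a(uint32_t insn) { return (insn & 0x7f) == 0x0b; }

static const struct decoder_entry all_decoders[] = {
    { guard_vendor_a, decode_vendor_a },
    { guard_always, decode_base },
};

/* Collect the decoders whose guard accepts this CPU's configuration. */
void cpu_build_decoders(struct cpu_sketch *cpu)
{
    cpu->nr_decoders = 0;
    for (size_t i = 0; i < sizeof(all_decoders) / sizeof(all_decoders[0]); i++) {
        if (cpu->nr_decoders < MAX_DECODERS && all_decoders[i].guard(&cpu->cfg))
            cpu->decoders[cpu->nr_decoders++] = all_decoders[i].decode;
    }
}

/* Try each enabled decoder in order; false means "illegal instruction". */
bool cpu_decode(const struct cpu_sketch *cpu, uint32_t insn)
{
    for (size_t i = 0; i < cpu->nr_decoders; i++) {
        if (cpu->decoders[i](insn))
            return true;
    }
    return false;
}
```

Two CPUs built from different configurations end up with different decoder lists, so an instruction accepted on one can be rejected on the other, and the decode loop never evaluates guards for disabled extensions.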

v1: Add RISC-V Server Platform Reference Board

The RISC-V Server Platform specification[1] defines a standardized set of hardware and software capabilities that portable system software, such as OSes and hypervisors, can rely on being present in a RISC-V server platform. This patchset provides a RISC-V Server Platform (RVSP) reference implementation on qemu which complies with the spec as faithfully as possible.

Buildroot

v1: next: toolchain/toolchain-external/toolchain-external-bootlin: bump to 2024.02

support in Buildroot. Notable changes:

  • Bleeding edge toolchains now use binutils 2.42, and stable toolchains use binutils 2.41

U-Boot

v1: board: sophgo: milkv_duo: Add ethernet support for Milk-V Duo board

This series adds init code for the cv1800b ethernet phy and enables ethernet support for the Sophgo Milk-V Duo board.

v2: mmc: sophgo: milkv_duo: Add SD card support for Milk-V Duo board

This series adds an sdhci driver for the cv1800b SoC and enables SD card support for the Sophgo Milk-V Duo board.

v2: riscv: cpu: Add support for cv1800b SoC

This series adds basic support for the cv1800b SoC and enables dcache support.

The cv1800b utilizes CSR instructions to manipulate the first and second bits in the MHCR register (0x7C1) to indicate the activation status of icache and dcache.

v1: cmd: sbi: Correctly display unknown implementation IDs

was shown. The number 16777216 is not the implementation ID.

  • Show the correct number
  • Use a hexadecimal output format
  • Add a missing line feed

v1: riscv: dts: jh7110: Enable PLL node in SPL

Previously the PLL node was missing from the SPL dts. This caused BUS_ROOT to stay on the OSC clock (24 MHz). As a result, all peripherals had to run at a much lower frequency, and loading from sdcard/emmc was slow. Enable the PLL node in the dts to fix this.

v1: riscv: cpu: improve multi-letter extension detection in supports_extension()

The first multi-letter extension after the single-letter extensions does not have to be preceded by an underscore, which could cause the parser to mistakenly find a single-letter extension after the start of the multi-letter portion of the string.
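The parsing pitfall can be shown with a short C sketch. This is not the U-Boot code: `isa_has_single_letter_ext()` is a hypothetical helper, and it assumes that the single-letter portion of the ISA string ends at the first underscore or at the first `s`, `x` or `z`, which are treated here as the letters that start a multi-letter extension name.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Report whether single-letter extension `ext` appears in an ISA string
 * such as "rv64imafdczicsr_zifencei". Scanning stops where the
 * multi-letter portion begins, so letters inside names like "zifencei"
 * are never mistaken for single-letter extensions.
 */
bool isa_has_single_letter_ext(const char *isa, char ext)
{
    const char *p;

    if (strncmp(isa, "rv", 2) != 0)
        return false;

    /* Skip the "rv" prefix and the XLEN digits (e.g. "32" or "64"). */
    p = isa + 2;
    while (*p >= '0' && *p <= '9')
        p++;

    for (; *p != '\0'; p++) {
        /* Assumed markers for the end of the single-letter portion. */
        if (*p == '_' || *p == 's' || *p == 'x' || *p == 'z')
            break;
        if (*p == ext)
            return true;
    }
    return false;
}
```

With the string "rv64imafdczicsr_zifencei", a query for `c` succeeds, while a query for `e` fails even though the letter occurs inside "zifencei". A parser that waited for an underscore before stopping would scan into "zicsr" and could report letters such as `r` as present.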

v2: arm64: Enable CONFIG_64BIT for static analysis

The makefiles currently pass -m32 to Smatch static checker when I’m building on arm64. Also the arch is set to “arm” and Smatch thinks “arm” is 32 bits and “arm64” is 64 bits. With this patchset we pass -m64 and Smatch works correctly.


