RISC-V Linux Kernel and Related Technology News, Issue 75

Written by 呀呀呀 on 2024/01/28

Date: 20240121
Editor: 晓怡
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support

v2: RISC-V: mm: do not treat hint addr on mmap as the upper bound to search

A previous patch series [1] changed mmap behavior to treat the hint address as the upper bound of the mmap address range. The motivation for that series was that some user space software may assume a 48-bit address space and use the higher bits to encode information, which may collide with the large virtual addresses mmap may return.
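To make the term "hint address" concrete, the following is a minimal user-space sketch of passing one to mmap(). It is not code from the patch; the helper name map_near_hint is made up for illustration. Without MAP_FIXED the kernel treats the first argument only as a suggestion, so the returned address may differ from the hint.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Ask for an anonymous private mapping near `hint` (hypothetical helper).
 * Without MAP_FIXED the hint is only a suggestion, so the caller must use
 * the returned address, not the hint. Returns NULL on failure.
 */
static void *map_near_hint(void *hint, size_t len)
{
	void *p;

	assert(len > 0);	/* mmap() rejects zero-length mappings */

	p = mmap(hint, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

A caller that relies on the upper address bits being free would have to check the returned pointer itself rather than assume the hint was honored.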

GIT PULL: RISC-V Patches for the 6.8 Merge Window, Part 4

The following changes since commit cb51bfee7f62a8e26b694f9d84c0041b3e3ccc71:
Merge patch series “riscv: hwprobe: add Zicond, Zacas and Ztso support” (2024-01-09 20:14:51 -0800)

GIT PULL: RISC-V Patches for the 6.8 Merge Window, Part 3

The following changes since commit cb51bfee7f62a8e26b694f9d84c0041b3e3ccc71:
Merge patch series “riscv: hwprobe: add Zicond, Zacas and Ztso support” (2024-01-09 20:14:51 -0800)

v1: riscv: Use kcalloc() instead of kzalloc()

As noted in the “Deprecated Interfaces, Language Features, Attributes, and Conventions” documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors.
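The overflow risk described above can be sketched in user space. The helper below is a hypothetical stand-in for the kcalloc() idea, not kernel code: the element count and element size stay separate arguments so the multiplication can be checked before anything is allocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate a zeroed array of n elements of `size` bytes each
 * (hypothetical user-space analogue of a two-factor allocator).
 */
static void *alloc_array_zeroed(size_t n, size_t size)
{
	assert(size != 0);

	/* n * size would wrap past SIZE_MAX: fail instead of under-allocating. */
	if (n > SIZE_MAX / size)
		return NULL;

	return calloc(n, size);
}
```

With a single-argument allocator, a call such as `malloc(n * size)` would silently receive the wrapped product and return a buffer that is too small.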

v1: Add camera subsystem for StarFive

This series adds the camera subsystem node for the StarFive JH7110 SoC. It is based on top of the master branch of the media_stage repository.

GIT PULL: KVM/riscv changes for 6.8 part #2

We have the following additional KVM RISC-V changes for 6.8: 1) Zbc extension support for Guest/VM 2) Scalar crypto extensions support for Guest/VM 3) Vector crypto extensions support for Guest/VM 4) Zfh[min] extensions support for Guest/VM 5) Zihintntl extension support for Guest/VM 6) Zvfh[min] extensions support for Guest/VM 7) Zfa extension support for Guest/VM

v2: riscv: lib: Check if output in asm goto supported

The output field of an asm goto statement is not supported by all compilers. If it is not supported, fallback to the non-optimized code.

v3: -next: RISC-V: ACPI: Add LPI support

This series adds support for Low Power Idle (LPI) on ACPI based platforms.
LPI is described in the ACPI spec [1]. RISC-V FFH spec required to enable this is available at [2].

v1: init: refactor the generic cpu_to_node for NUMA

(0) The ARCHs which support NUMA are: arm64, loongarch, powerpc, riscv, sparc, mips, s390, x86.
(1) Some ARCHs in (0) override the generic cpu_to_node(), such as: sparc, mips, s390, x86.
Since these ARCHs have their own cpu_to_node(), we do not care about them.

Process Scheduling

v2: RESEND: sched/fair: Do not scan non-movable tasks several times

If busiest rq is small, nr_running < SCHED_NR_MIGRATE_BREAK and all tasks are not movable, detach_tasks() should not iterate more than tasks available in the busiest rq.

v1: sched/eevdf: using leftmost improves the readability of the code

Using ‘leftmost’ enhances code readability, without involving any logical changes.

Memory Management

v1: mm: zswap: simplify zswap_swapoff()

This mini-series aims to simplify zswap_swapoff(), which should simplify in-progress work touching it (zswap tree split, rbtree to xarray conversion).

v1: Hugetlb pages should not be reserved by shmat() if SHM_NORESERVE

For shared memory of type SHM_HUGETLB, hugetlb pages are reserved in the shmget() call. If the SHM_NORESERVE flag is specified, the hugetlb pages are not reserved. However, when the shared memory is attached with the shmat() call, the hugetlb pages are incorrectly reserved for SHM_HUGETLB shared memory created with SHM_NORESERVE.

v1: filemap: add mapping_mapped check in filemap_unaccount_folio()

Recently, we discovered a syzkaller issue that triggers VM_BUG_ON_FOLIO in filemap_unaccount_folio() with CONFIG_DEBUG_VM enabled, or bad page without CONFIG_DEBUG_VM.

v1: mm: writeback: ratelimit stat flush from mem_cgroup_wb_stats

One of our workloads (Postgres 14) has regressed when migrated from 5.10 to 6.1 upstream kernel. The regression can be reproduced by sysbench’s oltp_write_only benchmark. It seems like the always on rstat flush in mem_cgroup_wb_stats() is causing the regression. So, rate limit that specific rstat flush. One potential consequence would be the dirty throttling might be decided on stale memcg stats.

v1: fs: binfmt_elf_efpic: don’t use missing interpreter’s properties

Static FDPIC executable may get an executable stack even when it has non-executable GNU_STACK segment. This happens when STACK segment has rw permissions, but does not specify stack size. In that case FDPIC loader uses permissions of the interpreter’s stack, and for static executables with no interpreter it results in choosing the arch-default permissions for the stack.

v1: kasan: introduce mem track feature

KASAN is a tool for detecting memory bugs like out-of-bounds and use-after-free. In Generic KASAN mode, it uses shadow memory to record the accessibility of memory. After we allocate memory from the kernel, the shadow memory corresponding to it is marked as accessible. In our daily development, memory problems often occur. If a task accidentally modifies memory that does not belong to it but has been allocated, some strange phenomena may occur.

v4: hugetlb: parallelize hugetlb page init on boot

This version is tested on next-20240112.
Update Summary:
Make padata_do_multithreaded dispatch all jobs with a global iterator
Revise commit message
Rename some functions
Collect Tested-by and Reviewed-by

v1: writeback: move wb_wakeup_delayed defination to fs-writeback.c

wb_wakeup_delayed is only used in fs-writeback.c. Move it to fs-writeback.c after the definition of wb_wakeup and make it static.

v1: writeback: avoid to move skipped wb in offline_cgwbs list

There is no need to move a skipped wb to the local list. Only move a wb which is going to be cleaned up, to avoid unnecessary work.

v1: RFC: zswap tree use xarray instead of RB tree

The RB tree shows some contribution to the swap fault long tail latency due to two factors: 1) RB tree requires re-balance from time to time. 2) The zswap RB tree has a tree level spin lock protecting the tree access.

v3: kexec: Allow preservation of ftrace buffers

Kexec today considers itself purely a boot loader: When we enter the new kernel, any state the previous kernel left behind is irrelevant and the new kernel reinitializes the system.

v2: mm: memory: move mem_cgroup_charge() into alloc_anon_folio()

mem_cgroup_charge() uses the GFP flags in a fairly sophisticated way. In addition to checking gfpflags_allow_blocking(), it pays attention to __GFP_NORETRY and __GFP_RETRY_MAYFAIL to ensure that processes within this memcg do not exceed their quotas. Using the same GFP flags ensures that we handle large anonymous folios correctly, including falling back to smaller orders when there is plenty of memory available in the system but this memcg is close to its limits.

v3: tools/mm: Add thpmaps script to dump THP usage info

With the proliferation of large folios for file-backed memory, and more recently the introduction of multi-size THP for anonymous memory, it is becoming useful to be able to see exactly how large folios are mapped into processes.

v1: mm/zswap: Improve with alloc_workqueue() call

The core-api create_workqueue is deprecated; this patch replaces create_workqueue with alloc_workqueue. The previous zswap workqueue was a bound workqueue; this patch uses alloc_workqueue() to create an unbound workqueue.

v1: selftests/mm: run_vmtests.sh: add missing tests

Add missing tests to run_vmtests.sh. The mm kselftests are run through run_vmtests.sh. If a test isn’t present in this script, it’ll not run with run_tests or make -C tools/testing/selftests/mm run_tests.

v4: x86/hyperv: Mark CoCo VM pages not present when changing encrypted state

In a CoCo VM, when transitioning memory from encrypted to decrypted, or vice versa, the caller of set_memory_encrypted() or set_memory_decrypted() is responsible for ensuring the memory isn’t in use and isn’t referenced while the transition is in progress. The transition has multiple steps, and the memory is in an inconsistent state until all steps are complete.

v1: reading proc/pid/maps under RCU

The issue this patchset is trying to address is mmap_lock contention when a low priority task (monitoring, data collecting, etc.) blocks a higher priority task from making updates to the address space. The contention is due to the mmap_lock being held for read when reading proc/pid/maps. With the introduction of maple_tree, VMA tree traversals are RCU-safe, and per-vma locks make VMA access RCU-safe. This provides an opportunity for lock-less reading of proc/pid/maps.

v1: readahead: use ilog2 instead of a while loop in page_cache_ra_order()

A while loop is used to adjust the new_order to be lower than the ra->size. ilog2 could be used to do the same instead of using a loop.
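The equivalence claimed above can be checked with a small user-space sketch. The function names are hypothetical and ilog2_u is only a stand-in for the kernel's ilog2() (it assumes a GCC or Clang compiler for __builtin_clzl); the two clamp functions are intended to return the same result for any size of at least 1.

```c
#include <assert.h>

/* floor(log2(x)) for x > 0; stand-in for the kernel's ilog2(). */
static unsigned int ilog2_u(unsigned long x)
{
	assert(x != 0);
	return (unsigned int)(sizeof(unsigned long) * 8 - 1) -
	       (unsigned int)__builtin_clzl(x);
}

/* Loop form: lower new_order until 2^new_order no longer exceeds size. */
static unsigned int clamp_order_loop(unsigned int new_order, unsigned long size)
{
	while ((1UL << new_order) > size)
		new_order--;
	return new_order;
}

/* Closed form: the largest order with 2^order <= size is ilog2(size). */
static unsigned int clamp_order_ilog2(unsigned int new_order, unsigned long size)
{
	unsigned int max_order = ilog2_u(size);

	return new_order < max_order ? new_order : max_order;
}
```

For example, with new_order = 9 and size = 100, both forms yield 6, since 64 is the largest power of two not exceeding 100.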

v1: uprobes: use pagesize-aligned virtual address when replacing pages

uprobes passes an unaligned page mapping address to folio_add_new_anon_rmap(), which ends up triggering a VM_BUG_ON() we recently extended in commit 372cbd4d5a066 (“mm: non-pmd-mappable, large folios for folio_add_new_anon_rmap()”).
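As an illustration of what "pagesize-aligned" means here, the sketch below rounds an address down to the start of its page. It is a user-space example with a hypothetical helper name, not the patch itself; the kernel writes the same operation as `addr & PAGE_MASK`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round an address down to the start of its page (hypothetical helper).
 * page_size must be a non-zero power of two.
 */
static uintptr_t page_align_down(uintptr_t addr, uintptr_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr & ~(page_size - 1);
}
```

With 4 KiB pages, an address such as 0x1234 maps to 0x1000, while an already aligned address is returned unchanged.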

v1: mm, pcp: add high order page info in /proc/zoneinfo

With /proc/zoneinfo we can easily get the number of pages used by each CPU, but we cannot get more detailed information about the distribution of those pages, such as the count of high-order pages. With these patches, we can see the usage of each page order in detail, which helps in analyzing the pcp memory usage of applications on the related CPUs.

v6: Reduce TLB flushes by 94% by improving folio migration

I’m sorry for the spam-like consecutive posting. I introduced build errors in v5 when CONFIG_MIGRATION is disabled or CONFIG_HWPOISON_INJECT is built as a module. I’m reposting the fixed version.

v1: DAMON based 2-tier memory management for CXL memory

There was an RFC IDEA “DAMOS-based Tiered-Memory Management” previously posted at [1].
It says that no implementation of the demote/promote DAMOS actions has been made. This RFC is about their implementation for the physical address space.

File Systems

v3: Try exact-match comparison ahead of case-insensitive match

Linus, Al, Eric,
This small series implements the exact-match comparison ahead of the case-insensitive comparison, as suggested by Linus. The first patch only exposes dentry_string_cmp in a header file so we can use it instead of memcmp, and the second actually does the optimization in the case-insensitive comparison code.

v3: Set casefold/fscrypt dentry operations through sb->s_d_op

The only difference between v3 and v2 is a fix for an issue reported by the kernel test robot in patch 4. Please consider this version instead.

v2: mm/mempolicy: weighted interleave mempolicy and sysfs extension

Can you please replace the patches on mm-unstable with this line, it has bulk-allocator bug fixes and some design changes at the request of Ying Huang. Full v2 notes are just before the test info.

v2: ovl: require xwhiteout feature flag on layer roots

Add a check on each lower layer for the xwhiteout feature. This prevents unnecessarily checking the overlay.whiteouts xattr when reading a directory if this feature is not enabled, i.e. most of the time.

v3: tracing: Support to dump instance traces by ftrace_dump_on_oops

Currently ftrace only dumps the global trace buffer on an OOPs. For debugging a production usecase, instance trace will be helpful to check specific problems since global trace buffer may be used for other purposes.

GIT PULL: BPF token for v6.8

This is BPF token patches freshly rebased onto latest bpf/master with feedback received on last revision addressed and changes applied to appropriate patches. Plus a few more selftests are added around LSM and BPF token interactions.

GIT PULL: xfs: More code changes for 6.8

Please pull this branch containing a bug fix for xfs for 6.8-rc1.
The following changes since commit bcdfae6ee520b665385020fa3e47633a8af84f12:
xfs: use the op name in trace_xlog_intent_recovery_failed (2023-12-29 13:37:05 +0530)
are available in the Git repository at:
https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git xfs-6.8-merge-4

v1: pagecache_isize_extended

I hadn’t looked at pagecache_isize_extended() before, and I’m not entirely sure it’s doing the right thing for large folios. Usually we decline to create folios which extend past EOF [1], and we try to split folios on truncate. But folio splitting can fail, and I think we might run into problems with a store to a folio which straddles i_size.

v3: fsnotify: optimize the case of no parent watcher

If parent inode is not watching, check for the event in masks of sb/mount/inode masks early to optimize out most of the code in __fsnotify_parent() and avoid calling fsnotify().

v1: buffer: Use KMEM_CACHE instead of kmem_cache_create()

Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches.

v3: eventfd: move ‘eventfd-count’ printing out of spinlock

When printing eventfd->count, interrupts will be disabled and a spinlock will be obtained, competing with eventfd_write(). By moving the “eventfd-count” print out of the spinlock and merging multiple seq_printf() into one, it could improve a bit, just like timerfd_show().
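The general pattern described above, copying the value under the lock and formatting it afterwards, can be sketched in user space as follows. The struct, the spinlock built on C11 atomic_flag, and the output format are all hypothetical illustrations, not the kernel's eventfd code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical counter guarded by a tiny spinlock. */
struct counter {
	atomic_flag lock;
	unsigned long long count;
};

static void counter_lock(struct counter *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void counter_unlock(struct counter *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/*
 * Hold the lock only long enough to copy the value, then do the slower
 * formatting with the lock released. Returns snprintf()'s result.
 */
static int counter_show(struct counter *c, char *buf, size_t len)
{
	unsigned long long snapshot;

	assert(buf != NULL && len > 0);

	counter_lock(c);
	snapshot = c->count;
	counter_unlock(c);

	return snprintf(buf, len, "count: %llu\n", snapshot);
}
```

The printed value may be slightly stale by the time it is formatted, which is usually acceptable for a diagnostic view and keeps writers from waiting on the formatting work.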

v9: security: Move IMA and EVM to the LSM infrastructure

IMA and EVM are not effectively LSMs, especially due to the fact that in the past they could not provide a security blob while there is another LSM active.

v1: proc: proc_sysctl: Optimize insert_links()

Optimize the err variable assignment location so that the err variable is manually modified when an error occurs.

v1: proc: proc_sysctl: Optimize proc_sys_fill_cache() variable

The ino and type variables are assigned values before use, and they do not need to be assigned values when defined.

v1: blk: optimization for classic polling

This removes the dependency on interrupts to wake up task. Set task state as TASK_RUNNING, if need_resched() returns true, while polling for IO completion. Earlier, polling task used to sleep, relying on interrupt to wake it up. This made some IO take very long when interrupt-coalescing is enabled in NVMe.

Networking Devices

v1: net: dsa: mv88e6xxx: Make unsupported C45 reads return 0xffff

When there is no device on the bus for a given address, the pull up resistor on the data line results in the read returning 0xffff. The phylib core code understands this when scanning for devices on the bus, and a number of MDIO bus masters make use of this as a way to indicate they cannot perform the read.

v6: convert write_threads, write_version and write_ports to netlink commands

Introduce write_threads, write_version and write_ports netlink commands similar to the ones available through the procfs.

v2: net-next: tcp: add support for read with offset when using MSG_PEEK

When reading received messages from a socket with MSG_PEEK, we may want to read the contents with an offset, like we can do with pread/preadv() when reading files. Currently, it is not possible to do that.
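For context, the sketch below shows one way to emulate a peek at an offset using only plain MSG_PEEK, which is what an application would have to do without the feature described above. It is an assumption-laden illustration (the helper name peek_at_offset is made up), not the interface the patch adds.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Peek off + len bytes from the head of the receive queue, then copy out
 * only the part past `off`. The leading bytes are copied for nothing,
 * which is the cost an offset-aware peek would avoid.
 * Returns the number of bytes stored in buf, 0 if nothing is queued past
 * the offset, or -1 on error.
 */
static ssize_t peek_at_offset(int fd, size_t off, void *buf, size_t len)
{
	char *tmp;
	ssize_t n;

	assert(buf != NULL && len > 0);

	tmp = malloc(off + len);
	if (!tmp)
		return -1;

	n = recv(fd, tmp, off + len, MSG_PEEK);
	if (n < 0 || (size_t)n <= off) {
		free(tmp);
		return n < 0 ? -1 : 0;
	}

	n -= (ssize_t)off;
	memcpy(buf, tmp + off, (size_t)n);
	free(tmp);
	return n;
}
```

Because MSG_PEEK does not consume data, a later recv() without the flag still returns the stream from its beginning.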

v1: iproute2: vxlan: add support for flowlab inherit

By default, VXLAN encapsulation over IPv6 sets the flow label to 0, with an option for a fixed value. This commit adds the ability to inherit the flow label from the inner packet, like for other tunnel implementations. This enables devices using only L3 headers for ECMP to correctly balance VXLAN-encapsulated IPv6 packets.

v1: nfc: hci: Save a few bytes of memory when registering a ‘nfc_llc’ engine

nfc_llc_register() calls pass a string literal as the ‘name’ parameter.
So kstrdup_const() can be used instead of kstrdup() to avoid a memory allocation in such cases.

v2: iwl: i40e: print correct hw max rss count in kernel ring buffer

pf->rss_size_max is hardcoded and always prints max rss count as 64.
E.g.: kernel: i40e 0000:af:00.1: User requested queue count/HW max RSS count: 104/64
whereas ethtool reports the correct value from “vsi->num_queue_pairs”

v1: net: selftest: Don’t reuse port for SO_INCOMING_CPU test.

Jakub reported that ASSERT_EQ(cpu, i) in so_incoming_cpu.c seems to fire somewhat randomly.

v1: RFC: Allow busy poll to be set per epoll instance

Greetings:
TL;DR This RFC builds on bf3b9f6372c4 (“epoll: Add busy poll support to epoll with socket fds.”) by adding two fcntl knobs for enabling epoll-based busy poll on a per epoll basis instead of the current system-wide sysctl. This change makes epoll-based busy poll much more usable.

v1: iwl-net: ice: Add check for lport extraction to LAG init

To fully support initializing the LAG support code, a DDP package that extracts the logical port from the metadata is required. If such a package is not present, there could be difficulties in supporting some bond types.

v4: tcp: Add memory barrier to tcp_push()

On CPUs with weak memory models, reads and updates performed by tcp_push to the sk variables can get reordered leaving the socket throttled when it should not. The tasklet running tcp_wfree() may also not observe the memory updates in time and will skip flushing any packets throttled by tcp_push(), delaying the sending. This can pathologically cause 40ms extra latency due to bad interactions with delayed acks.

v1: iproute2: tc: unify clockid handling

There are three places in tc which all have same code for handling clockid (copy/paste). Move it into tc_util.c.

v1: i40e: print correct hw max rss count in kernel ring buffer

The value printed for “HW max RSS count” is wrong in kernel dmesg for i40e NICs:
… i40e 0000:63:00.0: User requested queue count/HW max RSS count: 48/64
whereas ethtool reports the correct value from “vsi->num_queue_pairs”

v6: ipsec-next: xfrm: introduce forwarding of ICMP Error messages

This commit aligns with RFC 4301, Section 6, and addresses the requirement to forward unauthenticated ICMP error messages that do not match any xfrm policies. It utilizes the ICMP payload as an skb and performs a reverse lookup. If a policy match is found, forward the packet.

v1: net-next: nl80211/cfg80211: add nla_policy for S1G band

Our detector has identified another case of an incomplete policy. Specifically, the commit df78a0c0b67d (“nl80211: S1G band and channel definitions”) introduced the NL80211_BAND_S1GHZ attribute to nl80211_band, but it neglected to update the nl80211_match_band_rssi_policy accordingly.

v1: net-next: neighbour: complement nl_ntbl_parm_policy

In the neightbl_set function, the attributes array is parsed and validated using the nl_ntbl_parm_policy policy. However, this policy overlooks the NDTPA_QUEUE_LENBYTES attribute since the commit 6b3f8674bccb (“[NEIGH]: Convert neighbour table modification to new netlink api”). As a result, no validation is performed when accessing the NDTPA_QUEUE_LENBYTES attribute.

GIT PULL: BPF token for v6.8

This is BPF token patches freshly rebased onto latest bpf/master with feedback received on last revision addressed and changes applied to appropriate patches. Plus a few more selftests are added around LSM and BPF token interactions.

GIT PULL: Networking for v6.8-rc1

The following changes since commit 3e7aeb78ab01c2c2f0e1f784e5ddec88fcd3d106:
Merge tag ‘net-next-6.8’ of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next (2024-01-11 10:07:29 -0800)

v1: net: idpf: distinguish vports by the dev_port attribute

idpf registers multiple netdevs (virtual ports) for one PCI function, but it does not provide a way for userspace to distinguish them with sysfs attributes. Per Documentation/ABI/testing/sysfs-class-net, it is a bug not to set dev_port for independent ports on the same PCI bus, device and function.

v1: net: llc: make llc_ui_sendmsg() more robust against bonding changes

syzbot was able to trick llc_ui_sendmsg(), allocating an skb with no headroom, but subsequently trying to push 14 bytes of Ethernet header [1]

v2: net: vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING

In the vlan_changelink function, a loop is used to parse the nested attributes IFLA_VLAN_EGRESS_QOS and IFLA_VLAN_INGRESS_QOS in order to obtain the struct ifla_vlan_qos_mapping. These two nested attributes are checked in the vlan_validate_qos_map function, which calls nla_validate_nested_deprecated with the vlan_map_policy.

v2: net-next: Implement irq_domain for TXGBE

Implement irq_domain for the MAC interrupt and handle the sub-irqs.

v6: iproute2: ss: pretty-printing BPF socket-local storage

BPF allows programs to store socket-specific data using BPF_MAP_TYPE_SK_STORAGE maps. The data is attached to the socket itself, and Martin added INET_DIAG_REQ_SK_BPF_STORAGES, so it can be fetched using the INET_DIAG mechanism.

v2: Introduce switch mode support for ICSSG driver

This series adds support for switch-mode for ICSSG driver. This series also introduces helper APIs to configure firmware maintained FDB (Forwarding Database) and VLAN tables. These APIs are later used by ICSSG driver in switch mode.

v1: NCSI: Add propety: no-channel-monitor and start-redo-probe

Add the start-redo-probe property to redo the probe, because a Mellanox cx7 NIC cannot get its MAC address after the NIC is hot-plugged. Set the start-redo-probe property so that the NIC can get its MAC address again. Also set the no-channel-monitor property so that the log does not keep popping up when the NIC is hot-plugged.

v1: net: mvpp2: Add EEE get/set to mvpp2 driver

Fill in the missing .get_eee and .set_eee functions for the mvpp2 driver.

v4: net: tcp: make sure init the accept_queue’s spinlocks once

When I run syz’s reproduction C program locally, it causes the following issue: pvqspinlock: lock 0xffff9d181cd5c660 has corrupted value 0x0! WARNING: CPU: 19 PID: 21160 at __pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508) Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011

v1: net: selftests: bonding: Increase timeout to 1200s

I ran the test in slightly different VMs (including one without HW virtualization support) and got runtimes of 13m39.760s, 13m31.238s, and 13m2.956s. Use a 1.5x “safety factor” and set the timeout to 1200s.

v7: net_sched: Introduce eBPF based Qdisc

I am continuing the work on the eBPF-based Qdisc based on Cong’s previous RFC. The following are some use cases of eBPF Qdisc

v1: SUNRPC: use request size to initialize bio_vec in svc_udp_sendto()

Use the proper size when setting up the bio_vec, as otherwise only zero-length UDP packets will be sent.

v1: net: i40e: Include types.h to some headers

Commit 56df345917c0 (“i40e: Remove circular header dependencies and fix headers”) redistributed a number of includes from one large header file to the locations they were needed. In some environments, types.h is not included, causing compile issues. The driver should not rely on implicit inclusion from other locations; explicitly include it in these files.

v2: Add support for ICSSG-based Ethernet on SR1.0 devices

This series extends the current ICSSG-based Ethernet driver to support Silicon Revision 1.0 devices.
Notable differences between the Silicon Revisions are that there is no TX core in SR1.0 with this being handled by the firmware, requiring extra DMA channels to communicate commands to the firmware (with the firmware being different as well) and in the packet classifier.

v1: PCI: introduce the concept of power sequencing of PCIe devices

The responses to the RFC were rather positive so here’s a proper series.
During last year’s Linux Plumbers we had several discussions centered around the need to power-on PCI devices before they can be detected on the bus.
The consensus during the conference was that we need to introduce a class of “PCI slot drivers” that would handle the power-sequencing.

v1: net-next: vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING

In the vlan_changelink function, a loop is used to parse the nested attributes IFLA_VLAN_EGRESS_QOS and IFLA_VLAN_INGRESS_QOS in order to obtain the struct ifla_vlan_qos_mapping. These two nested attributes are checked in the vlan_validate_qos_map function, which calls nla_validate_nested_deprecated with the vlan_map_policy.

Security Hardening

v1: accel/habanalabs: use kcalloc() instead of kzalloc()

As noted in the “Deprecated Interfaces, Language Features, Attributes, and Conventions” documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors.

v1: MIPS: Alchemy: Use kcalloc() instead of kzalloc()

As noted in the “Deprecated Interfaces, Language Features, Attributes, and Conventions” documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors.

v1: Documentation: power: Use kcalloc() instead of kzalloc()

As noted in the “Deprecated Interfaces, Language Features, Attributes, and Conventions” documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors.

v1: wifi: iwlegacy: Use kcalloc() instead of kzalloc()

As noted in the “Deprecated Interfaces, Language Features, Attributes, and Conventions” documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors.

v1: pstore/ram_core: Improve exception handling in persistent_ram_new()

Date: Thu, 18 Jan 2024 14:57:21 +0100
Omit an initialisation (for the variable “ret”) which became unnecessary with this refactoring because a memory allocation failure will be directly indicated by a corresponding return statement in an if branch.

v1: pstore/zone: Add a null pointer check to the psz_kmsg_read

kasprintf() returns a pointer to dynamically allocated memory which can be NULL upon failure. Ensure the allocation was successful by checking the pointer validity.

v4: Add device tree for IBM system1 BMC

This patchset adds device tree for IBM system1 bmc board.

v1: perf/x86/amd/uncore: Use kcalloc() instead of kzalloc()

As noted in the “Deprecated Interfaces, Language Features, Attributes, and Conventions” documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors.

v2: eventfs: Use kcalloc() instead of kzalloc()

As noted in the “Deprecated Interfaces, Language Features, Attributes, and Conventions” documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors.

v2: scsi: csiostor: Use kcalloc() instead of kzalloc()

Use 2-factor multiplication argument form kcalloc() instead of kzalloc().
Also, it is preferred to use sizeof(*pointer) instead of sizeof(type), because the type of the variable can change and the former then needs no change (unlike the latter).
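The sizeof(*pointer) idiom mentioned above can be sketched in user space with calloc(), which also takes the count and element size as two factors. The struct and function names here are hypothetical and unrelated to the csiostor driver.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type; its fields may change over time. */
struct entry {
	unsigned long id;
	unsigned int flags;
};

/*
 * Two-factor form with sizeof(*entries): if the pointed-to type is later
 * changed, the size expression follows automatically, whereas a spelled-out
 * sizeof(struct entry) would have to be edited by hand.
 */
static struct entry *alloc_entries(size_t count)
{
	struct entry *entries;

	assert(count > 0);
	entries = calloc(count, sizeof(*entries));
	return entries;
}
```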

Asynchronous IO

v2: iouring:added boundary value check for io_uring_group systl

/proc/sys/kernel/io_uring_group takes a gid as input. This patch adds a boundary value check so that only gids in the range 0 <= gid <= 4294967294 are accepted, and updates the documentation accordingly.
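A range check of this kind might look like the user-space sketch below. It is not the sysctl handler from the patch: the function and macro names are hypothetical, and the only fact taken from the text is the accepted range. With a 32-bit gid_t, the first rejected value, 4294967295, is (gid_t)-1.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Largest accepted gid per the range quoted above (hypothetical macro name). */
#define GID_MAX_VALID 4294967294ULL

/*
 * Parse a decimal gid and accept it only if 0 <= gid <= GID_MAX_VALID.
 * Returns 0 and stores the value on success, -1 on malformed or
 * out-of-range input.
 */
static int parse_gid(const char *s, unsigned long long *out)
{
	char *end;
	unsigned long long v;

	assert(s != NULL && out != NULL);

	if (*s < '0' || *s > '9')	/* rejects "", signs and leading spaces */
		return -1;

	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno != 0 || *end != '\0' || v > GID_MAX_VALID)
		return -1;

	*out = v;
	return 0;
}
```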

v7: io_uring: Statistics of the true utilization of sq threads.

Count the running time and actual IO processing time of the sqpoll thread, and output the statistical data to fdinfo.
Variable description: “work_time” in the code represents the sum of the jiffies of the sq thread actually processing IO, that is, how many milliseconds it actually takes to process IO. “total_time” represents the total time that the sq thread has elapsed from the beginning of the loop to the current time point, that is, how many milliseconds it has spent in total.
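Given the two counters described above, a reader of fdinfo could derive a utilization figure as in the following sketch. The function name is hypothetical and the percentage is just one plausible way to present the ratio; the patch itself only exposes the raw values.

```c
#include <assert.h>

/*
 * Share of the SQ thread's elapsed time spent processing IO, in percent.
 * Both arguments must be in the same unit. Returns 0 when no time has
 * elapsed. Very large work_time values could overflow the multiplication;
 * this sketch does not guard against that.
 */
static unsigned int sq_utilization_pct(unsigned long long work_time,
				       unsigned long long total_time)
{
	assert(work_time <= total_time);

	if (total_time == 0)
		return 0;
	return (unsigned int)(work_time * 100 / total_time);
}
```

For instance, a work_time of 250 against a total_time of 1000 corresponds to 25 percent utilization.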

v1: io_uring/register: guard compat syscall with CONFIG_COMPAT

Add compat.h include to avoid a potential build issue:
io_uring/register.c:281:6: error: call to undeclared function ‘in_compat_syscall’; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]

Rust For Linux

v1: rust: kernel: documentation improvements

This patch set aims to make small improvements to the documentation of the kernel crate. It engages in a few different activities:
fixing trivial typos (commit #1)
updating code examples to better reflect an idiomatic coding style (commits #2,6)
increasing the consistency within the crate’s documentation as a whole (commits #3,5,7,8,9,12,13)
adding more intra-doc links as well as srctree-relative links to C header files (commits #4,10,11)

v1: rust: task: use safe current! macro

Refactor the Task::pid_in_current_ns() to use the safe abstraction current!() instead of the unsafe bindings::get_current() binding.

v1: rust: task: add as_raw() to Task

Added new function Task::as_raw() which returns the raw pointer for the underlying task struct. I also refactored Task to instead use the newly created function instead of self.0.get() as I feel like self.as_raw() is more intuitive.

BPF

v17: bpf-next: Registrating struct_ops types from modules

Given the constraints of the current implementation, struct_ops cannot be registered dynamically. This presents a significant limitation for modules like the upcoming fuse-bpf, which seeks to implement a new struct_ops type. To address this issue, a new API is introduced that allows the registration of new struct_ops types from modules.

v1: bpf-next: bpftool: add support for split BTF to gen min_core_btf

Enables a user to generate minimized kernel module BTF.
If an eBPF program probes a function within a kernel module or uses types that come from a kernel module, split BTF is required. The split module BTF contains only the BTF types that are unique to the module. It will reference the base/vmlinux BTF types and always starts its type IDs at X+1 where X is the largest type ID in the base BTF.

v1: bpf-next: libbpf: call dup2() syscall directly

We’ve run into issues with using the dup2() API in a production setting, where libbpf is linked into a large production environment and ends up calling unintended custom implementations of dup2(). These custom implementations don’t provide the atomic FD replacement guarantees of the dup2() syscall, leading to subtle and hard to debug issues.
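One way to bypass a user-space dup2() wrapper is to go through syscall(2) directly, roughly as sketched below. This is an illustration under assumptions, not necessarily the exact code in libbpf; the name raw_dup2 is hypothetical. Architectures that have no dup2 syscall number fall back to dup3 with flags set to 0.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Invoke the kernel's FD duplication syscall directly, skipping any libc
 * wrapper or interposed dup2() symbol. dup3 fails with EINVAL when
 * oldfd == newfd, unlike dup2, so callers must pass distinct descriptors
 * for both paths to behave the same.
 */
static int raw_dup2(int oldfd, int newfd)
{
	assert(oldfd != newfd);

#ifdef SYS_dup2
	return (int)syscall(SYS_dup2, oldfd, newfd);
#else
	return (int)syscall(SYS_dup3, oldfd, newfd, 0);
#endif
}
```

As with dup2(), the return value is newfd on success and -1 with errno set on failure.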

v1: bpf-next: Enable the inline of kptr_xchg for arm64

The patch set is just a follow-up for “bpf: inline bpf_kptr_xchg()”. It enables the inline of bpf_kptr_xchg() and kptr_xchg_inline test for arm64.

GIT PULL: BPF token for v6.8

These are the BPF token patches, freshly rebased onto the latest bpf/master, with feedback on the last revision addressed and changes applied to the appropriate patches. A few more selftests covering LSM and BPF token interactions are also added.

GIT PULL: Networking for v6.8-rc1

The following changes since commit 3e7aeb78ab01c2c2f0e1f784e5ddec88fcd3d106:
Merge tag ‘net-next-6.8’ of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next (2024-01-11 10:07:29 -0800)

v1: bpf-next: bpf: Define struct bpf_tcp_req_attrs when CONFIG_SYN_COOKIES=n.

The kernel test robot reported the warning below:
net/core/filter.c:11842:13: warning: declaration of ‘struct bpf_tcp_req_attrs’ will not be visible outside of this function [-Wvisibility]
Let’s move the struct bpf_tcp_req_attrs definition outside of the CONFIG_SYN_COOKIES guard.
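
The warning arises because a struct tag that first appears inside a function's parameter list is scoped to that declaration only. A rough sketch of the fix pattern follows, using hypothetical names (DEMO_SYN_COOKIES, demo_req_attrs, demo_set_attrs) and illustrative fields rather than the kernel's actual definitions: the type is defined unconditionally, and only the function body depends on the option.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_SYN_COOKIES; left undefined to model =n. */
/* #define DEMO_SYN_COOKIES 1 */

/* Defined outside the guard so both variants below see the same type.
 * The fields are illustrative, not the kernel's real layout. */
struct demo_req_attrs {
	unsigned short mss;
	unsigned char wscale_ok;
};

#ifdef DEMO_SYN_COOKIES
static int demo_set_attrs(void *skb, const struct demo_req_attrs *attrs)
{
	return (skb && attrs) ? 0 : -EINVAL;
}
#else
/* Stub for the disabled configuration; the parameter type is still
 * visible here because its definition no longer lives inside the guard. */
static int demo_set_attrs(void *skb, const struct demo_req_attrs *attrs)
{
	(void)skb;
	(void)attrs;
	return -EOPNOTSUPP;
}
#endif
```

Had struct demo_req_attrs been defined only under the #ifdef, the stub's prototype would introduce a new, incomplete type local to that declaration, which is what -Wvisibility flags.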

v1: bpf-next: bpf: Add cookies retrieval for perf/kprobe multi links

This patchset adds support for retrieving cookies from existing tracing links that did not yet support it, plus changes to bpftool to display them. It is a leftover from a discussion some time ago [1].

v3: bpf: Tighten up arg:ctx type enforcement

We also adjust selftests to do similar feature detection (much simpler, but potentially breaking due to kernel source code refactoring, which is fine for selftests), and skip tests expecting libbpf’s BTF type rewrites.

v16: bpf-next: Registrating struct_ops types from modules

Given the constraints of the current implementation, struct_ops cannot be registered dynamically. This is a significant limitation for modules such as the upcoming fuse-bpf, which seeks to implement a new struct_ops type. To address this issue, a new API is introduced that allows new struct_ops types to be registered from modules.

v1: bpf: Refactor ptr alu checking rules to allow alu explicitly

The current checking rules are structured to disallow alu on particular ptr types explicitly, so default cases are allowed implicitly. This may lead to newly added ptr types being allowed unexpectedly. So restructure the rules to allow alu explicitly. The tradeoff is mainly a few more cases added in the switch. The following table from Eduard summarizes the rules
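
The difference between the two structures can be sketched in plain C. The enum and function names below are hypothetical and do not mirror the verifier's real register types or the actual rules in the patch; the point is only how the default case behaves when a new type is added.

```c
#include <stdbool.h>

/* Hypothetical pointer kinds; not the verifier's real enum. */
enum demo_ptr_type {
	DEMO_PTR_TO_STACK,
	DEMO_PTR_TO_MAP_VALUE,
	DEMO_PTR_TO_PACKET,
	DEMO_PTR_TO_SOCKET,
	DEMO_PTR_TO_NEW_KIND, /* a type added after the rules were written */
};

/* Denylist style: anything not listed is allowed implicitly. */
static bool alu_allowed_denylist(enum demo_ptr_type t)
{
	switch (t) {
	case DEMO_PTR_TO_SOCKET:
		return false;
	default:
		return true;
	}
}

/* Allowlist style: anything not listed is rejected. */
static bool alu_allowed_allowlist(enum demo_ptr_type t)
{
	switch (t) {
	case DEMO_PTR_TO_STACK:
	case DEMO_PTR_TO_MAP_VALUE:
	case DEMO_PTR_TO_PACKET:
		return true;
	default:
		return false;
	}
}
```

In this sketch DEMO_PTR_TO_NEW_KIND passes the denylist check without anyone having decided it should, while the allowlist rejects it until a case is added on purpose.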

v3: bpf-next: bpf: Add bpf_iter_cpumask

Three new kfuncs, namely bpf_iter_cpumask_{new,next,destroy}, have been added for the new bpf_iter_cpumask functionality. These kfuncs enable the iteration of percpu data, such as runqueues, system_group_pcpu, and more.

v1: net-next: virtio-net: support AF_XDP zero copy (3/3)

This is the third part of the virtio-net AF_XDP zero-copy support.
The whole patch set: http://lore.kernel.org/all/20231229073108.57778-1-xuanzhuo@linux.alibaba.com

Related Technology Updates

Qemu

v2: target/riscv: Add support for Zaamo & Zalrsc

Introduce support for the proposed new (fast-track) Zaamo and Zalrsc extensions [1] which represent the AMO and LR/SC subsets of the A extension.
The motivation for the subsets being available separately is that certain classes of CPUs may choose to only implement a subset for architectural convenience.

v2: RISC-V: ACPI: Enable SPCR

This series focuses on enabling the Serial Port Console Redirection (SPCR) table for the RISC-V virt platform. Considering that ARM utilizes the same function, the initial patch involves migrating the build_spcr function to common code. This consolidation ensures that RISC-V avoids duplicating the function.

U-Boot

v1: riscv: Support building with Clang

This is a minimal patchset for making U-Boot build with Clang on RISC-V, something I stumbled upon while writing U-Boot build scripts for SerenityOS’s RISC-V port. The only change is a (for unclear reasons…)

Pull request efi-2024-04-rc1-2

The following changes since commit 043ca8c8a9b181cf6f17441e9b89b5ee33206309:
Merge tag ‘qcom-2024.04-rc1’ of https://gitlab.denx.de/u-boot/custodians/u-boot-snapdragon (2024-01-16
are available in the Git repository at:
https://source.denx.de/u-boot/custodians/u-boot-efi.git tags/efi-2024-04-rc1-2
for you to fetch changes up to 21c856797e2735fbd4e8b900803e6c42eae8d434

v2: riscv: sophgo: milkv_duo: add support for Milk-V Duo board

The Milk-V Duo board is built upon Sophgo’s CV1800B SoC, featuring two XuanTie C906 CPUs running at 1.0GHz and 700MHz, respectively.
