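RISC-V Linux Kernel and Related Technology News, Issue 92
Date: 20240519
Editor: 晓瑜
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS
Kernel News
RISC-V Architecture Support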
v2: bpf-next: Use bpf_prog_pack for RV64 bpf trampoline
We used bpf_prog_pack to aggregate bpf programs into huge page to relieve the iTLB pressure on the system.
v2: dt-bindings: interrupt-controller: riscv,cpu-intc: convert to dtschema
Convert the RISC-V Hart-Level Interrupt Controller (HLIC) to the newer DT schema. The DT schema was created based on the .txt file, which had `compatible`, `#interrupt-cells` and `interrupt-controller` as required properties.
v4: bpf-next: Add 12-argument support for RV64 bpf trampoline
This patch adds 12 function arguments support for riscv64 bpf trampoline.
v5: Add support for a few Zc* extensions, Zcmop and Zimop
Add support for (yet again) more RVA23U64 missing extensions. Add support for Zimop, Zcmop, Zca, Zcf, Zcd and Zcb extensions ISA string parsing, hwprobe and kvm support.
This adds I2C support in the device tree of the T-Head TH1520 RISCV-SoC and a default configuration for the BeagleV-Ahead.
v1: Add the core reset for UARTs of StarFive JH7110
The UART of StarFive JH7110 needs two reset signals (apb, core) to initialize. This patch series adds the missing core reset.
v3: riscv, bpf: Optimize zextw insn with Zba extension
The Zba extension provides add.uw insn which can be used to implement zext.w with rs2 set as ZERO.
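To make the equivalence concrete, here is a minimal C sketch that models the two instructions in software. The helper names `add_uw()`, `zext_w()` and `zext_w_shifts()` are illustrative only and do not come from the patch. The model assumes the Zba definition of add.uw: zero-extend the low 32 bits of rs1, then add rs2.

```c
#include <stdint.h>

/* Software model of Zba add.uw: rd = rs2 + zero_extend(rs1[31:0]). */
uint64_t add_uw(uint64_t rs1, uint64_t rs2)
{
    return rs2 + (uint64_t)(uint32_t)rs1;
}

/* zext.w rd, rs is add.uw rd, rs, zero: rs2 is the hard-wired zero register. */
uint64_t zext_w(uint64_t rs)
{
    return add_uw(rs, 0);
}

/* Without Zba, the same result takes a shift pair: slli by 32, then srli by 32. */
uint64_t zext_w_shifts(uint64_t rs)
{
    return (rs << 32) >> 32;
}
```

Both `zext_w()` and `zext_w_shifts()` clear the upper 32 bits of their argument; the Zba form does so with one instruction instead of two.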
v1: riscv: Allow vlenb to be probed from DT
Adding vlenb to the DT will allow the kernel to detect the inconsistency early and not waste time trying to boot harts that it doesn’t support.
v1: riscv: Separate vendor extensions from standard extensions
All extensions, both standard and vendor, live in one struct “riscv_isa_ext”. This series separates them, which allows each vendor to be conditionally enabled through Kconfig.
v1: RISC-V: separate Zbb optimisations requiring and not requiring toolchain support
Zbb support has always depended on alternatives, so while adjusting the config options guarding optimisations, remove any checks for whether or not alternatives are enabled.
[PATCH v4 0/2 RESEND] Add StarFive’s StarLink Cache Controller
StarFive’s StarLink Cache Controller flushes/invalidates the cache using non-conventional RISC-V Zicbom extension instructions. This driver provides the cache handling on StarFive RISC-V SoC.
v5: Linux RISC-V IOMMU Support
This patch series introduces support for RISC-V IOMMU architected hardware into the Linux kernel.
v2: riscv: Memory Hot(Un)Plug support
v1: riscv: Extend sv39 linear mapping max size to 128G
This harmonizes all virtual addressing modes which can now all map (PGDIR_SIZE * PTRS_PER_PGD) / 4 of physical memory.
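The formula can be checked with a short calculation. The sketch below is not taken from the patch: the function name is hypothetical, and the top-level shifts of 30, 39 and 48 bits for sv39, sv48 and sv57 are assumptions based on the usual 4 KiB page layout with 512 entries per table.

```c
#include <assert.h>
#include <stdint.h>

#define PTRS_PER_PGD 512ULL  /* 9 index bits per level with 4 KiB pages */

/*
 * PGDIR_SIZE is the span covered by one top-level entry (1 << pgdir_shift).
 * The linear mapping may cover a quarter of the whole top-level table.
 */
uint64_t linear_map_max_bytes(unsigned int pgdir_shift)
{
    uint64_t pgdir_size;

    assert(pgdir_shift < 55);  /* keep pgdir_size * 512 within 64 bits */
    pgdir_size = 1ULL << pgdir_shift;

    return (pgdir_size * PTRS_PER_PGD) / 4;
}
```

With a shift of 30 (sv39) the result is 2^37 bytes, that is 128 GiB, which matches the figure in the patch title. Under the same assumptions, sv48 gives 64 TiB and sv57 gives 32 PiB.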
v2: Add support for GPIO based CS
The Microchip PolarFire SoC SPI “hard” controller supports eight chip selects. However, only one chip select is physically wired. Therefore, use GPIO descriptors to configure additional chip select lines.
Process Scheduling
v1: sched: Adjust affinity according to change of housekeeping cpumask
The housekeeping CPU masks, set up by the “isolcpus” and “nohz_full” boot command line options, are used at boot time to exclude selected CPUs from running some kernel housekeeping facilities to minimize disturbance to latency sensitive userspace applications such as DPDK.
Memory Management
v1: mm: batch unlink_file_vma calls in free_pgd_range
Execs of dynamically linked binaries at 20-ish cores are bottlenecked on the i_mmap_rwsem semaphore, while the biggest singular contributor is free_pgd_range inducing the lock acquire back-to-back for all consecutive mappings of a given file.
v2: Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)
This is the continuation of the RFC v1 series “Reimplement huge pages without hugepd on powerpc 8xx”. It now gets rid of hugepd completely after also handling e500 and book3s/64.
v1: kasan, fortify: properly rename memintrinsics
After commit 69d4c0d32186 (“entry, kasan, x86: Disallow overriding mem*() functions”) and the follow-up fixes, with CONFIG_FORTIFY_SOURCE enabled, even though the compiler instruments memintrinsics by generating calls to __asan/__hwasan prefixed functions, FORTIFY_SOURCE still uses uninstrumented memset/memmove/memcpy as the underlying functions.
v3: mm/huge_memory: don’t unpoison huge_zero_folio
When I did memory failure tests recently, a panic occurred.
v1: mm/cma: get nid from physical address
The nid passed to cma_declare_contiguous_nid() may be NUMA_NO_NODE, which is not the actual nid. To get the correct nid, we can get the nid from physical address.
v1: percpu_counter: reimplement _add_batch with __this_cpu_cmpxchg
This replaces the expensive cli/sti pair with not-lock-prefixed cmpxchg.
v1: tmpfs: don’t interrupt fallocate with EINTR
A program that sets up a periodic timer with a 10ms interval can cause shmem_fallocate() to restart constantly. Changing the signal_pending() check in the shmem_fallocate() loop to fatal_signal_pending() solves this problem.
v1: mm: convert to folio_alloc_mpol()
v1: mm: refactor folio_undo_large_rmappable()
All folio_undo_large_rmappable() callers check folio_test_large(), which is already checked by folio_order(), so only add the folio_test_large_rmappable() check into the function to avoid repeated calls.
v1: [LSF/MM/BPF RFC] shmem/tmpfs: add large folios support
In preparation for the LSF/MM/BPF 2024 discussion, the patches below add support for large folios in shmem for the write and fallocate paths.
v2: mm/huge_memory: mark racy access on huge_anon_orders_always
huge_anon_orders_always is accessed locklessly, so it is better to use the READ_ONCE() wrapper. This does not fix any visible bug, but it should silence some KCSAN complaints in the future. The same is done for huge_anon_orders_madvise.
v1: KVM: SEV: Replace KVM_EXIT_VMGEXIT with KVM_EXIT_SNP_REQ_CERTS
So, rather than trying to anticipate future use-cases and have a single union structure to manage the associated parameters, just use a common KVM_EXIT_SNP_* prefix, but otherwise treat these as separate events, and go ahead and convert the only VMGEXIT type currently defined, KVM_USER_VMGEXIT_REQ_CERTS, over to KVM_EXIT_SNP_REQ_CERTS.
v1: introduce precised blk-throttle control
This patch series introduces a helper function that provides the byte budget and applies it to readahead.
v1: mm: vmscan: restore incremental cgroup iteration
Currently, reclaim always walks the entire cgroup tree in order to ensure fairness between groups. While overreclaim is limited in shrink_lruvec(), many of our systems have a sizable number of active groups, and an even bigger number of idle cgroups with cache left behind by previous jobs; the mere act of walking all these cgroups can impose significant latency on direct reclaimers. The shared iterator state is maintained inside the target cgroup, so fair and incremental walks are performed during both global reclaim and cgroup limit reclaim of complex subtrees.
v1: -next: memcg: don’t handle event_list for v2 when offlining
The event_list for memcg is only valid for v1 and not used for v2, so it’s unnecessary to handle event_list for v2.
v1: mm: add missing MODULE_DESCRIPTION() macros
This fixes the instances of “WARNING: modpost: missing MODULE_DESCRIPTION()” that I’m seeing in ‘mm’.
v1: memfd: `MFD_NOEXEC_SEAL` should not imply `MFD_ALLOW_SEALING`
`MFD_NOEXEC_SEAL` should remove the executable bits and set `F_SEAL_EXEC` to prevent further modifications to the executable bits as per the comment in the uapi header file: “not executable and sealed to prevent changing to executable”.
v5: RESEND: Reclaim lazyfree THP without splitting
This series adds support for reclaiming PMD-mapped THP marked as lazyfree without needing to first split the large folio via split_huge_pmd_address().
v2: add mTHP support for anonymous shmem
Anonymous pages already support multi-size THP (mTHP) allocation through commit 19eaf44954df, which allows THP to be configured through the sysfs interface located at ‘/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled’.
v5: Reclaim lazyfree THP without splitting
This series adds support for reclaiming PMD-mapped THP marked as lazyfree without needing to first split the large folio via split_huge_pmd_address().
v2: IDEA: mm/damon: introduce Access/Contiguity-aware Memory Auto-scaling (ACMA)
Extend DAMOS for access-aware gradual contiguous memory regions allocation, and implement a module for efficiently and automatically scaling system memory using the feature.
This is not a valid patchset but a summary of the idea and pseudo-code level partial implementation examples of the idea. The implementation examples are only for helping people’s understanding of the idea and how it would be implemented. The code is not tested at all. It is even not attempted to be compiled ever.
v4: DAMON based tiered memory management for CXL memory
It says no implementation of the demote/promote DAMOS actions has been made. This RFC is about their implementation for the physical address space.
v1: mm/rmap: optimize folio_move_anon_rmap()
These changes may improve the performance of VM faults in some scenarios, because the cost of WRITE_ONCE() is much higher than the cost of adding a conditional check.
v12: mm: report per-page metadata information
We want to describe the amount of memory that is going towards per-page metadata, which can vary depending on build configuration, machine architecture, and system use.
v1: linux-next: mm/huge_memory: mark racy access on huge_anon_orders_always
This does not fix any visible bug, but it should silence some KCSAN complaints in the future. The same is done for huge_anon_orders_madvise.
File Systems
v5: ext4: support adding multi-delalloc blocks
v2: iomap: avoid redundant fault_in_iov_iter_readable() judgement when use larger chunks
This computes the correct byte count before fault_in_iov_iter_readable() so that iomap works well in the non-large folio case.
v3: Rust PuzzleFS filesystem driver
This series is the third version of the proof-of-concept PuzzleFS filesystem driver, an open-source next-generation container filesystem, designed to address the limitations of the existing OCI format. It supports direct mounting of container filesystems without an extraction step, and it uses content defined chunking (CDC) in order to often achieve substantial disk space savings.
v1: fs: nls: add missing MODULE_DESCRIPTION() macros
Fix the following allmodconfig “make W=1” issues
GIT PULL: sysctl changes for v6.10-rc1
Summary
- Removed sentinel elements from ctl_table structs in kernel/*
v1: Introduce user namespace capabilities
It’s that time of the year again where we debate security settings for user namespaces. This also serves as a good foundation and could always be extended if the need arises in the future.
v1: hostfs: convert hostfs to use the new mount API
Convert the hostfs filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem.
v1: zonefs: move super block reading from page to folio
Move reading of the on-disk superblock from page to kmalloc()ed memory.
This series introduces Rust abstractions that allow read-only file systems to be written in Rust.
v1: zonefs: enable support for large folios
Enable large folio support on zonefs.
v1: hostfs: convert hostfs to use the new mount api
Convert the hostfs filesystem to use the new mount API.
v1: genirq/proc: Speed up show_interrupts()
Since there are irq number allocation holes, we can jump over those holes in order to speed up show_interrupts(). In addition, the percpu kstat_irqs access logic can be refined.
v1: Expose raw access to GuC log over debugfs
We already provide the content of the GuC log in debugfs, but it is in a text format where each log dword is printed as a hexadecimal number, which does not scale well with large GuC log buffers.
v2: vfs: move dentry shrinking outside the inode lock in ‘rmdir()’
Yafang Shao reports that he has seen loads that generate billions of negative dentries in a directory, which then when the directory is removed causes excessive latencies for other users because the dentry shrinking is done under the directory inode lock.
There seems to be no actual reason for holding the inode lock any more by the time we get rid of the now uninteresting negative dentries, and it’s an effect of the calling convention.
v1: -next: fs: fsconfig: intercept for non-new mount API in advance for FSCONFIG_CMD_CREATE_EXCL
fsconfig with the FSCONFIG_CMD_CREATE_EXCL command requires the new mount API, so we should return -EOPNOTSUPP in advance to avoid extra work.
v1: blk: optimization for classic polling
This removes the dependency on interrupts to wake up task. Set task state as TASK_RUNNING, if need_resched() returns true, while polling for IO completion. Earlier, polling task used to sleep, relying on interrupt to wake it up. This made some IO take very long when interrupt-coalescing is enabled in NVMe.
Networking
v1: net: set struct net_device::name earlier
Make name copying much earlier for smoother debugging experience.
v1: net-next: Add CPSW Proxy Client driver
This series introduces the CPSW Proxy Client driver to interface with Ethernet Switch Firmware (EthFw) running on a remote core on TI’s K3 SoCs. Further details are in patch 01/28 which adds documentation for the driver and describes the intended use-case, design and execution of the driver.
v2: bpf-next: netfilter: Add the capability to offload flowtable in XDP layer
This series has been tested running the xdp_flowtable_offload eBPF program on an ixgbe 10Gbps NIC (eno2) in order to XDP_REDIRECT the TCP traffic to a veth pair (veth0-veth1) based on the content of the nf_flowtable as soon as the TCP connection is in the established state:
v2: Bluetooth: hci_core: Refactor hci_get_dev_list() function
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows.
v1: net: af_unix: Annotate data-races around sk->sk_hash.
syzkaller reported data-race of sk->sk_hash in unix_autobind() [0], and the same ones exist in unix_bind_bsd() and unix_bind_abstract(). There could be a chance that sk->sk_hash changes after the lockless read. The KCSAN splat is false-positive, but let’s use WRITE_ONCE() and READ_ONCE() to silence it.
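As a rough illustration of what such annotations look like, the sketch below uses simplified userspace stand-ins for READ_ONCE() and WRITE_ONCE(). The kernel macros are more elaborate, `__typeof__` is a GNU C extension, and `struct demo_sock` with its two helpers is hypothetical rather than taken from af_unix.

```c
/* Simplified userspace stand-ins; the kernel's macros are more elaborate. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical socket whose hash may be read without holding the lock. */
struct demo_sock {
    unsigned int sk_hash;
};

/* Writer side: in real code this would run under the table lock. */
void demo_set_hash(struct demo_sock *sk, unsigned int new_hash)
{
    WRITE_ONCE(sk->sk_hash, new_hash);
}

/* Lockless reader: one marked load that the compiler should not split or repeat. */
unsigned int demo_get_hash(struct demo_sock *sk)
{
    return READ_ONCE(sk->sk_hash);
}
```

The marked accesses also document for readers and for tools such as KCSAN that the concurrent access is intentional.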
v1: net: af_unix: Annotate data-race around unix_sk(sk)->addr.
Once unix_sk(sk)->addr is assigned under net->unx.table.locks, *(unix_sk(sk)->addr) and unix_sk(sk)->path are fully set up, and unix_sk(sk)->addr is never changed.
v2: net: mhi: set skb mac header before entering RX path
skb->mac_header must be set before passing the skb to the network stack, because skb->mac_len is calculated from skb->mac_header in __netif_receive_skb_core.
Some network stack components, like xfrm, are using skb->mac_len to check for an existing MAC header, which doesn’t exist in this case. This leads to memory corruption.
v3: tty: rfcomm: refactor rfcomm_get_dev_list() function
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows. This code was detected with the help of Coccinelle, and audited and modified manually.
The patch was based on kernel 6.6.8, the skb properties as mentioned in .
v1: net-next: netfilter: nft_fib: allow from forward/input without iif selector
This removes the restriction of needing iif selector in the forward/input hooks for fib lookups when requested result is oif/oifname.
Removing this restriction allows “loose” lookups from the forward hooks.
v2: wifi: mac80211: Avoid address calculations via out of bounds array indexing
req->n_channels must be set before req->channels[] can be used.
This patch fixes one of the issues encountered in [1].
v1: vsock/virtio: Add support for multi-devices
Vsock is a lightweight and widely used data exchange mechanism between host and guest. Kata Containers, a secure container runtime, leverages the capability to exchange control data between the shim and the kata-agent.
[iwl-net]v3: e1000e: move force SMBUS near the end of enable_ulp function
The commit 861e8086029e (“e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue”) introduces a regression on PCH_MTP_I219_LM18 (PCIID: 0x8086550A).
v1: dt-bindings: net: dp8386x: Add MIT license along with GPL-2.0
This allows for Linux kernel files to be used in other Operating System ecosystems such as Zephyr or FreeBSD.
While at this, update the TI copyright year to sync with current year to indicate license change.
v2: net-next: net: phy: mediatek: Introduce mtk-phy-lib and add 2.5Gphy support
Re-organize MTK ethernet phy drivers and integrate common manipulations into mtk-phy-lib. Also, add support for built-in 2.5Gphy on MT7988.
v1: net-next: icmp: Add icmp_timestamp_ignore_all to control ICMP_TIMESTAMP
The CVE-1999-0524 became a medium risk vulnerability in May of this year.
In some embedded systems, firewalls such as iptables may not be usable. For embedded systems where firewalls can’t be used and for devices that don’t require ICMP timestamps, provide the icmp_timestamp_ignore_all interface, which ignores all ICMP timestamp messages to circumvent the vulnerability.
v1: net-next: tcp: break the limitation of initial receive window
Since commit a337531b942b (“tcp: up initial rmem to 128KB and SYN rwin to around 64KB”) limited the initial receive window to 65535 in 2018, most CDN teams do not benefit from that change, because they cannot have a large window to receive a big packet at one time, especially over long RTTs.
v1: powerpc64/bpf: jit support for cpuv4 instructions
Add support for recently added cpuv4 instructions fixing test_bpf module failures. This is mostly based on 8ecf3c1dab1c6 (powerpc/bpf/32: Fix failing test_bpf tests, 2024-03-05)
v19: net-next: Add Realtek automotive PCIe driver
This series includes adding realtek automotive ethernet driver and adding rtase ethernet driver entry in MAINTAINERS file.
This Ethernet device driver is for the PCIe interface of the Realtek Automotive Ethernet Switch, applicable to RTL9054, RTL9068, RTL9072, RTL9075, RTL9068, RTL9071.
v2: net-next: net: xilinx_gmii2rgmii: Add clock support
Add input clock support to gmii_to_rgmii IP. Add “clocks” bindings for the input clock.
v2: can: j1939: Initialize unused data in j1939_send_one()
syzbot reported kernel-infoleak in raw_recvmsg() [1]. j1939_send_one() creates full frame including unused data, but it doesn’t initialize it. This causes the kernel-infoleak issue. Fix this by initializing unused data.
v2: net-next: net: ethernet: mtk_eth_soc: add missing check for rhashtable_init
Add check for the return value of rhashtable_init() and return the error if it fails in order to catch the error.
v1: vringh: add MODULE_DESCRIPTION()
Fix the allmodconfig ‘make W=1’ issue.
[PATCH stable 5.4 0/3] net: bcmgenet: protect contended accesses
Some registers may be modified by parallel execution contexts and require protections to prevent corruption.
[PATCH stable 5.4 0/2] net: bcmgenet: revisit MAC reset
This commit set provides such an alternative. This replacement implementation should be applied to the stable branches wherever commit 3a55402c9387 (“net: bcmgenet: use RGMII loopback for MAC reset”) has been applied.
v2: net: openvswitch: Set the skbuff pkt_type for proper pmtud support.
This issue is periodically encountered in complex setups, such as large openshift deployments, where multiple sets of tunnel traversal occurs.
v2: net: Always descend into dsa/ folder with CONFIG_NET_DSA enabled
Stephen reported that he was unable to get the dsa_loop driver to get probed, and the reason ended up being because he had CONFIG_FIXED_PHY=y in his kernel configuration.
v1: net: enic: Validate length of nl attributes in enic_set_vf_port
These attributes are validated (in the function do_setlink in rtnetlink.c) using the nla_policy ifla_port_policy. The policy defines IFLA_PORT_PROFILE as NLA_STRING, IFLA_PORT_INSTANCE_UUID as NLA_BINARY and IFLA_PORT_HOST_UUID as NLA_STRING.
v3: net: selftests: net: local_termination: annotate the expected failures
The bridge driver fares particularly badly […] mainly because it does not implement IFF_UNICAST_FLT.
v1: iwl-net: ice: implement AQ download pkg retry
ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due to FW issue. Fix this by retrying five times before moving to Safe Mode.
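The retry pattern described here can be sketched generically as below. None of this is the ice driver's code: the callback type, the function names and the fake command are hypothetical, and reading "retrying five times" as one initial attempt plus five retries is an assumption.

```c
#include <stdbool.h>

#define DOWNLOAD_MAX_RETRIES 5  /* retry count stated in the summary */

/* Hypothetical callback standing in for the firmware command; 0 means success. */
typedef int (*download_fn)(void *ctx);

/*
 * Issue the command once, then retry up to DOWNLOAD_MAX_RETRIES times.
 * Returns true on success; false tells the caller to fall back (e.g. Safe Mode).
 */
bool download_with_retry(download_fn fn, void *ctx)
{
    for (int attempt = 0; attempt <= DOWNLOAD_MAX_RETRIES; attempt++) {
        if (fn(ctx) == 0)
            return true;
    }
    return false;
}

/* Fake command for demonstration: fails until the counter in ctx reaches zero. */
int fake_download(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -1;
    }
    return 0;
}
```

With `fake_download()` set to fail five times, the sixth call succeeds and `download_with_retry()` returns true; with six failures it gives up and returns false.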
v1: net/sched: unregister root_lock_key in the error path of qdisc_alloc()
The following slab-use-after-free problem was reported by syzbot
v10: iwl-next: ice: Add get/set hw address for VFs using devlink commands
Changing the MAC address of the VFs is currently unsupported via devlink. Add the function handlers to set and get the HW address for the VFs.
v1: vhost: use pr_err for vq_err
Use pr_err to print out error message without enabling DEBUG. This could make people catch error easier.
v1: net: Always descend into dsa/ folder
Stephen reported that he was unable to get the dsa_loop driver to get probed, and the reason ended up being because he had CONFIG_FIXED_PHY=y in his kernel configuration.
GIT PULL: Enable IORING_CQE_F_SOCK_NONEMPTY for accept requests
This adds support for IORING_CQE_F_SOCK_NONEMPTY for io_uring accept requests. This is very similar to previous work that enabled the same hint for doing receives on sockets. By far the majority of the work here is refactoring to enable the networking side to pass back whether or not the socket had more pending requests after accepting the current one, the last patch just wires it up for io_uring.
v1: net: af_packet: do not call packet_read_pending() from tpacket_destruct_skb()
trafgen performance considerably sank on hosts with many cores after the blamed commit.
packet_read_pending() is very expensive, and calling it in the af_packet fast path defeats Daniel’s intent in commit b013840810c2 (“packet: use percpu mmap tx frame pending refcount”).
v1: vhost/vsock: always initialize seqpacket_allow
There are two issues around seqpacket_allow. To fix:
- initialize seqpacket_allow after allocation
- set it unconditionally in set_features
v2: rds: rdma: Add ability to force GFP_NOIO
This series enables RDS and the RDMA stack to be used as a block I/O device. This is to support a filesystem on top of a raw block device which uses RDS and the RDMA stack as the network transport layer.
v1: net: netfilter: nfnetlink_queue: acquire rcu_read_lock() in instance_destroy_rcu()
syzbot reported that nf_reinject() could be called without rcu_read_lock()
Security Hardening
v1: dma-buf/fence-array: Add flex array to struct dma_fence_array
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows.
v1: selftests: hid: Do not open-code TEST_HARNESS_MAIN
Avoid open-coding TEST_HARNESS_MAIN. (It might change, for example.)
v2: ntp: safeguard against time_constant overflow case
Using syzkaller with the recently reintroduced signed integer overflow sanitizer produces this UBSAN report
v1: hpfs: Annotate struct hpfs_dirent with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).
v8: arm64: qcom: add AIM300 AIoT board support
Add AIM300 AIoT support along with usb, ufs, regulators, serial, PCIe, and PMIC functions.
v1: net: prestera: Add flex arrays to some structs
Use the preferred way in the kernel of declaring flexible arrays. This code was detected with the help of Coccinelle, and audited and modified manually.
v1: Bluetooth: hci_core: Prefer struct_size over open coded arithmetic
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows.
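The idea behind these conversions can be shown with a userspace analogue. The sketch below is not the kernel's struct_size() implementation: `flex_size()`, `struct dev_list` and `dev_list_alloc()` are hypothetical names, and the overflow checks rely on the GCC/Clang builtins `__builtin_mul_overflow()` and `__builtin_add_overflow()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure ending in a flexible array member. */
struct dev_list {
    uint16_t dev_num;
    uint32_t dev_id[];  /* flexible array member */
};

/*
 * Userspace analogue of struct_size(): header + n * elem, saturating to
 * SIZE_MAX when the arithmetic would overflow so that the allocation fails
 * instead of returning a buffer that is too small.
 */
size_t flex_size(size_t header, size_t elem, size_t n)
{
    size_t bytes;

    if (__builtin_mul_overflow(elem, n, &bytes) ||
        __builtin_add_overflow(bytes, header, &bytes))
        return SIZE_MAX;
    return bytes;
}

/* Allocate a list with room for n device ids; returns NULL on failure. */
struct dev_list *dev_list_alloc(uint16_t n)
{
    struct dev_list *dl;

    dl = malloc(flex_size(sizeof(*dl), sizeof(dl->dev_id[0]), n));
    if (dl)
        dl->dev_num = n;
    return dl;
}
```

An open-coded `sizeof(*dl) + n * sizeof(dl->dev_id[0])` can silently wrap around for a large `n`; the checked helper turns that case into a failed allocation.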
v2: tty: rfcomm: prefer struct_size over open coded arithmetic
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows.
v1: perf/x86/amd/uncore: Add flex array to struct amd_uncore_ctx
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2].
v3: perf/ring_buffer: Prefer struct_size over open coded arithmetic
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows.
Asynchronous I/O
v2: Introduce per-task io utilization boost
The feature is implemented by checking for task wakeups that have the in_iowait flag set and boost the CPU of the rq accordingly (implemented through cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT)).
GIT PULL: Enable IORING_CQE_F_SOCK_NONEMPTY for accept requests
This one was deferred, as it both depended on the net branch and the io_uring changes for 6.10. Sending it now as both have landed.
This adds support for IORING_CQE_F_SOCK_NONEMPTY for io_uring accept requests.
v4: io_uring: releasing CPU resources when polling
This patch is intended to release the CPU resources of io_uring in polling mode. When IO is issued, the program immediately polls to check for completion, which is a waste of CPU resources while IO commands are being executed on the disk.
v4: io_uring/rsrc: coalescing multi-hugepage registered buffers
This patch series enables coalescing registered buffers with more than one hugepage. It optimizes the DMA-mapping time and saves memory for these kinds of buffers.
v1: liburing: test: add test cases for hugepage registered buffers
Add a test file for hugepage registered buffers, to make sure the fixed buffer coalescing feature works safely and soundly.
Rust For Linux
All the commits have been in linux-next for a week or more, except for the fix on top, which has been in 3 linux-next tags.
A simple conflict with the Kbuild pull expected. It is resolved in linux-next. No changes to the C side.
v1: Rust block device driver API and null block driver
This series provides an initial Rust block layer device driver API, and a very minimal null block driver to exercise the API. The driver has only one mode of operation and cannot be configured.
These patches are an updated and trimmed down version of the v2 RFC. One of the requests for the v2 RFC was to split the abstractions into smaller pieces that are easier to review. This is the first part of the split patches.
BPF
v1: bpf: constify member bpf_sysctl_kern::table
The sysctl core is preparing to only expose instances of struct ctl_table as “const”. This will also affect the ctl_table argument of sysctl handlers, for which bpf_sysctl_kern::table is also used.
v2: bpf-next: API to access btf_dump emit queue and print single type
This is a follow-up to the following discussion: https://lore.kernel.org/bpf/20240503111836.25275-1-jose.marchesi@oracle.com/
v1: selftests: harness: refactor __constructor_order
This series refactors __constructor_order because __constructor_order_last() is unneeded.
v1: dwarves: btf_encoder: add “distilled_base” BTF feature to split BTF generation
Adding “distilled_base” to --btf_features when generating split BTF will create split and .BTF.base BTF - the latter allows us to map references from split BTF to base BTF, even if that base BTF has changed. It does this by providing just enough information about the base types in the .BTF.base section.
v4: bpf-next: bpf: support resilient split BTF
Split BPF Type Format (BTF) provides huge advantages in that kernel modules only have to provide type information for types that they do not share with the core kernel; for core kernel types, split BTF refers to core kernel BTF type ids.
v8: bpf-next: bpf: Add a generic bits iterator
Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been added for the new bpf_iter_bits functionality. These kfuncs enable the iteration of the bits from a given address and a given number of bits. The bits iterator can be used in any context and on any address.
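The new/next/destroy shape of such an iterator can be sketched in plain userspace C. This is not the kernel kfunc implementation: `struct bits_iter` and the three functions are hypothetical, and the choice to yield only the indices of set bits is an assumption about the intended behavior.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical userspace iterator state; not the kernel's struct layout. */
struct bits_iter {
    const unsigned long *words;  /* memory to scan */
    size_t nr_bits;              /* number of valid bits */
    size_t pos;                  /* next bit index to examine */
};

void bits_iter_new(struct bits_iter *it, const unsigned long *words,
                   size_t nr_bits)
{
    it->words = words;
    it->nr_bits = nr_bits;
    it->pos = 0;
}

/* Returns the index of the next set bit, or -1 when none remain. */
long bits_iter_next(struct bits_iter *it)
{
    while (it->pos < it->nr_bits) {
        size_t bit = it->pos++;

        if (it->words[bit / BITS_PER_WORD] & (1UL << (bit % BITS_PER_WORD)))
            return (long)bit;
    }
    return -1;
}

/* Nothing is allocated in this sketch, so destroy only resets the state. */
void bits_iter_destroy(struct bits_iter *it)
{
    it->words = NULL;
    it->nr_bits = 0;
    it->pos = 0;
}
```

A caller would initialize the iterator over a word array, call `bits_iter_next()` in a loop until it returns -1, and then call `bits_iter_destroy()`.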
v1: bpf: selftests/bpf: Adjust test_access_variable_array after a kernel function name change
After commit 4c3e509ea9f2 (“sched/balancing: Rename load_balance() => sched_balance_rq()”), the load_balance kernel function is renamed to sched_balance_rq.
This patch adjusts the fentry program in test_access_variable_array.c to reflect this kernel function name change.
v1: bpf: selftests/bpf: Adjust btf_dump test to reflect recent change in file_operations
This patch changes the test_btf_dump_struct_data() to reflect this change.
v1: bpf-next: selftests/bpf: Enable INET_XFRM_TUNNEL in config
The kconfigs CONFIG_INET_XFRM_TUNNEL and CONFIG_INET6_XFRM_TUNNEL are needed by test_tunnel tests. This patch enables them together with the dependent kconfigs CONFIG_INET_IPCOMP and CONFIG_INET6_IPCOMP.
v2: bpf-next: add netns helpers
This patchset addresses Alexei’s comment for commit “Handle SIGINT when creating netns” [1]. Export local helpers create_netns() and cleanup_netns() defined in mptcp.c into network_helpers.c as generic ones. For this another helper unshare_netns() is added to replace the existing local helpers create_netns().
v12: Reduce overhead of LSMs with static calls
This series is a respin of the RFC proposed by Paul Renauld (renauld@google.com) and Brendan Jackman (jackmanb@google.com)
v1: bpf-next: Zero overhead PROBE_MEM
This is a critical use case for applications that implement kernel tracing, and observability functionality using BPF programs, and provides users with much needed visibility and context into a running kernel.
v5: perf/core: Check sample_type in sample data saving helper functions
We use helper functions to save raw data, callchain and branch stack in perf_sample_data. These functions update perf_sample_data->dyn_size without checking event->attr.sample_type, which may result in unused space allocated in sample records. To prevent this from happening, this patchset enforces checking sample_type of an event in these helper functions.
v4: First try to replace page_frag with page_frag_cache
This patchset tries to unify the page frag implementation by replacing page_frag with page_frag_cache for sk_page_frag() first. net_high_order_alloc_disable_key for the implementation in net/core/sock.c doesn’t seem to matter that much now that we have pcp support for high-order pages in commit 44042b449872 (“mm/page_alloc: allow high-order pages to be stored on the per-cpu lists”).
As the change is mostly related to networking, this series targets net-next. The rest of page_frag will be replaced in a follow-up patchset.
v1: bpf-next: bpf: tcp: Improve bpf write tcp opt performance
When every packet writes a TCP option, testing found a loss of 20%. If a packet wants to write a TCP option, it triggers the BPF program three times: “tcp_send_mss” is called to calculate mss_cache, “tcp_established_options” to reserve the TCP option length, and “bpf_skops_write_hdr_opt” to write the TCP option, but “tcp_send_mss” runs before TSO. bpftrace tracking showed that during the stress test, “tcp_send_mss” was called about 900,000 times per second. Since the option length does not change often, consider caching it as an optimization.
v2: bpf-next: use network helpers, part 5
This patchset uses post_socket_cb and post_connect_cb callbacks of struct network_helper_opts to refactor do_test() in bpf_tcp_ca.c to move dctcp test dedicated code out of do_test() into test_dctcp().
Patch 3 adds a new member in post_socket_opts and patch 4 adds a new callback in network_helper_opts. I’m not sure if this is going too far.
GIT PULL: Networking for v6.10
Full disclosure I hit a KASAN OOB read warning in BPF when testing on Meta’s production servers (which load a lot of BPF). BPF folks aren’t super alarmed by it, and also they are partying at LSFMM so I don’t think it’s worth waiting for the fix. But you may feel differently… https://pastebin.com/0fzqy3cW
v4: bpf-next: bpftool: introduce btf c dump sorting
Sort bpftool c dump output; aiming to simplify vmlinux.h diffing and forcing more natural type definitions ordering.
Definitions are sorted first by their BTF kind ranks, then by their base type name and by their own name.
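The three-level ordering described above can be expressed as a composite sort key. This is a minimal Python sketch under stated assumptions: the `KIND_RANK` table and the `BtfType` record are hypothetical, and the real ranks and name resolution used by bpftool may differ.

```python
from typing import Iterable, List, NamedTuple, Tuple

# Hypothetical rank table; lower ranks are emitted first.
KIND_RANK = {"enum": 0, "fwd": 1, "typedef": 2, "struct": 3, "union": 3, "func_proto": 4}


class BtfType(NamedTuple):
    kind: str            # BTF kind, e.g. "struct" or "typedef"
    name: str            # the type's own name
    base_name: str = ""  # name of the type it ultimately refers to, if any


def sort_key(t: BtfType) -> Tuple[int, str, str]:
    """Order by kind rank, then base type name, then own name."""
    return (KIND_RANK.get(t.kind, len(KIND_RANK)), t.base_name, t.name)


def sort_types(types: Iterable[BtfType]) -> List[BtfType]:
    """Return the types in a stable, diff-friendly order."""
    return sorted(types, key=sort_key)
```

Because the key depends only on the type's kind and names, two dumps of similar kernels produce definitions in the same relative order, which is what makes diffing vmlinux.h files easier.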
v1: net-next: net: mana: Enable MANA driver on ARM64 with 4K page size
Change the Kconfig dependency, so this driver can be built and run on ARM64 with 4K page size.
v3: bpf-next: bpf: bpftool: Support dumping kfunc prototypes from BTF
This patchset enables both detecting as well as dumping compilable prototypes for kfuncs.
Users will be able to look at BTF inside vmlinux (or modules) and check if the kfunc they want is available.
For developer convenience, we also support dumping kfunc prototypes from bpftool.
v4: bpf-next: Support kCFI + BPF on arm64
For the BPF summit meeting tomorrow, I might as well have a mergeable version. I looked back at the BPF-CFI patches to check their status and found that there had been no updates for around a month, so I went ahead and made the fixes suggested in v2.
v3: bpf: powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH
The patch splitup is not ideal, but that’s not what I’m interested in here. What I want to hear is the results of testing - does this switch of the RGMII/SGMII “pcs” stuff to a phylink_pcs work for this driver?
v1: bpf, sockmap: defer sk_psock_free_link() using RCU
If a BPF program is attached to a kfree() event, calling kfree() with psock->link_lock held triggers a lockdep warning.
Defer kfree() using RCU so that the attached BPF program runs without holding psock->link_lock.
v2: bpf-next: bpf: make list_for_each_entry portable
This patch adds a new macro can_loop to bpf_experimental that implements the same logic as cond_break but evaluates to a boolean expression. The patch also changes all current uses of cond_break within loop headers accordingly.
v1: bpf-next: bpf: disable strict aliasing in test_global_func9.c
The BPF selftest test_global_func9.c performs type punning and breaks strict-aliasing rules.
v1: bpf-next: selftests/bpf: Free strdup memory in xdp_hw_metadata
The strdup() function returns a pointer to a new string which is a duplicate of the string “ifname”. Memory for the new string is obtained with malloc() and needs to be freed with free().
This patch adds this missing “free(saved_hwtstamp_ifname)” in cleanup() to avoid a potential memory leak in xdp_hw_metadata.c.
Related Technology Updates
QEMU
v1: target/riscv: zvbb implies zvkb
According to the RISC-V crypto spec, the Zvkb extension is a proper subset of the Zvbb extension.
Reference: https://github.com/riscv/riscv-crypto/blob/1769c2609bf4535632e0c0fd715778f212bb272e/doc/vector/riscv-crypto-vector-zvkb.adoc?plain=1#L10
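The "implies" relationship can be pictured as a closure over an implication table. The sketch below is illustrative only and is not QEMU's implementation: the `IMPLIES` table and the `expand_extensions()` helper are hypothetical names, and only the Zvbb to Zvkb entry comes from the patch description.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical implication table; only zvbb -> zvkb is taken from the patch.
IMPLIES: Dict[str, List[str]] = {"zvbb": ["zvkb"]}


def expand_extensions(enabled: Iterable[str],
                      implies: Optional[Dict[str, List[str]]] = None) -> Set[str]:
    """Return the enabled extensions plus everything they transitively imply."""
    table = IMPLIES if implies is None else implies
    result = set(enabled)
    pending = list(result)
    while pending:
        ext = pending.pop()
        for implied in table.get(ext, []):
            if implied not in result:
                result.add(implied)
                pending.append(implied)
    return result
```

Enabling only `zvbb` then yields both `zvbb` and `zvkb`, while enabling only `zvkb` leaves the set unchanged, which matches the subset direction stated in the spec.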
v2: target/riscv: Support RISC-V privilege 1.13 spec
Based on the change log for the RISC-V privilege 1.13 spec, add support for ss1p13.
v1: hw/riscv/virt: Add hotplugging and virtio-md-pci support
Virtio-based memory devices allow for dynamic resizing of virtual machine memory, and require proper hotplugging (add/remove) support to work.
Enable virtio-md-pci with the corresponding missing hotplugging callbacks for the RISC-V “virt” machine.
v1: disas/riscv: Decode all of the pmpcfg and pmpaddr CSRs
Previously we only listed a single pmpcfg CSR and the first 16 pmpaddr CSRs. This patch fixes this to list all 16 pmpcfg and all 64 pmpaddr CSRs as part of the disassembly.
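For reference, the full set of PMP CSRs can be generated from two base addresses. This Python sketch is not the QEMU disassembler code; the function name `pmp_csr_names()` is hypothetical, while the CSR numbers follow the RISC-V privileged specification (pmpcfg0 at 0x3A0, pmpaddr0 at 0x3B0).

```python
from typing import Dict

PMPCFG_BASE = 0x3A0   # pmpcfg0 .. pmpcfg15 occupy 0x3A0 .. 0x3AF
PMPADDR_BASE = 0x3B0  # pmpaddr0 .. pmpaddr63 occupy 0x3B0 .. 0x3EF


def pmp_csr_names() -> Dict[int, str]:
    """Map each PMP CSR number to its name, as a disassembler table would."""
    names = {PMPCFG_BASE + i: f"pmpcfg{i}" for i in range(16)}
    names.update({PMPADDR_BASE + i: f"pmpaddr{i}" for i in range(64)})
    return names
```

A lookup such as `pmp_csr_names().get(csr_number)` returns `None` for any CSR outside the PMP range, so it can sit alongside other name tables.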
Buildroot
package/kvmtool: enable build for riscv
kvmtool now supports riscv, enable it and select BR2_PACKAGE_DTC which is needed to build it.
package/kvmtool: bump package version to 4d2c017f41
The current version dates back to 2017 and is lacking riscv support. Bump the version to a more recent one (4d2c017f41) which supports riscv and contains a large number of updates as well as CVE fixes. Since kvmtool does not seem to have releases, just bump to the current git HEAD.
arch: allow riscv32 noMMU configuration
commit: https://git.buildroot.net/buildroot/commit/?id=e32d404f6c4fb5fce2a65996efac53281f97ce5c branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master
v1: configs/qemu_riscv32_nommu_virt_defconfig: New defconfig
Add a new defconfig for QEMU RISC-V 32-bit without MMU.
U-Boot
v1: pwm: sunxi: Add support Allwinner D1 PWM
This patch series adds support for the Allwinner D1, T113 and R329 PWM.
This code isn’t based on any kernel code but is instead written from scratch with the goal of handling the PWM pairs deterministically.
The following changes since commit c8ffd1356d42223cbb8c86280a083cc3c93e6426:
Merge patch series “arm: dts: am62-beagleplay: Fix Beagleplay Ethernet” (2024-05-13 09:15:51 -0600)
are available in the Git repository at
v4: board: starfive: add Milk-V Mars CM support
With this series the Milk-V Mars CM board can be booted.
NVMe, SD-card, Ethernet, UART are working but not USB.
The first series of Milk-V Mars CM Lite boards (the version without eMMC) uses incorrect serial numbers indicating eMMC presence. For these boards, CONFIG_STARFIVE_NO_EMMC=y must be set to indicate that eMMC is not present.