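RISC-V Linux Kernel and Related Technology News, Issue 48
Date: 20230604
Editor: 晓依
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS
Kernel News
RISC-V Architecture Support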
v1: tools/nolibc: add two new syscall helpers
While adding new syscalls and the related library routines, I noticed that most of the library routines share the same syscall call-and-return logic. This patchset adds two macros to simplify and shrink them.
v3: nolibc: add part2 of support for rv32
This is v3 part2 of the rv32 support. Unlike v2 part2 [1], this patchset only fixes compile issues.
With the v3 generic part1 [2] and this patchset, we can now compile nolibc for rv32.
This follows a suggestion from Arnd [3]: instead of an ‘#error’ for a syscall that is unsupported on the target platform, a ‘return -ENOSYS’ lets us compile first and then fix the test failures reported by nolibc-test one by one.
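The trade-off between ‘#error’ and ‘return -ENOSYS’ can be sketched in plain C. This is only an illustration under stated assumptions, not nolibc’s actual code: HAVE_SYS_DEMO, sys_demo() and demo() are hypothetical names invented for the example.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical low-level syscall stub. On a platform that lacks the
 * syscall, an '#error' at this point would stop the whole library from
 * building. Returning -ENOSYS keeps it compiling and defers the failure
 * to run time, where a test suite can report it.
 */
#ifdef HAVE_SYS_DEMO            /* hypothetical feature macro, not defined here */
long sys_demo(void);            /* a real implementation would live elsewhere */
#else
static long sys_demo(void)
{
    return -ENOSYS;
}
#endif

/* Library-style wrapper: turn a negative kernel-style result into errno and -1. */
static long demo(void)
{
    long ret = sys_demo();

    /* Linux reports syscall errors as values in the range [-4095, -1]. */
    assert(ret >= -4095);

    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```

With the macro left undefined, a caller of demo() gets -1 with errno set to ENOSYS, which a test program can print as a failing case instead of the build breaking.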
v3: nolibc: add generic part1 of prepare for rv32
This is the v3 generic part1 for rv32. All of the issues found in v2 part1 [1] have been fixed, several generic patches have been fixed and moved from v2 part2 [2] into this series, and the standalone test_fork patch [4] is merged into this series with a Reviewed-by line.
v2: Use MMU read lock for clear-dirty-log
This series is based on kvmarm/next because I also needed to modify the eager page splitting logic in the clear-dirty-log API. Eager page splitting is not present in Linux 6.4-rc4.
v2: Add initialization of clock for StarFive JH7110 SoC
This patchset adds initial rudimentary support for the StarFive Quad SPI controller driver, which will be used on StarFive’s VisionFive 2 board. In 6.4, the QSPI_AHB and QSPI_APB clocks changed from default ON to default OFF, so the driver needs to enable them. A dts patch is also added to this series.
v1: Add DRM driver for StarFive SoC JH7110
This series adds a DRM driver for the StarFive JH7110 SoC, which includes a display controller driver for the Verisilicon DC8200 and an HDMI driver.
We use the GEM framework for buffer management and allocate memory using the DMA APIs.
v2: gpio: sifive: Add missing check for platform_get_irq
Add the missing check for platform_get_irq and return error code if it fails.
v2: Add support for Allwinner GPADC on D1/T113s/R329/T507 SoCs
This series adds support for the general purpose ADC (GPADC) on newer Allwinner SoCs, such as the D1, T113s, T507 and R329. The driver provides basic functionality for reading ADC channel data.
v2: riscv/purgatory: Do not use fortified string functions
This means that the memcpy() calls with “buf” as a destination in sha256.c’s code will attempt to perform run-time bounds checking, which could lead to calling missing functions, specifically a potential WARN_ONCE, which isn’t callable from purgatory.
module_alloc() is used everywhere as a means to allocate memory for code.
Besides being semantically wrong, this unnecessarily ties all subsystems that need to allocate code, such as ftrace, kprobes and BPF, to modules and puts the burden of code allocation on the modules code.
v4: StarFive’s Pulse Width Modulation driver support
This patchset adds initial rudimentary support for the StarFive Pulse Width Modulation controller driver, which will be used on StarFive’s VisionFive 2 board. The first patch adds documentation for the device, and patch 2 adds the device probe for the module.
v3: Split ptdesc from struct page
The MM subsystem is trying to shrink struct page. This patchset introduces a memory descriptor for page table tracking - struct ptdesc.
This patchset introduces ptdesc, splits ptdesc from struct page, and converts many callers of page table constructor/destructors to use ptdescs.
v2: riscv: mm: Pre-allocate PGD entries for vmalloc/modules area
The RISC-V port requires that kernel PGD entries are to be synchronized between MMs. This is done via the vmalloc_fault() function, that simply copies the PGD entries from init_mm to the faulting one.
v3: RISC-V: KVM: Ensure SBI extension is enabled
Ensure guests can’t attempt to invoke SBI extension functions when the SBI extension’s probe function has stated that the extension is not available.
v1: selftests/nolibc: add user-space ‘efault’ handler
This is not really meant for merging; it only serves as demo code to test whether it is possible to resume with the next test after a bad pointer access in user space [1].
v1: fdt: Mark “/reserved-memory” nodes as nosave if !reusable
In the RISC-V kernel, the firmware does not mark the region it uses as “no-map” so that the kernel can avoid having holes in the linear mapping and then use larger pages.
v2: nolibc: add part3 of support for rv32
These two patches are based on part2 of the rv32 support [1]; I forgot to send them together.
v1: riscv: mm: Pre-allocate PGD entries vmalloc/modules area
The RISC-V port requires that kernel PGD entries are to be synchronized between MMs. This is done via the vmalloc_fault() function, that simply copies the PGD entries from init_mm to the faulting one.
v5: Add JH7110 MIPI DPHY RX support
This patchset adds a MIPI DPHY RX driver for the StarFive JH7110 SoC. It is used to transfer CSI camera data. The series has been tested on the VisionFive 2 board.
v1: riscv: Enable ARCH_SUSPEND_POSSIBLE for s2idle
With this configuration option enabled, basic platform-independent s2idle is provided, indicated by the sole “s2idle” string in /sys/power/mem_sleep. At the end of s2idle, harts will hit the wfi instruction or enter the SUSPENDED state through the sbi_cpuidle driver. The interrupts of possible wakeup devices are kept enabled to wake the system up.
v12: -next: riscv: Add independent irq/softirq stacks
This patch series adds independent irq/softirq stacks to reduce the pressure on the thread stack. It also adds a thread STACK_SIZE config so that users can choose a suitable size at compile time.
Process Scheduling
v1: net: sched: wrap tc_skip_wrapper with CONFIG_RETPOLINE
This patch fixes the following sparse warning:
net/sched/sch_api.c:2305:1: sparse: warning: symbol ‘tc_skip_wrapper’ was not declared. Should it be static?
v1: net-next: net/sched: introduce pretty printers for Qdiscs
Sometimes when debugging Qdiscs it may be confusing to know exactly what you’re looking at, especially since they’re hierarchical. Pretty printing the handle, parent handle and netdev is a bit cumbersome, so this patch proposes a set of wrappers around __qdisc_printk() which are heavily inspired by __net_printk().
v1: sched: EEVDF and latency-nice and/or slice-attr
Latest version of the EEVDF [1] patches.
The only real change since last time is the fix for tick-preemption [2], and a simple safe-guard for the mixed slice heuristic.
Other than that, I’ve re-arranged the patches to make EEVDF come first and have the latency-nice or slice-attribute patches on top.
v2: sched/fair: Don’t balance task to its current running CPU
The new_dst_cpu is chosen from env->dst_grpmask. Currently that mask contains the CPUs in sched_group_span(), so with overlapping groups it is possible to pick the CPU the task is already running on. This patch sets env->dst_grpmask to group_balance_mask(), which excludes any CPUs from the busiest group and solves the issue. For balancing in a domain with no overlapping groups, the behaviour stays the same as before.
v8: sched/fair: Scan cluster before scanning LLC in wake-up path
This is the follow-up work to support cluster scheduler. Previously we have added cluster level in the scheduler for both ARM64[1] and X86[2] to support load balance between clusters to bring more memory bandwidth and decrease cache contention. This patchset, on the other hand, takes care of wake-up path by giving CPUs within the same cluster a try before scanning the whole LLC to benefit those tasks communicating with each other.
v1: sched: deadline: Simplify pick_earliest_pushable_dl_task()
Using the while statement instead of the if and goto statements is more concise and efficient.
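The kind of rewrite described here can be illustrated on a toy list. This is a minimal sketch and not the scheduler’s code: struct task, its pushable flag, and both pick functions are hypothetical stand-ins, and the real function works on the scheduler’s own data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued task. */
struct task {
    int pushable;          /* non-zero if this task may be pushed away */
    struct task *next;     /* next candidate, in deadline order */
};

/* Original shape: an if statement plus a backward goto. */
static struct task *pick_goto(struct task *t)
{
next_node:
    if (t) {
        if (t->pushable)
            return t;
        t = t->next;
        goto next_node;
    }
    return NULL;
}

/* Same traversal written as a while loop. */
static struct task *pick_while(struct task *t)
{
    while (t) {
        if (t->pushable)
            return t;
        t = t->next;
    }
    return NULL;
}

/* Both forms must select the same task for any list head. */
static void check_same_pick(struct task *head)
{
    assert(pick_goto(head) == pick_while(head));
}
```

Both functions visit the same nodes in the same order; the loop form simply makes the iteration explicit, which is the readability gain the patch is after.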
Memory Management
v6: mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
Here’s version 6 of the series reducing the kmalloc() minimum alignment on arm64 to 8 (from 128). There are patches already to do the same for riscv (pretty straight-forward after this series).
The first 11 patches decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN and, for arm64, limit the kmalloc() caches to those aligned to the run-time probed cache_line_size(). On arm64 we gain the kmalloc-{64,192} caches.
v1: Introduce cmpxchg128() – aka. the demise of cmpxchg_double()
After much breaking of things, find here the improved version.
v2: net-next: splice, net: Handle MSG_SPLICE_PAGES in AF_TLS
Here are patches to make AF_TLS handle the MSG_SPLICE_PAGES internal sendmsg flag. MSG_SPLICE_PAGES is an internal hint that tells the protocol that it should splice the pages supplied if it can. Its sendpage implementations are then turned into wrappers around that.
v7: bio: check return values of bio_add_page
We have two functions for adding a page to a bio, __bio_add_page() which is used to add a single page to a freshly created bio and bio_add_page() which is used to add a page to an existing bio.
While __bio_add_page() is expected to succeed, bio_add_page() can fail.
v2: net-next: splice, net: Handle MSG_SPLICE_PAGES in AF_KCM
Here are patches to make AF_KCM handle the MSG_SPLICE_PAGES internal sendmsg flag. MSG_SPLICE_PAGES is an internal hint that tells the protocol that it should splice the pages supplied if it can. Its sendpage implementation is then turned into a wrapper around that.
v2: net-next: splice, net: Handle MSG_SPLICE_PAGES in Chelsio-TLS
Here are patches to make Chelsio-TLS handle the MSG_SPLICE_PAGES internal sendmsg flag. MSG_SPLICE_PAGES is an internal hint that tells the protocol that it should splice the pages supplied if it can. Its sendpage implementation is then turned into a wrapper around that.
v1: make unregistration of super_block shrinker more faster
The kernel test robot noticed a -88.8% regression of stress-ng.ramfs.ops_per_sec on commit f95bdb700bc6 (“mm: vmscan: make global slab shrink lockless”). More details can be seen from the link[1] below.
v2: mm/migrate_device: Try to handle swapcache pages
Migrating file pages and swapcache pages into device memory is not supported. The decision is made based on page_mapping(). For now, swapcache pages are not migrated.
v1: mm: zswap: multiple zpool support
Support using multiple zpools of the same type in zswap, for concurrency purposes. Add CONFIG_ZSWAP_NR_ZPOOLS_ORDER to control the number of zpools. The config specifies the order rather than the absolute number to guarantee a power of 2. This is useful so that we can deterministically link each entry to a zpool by hashing the zswap_entry pointer.
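Why a power-of-2 pool count makes pointer hashing convenient can be shown with a small user-space sketch. Everything here is an assumption made for illustration: NR_ZPOOLS_ORDER is fixed at 2, pool_index() is a hypothetical helper, and the multiplier is just an odd 64-bit constant suitable for multiplicative hashing, not necessarily what zswap uses.

```c
#include <assert.h>
#include <stdint.h>

#define NR_ZPOOLS_ORDER 2u                      /* assumed value; must be >= 1 here */
#define NR_ZPOOLS       (1u << NR_ZPOOLS_ORDER) /* always a power of two */

/*
 * Map an entry pointer to a pool index. Multiplying by a large odd
 * constant mixes the pointer bits; keeping the top NR_ZPOOLS_ORDER bits
 * of the product yields an index in [0, NR_ZPOOLS) without a modulo.
 */
static unsigned int pool_index(const void *entry)
{
    uint64_t key = (uint64_t)(uintptr_t)entry;
    uint64_t hash = key * UINT64_C(0x61C8864680B583EB);
    unsigned int idx = (unsigned int)(hash >> (64 - NR_ZPOOLS_ORDER));

    assert(idx < NR_ZPOOLS);
    return idx;
}
```

Because the index depends only on the pointer value, the same entry always resolves to the same pool, so no per-entry pool field is needed to find it again.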
v3: zswap: do not shrink if cgroup may not zswap
Before storing a page, zswap first checks if the number of stored pages exceeds the limit specified by memory.zswap.max, for each cgroup in the hierarchy. If this limit is reached or exceeded, then zswap shrinking is triggered and short-circuits the store attempt.
v1: mm: zswap: support exclusive loads
Commit 71024cb4a0bf (“frontswap: remove frontswap_tmem_exclusive_gets”) removed support for exclusive loads from frontswap as it was not used.
Bring back exclusive loads support to frontswap by adding an exclusive_loads argument to frontswap_ops. Add support for exclusive loads to zswap behind CONFIG_ZSWAP_EXCLUSIVE_LOADS.
v1: zswap: do not shrink when memory.zswap.max is 0
Before storing a page, zswap first checks if the number of stored pages exceeds the limit specified by memory.zswap.max, for each cgroup in the hierarchy. If this limit is reached or exceeded, then zswap shrinking is triggered and short-circuits the store attempt.
v2: net-next: crypto, splice, net: Make AF_ALG handle sendmsg(MSG_SPLICE_PAGES)
Here’s the fourth tranche of patches towards providing a MSG_SPLICE_PAGES internal sendmsg flag that is intended to replace the ->sendpage() op with calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol that it should splice the pages supplied if it can.
v4: sock: Improve condition on sockmem pressure
Currently the memcg’s status is also accounted into the socket’s memory pressure to alleviate the memcg’s memstall. But there are still cases that can be improved. Please check the patches for detailed info.
v2: string: use __builtin_memcpy() in strlcpy/strlcat
lib/string.c is built with -ffreestanding, which prevents the compiler from replacing certain functions with calls to their library versions.
v1: -next: mm: page_alloc: simplify has_managed_dma()
ZONE_DMA should only exist on node 0, so checking NODE_DATA(0) is enough. Simplify has_managed_dma() accordingly and make it inline.
v1: mm: free retracted page table by RCU
Here is the third series of patches to mm (and a few architectures), based on v6.4-rc3 with the preceding two series applied: in which khugepaged takes advantage of pte_offset_map_lock allowing for pmd transitions.
v1: Do not print page type when the page has no type
It is confusing and unnecessary to print the page type when the page has no type.
File Systems
v2: Create large folios in iomap buffered write path
Commit ebb7fb1557b1 limited the length of ioend chains to 4096 entries to improve worst-case latency. Unfortunately, this had the effect of limiting the performance of:
fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30
-iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1
-numjobs=4 -directory=/mnt/test
The problem ends up being lock contention on the i_pages spinlock as we clear the writeback bit on each folio (and propagate that up through the tree). By using larger folios, we decrease the number of folios to be processed by a factor of 256 for this benchmark, eliminating the lock contention.
v1: highmem: Rename put_and_unmap_page() to unmap_and_put_page()
With commit 849ad04cf562a (“new helper: put_and_unmap_page()”), Al Viro introduced put_and_unmap_page() for use in the many places that share a common pattern of calling kunmap_local() + put_page().
v1: fs: Rename put_and_unmap_page() to unmap_and_put_page()
With commit 849ad04cf562a (“new helper: put_and_unmap_page()”), Al Viro introduced put_and_unmap_page() for use in the many places that share a common pattern of calling kunmap_local() + put_page().
v2: zonefs: use iomap for synchronous direct writes
Remove the function zonefs_file_dio_append() that is used to manually issue REQ_OP_ZONE_APPEND BIOs for processing synchronous direct writes and use iomap instead.
v1: fs.h: Optimize file struct to prevent false sharing
In the syscall test of UnixBench, performance regression occurred due to false sharing.
v1: fuse: Abort the requests under processing queue with a spin_lock
There is a potential race/timing issue between fuse_dev_release() and fuse_abort_conn() while aborting the requests on the processing list. This results in warnings and can even lead to use-after-free (UAF) issues.
v3: NFSD: recall write delegation on GETATTR conflict
This patch series adds the recall of a write delegation when there is a conflict with a GETATTR, plus a counter in /proc/net/rpc/nfsd to count these recalls.
v4: fs/sysv: Null check to prevent null-ptr-deref bug
sb_getblk(inode->i_sb, parent) can return a NULL pointer, and taking a lock on it leads to a null-ptr-deref bug.
Reported-by: syzbot+aad58150cbc64ba41bdc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=aad58150cbc64ba41bdc
v1: sysctl: move umh and keys sysctls
If you look at kernel/sysctl.c, there are two sysctl arrays which are declared in header files but, for no good reason now, registered in kernel/sysctl.c instead of the place they belong. So just do the registration where it belongs.
v2: multiblock allocator improvements
This patch was intended to remove a dead if-condition, but it was not actually dead code, and removing it caused a performance regression. Unfortunately I somehow missed that when reviewing the patchset, and it had already gone in, so I had to revert the commit. I’ve added details of the regression and the root cause in the revert commit, along with the performance numbers I observed.
v1: fs/buffer: using __bio_add_page in submit_bh_wbc()
In submit_bh_wbc(), the bio is newly allocated, so it does not need any merging logic.
Using bio_add_page() here executes ‘bio_flagged(bio, BIO_CLONED)’ and ‘bio_full’ twice, which is unnecessary.
v1: FUSE: dev: Change the posiion of spin_lock
Only list_del() needs the spin_lock, so the lock should be taken close to “list_del(&req->list)”. This may bring a small benefit.
v1: Null check to prevent null-ptr-deref bug
sb_getblk(inode->i_sb, parent) can return a NULL pointer, and taking a lock on it leads to a null-ptr-deref bug.
Network Devices
v5: net-next: net: flower: add cfm support
The first patch adds cfm support to the flow dissector. The second adds the flower classifier support. The third adds a selftest for the flower cfm functionality.
iproute2 changes will come in follow up patches.
v4: vsock: MSG_ZEROCOPY flag support
The difference from the copy path is not significant. During packet allocation, a non-linear skb is created and filled with pinned user pages. There are also some updates for the vhost and guest parts of the transport: in both cases I’ve added handling of non-linear skbs for the virtio part. vhost copies data from such an skb to the guest’s rx virtio buffers. In the guest, the virtio transport fills the tx virtio queue with pages from the skb.
v1: Add support for sam9x7 SoC family
This patch series adds support for the new SoC family - sam9x7.
- The device tree, configs and drivers are added
- Clock driver for sam9x7 is added
- Support for basic peripherals is added
v1: RDMA/siw: Fabricate a GID on tun and loopback devices
LOOPBACK and NONE (tunnel) devices have all-zero MAC addresses. Currently, siw_device_create() falls back to copying the IB device’s name in those cases, because an all-zero MAC address breaks the RDMA core address resolution mechanism.
v1: net-next: Move KSZ9477 errata handling to PHY driver
Patches to move handling for KSZ9477 PHY errata register fixes from the DSA switch driver into the corresponding PHY driver, for more proper layering and ordering.
v1: net: dsa: realtek: rtl8365mb: add missing case for digital interface 0
When bringing up the switch on a Netgear WNDAP660, I observed that no traffic was passed from the RTL8363 to the ethernet interface…
It turns out this was because the case for RTL8365MB_DIGITAL_INTERFACE_SELECT_REG(0) had been deleted by accident.
This is a patchset for a new vendor specific VFIO driver (pds_vfio) for use with the AMD/Pensando Distributed Services Card (DSC). This driver makes use of the pds_core driver.
v1: RDMA/core: Handle ARPHRD_NONE devices
We would like to enable the use of siw on top of a VPN that is constructed and managed via a tun device. That hasn’t worked up until now because ARPHRD_NONE devices (such as tun devices) have no GID for the RDMA/core to look up.
v1: net: dsa: realtek: rtl8365mb: use mdio passthrough to access PHYs
When bringing up the PHYs on a Netgear WNDAP660, I observed that none of the PHYs were enumerated and the rtl8365mb driver failed to load.
v1: net: rfs: annotate lockless accesses
rfs runs without locks held, so we should annotate reads and writes to shared variables.
v5: net-next: net: ioctl: Use kernel memory on protocol ioctl callbacks
Most of the ioctls for net protocols operate directly on the userspace argument (arg), usually calling get_user()/put_user() directly in the ioctl callback. This is not flexible, because it is hard to reuse these functions without passing userspace buffers.
v1: iproute2: ipaddress: accept symbolic names
The function rtnl_addproto_a2n() was defined but never used. Use it to allow for symbolic names, and fix the function signatures so protocol value is consistently __u8.
v1: net-next: complete Lynx mdio device handling
This series completes the mdio device lifetime handling for Lynx PCS users which do not create their own mdio device, but instead fetch it using a firmware description - namely the DPAA2 and FMAN_MEMAC drivers.
v1: net: net/sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM values
We got multiple syzbot reports, all duplicates of the following [1]
syzbot managed to install fq_pie with a zero TCA_FQ_PIE_QUANTUM, thus triggering infinite loops.
[PATCH RESEND net-next 0/5] Improve the taprio qdisc’s relationship with its children
[ Original patch set was lost due to an apparent transient problem with kernel.org’s DNSBL setup. This is an identical resend. ]
Prompted by Vinicius’ request to consolidate some child Qdisc dereferences in taprio: https://lore.kernel.org/netdev/87edmxv7x2.fsf@intel.com/
v1: net: enetc: correct the statistics of rx bytes
The purpose of this patch set is to fix the issue of rx bytes statistics. The first patch corrects the rx bytes statistics of normal kernel protocol stack path, and the second patch is used to correct the rx bytes statistics of XDP.
v1: net-next: ipv6: lower “link become ready”’s level message
The following message is printed on the console each time a network device configured with an IPv6 address is ready to be used:
v5: net-next: sock: Improve condition on sockmem pressure
Currently the memcg’s status is also accounted into the socket’s memory pressure to alleviate the memcg’s memstall. But there are still cases that can be improved. Please check the patches for detailed info.
v2: bpf-next: bpf, x86: allow function arguments up to 14 for TRACING
For now, a BPF program of type BPF_PROG_TYPE_TRACING can only be used on kernel functions whose argument count is less than 6. This is not friendly at all, as many functions have more than 6 arguments.
Therefore, let’s enhance it by increasing the function argument count allowed in arch_prepare_bpf_trampoline(), for now only on x86_64.
v4: Introduce a vringh accessor for IO memory
Vringh is a host-side implementation of virtio rings. It supports vrings located in three kinds of memory: userspace, kernel space, and a space translated by an iotlb.
v1: net-next: tools: ynl-gen: dust off the user space code
Every now and then I wish I finished the user space part of the netlink specs, Python scripts kind of stole the show but C is useful for selftests and stuff which needs to be fast. Recently someone asked me how to access devlink and ethtool from C++ which pushed me over the edge.
v6: net-next: net: dsa: mv88e6xxx: implement USXGMII mode for mv88e6393x
Changes from previous version:
- use phylink_decode_usxgmii_word() to decode USXGMII link state
- use existing include/uapi/linux/mdio.h defines when parsing status bits
v6: net-next: Brcm ASP 2.0 Ethernet Controller
Add support for the Broadcom ASP 2.0 Ethernet controller which is first introduced with 72165.
v1: net: dsa: qca8k: add CONFIG_LEDS_TRIGGERS dependency
There is a mix of ‘depends on’ and ‘select’ for LEDS_TRIGGERS, so it’s not clear what we should use here, but in general using ‘depends on’ causes fewer problems, so use that.
v1: net: tcp: gso: really support BIG TCP
oldlen name is a bit misleading, as it is the contribution of skb->len on the input skb TCP checksum. I added a comment to clarify this point.
v3: net/sctp: Make sha1 as default algorithm if fips is enabled
MD5 is not FIPS compliant. But still md5 was used as the default algorithm for sctp if fips was enabled. Due to this, listen() system call in ltp tests was failing for sctp in fips environment, with below error message.
GIT PULL: Networking for v6.4-rc5 (http://lore.kernel.org/netdev/20230601180906.238637-1-…)
Additional napi fields such as PID association for napi thread etc. can be supported in a follow-on patch set.
This series only supports ‘get’ ability for retrieving napi fields (specifically, napi ids and queue[s]). The ‘set’ ability for setting queue[s] associated with a napi instance via netdev-genl will be submitted as a separate patch series.
Security Hardening
v5: checkpatch: Check for 0-length and 1-element arrays
Fake flexible arrays have been deprecated since last millennium. Proper C99 flexible arrays must be used throughout the kernel so CONFIG_FORTIFY_SOURCE and CONFIG_UBSAN_BOUNDS can provide proper array bounds checking.
Fixed-by: Joe Perches joe@perches.com
v1: s390/purgatory: Do not use fortified string functions
This means that the memcpy() calls with “buf” as a destination in sha256.c’s code will attempt to perform run-time bounds checking, which could lead to calling missing functions, specifically a potential WARN_ONCE, which isn’t callable from purgatory.
v1: x86/purgatory: Do not use fortified string functions
This means that the memcpy() calls with “buf” as a destination in sha256.c’s code will attempt to perform run-time bounds checking, which could lead to calling missing functions, specifically a potential WARN_ONCE, which isn’t callable from purgatory.
v1: next: firewire: Replace zero-length array with flexible-array member
Zero-length and one-element arrays are deprecated, and we are moving towards adopting C99 flexible-array members, instead.
v1: next: drm/amdgpu/discovery: Replace fake flex-arrays with flexible-array members
Zero-length and one-element arrays are deprecated, and we are moving towards adopting C99 flexible-array members, instead.
Use the DECLARE_FLEX_ARRAY() helper macro to transform zero-length arrays in a union into flexible-array members. And replace a one-element array with a C99 flexible-array member.
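The conversion these series perform can be pictured with a small user-space example. The struct names and record_alloc() below are hypothetical and chosen only for illustration; the kernel patches operate on their own driver structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Deprecated style: a one-element array standing in for a variable tail. */
struct record_old {
    size_t count;
    int items[1];
};

/* C99 style: a true flexible-array member, which must be the last member. */
struct record {
    size_t count;
    int items[];
};

/* Allocate a record with room for 'count' trailing items. */
static struct record *record_alloc(size_t count)
{
    struct record *r;

    /* Keep the sketch simple: refuse sizes that would overflow size_t. */
    assert(count <= (SIZE_MAX - sizeof(*r)) / sizeof(r->items[0]));

    r = malloc(sizeof(*r) + count * sizeof(r->items[0]));
    if (r)
        r->count = count;
    return r;
}
```

With the flexible-array form, sizeof(struct record) no longer includes a phantom first element, and the compiler knows the array has no fixed bound, which is what lets bounds-checking features reason about it correctly.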
Rust For Linux
v1: add abstractions for network device drivers
This patchset adds minimal abstractions for network device drivers and a Rust dummy network device driver, a simpler version of drivers/net/dummy.c.
The dummy network device driver doesn’t attach to any bus such as PCI, so its dependencies are minimal. Hopefully, that makes reviewing easier.
v2: Rust scatterlist abstractions
This is a version of scatterlist abstractions for Rust drivers.
Scatterlist is used for efficient management of memory buffers, which is essential for many kernel-level operations such as Direct Memory Access (DMA) transfers and crypto APIs.
v2: rust: workqueue: add bindings for the workqueue
This patchset contains bindings for the kernel workqueue.
One of the primary goals behind the design used in this patch is that we must support embedding the work_struct as a field in user-provided types, because this allows you to submit things to the workqueue without having to allocate, making the submission infallible. If we didn’t have to support this, then the patch would be much simpler. One of the main things that make it complicated is that we must ensure that the function pointer in the work_struct is compatible with the struct it is contained within.
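The constraint is easier to see in the C idiom that the Rust bindings have to wrap safely. The sketch below is a simplified user-space model, not the kernel API: struct work, struct job, job_run() and job_submit() are hypothetical, and the "queue" just calls the function directly.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a work item: only a callback pointer. */
struct work {
    void (*func)(struct work *w);
};

/* User-provided type that embeds the work item, so no allocation is needed. */
struct job {
    int value;
    struct work work;
};

/* Recover the containing struct from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Callback that assumes 'w' is embedded in a struct job. */
static void job_run(struct work *w)
{
    struct job *j;

    /* Only valid if this callback was paired with a struct job. */
    assert(w->func == job_run);

    j = container_of(w, struct job, work);
    j->value += 1;
}

/* Pair the callback with the matching container, then "run" the work. */
static void job_submit(struct job *j)
{
    j->work.func = job_run;
    j->work.func(&j->work);   /* a real workqueue would defer this call */
}
```

In C nothing prevents storing job_run in a work item embedded in some other struct, in which case container_of would compute a bogus pointer. That pairing is the kind of invariant the text above says the bindings must ensure.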
v1: rust: error: integrate Rust error type with errname
This integrates the Error type with the errname by making it accessible via the name method or via the Debug trait.
BPF
v11: evm: Do HMAC of multiple per LSM xattrs for new inodes
One of the major goals of LSM stacking is to run multiple LSMs side by side without interfering with each other. The ultimate decision will depend on individual LSM decision.
[PATCH RESEND bpf-next 00/18] BPF token
Resending with trimmed CC list because original version didn’t make it to the mailing list.
This patch set introduces a new BPF object, the BPF token, which allows delegating a subset of BPF functionality from a privileged system-wide daemon (e.g., systemd or any other container manager) to a trusted unprivileged application. Trust is the key here. This functionality is not about allowing unconditional unprivileged BPF usage. Establishing trust, though, is completely up to the discretion of the respective privileged application that would create a BPF token.
v1: selftests/bpf: Add missing selftests kconfig options
Our selftests of course rely on the kernel being built with CONFIG_DEBUG_INFO_BTF=y, though neither this option nor its dependencies (CONFIG_DEBUG_INFO=y and CONFIG_DEBUG_INFO_DWARF4=y) are specified. This causes the wrong kernel to be built, and the selftests similarly fail to build.
v10: vhost: virtio core prepares for AF_XDP
Currently, virtio may not work with the DMA APIs when the virtio features do not include VIRTIO_F_ACCESS_PLATFORM.
- I tried to let the DMA APIs return the physical address for the virtio device, but the DMA APIs only work with “real” devices.
- I tried to let xsk support callbacks that get the physical address from the virtio-net driver as the DMA address, but the xsk maintainers may want to use dma-buf to replace the DMA APIs. I think that would be a larger effort, and we would wait too long.
v1: bpf-next: bpf: Support ->fill_link_info for kprobe prog
Currently, it is not easy to determine which functions are probed by a kprobe_multi program. This patchset supports ->fill_link_info for it, allowing the user to easily obtain the probed functions.
Although the user can retrieve the functions probed by a perf_event program using bpftool perf show, it would be beneficial to also support ->fill_link_info. This way, the user can obtain it in the same manner as other bpf links.
v2: bpf-next: bpf_refcount followups (part 1)
This series is the first of two (or more) followups to address issues in the bpf_refcount shared ownership implementation discovered by Kumar. Specifically, this series addresses the “bpf_refcount_acquire on non-owning ref in another tree” scenario described in [0], and does not address issues raised in [1]. Further followups will address the other issues.
v2: bpf-next: bpf/xdp: optimize bpf_xdp_pointer to avoid reading sinfo
We currently observe a significant performance degradation in samples/bpf xdp1 and xdp2, due to the XDP multibuffer “xdp.frags” handling added in commit 772251742262 (“samples/bpf: fixup some tools to be able to support xdp multibuffer”).
v1: bpf-next: bpf: getsockopt hook to get optval without checking kernel retval
Remove the check on retval and pass the bpf ctx by default. The advantage is greater flexibility: a bpf getsockopt hook can support a new optname without a module having to call nf_register_sockopt() to register it.
v1: bpf-next: bpf: support BTF kind metadata to separate
BTF kind metadata provides information to parse BTF kinds. By separating parsing BTF from using all the information it provides, we allow BTF to encode new features even if they cannot be used. This is helpful in particular for cases where newer tools for BTF generation run on an older kernel; BTF kinds may be present that the kernel cannot yet use, but at least it can parse the BTF provided. Meanwhile userspace tools with newer libbpf may be able to use the newer information.
v1: net: ice: recycle/free all of the fragments from multi-buffer frame
The ice driver caches next_to_clean value at the beginning of ice_clean_rx_irq() in order to remember the first buffer that has to be freed/recycled after main Rx processing loop. The end boundary is indicated by first descriptor of frame that Rx processing loop has ended its duties. Note that if mentioned loop ended in the middle of gathering multi-buffer frame, next_to_clean would be pointing to the descriptor in the middle of the frame BUT freeing/recycling stage will stop at the first descriptor. This means that next iteration of ice_clean_rx_irq() will miss the (first_desc, next_to_clean - 1) entries.
v1: bpf/tests: Use struct_size()
Use struct_size() instead of hand writing it. This is less verbose and more informative.
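For readers unfamiliar with the helper, the idea can be approximated in user space. This is a sketch under assumptions, not the kernel macro: struct blob and flex_size() are hypothetical, and the saturate-to-SIZE_MAX behaviour is modelled on how the kernel helper handles overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure with a flexible-array tail. */
struct blob {
    size_t len;
    unsigned char data[];
};

/*
 * Size of a header of 'base' bytes followed by 'n' elements of 'elem'
 * bytes each. On overflow the result saturates to SIZE_MAX, so a later
 * allocation fails instead of silently getting a too-small buffer.
 */
static size_t flex_size(size_t base, size_t elem, size_t n)
{
    assert(elem != 0);                      /* element size is never zero */

    if (n > (SIZE_MAX - base) / elem)
        return SIZE_MAX;
    return base + elem * n;
}
```

Compared with writing sizeof(struct blob) + n * sizeof(element) by hand, a helper like this names the intent and closes the overflow hole in one place.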
v1: net: bpf, sockmap: avoid potential NULL dereference in sk_psock_verdict_data_ready()
syzbot found sk_psock(sk) could return NULL when called from sk_psock_verdict_data_ready().
Just make sure to handle this case.
v2: bpf-next: verify scalar ids mapping in regsafe()
To represent this set I use a u32_hashset data structure derived from tools/lib/bpf/hashmap.h. I tested it locally (see [1]), but I think that ideally it should be tested using KUnit. However, AFAIK, this would be the first use of KUnit in context of BPF verifier. If people are ok with this, I will prepare the tests and necessary CI integration.
v1: bpf-next: samples/bpf: xdp1 and xdp2 reduce XDPBUFSIZE to 60
Default samples/pktgen scripts send 60-byte packets because the hardware adds a 4-byte FCS checksum, which fulfils the minimum Ethernet frame size of 64 bytes.
The XDP layer will not necessarily have access to the 4-byte FCS checksum.
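The arithmetic behind the 60-byte figure can be written out as follows. The constants mirror the usual Ethernet values (60-byte minimum without FCS, 4-byte FCS); the helper names and the `fcs_stripped` flag are hypothetical and only illustrate the reasoning.

```python
ETH_ZLEN = 60        # minimum frame length without the FCS
ETH_FCS_LEN = 4      # frame check sequence appended by the hardware
MIN_WIRE_FRAME = 64  # minimum Ethernet frame size on the wire


def wire_length(sent_len):
    """Length on the wire once the hardware has appended the FCS."""
    return sent_len + ETH_FCS_LEN


def xdp_visible_length(wire_len, fcs_stripped=True):
    """Length an XDP program sees, assuming the FCS may be stripped."""
    return wire_len - ETH_FCS_LEN if fcs_stripped else wire_len
```

A 60-byte packet from pktgen therefore becomes a 64-byte frame on the wire, but a receiver whose hardware strips the FCS sees only 60 bytes, which is why a 64-byte buffer size would be too large to assume.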
v2: bpf-next: xsk: multi-buffer support
This series of patches adds multi-buffer support for AF_XDP. XDP and various NIC drivers already have support for multi-buffer packets. With this patch set, programs using AF_XDP sockets can now also receive and transmit multi-buffer packets in both copy and zero-copy mode. The ZC multi-buffer implementation is based on the ice driver.
v1: net: tcp: introduce a compack timer handler in sack compression
We’ve got some issues when sending a compressed ack is deferred to the release phase because the socket is owned by another user:
- a compressed ack would not be sent because the ICSK_ACK_TIMER flag is missing.
- the tp->compressed_ack counter should be decremented by 1.
- we cannot pass timeout check and reset the delack timer in tcp_delack_timer_handler().
- we are not supposed to increment the LINUX_MIB_DELAYEDACKS counter. …
v1: bpf-next: multi-buffer support for XDP_REDIRECT samples
This series adds multi-buffer support for two XDP_REDIRECT sample programs. It follows the pattern from xdp1 and xdp2.
v2: net-next: support non-frag page for page_pool_alloc_frag()
In [1] & [2], there are use cases for veth and virtio_net to use frag support in page pool to reduce memory usage, and they may request different frag sizes depending on the head/tail room space for xdp_frame/shinfo and the mtu/packet size. When the requested frag size is so large that a single page cannot be split into more than one frag, using frag support only incurs a performance penalty because of the extra frag count handling.
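The condition described above comes down to integer division. This is a minimal sketch, assuming a 4096-byte page; `frags_per_page` and `frag_support_pays_off` are hypothetical helpers and not part of the page pool API.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def frags_per_page(frag_size, page_size=PAGE_SIZE):
    """How many frags of frag_size fit into a single page."""
    if not 0 < frag_size <= page_size:
        raise ValueError("frag_size must be in (0, page_size]")
    return page_size // frag_size


def frag_support_pays_off(frag_size, page_size=PAGE_SIZE):
    """Splitting only saves memory if at least two frags share a page."""
    return frags_per_page(frag_size, page_size) > 1
```

With these assumptions a 2048-byte request still shares a page with one other frag, while a 2049-byte request already occupies a page on its own, so the frag accounting would be pure overhead.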
v1: bpf-next: bpf: Support ->show_fdinfo and ->fill_link_info for kprobe prog
Currently, it is not easy to determine which functions are probed by a kprobe_multi program. This patchset supports ->show_fdinfo and ->fill_link_info for it, allowing the user to easily obtain the probed functions.
Related Technology Updates
Qemu
RFC: target/riscv: Add support for Zacas extension
The Zacas[1] extension is a proposed unprivileged ISA extension for adding support for atomic compare-and-swap. Since this extension is not yet frozen (although no significant changes are expected) these patches are RFC/informational.
v2: linux-user/riscv: Add syscall riscv_hwprobe
This patch adds the new syscall for the “RISC-V Hardware Probing Interface” (https://docs.kernel.org/riscv/hwprobe.html).
v7: hw/riscv/virt: pflash improvements
This series improves the pflash usage in the RISC-V virt machine with solutions to the issues below.
1) Currently the first pflash is reserved for ROM/M-mode firmware code. But S-mode payload firmware like EDK2 needs both pflash devices to have separate code and variable stores so that OS distros can keep the FW code read-only.
v1: disas/riscv: Add vendor extension support
This series adds vendor extension support to the QEMU disassembler for RISC-V. The following vendor extensions are covered:
- XThead{Ba,Bb,Bs,Cmo,CondMov,FMemIdx,Fmv,Mac,MemIdx,MemPair,Sync}
- XVentanaCondOps
Buildroot
package/openjdk{-bin}: security bump versions to 11.0.19+7 and 17.0.7+7
For details, see the announcements: https://mail.openjdk.org/pipermail/jdk-updates-dev/2023-April/021899.html https://mail.openjdk.org/pipermail/jdk-updates-dev/2023-April/021900.html
U-Boot
This patchset adds support to load images of the SPL’s next booting stage from a NVMe device.
Here is the revert, along with a work-in-progress attempt to make the DT match the hardware. Conor had asked me to share it, regardless of its early stage. It compiles and boots Linux kernels, but there is no PLL driver I can find currently, so clocks are still hanging in PROBE_DEFER.