TinyLab (泰晓科技) -- Focused on Linux: trace problems to their roots, see the big picture through the details!
Website: https://tinylab.org

RISC-V Linux Kernel and Ecosystem Technology Updates, Issue 43

Written by 呀呀呀 on 2023/04/24

Date: 20230423
Editor: 晓依
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, Institute of Software, Chinese Academy of Sciences

Kernel Updates

RISC-V Architecture Support

v1: riscv: uprobes: Restore thread.bad_cause

thread.bad_cause is saved in arch_uprobe_pre_xol(), so it should be restored in arch_uprobe_{post,abort}_xol() accordingly; otherwise the save operation is meaningless. This change is similar to what x86 and powerpc do.

v1: dt-bindings: riscv: add sv57 mmu-type

Dumping the dtb from new versions of QEMU warns that sv57 is an undocumented mmu-type. The kernel has supported sv57 for about a year, so bring it into the fold.

GIT PULL: KVM/riscv changes for 6.4

We have the following KVM RISC-V changes for 6.4:
  1) ONE_REG interface to enable/disable SBI extensions
  2) Zbb extension for Guest/VM
  3) AIA CSR virtualization
  4) Few minor cleanups and fixes

v17: Microchip Soft IP corePWM driver

Yet another version of this driver :)

This time around I’ve implemented Uwe’s simplified method for calculating the prescale & period_steps. For low values of prescale it produces much worse approximations of the period, but as the period increases relative to that of the PWM’s underlying clock, there is mostly no difference in the approximations.

v1: riscv: mm: Ensure prot of VM_WRITE and VM_EXEC must be readable

Commit 8aeb7b17f04e (“RISC-V: Make mmap() with PROT_WRITE imply PROT_READ”) allows RISC-V to use mmap() with PROT_WRITE only, and mmap() with W+X is also permitted. However, when userspace accesses a page mapped with PROT_WRITE|PROT_EXEC, it causes an infinite loop of load page faults and triggers a soft lockup. According to the RISC-V privileged spec, “Writable pages must also be marked readable”. The fix drops PAGE_COPY_EXEC and uses PAGE_COPY_READ_EXEC instead, which aligns protection_map with other architectures (e.g. arm64).
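
The rule the fix enforces can be sketched in C. This is a minimal illustration, not kernel code. The helper prot_to_pte_bits() and the PTE_* macro names are hypothetical. The bit positions (V=0, R=1, W=2, X=3) follow the RISC-V privileged specification. Any request that includes PROT_WRITE, including PROT_WRITE|PROT_EXEC, also gets the R bit. For the W+X case, that is the effect of switching protection_map to PAGE_COPY_READ_EXEC.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>   /* PROT_NONE, PROT_READ, PROT_WRITE, PROT_EXEC */

/* RISC-V PTE permission bits (privileged spec); names are illustrative. */
#define PTE_V (1u << 0)
#define PTE_R (1u << 1)
#define PTE_W (1u << 2)
#define PTE_X (1u << 3)

/*
 * Hypothetical helper: translate mmap() protection flags into RISC-V PTE
 * permission bits, forcing R on whenever W is requested.
 */
static uint32_t prot_to_pte_bits(int prot)
{
    uint32_t bits = 0;

    if (prot & PROT_READ)
        bits |= PTE_R;
    if (prot & PROT_WRITE)
        bits |= PTE_W | PTE_R;   /* "Writable pages must also be marked readable" */
    if (prot & PROT_EXEC)
        bits |= PTE_X;

    /* V=1 with R=W=X=0 would mean "pointer to next level", so PROT_NONE
     * is simplified here to a non-present entry. */
    if (bits == 0)
        return 0;

    /* R=0, W=1 is a reserved encoding and must never be produced. */
    assert(!((bits & PTE_W) && !(bits & PTE_R)));
    return bits | PTE_V;
}
```

For example, prot_to_pte_bits(PROT_WRITE | PROT_EXEC) yields V|R|W|X. It never yields the reserved "W+X without R" encoding behind the fault loop.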

v3: Add JH7110 cpufreq support

The StarFive JH7110 SoC has four RISC-V cores, and it supports up to 4 cpu frequency loads.

This patchset adds the compatible strings into the allowlist for supporting the generic cpufreq driver on the JH7110 SoC. It also enables the AXP15060 PMIC as the CPU power source.

v1: RISC-V: include cpufeature.h in cpufeature.c

Automation complains: warning: symbol ‘__pcpu_scope_misaligned_access_speed’ was not declared. Should it be static?

cpufeature.c doesn’t actually include the header of the same name, as it had not previously used anything from it. The per-cpu variable is declared there, so include it to silence the complaints.

v5: Add JH7110 USB and USB PHY driver support

This patchset adds the USB driver and USB PHY driver for the StarFive JH7110 SoC. On the VisionFive 2 board, USB works in peripheral mode and uses the USB 2.0 PHY. The patches have been tested on the VisionFive 2 board.

v3: Change PWM-controlled LED pin active mode and algorithm

The change follows the circuit diagram of the RGB User LEDs described in the manuals hifive-unleashed-a00.pdf[0] and hifive-unmatched-schematics-v3.pdf[1].

v2: Add TDM audio on StarFive JH7110

This patchset adds TDM audio driver for the StarFive JH7110 SoC. The first patch adds device tree binding for TDM module. The second patch adds the item for JH7110 audio board to the dt-binding of StarFive SoC-based boards. The third patch adds tdm driver support for JH7110 SoC. The last patch adds device node of tdm and sound card to JH7110 dts.

v1: kvmtool: RISC-V CoVE support

This series is an initial version of the support for running confidential VMs on riscv architecture. This is to get feedback on the proposed COVH, COVI and COVG extensions for running Confidential VMs on riscv. The specification is available here [0]. Make sure to build it to get the latest changes as it gets updated from time to time.

v2: Add JH7110 AON PMU support

This patchset adds aon power domain driver for the StarFive JH7110 SoC. It is used to turn on/off dphy rx/tx power switch. The series has been tested on the VisionFive 2 board.

v1: pwm: sifive: Simplify using devm_clk_get_prepared()

Instead of preparing the clk after it was requested and unpreparing it in .probe()’s error path and in .remove(), use devm_clk_get_prepared(), which takes care of unpreparing automatically.

v1: Split ptdesc from struct page

The MM subsystem is trying to shrink struct page. This patchset introduces a memory descriptor for page table tracking - struct ptdesc.

This patchset introduces ptdesc, splits ptdesc from struct page, and converts many callers of page table constructor/destructors to use ptdescs.

v1: tools/nolibc: add stackprotector support for more architectures

Add stackprotector support for all remaining architectures, except s390.

On s390, stackprotectors are not supported in “global” mode, only in “sysreg” mode, which nolibc does not support.

v1: RISC-V: Add steal-time support

One frequently touted benefit of virtualization is the ability to consolidate machines, increasing resource utilization. It may even be desirable to overcommit, at the risk of one or more VCPUs having to wait. Hypervisors which have interfaces for guests to retrieve the amount of time each VCPU had to wait give observers within the guests ways to account for less progress than would otherwise be expected. The SBI STA extension proposal[1] provides a standard interface for guest VCPUs to retrieve the amount of time “stolen”.

v3: riscv: mm: execute local TLB flush after populating vmemmap

sparse_init() calls vmemmap_populate() many times to create VA-to-PA mappings for the VMEMMAP area, where all “struct page”s are located when CONFIG_SPARSEMEM_VMEMMAP is defined. These “struct page”s are later initialized in zone_sizes_init(). However, no sfence.vma instruction is executed for the VMEMMAP area during this process. This omission may cause the hart to fail the page table walk, because some data related to address translation is not yet visible to the hart. To solve this, local_flush_tlb_kernel_range() is called right after sparse_init() to execute an sfence.vma instruction for the VMEMMAP area. This ensures that all data related to address translation is visible to the hart.

v1: riscv: dts: starfive: Add PMU controller node

Add the PMU controller node for the StarFive JH7110 SoC. The PMU is needed by other modules, e.g. the VPU and ISP.

Process Scheduling

v2: net: net/sched: cls_api: Initialize miss_cookie_node when action miss is not used

tcf_exts_init_ex() sets the exts->miss_cookie_node pointer only when use_action_miss is true. In the other case, it assumes the caller has already set the field to NULL. If the caller has not, the field contains garbage and a subsequent tcf_exts_destroy() call crashes. Ensure that the .miss_cookie_node pointer is NULL when use_action_miss is false to avoid this scenario.
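
The fix follows a common C pattern: an initializer should assign an optional pointer on every path instead of relying on the caller to zero it. The sketch below uses hypothetical names (struct exts, exts_init(), exts_destroy()) standing in for struct tcf_exts, tcf_exts_init_ex() and tcf_exts_destroy(). It illustrates the pattern and is not the net/sched code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct tcf_exts and its miss-cookie node. */
struct miss_cookie_node { int id; };

struct exts {
    struct miss_cookie_node *miss_cookie_node;
};

/* The pointer is assigned on both paths, so the caller need not pre-zero it. */
static int exts_init(struct exts *e, bool use_action_miss)
{
    if (use_action_miss) {
        e->miss_cookie_node = malloc(sizeof(*e->miss_cookie_node));
        if (e->miss_cookie_node == NULL)
            return -1;
        e->miss_cookie_node->id = 1;
    } else {
        e->miss_cookie_node = NULL;   /* the fix: never leave garbage here */
    }
    return 0;
}

static void exts_destroy(struct exts *e)
{
    /* Safe when action miss was unused: free(NULL) is a no-op. */
    free(e->miss_cookie_node);
    e->miss_cookie_node = NULL;
}
```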

v2: sched/topology: add for_each_numa_cpu() macro

for_each_cpu() is widely used in kernel, and it’s beneficial to create a NUMA-aware version of the macro.

v1: net: sched: print jiffies when transmit queue time out

Although watchdog_timeo tells users when the transmit queue begins to stall, dev_watchdog() is only called at intervals, so the elapsed jiffies will always be greater than watchdog_timeo.

v1: drm/msm: Move cmdstream dumping out of sched kthread

This is something that can block for arbitrary amounts of time as userspace consumes from the FIFO. So we don’t really want this to be in the fence signaling path.

v1: sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

A userspace service may manage uclamp dynamically for individual tasks, and a child task will unintentionally inherit a pseudo-random uclamp setting. This can leave the child task stuck with a static uclamp value that results in poor performance or poor power efficiency.

GIT PULL: sched/urgent for v6.3-rc7

pls pull an urgent scheduler fix for 6.3.

Thx.

Memory Management

v1: mm/gup: disallow GUP writing to file-backed mappings by default

It isn’t safe to write to file-backed mappings as GUP does not ensure that the semantics associated with such a write are performed correctly, for instance filesystems which rely upon write-notify will not be correctly notified.

v12: cachestat: a new syscall for page cache state of files

There is currently no good way to query the page cache statistics of large files and directory trees. There is mincore(), but it scales poorly: the kernel writes out a lot of bitmap data that userspace has to aggregate, when the user really does not care about per-page information in that case. The user also needs to mmap and unmap each file as it goes along, which can be quite slow as well.
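
For comparison, here is roughly what the mincore() approach looks like from userspace: map the file, request one status byte per page, then add them up. It is a sketch built around a hypothetical helper, count_cached_pages(), and handles a single file. For a large directory tree, this mmap()/mincore()/munmap() cycle repeats for every file, and that is the overhead cachestat() aims to avoid.

```c
#define _DEFAULT_SOURCE         /* glibc: expose mincore() and mkstemp() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the number of resident pages (or -1 on error); *total_pages
 * receives the file size in pages. */
static long count_cached_pages(const char *path, long *total_pages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    size_t npages = (len + page - 1) / page;
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                               /* the mapping keeps the file alive */
    if (addr == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);     /* one status byte per page */
    long resident = -1;
    if (vec != NULL && mincore(addr, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;          /* bit 0 set: page is resident */
    }

    free(vec);
    munmap(addr, len);
    if (total_pages != NULL)
        *total_pages = (long)npages;
    return resident;
}
```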

v2: migrate: Avoid unbounded blocks in MIGRATE_SYNC_LIGHT

This series is the result of discussion around my RFC patch [1] where I talked about completely removing the waits for the folio_lock in migrate_folio_unmap().

v1: shmem: add support for blocksize > PAGE_SIZE

This is an initial attempt to add support for block size > PAGE_SIZE to tmpfs. Why would you want this? It helps us experiment with higher-order folio uses with fs APIs, and helps us test corner cases that will likely need to be handled sooner or later if and when filesystems enable support for this. It is better to review early and fail early than to continue in the wrong direction, so early feedback is welcome.

v1: [v2] kasan: use internal prototypes matching gcc-13 builtins

This now passes all randconfig builds on arm, arm64 and x86, but I have not tested it on the other architectures that support kasan, since they tend to fail randconfig builds in other ways. This might fail if any of the 32-bit architectures expect a ‘long’ instead of ‘int’ for the size argument.

v1: block: simplify with PAGE_SECTORS_SHIFT

A number of block drivers have their own incantations with PAGE_SHIFT - SECTOR_SHIFT. Simplify them and use PAGE_SECTORS_SHIFT everywhere.
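
The macro is just the difference of the two shifts. The sketch below assumes 4 KiB pages (PAGE_SHIFT = 12) and 512-byte sectors (SECTOR_SHIFT = 9), so a page spans 2^3 = 8 sectors. PAGE_SHIFT is architecture- and config-dependent. The page_to_sector() helper is hypothetical and shows the kind of open-coded conversion the patch replaces.

```c
#include <assert.h>

/* Values assumed for illustration: 512-byte sectors, 4 KiB pages. */
#define SECTOR_SHIFT        9
#define PAGE_SHIFT          12
#define PAGE_SECTORS_SHIFT  (PAGE_SHIFT - SECTOR_SHIFT)
#define PAGE_SECTORS        (1U << PAGE_SECTORS_SHIFT)

_Static_assert(PAGE_SHIFT >= SECTOR_SHIFT, "a page must hold at least one sector");

/* Hypothetical helper: first sector of a given page index. */
static unsigned long long page_to_sector(unsigned long long pgidx)
{
    /* Same as pgidx << (PAGE_SHIFT - SECTOR_SHIFT), just spelled once. */
    return pgidx << PAGE_SECTORS_SHIFT;
}
```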

v5: cgroup: eliminate atomic rstat flushing

A previous patch series ([1], currently in mm-stable) changed most atomic rstat flushing contexts to become non-atomic. This was done to avoid running an expensive operation, which scales with the number of cgroups and CPUs, with IRQs disabled and scheduling not permitted. Two atomic flushing contexts remained after that series. This series tries to eliminate them as well, removing atomic rstat flushing completely.

v1: arm64: Also reset KASAN tag if page is not PG_mte_tagged

Consider the following sequence of events:

  1) A page in a PROT_READ|PROT_WRITE VMA is faulted.
  2) Page migration allocates a page with the KASAN allocator, causing it to receive a non-match-all tag, and uses it to replace the page faulted in 1.
  3) The program uses mprotect() to enable PROT_MTE on the page faulted in 1.

v4: bio: check return values of bio_add_page

We have two functions for adding a page to a bio, __bio_add_page() which is used to add a single page to a freshly created bio and bio_add_page() which is used to add a page to an existing bio.

v1: shmem: restrict noswap option to initial user namespace

Prevent tmpfs instances mounted in unprivileged namespaces from evading locked-memory accounting by using the “noswap” mount option.

v15: RESEND: Implement IOCTL to get and optionally clear info about PTEs

This syscall is used in Windows applications, games, etc., and is currently emulated in userspace in a rather slow manner. Our purpose is to enhance the kernel so that it can be translated efficiently. Some out-of-tree hack patches are currently used to emulate it efficiently in some kernels, and we intend to replace those with these patches. Gaming on Linux as a whole can then benefit, which means this code would have many users.

v2: module: add debugging auto-load duplicate module support

The finit_module() system call can, in the worst case, use more than twice a module’s size in virtual memory. Duplicate finit_module() system calls are non-fatal, but they unnecessarily strain virtual memory during bootup and, in the worst case, can cause a system to fail to boot. This is currently only known to be an issue on systems with a larger number of CPUs.

v15: Implement IOCTL to get and optionally clear info about PTEs

This syscall is used in Windows applications, games, etc., and is currently emulated in userspace in a rather slow manner. Our purpose is to enhance the kernel so that it can be translated efficiently. Some out-of-tree hack patches are currently used to emulate it efficiently in some kernels, and we intend to replace those with these patches. Gaming on Linux as a whole can then benefit, which means this code would have many users.

v1: mm/cma: retry allocation of dedicated area on EBUSY

Sometimes a contiguous page range can’t be allocated because some pages in the range do not pass the isolation test. In this case, the CMA allocator gets an EBUSY error and retries the allocation (in a slightly shifted range).

v1: printk: Enough to disable preemption in printk deferred context

The comment above printk_deferred_enter()/exit() definition claims that it can be used only when interrupts are disabled.

v1: mm: skip CMA pages when they are not available

During direct reclaim, it is a waste of effort to reclaim CMA pages if they are not available to the current context. Skip them in that case.

v1: mm/mmap: Map MAP_STACK to VM_STACK

One of the flags of mmap(2) is MAP_STACK, which requests a memory segment suitable for a process or thread stack. The kernel currently ignores this flag. Glibc uses MAP_STACK when mmapping a thread stack. However, SELinux has an execstack check in selinux_file_mprotect() that disallows making a stack VMA executable.
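
For reference, a thread stack allocation with MAP_STACK looks roughly like the sketch below. alloc_thread_stack() is a hypothetical helper; the flags are the standard Linux mmap(2) ones. With the patch, the resulting VMA would presumably be marked VM_STACK. SELinux's execstack check would then apply if the program later tried to mprotect() the region executable.

```c
#define _GNU_SOURCE             /* glibc: expose MAP_STACK and MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: anonymous, private, read/write mapping tagged as a
 * stack, as glibc does for thread stacks. Returns NULL on failure. */
static void *alloc_thread_stack(size_t size)
{
    assert(size > 0);           /* mmap() rejects a zero length with EINVAL */

    void *stack = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    return stack == MAP_FAILED ? NULL : stack;
}
```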

v1: mm: reliable huge page allocator

As memory capacity continues to grow, 4k TLB coverage has not been able to keep up. On Meta’s 64G webservers, close to 20% of execution cycles are observed to be handling TLB misses when using 4k pages only. Huge pages are shifting from being a nice-to-have optimization for HPC workloads to becoming a necessity for common applications.

File Systems

v1: io_uring: add getdents support, take 2

The new API does nothing that cannot be achieved with plain syscalls so it shouldn’t be introducing any new problem, the only downside is that having the state in the file struct isn’t very uring-ish and if a better solution is found later that will probably require duplicating some logic in a new flag… But that seems like it would likely be a distant future, and this version should be usable right away.

v2: Support negative dentries on case-insensitive ext4 and f2fs

This is v2 of the negative dentry support for case-insensitive directories. It has no functional changes from v1, but it adds more context and a comment to the dentry->d_name access I’m doing in d_revalidate. The comment documents why (as I understand it) it is safe to do this without protecting against parallel directory changes.

GIT PULL: Turn single vector imports into ITER_UBUF

This series turns single-vector imports into ITER_UBUF rather than ITER_IOVEC. The former is simpler to iterate and advance, and hence a bit more efficient. From some very unscientific testing, 60% of all iovec imports are single-vector.

GIT PULL: pipe: nonblocking rw for io_uring

This contains Jens’ work to support FMODE_NOWAIT, and thus IOCB_NOWAIT, for pipes, ensuring that all places can deal with non-blocking requests.

To this end, pass down the information that this is a nonblocking request so that pipe locking, allocation, and buffer checking correctly deal with those.

v1: fs/coredump: open coredump file in O_WRONLY instead of O_RDWR

This makes it possible to write a stricter AppArmor profile that does not allow the program to read any coredump on the system.

v2: shmem: Add user and group quota support for tmpfs

This is the version 2 of the quota support from tmpfs addressing some issues discussed on V1 and a few extra things, details are within each patch. Original cover-letter below.

v5: Introduce block provisioning primitives

Next revision of adding support for block provisioning requests.

v2: ext4: Handle error pointers being returned from __filemap_get_folio

Commit “mm: return an ERR_PTR from __filemap_get_folio” changed from returning NULL to returning an ERR_PTR(). This cannot be fixed in either the ext4 tree or the mm tree, so this patch should be applied as part of merging the two trees.

v10: Implement copy offload support

The patch series covers the points discussed in November 2021 virtual call [LSF/MM/BFP TOPIC] Storage: Copy Offload [0]. We have covered the initial agreed requirements in this patchset and further additional features suggested by community. Patchset borrows Mikulas’s token based approach for 2 bdev implementation.

v1: Backport several fuse patches for 6.1.y

Ant Group uses 5.10.y in its production environment, and we found several patches missing from the 5.10.y tree. We need these patches, so we backported them to 5.10.y. We also backported them to 5.15.y and 6.1.y to prevent regressions.

v1: Backport several fuse patches for 5.15.y

Ant Group uses 5.10.y in its production environment, and we found several patches missing from the 5.10.y tree. We need these patches, so we backported them to 5.10.y. We also backported them to 5.15.y and 6.1.y to prevent regressions.

v1: Backport several fuse patches to 5.10.y

Ant Group uses 5.10.y in its production environment, and we found several patches missing from the 5.10.y tree. We need these patches, so we backported them to 5.10.y. We also backported them to 5.15.y and 6.1.y to prevent regressions.

v4: Introduce provisioning primitives for thinly provisioned storage

This patch series is revision 4 of introducing a new mechanism to pass through provision requests on stacked thinly provisioned storage devices. See [1] for original cover letter.

[1] https://lore.kernel.org/lkml/ZDnMl8A1B1+Tfn5S@redhat.com/T/#md4f20113c2242755747ae069f84be720a6751012

v3: bpf-next: FUSE BPF: A Stacked Filesystem Extension for FUSE

These patches extend FUSE to be able to act as a stacked filesystem. This allows pure passthrough, where the fuse file system simply reflects the lower filesystem, and also allows optional pre and post filtering in BPF and/or the userspace daemon as needed. This can dramatically reduce or even eliminate transitions to and from userspace.

v1: shmem: stable directory cookies

The current cursor-based directory cookie mechanism doesn’t work when a tmpfs filesystem is exported via NFS. This is because NFS clients do not open directories: each READDIR operation has to open the directory on the server, read it, then close it. The cursor state for that directory, being associated strictly with the opened struct file, is then discarded.

v1: vfs: allow using kernel buffer during fiemap operation

syzbot is reporting circular locking dependency between ntfs_file_mmap() (which has mm->mmap_lock => ni->ni_lock => ni->file.run_lock dependency) and ntfs_fiemap() (which has ni->ni_lock => ni->file.run_lock => mm->mmap_lock dependency), for commit c4b929b85bdb (“vfs: vfs-level fiemap interface”) implemented fiemap_fill_next_extent() using copy_to_user() where direct mm->mmap_lock dependency is inevitable.

Network Devices

v5: net-next: net/smc: Introduce SMC-D-based OS internal communication acceleration

We found that SMC-D can be used to accelerate OS-internal communication, such as loopback or communication between two containers within the same OS instance. This patch set therefore provides an SMC-D dummy device (the SMC-D loopback device) that emulates an ISM device, so that SMC-D can also be used on architectures other than s390. The SMC-D loopback device is designed as a system-global device, visible to all containers.

v4: net-next: tsnep: XDP socket zero-copy support

Implement XDP socket zero-copy support for tsnep driver. I tried to follow existing drivers like igc as far as possible. But one main

v3: net: netlink: Use copy_to_user() for optval in netlink_getsockopt().

Brad Spencer provided a detailed report [0] that when calling getsockopt() for AF_NETLINK, some SOL_NETLINK options set only 1 byte even though such options require at least sizeof(int) as length.

v5: bpf-next: bpf: add netfilter program type

Changes since last version:

  • rework test case in last patch wrt. ctx->skb dereference etc (Alexei)
  • pacify bpf ci tests, netfilter program type missed string translation in libbpf helper.

v5: drivers/net/phy: add driver for Microchip LAN867x 10BASE-T1S PHY

This patch adds support for the Microchip LAN867x 10BASE-T1S family (LAN8670/1/2). The driver supports P2MP with PLCA.

v2: can: virtio: Initial virtio CAN driver.

This is version 3 of the driver after having gotten review comments.

v1: net-next: net: dsa: MT7530, MT7531, and MT7988 improvements

This patch series is focused on simplifying the code, and improving the logic of the support for MT7530, MT7531, and MT7988 SoC switches.

There’s also a fix for the switch on the MT7988 SoC.

Asynchronous I/O

v1: io_uring: honor I/O nowait flag for read/write

When IO_URING_F_NONBLOCK is set in io_kiocb req->flag in io_write() or io_read(), IOCB_NOWAIT is set on the kiocb passed to the respective rw_iter callback. This sets REQ_NOWAIT for the underlying I/O. As a result, the low-level driver always sees the block layer request as REQ_NOWAIT, even if the user submitted the request with nowait = 0 (e.g. fio nowait=0).

v1: tools/io_uring: Add .gitignore

Ignore {io_uring-bench,io_uring-cp}.

v2: io_uring: Pass the whole sqe to commands

These three patches prepare for the sock support in the io_uring cmd, as described in the following RFC:

https://lore.kernel.org/lkml/20230406144330.1932798-1-leitao@debian.org/

v1: test/file-verify.t: Don’t run over mlock limit when run as non-root

test/file-verify tries to get 2 MB of pinned memory at once. That exceeds the default limit for non-root users on older kernels (64 KB before v5.16; 8 MB nowadays). For non-root users, skip the test if the registration fails instead of failing it.
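
The numbers explain the failure: 2 MB is 32 times the old 64 KB default but well under today's 8 MB. The sketch below is a hypothetical pre-check against RLIMIT_MEMLOCK using getrlimit(). The patch itself takes a different route: it attempts the registration and skips on failure. The sketch also ignores processes with CAP_IPC_LOCK, which are not bound by this limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/* 2 MiB: the amount test/file-verify tries to pin at once (hypothetical name). */
#define FILE_VERIFY_PIN_BYTES ((size_t)2 * 1024 * 1024)

/* Would pinning `bytes` fit under the RLIMIT_MEMLOCK soft limit?
 * CAP_IPC_LOCK holders are deliberately not modeled here. */
static bool memlock_allows(size_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return false;                    /* be conservative if we cannot tell */
    if (rl.rlim_cur == RLIM_INFINITY)
        return true;
    return (rlim_t)bytes <= rl.rlim_cur;
}
```

A test could call memlock_allows(FILE_VERIFY_PIN_BYTES) and report a skip, rather than a failure, when it returns false for an unprivileged user.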

v1: Support for mapping SQ/CQ rings into huge page

io_uring SQ/CQ rings are allocated by the kernel from contiguous, normal pages, and the application then mmap()s the rings into userspace. This works fine, but it requires contiguous pages to be available for the given SQ and CQ ring sizes. As uptime increases on a given system, so does memory fragmentation. Entropy is inevitable.

v1: io_uring: Pass whole sqe to commands

These two patches prepare for sock support in the io_uring cmd, as described in the following RFC:

https://lore.kernel.org/lkml/20230406144330.1932798-1-leitao@debian.org/

v1: io_uring: Optimization of buffered random write

io_uring’s buffered random write performance is poor for the following reason. By default, for a buffered random write, io_sq_thread calls io_issue_sqe() for the write req. Because IO_URING_F_NONBLOCK is set, the req is then executed asynchronously in iou-wrk. There, io_wq_submit_work() calls io_issue_sqe() to complete the write with issue_flags set to IO_URING_F_UNLOCKED | IO_URING_F_IOWQ, which reduces performance. This patch determines whether the req is a buffered random write. If so, io_sq_thread calls io_issue_sqe(req, 0) directly to complete it instead of completing it asynchronously in iou-wrk.

v4: io_uring: add support for multishot timeouts

A multishot timeout submission will repeatedly generate completions with the IORING_CQE_F_MORE cflag set.

v1: for-next: another round of rsrc refactoring

The main part is Patch 3, which establishes 1:1 relation between struct io_rsrc_put and nodes, which removes io_rsrc_node_switch() / io_rsrc_node_switch_start() and all the additional complexity with pre allocations. Note, it doesn’t change any guarantees as io_queue_rsrc_removal() was doing allocations anyway and could always fail.

v1: liburing: io_uring sendto

There are two patches in this series. The first patch adds the io_uring_prep_sendto() function. The second patch adds the manpage and CHANGELOG.

Rust For Linux

v1: v4.1: rust: lock: introduce SpinLock

This is the spinlock_t lock backend and allows Rust code to use the kernel spinlock idiomatically.

v1: .gitattributes: set diff driver for Rust source code files

Git supports a builtin Rust diff driver [1] since v2.23.0 (2019).

It improves the choice of hunk headers in some cases, such as

v1: Rust 1.68.2 upgrade

This is the first upgrade to the Rust toolchain since the initial Rust merge, from 1.62.0 to 1.68.2 (i.e. the latest).

BPF

v4: bpf-next: bpftool: Show map IDs along with struct_ops links.

A new link type, BPF_LINK_TYPE_STRUCT_OPS, was added to attach struct_ops to links. (226bc6ae6405) It would be helpful for users to know which map is associated with the link.

v1: bpf-next: selftests/bpf: verifier/prevent_map_lookup converted to inline assembly

Test verifier/prevent_map_lookup automatically converted to use inline assembly.

This was part of a series [1] but could not be applied because another patch from the series had to be withheld.

v1: bpf-next: Second set of verifier/*.c migrated to inline assembly

This is a follow up for RFC [1]. It migrates a second batch of 23 verifier/*.c tests to inline assembly and use of ./test_progs for actual execution. Link to the first batch is [2].

v1: Dump map id instead of value for map_of_maps types

When using bpftool map dump in plain format, it is usually more convenient to show the inner map ID instead of the raw value. Changing this behavior would help with quick debugging in bpftool without disrupting scripted behavior. Users can dump the inner map by its ID, but today they first need to convert the value.

v2: bpf-next: Introduce a new kfunc of bpf_task_under_cgroup

When tracing scheduler-related functions such as enqueue_task_fair, it is necessary to check whether a specified task, rather than the current task, is within a given cgroup.

v1: bpf-next: selftests/xsk: put MAP_HUGE_2MB in correct argument

Put the flag MAP_HUGE_2MB in the correct flags argument instead of the wrong offset argument.
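
The huge page size is encoded in the upper bits of the flags argument as log2 of the size, shifted by MAP_HUGE_SHIFT. Passing MAP_HUGE_2MB as the offset therefore means the kernel never sees it and presumably falls back to the default huge page size. The sketch below uses hypothetical helper names and includes fallback definitions in case the libc headers do not provide the constants:

```c
#define _GNU_SOURCE             /* glibc: expose MAP_HUGETLB and MAP_ANONYMOUS */
#include <assert.h>
#include <sys/mman.h>

/* Fallbacks matching the Linux UAPI values; 21 = log2(2 MiB). */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_MASK
#define MAP_HUGE_MASK 0x3f
#endif
#ifndef MAP_HUGE_2MB
#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)
#endif

/* Hypothetical helper: flags for an anonymous 2 MiB hugetlb mapping.
 * The size goes into the flags word; the offset argument stays 0. */
static int hugetlb_2mb_flags(void)
{
    return MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_2MB;
}

/* Hypothetical helper: decode log2(huge page size) from a flags word. */
static int huge_page_shift(int flags)
{
    return (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK;
}
```

A correct call would then look like mmap(NULL, len, PROT_READ | PROT_WRITE, hugetlb_2mb_flags(), -1, 0).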

v3: bpf-next: net/smc: Introduce BPF injection capability

These patches attempt to introduce BPF injection capability for SMC, and add a selftest to ensure code stability.

As we all know, the SMC protocol is not suitable for all scenarios, especially short-lived connections. However, most applications cannot guarantee that such scenarios never occur. Apps may therefore need specific strategies to decide whether or not to use SMC. For example, they can limit SMC to a specific IP address or port.

v2: bpf: Socket lookup BPF API from tc/xdp ingress does not respect VRF bindings.

When calling socket lookup from L2 (tc, xdp), VRF boundaries aren’t respected. This patchset fixes this by regarding the incoming device’s VRF attachment when performing the socket lookups from tc/xdp.

v1: net-next: net: lan966x: Don’t use xdp_frame when action is XDP_TX

When the action of an XDP program was XDP_TX, lan966x created an xdp_frame and used it to send the frame back. However, the frame can also be sent back using the page directly, without an xdp_frame. Once the frame has been transmitted, page_pool_recycle_direct() can be called directly, since lan966x uses page pools. This saves some CPU cycles on this path.

v5: tracing: Add fprobe events

Here is the 5th version of the series that improves fprobe and adds basic fprobe event support for ftrace (tracefs) and perf. Here is the previous version.

v1: bpf-next: Introduce a new bpf helper of bpf_task_under_cgroup

When tracing scheduler-related functions such as enqueue_task_fair, it is necessary to check whether a specified task, rather than the current task, is within a given cgroup stored in a map.

v2: bpf-next: Dynptr helpers

This patchset is the 3rd in the dynptr series. The 1st (dynptr fundamentals) can be found here [0] and the second (skb + xdp dynptrs) can be found here [1].

v2: bpf-next: Access variable length array relaxed for integer type

Add support for integer type of accessing variable length array. Add a selftest to check it.

v1: bpf-next: bpftool: Replace “__fallthrough” by a comment to address merge conflict

The recent support for inline annotations in control flow graphs generated by bpftool introduced the usage of the “__fallthrough” macro in a switch/case block in btf_dumper.c. This change went through the bpf-next tree, but resulted in a merge conflict in linux-next, because this macro has been renamed “fallthrough” (no underscores) in the meantime.

v1: bpf-next: bpf: handle another corner case in getsockopt

Martin reports another case where getsockopt EFAULTs perfectly valid callers. Let’s fix it and also replace EFAULT with pr_info_ratelimited. That should hopefully make this place less error prone.

v2: vmlinux.lds.h: Discard .note.gnu.property section

When tooling reads ELF notes, it assumes each note entry is aligned to the value listed in the .note section header’s sh_addralign field.

The kernel-created ELF notes in the .note.Linux and .note.Xen sections are aligned to 4 bytes. This causes the toolchain to set those sections’ sh_addralign values to 4.
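
Why the alignment matters can be seen from how a note walker steps through a section. The sketch below follows the ELF note layout: a 12-byte header (namesz, descsz, type), then the name and the descriptor, each padded to the section alignment. next_note_offset() is a hypothetical helper. The same note advances by a different amount under 4-byte and 8-byte alignment. A tool that assumes the wrong sh_addralign therefore misreads every note after the first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t x, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Offset of the next note, given the current note's offset within the
 * section, its name/descriptor sizes, and the section alignment (4 or 8). */
static size_t next_note_offset(size_t off, uint32_t namesz, uint32_t descsz,
                               size_t align)
{
    size_t desc_off = align_up(off + 12 + namesz, align);  /* 12-byte header */
    return align_up(desc_off + descsz, align);
}
```

Take a note named "Xen" (namesz 4) with a 4-byte descriptor. The next note starts at offset 20 with 4-byte alignment but at offset 24 with 8-byte alignment.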

v1: bpf-next: bpftool: Register struct_ops with a link.

You can include an optional path after specifying the object name for the ‘struct_ops register’ subcommand.

Since commit 226bc6ae6405 (“Merge branch ‘Transit between BPF TCP congestion controls.’”) has been accepted, it is now possible to create a link for a struct_ops. This is done by defining a struct_ops in SEC(“.struct_ops.link”), which makes libbpf return a real link. If we don’t pin the links before leaving bpftool, they will disappear. To have bpftool pin the links in a directory, using the map names as file names, we need to provide the path of that directory.

v6: bpf-next: bpf: Add socket destroy capability

This patch adds the capability to destroy sockets in BPF. We plan to use the capability in Cilium to force client sockets to reconnect when their remote load-balancing backends are deleted. The other use case is on-the-fly policy enforcement where existing socket connections prevented by policies need to be terminated.

v2: bpf-next: XDP-hints: XDP kfunc metadata for driver igc

Implement both RX hash and RX timestamp XDP hints kfunc metadata for driver igc.

Ecosystem Technology Updates

Qemu

v8: target/riscv: rework CPU extension validation

This version dropped patch 12 from v7. Alistair mentioned that it would limit static CPUs needlessly, since nothing prevents a static CPU from allowing extension changes at runtime, and misa-w is enough to prevent write_misa() at runtime. I agree.

v1: hw/riscv: virt: Enable booting M-mode or S-mode FW from pflash0

Currently, virt machine supports two pflash instances each with 32MB size. However, the first pflash is always assumed to contain M-mode firmware and reset vector is set to this if enabled. Hence, for S-mode payloads like EDK2, only one pflash instance is available for use. This means both code and NV variables of EDK2 will need to use the same pflash.

v3: riscv: Make sure an exception is raised if a pte is malformed

As per the specification, in 64-bit, if any of the pte reserved bits Memory Protection”). In addition, we must check the napot/pbmt bits are not set if those extensions are not active.

v1: target/riscv: add Ventana’s Veyron V1 CPU

Add a virtual CPU for Ventana’s first CPU named veyron-v1. It runs exclusively for the rv64 target. It’s tested with the ‘virt’ board.

v7: target/riscv: rework CPU extensions validation

In this v7 we have three extra patches:

  • patches 4 [1] and 5 [2], both from Weiwei Li, address an issue we would have with Zca and RVC if we moved the priv spec disabling code to the end of validation. More details can be found in [3]. Patch 5’s commit message also gives some context;

v2: Add RISC-V vector cryptographic instruction set support

This patchset provides an implementation for Zvbb, Zvbc, Zvkned, Zvknh, Zvksh, Zvkg, and Zvksed of the draft RISC-V vector cryptography extensions as per the v20230407 version of the specification(1) (3206f07). This is an update to the patchset submitted to qemu-devel on Friday, 10 Mar 2023 16:03:01 +0000.

v2: target/riscv: Restore the predicate() NULL check behavior

When reading a non-existent CSR, QEMU should raise an illegal instruction exception, but currently it just exits due to the g_assert() check.

This actually reverts commit 0ee342256af9205e7388efdf193a6d8f1ba1a617. Some comments are also added to indicate that predicate() must be provided for an implemented CSR.

v1: riscv: implement Ssqosid extension and CBQRI controllers

This RFC series implements the Ssqosid extension and the sqoscfg CSR as defined in the RISC-V Capacity and Bandwidth Controller QoS Register Interface (CBQRI) specification [1]. Quality of Service (QoS) in this context is concerned with shared resources on an SoC such as cache capacity and memory bandwidth.

U-Boot

v5: Add StarFive JH7110 PCIe driver support

This patchset needs to apply after patchset in [1]. These PCIe series patches are based on the JH7110 RISC-V SoC and VisionFive V2 board.

[1] https://patchwork.ozlabs.org/project/uboot/cover/20230329034224.26545-1-yanhong.wang@starfivetech.com

v1: u-boot-riscv/master

The following changes since commit 5db4972a5bbdbf9e3af48ffc9bc4fec73b7b6a79:

Merge tag ‘u-boot-nand-20230417’ of https://source.denx.de/u-boot/custodians/u-boot-nand-flash (2023-04-17 10:47:33 -0400)

v1: riscv: visionfive2: use OF_BOARD_SETUP

U-Boot already has a mechanism to fix up the DT before OS boot. This avoids the excessive duplication of data and work proposed by the explicit separation of 1.2a and 1.3b board revisions. It will also, to a good degree, improve the user experience, as pointed out by Matthias.


