TinyLab (泰晓科技) -- Focused on Linux: tracing things back to the source, seeing the big picture in small details!
Website: https://tinylab.org


RISC-V Linux Kernel and Related Technology News, Issue 89

Written by 呀呀呀 on 2024/05/04

Date: 20240428
Editor: 晓怡
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support

v9: riscv: rtc: sophgo: add rtc support for CV1800

The Real Time Clock (RTC) is an independently powered module within the chip, which includes a 32 kHz oscillator and a Power On Reset (POR) submodule. It can be used for time display and timed alarm generation.

v1: riscv: sophgo: Add SG2042 external hardware monitor support

Add support for the onboard hardware monitor for SG2042.

Related SBI patch:

v1: riscv: sophgo: add spi nor support for cv1800 series

Add SPI NOR support for the CV1800 series.

v2: clk: thead: Add support for TH1520 AP_SUBSYS clock controller

This series adds support for the AP sub-system clock controller in the T-Head TH1520 [1]. Yangtao Li originally submitted this series in May, and the work in progress was later passed on to me.

v1: kprobe/ftrace: bail out if ftrace was killed

If an error happens in ftrace, ftrace_kill() will prevent disarming kprobes. Eventually, the ftrace_ops associated with the kprobes will be freed, yet the kprobes will still be active, and when triggered, they will use the freed memory, likely resulting in a page fault and panic.

v4: riscv: Support vendor extensions and xtheadvector

This patch series ended up much larger than expected, please bear with me! The goal here is to support vendor extensions, starting at probing the device tree and ending with reporting to userspace.

v3: riscv: Apply Zawrs when available

Zawrs provides two instructions (wrs.nto and wrs.sto), where both are meant to allow the hart to enter a low-power state while waiting on a store to a memory location. The instructions also both wait an implementation-defined “short” duration (unless the implementation terminates the stall for another reason). The difference is that while wrs.sto will terminate when the duration elapses, wrs.nto, depending on configuration, will either just keep waiting or an ILL exception will be raised. Linux will use wrs.nto, so if platforms have an implementation which falls in the “just keep waiting” category (which is not expected), then it should not advertise Zawrs in the hardware description.
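The waiting pattern that Zawrs targets can be sketched in portable C11. This is only an illustration, not kernel code: `wait_hint()` and `wait_for_value()` are hypothetical names, and the hint is an empty placeholder standing in for a `wrs.nto` or `wrs.sto` instruction issued after a load-reserved.

```c
#include <stdatomic.h>

/* Hypothetical placeholder: where Zawrs is available, a wrs.nto or
 * wrs.sto instruction would stall the hart here until the watched
 * location is written (or the stall ends for another reason). */
static inline void wait_hint(void)
{
}

/* Spin until *addr equals 'expected'; returns the number of polls.
 * Because the stall may end for any reason, the condition is always
 * re-checked after the hint. */
static unsigned long wait_for_value(atomic_int *addr, int expected)
{
	unsigned long polls = 0;

	while (atomic_load_explicit(addr, memory_order_acquire) != expected) {
		wait_hint();
		polls++;
	}
	return polls;
}
```

The loop stays correct whether the hint stalls, times out, or does nothing at all; the hint only affects how much power is spent while waiting.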

v6: mm: jit/text allocator

The patches are also available in git: https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v6

v15: riscv: sophgo: add clock support for sg2042

This series adds clock controller support for sophgo sg2042.

v9: add support for EXAR XR20M1172 UART

The EXAR XR20M1172 UART is mostly SC16IS762-compatible, but it has an additional register that can change the UART multiplier to 4x and 8x, similar to what UPF_MAGIC_MULTIPLIER does.

v4: Add StarFive’s StarLink Cache Controller

StarFive’s StarLink Cache Controller flushes/invalidates the cache using non-conventional RISC-V Zicbom extension instructions. This driver provides the cache handling on StarFive RISC-V SoCs.

v1: Add I2C support on TH1520

This adds I2C support in the device tree of the T-Head TH1520 RISC-V SoC and a default configuration for the BeagleV-Ahead. It appears that the TH1520 I2C is already supported in the upstream kernel through the Synopsys DesignWare I2C adapter driver.

v1: riscv: prevent pt_regs corruption for secondary idle threads

The top of the kernel thread stack should be reserved for pt_regs. However, this is not the case for the idle threads of the secondary boot harts. Their stacks overlap with their pt_regs, so both may get corrupted.
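The layout rule can be illustrated with plain address arithmetic. This is a minimal sketch under assumed values: `THREAD_SIZE_SKETCH` and `struct pt_regs_sketch` are made-up stand-ins, and the real sizes are defined by the architecture code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define THREAD_SIZE_SKETCH 16384u	/* assumed stack size, illustration only */

/* Stand-in for the architecture's struct pt_regs (size is made up). */
struct pt_regs_sketch {
	unsigned long regs[36];
};

/* pt_regs occupies the very top of the thread stack. */
static uintptr_t pt_regs_addr(uintptr_t stack_base)
{
	return stack_base + THREAD_SIZE_SKETCH - sizeof(struct pt_regs_sketch);
}

/* The initial stack pointer must start below pt_regs. The stack grows
 * downwards, so frames pushed from here never overlap the saved
 * registers; starting at stack_base + THREAD_SIZE_SKETCH would. */
static uintptr_t initial_sp(uintptr_t stack_base)
{
	uintptr_t sp = pt_regs_addr(stack_base);

	assert(sp % 16 == 0);	/* the RISC-V psABI keeps sp 16-byte aligned */
	return sp;
}
```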

v2: RISC-V: clarify what some RISCV_ISA* config options do

During some discussion on IRC yesterday and on Pu’s bpf patch [1] I noticed that these RISCV_ISA* Kconfig options are not really clear about their implications. Many of these options have no impact on what userspace is allowed to do, for example an application can use Zbb regardless of whether or not the kernel does.

v2: clock, reset: microchip: move all mpfs reset code to the reset subsystem

Stephen and Philipp, while reviewing patches, said that all of the aux device creation and the register read/write code could be moved to the reset subsystem, leaving the clock driver with no implementations of reset_* functions at all. Move them.

v3: Add StarFive’s JH8100 StarLink Cache Controller

StarFive’s JH8100 StarLink Cache Controller flushes/invalidates the cache using non-conventional RISC-V Zicbom extension instructions. This driver provides the cache handling on StarFive RISC-V SoCs.

v1: KVM: selftest: Define _GNU_SOURCE for all selftests code

Define _GNU_SOURCE in the base CFLAGS instead of relying on selftests to manually #define _GNU_SOURCE, which is repetitive and error prone.

v3: Add support for a few Zc* extensions as well as Zcmop

Add support for (yet again) more extensions missing for RVA23U64. Add ISA string parsing, hwprobe and KVM support for the Zcmop, Zca, Zcf, Zcd and Zcb extensions. The Zce, Zcmt and Zcmp extensions have been left out since they target microcontrollers/embedded CPUs and are not needed by RVA23U64.

v3: sysctl: treewide: constify ctl_table argument of sysctl handlers

  • Patch 1 is a bugfix for the stack_erasing sysctl handler
  • Patches 2-10 change various helper functions throughout the kernel to be able to handle ‘const ctl_table’.
  • Patch 11 changes the signatures of all proc handlers through the tree. Some other signatures are also adapted, for details see the commit message.

v3: perf kvm: Add kvm stat support on riscv

‘perf kvm stat report/record’ generates a statistical analysis of KVM events and can be used to analyze guest exit reasons. This patch tries to add stat support on riscv.

Process Scheduling

v2: sched/eevdf: Prevent vlag from going out of bounds when reweight_eevdf

The kernel encounters the following error when running a workload:

Memory Management

v1: mm/pagemap: Make trylock_page return bool

Make trylock_page() return bool so that it matches the return type of folio_trylock() and agrees with its own comment.

v1: mm/rmap: change the type of we_locked from int to bool

Change the type of we_locked from int to bool because folio_trylock() returns bool.

v11: mm: report per-page metadata information

Adds a global Memmap field to /proc/meminfo. This information can be used by users to see how much memory is being used by per-page metadata, which can vary depending on build configuration, machine architecture, and system use.

v1: mm/slub: mark racy access on slab->freelist

In deactivate_slab(), slab->freelist can be changed concurrently. Mark data race on slab->freelist as benign using READ_ONCE.

v1: mm/swapfile: mark racy access on si->highest_bit

In scan_swap_map_slots(), si->highest_bit can be changed by swap_range_alloc() concurrently. All reads of si->highest_bit except one are either protected by a lock or done using READ_ONCE. So mark the one racy read of si->highest_bit as benign using READ_ONCE.
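What marking a read with READ_ONCE buys can be sketched in user space. The macro below is a simplified stand-in, not the kernel's definition, and `struct swap_info_sketch` and `clamp_offset()` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Simplified user-space stand-in for the kernel's READ_ONCE(): the
 * volatile access forces exactly one load and stops the compiler from
 * re-reading or caching the value. Uses the GNU __typeof__ extension. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

struct swap_info_sketch {
	unsigned int highest_bit;	/* may be updated by another thread */
};

/* Take one snapshot of the racy field and use only the snapshot, so the
 * comparison and the returned value are based on the same load. */
static unsigned int clamp_offset(struct swap_info_sketch *si, unsigned int offset)
{
	unsigned int highest;

	assert(si);
	highest = READ_ONCE_SKETCH(si->highest_bit);
	return offset > highest ? highest : offset;
}
```

Without the snapshot, a compiler would be free to load the field twice and compare against one value while returning another.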

v2: memcg: reduce memory consumption by memcg stats

Most of the memory overhead of a memcg object is due to memcg stats maintained by the kernel. Since stats updates happen in performance-critical codepaths, the stats are maintained per-cpu, and NUMA-specific stats are maintained per-node * per-cpu. This drastically increases the overhead on large machines, i.e. machines with a large number of CPUs and multiple NUMA nodes.

v1: mm/damon: add a DAMOS filter type for page granularity access recheck

Add a new type of DAMOS filter, namely ‘young’, for such a case. It checks whether each page of the DAMOS target region has been accessed since the last check, and filters it out or in if the ‘matching’ parameter is ‘true’ or ‘false’, respectively.

v5: mm/rmap: do not add fully unmapped large folio to deferred split list

In __folio_remove_rmap(), a large folio is added to deferred split list if any page in a folio loses its final mapping. But it is possible that the folio is fully unmapped and adding it to deferred split list is unnecessary.

v1: Make find_tcp_vma() more efficient

Liam asked me if we could do away with the “bool *mmap_locked” parameter, and the problem is that some architectures don’t support CONFIG_PER_VMA_LOCK yet. But we can abstract it … something like this maybe?

v1: mm: use memalloc_nofs_save() in page_cache_ra_order()

See commit f2c817bed58d (“mm: use memalloc_nofs_save in readahead path”). Ensure that page_cache_ra_order() does not attempt to reclaim file-backed pages either, or it leads to a deadlock. The issue was found when testing ext4 large folios.

v2: iommu/intel: Free empty page tables on unmaps

This series frees empty page tables on unmaps. It intends to be a low overhead feature.

v1: mm/slub: Avoid recursive loop with kmemleak

The system will immediately fill up the stack and crash when both CONFIG_DEBUG_KMEMLEAK and CONFIG_MEM_ALLOC_PROFILING are enabled. Avoid allocation tagging of kmemleak caches, otherwise recursive allocation tracking occurs.

v1: alloc_tag: Tighten file permissions on /proc/allocinfo

The /proc/allocinfo file exposes a tremendous amount of information about kernel build details, memory allocations (obviously), and potentially even image layout (due to ordering). As this is intended to be consumed by system owners (like /proc/slabinfo), use the same file permissions as there: 0400.

v4: enable bs > ps in XFS

This is the fourth version of the series that enables block size > page size (Large Block Size) in XFS. The context and motivation can be seen in cover letter of the RFC v1[1]. We also recorded a talk about this effort at LPC [3], if someone would like more context on this effort.

v2: mm: add more readable thp_vma_allowable_order_foo()

There are too many bool arguments in thp_vma_allowable_orders(); add some more readable thp_vma_allowable_order_foo() helpers.

v1: mm/huge_memory: move writeback and truncation checks early

We should check as early as possible if we should bail due to writeback or truncation. This will allow us to add further sanity checks earlier as well.

v3: slab: Introduce dedicated bucket allocator

Series change history:

v11: net-next:pull request: net: intel: start The Great Code Dedup + Page Pool for iavf

It is no secret that there’s a ton of code duplication between two or more Intel Ethernet modules. Before introducing new changes, which would need to be copied over again, start decoupling the already existing duplicate functionality into a new module, which will be shared between several Intel Ethernet drivers.

v3: clean-up for create_kmalloc_caches()

I am cleaning up unnecessary code in create_kmalloc_caches(). I added one more commit to remove the check for NULL kmalloc_caches according to the review comments like below.

v2: mm: migrate: support poison recover from migrate folio

Folio migration is widely used in the kernel (memory compaction, memory hotplug, soft offline page, NUMA balancing, memory demotion/promotion, etc.), but if a poisoned source folio is accessed while migrating, the kernel will panic.

v1: mm: introduce per-order mTHP split counters

At present, the split counters in THP statistics no longer include PTE-mapped mTHP. Therefore, we want to introduce per-order mTHP split counters to monitor the frequency of mTHP splits. This will assist developers in better analyzing and optimizing system performance.

v1: selftests/mm: soft-dirty should fail if a testcase fails

Previously soft-dirty was unconditionally exiting with success, even if one of its testcases failed. Let’s fix that so that failure can be reported to automated systems properly.

v2: percpu: simplify the logic of pcpu_alloc_first_chunk()

In the logic for hiding the end region of the chunk, there are several places where the same value ‘region_bits - offset_bits’ is calculated over and over again using different methods. Eliminating these redundant calculations can improve code readability.

v1: branch prediction hint

This is a follow-up patch to the discussion that happened during the mseal v10 review [1].

Add a branch prediction hint to the mseal code.

v2: ptdump: add non-leaf descriptor support

Add an optional note_non_leaf parameter to ptdump, causing note_page to be called on non-leaf descriptors. Implement this functionality on arm64 by printing table descriptors along with table-specific permission sets.

Networking

This series adds representor support for each RVU device. When switchdev mode is enabled, a representor netdev is registered for each RVU device.

v1: net-next: net: dsa: realtek: provide own phylink MAC operations

Convert realtek to provide its own phylink MAC operations, thus avoiding the shim layer in DSA’s port.c. We need to provide a stub for the mandatory mac_config() method for rtl8366rb.

v2: net-next: net: dsa: mt7530: do not set MT7530_P5_DIS when PHY muxing is being used

DSA initialises the ds->num_ports amount of ports in dsa_switch_touch_ports(). When the PHY muxing feature is in use, port 5 won’t be defined in the device tree. Because of this, the type member of the dsa_port structure for this port will be assigned DSA_PORT_TYPE_UNUSED. The dsa_port_setup() function calls ds->ops->port_disable() when the port type is DSA_PORT_TYPE_UNUSED.

v1: stable,5.10: introduce stop timer to solve the problem of CVE-2024-26865

The stable 5.10 branch is also affected by CVE-2024-26865. However, the mainline patch cannot be applied to the stable 5.10 branch as-is. The commit 740ea3c4a0b2 (“tcp: Clean up kernel listener’s reqsk in inet_twsk_purge()”) is required to stop the timer.

v4: net-next: Add TCP fraglist GRO support

When forwarding TCP after GRO, software segmentation is very expensive, especially when the checksum needs to be recalculated. One case where that’s currently unavoidable is when routing packets over PPPoE. Performance improves significantly when using fraglist GRO implemented in the same way as for UDP.

v1: sctp: prefer struct_size over open coded arithmetic

This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2].
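The idea behind struct_size() can be sketched in user space. The helper below is an illustration in the spirit of the kernel macro, not its implementation; `flex_size()`, `struct chunk_list` and `chunk_list_alloc()` are hypothetical names, and the overflow builtins are GCC/Clang extensions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct chunk_list {
	size_t count;
	uint32_t items[];	/* flexible array member */
};

/* Overflow-checked size of a struct plus 'n' trailing elements.
 * Saturates to SIZE_MAX on overflow so that a following allocation
 * fails instead of silently being undersized. */
static size_t flex_size(size_t base, size_t elem, size_t n)
{
	size_t bytes;

	if (__builtin_mul_overflow(elem, n, &bytes) ||
	    __builtin_add_overflow(bytes, base, &bytes))
		return SIZE_MAX;
	return bytes;
}

static struct chunk_list *chunk_list_alloc(size_t n)
{
	struct chunk_list *cl;

	cl = malloc(flex_size(sizeof(*cl), sizeof(cl->items[0]), n));
	if (cl)
		cl->count = n;
	return cl;
}
```

An open-coded `sizeof(*cl) + n * sizeof(cl->items[0])` would wrap around for a huge `n` and yield a small allocation; the checked version turns that case into an allocation failure.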

v1: bpf: Add BPF_PROG_TYPE_CGROUP_SKB attach type enforcement in BPF_LINK_CREATE

Syzkaller found a case where it’s possible to attach cgroup_skb program to the sockopt hooks. Apparently it’s currently possible to do that, but only when using BPF_LINK_CREATE API. The first patch in the series has more info on why that happens.

v2: batman-adv: Add flex array to struct batadv_tvlv_tt_data

The “struct batadv_tvlv_tt_data” uses a dynamically sized set of trailing elements. Specifically, it uses an array of structures of type “batadv_tvlv_tt_vlan_data”. So, use the preferred way in the kernel of declaring a flexible array [1].

v1: net-next: net: txgbe: use phylink_pcs_change() to report PCS link change events

Use phylink_pcs_change() when reporting changes in PCS link state to phylink as the interrupts are informing us about changes to the PCS state.

v1: net-next: net: dsa: microchip: use phylink_mac_ops for ksz driver

Three of these cases are handled by shimming the existing DSA calls through ksz_dev_ops, and the final case is handled through a conditional in ksz_phylink_mac_config(). These can all be handled with separate phylink_mac_ops.

v1: iwl-net: idpf: Interpret .set_channels() input differently

Unlike ice, idpf does not check whether the user has requested at least 1 combined channel. Instead, it relies on a check in the core code. Unfortunately, the check does not trigger for us because of the hacky .set_channels() interpretation logic that is not consistent with the core code.

v2: net-next: ipv6: introduce dst_rt6_info() helper

Instead of (struct rt6_info *)dst casts, we can use :

#define dst_rt6_info(_ptr) \
        container_of_const(_ptr, struct rt6_info, dst)

Some places needed missing const qualifiers :

ip6_confirm_neigh(), ipv6_anycast_destination(), ipv6_unicast_destination(), has_gateway()
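Why a container_of-style helper is preferable to a raw cast can be shown with a small user-space sketch. The structures and the macro here are simplified stand-ins with hypothetical names; in particular, the kernel's container_of_const() also preserves the constness of the pointer, which this version does not.

```c
#include <stddef.h>

struct dst_entry_sketch {
	int obsolete;
};

/* Stand-in for struct rt6_info: the generic dst entry is embedded as a
 * member. It is deliberately not the first member here, so a plain
 * cast of the member pointer would give the wrong address. */
struct rt6_info_sketch {
	int rt6i_flags;
	struct dst_entry_sketch dst;
};

/* Classic container_of(): subtract the member offset from the member
 * pointer to recover the enclosing structure. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct rt6_info_sketch *dst_rt6_info_sketch(struct dst_entry_sketch *dst)
{
	return container_of_sketch(dst, struct rt6_info_sketch, dst);
}
```

A cast only works while the embedded member happens to sit at offset zero; the helper keeps working if the layout changes and documents the intent at every call site.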

v3: Documentation: networking: document ISO 15765-2

While the in-kernel ISO 15765-2 (ISO-TP) stack is fully functional and easy to use, no documentation exists for it.

v1: iwl: idpf: don’t enable NAPI and interrupts prior to allocating Rx buffers

Currently, idpf enables NAPI and interrupts prior to allocating Rx buffers. This may lead to frame loss (there are no buffers to place incoming frames) and even crashes on quick ifup-ifdown.

v2: net-next: net: phy: micrel: Add support for PTP_PF_EXTTS for lan8814

Extend the PTP programmable GPIOs to also implement the PTP_PF_EXTTS function. The pins can be configured to capture both rising and falling edges. Once the event is seen, an interrupt is generated and the LTC is saved in the registers.

[RFC net-next (what uAPI?)] ice: add support for more than 16 RSS queues for VF

Main aim for this RFC is to ask for the preferred uAPI, but let me start with some background. Then I describe considered uAPIs, from most readily available to most implementation-needed.

v2: Series to deliver Ethernets for STM32MP13

STM32MP13 is an STM32 SoC with 2 GMAC instances. This board has 2 RMII PHYs:

  • Ethernet1: RMII with crystal
  • Ethernet2: RMII without crystal

Rework the dwmac glue to simplify management for the next STM32, and add support for the PHY regulator.

v1: net-next: mlxsw: Improve events processing performance

Amit Cohen writes:

Spectrum ASICs only support a single interrupt, which means that all the events are handled by one IRQ (interrupt request) handler.

v1: net-next: ipv6: use call_rcu_hurry() in fib6_info_release()

This is a followup of commit c4e86b4363ac (“net: add two more call_rcu_hurry()”)

fib6_info_destroy_rcu() is calling nexthop_put() or fib6_nh_release()

v1: ipsec: xfrm: Correct spelling mistake in xfrm.h comment

A spelling error was found in the comment section of include/uapi/linux/xfrm.h. Since this header file is copied to many userspace programs and undergoes Debian spellcheck, it’s preferable to fix it in upstream rather than downstream having exceptions.

This commit fixes the spelling mistake.

v1: net: qede: avoid overruling error codes

This series fixes the qede driver, so that qede_parse_flow_attr() and its subfunctions

v7: net-next: net/ipv4: add tracepoint for icmp_send

Introduce a tracepoint for icmp_send, which can help users conveniently get more detailed information when abnormal ICMP events happen.

v1: net: wwan: Add net device name for error message print

In my local environment, I got an error print in dmesg like the one below: “sequence number glitch prev=487 curr=0”. After checking, it comes from mhi_wwan_mbim.c. Referring to the usage of this API in other files, I think we should print the net device name before the message context.

v13: ipsec-next: xfrm: Introduce direction attribute for SA

Inspired by the upcoming IP-TFS patch set, and confusions experienced in the past due to lack of direction attribute on SAs, add a new direction “dir” attribute. It aims to streamline the SA configuration process and enhance the clarity of existing SA attributes.

v1: net-next: inet: use call_rcu_hurry() in inet_free_ifa()

This is a followup of commit c4e86b4363ac (“net: add two more call_rcu_hurry()”)

v2: net-next: virtio_net: support getting initial value of irq coalesce

Patch 1 from Xuan: the virtnet cvq supports getting the result from the device.

v3: net-next: Add TCP fraglist GRO support

When forwarding TCP after GRO, software segmentation is very expensive, especially when the checksum needs to be recalculated.

v1: net-next: net: give more chances to rcu in netdev_wait_allrefs_any()

This came while reviewing commit c4e86b4363ac (“net: add two more call_rcu_hurry()”).

Security Hardening

v1: clocksource/drivers/rda: Add sched_clock_register for RDA8810PL SoC

Add sched_clock_register during init.

Bootup log before this patch:

[ 8.044000] UBIFS (ubi0:4): Mounting in unauthenticated mode
[ 8.044000] UBIFS error (ubi0:4 pid 176): ubifs_read_superblock: bad superblock, error 4

v3: perf/x86/amd/uncore: Use kcalloc() instead of kzalloc()

This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1].

v1: Input: ff-core - prefer struct_size over open coded arithmetic

This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2].

v1: hardening: Refresh KCFI options, add some more

Add some stuff that got missed along the way:

  • CONFIG_UNWIND_PATCH_PAC_INTO_SCS=y so SCS vs PAC is hardware selectable.

v1: clk: bcm: Move a couple of __counted_by initializations

This series addresses two UBSAN warnings I see on my Raspberry Pi 4 with recent releases of clang that support __counted_by by moving the initializations of the element count member before any accesses of the flexible array member.

v4: net: dsa: lan9303: use ethtool_puts() for lan9303_get_strings()

This pattern of strncpy with some pointer arithmetic setting fixed-sized intervals with string literal data is a bit weird, so let’s use ethtool_puts() as this has more obvious behavior and is less error-prone.
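The "fixed-sized intervals" pattern can be sketched in user space. `puts_slot()` is a hypothetical analogue written for illustration, not the kernel helper, and `GSTRING_LEN_SKETCH` is an assumed slot width chosen to mirror ETH_GSTRING_LEN.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define GSTRING_LEN_SKETCH 32	/* assumed slot width, mirrors ETH_GSTRING_LEN */

/* Hypothetical user-space analogue of ethtool_puts(): copy one string
 * into the current fixed-width slot, always NUL-terminate it, zero the
 * rest of the slot, and advance the cursor to the next slot. */
static void puts_slot(uint8_t **data, const char *str)
{
	memset(*data, 0, GSTRING_LEN_SKETCH);
	snprintf((char *)*data, GSTRING_LEN_SKETCH, "%s", str);
	*data += GSTRING_LEN_SKETCH;
}
```

Keeping the cursor update inside the helper removes the manual `data + i * LEN` arithmetic from every caller, which is where off-by-one and missing-terminator mistakes tend to creep in.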

v1: wifi: nl80211: Avoid address calculations via out of bounds array indexing

Before request->channels[] can be used, request->n_channels must be set.
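The ordering requirement can be sketched as follows. All names are hypothetical stand-ins; the attribute is applied only when the compiler reports support for it, and overflow checking of the size computation is omitted for brevity.

```c
#include <stdlib.h>

/* counted_by is only understood by recent compilers; fall back to
 * nothing elsewhere so the sketch still builds. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY_SKETCH(m) __attribute__((counted_by(m)))
# endif
#endif
#ifndef COUNTED_BY_SKETCH
# define COUNTED_BY_SKETCH(m)
#endif

struct scan_request_sketch {
	unsigned int n_channels;
	int channels[] COUNTED_BY_SKETCH(n_channels);
};

static struct scan_request_sketch *scan_request_alloc(unsigned int n)
{
	struct scan_request_sketch *req;

	req = calloc(1, sizeof(*req) + (size_t)n * sizeof(req->channels[0]));
	if (!req)
		return NULL;

	/* Set the counter first: bounds checks on channels[] rely on it. */
	req->n_channels = n;
	for (unsigned int i = 0; i < n; i++)
		req->channels[i] = (int)i;
	return req;
}
```

If the loop ran before `n_channels` was assigned, a bounds-checking build would see a zero-length array and flag every access, which is the class of warning these patches address.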

v1: Annotate atomics for signed integer wrap-around

As part of enabling the signed integer overflow sanitizer for production use, we have to annotate the atomics which expect to use wrapping signed values. Do this for x86, arm64, and the fallbacks. Additionally annotate the first place anyone will trip over signed integer wrap-around: ipv4, which has traditionally included the comment hint about how to debug sanitizer issues.

v1: ubsan: Avoid i386 UBSAN handler crashes with Clang

When generating Runtime Calls, Clang doesn’t respect the -mregparm=3 option used on i386. Hopefully this will be fixed correctly in Clang 19: https://github.com/llvm/llvm-project/pull/89707

v7: arm64: qcom: add AIM300 AIoT board support

Add AIM300 AIoT support along with usb, ufs, regulators, serial, PCIe, and PMIC functions. AIM300 Series is a highly optimized family of modules designed to support AIoT applications. It integrates QCS8550 SoC, UFS and PMIC chip etc.

v1: Introduce STM32 DMA3 support

STM32 DMA3 is a direct memory access controller with different features depending on its hardware configuration. It is either called LPDMA (Low Power), GPDMA (General Purpose) or HPDMA (High Performance), and it can be found in new STM32 MCUs and MPUs.

v5: checkpatch: add check for snprintf to scnprintf

“There is a general misunderstanding amongst engineers that {v}snprintf() returns the length of the data actually encoded into the destination array. However, as per the C99 standard {v}snprintf() really returns the length of the data that would have been written if there were enough space for it. This misunderstanding has led to buffer-overruns in the past. It’s generally considered safer to use the {v}scnprintf() variants in their place (or even sprintf() in simple cases). So let’s do that.”
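The difference in return values can be demonstrated with a user-space sketch. `scnprintf_sketch()` is a hypothetical re-implementation for illustration, built on the standard `vsnprintf()`; it is not the kernel function itself.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* User-space sketch of scnprintf(): like snprintf(), but the return
 * value is the number of characters actually stored in buf (excluding
 * the trailing NUL), never the length that would have been needed. */
static int scnprintf_sketch(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int ret;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	ret = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (ret < 0)
		return 0;
	if ((size_t)ret >= size)
		return (int)(size - 1);
	return ret;
}
```

With an 8-byte buffer and a 10-character string, `snprintf()` reports 10 while the sketch reports 7. Code that does `len += snprintf(buf + len, size - len, ...)` therefore walks past the end of the buffer after a truncation, whereas the scnprintf-style return value keeps `len` within bounds.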

Asynchronous I/O

v1: Read/Write with meta/integrity

This adds a new io_uring interface to specify meta along with read/write. Beyond reading/writing meta, the interface also enables (a) flags to control data-integrity checks, (b) application tag.

v1: io_uring/rw: reinstate thread check for retries

Allowing retries for everything is arguably the right thing to do, now that every command type is async ready from the start. But it’s exposed a few issues around a missing check for a retry (which cca6571381a0 exposed), and the fixup commit for that isn’t necessarily 100% sound in terms of iov_iter state.

Rust For Linux

v1: rust: add ‘firmware’ field support to module! macro

This adds ‘firmware’ field support to the module! macro, which corresponds to the MODULE_FIRMWARE macro.

v1: rust: hrtimer: introduce hrtimer support

This patch adds support for intrusive use of the hrtimer system. For now, only one timer can be embedded in a Rust struct.

v1: kbuild: rust: force alloc extern to allow “empty” Rust files

The reason is that we pass -Zcrate-attr='feature(new_uninit)' (together with -Zallow-features=new_uninit) to let non-rust/ code use that unstable feature.

BPF

v1: bpf-next: bpf: add support to read cpu_entry in bpf program

Add new field “cpu_entry” to bpf_perf_event_data which could be read by bpf programs attached to perf events. The value contains the CPU value recorded by specifying sample_type with PERF_SAMPLE_CPU when calling perf_event_open().

v2: bpf-next: bpf_helpers.h: define bpf_tail_call_static when building with GCC

The definition of bpf_tail_call_static in tools/lib/bpf/bpf_helpers.h is guarded by a preprocessor check to assure that clang is recent enough to support it. This patch updates the guard so the function is compiled when using GCC 13 or later as well.

v1: bpf-next: libbpf: support “module:function” syntax for tracing programs

In some situations, it is useful to explicitly specify a kernel module to search for a tracing program target (e.g. when a function of the same name exists in multiple modules or in vmlinux).
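How such a target string might be split can be sketched as below. `split_target()` is a hypothetical helper written for illustration and is not part of libbpf; the patch's actual parsing may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libbpf API): split "module:function" into
 * its two parts. With no ':' the whole string is the function name and
 * the module is left empty, meaning "no module restriction". Returns 0
 * on success, -1 if a part is empty or does not fit its buffer. */
static int split_target(const char *spec, char *mod, size_t mod_sz,
			char *func, size_t func_sz)
{
	const char *colon = strchr(spec, ':');
	size_t mod_len = colon ? (size_t)(colon - spec) : 0;
	const char *fn = colon ? colon + 1 : spec;

	if ((colon && mod_len == 0) || *fn == '\0')
		return -1;
	if (mod_len >= mod_sz || strlen(fn) >= func_sz)
		return -1;

	memcpy(mod, spec, mod_len);
	mod[mod_len] = '\0';
	strcpy(func, fn);
	return 0;
}
```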

v1: bpf-next: bpf: avoid casts from pointers to enums in bpf_tracing.h

The BPF_PROG, BPF_KPROBE and BPF_KSYSCALL macros defined in tools/lib/bpf/bpf_tracing.h use a clever hack in order to provide a convenient way to define entry points for BPF programs as if they were normal C functions that get typed actual arguments, instead of as elements in a single “context” array argument.

v8: dwarves: pahole: Inject kfunc decl tags into BTF

This patchset teaches pahole to parse symbols in .BTF_ids section in vmlinux and discover exported kfuncs. Pahole then takes the list of kfuncs and injects a BTF_KIND_DECL_TAG for each kfunc.

v1: bpf-next: bpf: add mrtt and srtt as ctx->args for BPF_SOCK_OPS_RTT_CB

These provide more information about TCP RTT estimation. The selftest for BPF_SOCK_OPS_RTT_CB is extended for the added args.

v3: selftests/bpf: Add ring_buffer__consume_n test.

Add a testcase for the ring_buffer__consume_n() API.

The test produces multiple samples in a ring buffer, using a sys_getpid() fentry prog, and consumes them from user-space in batches, rather than consuming all of them greedily, like ring_buffer__consume() does.

v1: bpf_wq followup series

A few patches that should have been there from day 1.

v1: bpf-next: use network helpers, part 3

This patchset adds an opts argument for __start_server, and adds a setsockopt pointer together with optval and optlen into struct network_helper_opts to make the start_server_addr helper more flexible. With these modifications, much duplicate code can be dropped.

v1: rethook: inline arch_rethook_trampoline_callback() in assembly code

At the lowest level, rethook-based kretprobes on the x86-64 architecture go through the arch_rethook_trampoline() function, manually written in assembly. It calls into a simple arch_rethook_trampoline_callback() function, written in C, which only does a few straightforward field assignments before calling further into rethook_trampoline_handler(), which handles kretprobe callbacks generically.

v1: bpf-next: bpf: Add bpf_guard_preempt() convenience macro

Add bpf_guard_preempt() macro that uses newly introduced bpf_preempt_disable/enable() kfuncs to guard a critical section.
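The scope-guard idea can be sketched in user space with the GCC/Clang `cleanup` attribute. This is not the macro from the patch: all names are hypothetical, and a simple counter stands in for the real preemption state.

```c
/* User-space stand-ins for the kfuncs: a counter instead of the real
 * preemption state. */
static int preempt_depth;

static void preempt_disable_sketch(void) { preempt_depth++; }
static void preempt_enable_sketch(void)  { preempt_depth--; }

/* Cleanup handler: runs automatically when the guard variable goes out
 * of scope (GCC/Clang 'cleanup' attribute). */
static void guard_exit(int *unused)
{
	(void)unused;
	preempt_enable_sketch();
}

/* Hypothetical guard macro: disable on entry, re-enable on scope exit,
 * including early returns. */
#define guard_preempt_sketch() \
	int guard_var __attribute__((cleanup(guard_exit), unused)) = \
		(preempt_disable_sketch(), 0)

static int depth_inside_guard(void)
{
	guard_preempt_sketch();

	return preempt_depth;	/* read while the guard is held */
}
```

Tying the enable call to scope exit means a critical section cannot leak a disabled state through a forgotten call on an error path.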

v3: bpf-next: bpf/verifier: range computation improvements

This is the third series of these patches. Thank you Eduard and Yonghong for your reviews.

v5: bpf-next: Replace mono_delivery_time with tstamp_type

Patch 1: This patch only renames the mono delivery timestamp to tstamp_type, with no change in the functionality of the existing kernel code. It also starts assigning tstamp_type as either mono or real and introduces a new enum in skbuff.h. Again, there is no change in the functionality of the existing kernel code; this just makes the code scalable.

v1: Objpool performance improvements

Improve objpool (used heavily in kretprobe hot path) performance with two improvements:

  • inlining performance critical objpool_push()/objpool_pop() operations;
  • avoiding re-calculating relatively expensive nr_possible_cpus().

v1: net-next: igc: Add Tx hardware timestamp request for AF_XDP zero-copy packet

This patch adds support for per-packet Tx hardware timestamp requests on AF_XDP zero-copy packets via the XDP Tx metadata framework. Please note that the user needs to enable the Tx HW timestamp capability via igc_ioctl() with the SIOCSHWTSTAMP cmd before sending an xsk Tx hardware timestamp request.

v1: dwarves: btf_encoder: add “distilled_base” BTF feature to split BTF generation

Adding “distilled_base” to --btf_features when generating split BTF will create split and .BTF.base BTF. The latter allows us to map references from split BTF to base BTF, even if that base BTF has changed. It does this by providing just enough information about the base types in the .BTF.base section.

v2: bpf-next: bpf: support resilient split BTF

Split BPF Type Format (BTF) provides huge advantages in that kernel modules only have to provide type information for types that they do not share with the core kernel; for core kernel types, split BTF refers to core kernel BTF type ids.

v1: bpf-next: bpf: add a few more options for GCC_BPF in selftests/bpf/Makefile

This little patch modifies selftests/bpf/Makefile so it passes the following extra options when invoking gcc-bpf:

-gbtf: This makes GCC emit BTF debug info in .BTF and .BTF.ext.

v2: bpf-next: Introduce bpf_preempt_{disable,enable}

This set introduces two kfuncs, bpf_preempt_disable and bpf_preempt_enable, which are wrappers around preempt_disable and preempt_enable in the kernel. These functions allow a BPF program to have code sections where preemption is disabled. There are multiple use cases that are served by such a feature, a few are listed below:

v2: Dump off-cpu samples directly

As mentioned in: https://bugzilla.kernel.org/show_bug.cgi?id=207323

Currently, off-cpu samples are dumped when perf record is exiting. This results in off-cpu samples being after the regular samples. Also, samples are stored in large BPF maps which contain all the stack traces and accumulated off-cpu time, but they are eventually going to fill up after running for an extensive period. This patch fixes those problems by dumping samples directly into perf ring buffer, and dispatching those samples to the correct format.

v1: bpf-next: Add some ‘malloc’ failure checks

The “malloc” call may not be successful. Add malloc failure checks to avoid a possible NULL dereference.

v1: bpf-next: check bpf_dummy_struct_ops program params for test runs

When doing BPF_PROG_TEST_RUN for bpf_dummy_struct_ops programs, execution should be rejected when NULL is passed for non-nullable params, because for such params verifier assumes that such params are never NULL and thus might optimize out NULL checks.

v1: net-next: selftests: net: extract BPF building logic from the Makefile

This has been sitting in my tree for a while. I will soon add YNL/libynl support for networking selftests. This prompted a small cleanup of the selftest makefile for net/. We don’t want to be piling logic for each library in there. YNL will get its own .mk file which can be included. Do the same for the BPF building section, already.

v1: dwarves: replace --btf_features=“all” with “default”

Use of “all” in --btf_features is confusing; use the “default” keyword to request the default set of BTF features for encoding instead. Then non-standard features can be added in a more natural way; i.e.

v4: net-next: dma: skip calling no-op sync ops when possible

The series grew from Eric’s idea and patch at [0]. The idea of using the shortcut for direct DMA as well belongs to Chris.

v2: bpf-next: use network helpers, part 2

This patchset uses more network helpers in test_sock_addr.c, but first of all, patch 2 is needed to make network_helpers.c independent of test_progs.c. Then network_helpers.h can be included into test_sock_addr.c without compile errors.

v1: arch/Kconfig: Move SPECULATION_MITIGATIONS to arch/Kconfig

SPECULATION_MITIGATIONS is currently defined only for x86. As a result, IS_ENABLED(CONFIG_SPECULATION_MITIGATIONS) is always false for other archs. f337a6a21e2f effectively set “mitigations=off” by default on non-x86 archs, which is not desired behavior. Jakub observed this change when running bpf selftests on s390 and arm64.

v10: bpf-next: BPF crypto API framework

This series introduces crypto kfuncs to make BPF programs able to utilize the kernel crypto subsystem. Crypto operations are made pluggable to avoid extensive growth of the kernel when they are not needed.

v1: libbpf: extending BTF_KIND_INIT to accommodate some unusual types

In btf__add_int, the size of the new btf_kind_int type is limited. When the size is greater than 16, btf__add_int fails to be added and -EINVAL is returned. This is usually effective.

Related Technology News

Qemu

v3: target/riscv: Raise exceptions on wrs.nto

Implementing wrs.nto to always just return is consistent with the specification, as the instruction is permitted to terminate the stall for any reason, but it’s not useful for virtualization, where we’d like the guest to trap to the hypervisor in order to allow scheduling of the lock holding VCPU.

v2: target/riscv/kvm: tolerate KVM disable ext errors

In this new version we changed the commit message a bit and we’re now only handling the case for EINVAL. Both were suggested by Drew in v1.

v1: target/riscv: change RISCV_EXCP_SEMIHOST exception number to 63

The current semihost exception number (16) is a reserved number (range [16-17]). The upcoming double trap specification uses that number for the double trap exception. Since the privileged spec (Table 22) defines ranges for custom use, change the semihosting exception number to 63, which belongs to the range [48-63], in order to avoid any future collisions with reserved exceptions.

v6: target/riscv/kvm/kvm-cpu.c: kvm_riscv_handle_sbi() fail with vendor-specific SBI

kvm_riscv_handle_sbi() may return a “not supported” return code so that a vendor-specific SBI call does not trigger a QEMU abort.

v5: target/riscv/kvm/kvm-cpu.c: kvm_riscv_handle_sbi() fail with vendor-specific sbi.

kvm_riscv_handle_sbi() may return a “not supported” return code so that a vendor-specific SBI call does not trigger a QEMU abort.

v4: riscv: thead: Add th.sxstatus CSR emulation

The th.sxstatus CSR can be used to identify available custom extensions on T-Head CPUs. The CSR is documented here: https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadsxstatus.adoc

U-Boot

v3: riscv: Rename spl_soc_init() to spl_dram_init()

This patch series renames spl_soc_init() to spl_dram_init() since the purpose of the function is to initialize the DRAM on sifive/starfive boards. spl_dram_init() is a commonly used function for this purpose.

RISC-V u-boot unable to boot QEMU using ‘-cpu max’

In QEMU we have a ‘max’ type CPU that implements (almost) all extensions that QEMU is able to emulate. Recently, in QEMU commit 249e0905d05, we bumped the extensions for this CPU.


