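RISC-V Linux Kernel and Related Technology News, Issue 47

Written by 呀呀呀 on 2023/06/05

Date: 20230528
Editor: 晓依
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support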

v1: riscv: Reduce ARCH_KMALLOC_MINALIGN to 8

Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, i.e. 64 bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support a unified kernel Image, we usually have to enable CONFIG_RISCV_DMA_NONCOHERENT, which brings some bad effects for coherent platforms:
Firstly, it wastes memory: the kmalloc-96, kmalloc-32, kmalloc-16 and kmalloc-8 slab caches no longer exist and are replaced with either kmalloc-128 or kmalloc-64.
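
The memory cost described above can be illustrated with a small model. This is only a sketch in Python, not kernel code: it assumes that a kmalloc cache survives only when its object size is a multiple of the minimum alignment, and the list of cache sizes is a hypothetical subset chosen for illustration.

```python
# Simplified model: a kmalloc cache is kept only if its object size is a
# multiple of the minimum alignment. The size list is an assumption made
# for illustration; it is not read from a running kernel.
KMALLOC_SIZES = [8, 16, 32, 64, 96, 128, 192, 256]


def available_caches(min_align):
    """Return the cache sizes that remain for a given minimum alignment."""
    return [size for size in KMALLOC_SIZES if size % min_align == 0]


def cache_for(request, min_align):
    """Return the smallest remaining cache that can hold `request` bytes."""
    for size in available_caches(min_align):
        if size >= request:
            return size
    raise ValueError("request is larger than the modelled caches")
```

In this model a 72-byte request is served from kmalloc-96 when the minimum alignment is 8, but from kmalloc-128 when it is 64, and every request of up to 64 bytes consumes a full 64-byte object.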

v1: RISC-V: mark hibernation as nonportable

Hibernation support depends on firmware marking its reserved/PMP protected regions as not accessible from Linux. The latest versions of the de-facto SBI implementation (OpenSBI) do not do this, having dropped the no-map property to enable 1 GiB huge page mappings by the kernel. This was exposed by commit 3335068f8721 (“riscv: Use PUD/P4D/PGD pages for the linear mapping”), which made the first 2 MiB of DRAM (where SBI typically resides) accessible by the kernel.

v2: RISC-V: KVM: Ensure SBI extension is enabled

Ensure guests can’t attempt to invoke SBI extension functions when the SBI extension’s probe function has stated that the extension is not available.

v1: Add initialization of clock for StarFive JH7110 SoC

This patchset adds initial rudimentary support for the StarFive Quad SPI controller driver, which will be used on StarFive’s VisionFive 2 board. In 6.4, the QSPI_AHB and QSPI_APB clocks changed from the default ON state to the default OFF state, so these clocks need to be enabled in the driver. A dts patch is also added to this series.

v2: RISCV: Add KVM_GET_REG_LIST API

KVM_GET_REG_LIST dumps all register IDs that are available to KVM_GET/SET_ONE_REG, which is very useful for identifying platform regression issues during VM migration.

v1: riscv: Kconfig: Add select ARM_AMBA to SOC_STARFIVE

Selects ARM_AMBA platform support for StarFive SoCs required by spi and crypto dma engine.

v1: tools/nolibc: riscv: Add full rv32 support

In the first series [1], we have fixed up the compile errors about _start and __NR_llseek for rv32, but left compile errors about tons of time32 syscalls (removed after kernel commit d4c08b9776b3 (“riscv: Use latest system call ABI”)) and the missing fstat in nolibc-test.c [2], now we have fixed up all of them.

v1: Add support for Allwinner GPADC on D1/T113s/R329 SoCs

This series adds support for general purpose ADC (GPADC) on new Allwinner’s SoCs, such as D1, T113s and R329. The implemented driver provides basic functionality for getting ADC channels data.

v2: dmaengine: pl330: rename _start to prevent build error

“_start” is used in several arches and probably should be reserved for ARCH usage. Using it in a driver for a private symbol can cause a build error when it conflicts with ARCH usage of the same symbol.

v1: riscv: mm: try VMA lock-based page fault handling first

Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails.

v2: riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

When trying to run Linux with various open-source RISC-V cores on resource-limited FPGA platforms, for example FPGAs with less than 16MB of SDRAM, I want to save as much memory as possible. One of the major techniques is kernel size optimization. I found that riscv does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION, which passes -fdata-sections and -ffunction-sections to CFLAGS and passes the --gc-sections flag to the linker.

v3: Add Zawrs support and use it for spinlocks

Zawrs [0] was ratified in November 2022 [1], so I’ve resurrected the patch adding Zawrs support for spinlocks and adapted it to recent kernel changes.
Also incorporated are the nice comments David Laight provided on v2.

v1: tools/nolibc: autodetect stackprotector availability from compiler

As suggested by Willy it is possible to detect the availability of stackprotector via preprocessor defines. Make use of that to simplify the code and interface of nolibc.

v1: RISC-V: KVM: Redirect AMO load/store misaligned traps to guest

M-mode redirects an unhandled misaligned trap back to S-mode when it is not delegated to VS-mode (hedeleg). However, KVM running in HS-mode terminates the VS-mode software when the trap comes back from M-mode. KVM should instead redirect the trap to VS-mode and let the VS-mode trap handler decide the next step.

Process Scheduling

v1: sched/psi: make psi_cgroups_enabled static

The static key psi_cgroups_enabled is only used inside file psi.c. Make it static.

v1: sched/fair: Don’t balance task to its current running CPU

Further investigation shows that the warning is superfluous: the migration-disabled task is just going to be migrated to the CPU it is currently running on. This is because, during load balance, if the dst_cpu is not allowed by the task, we re-select a new_dst_cpu as a candidate. If no task can be balanced to dst_cpu, we try to balance the task to the new_dst_cpu instead. In this case, when the migration-disabled task is not on a CPU, it is only allowed to run on its current CPU, so load balance selects its current CPU as new_dst_cpu and later triggers the warning above.

v1: sched/deadline: simplify dl_bw_cpus() using cpumask_weight_and()

cpumask_weight_and() can be used to count the bits set in both rd->span and cpu_active_mask. No functional change intended.
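
As a rough illustration of why this is a pure simplification, the sketch below models CPU masks as Python integers and compares a population count of the intersection with an explicit per-CPU loop. The loop is an assumption about what an open-coded version might look like, and the mask values are hypothetical.

```python
def cpumask_weight_and(mask_a, mask_b):
    """Count the bits set in both masks (masks are modelled as ints)."""
    return bin(mask_a & mask_b).count("1")


def count_with_loop(mask_a, mask_b, nr_cpus):
    """Open-coded equivalent: test every CPU in turn and count the matches."""
    cpus = 0
    for cpu in range(nr_cpus):
        if (mask_a >> cpu) & 1 and (mask_b >> cpu) & 1:
            cpus += 1
    return cpus


# Hypothetical masks: the root domain spans CPUs 0-3; CPUs 0, 2 and 5 are active.
span = 0b001111
active = 0b100101
```

Both functions return the same count for any pair of masks, which is consistent with the "no functional change intended" note.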

Memory Management

v1: Do not print page type when the page has no type

It is confusing and unnecessary to print the page type when the page has no type.

v4: block: Make old dio use iov_iter_extract_pages() and page pinning

Here are three patches that go on top of the similar patches for bio structs now in the block tree that make the old block direct-IO code use iov_iter_extract_pages() and page pinning.

v1: tmpfs.5: extend with new noswap documentation

Linux commit 2c6efe9cf2d7 (“shmem: add support to ignore swap”) merged as of v6.4 added support to disable swap for tmpfs mounts.
This extends the man page to document that.

v3: mm: zswap: shrink until can accept

This update addresses an issue with the zswap reclaim mechanism, which hinders the efficient offloading of cold pages to disk, thereby compromising the preservation of the LRU order and consequently diminishing, if not inverting, its performance benefits.

v1: net-next: crypto, splice, net: Make AF_ALG handle sendmsg(MSG_SPLICE_PAGES)

Here’s the fourth tranche of patches towards providing a MSG_SPLICE_PAGES internal sendmsg flag that is intended to replace the ->sendpage() op with calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol that it should splice the pages supplied if it can.

v2: -next: memblock: unify memblock dump and debugfs show

There are two interfaces that show the memblock information, memblock_dump_all() and /sys/kernel/debug/memblock/, but their content is generated separately. Let’s unify them so that they do not diverge further over time.

v2: add support for blocksize > PAGE_SIZE

This is an initial attempt to add support for block size > PAGE_SIZE for tmpfs. Why would you want this? It helps us experiment with higher order folio uses with fs APIs and helps us test out corner cases which would likely need to be accounted for sooner or later if and when filesystems enable support for this. Better to review early and burn early than to continue in the wrong direction, so I am looking for early feedback.

v2: block: simplify with PAGE_SECTORS_SHIFT

A few block drivers have their own incantations with PAGE_SHIFT - SECTOR_SHIFT. Just simplify and use PAGE_SECTORS_SHIFT all over.
Based on linux-next next-20230525.
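
The relationship between the two shifts can be checked with simple arithmetic. The sketch below assumes 512-byte sectors (SECTOR_SHIFT of 9) and 4 KiB pages (PAGE_SHIFT of 12); the page size is architecture dependent, and the helper names are hypothetical.

```python
SECTOR_SHIFT = 9   # 512-byte sectors
PAGE_SHIFT = 12    # assumption: 4 KiB pages; this varies by architecture
PAGE_SECTORS_SHIFT = PAGE_SHIFT - SECTOR_SHIFT


def pages_to_sectors(pages):
    """Convert a page count to a sector count with a single shift."""
    return pages << PAGE_SECTORS_SHIFT


def sectors_to_pages(sectors):
    """Convert a sector count to whole pages with a single shift."""
    return sectors >> PAGE_SECTORS_SHIFT
```

Under these assumptions one page holds 8 sectors, so the named constant replaces the repeated subtraction without changing any result.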

v2: x86/mce: set MCE_IN_KERNEL_COPYIN for all MC-Safe Copy

Both EX_TYPE_FAULT_MCE_SAFE and EX_TYPE_DEFAULT_MCE_SAFE exception fixup types are used to identify fixups which allow in kernel #MC recovery, that is the Machine Check Safe Copy.

v4: mm, compaction: Skip all non-migratable pages during scan

Pages pinned in memory through extra refcounts can not be migrated. Currently as isolate_migratepages_block() scans pages for compaction, it skips any pinned anonymous pages. All non-migratable pages should be skipped and not just the anonymous pinned pages.

v16: Implement IOCTL to get and optionally clear info about PTEs

This syscall is used in Windows applications, games, etc., and is currently emulated in a pretty slow manner in userspace. Our purpose is to enhance the kernel so that it can be emulated efficiently. Currently, some out-of-tree hack patches are used to emulate it efficiently in some kernels, and we intend to replace those with these patches, so that gaming on Linux as a whole can benefit. It means there would be tons of users of this code.

v1: zonefs: Call zonefs_io_error() on any error from filemap_splice_read()

Call zonefs_io_error() after getting any error from filemap_splice_read() in zonefs_file_splice_read(), including non-fatal errors such as ENOMEM, EINTR and EAGAIN.

v1: mm/memcontrol: export memcg.swap watermark via sysfs for v2 memcg

This patch is similar to commit 8e20d4b33266 (“mm/memcontrol: export memcg->watermark via sysfs for v2 memcg”), but exports the swap counter’s watermark.

v5: mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8

Another version of the series reducing the kmalloc() minimum alignment on arm64 to 8 (from 128). Other architectures can easily opt in by defining ARCH_KMALLOC_MINALIGN as 8 and selecting DMA_BOUNCE_UNALIGNED_KMALLOC.

v1: net-next: splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 3

Here’s the third tranche of patches towards providing a MSG_SPLICE_PAGES internal sendmsg flag that is intended to replace the ->sendpage() op with calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol that it should splice the pages supplied if it can and copy them if not.

v3: Optimize mremap during mutual alignment within PMD

The main changes are:
Care to be taken to move purely within a VMA, in other words this check in call_align_down(): if (vma->vm_start <= addr_masked) return false;
As an example of why this is needed: Consider the following range which is 2MB aligned and is a part of a larger 10MB range which is not shown. Each character is 256KB below making the source and destination 2MB each. The lower case letters are moved (s to d) and the upper case letters are not moved.

v1: mm/slab: add new flag SLAB_NO_MERGE to avoid merging per slab

Add a flag that allows to disable merging per slab. This can be used for more fine grained control over the caches or for debugging builds where separate slabs can verify that no objects leak.
The slab_nomerge boot option is too coarse and would need to be enabled on all testing hosts. There are some other ways how to disable merging, e.g. a slab constructor but this disables poisoning besides that it adds additional overhead. Other flags are internal and may have other semantics.

v1: mm: deduct the number of pages reclaimed by madvise from workingset

The pages reclaimed by madvise_pageout are made inactive and forcefully dropped from the LRU, which leads to subsequently refaulted pages having a larger refault distance than they should. This could affect the accuracy of thrashing detection when madvise_pageout is used as a common way of reclaiming memory, as Android does now.

v4: net-next/mm: page_pool: new approach for leak detection and shutdown phase

Patchset change summary:
Remove PP workqueue and inflight warnings, instead rely on inflight pages to trigger cleanup
Moves leak detection to the MM-layer page allocator when combined with CONFIG_DEBUG_VM.

v1: mm/slab: rename CONFIG_SLAB to CONFIG_SLAB_DEPRECATED

As discussed at LSF/MM [1] [2] and with no objections raised there, deprecate the SLAB allocator. Rename the user-visible option so that users with CONFIG_SLAB=y get a new prompt with explanation during make oldconfig, while make olddefconfig will just switch to SLUB.

File Systems

v1: NFSD: recall write delegation on GETATTR conflict

This patch series adds the recall of write delegation when there is conflict with a GETATTR and a counter in /proc/net/rpc/nfsd to keep count of this recall.

v1: init: Add support for rootwait timeout parameter

Add an optional timeout arg to ‘rootwait’ as the maximum time in seconds to wait for the root device to show up before attempting forced mount of the root filesystem.
This can be helpful to force boot failure and restart in case the root device does not show up in time, allowing the bootloader to take any appropriate measures (e.g. recovery, A/B switch, retry…).

v1: block layer patches for bcachefs

Jens, here’s the full series of block layer patches needed for bcachefs:
Some of these (added exports, zero_fill_bio_iter?) can probably go with the bcachefs pull and I’m just including here for completeness. The main ones are the bio_iter patches, and the __invalidate_super() patch.

v2: Add support for Vendor Defined Error Types in Einj Module

This patchset adds support for Vendor Defined Error types in the einj module by exporting a binary blob file in module’s debugfs directory. Userspace tools can write OEM Defined Structures into the blob file as part of injecting Vendor defined errors.

v1: multiblock allocator improvements

So this patch was intended to remove a dead if-condition but it was not actually dead code and removing it was causing a performance regression. Unfortunately I somehow missed that when I was reviewing his patchset and it already went in so I had to revert the commit. I’ve added details of the regression and root cause in the revert commit. Also attaching the performance numbers I observed:

v4: bpf-next: Add O_PATH-based BPF_OBJ_PIN and BPF_OBJ_GET support

This feature is inspired as a result of recent conversations during LSF/MM/BPF 2023 conference about shortcomings of being able to perform BPF objects pinning only using lookup-based paths.

v1: fs: use UB-safe check for signed addition overflow in remap_verify_area

As loff_t is a signed type, we should use the safe overflow checks instead of relying on compiler implementation.
The bogus values are intentional and the test is supposed to verify the boundary conditions.
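
In C, signed overflow is undefined behaviour, so adding two loff_t values and then testing whether the sum went negative depends on what the compiler happens to do. One way to avoid that is to rearrange the comparison so that no intermediate value leaves the valid range. The sketch below models this idea in Python with a 64-bit limit; the function name is hypothetical and this is not the helper the patch itself uses.

```python
LLONG_MAX = 2**63 - 1  # largest value of a signed 64-bit type such as loff_t


def add_would_overflow(pos, length):
    """Return True if pos + length would exceed LLONG_MAX.

    Both arguments must be non-negative, so only the upper bound matters.
    The subtraction LLONG_MAX - pos can never overflow for pos >= 0.
    """
    if pos < 0 or length < 0:
        raise ValueError("pos and length must be non-negative")
    return length > LLONG_MAX - pos
```

Boundary values such as pos equal to LLONG_MAX are exactly the cases such a check has to handle without first computing an out-of-range sum.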

v3: arch: Make virt_to_pfn into a static inline

This is an attempt to harden the typing on virt_to_pfn() and pfn_to_virt().
Making virt_to_pfn() a static inline taking a strongly typed (const void *) makes the contract of passing a pointer of that type to the function explicit and exposes any misuse of the macro virt_to_pfn() acting polymorphic and accepting many types such as (void *), (uintptr_t) or (unsigned long) as arguments without warnings.

v21: block: Use page pinning

This patchset rolls page-pinning out to the bio struct and the block layer, using iov_iter_extract_pages() to get pages and noting with BIO_PAGE_PINNED if the data pages attached to a bio are pinned. If the data pages come from a non-user-backed iterator, then the pages are left unpinned and unref’d, relying on whoever set up the I/O to do the retaining.

Network Devices

v1: wifi: rsi: Do not configure WoWlan in shutdown hook if not enabled

In case WoWlan was never configured during the operation of the system, the hw->wiphy->wowlan_config will be NULL. rsi_config_wowlan() checks whether wowlan_config is non-NULL and if it is not, then WARNs about it. The warning is valid, as during normal operation the rsi_config_wowlan() should only ever be called with non-NULL wowlan_config. In shutdown this rsi_config_wowlan() should only ever be called if WoWlan was configured before by the user.

v1: net: ipa: Use the correct value for IPA_STATUS_SIZE

commit b8dc7d0eea5a7709bb534f1b3ca70d2d7de0b42c introduced IPA_STATUS_SIZE as a replacement for the size of the removed struct ipa_status. sizeof(struct ipa_status) was sizeof(__le32[8]), use this as IPA_STATUS_SIZE.

v1: net-next: liquidio: Use vzalloc()

Use vzalloc() instead of hand writing it with vmalloc()+memset(). This is less verbose.

v2: net-next: net: dsa: mv88e6xxx: implement USXGMII mode for mv88e6393x

Enable USXGMII mode for mv88e6393x chips. Tested on Marvell 88E6191X.

v2: net-next: netlink: specs: add ynl spec for ovs_flow

Add a ynl specification for ovs_flow. The spec is sufficient to dump ovs flows but some attrs have been left as binary blobs because ynl doesn’t support C arrays in struct definitions yet.

v1: net-next: net: phy: smsc: add WoL support to LAN8740/LAN8742 PHYs.

Microchip LAN8740/LAN8742 PHYs support basic unicast, broadcast, and Magic Packet WoL. They have one pattern filter matching up to 128 bytes of frame data, which can be used to implement ARP or multicast WoL.

v1: net: netlink: specs: correct types of legacy arrays

ethtool has some attrs which dump multiple scalars into an attribute. The spec currently expects one attr per entry.

v4: iproute2: vxlan: option printing

This patchset makes printing of vxlan details more consistent. It also adds extra verbose output. The boolean options are now printed after all the non-boolean options.

v1: net: tcp: deny tcp_disconnect() when threads are waiting

Historically connect(AF_UNSPEC) has been abused by syzkaller and other fuzzers to trigger various bugs.
A recent one triggers a divide-by-zero [1], and Paolo Abeni was able to diagnose the issue.

v1: net: af_packet: do not use READ_ONCE() in packet_bind()

A recent patch added READ_ONCE() in packet_bind() and packet_bind_spkt().
This is better handled by reading pkt_sk(sk)->num later in packet_do_bind() while the appropriate lock is held.
READ_ONCE() in writers is often evidence of something being wrong.
Fixes: 822b5a1c17df (“af_packet: Fix data-races of pkt_sk(sk)->num.”)

v2: iproute2: Add ability to specify eBPF map pin path

We have a use case where we have several different applications composed of sets of eBPF programs (programs that may be attached at the TC/XDP layers), that need to share maps and not conflict with each other.

v1: net: usb: qmi_wwan: Set DTR quirk for BroadMobi BM818

BM818 is based on Qualcomm MDM9607 chipset.

v1: net-next: devlink: Spelling corrections

Make some minor spelling corrections in comments.
Found by inspection.

v1: bpf: netfilter: add BPF_NETFILTER bpf_attach_type

Andrii Nakryiko writes:
And we currently don’t have an attach type for NETLINK BPF link. Thankfully it’s not too late to add it. I see that link_create() in kernel/bpf/syscall.c just bypasses attach_type check. We shouldn’t have done that. Instead we need to add BPF_NETLINK attach type to enum bpf_attach_type. And wire all that properly throughout the kernel and libbpf itself.

v1: net-next: net: dpaa2-mac: use correct interface to free mdiodev

Rather than using put_device(&mdiodev->dev), use the proper interface provided to dispose of the mdiodev - that being mdio_device_free().

v1: net: rxrpc: Truncate UTS_RELEASE for rxrpc version

UTS_RELEASE has a maximum length of 64 which can cause rxrpc_version to exceed the 65 byte message limit.
Per the rx spec[1]: “If a server receives a packet with a type value of 13, and the client-initiated flag set, it should respond with a 65-byte payload containing a string that identifies the version of AFS software it is running.”

v1: net-next: net: pcs: add helpers to xpcs and lynx to manage mdiodev

This morning, we have had two instances where the destruction of the MDIO device associated with XPCS and Lynx has been wrong. Rather than allowing this pattern of errors to continue, let’s make it easier for driver authors to get this right by adding a helper.

v2: net/sched: act_pedit: Parse L3 Header for L4 offset

Instead of relying on skb->transport_header being set correctly, opt instead to parse the L3 header length out of the L3 headers for both IPv4/IPv6 when the Extended Layer Op for tcp/udp is used. This fixes a bug that occurs when GRO is disabled: in that case skb->transport_header is set by __netif_receive_skb_core() to point to the L3 header and is only fixed later by the upper protocol layers, but act_pedit receives the SKB before the fixups are completed.
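
Deriving the L4 offset from the L3 header itself can be sketched as follows. This is a simplified Python model, not the act_pedit code: it reads the IPv4 IHL field, treats the IPv6 header as a fixed 40 bytes and ignores IPv6 extension headers, which is an assumption of this example. The function name is hypothetical.

```python
IPV6_HEADER_LEN = 40  # fixed IPv6 header; extension headers are ignored here


def l4_offset(l3_header):
    """Return the offset of the L4 header from the start of the L3 header."""
    version = l3_header[0] >> 4
    if version == 4:
        ihl = l3_header[0] & 0x0F  # IPv4 header length in 32-bit words
        if ihl < 5:
            raise ValueError("invalid IPv4 IHL")
        return ihl * 4
    if version == 6:
        return IPV6_HEADER_LEN
    raise ValueError("not an IPv4 or IPv6 header")
```

An IPv4 header without options starts with the byte 0x45 and yields an offset of 20, independent of whatever value a transport header field in the packet metadata currently holds.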

v1: net-next: support non-frag page for page_pool_alloc_frag()

In [1], there is a use case to use frag support in page pool to reduce memory usage, and it may request different frag size depending on the head/tail room space for xdp_frame/shinfo and mtu/packet size. When the requested frag size is large enough that a single page can not be split into more than one frag, using frag support only incurs a performance penalty because of the extra frag count handling.

v3: Add motorcomm phy pad-driver-strength-cfg support

The motorcomm phy (YT8531) supports the ability to adjust the drive strength of the rx_clk/rx_data, and the default strength may not be suitable for all boards. So add configurable options to better match the boards (e.g. StarFive VisionFive 2).
The first patch adds a description of the dt-binding, and the second patch adds YT8531’s parsing and settings for pad-driver-strength-cfg.

v7: net-next: Wangxun netdev features support

Implement tx_csum and rx_csum to support hardware checksum offload. Implement ndo_vlan_rx_add_vid and ndo_vlan_rx_kill_vid. Implement ndo_set_features. Enable macros in netdev features which Wangxun can support.

v3: hv_netvsc: Allocate rx indirection table size dynamically

Allocate the size of rx indirection table dynamically in netvsc from the value of size provided by OID_GEN_RECEIVE_SCALE_CAPABILITIES query instead of using a constant value of ITAB_NUM.

v1: Truncate UTS_RELEASE for rxrpc version

UTS_RELEASE has maximum length of 64 which can cause rxrpc_version to exceed the 65 byte message limit.
Per https://web.mit.edu/kolya/afs/rx/rx-spec “If a server receives a packet with a type value of 13, and the client-initiated flag set, it should respond with a 65-byte payload containing a string that identifies the version of AFS software it is running.”

Security Hardening

v2: checkpatch: Check for strcpy and strncpy too

Warn about strcpy(), strncpy(), and strlcpy(). Suggest strscpy() and include pointers to the open KSPP issues for each, which has further details and replacement procedures.

v2: leds: as3645a: Replace strlcpy with strscpy

Part of a tree-wide effort to remove deprecated strlcpy()[1] and replace it with strscpy()[2]. No return values were used, so direct replacement is safe.

v1: next: nfsd: Replace one-element array with flexible-array member

One-element arrays are deprecated, and we are replacing them with flexible array members instead. So, replace a one-element array with a flexible-array member in struct vbi_anc_data and refactor the rest of the code, accordingly.

v1: next: media: pci: cx18-av-vbi: Replace one-element array with flexible-array member

One-element arrays are deprecated, and we are replacing them with flexible array members instead. So, replace one-element arrays with flexible-array members in struct vbi_anc_data.

v2: next: scsi: lpfc: Use struct_size() helper

Prefer struct_size() over open-coded versions of idiom:
sizeof(struct-with-flex-array) + sizeof(typeof-flex-array-elements) * count
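
The difference between the helper and the open-coded idiom shows up when the multiplication overflows. The sketch below is a Python model that assumes a 64-bit size_t: the open-coded form wraps around, while the struct_size() model saturates to SIZE_MAX so that a subsequent allocation would fail instead of being silently too small. The parameter names are hypothetical.

```python
SIZE_MAX = 2**64 - 1  # assumption: 64-bit size_t


def struct_size(header_size, element_size, count):
    """Size of a struct with a trailing flexible array of `count` elements.

    Saturates to SIZE_MAX if the result does not fit in size_t.
    """
    total = header_size + element_size * count
    return total if total <= SIZE_MAX else SIZE_MAX


def open_coded_size(header_size, element_size, count):
    """The open-coded idiom, which wraps around modulo 2**64 on overflow."""
    return (header_size + element_size * count) & SIZE_MAX
```

For ordinary counts both functions agree. For a huge count such as 2**61 eight-byte elements, the open-coded version wraps to a tiny value while the saturating version reports SIZE_MAX.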

v2: fscrypt: Replace 1-element array with flexible array

1-element arrays are deprecated and are being replaced with C99 flexible arrays[1].
As sizes were being calculated with the extra byte intentionally, propagate the difference so there is no change in binary output.
[1] https://github.com/KSPP/linux/issues/79

v1: next: vfio/ccw: Use struct_size() helper

Prefer struct_size() over open-coded versions.

v1: next: vfio/ccw: Replace one-element array with flexible-array member

One-element arrays are deprecated, and we are replacing them with flexible array members instead. So, replace one-element array with flexible-array member in struct vfio_ccw_parent and refactor the rest of the code accordingly.

v1: lkdtm/bugs: Switch from 1-element array to flexible array

The testing for ARRAY_BOUNDS just wants an uninstrumented array, and the proper flexible array definition is fine for that.

v2: md/raid5: Convert stripe_head’s “dev” to flexible array member

Replace old-style 1-element array of “dev” in struct stripe_head with modern C99 flexible array. In the future, we can additionally annotate it with the run-time size, found in the “disks” member.

v1: overflow: Add struct_size_t() helper

While struct_size() is normally used in situations where the structure type already has a pointer instance, there are places where no variable is available. In the past, this has been worked around by using a typed NULL first argument, but this is a bit ugly. Add a helper to do this, and replace the handful of instances of the code pattern with it.

Asynchronous I/O

v2: io_uring: unlock sqd->lock before sq thread release CPU

The sq thread actively releases CPU resources by calling the cond_resched() and schedule() interfaces when it is idle. Therefore, more resources are available for other threads to run.

Rust For Linux

v2: scripts: read cfgs from Makefile for rust-analyzer

Both core and alloc had their cfgs missing in rust-project.json. To remedy this, generate_rust_analyzer.py scans the Makefile inside the rust directory and adds them to a dictionary in which each key corresponds to a crate and each value to an array of cfgs.

BPF

v1: bpf-next: bpf: replace open code with for allocated object check

Since commit 282de143ead9 (“bpf: Introduce allocated objects support”), (PTR_TO_BTF_ID | MEM_ALLOC) has been the way to check whether a type is an allocated object in a BPF program.

v1: bpf-next: bpf, vmtest: Build test_progs and friends as statically linked

With TRUNNER_LDFLAGS specified from vmtest to force static linking, runners like test_progs, test_maps, etc. work just fine.

v6: RESEND: libbpf: kprobe.multi: Filter with available_filter_functions

When using regular expression matching with “kprobe multi”, it scans all the functions under “/proc/kallsyms” that can be matched. However, not all of them can be traced by kprobe.multi. If any one of the functions fails to be traced, it will result in the failure of all functions. The best approach is to filter out the functions that cannot be traced to ensure proper tracking of the functions.
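
The filtering idea can be sketched as an intersection between the symbols that match the pattern and the symbols listed as traceable. This is a Python model, not libbpf code: the glob-style pattern, the function name and the symbol lists are assumptions made for illustration, with the traceable list standing in for the contents of available_filter_functions.

```python
from fnmatch import fnmatchcase


def select_traceable(pattern, kallsyms, available):
    """Return symbols matching `pattern` that are also listed as traceable."""
    traceable = set(available)
    return [sym for sym in kallsyms
            if fnmatchcase(sym, pattern) and sym in traceable]


# Hypothetical data: vfs_fsync is visible as a symbol but not traceable.
all_symbols = ["vfs_read", "vfs_write", "vfs_fsync", "tcp_sendmsg"]
traceable_symbols = ["vfs_read", "vfs_write", "tcp_sendmsg"]
```

Attaching only to the filtered list avoids the case where one untraceable match causes the whole multi-attach to fail.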

v5: libbpf: kprobe.multi: Filter with available_filter_functions

When using regular expression matching with “kprobe multi”, it scans all the functions under “/proc/kallsyms” that can be matched. However, not all of them can be traced by kprobe.multi. If any one of the functions fails to be traced, it will result in the failure of all functions. The best approach is to filter out the functions that cannot be traced to ensure proper tracking of the functions.

v1: Type aware module allocator

This set implements the second part of module type aware allocator (module_alloc_type), which was discussed in [1]. This part contains the interface of the new allocator, as well as changes in x86 code to use the new allocator (modules, BPF, ftrace, kprobe).

v1: dwarves: pahole: avoid adding same struct structure to two rb trees

This commit modifies resort_classes() to re-use ‘structures__tree’ and to reset ‘rb_node’ fields before adding structure instances to the tree for a second time.

v1: bpf-next: selftests/bpf: Check whether to run selftest

The sockopt test invokes test__start_subtest and then unconditionally asserts the success. That means that even if deny-listed, any test will still run and potentially fail. Evaluate the return value of test__start_subtest() to achieve the desired behavior, as other tests do.

v1: bpf: utilize table ID in bpf_fib_lookup helper

This patchset adds the ability to specify a table ID to the bpf_fib_lookup BPF helper.
A new tbid field is added to struct fib_bpf_lookup. When the fib_bpf_lookup helper is called with the BPF_FIB_LOOKUP_DIRECT flag and the tbid is set to an integer greater than 0, the tbid field will be interpreted as the table ID to use for the fib lookup.

v1: bpf-next: libbpf: add netfilter link attach helper

When initial netfilter bpf program type support got added one suggestion was to extend libbpf with a helper to ease attachment of nf programs to the hook locations.
Add such a helper and a demo test case that attaches a dummy program to various combinations.

v1: bpf-next: bpf: Export rx queue info for reuseport ebpf prog

BPF_PROG_TYPE_SK_REUSEPORT / sk_reuseport ebpf programs do not have access to the queue_mapping or napi_id of the incoming skb. Having this information can help ebpf progs determine which listen socket to select.

v1: bpf-next: libbpf: change var type in datasec resize func

This changes a local variable type that stores a new array id to match the return type of btf__add_array().

v1: bpf-next: Relax checks for unprivileged bpf() commands

During last relaxation of bpf syscall’s capabilities checks ([0]), the model of FD-based ownership was established: if process through whatever means got FD for some BPF object (map, prog, etc), it should be able to perform operations on this object without extra CAP_SYS_ADMIN or CAP_BPF capabilities.

v1: bpf-next: Revamp bpf_attr and make it easier to evolve

RFC patch set revamping anonymous substructs of union bpf_attr, which would allow nicer and more coherent evolution of bpf() syscall arguments, especially for commands like BPF_MAP_CREATE and BPF_PROG_LOAD. See patch #1 for justification and more details. Patch #2 demonstrates how straightforward it is to switch to new-style substructs in kernel code (and keep in mind that this is optional until we need some new field for a given command, so we can do it completely asynchronously from landing bpf_attr changes themselves). Patch #3 shows similar libbpf changes, except that for libbpf a single patch switches the entire libbpf code base over to new-style substructs (except skel_internal.h, due to concerns that users might be reliant on an outdated system-wide linux/bpf.h UAPI header).

v3: bpf-next: libbpf: capability for resizing datasec maps

Due to the way the datasec maps like bss, data, rodata are memory mapped, they cannot be resized with bpf_map__set_value_size() like non-datasec maps can. This series offers a way to allow the resizing of datasec maps, by having the mapped regions resized as needed and also adjusting associated BTF info if possible.

v3: dwarves: Support for new btf_type_tag encoding

In recent discussion in BPF mailing list ([1], look for Solution #2) participants agreed to add a new DWARF representation for “btf_type_tag” annotations.
Existing representation is DW_TAG_LLVM_annotation object attached as a child to a DW_TAG_pointer_type. It means that “btf_type_tag” annotation is attached to a pointee type.

v1: libbpf: kprobe.multi: Filter with blacklist and available_filter_functions

When using regular expression matching with “kprobe multi”, it scans all the functions under “/proc/kallsyms” that can be matched. However, not all of them can be traced by kprobe.multi. If any one of the functions fails to be traced, it will result in the failure of all functions. The best approach is to filter out the functions that cannot be traced to ensure proper tracking of the functions.

v1: Bring back vmlinux.h generation

Commit 760ebc45746b (“perf lock contention: Add empty ‘struct rq’ to satisfy libbpf ‘runqueue’ type verification”) inadvertently created a declaration of ‘struct rq’ that conflicted with a generated vmlinux.h’s:
Fix the issue by moving the declaration to vmlinux.h. So this can’t happen again, bring back build support for generating vmlinux.h then add build tests.

Related Technology News

Qemu

v2: target/riscv: Add RISC-V Virtual IRQs and IRQ filtering support

This series adds M and HS-mode virtual interrupt and IRQ filtering support. This allows inserting virtual interrupts from M/HS-mode into S/VS-mode using the mvien/hvien and mvip/hvip CSRs. IRQ filtering is a use case of this change, i.e. M-mode can stop delegating an interrupt to S-mode and instead enable it in MIE, receive those interrupts in M-mode and then selectively inject the interrupt using mvien and mvip.

v5: hw/riscv/virt: pflash improvements

This series improves the pflash usage in the RISC-V virt machine with solutions to the issues below.
1) Currently the first pflash is reserved for ROM/M-mode firmware code. But S-mode payload firmware like EDK2 needs both pflash devices to have separate code and variable store so that OS distros can keep the FW code as read-only.

v3: target/riscv: Add support for PC-relative translation

This patchset tries to add support for PC-relative translation.
The existence of CF_PCREL can improve performance with the guest kernel’s address space randomization. Each guest process maps libc.so (et al) at a different virtual address, and this allows those translations to be shared.

v3: Add RISC-V KVM AIA Support

This series adds support for KVM AIA in RISC-V architecture.
In order to test these patches, we require Linux with KVM AIA support which can be found in the qemu_kvm_aia branch at https://github.com/yong-xuan/linux.git. This kernel branch is based on the riscv_aia_v1 branch available at https://github.com/avpatel/linux.git, and it also includes two additional patches that fix a KVM AIA bug and reply to the query of KVM_CAP_IRQCHIP.

v3: hw/riscv: virt: Assume M-mode FW in pflash0 only when “-bios none”

Currently, virt machine supports two pflash instances each with 32MB size. However, the first pflash is always assumed to contain M-mode firmware and reset vector is set to this if enabled. Hence, for S-mode payloads like EDK2, only one pflash instance is available for use. This means both code and NV variables of EDK2 will need to use the same pflash.

v3: target/riscv: Add Smrnmi support.

This patchset added support for Smrnmi Extension in RISC-V.
RNMI also has higher priority than any other interrupts or exceptions and cannot be disabled by software.
RNMI may be used to route to other devices such as Bus Error Unit or Watchdog Timer in the future.

U-Boot

v1: riscv: Initial support for Lichee PI 4A board

Sipeed’s Lichee PI 4A board is based on T-HEAD’s TH1520 SoC which consists of quad core XuanTie C910 CPU, plus one C906 CPU and one E902 CPU.
In this series, the UART, basic device tree, CPU, PLIC are enabled, making it capable of running in serial console mode.

v4: Add ethernet driver for StarFive JH7110 SoC

This series of patches is based on the latest master branch and adds ethernet support for the StarFive JH7110 RISC-V SoC. The series includes EEPROM, PHY and MAC drivers. The PHY model is YT8531 (from Motorcomm Inc), and the MAC version is dwmac-5.20 (from Synopsys DesignWare).

v1: arch: riscv: jh7110: Correctly zero L2 LIM

Background information: JH7110 SPL runs in L2 LIM (2M in size mapped at 0x8000000). It consists of 16 0x20000 sized regions, each one can be used as either L2 cache way or SRAM (not both). From top to bottom, there’re ways 0-15. The way 0 is always enabled, at most 0x1e0000 can be used.
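
The sizes quoted above follow directly from the region layout. The sketch below only reproduces that arithmetic; the constant and function names are hypothetical, and it says nothing about which addresses each way occupies.

```python
L2_LIM_BASE = 0x8000000  # mapping address quoted above
WAY_SIZE = 0x20000       # size of one region
NUM_WAYS = 16            # ways 0-15


def lim_total_size():
    """Total size of the L2 LIM window when all regions are counted."""
    return NUM_WAYS * WAY_SIZE


def lim_usable_size():
    """Size usable as SRAM, given that way 0 always stays a cache way."""
    return (NUM_WAYS - 1) * WAY_SIZE
```

Sixteen regions of 0x20000 bytes give the 2M window, and reserving way 0 for the cache leaves 0x1e0000 bytes for the SPL.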
