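RISC-V Linux Kernel and Related Technology News, Issue 49
Date: 20230611
Editor: 晓依
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS
Kernel News
RISC-V Architecture Support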
v3: Add D1/T113s thermal sensor controller support
This series adds support for the Allwinner D1/T113s thermal sensor controller. This controller is similar to the one on H6, but has only one sensor and uses different scale and offset values.
v5: Add support for Allwinner GPADC on D1/T113s/R329/T507 SoCs
This series adds support for general purpose ADC (GPADC) on new Allwinner’s SoCs, such as D1, T113s, T507 and R329. The implemented driver provides basic functionality for getting ADC channels data.
v1: dt-bindings: riscv: cpus: switch to unevaluatedProperties: false
Do the various bits needed to drop the additionalProperties: true that we currently have in riscv/cpu.yaml, to permit actually enforcing what people put in cpus nodes.
v1: riscv: move memblock_allow_resize() after lm is ready
The initial memblock metadata is accessed from kernel image mapping. The regions arrays need to be “reallocated” from memblock and accessed through linear mapping to cover more memblock regions. So the resizing should not be allowed until linear mapping is ready. Note that there are memblock allocations when building linear mapping.
v3: RISCV: Add KVM_GET_REG_LIST API
KVM_GET_REG_LIST will dump all register IDs that are available to KVM_GET/SET_ONE_REG, and it’s very useful for identifying platform regression issues during VM migration.
v2: arch: allow pte_offset_map[_lock]() to fail
Here is v2 series of patches to various architectures, based on v6.4-rc5: preparing for v2 of changes following in mm, affecting pte_offset_map() and pte_offset_map_lock(). There are very few differences from v1: noted patch by patch below.
v2: dt-bindings: riscv: deprecate riscv,isa
When the RISC-V dt-bindings were accepted upstream in Linux, the base ISA etc had yet to be ratified. By the ratification of the base ISA, incompatible changes had snuck into the specifications - for example the Zicsr and Zifencei extensions were spun out of the base ISA.
Patch “riscv: vmlinux.lds.S: Explicitly handle ‘.got’ section” has been added to the 6.3-stable tree
This is a note to let you know that I’ve just added the patch titled
riscv: vmlinux.lds.S: Explicitly handle '.got' section
to the 6.3-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is: riscv-vmlinux.lds.s-explicitly-handle-.got-section.patch and it can be found in the queue-6.3 subdirectory.
v1: riscv: reserve DTB before possible memblock allocation
It’s possible that early_init_fdt_scan_reserved_mem() allocates memory from memblock for dynamic reserved memory in the /reserved-memory node. Any fixed reservation must be done before that to avoid potential conflicts.
v3: tools/nolibc: add a new syscall helper
This is the revision of the v2 syscall helpers [1], it is based on -ENOSYS patchset [3], so, it is ok to simply merge both of them.
This revision mainly applies Thomas’ method: it removes the __syscall() helper and replaces it with __sysret(), because __syscall() looks like _syscall() and syscall() and may mislead developers.
v4: nolibc: add part2 of support for rv32
This is the v4 part2 of support for rv32 (v3 [1]); it applies the suggestions from Thomas, Arnd [2] and you [3]. Now the rv32 compile support is almost aligned with x86, except for the extra KARCH to make the kernel happy. Thanks very much for your nice review!
The new virtual kernel location is limited by the early page table that only has one PUD and with the PMD alignment constraint, the kernel can only take < 512 positions.
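The bound above follows from simple arithmetic. A minimal sketch, assuming 4 KiB base pages so that one PMD entry maps 2 MiB and one PUD entry maps 1 GiB; these sizes and the helper name `kaslr_slots` are assumptions for illustration and are not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for 4 KiB base pages with 512-entry page tables. */
#define PMD_SIZE_BYTES (UINT64_C(2) << 20)  /* 2 MiB mapped by one PMD entry */
#define PUD_SIZE_BYTES (UINT64_C(1) << 30)  /* 1 GiB mapped by one PUD entry */

/*
 * Hypothetical helper: number of PMD-aligned start offsets inside one
 * PUD-sized window that still leave room for an image of image_size bytes.
 */
uint64_t kaslr_slots(uint64_t image_size)
{
    assert(image_size > 0 && image_size <= PUD_SIZE_BYTES);
    /* Round the image up to whole PMD-sized blocks. */
    uint64_t blocks = (image_size + PMD_SIZE_BYTES - 1) / PMD_SIZE_BYTES;
    return PUD_SIZE_BYTES / PMD_SIZE_BYTES - blocks + 1;
}
```

Under these assumptions the window holds 512 PMD-sized blocks, so a 20 MiB image (10 blocks) would leave 503 candidate positions.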
v4: Add JH7110 cpufreq support
This patchset adds the compatible strings into the allowlist for supporting the generic cpufreq driver on JH7110 SoC. Also, it enables the axp15060 pmic for the cpu power source.
v2: tools/nolibc: add two new syscall helpers
This is the revision of the v1 syscall helpers [1], just rebased it on patchset [3], so, it is ok to simply merge both of them.
This revision mainly applied your suggestions of v1, both of the syscall return and call helpers are simplified or cleaned up.
v2: Documentation: RISC-V: patch-acceptance: mention patchwork’s role
Palmer suggested at some point, not sure if it was in one of the weekly linux-riscv syncs, or a conversation at FOSDEM, that we should document the role of the automation running on our patchwork instance plays in patch acceptance.
v3: gpio: sifive: Add missing check for platform_get_irq
Add the missing check for platform_get_irq() and return the error code if it fails. The returned error code will be handled in builtin_platform_driver(sifive_gpio_driver) and the driver will not be registered.
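The pattern being added is "check the return value and propagate a negative errno". A userspace sketch of that pattern follows; `fake_platform_get_irq()` and `fake_probe()` are hypothetical stand-ins, since the real platform_get_irq() only exists inside the kernel:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for platform_get_irq(): IRQ number, or negative errno. */
int fake_platform_get_irq(int index)
{
    static const int irqs[] = { 34, 35 };   /* made-up IRQ numbers */
    if (index < 0 || index >= (int)(sizeof(irqs) / sizeof(irqs[0])))
        return -ENXIO;
    return irqs[index];
}

/* Hypothetical probe step: collect ngpio IRQs, stop at the first failure. */
int fake_probe(int ngpio, int *irq_out)
{
    assert(irq_out != NULL);
    for (int i = 0; i < ngpio; i++) {
        int ret = fake_platform_get_irq(i);
        if (ret < 0)
            return ret;   /* propagate the error instead of storing it */
        irq_out[i] = ret;
    }
    return 0;
}
```

Without the check, the negative value would be stored as if it were a valid IRQ number.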
v2: perf parse-regs: Refactor architecture functions
This patch series is to refactor arch related functions for register parsing, which follows up the discussion for v1: https://lore.kernel.org/lkml/20230520025537.1811986-1-leo.yan@linaro.org/
v1: 6.3: riscv: vmlinux.lds.S: Explicitly handle ‘.got’ section
This is not an issue in mainline because handling of the .got section was added by commit 39b33072941f (“riscv: Introduce CONFIG_RELOCATABLE”) and further extended by commit 26e7aacb83df (“riscv: Allow to downgrade paging mode from the command line”) in 6.4-rc1. Neither of these changes are suitable for stable, so add explicit handling of the .got section in a standalone change to align 6.3 and mainline, which addresses the warning.
v21: -next: riscv: Add vector ISA support
This is the v21 patch series for adding Vector extension support in Linux. Please refer to [1] for the introduction of the patchset. The v21 patch series was aimed to solve build issues from v19, provide usage guideline for the prctl interface, and address review comments on v20.
v3: gpio: ath79: Add missing check for platform_get_irq
Add the missing check for platform_get_irq() and return error if it fails.
Process Scheduling
v1: sched/deadline: merge __dequeue_dl_entity() into its sole caller
Sole caller dequeue_dl_entity() calls __dequeue_dl_entity() directly. So __dequeue_dl_entity() can be merged into its sole caller. No functional change intended.
v1: net/sched: act_pedit: Use kmemdup() to replace kmalloc + memcpy
./net/sched/act_pedit.c:245:21-28: WARNING opportunity for kmemdup.
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5478
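What the warning asks for can be illustrated in userspace. The sketch below is an analogue of kmemdup() built on malloc() and memcpy(); the name `memdup_sketch` is hypothetical and the kernel helper additionally takes GFP flags:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace analogue of kmemdup(): allocate len bytes and copy src into them. */
void *memdup_sketch(const void *src, size_t len)
{
    assert(src != NULL);
    void *dst = malloc(len ? len : 1);   /* avoid a zero-sized malloc */
    if (dst)
        memcpy(dst, src, len);
    return dst;                          /* NULL on allocation failure */
}
```

One call replaces the allocate-then-copy pair, which removes the chance of the two sizes drifting apart.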
v1: sched/wait: Determine whether the wait queue is empty before waking up
When we did some benchmark tests (such as pipe tests), we found that the wake behavior was still triggered when the wait queue was empty, even though it would exit later.
v3: net/sched: Set the flushing flags to false to prevent an infinite loop and add one test to tdc
[root@localhost tc-testing]# ./tdc.py -f tc-tests/infra/filter.json
Test c2b4: Adding a new filter after flushing empty chain doesn’t cause an infinite loop
All test results:
1..1
ok 1 c2b4 - Adding a new filter after flushing empty chain doesn’t cause an infinite loop
v2: sched/nohz: Add HRTICK_BW for using cfs bandwidth with nohz_full
CFS bandwidth limits and NOHZ full don’t play well together. Tasks can easily run well past their quotas before a remote tick does accounting. This leads to long, multi-period stalls before such tasks can run again. Use the hrtick mechanism to set a sched tick to fire at remaining_runtime in the future if we are on a nohz full cpu, if the task has quota and if we are likely to disable the tick (nr_running == 1). This allows for bandwidth accounting before tasks go too far over quota.
v1: sched/idle: disable tick in idle=poll idle entry
Commit a5183862e76fdc25f36b39c2489b816a5c66e2e5 (“tick/nohz: Conditionally restart tick on idle exit”) allows a nohz_full CPU to enter idle and return from it with the scheduler tick disabled (since the tick might be undesired noise).
The idle=poll case still unconditionally restarts the tick when entering idle.
To reduce the noise for that case as well, stop the tick when entering idle, for the idle=poll case.
v1: sched/debug,sched/core: Reset hung task detector while processing sysrq-t
On devices with multiple CPUs and multiple processes, outputting lengthy sysrq-t content on a slow serial port can consume a significant amount of time. We need to reset the hung task detector to avoid false hung task alerts.
v1: net/sched: Set the flushing flags to false to prevent an infinite loop
On 06/06/2023 11:45, renmingshuai wrote:
When a new chain is added by using tc, one soft lockup alarm will be generated after deleting the prio 0 filter of the chain. To reproduce the problem, perform the following steps: (1) tc qdisc add dev eth0 root handle 1: htb default 1 (2) tc chain add dev eth0 (3) tc filter del dev eth0 chain 0 parent 1: prio 0 (4) tc filter add dev eth0 chain 0 parent 1:
v1: sched: use kmem_cache_zalloc() to zero allocated tg
It’s more convenient to use kmem_cache_zalloc() to allocate zeroed tg. No functional change intended.
Memory Management
v2: lib: Replace kmap() with kmap_local_page()
kmap() has been deprecated in favor of the kmap_local_page() due to high cost, restricted mapping space, the overhead of a global lock for synchronization, and making the process sleep in the absence of free slots.
v1: mm: compaction: mark kcompactd_run() and kcompactd_stop() __meminit
Add __meminit to kcompactd_run() and kcompactd_stop() to ensure they default to __init when memory hotplug is not enabled.
v1: mm: hugetlb: Add Kconfig option to set default nr_overcommit_hugepages
The default kernel configuration does not allow any huge page allocation until after setting nr_hugepages or nr_overcommit_hugepages to a non-zero value; without setting those, mmap attempts with MAP_HUGETLB will always fail with -ENOMEM. nr_overcommit_hugepages allows userspace to attempt to allocate huge pages at runtime, succeeding if the kernel can find or assemble a free huge page.
v1: mm/khugepaged: use DEFINE_READ_MOSTLY_HASHTABLE macro
These are equivalent, but DEFINE_READ_MOSTLY_HASHTABLE exists to define a hashtable in the .data..read_mostly section.
v3: mm/folio: Avoid special handling for order value 0 in folio_set_order
folio_set_order(folio, 0) is used in the kernel in two places: __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio. Currently, it is called to clear out folio->_folio_nr_pages and folio->_folio_order.
v2: Optimize the fast path of mas_store()
Add fast paths for mas_wr_append() and mas_wr_slot_store() respectively. The newly added fast path of mas_wr_append() is used in fork() and how much it benefits fork() depends on how many VMAs are duplicated.
v1: net-next: splice, net: Some miscellaneous MSG_SPLICE_PAGES changes
Now that the splice_to_socket() has been rewritten so that nothing now uses the ->sendpage() file op[1], some further changes can be made, so here are some miscellaneous changes that can now be done.
v1: mm: compaction: skip memory hole rapidly when isolating migratable pages
On some machines, the normal zone can have a large memory hole; in the reported layout, the range from 0x100000000 to 0x1800000000 is a hole, so the isolation scanner can meet the hole and take more time to skip it. From my measurement, the isolation scanner takes 80us to 100us to skip the large hole [0x100000000 - 0x1800000000].
v2: mm/vmalloc: Replace the ternary conditional operator with min()
It would be better to replace the traditional ternary conditional operator with min() in zero_iter
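The rewrite is mechanical. A small sketch of the before and after shape, using a simplified `MIN_SKETCH` macro and a hypothetical `zero_chunk_len()` helper; the kernel's real min() also type-checks its arguments and evaluates them once, which this stand-in does not:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's min(); evaluates its arguments twice. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/* Bytes to zero in the current chunk: whatever remains, capped at chunk. */
size_t zero_chunk_len(size_t remaining, size_t chunk)
{
    assert(chunk > 0);
    /* Before: remaining > chunk ? chunk : remaining */
    return MIN_SKETCH(remaining, chunk);
}
```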
v1: net-next: sock: Propose socket.urgent for sockmem isolation
This is just a PoC patch intended to resume the discussion about tcpmem isolation opened by Google in LPC’22 [1].
We are facing the same problem that the global shared threshold can cause isolation issues. Low priority jobs can hog TCP memory and adversely impact higher priority jobs. What’s worse is that these low priority jobs usually have smaller cpu weights leading to poor ability to consume rx data.
v1: revert shrinker_srcu related changes
Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec test case [1], which is caused by commit f95bdb700bc6 (“mm: vmscan: make global slab shrink lockless”). The root cause is that SRCU has to be careful to not frequently check for SRCU read-side critical section exits. Therefore, even if no one is currently in the SRCU read-side critical section, synchronize_srcu() cannot return quickly. That’s why unregister_shrinker() has become slower.
v6: mm: ioremap: Convert architectures to take GENERIC_IOREMAP way
Currently, many architectures haven’t taken the standard GENERIC_IOREMAP way to implement ioremap_prot(), iounmap(), and ioremap_xx(), but implement these functions separately under each arch’s folder. This causes a lot of duplicated code for ioremap() and iounmap().
v1: watchdog/mm: Allow dumping memory info in pretimeout
On my (embedded) systems, the most common cause of hitting the watchdog (pre)timeout is due to thrashing. Diagnosing these problems is hard without knowing the memory state at the point of the watchdog hit. In order to make this information available, add a module parameter to the watchdog pretimeout panic governor to ask it to dump memory info and the OOM task list (using a new helper in the OOM code) before triggering the panic.
v1: mm/min_free_kbytes: modify min_free_kbytes calculation rules
The current calculation of min_free_kbytes only uses ZONE_DMA and ZONE_NORMAL pages, but the ZONE_MOVABLE zone->_watermark[WMARK_MIN] will also divide part of min_free_kbytes. This will cause the min watermark of ZONE_NORMAL to be too small in the presence of ZONE_MOVABLE.
v1: mm: kill [add | del]_page_to_lru_list() (http://lore.kernel.org/linux-mm/20230609013901.79250-1-wangkefeng.wang@huawei.com/)
Directly call lruvec_del_folio(), and drop unused page interfaces.
v2: mm: allow pte_offset_map[_lock]() to fail
Here is v2 series of patches to mm, based on v6.4-rc5: preparing for v2 effective changes to follow, probably next week (when I hope s390 will be sorted), affecting pte_offset_map() and pte_offset_map_lock(). There are very few differences from v1: noted patch by patch below.
v1: udmabuf: revert ‘Add support for mapping hugepages (v4)’
This effectively reverts commit 16c243e99d33 (“udmabuf: Add support for mapping hugepages (v4)”). Recently, Junxiao Chang found a BUG with page map counting as described here [1]. This issue pointed out that the udmabuf driver was making direct use of subpages of hugetlb pages. This is not a good idea, and no other mm code attempts such use. In addition to the mapcount issue, this also causes issues with hugetlb vmemmap optimization and page poisoning.
v1: mm: Sync percpu mm RSS counters before querying
An issue was observed with stats collected in struct rusage on ppc64le with 64kB pages. The percpu counters use batching with percpu_counter_batch = max(32, nr*2) # in PAGE_SIZE, i.e. with larger pages but similar RSS consumption (bytes), there’ll be fewer flushes and the error is more noticeable.
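The effect of page size on the error can be estimated with a little arithmetic. This is a rough sketch, assuming that nr in the formula is the number of online CPUs and that each CPU can hold up to one batch of pages that has not yet been folded into the global count; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Batch size as quoted above: max(32, nr * 2), counted in pages. */
uint64_t counter_batch(uint64_t nr_cpus)
{
    uint64_t scaled = nr_cpus * 2;
    return scaled > 32 ? scaled : 32;
}

/*
 * Rough upper bound, in bytes, on how far the summed counter can lag:
 * each CPU may hold up to one batch of pages that is not yet folded in.
 */
uint64_t max_rss_lag_bytes(uint64_t nr_cpus, uint64_t page_size)
{
    assert(nr_cpus > 0 && page_size > 0);
    return counter_batch(nr_cpus) * nr_cpus * page_size;
}
```

With 8 CPUs this estimate gives about 1 MiB of possible lag for 4 KiB pages and about 16 MiB for 64 KiB pages, which matches the observation that the error is more visible with larger pages.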
v1: staging: lib: Use memcpy_to/from_page()
Deprecate kmap() in favor of kmap_local_page() due to high cost, restricted mapping space, the overhead of a global lock for synchronization, and making the process sleep in the absence of free slots.
v3: Documentation/mm: Initial page table documentation
This is based on an earlier blog post at people.kernel.org, it describes the concepts about page tables that were hardest for me to grasp when dealing with them for the first time, such as the prevalent three-letter acronyms pfn, pgd, p4d, pud, pmd and pte.
v4: mm/migrate_device: Try to handle swapcache pages
Migrating file pages and swapcache pages into device memory is not supported. Try to get rid of the swap cache, and if successful, go ahead as with other anonymous pages.
v1: binfmt_elf: dynamically allocate note.data in parse_elf_properties
Dynamically allocate note.data in parse_elf_properties to fix compilation warning on some arch.
v1: mm/mm_init.c: add debug messsge for dma zone
If freesize is less than dma_reserve, print warning message to report this case.
v4: drm-next: v1: DRM GPUVA Manager & Nouveau VM_BIND UAPI
Furthermore, with the DRM GPUVA manager it provides a new DRM core feature to keep track of GPU virtual address (VA) mappings in a more generic way.
The DRM GPUVA manager is intended to help drivers implement userspace-manageable GPU VA spaces in reference to the Vulkan API. In order to achieve this goal it serves the following purposes in this context.
File Systems
v1: fs/aio: Stop allocating aio rings from HIGHMEM
There is no need to allocate aio rings from HIGHMEM because of very little memory needed here.
Therefore, use GFP_USER flag in find_or_create_page() and get rid of kmap*() mappings.
v4: blksnap - block devices snapshots module
I am happy to offer an improved version of the Block Devices Snapshots Module. It allows creating non-persistent snapshots of any block devices. The main purpose of such snapshots is to provide backups of block devices. See more in Documentation/block/blksnap.rst.
v1: Reduce impact of overlayfs fake path files
This is the solution that we discussed for removing FMODE_NONOTIFY from overlayfs real files.
My branch [1] has an extra patch for remove FMODE_NONOTIFY, but I am still testing the ovl-fsnotify interaction, so we can defer that step to later.
v1: ovl: port to new mount api
We recently ported util-linux to the new mount api. Now the mount(8) tool will by default use the new mount api. While trying hard to fall back to the old mount api gracefully there are still cases where we run into issues that are difficult to handle nicely.
v1: bdev: allow buffer-head & iomap aops to co-exist
At LSFMM it was clear that for some in order to support large order folios we want to use iomap. So the filesystems staying and requiring buffer-heads cannot make use of high order folios. This simplifies support and reduces the scope for what we need to do in order to support high order folios for buffered-io.
v2: fs: avoid empty option when generating legacy mount string
As each option string fragment is always prepended with a comma it would happen that the whole string always starts with a comma. This could be interpreted by filesystem drivers as an empty option and may produce errors.
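One way to avoid the leading comma is to emit the separator only when the string already holds an option. A minimal sketch of that idea; `append_mount_opt()` is a hypothetical helper and not the function changed by the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append opt to the option string in buf (capacity cap), inserting a comma
 * only when buf already holds an option. Returns 0, or -1 if it won't fit.
 * buf must already be NUL-terminated within cap bytes.
 */
int append_mount_opt(char *buf, size_t cap, const char *opt)
{
    assert(buf != NULL && opt != NULL);
    size_t used = strlen(buf);
    int n = snprintf(buf + used, cap - used, "%s%s", used ? "," : "", opt);
    if (n < 0 || (size_t)n >= cap - used) {
        buf[used] = '\0';   /* undo a truncated append */
        return -1;
    }
    return 0;
}
```

Appending "ro" and then "noatime" to an empty buffer yields "ro,noatime" rather than ",ro,noatime".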
v2: gfs2/buffer folio changes for 6.5
This kind of started off as a gfs2 patch series, then became entwined with buffer heads once I realised that gfs2 was the only remaining caller of __block_write_full_page(). For those not in the gfs2 world, the big point of this series is that block_write_full_page() should now handle large folios correctly.
Network Devices
v4: net-next: tcp: enforce receive buffer memory limits by allowing the tcp window to shrink
Under certain circumstances, the tcp receive buffer memory limit set by autotuning (sk_rcvbuf) is increased due to incoming data packets as a result of the window not closing when it should be. This can result in the receive buffer growing all the way up to tcp_rmem[2], even for tcp sessions with a low BDP.
v1: amd-xgbe: extend 10Mbps support to MAC version 21H
MAC version 21H supports the 10Mbps speed. So, extend support to platforms that support it.
v1: dt-bindings: net: mediatek,net: add missing mediatek,mt7621-eth
Document the Ethernet controller found in the MediaTek MT7621 MIPS SoC family which is supported by the mtk_eth_soc driver.
v5: net-next: net: phy: add driver for MediaTek SoC built-in GE PHYs
Some of MediaTek’s Filogic SoCs come with built-in gigabit Ethernet PHYs which require calibration data from the SoC’s efuse. Despite the similar design the driver doesn’t share any code with the existing mediatek-ge.c. Add support for such PHYs by introducing a new driver with basic support for MediaTek SoCs MT7981 and MT7988 built-in 1GE PHYs.
v1: Add a sysctl option to disable bpf offensive helpers.
Some eBPF helper functions have long been regarded as problematic[1]. Beyond their use in powerful rootkits, these features can also be exploited to harm containers by performing various attacks on processes outside the container in the entire VM, such as process DoS, information theft, and container escape.
v4: net-next: virtio/vsock: support datagrams
This series introduces support for datagrams to virtio/vsock.
It is a spin-off (and smaller version) of this series from the summer: https://lore.kernel.org/all/cover.1660362668.git.bobby.eshleman@bytedance.com/
Please note that this is an RFC and should not be merged until associated changes are made to the virtio specification, which will follow after discussion from this series.
v1: net-next: net: support extack in dump and simplify ethtool uAPI
Ethtool currently requires header nest to be always present even if it doesn’t have to carry any attr for a given request. This inflicts unnecessary pain on the users.
v1: net-next: tools: ynl: generate code for the ethtool family
And finally ethtool support. Thanks to Stan’s work the ethtool family spec is quite complete, so there are a lot of operations to support.
I chickened out of stats-get support, they require at the very least type-value support on a u64 scalar. Type-value is an arrangement where a u16 attribute is encoded directly in attribute type. Code gen can support this if the inside is a nest, we just throw in an extra field into that nest to carry the attr type. But a little more coding is needed for a scalar, because first we need to turn the scalar into a struct with one member, then we can add the attr type.
v4: iwl-next: Implement support for SRIOV + LAG
The first interface added into the aggregate will be flagged as the primary interface, and this primary interface will be responsible for managing the VF’s resources. VF’s created on the primary are the only VFs that will be supported on the aggregate. Only Active-Backup mode will be supported and only aggregates whose primary interface is in switchdev mode will be supported.
v2: ipvs: align inner_mac_header for encapsulation
When using encapsulation the original packet’s headers are copied to the inner headers. This preserves the space for an inner mac header, which is not used by the inner payloads for the encapsulation types supported by IPVS. If a packet is using GUE or GRE encapsulation and needs to be segmented, flow can be passed to __skb_udp_tunnel_segment() which calculates a negative tunnel header length. A negative tunnel header length causes pskb_may_pull() to fail, dropping the packet.
v1: net-next: tcp: tx path fully headless
This series completes the transition of the TCP stack tx path to headless packets: all payload now resides in page frags, never in skb->head.
v1: net-next: net: create device lookup API with reference tracking
We still see dev_hold() / dev_put() calls without reference tracker getting added in the new code. dev_get_by_name() / dev_get_by_index() seem to be one of the sources of those. Provide appropriate helpers. Allocating the tracker can obviously be done with an additional call to netdev_tracker_alloc(), but a single API feels cleaner.
v1: net-next: mdio: mdio-mux-mmioreg: Use of_property_read_reg() to parse “reg”
Use the recently added of_property_read_reg() helper to get the untranslated “reg” address value.
v1: net-next: net: add check for current MAC address in dev_set_mac_address
In some cases it is possible for kernel to come with request to change primary MAC address to the address that is already set on the given interface.
v2: net: Check if FIPS mode is enabled when running selftests
Some test cases from net/tls, net/fcnal-test and net/vrf-xfrm-tests that rely on cryptographic functions to work and use non-compliant FIPS algorithms fail in FIPS mode.
v1: net-next: tcp: Make pingpong threshold tunable
TCP pingpong threshold is 1 by default. But some applications, like SQL DB may prefer a higher pingpong threshold to activate delayed acks in quick ack mode for better performance.
v7: net-next: net: ioctl: Use kernel memory on protocol ioctl callbacks
Most of the ioctls to net protocols operate directly on the userspace argument (arg), usually doing get_user()/put_user() directly in the ioctl callback. This is not flexible, because it is hard to reuse these functions without passing userspace buffers.
v1: net-next: rhashtable: length helper for rhashtable and rhltable
Whenever someone wants to retrieve the total number of elements in a rhashtable/rhltable, they need to open code the access to ‘nelems’. Therefore provide a helper for such operation and convert two accesses as an example.
v1: net-next: add egress rate limit offload for Marvell 6393X family
This series aims to give access to egress rate shaping offloading available on Marvell 88E6393X family (88E6393X/88E6193X/88E6191X/88E6361)
The switch offers a very basic egress rate limiter: rate can be configured from 64kbps up to 10gbps depending on the model, with some specific increments depending on the targeted rate, and is “burstless”.
v1: net-next: net: openvswitch: add support for l4 symmetric hashing
Since its introduction, the ovs module execute_hash action allowed hash algorithms other than the skb->l4_hash to be used. However, additional hash algorithms were not implemented. This means flows requiring different hash distributions weren’t able to use the kernel datapath.
v1: net-next: bnx2x: Make dmae_reg_go_c static
Make dmae_reg_go_c static, it is only used in bnx2x_main.c
Flagged by Sparse as:
…/bnx2x_main.c:291:11: warning: symbol ‘dmae_reg_go_c’ was not declared. Should it be static?
v2: net-next: net: mana: Add support for vlan tagging
To support vlan, use MANA_LONG_PKT_FMT if vlan tag is present in TX skb. Then extract the vlan tag from the skb struct, and save it to tx_oob for the NIC to transmit. For vlan tags on the payload, they are accepted by the NIC too.
[net PATCH v2] octeontx2-af: Move validation of ptp pointer before its usage
Moved PTP pointer validation before its use to avoid smatch warning. Also used kzalloc/kfree instead of devm_kzalloc/devm_kfree.
v1: net-next: phylink EEE support
There has been some recent discussion on generalising EEE support so that drivers implement it more consistently. This has mostly focused around phylib, but there are other situations where EEE may be useful.
v1: net-next: sfc: Add devlink dev info support for EF10
Reuse the work done for EF100 to add devlink support for EF10. There is no devlink port support for EF10.
Security Hardening
v1: kunit: Add test attributes API
This is an RFC patch series to propose the addition of a test attributes framework to KUnit.
There has been interest in filtering out “slow” KUnit tests. Most notably, a new config, CONFIG_MEMCPY_SLOW_KUNIT_TEST, has been added to exclude particularly slow memcpy tests (https://lore.kernel.org/all/20230118200653.give.574-kees@kernel.org/).
v1: Integer overflows while scanning for integers
Lately I wondered whether users of integer scanning functions check for overflows. To detect such overflows around scanf I came up with the following patch. It simply triggers a WARN_ON_ONCE() upon an overflow.
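The kind of check under discussion can be shown in userspace. The sketch below uses strtol() and errno rather than the kernel's scanning code or WARN_ON_ONCE(); the function name `scan_int_checked` is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal int from s. Returns 0 on success, -EINVAL if no digits
 * were consumed, and -ERANGE if the value does not fit in an int.
 * *out is only written on success.
 */
int scan_int_checked(const char *s, int *out)
{
    assert(s != NULL && out != NULL);
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s)
        return -EINVAL;
    if (errno == ERANGE || v > INT_MAX || v < INT_MIN)
        return -ERANGE;
    *out = (int)v;
    return 0;
}
```

A caller that ignores the return value gets the same silent behavior the patch is trying to surface, so the result should always be checked.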
[RESEND]v1: next: Replace one-element array with DECLARE_FLEX_ARRAY() helper
One-element arrays as fake flex arrays are deprecated and we are moving towards adopting C99 flexible-array members, instead. So, replace one-element array declaration in struct ct_sns_gpnft_rsp, which is ultimately being used inside a union:
drivers/scsi/qla2xxx/qla_def.h:
Refactor the rest of the code, accordingly.
This issue was found with the help of Coccinelle.
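The difference between the two declaration styles shows up in size calculations. A sketch with two hypothetical structs (not the layout of struct ct_sns_gpnft_rsp); note that plain C does not allow a flexible-array member directly inside a union, which is the case the DECLARE_FLEX_ARRAY() helper covers in the kernel:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rsp_fake_flex {
    uint16_t count;
    uint8_t entries[1];   /* deprecated: one-element "fake" flexible array */
};

struct rsp_real_flex {
    uint16_t count;
    uint8_t entries[];    /* C99 flexible-array member */
};

/* Allocation size for n entries; no "n - 1" correction is needed. */
size_t rsp_real_size(size_t n)
{
    assert(n <= SIZE_MAX - offsetof(struct rsp_real_flex, entries));
    return offsetof(struct rsp_real_flex, entries) + n;
}
```

With the one-element form, sizeof() already counts one entry, so every size computation has to subtract it again; the flexible-array form removes that source of off-by-one errors.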
v1: um: Use HOST_DIR for mrproper
When HEADER_ARCH was introduced, the MRPROPER_FILES (then MRPROPER_DIRS) list wasn’t adjusted, leaving SUBARCH as part of the path argument. This resulted in the “mrproper” target not cleaning up arch/x86/… when SUBARCH was specified. Since HOST_DIR is arch/$(HEADER_ARCH), use it instead to get the correct path.
v2: uml: Replace strlcpy with strscpy
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89
Closes: https://lore.kernel.org/oe-kbuild-all/202305311135.zGMT1gYR-lkp@intel.com/
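The behavioral difference described above can be sketched in userspace. `strscpy_sketch()` is a simplified, hypothetical imitation of the strscpy() contract (bounded read, guaranteed termination, -E2BIG on truncation), not the kernel's implementation:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Simplified userspace sketch of strscpy() semantics: read at most size
 * bytes of src, always NUL-terminate dst, and report truncation.
 * Returns the number of bytes copied (excluding the NUL), or -E2BIG.
 */
long strscpy_sketch(char *dst, const char *src, size_t size)
{
    assert(dst != NULL && src != NULL);
    if (size == 0)
        return -E2BIG;
    for (size_t i = 0; i < size; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return (long)i;
    }
    dst[size - 1] = '\0';   /* ran out of room: force termination */
    return -E2BIG;
}
```

Unlike strlcpy(), which has to walk the whole source to return its length, this loop stops reading once the destination is full.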
Asynchronous IO
v1: Add io_uring support for futex wait/wake
Sending this just to the io_uring list for now so we can iron out details, questions, concerns, etc before going a bit broader to get the futex parts reviewed. Those are pretty straight forward though, and try not to get too entangled into futex internals.
v15: io_uring: add napi busy polling support
This adds the napi busy polling support in io_uring.c. It adds a new napi_list to the io_ring_ctx structure. This list contains the list of napi_id’s that are currently enabled for busy polling. This list is used to determine which napi id’s enabled busy polling. For faster access it also adds a hash table.
v14: io_uring: add napi busy polling support
This adds the napi busy polling support in io_uring.c. It adds a new napi_list to the io_ring_ctx structure. This list contains the list of napi_id’s that are currently enabled for busy polling. This list is used to determine which napi id’s enabled busy polling. For faster access it also adds a hash table.
Rust For Linux
v3: Rust scatterlist abstractions
This is a version of scatterlist abstractions for Rust drivers.
Scatterlist is used for efficient management of memory buffers, which is essential for many kernel-level operations such as Direct Memory Access (DMA) transfers and crypto APIs.
v2: add abstractions for network device drivers
This patchset adds minimum abstractions for network device drivers and Rust dummy network device driver, a simpler version of drivers/net/dummy.c.
v1: Rust PuzzleFS filesystem driver
This is a proof of concept driver written for the PuzzleFS next-generation container filesystem [1]. I’ve included a short abstract about puzzlefs further below. This driver is based on the rust-next branch, on top of which I’ve backported the filesystem abstractions from Wedson Almeida Filho [2][3] and Miguel Ojeda’s third-party crates support: proc-macro2, quote, syn, serde and serde_derive [4]. I’ve added the additional third-party crates serde_cbor[5] and hex [6]. Then I’ve adapted the user space puzzlefs code [1] so that the puzzlefs kernel module could present the directory hierarchy and implement the basic read functionality.
v2: Rust enablement for AArch64
The first patch enables the basic building of Rust for AArch64. Since v1 this has been rewritten to avoid the use of a target.json file for AArch64 and use the upstream rustc target definition. x86-64 still uses the target.json approach though.
BPF
v12: evm: Do HMAC of multiple per LSM xattrs for new inodes
One of the major goals of LSM stacking is to run multiple LSMs side by side without interfering with each other. The ultimate decision will depend on individual LSM decision.
v1: tools api fs: More thread safety for global filesystem variables
Multiple threads, such as with “perf top”, may race to initialize a file system path like hugetlbfs. The racy initialization of the path leads to at least memory leaks. To avoid this initialize each fs for reading the mount point path with pthread_once.
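The pthread_once pattern mentioned here looks roughly like the following. This is a sketch only: the variable and function names are hypothetical, and the initializer stores a placeholder path where real code would scan the mount table:

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

static pthread_once_t mnt_once = PTHREAD_ONCE_INIT;
static char mnt_path[64];
static int init_calls;   /* counts how often the initializer actually ran */

/* Hypothetical initializer; a real one would scan the mount table. */
static void mnt_init(void)
{
    init_calls++;
    snprintf(mnt_path, sizeof(mnt_path), "%s", "/dev/hugepages");
}

/* Safe to call from any thread: mnt_init() runs exactly once. */
const char *mnt_get_path(void)
{
    int err = pthread_once(&mnt_once, mnt_init);
    assert(err == 0);
    (void)err;
    return mnt_path;
}

/* Exposed only so callers can observe that initialization ran once. */
int mnt_init_count(void)
{
    return init_calls;
}
```

Because every reader goes through pthread_once(), no thread can observe a half-initialized path or allocate it twice.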
v4: bpf-next: verify scalar ids mapping in regsafe()
This example is unsafe because not all execution paths verify r7 range. Because of the jump at (4) the verifier would arrive at (6) in two states: I. r6{.id=b}, r7{.id=b} via path 1-6; II. r6{.id=a}, r7{.id=b} via path 1-4, 6.
Currently regsafe() does not call check_ids() for scalar registers, thus from POV of regsafe() states (I) and (II) are identical.
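Why an id-mapping check tells states (I) and (II) apart can be seen in a small model. This is a simplified sketch, not the verifier's code: ids are plain integers (a = 1, b = 2), state (I) is treated as the cached state, and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define IDMAP_SIZE 8

struct id_pair { unsigned old, cur; };

/*
 * Simplified sketch of an id-mapping check: every old id must map to
 * exactly one current id. An old id of 0 marks an unused slot.
 */
bool check_ids_sketch(unsigned old_id, unsigned cur_id, struct id_pair *map)
{
    assert(old_id != 0 && cur_id != 0);
    for (int i = 0; i < IDMAP_SIZE; i++) {
        if (map[i].old == 0) {          /* first sighting: record the pair */
            map[i].old = old_id;
            map[i].cur = cur_id;
            return true;
        }
        if (map[i].old == old_id)       /* seen before: must agree */
            return map[i].cur == cur_id;
    }
    return false;                       /* map full: be conservative */
}

/* Compare the ids of r6 and r7 in a cached state against a current state. */
bool scalar_ids_match(const unsigned old[2], const unsigned cur[2])
{
    struct id_pair map[IDMAP_SIZE];
    memset(map, 0, sizeof(map));
    return check_ids_sketch(old[0], cur[0], map) &&
           check_ids_sketch(old[1], cur[1], map);
}
```

In this model, state (I) has r6 and r7 sharing id b, so b would have to map to both a and b in state (II); the check fails and the two states are no longer treated as identical.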
v3: net-next: introduce page_pool_alloc() API
In [1] & [2], there are use cases for veth and virtio_net to use frag support in page pool to reduce memory usage, and they may request different frag sizes depending on the head/tail room space for xdp_frame/shinfo and mtu/packet size. When the requested frag size is large enough that a single page cannot be split into more than one frag, using frag support only incurs a performance penalty because of the extra frag count handling.
v4: bpf-next: bpf, x86: allow function arguments up to 12 for TRACING
Therefore, let’s enhance it by increasing the function arguments count allowed in arch_prepare_bpf_trampoline(), for now, only x86_64.
In the 1st patch, we make arch_prepare_bpf_trampoline() support copying function arguments on the stack for the x86 arch. Therefore, the maximum number of arguments can be up to MAX_BPF_FUNC_ARGS for FENTRY and FEXIT.
v3: Bring back vmlinux.h generation
Commit 760ebc45746b (“perf lock contention: Add empty ‘struct rq’ to satisfy libbpf ‘runqueue’ type verification”) inadvertently created a declaration of ‘struct rq’ that conflicted with a generated vmlinux.h’s:
v5: bpf-next: selftests/bpf: Add benchmark for bpf memory allocator
The benchmark could be used to compare the performance of hash map operations and the memory usage between different flavors of bpf memory allocator (e.g., no bpf ma vs bpf ma vs reuse-after-gp bpf ma). It also could be used to check the performance improvement or the memory saving provided by optimization.
v1: ftrace: Show all functions with addresses in available_filter_functions_addrs
When using ftrace-based tracers, we need to cross-check available_filter_functions against /proc/kallsyms. For example, for the kprobe_multi bpf link (based on fprobe), we need to make sure that the symbol regex resolves to traceable symbols and that we get proper addresses for them.
v5: bpf: Socket lookup BPF API from tc/xdp ingress does not respect VRF bindings.
When calling socket lookup from L2 (tc, xdp), VRF boundaries aren’t respected. This patchset fixes this by regarding the incoming device’s VRF attachment when performing the socket lookups from tc/xdp.
v2: bpf-next: bpf: Support ->fill_link_info for kprobe_multi and perf_event links
This patchset enhances the usability of kprobe_multi programs by introducing support for ->fill_link_info. This allows users to easily determine the probed functions associated with a kprobe_multi program. While bpftool perf show already provides information about functions probed by perf_event programs, supporting ->fill_link_info ensures consistent access to this information across all bpf links.
v1: perf lock contention: Add -x option for CSV style output
Sometimes we want to process the output by external programs. Let’s add the -x option to specify the field separator like perf stat.
This patch set introduces a new BPF object, the BPF token, which allows delegating a subset of BPF functionality from a privileged system-wide daemon (e.g., systemd or any other container manager) to a trusted unprivileged application. Trust is the key here. This functionality is not about allowing unconditional unprivileged BPF usage. Establishing trust, though, is completely up to the discretion of the respective privileged application that would create a BPF token.
v1: bpf-next: selftests/bpf: Add missing prototypes for several test kfuncs
Adding missing prototypes for several kfuncs that are used by test_verifier tests. We don’t really need kfunc prototypes for these tests, but adding them to silence ‘make W=1’ build and to have all test kfuncs declarations in bpf_testmod_kfunc.h.
v2: bpf-next: BPF link support for tc BPF programs
This series adds BPF link support for tc BPF programs. We initially presented the motivation, related work and design at last year’s LPC conference in the networking & BPF track [0], and a recent update on our progress of the rework during this year’s LSF/MM/BPF summit [1]. The main changes are in first two patches and the last two have an extensive batch of test cases we developed along with it, please see individual patches for details. We tested this series with tc-testing selftest suite as well as BPF CI/selftests. Thanks!
v2: bpf-next: bpf, arm64: use BPF prog pack allocator in BPF JIT
BPF programs currently consume a page each on ARM64. For systems with many BPF programs, this adds significant pressure to the instruction TLB. High iTLB pressure usually causes a slowdown for the whole system.
v4: bpf-next: Handle immediate reuse in bpf memory allocator
The implementation of v4 is mainly based on suggestions from Alexei [0]. There are still pending problems with the current implementation, as shown in the benchmark results in patch #3, but a long time has passed since v3 was posted, so v4 is posted here for further discussion and more suggestions.
v1: bpf: search_bpf_extables should search subprogram extables
JIT’d bpf programs that have subprograms can have a positive value for num_extentries but a NULL value for extable. This is problematic if one of these bpf programs encounters a fault during its execution. The fault handlers correctly identify that the faulting IP belongs to a bpf program. However, performing a search_extable call on a NULL extable leads to a second fault.
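The failure mode can be modeled in user space with a short sketch. This is not the kernel code: struct prog_model, search_table and search_with_subprogs are hypothetical names, and a linear scan stands in for the kernel's search over a sorted exception table. It only illustrates guarding against a NULL table and falling back to the subprogram tables, as the patch title suggests.

```c
#include <stddef.h>

/* One exception table entry in this simplified model. */
struct ex_entry {
	unsigned long insn;	/* address of the instruction that may fault */
	unsigned long fixup;	/* address at which to resume after the fault */
};

/* Hypothetical stand-in for a program with an optional exception table. */
struct prog_model {
	const struct ex_entry *extable;	/* may be NULL even if the count is > 0 */
	size_t num_exentries;
};

/* Search one table, refusing to dereference a NULL table. */
static const struct ex_entry *search_table(const struct prog_model *p,
					   unsigned long ip)
{
	if (!p->extable)
		return NULL;
	for (size_t i = 0; i < p->num_exentries; i++)
		if (p->extable[i].insn == ip)
			return &p->extable[i];
	return NULL;
}

/* Try the main program first, then each subprogram's own table. */
const struct ex_entry *search_with_subprogs(const struct prog_model *main_prog,
					    const struct prog_model *subprogs,
					    size_t nr_subprogs,
					    unsigned long ip)
{
	const struct ex_entry *e = search_table(main_prog, ip);

	for (size_t i = 0; !e && i < nr_subprogs; i++)
		e = search_table(&subprogs[i], ip);
	return e;
}
```

In this model a main program with a non-zero entry count but a NULL table no longer triggers a second fault, and a faulting address that belongs to a subprogram is still resolved.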
v3: bpf-next: xsk: multi-buffer support
This series of patches adds multi-buffer support for AF_XDP. XDP and various NIC drivers already have support for multi-buffer packets. With this patch set, programs using AF_XDP sockets can now also receive and transmit multi-buffer packets in both copy and zero-copy mode. The ZC multi-buffer implementation is based on the ice driver.
v2: bpf: netfilter: add BPF_NETFILTER bpf_attach_type
Andrii Nakryiko writes:
And we currently don’t have an attach type for NETLINK BPF link. Thankfully it’s not too late to add it. I see that link_create() in kernel/bpf/syscall.c just bypasses attach_type check. We shouldn’t have done that. Instead we need to add BPF_NETLINK attach type to enum bpf_attach_type. And wire all that properly throughout the kernel and libbpf itself.
v1: Add api to manipulate global varaible
We (Ant Group) have a requirement to manipulate global variables. The platform that manages bpf bytecode has no idea about the variables’ type/size/address. It only has some strings (like key = value) passed from the admin. We found a way to parse BTF and then query/update the variables. There may be better ways to do it. This approach is what we could find for now.
v1: bpf: Add extra path pointer check to d_path helper
Anastasios reported a crash on the stable 5.15 kernel with the following bpf program attached to an lsm hook:
SEC("lsm.s/bprm_creds_for_exec")
int BPF_PROG(bprm_creds_for_exec, struct linux_binprm *bprm)
{
	struct path *path = &bprm->executable->f_path;
	char p[128] = { 0 };

	bpf_d_path(path, p, 128);
	return 0;
}
However, bprm->executable can be NULL, in which case the bpf_d_path call crashes.
Related Technology Updates
Qemu
v3: linux-user/riscv: Add syscall riscv_hwprobe
This patch adds the new syscall for the “RISC-V Hardware Probing Interface” (https://docs.kernel.org/riscv/hwprobe.html).
v4: target/riscv: Add Smrnmi support.
This patchset adds support for the Smrnmi extension in RISC-V.
RNMI also has higher priority than any other interrupt or exception and cannot be disabled by software.
In the future, RNMI may be used to route to other devices such as a Bus Error Unit or a Watchdog Timer.
Buildroot
[branch/2023.02.x] package/cmake: (ctest) add support for riscv architecture
commit: https://git.buildroot.net/buildroot/commit/?id=13e4f1942cb2aca57edf5b2b7514d491690e8eeb branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/2023.02.x
Package binaries can be successfully built for and then executed on RISC-V platforms including RV32 and RV64 variants. Tested in QEMU.
U-Boot
This patchset adds support to load images of the SPL’s next booting stage from a NVMe device.