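RISC-V Linux Kernel and Related Technology Updates, Issue 78
Date: 20240213
Editor: 晓怡
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS
Kernel Updates
RISC-V Architecture Support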
v8: riscv: sophgo: add clock support for Sophgo CV1800/SG2000 SoCs
To perform well on short messages, the new implementation processes the full message in one call to the assembly function if the data is contiguous. Otherwise it falls back to CBC operations followed by CTS at the end. For decryption, to further improve performance on short messages, especially block-aligned messages, the CBC-CTS assembly function parallelizes the AES decryption of all full blocks.
v11: riscv: Create and document PR_RISCV_SET_ICACHE_FLUSH_CTX prctl
Improve the performance of icache flushing by creating a new prctl flag PR_RISCV_SET_ICACHE_FLUSH_CTX. The interface is left generic to allow for future expansions such as with the proposed J extension [1].
Documentation is also provided to explain the use case.
v2: riscv: pwm: sophgo: add pwm support for CV1800
The Sophgo CV1800 chip provides a set of four independent PWM channel outputs. This series adds PWM controller support for the Sophgo CV1800.
v2: riscv/fence: Consolidate fence definitions and define __{mb,rmb,wmb}
Disparate fence implementations are consolidated into fence.h.
Introduce __{mb,rmb,wmb}, and rely on the generic definitions for {mb,rmb,wmb}. A first consequence is that __{mb,rmb,wmb} map to a compiler barrier on !SMP (while their definition remains unchanged on SMP).
v1: riscv: Various text patching improvements
Here are a few changes to minimize calls to stop_machine() and flush_icache_*() in the various text patching functions, as well as to simplify the code.
v2: RISC-V: Add dynamic TSO support
The upcoming RISC-V Ssdtso specification introduces a bit in the senvcfg CSR to switch the memory consistency model of user mode at run-time from RVWMO to TSO. The active consistency model can therefore be switched on a per-hart basis and managed by the kernel on a per-process basis.
v1: clk: renesas: rzg2l: Add support for power domains
This series adds support for power domains to the rzg2l driver.
RZ/G2L-type devices support a functionality called MSTOP (module stop/standby). According to the hardware manual, a module can be switched to standby after its clocks are disabled. The reverse order of operations should be followed when enabling a module (get the module out of standby, enable its clocks, etc.).
v1: -next: RISC-V: ACPI: Enable CPPC based cpufreq support
This series enables support for Collaborative Processor Performance Control (CPPC) on ACPI-based RISC-V platforms. It depends on the encoding of CPPC registers as defined in the RISC-V FFH spec [2].
GIT PULL: percpu changes for v6.8-rc4
The PR to enable the percpu page allocator had a TLB flush parameter mixup of end vs size. This contains the fix.
v2: riscv: sophgo: add i2c and spi device to CV180x/SG2000x SoCs
Add i2c and spi devices
The patch depends on the clk patch: https://lore.kernel.org/all/IA1PR20MB4953C774D41EDF1EADB6EC18BB6D2@IA1PR20MB4953.namprd20.prod.outlook.com/
v1: RISC-V: Don’t use IPIs in flush_icache_all() when patching text
If some of the HARTs are parked by stop machine then IPI-based flushing in flush_icache_all() will hang. This hang is observed when text patching is invoked by various debug and BPF features.
Process Scheduling
v1: sched: make cpu_util_cfs visible
As RT, DL, and IRQ time can be regarded as time lost to CFS tasks, some timing measurements need to know approximately how that time is distributed, using the utilization accounting values (nivcsw is sometimes not enough). However, cpu_util_cfs is not visible outside of kernel/sched. This commit makes it visible.
v2: sched/debug: Dump end of stack when detected corrupted
When debugging a kernel hang during suspend/resume [1], there were random memory corruptions in different places like being reported by slub_debug+KASAN, or detected by scheduler with error message
v2: perf sched: Minor optimizations for resource initialization
start_work_mutex, work_done_wait_mutex, curr_thread, curr_pid, and cpu_last_switched are initialized together in cmd_sched(), but for different perf sched subcommands, some actions are unnecessary, especially perf sched record. This series of patches initialize only required resources for different subcommands.
v2: net-next: net/sched: actions report errors with extack
When an action detects invalid parameters, it should add an extended ack (extack) to netlink so that the user is able to diagnose the issue.
v1: sched: cpufreq: Rename map_util_perf to apply_dvfs_headroom
We are providing headroom for the utilization to grow until the next decision point to pick the next frequency. Give the function a better name and give it some documentation. It is not really mapping anything.
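As a rough illustration of what "headroom" means here, the sketch below adds 25% on top of a utilization value. To my knowledge this matches what map_util_perf() computes in current kernels, but the 25% factor and the userspace function name headroom_sketch are assumptions for illustration, not the patch's code.

```c
/*
 * Userspace sketch: leave room for utilization to grow before the
 * next frequency decision by adding a quarter of the current value.
 * Both the 25% factor and the function name are assumptions.
 */
unsigned long headroom_sketch(unsigned long util)
{
    return util + (util >> 2);
}
```

For example, a utilization of 800 would be treated as 1000 when picking the next frequency.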
Memory Management
v1: hugetlb: two small improvements of hugetlb init parallelization
This series includes two improvements: fixing the PADATA Kconfig warning and a potential bug in gather_bootmem_prealloc_parallel. Please refer to the specific commit message for details.
This is the second version of the series that enables block size > page size (Large Block Size) in XFS. This version has various bug fixes and suggestions collected from the v1 [1]. The context and motivation can be seen in the cover letter of the v1. We also recorded a talk about this effort at LPC [2], if someone would like more context on this effort.
v5: per-vma locks in userfaultfd
Performing userfaultfd operations (like copy/move etc.) in critical section of mmap_lock (read-mode) causes significant contention on the lock when operations requiring the lock in write-mode are taking place concurrently. We can use per-vma locks instead to significantly reduce the contention issue.
v3: Memory allocation profiling
Memory allocation, v3 and final:
Overview: Low overhead [1] per-callsite memory allocation profiling. Not just for debug kernels, overhead low enough to be deployed in production.
v1: mm: document memalloc_noreclaim_save() and memalloc_pin_save()
The memalloc_noreclaim_save() function currently has no documentation comment, so the implications of its usage are not obvious. Namely that it not only prevents entering reclaim (as the name suggests), but also allows using all memory reserves and thus should be only used in contexts that are allocating memory to free memory. This may lead to new improper usages being added.
v4: Enable >0 order folio memory compaction
This patchset enables >0 order folio memory compaction, which is one of the prerequisites for large folio support [1]. It is on top of mm-everything-2024-02-10-00-56.
Test that KASAN can detect some unsafe atomic accesses.
As discussed in the linked thread below, these tests attempt to cover the most common uses of atomics and, therefore, aren’t exhaustive.
v2: Port hierarchical_{memory,swap}_limit cgroup1->cgroup2
These limits are useful for userland to find out, easily and efficiently, the effective cgroup limits being applied. Otherwise userland has to open+read+close the file “memory.max” and/or “memory.swap.max” in multiple parent directories of a nested cgroup.
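The "effective limit" idea can be sketched as taking the minimum over a cgroup and all of its ancestors. The helper below is a hypothetical userspace illustration: the name effective_limit, the array layout, and the use of UINT64_MAX to stand in for "max" are all assumptions, not the interface proposed by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * limits[0] is the root cgroup and limits[depth - 1] is the cgroup of
 * interest. UINT64_MAX stands in for "max" (no limit). The effective
 * limit is the smallest value found along the chain.
 */
uint64_t effective_limit(const uint64_t *limits, size_t depth)
{
    uint64_t eff = UINT64_MAX;

    for (size_t i = 0; i < depth; i++) {
        if (limits[i] < eff)
            eff = limits[i];
    }
    return eff;
}
```

Without a kernel-provided value, userland would have to gather each entry of such a chain by reading one file per ancestor directory.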
v1: mm/zswap: optimize for dynamic zswap_pools
Dynamic pool creation has been supported for a long time, though it may not be used much in practice. But with the per-memcg lru merged, the current structure of zswap_pool’s lru and shrinker becomes less optimal.
v1: x86/vdso: Move vDSO to mmap region
The vDSO (and its initial randomization) was introduced in commit 2aae950b21e4 (“x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu”), but had very low entropy. The entropy was improved in commit but there is still improvement to be made.
v2: mm/memory: optimize unmap/zap with PTE-mapped THP
This series is based on [1]. Similar to what we did with fork(), let’s implement PTE batching during unmap/zap when processing PTE-mapped THPs.
We collect consecutive PTEs that map consecutive pages of the same large folio, making sure that the other PTE bits are compatible, and (a) adjust the refcount only once per batch, (b) call rmap handling functions only once per batch, (c) perform batch PTE setting/updates and (d) perform TLB entry removal once per batch.
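The batching step can be pictured with a small userspace sketch. The struct pte_sketch type and the batch_len function are hypothetical stand-ins: a real PTE is an architecture-specific value, and the kernel additionally checks that all pages belong to the same large folio, which is left out here.

```c
#include <stddef.h>

/* Hypothetical stand-in for a PTE: a page frame number plus flag bits. */
struct pte_sketch {
    unsigned long pfn;
    unsigned int flags;
};

/*
 * Length of the batch that starts at ptes[start]: the run of entries
 * whose PFNs are consecutive and whose flags equal those of the first
 * entry. Returns 0 if start is out of range.
 */
size_t batch_len(const struct pte_sketch *ptes, size_t n, size_t start)
{
    size_t len;

    if (start >= n)
        return 0;

    len = 1;
    while (start + len < n &&
           ptes[start + len].pfn == ptes[start].pfn + len &&
           ptes[start + len].flags == ptes[start].flags)
        len++;
    return len;
}
```

A caller would then do the refcount, rmap, and TLB work once for the whole run and advance by the returned length.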
v1: selftests/mm: Don’t needlessly use sudo to obtain root in run_vmtests.sh
When opening yama/ptrace_scope we unconditionally use sudo to ensure we are running as root, resulting in failures if running in a minimal root filesystem where sudo is not installed. Since automated test systems will typically just run all of kselftest as root (and many kselftests rely on this for full functionality) add a check to see if we’re already root and only invoke sudo if not.
v1: mm/hugetlb: Move page order check inside hugetlb_cma_reserve()
All platforms could benefit from page order check against MAX_PAGE_ORDER before allocating a CMA area for gigantic hugetlb pages. Let’s move this check from individual platforms to generic hugetlb.
v2: bpf-next: bpf: Introduce BPF arena.
The work on bpf_arena was inspired by Barret’s work: https://github.com/google/ghost-userspace/blob/main/lib/queue.bpf.h that implements queues, lists and AVL trees completely as bpf programs using a giant bpf array map and integer indices instead of pointers. bpf_arena is a sparse array that allows using normal C pointers to build such data structures. The last few patches implement a page_frag allocator, a linked list and a hash table as bpf programs.
v1: mm/memblock: Add MEMBLOCK_RSRV_NOINIT into flagname[] array
The commit 77e6c43e137c (“memblock: introduce MEMBLOCK_RSRV_NOINIT flag”) skipped adding this newly introduced memblock flag into flagname[] array, thus preventing a correct memblock flags output for applicable memblock regions.
v2: Memory management patches needed by Rust Binder
This patchset contains some abstractions needed by the Rust implementation of the Binder driver for passing data between userspace, kernelspace, and directly into other processes.
v1: fs/proc/task_mmu: Add display flag for VM_MAYOVERLAY
The VM_UFFD_MISSING flag is mutually exclusive with the VM_MAYOVERLAY flag, as they both use the same bit position, i.e. 0x00000200, in vm_flags. Let’s update show_smap_vma_flags() to display the correct flags depending on CONFIG_MMU.
File Systems
v6: Set casefold/fscrypt dentry operations through sb->s_d_op
v6 of this patchset applying the comments from Eric and the suggestion from Christian. Thank you for your feedback.
Here’s v4 of my patchset of adding fs-verity support to XFS.
This implementation uses extended attributes to store fs-verity metadata. The Merkle tree blocks are stored in the remote extended attributes. The names are offsets into the tree.
v1: dcache: rename d_genocide()
Political context aside, using analogies from the real world in code is supposed to help us human programmers understand the code better.
v1: fs/hfsplus: use better @opf description
Use a more descriptive explanation of the @opf function parameter, more in line with <linux/blk_types.h>.
v1: udf: convert to new mount API
Convert the UDF filesystem to the new mount API.
UDF is slightly unique in that it always preserves prior mount options across a remount, so that’s handled by udf_init_options().
v2: zonefs: convert zonefs to use the new mount api
Convert the zonefs filesystem to use the new mount API. Tested using the zonefs test suite from: https://github.com/damien-lemoal/zonefs-tools
Introduce the LANDLOCK_ACCESS_FS_IOCTL right, which restricts the use of ioctl(2) on file descriptors.
We attach IOCTL access rights to opened file descriptors, as we already do for LANDLOCK_ACCESS_FS_TRUNCATE.
v7: filtering and snapshots of a block devices
The filtering block device mechanism is implemented in the block layer. This allows to attach and detach block device filters. Filters extend the functionality of the block layer. See more in Documentation/block/blkfilter.rst.
v1: quota: Detect loops in quota tree
Syzbot has found that when it creates corrupted quota files where the quota tree contains a loop, we will deadlock when trying to insert a dquot. Add loop detection into functions traversing the quota tree.
Security Hardening
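One common way to detect such a loop is to remember the blocks already visited on the way down and refuse to revisit any of them. The sketch below only illustrates that idea: the flat child[] array, the depth limit SKETCH_MAX_DEPTH, and both function names are assumptions and do not reflect the actual quota tree layout or the patch's code.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_DEPTH 4 /* assumed fixed tree depth */

/* True if blk already appears among the first depth entries of path. */
bool block_on_path(const unsigned int *path, size_t depth, unsigned int blk)
{
    for (size_t i = 0; i < depth; i++) {
        if (path[i] == blk)
            return true;
    }
    return false;
}

/*
 * Follow child[] links from root for SKETCH_MAX_DEPTH levels.
 * Returns 0 on success, -1 if a block is out of range or is
 * referenced a second time on the same path (a loop).
 */
int walk_checked(const unsigned int *child, size_t nblocks, unsigned int root)
{
    unsigned int path[SKETCH_MAX_DEPTH];
    unsigned int blk = root;

    for (size_t depth = 0; depth < SKETCH_MAX_DEPTH; depth++) {
        if (blk >= nblocks || block_on_path(path, depth, blk))
            return -1;
        path[depth] = blk;
        blk = child[blk];
    }
    return 0;
}
```

Because the path is bounded by the tree depth, the check costs only a few comparisons per level.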
DTS for the phone and some fly-by fixes
Patch 1 is for Mark/sound; the rest are for qcom.
v1: hardening: Enable KFENCE in the hardening config
KFENCE is not a security mitigation mechanism (due to sampling), but has the performance characteristics of unintrusive hardening techniques. When used at scale, however, it improves overall security by allowing kernel developers to detect heap memory-safety bugs cheaply.
v1: iommu/mtk_iommu: Use devm_kcalloc() instead of devm_kzalloc()
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1].
Here the multiplication is obviously safe because MTK_PROTECT_PA_ALIGN is defined as a literal value of 256 or 128.
v1: iommu/vt-d: Use kcalloc() instead of kzalloc()
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1].
Here the multiplication is obviously safe because DMAR_LATENCY_NUM is the number of latency types defined in the “latency_type” enum.
v2: mtd: rawnand: Prefer struct_size over open coded arithmetic
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1].
v1: fs/ntfs3: use kcalloc() instead of kzalloc()
We are trying to get rid of all multiplications from allocation functions to prevent integer overflows[1]. Here the multiplication is obviously safe, but using kcalloc() is more appropriate and improves readability. This patch has no effect on runtime behavior.
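The reason for preferring a two-argument allocator is that the count and the element size can be checked for overflow before they are multiplied. A minimal userspace sketch of that check follows; alloc_array_sketch is a hypothetical name, and it uses the C library's calloc() rather than any kernel API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate a zeroed array of n elements of the given size.
 * Returns NULL instead of a too-small buffer when n * size
 * would overflow size_t.
 */
void *alloc_array_sketch(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL; /* n * size does not fit in size_t */
    return calloc(1, n * size);
}
```

With an open-coded multiplication passed to a single-size allocator, the same overflow would silently wrap and yield an undersized buffer.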
v1: stddef: Allow attributes to be used when creating flex arrays
We’re going to have more cases where we need to apply attributes (e.g. __counted_by) to struct members that have been declared with DECLARE_FLEX_ARRAY. Add a new …_ATTR helper to allow for this and annotate one such user in linux/in.h.
v1: irqchip/bcm-6345-l1: Prefer struct_size over open coded arithmetic
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1].
v1: drm/i915: Add flex arrays to struct i915_syncmap
The “struct i915_syncmap” uses a dynamically sized set of trailing elements. It can use an “u32” array or a “struct i915_syncmap *” array.
v1: scsi: Replace {v}snprintf() variants with safer alternatives
Note: We’re also taking the time to obey our new .editorconfig overlord!
For a far better description of the problem than I could author, see Jon’s write-up on LWN [1] and/or Alex’s on the Kernel Self Protection Project [1].
struct mwifiex_ie_types_chan_list_param_set::chan_scan_param is treated as a flexible array, so convert it into one so that it doesn’t trip the array bounds sanitizer[1]. Only a few places were using sizeof() on the whole struct, so adjust those to follow the calculation pattern to avoid including the trailing single element.
v3: pstore: add multi-backend support
I have been steadily working but struggled to find a seamlessly integrated way to implement tty frontend until Guilherme inspired me that multi-backend and tty frontend are actually two separate entities.
v1: xen/gntalloc: Replace UAPI 1-element array
Without changing the structure size (since it is UAPI), add a proper flexible array member, and reference it in the kernel so that it will not trip the array-bounds sanitizer[1].
v1: net/sun3_82586: Avoid reading past buffer in debug output
Since NUM_XMIT_BUFFS is always 1, building m68k with sun3_defconfig and -Warray-bounds, this build warning is visible[1]:
v3: Tegra30: add support for LG tegra based phones
Bring up Tegra 3 based LG phones Optimus 4X HD and Optimus Vu based on LG X3 board.
v2: selftests/seccomp: Pin benchmark to single CPU
The seccomp benchmark test (for validating the benefit of bitmaps) can be sensitive to scheduling speed, so pin the process to a single CPU, which appears to significantly improve reliability, and loosen the “close enough” checking to allow up to 10% variance instead of 1%.
v3: ubsan: Reintroduce signed overflow sanitizer
In order to mitigate unexpected signed wrap-around[1], bring back the signed integer overflow sanitizer. It was removed in commit 6aaa31aeb9cf (“ubsan: remove overflow checks”) because it was effectively a no-op when combined with -fno-strict-overflow (which correctly changes signed overflow from being “undefined” to being explicitly “wrap around”).
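To make "signed wrap-around" concrete, the sketch below detects an addition whose result would not fit in an int. It relies on the GCC/Clang builtin __builtin_add_overflow; the function name add_would_wrap is hypothetical, and this is an explicit check written by hand, not what the sanitizer itself emits.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Returns true if a + b cannot be represented in an int, i.e. the
 * addition would wrap around under -fno-strict-overflow.
 */
bool add_would_wrap(int a, int b)
{
    int sum;

    return __builtin_add_overflow(a, b, &sum);
}
```

For instance, add_would_wrap(INT_MAX, 1) reports a wrap, while add_would_wrap(1, 2) does not. The sanitizer's role is to flag such cases at run time without the developer writing the check.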
Asynchronous I/O
v1: -next: io_uring: switch struct io_kiocb flag definitions to BIT_ULL()
The io_kiocb.flags variable was expanded to 64 bits, but none of the existing or newly-added flag definitions were updated, causing build issues on 32-bit platforms, where unsigned long is a 32-bit value.
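The underlying issue can be shown in a few lines: a flag above bit 31 needs a 64-bit constant, which a shift on unsigned long cannot provide where that type is 32 bits wide. The macro and flag names below (BIT_ULL_SKETCH, SKETCH_F_HIGH) and the bit position 35 are made up for illustration and are not the io_uring definitions.

```c
#include <stdint.h>

/* 64-bit single-bit mask, independent of the width of unsigned long. */
#define BIT_ULL_SKETCH(nr) (1ULL << (nr))

/* Hypothetical flag that lives above bit 31. */
#define SKETCH_F_HIGH BIT_ULL_SKETCH(35)

/* Returns nonzero if the high flag is set in a 64-bit flags word. */
int has_high_flag(uint64_t flags)
{
    return (flags & SKETCH_F_HIGH) != 0;
}
```

Defining the same flag as 1UL << 35 would be an out-of-range shift on a platform where unsigned long is 32 bits.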
v1: liburing: add script for statistics sqpoll running time
Count the running time and actual IO processing time of the sqpoll thread, and output the statistical time to terminal.
v8: io_uring: Statistics of the true utilization of sq threads.
Count the running time and actual IO processing time of the sqpoll thread, and output the statistical data to fdinfo.
Variable description: “work_time” in the code represents the sum of the jiffies of the sq thread actually processing IO, that is, how many milliseconds it actually takes to process IO. “total_time” represents the total time that the sq thread has elapsed from the beginning of the loop to the current time point, that is, how many milliseconds it has spent in total.
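Given the two values described above, a reader of fdinfo could derive a utilization percentage as sketched below. The function name sq_util_percent and the choice to report an integer percentage are assumptions; the patch itself, as summarized here, only exposes the raw values.

```c
/*
 * Share of total_time that was spent in work_time, as an integer
 * percentage rounded down. Returns 0 when total_time is 0.
 */
unsigned int sq_util_percent(unsigned long long work_time,
                             unsigned long long total_time)
{
    if (total_time == 0)
        return 0;
    return (unsigned int)(work_time * 100 / total_time);
}
```

A work_time of 25 against a total_time of 100 would give 25, meaning the sq thread spent a quarter of its time actually processing IO.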
Rust For Linux
v2: rust: locks: Add get_mut method to Lock
Having a mutable reference guarantees that no other threads have access to the lock, so we can take advantage of that to grant callers access to the protected data without the cost of acquiring and releasing the locks. Since the lifetime of the data is tied to the mutable reference, the borrow checker guarantees that the usage is safe.
v1: bcachefs: add framework for internal Rust code
This series adds support for Rust code into bcachefs. This only enables using Rust internally within bcachefs; there are no public Rust APIs added. Rust support is hidden behind a new config option, CONFIG_BCACHEFS_RUST. It is optional and bcachefs can still be built with full functionality without rust.
v3: rust: place generated init_module() function in .init.text
Currently Rust kernel modules have their init code placed in the .text section of the .ko file. I don’t think this causes any real problems for Rust modules as long as all code called during initialization lives in .text.
v1: rust: stop using ptr_metadata feature
The byte_sub method was stabilized in Rust 1.75.0. By using that method, we no longer need the unstable ptr_metadata feature for implementing Arc::from_raw.
Related Technology Updates
Buildroot
v1: package/libopenssl: security bump to version 3.2.1
And drop the now upstreamed patches.
Fixes the following (low severity) issues:
- CVE-2023-6129 POLY1305 MAC implementation corrupts vector registers on PowerPC https://www.openssl.org/news/secadv/20240109.txt
support/testing: add optee-os runtime test
commit: https://git.buildroot.net/buildroot/commit/?id=cd56ac9eb63f0acecd78b1983f9d889f21f8fe0e branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master
U-Boot
v1: Added FDT PAD memory size while reserving memory for FDT to avoid some memory corruption issue
In board_f.c, the FDT memory region is reserved without FDT padding bytes. U-Boot adds some parameters, such as bootargs, while launching Linux. When relocating the FDT, if it is decided to run in the fixed memory location (i.e. fdt_high is set to -1), the padding bytes are not added during relocation, yet the size in the FDT header is blindly increased by the padding bytes in image_fdt.c without reserving the physical memory.