89e1fb7cef
49561 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
bc49af56ee |
blktrace: add support for REQ_OP_WRITE_ZEROES tracing
Currently, REQ_OP_WRITE_ZEROES operations are not handled in the
blktrace infrastructure, resulting in incorrect or missing operation
labels in ftrace blktrace output. This manifests as write-zeroes
operations appearing with incorrect labels like "N" instead of a
proper "WZ" designation.
This patch adds complete support for REQ_OP_WRITE_ZEROES across the
blktrace infrastructure:
Add BLK_TC_WRITE_ZEROES trace category in blktrace_api.h and update
BLK_TC_END_V2 marker accordingly
Map REQ_OP_WRITE_ZEROES to BLK_TC_WRITE_ZEROES in __blk_add_trace()
to ensure proper trace event categorization
Update fill_rwbs() to generate "WZ" label for write-zeroes operations
in ftrace output, making them easily identifiable
Add "write-zeroes" string mapping in act_to_str array for debugfs
filter interface
Update blk_fill_rwbs() to handle REQ_OP_WRITE_ZEROES for block layer
event tracing
With this fix, write-zeroes operations are now correctly traced and
displayed.
===========================================================
BEFORE THIS PATCH
===========================================================
blkdiscard -z -o 0 -l 40960 /dev/nvme0n1
blkdiscard-3809 [030] ..... 1212.253701: block_bio_queue: 259,0 NS 0 + 80 [blkdiscard]
blkdiscard-3809 [030] ..... 1212.253703: block_getrq: 259,0 NS 0 + 80 [blkdiscard]
blkdiscard-3809 [030] ..... 1212.253704: block_io_start: 259,0 NS 40960 () 0 + 80 be,0,4 [blkdiscard]
blkdiscard-3809 [030] ..... 1212.253704: block_plug: [blkdiscard]
blkdiscard-3809 [030] ..... 1212.253706: block_unplug: [blkdiscard] 1
blkdiscard-3809 [030] ..... 1212.253706: block_rq_insert: 259,0 NS 40960 () 0 + 80 be,0,4 [blkdiscard]
kworker/30:1H-566 [030] ..... 1212.253726: block_rq_issue: 259,0 NS 40960 () 0 + 80 be,0,4 [kworker/30:1H]
<idle>-0 [030] d.h1. 1212.253957: block_rq_complete: 259,0 NS () 0 + 80 be,0,4 [0]
<idle>-0 [030] dNh1. 1212.253960: block_io_done: 259,0 NS 0 () 0 + 0 none,0,0 [swapper/30]
Trace Event Breakdown:
Event | Device | Op | Sector | Sectors | Byte Size | Calculation
block_bio_queue | 259,0 | NS | 0 | 80 | - | 80 × 512 = 40,960
block_getrq | 259,0 | NS | 0 | 80 | - | 80 × 512 = 40,960
block_io_start | 259,0 | NS | 0 | 80 | 40960 | Direct from trace
block_rq_insert | 259,0 | NS | 0 | 80 | 40960 | Direct from trace
block_rq_issue | 259,0 | NS | 0 | 80 | 40960 | Direct from trace
block_rq_complete | 259,0 | NS | 0 | 80 | - | 80 × 512 = 40,960
block_io_done | 259,0 | NS | 0 | 0 | 0 | Completion (no data)
Total Bytes Transferred: Sectors: 80 Bytes: 80 × 512 = 40,960 bytes
===========================================================
AFTER THIS PATCH
===========================================================
blkdiscard -z -o 0 -l 40960 /dev/nvme0n1
blkdiscard-2477 [020] ..... 960.989131: block_bio_queue: 259,0 WZS 0 + 80 [blkdiscard]
blkdiscard-2477 [020] ..... 960.989134: block_getrq: 259,0 WZS 0 + 80 [blkdiscard]
blkdiscard-2477 [020] ..... 960.989135: block_io_start: 259,0 WZS 40960 () 0 + 80 be,0,4 [blkdiscard]
blkdiscard-2477 [020] ..... 960.989138: block_plug: [blkdiscard]
blkdiscard-2477 [020] ..... 960.989140: block_unplug: [blkdiscard] 1
blkdiscard-2477 [020] ..... 960.989141: block_rq_insert: 259,0 WZS 40960 () 0 + 80 be,0,4 [blkdiscard]
kworker/20:1H-736 [020] ..... 960.989166: block_rq_issue: 259,0 WZS 40960 () 0 + 80 be,0,4 [kworker/20:1H]
<idle>-0 [020] d.h1. 960.989476: block_rq_complete: 259,0 WZS () 0 + 80 be,0,4 [0]
<idle>-0 [020] dNh1. 960.989482: block_io_done: 259,0 WZS 0 () 0 + 0 none,0,0 [swapper/20]
Trace Event Breakdown:
Event | Device | Op | Sector | Sectors | Byte Size | Calculation
block_bio_queue | 259,0 | WZS | 0 | 80 | - | 80 × 512 = 40,960
block_getrq | 259,0 | WZS | 0 | 80 | - | 80 × 512 = 40,960
block_io_start | 259,0 | WZS | 0 | 80 | 40960 | Direct from trace
block_rq_insert | 259,0 | WZS | 0 | 80 | 40960 | Direct from trace
block_rq_issue | 259,0 | WZS | 0 | 80 | 40960 | Direct from trace
block_rq_complete | 259,0 | WZS | 0 | 80 | - | 80 × 512 = 40,960
block_io_done | 259,0 | WZS | 0 | 0 | 0 | Completion (no data)
Total Bytes Transferred: Sectors: 80 Bytes: 80 × 512 = 40,960 bytes
Tested with ftrace blktrace on NVMe devices using blkdiscard with
the -z (write-zeroes) flag.
Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
||
|
|
e48886b9d6 |
blktrace: for ftrace use correct trace format ver
The ftrace blktrace path allocates buffers and writes trace events but
was using the wrong recording function. After
commit 4d8bc7bd4f73 ("blktrace: move ftrace blk_io_tracer to blk_io_trace2"),
the ftrace interface was moved to use blk_io_trace2 format, but
__blk_add_trace() still called record_blktrace_event() which writes in
blk_io_trace (v1) format.
This causes critical data corruption:
- blk_io_trace (v1) has 32-bit 'action' field at offset 28
- blk_io_trace2 (v2) has 32-bit 'pid' at offset 28 and 64-bit 'action'
at offset 32
- When record_blktrace_event() writes to a v2 buffer:
* Writing pid (offset 32 in v1) corrupts the v2 action field
* Writing action (offset 28 in v1) corrupts the v2 pid field
* The 64-bit action is truncated to 32-bit via lower_32_bits()
Fix by:
1. Adding version switch to select correct format (v1 vs v2)
2. Calling appropriate recording function based on version
3. Defaulting to v2 for ftrace (as intended by commit 4d8bc7bd4f73)
4. Adding WARN_ONCE for unexpected version values
Without this patch :-
linux-block (for-next) # sh reproduce_blktrace_bug.sh
dd-14242 [033] d..1. 3903.022308: Unknown action 36a2
dd-14242 [033] d..1. 3903.022333: Unknown action 36a2
dd-14242 [033] d..1. 3903.022365: Unknown action 36a2
dd-14242 [033] d..1. 3903.022366: Unknown action 36a2
dd-14242 [033] d..1. 3903.022369: Unknown action 36a2
The action field is corrupted because:
- ftrace allocated blk_io_trace2 buffer (64 bytes)
- But called record_blktrace_event() (writes v1, 48 bytes)
- Field offsets don't match, causing corruption
The hex value shown 0x30e3 is actually a PID, not an action code!
linux-block (for-next) #
linux-block (for-next) #
linux-block (for-next) # sh reproduce_blktrace_bug.sh
Trace output looks correct:
dd-2420 [019] d..1. 59.641742: 251,0 Q RS 0 + 8 [dd]
dd-2420 [019] d..1. 59.641775: 251,0 G RS 0 + 8 [dd]
dd-2420 [019] d..1. 59.641784: 251,0 P N [dd]
dd-2420 [019] d..1. 59.641785: 251,0 U N [dd] 1
dd-2420 [019] d..1. 59.641788: 251,0 D RS 0 + 8 [dd]
Fixes: 4d8bc7bd4f73 ("blktrace: move ftrace blk_io_tracer to blk_io_trace2")
Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
||
|
|
4a0940bdca |
blktrace: use debug print to report dropped events
The WARN_ON_ONCE introduced in
commit f9ee38bbf70f ("blktrace: add block trace commands for zone operations")
triggers kernel warnings when zone operations are traced with blktrace
version 1. This can spam the kernel log during normal operation with
zoned block devices when userspace is using the legacy blktrace
protocol.
Currently blktrace implementation drops newly added REQ_OP_ZONE_XXX
when blktrace userspce version is set to 1.
Remove the WARN_ON_ONCE and quietly filter these events. Add a
rate-limited debug message to help diagnose potential issues without
flooding the kernel log. The debug message can be enabled via dynamic
debug when needed for troubleshooting.
This approach is more appropriate as encountering zone operations with
blktrace v1 is an expected condition that should be handled gracefully
rather than warned about, since users may be running older blktrace
userspace tools that only support version 1 of the protocol.
With this patch :-
linux-block (for-next) # git log -1
commit c8966006a0971d2b4bf94c0426eb7e4407c6853f (HEAD -> for-next)
Author: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Date: Mon Oct 27 19:26:53 2025 -0700
blktrace: use debug print to report dropped events
linux-block (for-next) # cdblktests
blktests (master) # ./check blktrace
blktrace/001 (blktrace zone management command tracing) [passed]
runtime 3.805s ... 3.889s
blktests (master) # dmesg -c
blktests (master) # echo "file kernel/trace/blktrace.c +p" > /sys/kernel/debug/dynamic_debug/control
blktests (master) # ./check blktrace
blktrace/001 (blktrace zone management command tracing) [passed]
runtime 3.889s ... 3.881s
blktests (master) # dmesg -c
[ 77.826237] blktrace: blktrace v1 cannot trace zone operation 0x1000190001
[ 77.826260] blktrace: blktrace v1 cannot trace zone operation 0x1000190004
[ 77.826282] blktrace: blktrace v1 cannot trace zone operation 0x1001490007
[ 77.826288] blktrace: blktrace v1 cannot trace zone operation 0x1001890008
[ 77.826343] blktrace: blktrace v1 cannot trace zone operation 0x1000190001
[ 77.826347] blktrace: blktrace v1 cannot trace zone operation 0x1000190004
[ 77.826350] blktrace: blktrace v1 cannot trace zone operation 0x1001490007
[ 77.826354] blktrace: blktrace v1 cannot trace zone operation 0x1001890008
[ 77.826373] blktrace: blktrace v1 cannot trace zone operation 0x1000190001
[ 77.826377] blktrace: blktrace v1 cannot trace zone operation 0x1000190004
blktests (master) # echo "file kernel/trace/blktrace.c -p" > /sys/kernel/debug/dynamic_debug/control
blktests (master) # ./check blktrace
blktrace/001 (blktrace zone management command tracing) [passed]
runtime 3.881s ... 3.824s
blktests (master) # dmesg -c
blktests (master) #
Reported-by: syzbot+153e64c0aa875d7e4c37@syzkaller.appspotmail.com
Fixes: f9ee38bbf70f ("blktrace: add block trace commands for zone operations")
Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
||
|
|
4ae8efb4f9 |
blktrace: handle BLKTRACESETUP2 ioctl
Handle the BLKTRACESETUP2 ioctl, requesting an extended version of the blktrace protocol from user-space. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
3f6722816a |
blktrace: trace zone write plugging operations
Trace zone write plugging operations on block devices. As tracing of zoned block commands needs the upper 32bit of the widened 64bit action, only add traces to blktrace if user-space has requested version 2 of the blktrace protocol. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
1c164fcc1b |
blktrace: expose ZONE APPEND completions to blktrace
Expose ZONE APPEND completions as a block trace completion action to blktrace. As tracing of zoned block commands needs the upper 32bit of the widened 64bit action, only add traces to blktrace if user-space has requested version 2 of the blktrace protocol. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
f9ee38bbf7 |
blktrace: add block trace commands for zone operations
Add block trace commands for zone operations. These commands can only be handled with version 2 of the blktrace protocol. For version 1, warn if a command that does not fit into the 16 bits reserved for the command in this version is passed in. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
4d8bc7bd4f |
blktrace: move ftrace blk_io_tracer to blk_io_trace2
Move ftrace's blk_io_tracer to the new blk_io_trace2 infrastructure. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
67bfa74d81 |
blktrace: move trace_note to blk_io_trace2
Move trace_note() to the new blk_io_trace2 infrastructure. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
915bb53860 |
blktrace: differentiate between blk_io_trace versions
Differentiate between blk_io_trace and blk_io_trace2 when relaying to user-space depending on which version has been requested by the blktrace utility. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
c44347d606 |
blktrace: add definitions for struct blk_io_trace2
Add definitions for the extended version of the blktrace protocol using a wider action type to be able to record new actions in the kernel. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
113cbd6282 |
blktrace: pass blk_user_trace2 to setup functions
Pass struct blk_user_trace_setup2 to blktrace_setup_finalize(). This prepares for the incoming extension of the blktrace protocol with a 64bit act_mask. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
0d8627cc93 |
blktrace: add definitions for blk_user_trace_setup2
Add definitions for a version 2 of the blk_user_trace_setup ioctl. This new ioctl will enable a different struct layout of the binary data passed to user-space when using a new version of the blktrace utility requesting the new struct layout. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
42da88a724 |
blktrace: split do_blk_trace_setup into two functions
Split do_blk_trace_setup into two functions, this is done to prepare for an incoming new BLKTRACESETUP2 ioctl(2) which can receive extended parameters from user-space. Also move the size verification logic to the callers in preparation for using a new internal structure later. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
370cd70a40 |
blktrace: change the internal action to 64bit
Change the internal use of the action in blktrace to 64bit. Although for now only the lower 32bits will be used. With the upcoming version 2 of the blktrace user-space protocol the upper 32bit will also be utilized. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
70e3c62b89 |
blktrace: untangle if/else sequence in __blk_add_trace
Untangle the if/else sequence setting the trace action in __blk_add_trace() and turn it into a switch statement for better extensibility. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
04678e72e9 |
blktrace: split out relaying a blktrace event
Split out the code relaying a blktrace event to user-space using relayfs. This enables adding a second version supporting a new version of the protocol. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
472eca5383 |
blktrace: factor out recording a blktrace event
Factor out the recording of a blktrace event into its own function, deduplicating the code. This also enables recording different versions of the blktrace protocol later on. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
a65988a0ad |
blktrace: only calculate trace length once
De-duplicate the calculation of the trace length instead of doing the calculation twice, once for calling trace_buffer_lock_reserve() and once for calling relay_reserve(). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
d9043c79ba |
- Make sure the check for lost pelt idle time is done unconditionally to
have correct lost idle time accounting - Stop the deadline server task before a CPU goes offline -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmj0yvsACgkQEsHwGGHe VUqiYRAAncYon7a++87nuCHIw2ktAcjn4PJTz0F1VGw9ZvcbWThUhNoA17jd4uOz XCzSH1rnHnlz359cJIzFwgVYjkBIaqT8GBN0al9ODra37laZCo89bKLmOeAlH81H 1xJXrDwn7U8dYBjgf6E6OGCdAx40kspCBxmpxrFW1VrGDvfNjEAKezm5GWeSED0Z umA93dBr82i4IvfARUkK8s35ctHyx+o+7lCvCSsKSJgM02WWrKqAA/lv6jFjIgdE 0UuYJv+5A2e1Iog2KNSbvSPn23VaMnsZtvXfJoRLFHEsNTiL9NliTnwrOY6xx0Z8 9+GUeWsbobKwcKSk4dctOh0g/4afNbxWe2aAPmScHJNHtXHSeejps+zy4xFCLTZn 2muHCdZ2zo6YSL+og4TQax+FnLYnGUtPFDOQYsNxv/Cp1H+cbgvG5Qp08XXt8Tfl Mt82g25GKklc28AN5Ui7FKTFmV2K363pV04YVZjXOwmxwiEYbwKw8gKfxi7CRW7S fl4nW6Kp8BFtJQxc/RCXDIiX3h0wRlTOmF5FzyFYxgdsmO5AdGqS9tqknLrV2NlH JVtj7alnrmCU34LwtTVfCvYQZiNd4IN+B6/htsL3AzrcLnqJz4O/T/Eyv9UL4yUs yvQuO+yStCyk0BFYaGM3/E0xp87NYjaLiHnpM2jia3DT3UT1t7Q= =uqJW -----END PGP SIGNATURE----- Merge tag 'sched_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Make sure the check for lost pelt idle time is done unconditionally to have correct lost idle time accounting - Stop the deadline server task before a CPU goes offline * tag 'sched_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix pelt lost idle time detection sched/deadline: Stop dl_server before CPU goes offline |
||
|
|
343b4b44a1 |
- Make sure perf reporting works correctly in setups using overlayfs or FUSE
- Move the uprobe optimization to a better location logically -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmj0x3YACgkQEsHwGGHe VUo6nQ/9E8LWC6PJG40QUXNZuj5qLe9VVaiWTW7w/zgeCf9nxkt6OhlOIu4fCMKz 6n5marqnvOoG9EXetUz5+n0wJvc9vDACESC0m6ESddaI4PGXULNJIsN2C5dR3UZ3 RULxaXvz9PVVkW3UIuM/U9az7fsG/ttH1rtrWQOsUYQZEO7vA9g+8KtASwnB7yBa 29WzVDYQIuHigdFPkVOuKBEdhslOjNjMM/N/shFOyFS62MGgwwFG/f4xv0c2GanJ 9gS2HPGhwOXLm8x/1Y6D8eKjiT5lvqZcDcRnui8bj7L7YGx+HU4PhRIIg7sBvGqA QQGolxA9Xo2BTufUTxEQK9v2fSvg0f9wuKbkDbRUdyUeWiZZjEeBM/m0AkzEEeKf FUrLCi3V/mN5J/sXSgIwjuCtYctwmsfaukL2bz6DB7feoTHceQmHunKCtBlDZtLE Md/4hzMNYM+T/3nx27quGz8Cepxn9PSObN7W+DddWr0TxOxg2Pq6iMbnd7MulueP K/AMvqDtbbVUB1XpsFvadRLcYUYYfXT9tiOCxa9O2w2NXDG8qeB6FZwScBaWuz1N 9GpKBhVMgZT8m0d3N8NoBi0+h32UVZnsJJ3UhHnceE8UyYf4kSO5L2K3nPHJa301 AavIPkH7+YOl5TAg6JlyYbRRdwfoUzxKUqY/hQ6Q8aLvwb2Jing= =huy7 -----END PGP SIGNATURE----- Merge tag 'perf_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Make sure perf reporting works correctly in setups using overlayfs or FUSE - Move the uprobe optimization to a better location logically * tag 'perf_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix MMAP2 event device with backing files perf/core: Fix MMAP event path names with backing files perf/core: Fix address filter match with backing files uprobe: Move arch_uprobe_optimize right after handlers execution |
||
|
|
f6fddc6df3 |
bpf: Fix memory leak in __lookup_instance error path
When __lookup_instance() allocates a func_instance structure but fails
to allocate the must_write_set array, it returns an error without freeing
the previously allocated func_instance. This causes a memory leak of 192
bytes (sizeof(struct func_instance)) each time this error path is triggered.
Fix by freeing 'result' on must_write_set allocation failure.
Fixes: b3698c356ad9 ("bpf: callchain sensitive stack liveness tracking using CFG")
Reported-by: BPF Runtime Fuzzer (BRF)
Signed-off-by: Shardul Bankar <shardulsb08@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20251016063330.4107547-1-shardulsb08@gmail.com
|
||
|
|
5fb750e8a9 |
bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures.
The following kmemleak splat:
[ 8.105530] kmemleak: Trying to color unknown object at 0xff11000100e918c0 as Black
[ 8.106521] Call Trace:
[ 8.106521] <TASK>
[ 8.106521] dump_stack_lvl+0x4b/0x70
[ 8.106521] kvfree_call_rcu+0xcb/0x3b0
[ 8.106521] ? hrtimer_cancel+0x21/0x40
[ 8.106521] bpf_obj_free_fields+0x193/0x200
[ 8.106521] htab_map_update_elem+0x29c/0x410
[ 8.106521] bpf_prog_cfc8cd0f42c04044_overwrite_cb+0x47/0x4b
[ 8.106521] bpf_prog_8c30cd7c4db2e963_overwrite_timer+0x65/0x86
[ 8.106521] bpf_prog_test_run_syscall+0xe1/0x2a0
happens due to the combination of features and fixes, but mainly due to
commit 6d78b4473cdb ("bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()")
It's using __GFP_HIGH, which instructs slub/kmemleak internals to skip
kmemleak_alloc_recursive() on allocation, so subsequent kfree_rcu()->
kvfree_call_rcu()->kmemleak_ignore() complains with the above splat.
To fix this imbalance, replace bpf_map_kmalloc_node() with
kmalloc_nolock() and kfree_rcu() with call_rcu() + kfree_nolock() to
make sure that the objects allocated with kmalloc_nolock() are freed
with kfree_nolock() rather than the implicit kfree() that kfree_rcu()
uses internally.
Note, the kmalloc_nolock() happens under bpf_spin_lock_irqsave(), so
it will always fail in PREEMPT_RT. This is not an issue at the moment,
since bpf_timers are disabled in PREEMPT_RT. In the future
bpf_spin_lock will be replaced with state machine similar to
bpf_task_work.
Fixes: 6d78b4473cdb ("bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org
Link: https://lore.kernel.org/bpf/20251015000700.28988-1-alexei.starovoitov@gmail.com
|
||
|
|
17e3e88ed0 |
sched/fair: Fix pelt lost idle time detection
The check for some lost idle pelt time should be always done when
pick_next_task_fair() fails to pick a task and not only when we call it
from the fair fast-path.
The case happens when the last running task on rq is a RT or DL task. When
the latter goes to sleep and the /Sum of util_sum of the rq is at the max
value, we don't account the lost of idle time whereas we should.
Fixes: 67692435c411 ("sched: Rework pick_next_task() slow-path")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
||
|
|
ee6e44dfe6 |
sched/deadline: Stop dl_server before CPU goes offline
IBM CI tool reported kernel warning[1] when running a CPU removal
operation through drmgr[2]. i.e "drmgr -c cpu -r -q 1"
WARNING: CPU: 0 PID: 0 at kernel/sched/cpudeadline.c:219 cpudl_set+0x58/0x170
NIP [c0000000002b6ed8] cpudl_set+0x58/0x170
LR [c0000000002b7cb8] dl_server_timer+0x168/0x2a0
Call Trace:
[c000000002c2f8c0] init_stack+0x78c0/0x8000 (unreliable)
[c0000000002b7cb8] dl_server_timer+0x168/0x2a0
[c00000000034df84] __hrtimer_run_queues+0x1a4/0x390
[c00000000034f624] hrtimer_interrupt+0x124/0x300
[c00000000002a230] timer_interrupt+0x140/0x320
Git bisects to: commit 4ae8d9aa9f9d ("sched/deadline: Fix dl_server getting stuck")
This happens since:
- dl_server hrtimer gets enqueued close to cpu offline, when
kthread_park enqueues a fair task.
- CPU goes offline and drmgr removes it from cpu_present_mask.
- hrtimer fires and warning is hit.
Fix it by stopping the dl_server before CPU is marked dead.
[1]: https://lore.kernel.org/all/8218e149-7718-4432-9312-f97297c352b9@linux.ibm.com/
[2]: https://github.com/ibm-power-utilities/powerpc-utils/tree/next/src/drmgr
[sshegde: wrote the changelog and tested it]
Fixes: 4ae8d9aa9f9d ("sched/deadline: Fix dl_server getting stuck")
Closes: https://lore.kernel.org/all/8218e149-7718-4432-9312-f97297c352b9@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
|
||
|
|
fa4f4bae89 |
perf/core: Fix MMAP2 event device with backing files
Some file systems like FUSE-based ones or overlayfs may record the backing
file in struct vm_area_struct vm_file, instead of the user file that the
user mmapped.
That causes perf to misreport the device major/minor numbers of the file
system of the file, and the generation of the file, and potentially other
inode details. There is an existing helper file_user_inode() for that
situation.
Use file_user_inode() instead of file_inode() to get the inode for MMAP2
events.
Example:
Setup:
# cd /root
# mkdir test ; cd test ; mkdir lower upper work merged
# cp `which cat` lower
# mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
# perf record -e cycles:u -- /root/test/merged/cat /proc/self/maps
...
55b2c91d0000-55b2c926b000 r-xp 00018000 00:1a 3419 /root/test/merged/cat
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.004 MB perf.data (5 samples) ]
#
# stat /root/test/merged/cat
File: /root/test/merged/cat
Size: 1127792 Blocks: 2208 IO Block: 4096 regular file
Device: 0,26 Inode: 3419 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2025-09-08 12:23:59.453309624 +0000
Modify: 2025-09-08 12:23:59.454309624 +0000
Change: 2025-09-08 12:23:59.454309624 +0000
Birth: 2025-09-08 12:23:59.453309624 +0000
Before:
Device reported 00:02 differs from stat output and /proc/self/maps
# perf script --show-mmap-events | grep /root/test/merged/cat
cat 377 [-01] 243.078558: PERF_RECORD_MMAP2 377/377: [0x55b2c91d0000(0x9b000) @ 0x18000 00:02 3419 2068525940]: r-xp /root/test/merged/cat
After:
Device reported 00:1a is the same as stat output and /proc/self/maps
# perf script --show-mmap-events | grep /root/test/merged/cat
cat 362 [-01] 127.755167: PERF_RECORD_MMAP2 362/362: [0x55ba6e781000(0x9b000) @ 0x18000 00:1a 3419 0]: r-xp /root/test/merged/cat
With respect to stable kernels, overlayfs mmap function ovl_mmap() was
added in v4.19 but file_user_inode() was not added until v6.8 and never
back-ported to stable kernels. FMODE_BACKING that it depends on was added
in v6.5. This issue has gone largely unnoticed, so back-porting before
v6.8 is probably not worth it, so put 6.8 as the stable kernel prerequisite
version, although in practice the next long term kernel is 6.12.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org # 6.8
|
||
|
|
8818f507a9 |
perf/core: Fix MMAP event path names with backing files
Some file systems like FUSE-based ones or overlayfs may record the backing
file in struct vm_area_struct vm_file, instead of the user file that the
user mmapped.
Since commit def3ae83da02f ("fs: store real path instead of fake path in
backing file f_path"), file_path() no longer returns the user file path
when applied to a backing file. There is an existing helper
file_user_path() for that situation.
Use file_user_path() instead of file_path() to get the path for MMAP
and MMAP2 events.
Example:
Setup:
# cd /root
# mkdir test ; cd test ; mkdir lower upper work merged
# cp `which cat` lower
# mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
# perf record -e intel_pt//u -- /root/test/merged/cat /proc/self/maps
...
55b0ba399000-55b0ba434000 r-xp 00018000 00:1a 3419 /root/test/merged/cat
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.060 MB perf.data ]
#
Before:
File name is wrong (/cat), so decoding fails:
# perf script --no-itrace --show-mmap-events
cat 367 [016] 100.491492: PERF_RECORD_MMAP2 367/367: [0x55b0ba399000(0x9b000) @ 0x18000 00:02 3419 489959280]: r-xp /cat
...
# perf script --itrace=e | wc -l
Warning:
19 instruction trace errors
19
#
After:
File name is correct (/root/test/merged/cat), so decoding is ok:
# perf script --no-itrace --show-mmap-events
cat 364 [016] 72.153006: PERF_RECORD_MMAP2 364/364: [0x55ce4003d000(0x9b000) @ 0x18000 00:02 3419 3132534314]: r-xp /root/test/merged/cat
# perf script --itrace=e
# perf script --itrace=e | wc -l
0
#
Fixes: def3ae83da02f ("fs: store real path instead of fake path in backing file f_path")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org
|
||
|
|
ebfc8542ad |
perf/core: Fix address filter match with backing files
It was reported that Intel PT address filters do not work in Docker
containers. That relates to the use of overlayfs.
overlayfs records the backing file in struct vm_area_struct vm_file,
instead of the user file that the user mmapped. In order for an address
filter to match, it must compare to the user file inode. There is an
existing helper file_user_inode() for that situation.
Use file_user_inode() instead of file_inode() to get the inode for address
filter matching.
Example:
Setup:
# cd /root
# mkdir test ; cd test ; mkdir lower upper work merged
# cp `which cat` lower
# mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
# perf record --buildid-mmap -e intel_pt//u --filter 'filter * @ /root/test/merged/cat' -- /root/test/merged/cat /proc/self/maps
...
55d61d246000-55d61d2e1000 r-xp 00018000 00:1a 3418 /root/test/merged/cat
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.015 MB perf.data ]
# perf buildid-cache --add /root/test/merged/cat
Before:
Address filter does not match so there are no control flow packets
# perf script --itrace=e
# perf script --itrace=b | wc -l
0
# perf script -D | grep 'TIP.PGE' | wc -l
0
#
After:
Address filter does match so there are control flow packets
# perf script --itrace=e
# perf script --itrace=b | wc -l
235
# perf script -D | grep 'TIP.PGE' | wc -l
57
#
With respect to stable kernels, overlayfs mmap function ovl_mmap() was
added in v4.19 but file_user_inode() was not added until v6.8 and never
back-ported to stable kernels. FMODE_BACKING that it depends on was added
in v6.5. This issue has gone largely unnoticed, so back-porting before
v6.8 is probably not worth it, so put 6.8 as the stable kernel prerequisite
version, although in practice the next long term kernel is 6.12.
Closes: https://lore.kernel.org/linux-perf-users/aBCwoq7w8ohBRQCh@fremen.lan
Reported-by: Edd Barrett <edd@theunixzoo.co.uk>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org # 6.8
|
||
|
|
62685ab071 |
uprobe: Move arch_uprobe_optimize right after handlers execution
It's less confusing to optimize uprobe right after handlers execution and before we do the check for changed ip register to avoid situations where changed ip register would skip uprobe optimization. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> |
||
|
|
67029a49db |
tracing fixes for v6.18:
The previous fix to trace_marker required updating trace_marker_raw
as well. The difference between trace_marker_raw from trace_marker
is that the raw version is for applications to write binary structures
directly into the ring buffer instead of writing ASCII strings.
This is for applications that will read the raw data from the ring
buffer and get the data structures directly. It's a bit quicker than
using the ASCII version.
Unfortunately, it appears that our test suite has several tests that
test writes to the trace_marker file, but lacks any tests to the
trace_marker_raw file (this needs to be remedied). Two issues came
about the update to the trace_marker_raw file that syzbot found:
- Fix tracing_mark_raw_write() to use per CPU buffer
The fix to use the per CPU buffer to copy from user space was needed for
both the trace_maker and trace_maker_raw file.
The fix for reading from user space into per CPU buffers properly fixed
the trace_marker write function, but the trace_marker_raw file wasn't
fixed properly. The user space data was correctly written into the per CPU
buffer, but the code that wrote into the ring buffer still used the user
space pointer and not the per CPU buffer that had the user space data
already written.
- Stop the fortify string warning from writing into trace_marker_raw
After converting the copy_from_user_nofault() into a memcpy(), another
issue appeared. As writes to the trace_marker_raw expects binary data, the
first entry is a 4 byte identifier. The entry structure is defined as:
struct {
struct trace_entry ent;
int id;
char buf[];
};
The size of this structure is reserved on the ring buffer with:
size = sizeof(*entry) + cnt;
Then it is copied from the buffer into the ring buffer with:
memcpy(&entry->id, buf, cnt);
This use to be a copy_from_user_nofault(), but now converting it to
a memcpy() triggers the fortify-string code, and causes a warning.
The allocated space is actually more than what is copied, as the cnt
used also includes the entry->id portion. Allocating sizeof(*entry)
plus cnt is actually allocating 4 bytes more than what is needed.
Change the size function to:
size = struct_size(entry, buf, cnt - sizeof(entry->id));
And update the memcpy() to unsafe_memcpy().
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaOq0fhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qr4HAQCNbFEDjzNNEueCWLOptC5YfeJgdvPB
399CzuFJl02ZOgD/flPJa1r+NaeYOBhe1BgpFF9FzB/SPXAXkXUGLM7WIgg=
=goks
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"The previous fix to trace_marker required updating trace_marker_raw as
well. The difference between trace_marker_raw from trace_marker is
that the raw version is for applications to write binary structures
directly into the ring buffer instead of writing ASCII strings. This
is for applications that will read the raw data from the ring buffer
and get the data structures directly. It's a bit quicker than using
the ASCII version.
Unfortunately, it appears that our test suite has several tests that
test writes to the trace_marker file, but lacks any tests to the
trace_marker_raw file (this needs to be remedied). Two issues came
about the update to the trace_marker_raw file that syzbot found:
- Fix tracing_mark_raw_write() to use per CPU buffer
The fix to use the per CPU buffer to copy from user space was
needed for both the trace_maker and trace_maker_raw file.
The fix for reading from user space into per CPU buffers properly
fixed the trace_marker write function, but the trace_marker_raw
file wasn't fixed properly. The user space data was correctly
written into the per CPU buffer, but the code that wrote into the
ring buffer still used the user space pointer and not the per CPU
buffer that had the user space data already written.
- Stop the fortify string warning from writing into trace_marker_raw
After converting the copy_from_user_nofault() into a memcpy(),
another issue appeared. As writes to the trace_marker_raw expects
binary data, the first entry is a 4 byte identifier. The entry
structure is defined as:
struct {
struct trace_entry ent;
int id;
char buf[];
};
The size of this structure is reserved on the ring buffer with:
size = sizeof(*entry) + cnt;
Then it is copied from the buffer into the ring buffer with:
memcpy(&entry->id, buf, cnt);
This use to be a copy_from_user_nofault(), but now converting it to
a memcpy() triggers the fortify-string code, and causes a warning.
The allocated space is actually more than what is copied, as the
cnt used also includes the entry->id portion. Allocating
sizeof(*entry) plus cnt is actually allocating 4 bytes more than
what is needed.
Change the size function to:
size = struct_size(entry, buf, cnt - sizeof(entry->id));
And update the memcpy() to unsafe_memcpy()"
* tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Stop fortify-string from warning in tracing_mark_raw_write()
tracing: Fix tracing_mark_raw_write() to use buf and not ubuf
|
||
|
|
fbde105f13 |
bpf-fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjpp+sACgkQ6rmadz2v
bTo+9Q//bUzEc2C64NbG0DTCcxnkDEadzBLIB0BAwnAkuRjR8HJiPGoCdBJhUqzM
/hNfIHTtDdyspU2qZbM+r4nVJ6zRAwIHrT2d/knERxXtRQozaWvUlRhVmf5tdWYm
DkbThS9sAfAOs21YjV7OWPrf7bC7T9syQTAfN0CE8cujZY7OnqCyzNwfb8iIusyo
+Ctm0/qUDVtd6SjdPAQjzp82fHIIwnMFZtWJiZml5LklL1Mx5cuVrT/sWgr5KATW
vZ9rUfgaiJkAsSX0sSlLnAI76+kJRB+IkmK1TRdWFlwW6dTsa/7MkDeXXPN1dEDi
o5ZqhcvaY0eAMbU4iX72Juf6gVFF6AgVwsrHmM79ICjg5umCLN/90QqYPc0ChRxl
EYuSWVQ6/cgV3W6l+KU53cwmRjjdSzyJQFei03COZ0iKF6xic0cynS3BKMQL6HkU
3BfTj19h+dxt7qywRaJFsrWK4t/uBX6N75XlVa9od/sk91tR/ibtJ6hcyuJGATr5
nkfMkyN155upAffUnkhv37TXtMXyX8/kd7BddCet31JJXyJuJZ0vYuOcur6awGyN
aB0T1ueG15sTfGf0zpxVNWhVqswHI/1Suk8EXwbDeHRcsmtrp8XWYawf5StIMwW0
Uy8GjS5KVl5bcrfDbcvj79jajpVjwnvFR1Sir9C4aROm5OpH3Hs=
=77JH
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Finish constification of 1st parameter of bpf_d_path() (Rong Tao)
- Harden userspace-supplied xdp_desc validation (Alexander Lobakin)
- Fix metadata_dst leak in __bpf_redirect_neigh_v{4,6}() (Daniel
Borkmann)
- Fix undefined behavior in {get,put}_unaligned_be32() (Eric Biggers)
- Use correct context to unpin bpf hash map with special types (KaFai
Wan)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add test for unpinning htab with internal timer struct
bpf: Avoid RCU context warning when unpinning htab with internal structs
xsk: Harden userspace-supplied xdp_desc validation
bpf: Fix metadata_dst leak __bpf_redirect_neigh_v{4,6}
libbpf: Fix undefined behavior in {get,put}_unaligned_be32()
bpf: Finish constification of 1st parameter of bpf_d_path()
|
||
|
|
ae13bd2310 |
Just one series here - Mike Rappoport has taught KEXEC handover to
preserve vmalloc allocations across handover. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaOmDWAAKCRDdBJ7gKXxA jh+MAQDUPBj3mFm238CXI5DC1gJ3ETe3NJjJvfzIjLs51c+dFgD+PUuvDA0GUtKH LCl6T+HJXh2FgGn1F2Kl/0hwPtEvuA4= =HYr7 -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more updates from Andrew Morton: "Just one series here - Mike Rappoport has taught KEXEC handover to preserve vmalloc allocations across handover" * tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: lib/test_kho: use kho_preserve_vmalloc instead of storing addresses in fdt kho: add support for preserving vmalloc allocations kho: replace kho_preserve_phys() with kho_preserve_pages() kho: check if kho is finalized in __kho_preserve_order() MAINTAINERS, .mailmap: update Umang's email address |
||
|
|
54b91e54b1 |
tracing: Stop fortify-string from warning in tracing_mark_raw_write()
The way tracing_mark_raw_write() records its data is that it has the
following structure:
struct {
struct trace_entry;
int id;
char buf[];
};
But memcpy(&entry->id, buf, size) triggers the following warning when the
size is greater than the id:
------------[ cut here ]------------
memcpy: detected field-spanning write (size 6) of single field "&entry->id" at kernel/trace/trace.c:7458 (size 4)
WARNING: CPU: 7 PID: 995 at kernel/trace/trace.c:7458 write_raw_marker_to_buffer.isra.0+0x1f9/0x2e0
Modules linked in:
CPU: 7 UID: 0 PID: 995 Comm: bash Not tainted 6.17.0-test-00007-g60b82183e78a-dirty #211 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
RIP: 0010:write_raw_marker_to_buffer.isra.0+0x1f9/0x2e0
Code: 04 00 75 a7 b9 04 00 00 00 48 89 de 48 89 04 24 48 c7 c2 e0 b1 d1 b2 48 c7 c7 40 b2 d1 b2 c6 05 2d 88 6a 04 01 e8 f7 e8 bd ff <0f> 0b 48 8b 04 24 e9 76 ff ff ff 49 8d 7c 24 04 49 8d 5c 24 08 48
RSP: 0018:ffff888104c3fc78 EFLAGS: 00010292
RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 1ffffffff6b363b4 RDI: 0000000000000001
RBP: ffff888100058a00 R08: ffffffffb041d459 R09: ffffed1020987f40
R10: 0000000000000007 R11: 0000000000000001 R12: ffff888100bb9010
R13: 0000000000000000 R14: 00000000000003e3 R15: ffff888134800000
FS: 00007fa61d286740(0000) GS:ffff888286cad000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000560d28d509f1 CR3: 00000001047a4006 CR4: 0000000000172ef0
Call Trace:
<TASK>
tracing_mark_raw_write+0x1fe/0x290
? __pfx_tracing_mark_raw_write+0x10/0x10
? security_file_permission+0x50/0xf0
? rw_verify_area+0x6f/0x4b0
vfs_write+0x1d8/0xdd0
? __pfx_vfs_write+0x10/0x10
? __pfx_css_rstat_updated+0x10/0x10
? count_memcg_events+0xd9/0x410
? fdget_pos+0x53/0x5e0
ksys_write+0x182/0x200
? __pfx_ksys_write+0x10/0x10
? do_user_addr_fault+0x4af/0xa30
do_syscall_64+0x63/0x350
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa61d318687
Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
RSP: 002b:00007ffd87fe0120 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fa61d286740 RCX: 00007fa61d318687
RDX: 0000000000000006 RSI: 0000560d28d509f0 RDI: 0000000000000001
RBP: 0000560d28d509f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000006
R13: 00007fa61d4715c0 R14: 00007fa61d46ee80 R15: 0000000000000000
</TASK>
---[ end trace 0000000000000000 ]---
This is because fortify string sees that the size of entry->id is only 4
bytes, but it is writing more than that. But this is OK as the
dynamic_array is allocated to handle that copy.
The size allocated on the ring buffer was actually a bit too big:
size = sizeof(*entry) + cnt;
But cnt includes the 'id' and the buffer data, so adding cnt to the size
of *entry actually allocates too much on the ring buffer.
Change the allocation to:
size = struct_size(entry, buf, cnt - sizeof(entry->id));
and the memcpy() to unsafe_memcpy() with an added justification.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20251011112032.77be18e4@gandalf.local.home
Fixes: 64cf7d058a00 ("tracing: Have trace_marker use per-cpu data to read user space")
Reported-by: syzbot+9a2ede1643175f350105@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
||
|
|
bda745ee8f |
tracing: Fix tracing_mark_raw_write() to use buf and not ubuf
The fix to use a per CPU buffer to read user space tested only the writes to trace_marker. But it appears that the selftests are missing tests to the trace_maker_raw file. The trace_maker_raw file is used by applications that writes data structures and not strings into the file, and the tools read the raw ring buffer to process the structures it writes. The fix that reads the per CPU buffers passes the new per CPU buffer to the trace_marker file writes, but the update to the trace_marker_raw write read the data from user space into the per CPU buffer, but then still used then passed the user space address to the function that records the data. Pass in the per CPU buffer and not the user space address. TODO: Add a test to better test trace_marker_raw. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20251011035243.386098147@kernel.org Fixes: 64cf7d058a00 ("tracing: Have trace_marker use per-cpu data to read user space") Reported-by: syzbot+9a2ede1643175f350105@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
4f375ade6a |
bpf: Avoid RCU context warning when unpinning htab with internal structs
When unpinning a BPF hash table (htab or htab_lru) that contains internal structures (timer, workqueue, or task_work) in its values, a BUG warning is triggered: BUG: sleeping function called from invalid context at kernel/bpf/hashtab.c:244 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 14, name: ksoftirqd/0 ... The issue arises from the interaction between BPF object unpinning and RCU callback mechanisms: 1. BPF object unpinning uses ->free_inode() which schedules cleanup via call_rcu(), deferring the actual freeing to an RCU callback that executes within the RCU_SOFTIRQ context. 2. During cleanup of hash tables containing internal structures, htab_map_free_internal_structs() is invoked, which includes cond_resched() or cond_resched_rcu() calls to yield the CPU during potentially long operations. However, cond_resched() or cond_resched_rcu() cannot be safely called from atomic RCU softirq context, leading to the BUG warning when attempting to reschedule. Fix this by changing from ->free_inode() to ->destroy_inode() and rename bpf_free_inode() to bpf_destroy_inode() for BPF objects (prog, map, link). This allows direct inode freeing without RCU callback scheduling, avoiding the invalid context warning. Reported-by: Le Chen <tom2cat@sjtu.edu.cn> Closes: https://lore.kernel.org/all/1444123482.1827743.1750996347470.JavaMail.zimbra@sjtu.edu.cn/ Fixes: 68134668c17f ("bpf: Add map side support for bpf timers.") Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: KaFai Wan <kafai.wan@linux.dev> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20251008102628.808045-2-kafai.wan@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
5472d60c12 |
tracing clean up and fixes for v6.18:
- Have osnoise tracer use memdup_user_nul()
The function osnoise_cpus_write() open codes a kmalloc() and then
a copy_from_user() and then adds a nul byte at the end which is the
same as simply using memdup_user_nul().
- Fix wakeup and irq tracers when failing to acquire calltime
When the wakeup and irq tracers use the function graph tracer for
tracing function times, it saves a timestamp into the fgraph shadow
stack. It is possible that this could fail to be stored. If that
happens, it exits the routine early. These functions also disable
nesting of the operations by incremeting the data "disable" counter.
But if the calltime exits out early, it never increments the counter
back to what it needs to be.
Since there's only a couple of lines of code that does work after
acquiring the calltime, instead of exiting out early, reverse the
if statement to be true if calltime is acquired, and place the code
that is to be done within that if block. The clean up will always
be done after that.
- Fix ring_buffer_map() return value on failure of __rb_map_vma()
If __rb_map_vma() fails in ring_buffer_map(), it does not return
an error. This means the caller will be working against a bad vma
mapping. Have ring_buffer_map() return an error when __rb_map_vma()
fails.
- Fix regression of writing to the trace_marker file
A bug fix was made to change __copy_from_user_inatomic() to
copy_from_user_nofault() in the trace_marker write function.
The trace_marker file is used by applications to write into
it (usually with a file descriptor opened at the start of the
program) to record into the tracing system. It's usually used
in critical sections so the write to trace_marker is highly
optimized.
The reason for copying in an atomic section is that the write
reserves space on the ring buffer and then writes directly into
it. After it writes, it commits the event. The time between
reserve and commit must have preemption disabled.
The trace marker write does not have any locking nor can it
allocate due to the nature of it being a critical path.
Unfortunately, converting __copy_from_user_inatomic() to
copy_from_user_nofault() caused a regression in Android.
Now all the writes from its applications trigger the fault that
is rejected by the _nofault() version that wasn't rejected by
the _inatomic() version. Instead of getting data, it now just
gets a trace buffer filled with:
tracing_mark_write: <faulted>
To fix this, on opening of the trace_marker file, allocate
per CPU buffers that can be used by the write call. Then
when entering the write call, do the following:
preempt_disable();
cpu = smp_processor_id();
buffer = per_cpu_ptr(cpu_buffers, cpu);
do {
cnt = nr_context_switches_cpu(cpu);
migrate_disable();
preempt_enable();
ret = copy_from_user(buffer, ptr, size);
preempt_disable();
migrate_enable();
} while (!ret && cnt != nr_context_switches_cpu(cpu));
if (!ret)
ring_buffer_write(buffer);
preempt_enable();
This works similarly to seqcount. As it must enabled preemption
to do a copy_from_user() into a per CPU buffer, if it gets
preempted, the buffer could be corrupted by another task.
To handle this, read the number of context switches of the current
CPU, disable migration, enable preemption, copy the data from
user space, then immediately disable preemption again.
If the number of context switches is the same, the buffer
is still valid. Otherwise it must be assumed that the buffer may
have been corrupted and it needs to try again.
Now the trace_marker write can get the user data even if it has
to fault it in, and still not grab any locks of its own.
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaOfVOhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qq6oAP9y+zxuouWjtXIz9/z++aykFgKCkeau
XHSSdJdn4R+AQgEA4SE0UWKH0F6Bg7qwyocahMMQ1tIJRrpihfNrKBUmmQ4=
=wDGp
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing clean up and fixes from Steven Rostedt:
- Have osnoise tracer use memdup_user_nul()
The function osnoise_cpus_write() open codes a kmalloc() and then a
copy_from_user() and then adds a nul byte at the end which is the
same as simply using memdup_user_nul().
- Fix wakeup and irq tracers when failing to acquire calltime
When the wakeup and irq tracers use the function graph tracer for
tracing function times, it saves a timestamp into the fgraph shadow
stack. It is possible that this could fail to be stored. If that
happens, it exits the routine early. These functions also disable
nesting of the operations by incremeting the data "disable" counter.
But if the calltime exits out early, it never increments the counter
back to what it needs to be.
Since there's only a couple of lines of code that does work after
acquiring the calltime, instead of exiting out early, reverse the if
statement to be true if calltime is acquired, and place the code that
is to be done within that if block. The clean up will always be done
after that.
- Fix ring_buffer_map() return value on failure of __rb_map_vma()
If __rb_map_vma() fails in ring_buffer_map(), it does not return an
error. This means the caller will be working against a bad vma
mapping. Have ring_buffer_map() return an error when __rb_map_vma()
fails.
- Fix regression of writing to the trace_marker file
A bug fix was made to change __copy_from_user_inatomic() to
copy_from_user_nofault() in the trace_marker write function. The
trace_marker file is used by applications to write into it (usually
with a file descriptor opened at the start of the program) to record
into the tracing system. It's usually used in critical sections so
the write to trace_marker is highly optimized.
The reason for copying in an atomic section is that the write
reserves space on the ring buffer and then writes directly into it.
After it writes, it commits the event. The time between reserve and
commit must have preemption disabled.
The trace marker write does not have any locking nor can it allocate
due to the nature of it being a critical path.
Unfortunately, converting __copy_from_user_inatomic() to
copy_from_user_nofault() caused a regression in Android. Now all the
writes from its applications trigger the fault that is rejected by
the _nofault() version that wasn't rejected by the _inatomic()
version. Instead of getting data, it now just gets a trace buffer
filled with:
tracing_mark_write: <faulted>
To fix this, on opening of the trace_marker file, allocate per CPU
buffers that can be used by the write call. Then when entering the
write call, do the following:
preempt_disable();
cpu = smp_processor_id();
buffer = per_cpu_ptr(cpu_buffers, cpu);
do {
cnt = nr_context_switches_cpu(cpu);
migrate_disable();
preempt_enable();
ret = copy_from_user(buffer, ptr, size);
preempt_disable();
migrate_enable();
} while (!ret && cnt != nr_context_switches_cpu(cpu));
if (!ret)
ring_buffer_write(buffer);
preempt_enable();
This works similarly to seqcount. As it must enabled preemption to do
a copy_from_user() into a per CPU buffer, if it gets preempted, the
buffer could be corrupted by another task.
To handle this, read the number of context switches of the current
CPU, disable migration, enable preemption, copy the data from user
space, then immediately disable preemption again. If the number of
context switches is the same, the buffer is still valid. Otherwise it
must be assumed that the buffer may have been corrupted and it needs
to try again.
Now the trace_marker write can get the user data even if it has to
fault it in, and still not grab any locks of its own.
* tag 'trace-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Have trace_marker use per-cpu data to read user space
ring buffer: Propagate __rb_map_vma return value to caller
tracing: Fix irqoff tracers on failure of acquiring calltime
tracing: Fix wakeup tracers on failure of acquiring calltime
tracing/osnoise: Replace kmalloc + copy_from_user with memdup_user_nul
|
||
|
|
64cf7d058a |
tracing: Have trace_marker use per-cpu data to read user space
It was reported that using __copy_from_user_inatomic() can actually schedule. Which is bad when preemption is disabled. Even though there's logic to check in_atomic() is set, but this is a nop when the kernel is configured with PREEMPT_NONE. This is due to page faulting and the code could schedule with preemption disabled. Link: https://lore.kernel.org/all/20250819105152.2766363-1-luogengkun@huaweicloud.com/ The solution was to change the __copy_from_user_inatomic() to copy_from_user_nofault(). But then it was reported that this caused a regression in Android. There's several applications writing into trace_marker() in Android, but now instead of showing the expected data, it is showing: tracing_mark_write: <faulted> After reverting the conversion to copy_from_user_nofault(), Android was able to get the data again. Writes to the trace_marker is a way to efficiently and quickly enter data into the Linux tracing buffer. It takes no locks and was designed to be as non-intrusive as possible. This means it cannot allocate memory, and must use pre-allocated data. A method that is actively being worked on to have faultable system call tracepoints read user space data is to allocate per CPU buffers, and use them in the callback. The method uses a technique similar to seqcount. That is something like this: preempt_disable(); cpu = smp_processor_id(); buffer = this_cpu_ptr(&pre_allocated_cpu_buffers, cpu); do { cnt = nr_context_switches_cpu(cpu); migrate_disable(); preempt_enable(); ret = copy_from_user(buffer, ptr, size); preempt_disable(); migrate_enable(); } while (!ret && cnt != nr_context_switches_cpu(cpu)); if (!ret) ring_buffer_write(buffer); preempt_enable(); It's a little more involved than that, but the above is the basic logic. The idea is to acquire the current CPU buffer, disable migration, and then enable preemption. At this moment, it can safely use copy_from_user(). After reading the data from user space, it disables preemption again. It then checks to see if there was any new scheduling on this CPU. If there was, it must assume that the buffer was corrupted by another task. If there wasn't, then the buffer is still valid as only tasks in preemptable context can write to this buffer and only those that are running on the CPU. By using this method, where trace_marker open allocates the per CPU buffers, trace_marker writes can access user space and even fault it in, without having to allocate or take any locks of its own. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Luo Gengkun <luogengkun@huaweicloud.com> Cc: Wattson CI <wattson-external@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20251008124510.6dba541a@gandalf.local.home Fixes: 3d62ab32df065 ("tracing: Fix tracing_marker may trigger page fault during preempt_disable") Reported-by: Runping Lai <runpinglai@google.com> Tested-by: Runping Lai <runpinglai@google.com> Closes: https://lore.kernel.org/linux-trace-kernel/20251007003417.3470979-2-runpinglai@google.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
de4cbd7047 |
ring buffer: Propagate __rb_map_vma return value to caller
The return value from `__rb_map_vma()`, which rejects writable or executable mappings (VM_WRITE, VM_EXEC, or !VM_MAYSHARE), was being ignored. As a result the caller of `__rb_map_vma` always returned 0 even when the mapping had actually failed, allowing it to proceed with an invalid VMA. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20251008172516.20697-1-ankitkhushwaha.linux@gmail.com Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions") Reported-by: syzbot+ddc001b92c083dbf2b97@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?id=194151be8eaebd826005329b2e123aecae714bdb Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
c834a97962 |
tracing: Fix irqoff tracers on failure of acquiring calltime
The functions irqsoff_graph_entry() and irqsoff_graph_return() both call func_prolog_dec() that will test if the data->disable is already set and if not, increment it and return. If it was set, it returns false and the caller exits. The caller of this function must decrement the disable counter, but misses doing so if the calltime fails to be acquired. Instead of exiting out when calltime is NULL, change the logic to do the work if it is not NULL and still do the clean up at the end of the function if it is NULL. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20251008114943.6f60f30f@gandalf.local.home Fixes: a485ea9e3ef3 ("tracing: Fix irqsoff and wakeup latency tracers when using function graph") Reported-by: Sasha Levin <sashal@kernel.org> Closes: https://lore.kernel.org/linux-trace-kernel/20251006175848.1906912-2-sashal@kernel.org/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
4f7bf54b07 |
tracing: Fix wakeup tracers on failure of acquiring calltime
The functions wakeup_graph_entry() and wakeup_graph_return() both call func_prolog_preempt_disable() that will test if the data->disable is already set and if not, increment it and disable preemption. If it was set, it returns false and the caller exits. The caller of this function must decrement the disable counter, but misses doing so if the calltime fails to be acquired. Instead of exiting out when calltime is NULL, change the logic to do the work if it is not NULL and still do the clean up at the end of the function if it is NULL. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20251008114835.027b878a@gandalf.local.home Fixes: a485ea9e3ef3 ("tracing: Fix irqsoff and wakeup latency tracers when using function graph") Reported-by: Sasha Levin <sashal@kernel.org> Closes: https://lore.kernel.org/linux-trace-kernel/20251006175848.1906912-1-sashal@kernel.org/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
f0c029d2ff |
tracing/osnoise: Replace kmalloc + copy_from_user with memdup_user_nul
Replace kmalloc() followed by copy_from_user() with memdup_user_nul() to simplify and improve osnoise_cpus_write(). Remove the manual NUL-termination. No functional changes intended. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20251001130907.364673-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
a667300bd5 |
kho: add support for preserving vmalloc allocations
A vmalloc allocation is preserved using binary structure similar to global KHO memory tracker. It's a linked list of pages where each page is an array of physical address of pages in vmalloc area. kho_preserve_vmalloc() hands out the physical address of the head page to the caller. This address is used as the argument to kho_vmalloc_restore() to restore the mapping in the vmalloc address space and populate it with the preserved pages. [pasha.tatashin@soleen.com: free chunks using free_page() not kfree()] Link: https://lkml.kernel.org/r/mafs0a52idbeg.fsf@kernel.org [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/20250921054458.4043761-4-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Chris Li <chrisl@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
8375b76517 |
kho: replace kho_preserve_phys() with kho_preserve_pages()
to make it clear that KHO operates on pages rather than on a random physical address. The kho_preserve_pages() will be also used in upcoming support for vmalloc preservation. Link: https://lkml.kernel.org/r/20250921054458.4043761-3-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Chris Li <chrisl@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
469661d0d3 |
kho: check if kho is finalized in __kho_preserve_order()
Patch series "kho: add support for preserving vmalloc allocations", v5.
Following the discussion about preservation of memfd with LUO [1] these
patches add support for preserving vmalloc allocations.
Any KHO uses case presumes that there's a data structure that lists
physical addresses of preserved folios (and potentially some additional
metadata). Allowing vmalloc preservations with KHO allows scalable
preservation of such data structures.
For instance, instead of allocating array describing preserved folios in
the fdt, memfd preservation can use vmalloc:
preserved_folios = vmalloc_array(nr_folios, sizeof(*preserved_folios));
memfd_luo_preserve_folios(preserved_folios, folios, nr_folios);
kho_preserve_vmalloc(preserved_folios, &folios_info);
This patch (of 4):
Instead of checking if kho is finalized in each caller of
__kho_preserve_order(), do it in the core function itself.
Link: https://lkml.kernel.org/r/20250921054458.4043761-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20250921054458.4043761-2-rppt@kernel.org
Link: https://lore.kernel.org/all/20250807014442.3829950-30-pasha.tatashin@soleen.com [1]
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
||
|
|
2215336295 |
hyperv-next for v6.18
-----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmjkpakTHHdlaS5saXVA a2VybmVsLm9yZwAKCRB2FHBfkEGgXip5B/48MvTFJ1qwRGPVzevZQ8Z4SDogEREp 69VS/xRf1YCIzyXyanwqf1dXLq8NAqicSp6ewpJAmNA55/9O0cwT2EtohjeGCu61 krPIvS3KT7xI0uSEniBdhBtALYBscnQ0e3cAbLNzL7bwA6Q6OmvoIawpBADgE/cW aZNCK9jy+WUqtXc6lNtkJtST0HWGDn0h04o2hjqIkZ+7ewjuEEJBUUB/JZwJ41Od UxbID0PAcn9O4n/u/Y/GH65MX+ddrdCgPHEGCLAGAKT24lou3NzVv445OuCw0c4W ilALIRb9iea56ZLVBW5O82+7g9Ag41LGq+841MNlZjeRNONGykaUpTWZ =OR26 -----END PGP SIGNATURE----- Merge tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Unify guest entry code for KVM and MSHV (Sean Christopherson) - Switch Hyper-V MSI domain to use msi_create_parent_irq_domain() (Nam Cao) - Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV (Mukesh Rathor) - Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov) - Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna Kumar T S M) - Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari, Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley) * tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hyperv: Remove the spurious null directive line MAINTAINERS: Mark hyperv_fb driver Obsolete fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver Drivers: hv: Make CONFIG_HYPERV bool Drivers: hv: Add CONFIG_HYPERV_VMBUS option Drivers: hv: vmbus: Fix typos in vmbus_drv.c Drivers: hv: vmbus: Fix sysfs output format for ring buffer index Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store() x86/hyperv: Switch to msi_create_parent_irq_domain() mshv: Use common "entry virt" APIs to do work in root before running guest entry: Rename "kvm" entry code assets to "virt" to genericize APIs entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper mshv: Handle NEED_RESCHED_LAZY before transferring to guest x86/hyperv: Add kexec/kdump support on Azure CVMs Drivers: hv: Simplify data structures for VMBus channel close message Drivers: hv: util: Cosmetic changes for hv_utils_transport.c mshv: Add support for a new parent partition configuration clocksource: hyper-v: Skip unnecessary checks for the root partition hyperv: Add missing field to hv_output_map_device_interrupt |
||
|
|
21fbefc588 |
tracing updates for v6.18:
- Use READ_ONCE() and WRITE_ONCE() instead of RCU for syscall tracepoints Individual system call trace events are pseudo events attached to the raw_syscall trace events that just trace the entry and exit of all system calls. When any of these individual system call trace events get enabled, an element in an array indexed by the system call number is assigned to the trace file that defines how to trace it. When the trace event triggers, it reads this array and if the array has an element, it uses that trace file to know what to write it (the trace file defines the output format of the corresponding system call). The issue is that it uses rcu_dereference_ptr() and marks the elements of the array as using RCU. This is incorrect. There is no RCU synchronization here. The event file that is pointed to has a completely different way to make sure its freed properly. The reading of the array during the system call trace event is only to know if there is a value or not. If not, it does nothing (it means this system call isn't being traced). If it does, it uses the information to store the system call data. The RCU usage here can simply be replaced by READ_ONCE() and WRITE_ONCE() macros. - Have the system call trace events use "0x" for hex values Some system call trace events display hex values but do not have "0x" in front of it. Seeing "count: 44" can be assumed that it is 44 decimal when in actuality it is 44 hex (68 decimal). Display "0x44" instead. - Use vmalloc_array() in tracing_map_sort_entries() The function tracing_map_sort_entries() used array_size() and vmalloc() when it could have simply used vmalloc_array(). - Use for_each_online_cpu() in trace_osnoise.c() Instead of open coding for_each_cpu(cpu, cpu_online_mask), use for_each_online_cpu(). - Move the buffer field in struct trace_seq to the end The buffer field in struct trace_seq is architecture dependent in size, and caused padding for the fields after it. By moving the buffer to the end of the structure, it compacts the trace_seq structure better. - Remove redundant zeroing of cmdline_idx field in saved_cmdlines_buffer() The structure that contains cmdline_idx is zeroed by memset(), no need to explicitly zero any of its fields after that. - Use system_percpu_wq instead of system_wq in user_event_mm_remove() As system_wq is being deprecated, use the new wq. - Add cond_resched() is ftrace_module_enable() Some modules have a lot of functions (thousands of them), and the enabling of those functions can take some time. On non preemtable kernels, it was triggering a watchdog timeout. Add a cond_resched() to prevent that. - Add a BUILD_BUG_ON() to make sure PID_MAX_DEFAULT is always a power of 2 There's code that depends on PID_MAX_DEFAULT being a power of 2 or it will break. If in the future that changes, make sure the build fails to ensure that the code is fixed that depends on this. - Grab mutex_lock() before ever exiting s_start() The s_start() function is a seq_file start routine. As s_stop() is always called even if s_start() fails, and s_stop() expects the event_mutex to be held as it will always release it. That mutex must always be taken in s_start() even if that function fails. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaN/4phQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qpToAP4sENpZkZHFOl2PuikmAgjhB6PqiUtL LNuMDx45WygcLwD6Awy8DUlEBN6RzTPA761MSjs0+NMg16QLrhPLxWqFEgw= =nxDn -----END PGP SIGNATURE----- Merge tag 'trace-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Use READ_ONCE() and WRITE_ONCE() instead of RCU for syscall tracepoints Individual system call trace events are pseudo events attached to the raw_syscall trace events that just trace the entry and exit of all system calls. When any of these individual system call trace events get enabled, an element in an array indexed by the system call number is assigned to the trace file that defines how to trace it. When the trace event triggers, it reads this array and if the array has an element, it uses that trace file to know what to write it (the trace file defines the output format of the corresponding system call). The issue is that it uses rcu_dereference_ptr() and marks the elements of the array as using RCU. This is incorrect. There is no RCU synchronization here. The event file that is pointed to has a completely different way to make sure its freed properly. The reading of the array during the system call trace event is only to know if there is a value or not. If not, it does nothing (it means this system call isn't being traced). If it does, it uses the information to store the system call data. The RCU usage here can simply be replaced by READ_ONCE() and WRITE_ONCE() macros. - Have the system call trace events use "0x" for hex values Some system call trace events display hex values but do not have "0x" in front of it. Seeing "count: 44" can be assumed that it is 44 decimal when in actuality it is 44 hex (68 decimal). Display "0x44" instead. - Use vmalloc_array() in tracing_map_sort_entries() The function tracing_map_sort_entries() used array_size() and vmalloc() when it could have simply used vmalloc_array(). - Use for_each_online_cpu() in trace_osnoise.c() Instead of open coding for_each_cpu(cpu, cpu_online_mask), use for_each_online_cpu(). - Move the buffer field in struct trace_seq to the end The buffer field in struct trace_seq is architecture dependent in size, and caused padding for the fields after it. By moving the buffer to the end of the structure, it compacts the trace_seq structure better. - Remove redundant zeroing of cmdline_idx field in saved_cmdlines_buffer() The structure that contains cmdline_idx is zeroed by memset(), no need to explicitly zero any of its fields after that. - Use system_percpu_wq instead of system_wq in user_event_mm_remove() As system_wq is being deprecated, use the new wq. - Add cond_resched() is ftrace_module_enable() Some modules have a lot of functions (thousands of them), and the enabling of those functions can take some time. On non preemtable kernels, it was triggering a watchdog timeout. Add a cond_resched() to prevent that. - Add a BUILD_BUG_ON() to make sure PID_MAX_DEFAULT is always a power of 2 There's code that depends on PID_MAX_DEFAULT being a power of 2 or it will break. If in the future that changes, make sure the build fails to ensure that the code is fixed that depends on this. - Grab mutex_lock() before ever exiting s_start() The s_start() function is a seq_file start routine. As s_stop() is always called even if s_start() fails, and s_stop() expects the event_mutex to be held as it will always release it. That mutex must always be taken in s_start() even if that function fails. * tag 'trace-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix lock imbalance in s_start() memory allocation failure path tracing: Ensure optimized hashing works ftrace: Fix softlockup in ftrace_module_enable tracing: replace use of system_wq with system_percpu_wq tracing: Remove redundant 0 value initialization tracing: Move buffer in trace_seq to end of struct tracing/osnoise: Use for_each_online_cpu() instead of for_each_cpu() tracing: Use vmalloc_array() to improve code tracing: Have syscall trace events show "0x" for values greater than 10 tracing: Replace syscall RCU pointer assignment with READ/WRITE_ONCE() |
||
|
|
2cd14dff16 |
Probes fixes for v6.17
- tracing: Fix race condition in kprobe initialization causing NULL pointer dereference. This happens on weak memory model, which does not correctly manage the flags access with appropriate memory barriers. Use RELEASE-ACQUIRE to fix it. -----BEGIN PGP SIGNATURE----- iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmjeikYbHG1hc2FtaS5o aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bEOYIAKk4KwhSGAhx4Q+Mkqzy pbNPvYMdAwocYIqxg3C1/OVK22DcauTZxMzRq7c3bW776Na6eWD4y9aykhRhNlGY TeKNAtZ0OdRSlq9ISj7P9H4A+Wywrn1CfNkGsZWRrEy0s2xkwUJjEOciI/WA3o8V Nd+Qv+/FTRZ9w/678hb647WrTXIsxSyfBG1L+l25GgqKqqwAtKO8Ur+BhOglxZoC WVHsJT6Xmisg0QiujXX3BgDFAXmAFkBuIQd+D+LHWXgo7W8I6VB7ilQ+hibyXAzf 6yYkZ49kyNLeaWf0/4cc/NaKkq8eK43PXvSvs2yhDt8yWthRLiK/DdeAl63ZNihp D2g= =Ad1o -----END PGP SIGNATURE----- Merge tag 'probes-fixes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probe fix from Masami Hiramatsu: - Fix race condition in kprobe initialization causing NULL pointer dereference. This happens on weak memory model, which does not correctly manage the flags access with appropriate memory barriers. Use RELEASE-ACQUIRE to fix it. * tag 'probes-fixes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix race condition in kprobe initialization causing NULL pointer dereference |
||
|
|
908057d185 |
This update includes the following changes:
Drivers: - Add ciphertext hiding support to ccp. - Add hashjoin, gather and UDMA data move features to hisilicon. - Add lz4 and lz77_only to hisilicon. - Add xilinx hwrng driver. - Add ti driver with ecb/cbc aes support. - Add ring buffer idle and command queue telemetry for GEN6 in qat. Others: - Use rcu_dereference_all to stop false alarms in rhashtable. - Fix CPU number wraparound in padata. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmjbpJsACgkQxycdCkmx i6fuTRAAzv5o0MIw4Kc7EEU3zMgFSX0FdcTUPY+eiFrWZrSrvUVW+jYcH9ppO8J7 offAYSZYatcyyU9+u8X22CQNKLdXnKQQ0YymWO35TOpvVxveUM1bqEEV1ZK0xaXD hlJTLoFIsPaVVhi8CW+ZNhDJBwJHNCv7Yi9TUB6sC7rilWWbJ5LzbEVw3Rtg81Lx 0hcuGX2LrpsHOVVWYxGdJ534Kt2lrkt+8/gWOFg3ap3RVQ39tohEjS2Adm2p8eiX zIdru/aYd89EcYoxuFyylX2d/OLmMAQpFsADy/Fys26eeOWtqggH62V1LAiSyEqw vLRBCVKpLhlbNNfnUs0f5nqjjYEUrNk9SA4rgoxITwKoucbWBQMS4zWJTEDKz29n iBBqHsukGpwVOE6RY8BzR/QNJKhZCSsJpGkagS1v6VPa5P1QomuKftGXKB7JKXKz xoyk+DhJyA8rkb/E5J9Ni7+Tb08Y4zvJ1dpCQHZMlln3DKkK+kk3gkpoxXMZwBV2 LbEMGTI+sfnAfqkGCJYAZR9gDJ5LQDR9jy/Ds5jvPuVvvjyY5LY/bjETqGPF2QVs Rz2Sg0RHl7PVZOP6QgbQzkV7SkJrZfyu5iYd0ZfUqZr7BaHLOHJG/E/HlUW3/mXu OjD+Q5gPhiOdc/qn+32+QERTDCFQdbByv0h7khGQA5vHE3XCu8E= =knnk -----END PGP SIGNATURE----- Merge tag 'v6.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Drivers: - Add ciphertext hiding support to ccp - Add hashjoin, gather and UDMA data move features to hisilicon - Add lz4 and lz77_only to hisilicon - Add xilinx hwrng driver - Add ti driver with ecb/cbc aes support - Add ring buffer idle and command queue telemetry for GEN6 in qat Others: - Use rcu_dereference_all to stop false alarms in rhashtable - Fix CPU number wraparound in padata" * tag 'v6.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (78 commits) dt-bindings: rng: hisi-rng: convert to DT schema crypto: doc - Add explicit title heading to API docs hwrng: ks-sa - fix division by zero in ks_sa_rng_init KEYS: X.509: Fix Basic Constraints CA flag parsing crypto: anubis - simplify return statement in anubis_mod_init crypto: hisilicon/qm - set NULL to qm->debug.qm_diff_regs crypto: hisilicon/qm - clear all VF configurations in the hardware crypto: hisilicon - enable error reporting again crypto: hisilicon/qm - mask axi error before memory init crypto: hisilicon/qm - invalidate queues in use crypto: qat - Return pointer directly in adf_ctl_alloc_resources crypto: aspeed - Fix dma_unmap_sg() direction rhashtable: Use rcu_dereference_all and rcu_dereference_all_check crypto: comp - Use same definition of context alloc and free ops crypto: omap - convert from tasklet to BH workqueue crypto: qat - Replace kzalloc() + copy_from_user() with memdup_user() crypto: caam - double the entropy delay interval for retry padata: WQ_PERCPU added to alloc_workqueue users padata: replace use of system_unbound_wq with system_dfl_wq crypto: cryptd - WQ_PERCPU added to alloc_workqueue users ... |
||
|
|
67da125e30 |
RCU pull request for v6.18
This pull request contains the following branches, non-octopus merged:
Documentation updates:
- Update whatisRCU.rst and checklist.rst for recent RCU API additions.
- Fix RCU documentation formatting and typos.
- Replace dead Ottawa Linux Symposium links in RTFP.txt.
Miscellaneous RCU updates:
- Document that rcu_barrier() hurries RCU_LAZY callbacks.
- Remove redundant interrupt disabling from
rcu_preempt_deferred_qs_handler().
- Move list_for_each_rcu from list.h to rculist.h, and adjust the
include directive in kernel/cgroup/dmem.c accordingly.
- Make initial set of changes to accommodate upcoming system_percpu_wq
changes.
SRCU updates:
- Create an srcu_read_lock_fast_notrace() for eventual use in tracing,
including adding guards.
- Document the reliance on per-CPU operations as implicit RCU readers
in __srcu_read_{,un}lock_fast().
- Document the srcu_flip() function's memory-barrier D's relationship
to SRCU-fast readers.
- Remove a redundant preempt_disable() and preempt_enable() pair from
srcu_gp_start_if_needed().
Torture-test updates:
- Fix jitter.sh spin time so that it actually varies as advertised.
It is still quite coarse-grained, but at least it does now vary.
- Update torture.sh help text to include the not-so-new --do-normal
parameter, which permits (for example) testing KCSAN kernels without
doing non-debug kernels.
- Fix a number of false-positive diagnostics that were being triggered
by rcutorture starting before boot completed. Running multiple
near-CPU-bound rcutorture processes when there is only the boot CPU
is after all a bit excessive.
- Substitute kcalloc() for kzalloc().
- Remove a redundant kfree() and NULL out kfree()ed objects.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmjWzFcTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jAUeD/4xp/3rvVlfX6UB1ax/lYbQopOm2Hns
7DVO/lp8ih6jUFCRappw7do+jbU3EcP+76sGUd5qkKlRbJIierTHBrolULpKph/p
hQ/LddPYIHg+bPtbq/vA6fFwFI/xwbBNQOMS1bWzzU5iWn/p/ETS1kWptz+g2wk5
pOxx9PnH52Ls5BgCCnb8kGpUuG3cy63sFb52ORrN206EaUq59Q12CLA7aTge4QGV
3fvBIv+U+chagELG0usoPPTC+fMUt8oj8vfVYyDqUPjoyxATDtvxAv7ORFI7ZmEm
CVemKrHWEaVEERfXSSbMp0TpPrNnkB4SoYOGT6vakyjgpcQHLh0GMaUZfluvyROt
nQNaQFbOLfh9GGBjZQpAu1Aa2aAvMcOxyrL1JPhvwFVT0G4hF6s4Zs0zLY7+MAPD
XT0wSf79U4lTzlg4eNrwZazoFGIeUhwyH2X59yc04yXytM7QUBFw+7XIK/PA8Wn4
LgJYrwn6poFinGT2HlwKPrIUKSfCpLS8ePutmMgMUqLhiVtKvoaE6S38h2T97d9i
OuLFDVMm+j0awMadS3cJD5kqtGVbqnPsUuaC3OFlpyng8K+OHHTNCfcFMiMbhgdw
w6cMioYdq/a9mLoN6T50ylvTOKeoKWxXI4X6q6x7vZYUztMpFby+b5mTTbR12dJs
LSqHR8QGSIpNmQ==
=FEf1
-----END PGP SIGNATURE-----
Merge tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Paul McKenney:
"Documentation updates:
- Update whatisRCU.rst and checklist.rst for recent RCU API additions
- Fix RCU documentation formatting and typos
- Replace dead Ottawa Linux Symposium links in RTFP.txt
Miscellaneous RCU updates:
- Document that rcu_barrier() hurries RCU_LAZY callbacks
- Remove redundant interrupt disabling from
rcu_preempt_deferred_qs_handler()
- Move list_for_each_rcu from list.h to rculist.h, and adjust the
include directive in kernel/cgroup/dmem.c accordingly
- Make initial set of changes to accommodate upcoming
system_percpu_wq changes
SRCU updates:
- Create an srcu_read_lock_fast_notrace() for eventual use in
tracing, including adding guards
- Document the reliance on per-CPU operations as implicit RCU readers
in __srcu_read_{,un}lock_fast()
- Document the srcu_flip() function's memory-barrier D's relationship
to SRCU-fast readers
- Remove a redundant preempt_disable() and preempt_enable() pair from
srcu_gp_start_if_needed()
Torture-test updates:
- Fix jitter.sh spin time so that it actually varies as advertised.
It is still quite coarse-grained, but at least it does now vary
- Update torture.sh help text to include the not-so-new --do-normal
parameter, which permits (for example) testing KCSAN kernels
without doing non-debug kernels
- Fix a number of false-positive diagnostics that were being
triggered by rcutorture starting before boot completed. Running
multiple near-CPU-bound rcutorture processes when there is only the
boot CPU is after all a bit excessive
- Substitute kcalloc() for kzalloc()
- Remove a redundant kfree() and NULL out kfree()ed objects"
* tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits)
rcu: WQ_UNBOUND added to sync_wq workqueue
rcu: WQ_PERCPU added to alloc_workqueue users
rcu: replace use of system_wq with system_percpu_wq
refperf: Set reader_tasks to NULL after kfree()
refperf: Remove redundant kfree() after torture_stop_kthread()
srcu/tiny: Remove preempt_disable/enable() in srcu_gp_start_if_needed()
srcu: Document srcu_flip() memory-barrier D relation to SRCU-fast
srcu: Document __srcu_read_{,un}lock_fast() implicit RCU readers
rculist: move list_for_each_rcu() to where it belongs
refscale: Use kcalloc() instead of kzalloc()
rcutorture: Use kcalloc() instead of kzalloc()
docs: rcu: Replace multiple dead OLS links in RTFP.txt
doc: Fix typo in RCU's torture.rst documentation
Documentation: RCU: Retitle toctree index
Documentation: RCU: Reduce toctree depth
Documentation: RCU: Wrap kvm-remote.sh rerun snippet in literal code block
rcu: docs: Requirements.rst: Abide by conventions of kernel documentation
doc: Add RCU guards to checklist.rst
doc: Update whatisRCU.rst for recent RCU API additions
rcutorture: Delay forward-progress testing until boot completes
...
|
||
|
|
48e3694ae7 |
printk changes for 6.18
-----BEGIN PGP SIGNATURE----- iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmjeOX8bFIAAAAAABAAO bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTyM7MP/0mD0oJa/8DRiZ0r1qLM xt/mCZIXhbqfJdCIaLEpfL35qPI59PWvrECGxwzL7CXOHlORxoCW98LoUDkigRKW e93mv8xSBaac11QrMSy32cEMSpzXmcNjz5u/02AUveXpxERr82nLgGwKY8eF0eAy 7wF+N4bGHPShEy2J7kVFVrgZompFQLDgPB3/GycSdKsZd7ss9XiKa/EWvPJbEGOD CNC0MlAd5QDxVXxhioC9g9tZaUENZrqxys1vfROFx8RlaobRa2Vt/TEJQ5WCzmKo JO8JdojD03h0U2DdFE/lgTjf/lz8hTiaRaTifNnhUofqvzFKupzcv9L2xq/zmpfC 4dMW6YOObD/FVeNVsgVC16/1WDKl+XfyogAmtTBjSNoPd6WoSczZsfubxGc30lW4 AqJ2OpPa+CXq2elL+Qb0dLVekMhNytjfYdyFK/FP0DXrkoV4FMHdfNLW58TR7Pcq iN1MNEkEs4dWSbn2xJzKlFQyVq2/t6A6l5N/kodZ1xLWiwFYzeA3FTOVzrOlXnde IAYh3iH2WoTiHjZB/oomhnXKCMeTIvMYsGQQl4y+XqRHIwpEjSjcm8/M3hvlMyJE m0D6cpW8IRLoZ0raxqJPVpXR3WZsLJm2InGMrfY/tPWTU7L5uxmF51GPXYKdUmpZ NUgzh2cOwSWWU4UHt/AiD9cd =uFn0 -----END PGP SIGNATURE----- Merge tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add KUnit test for the printk ring buffer - Fix the check of the maximal record size which is allowed to be stored into the printk ring buffer. It prevents corruptions of the ring buffer. Note that printk() is on the safe side. The messages are limited by 1kB buffer and are always small enough for the minimal log buffer size 4kB, see CONFIG_LOG_BUF_SHIFT definition. * tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: ringbuffer: Fix data block max size check printk: kunit: support offstack cpumask printk: kunit: Fix __counted_by() in struct prbtest_rbdata printk: ringbuffer: Explain why the KUnit test ignores failed writes printk: ringbuffer: Add KUnit test |