twx-linux/kernel/trace
Petr Pavlu af3274905b ring-buffer: Fix a race between readers and resize checks
commit c2274b908db05529980ec056359fae916939fdaa upstream.

The reader code in rb_get_reader_page() swaps a new reader page into the
ring buffer by doing cmpxchg on old->list.prev->next to point it to the
new page. Following that, if the operation is successful,
old->list.next->prev gets updated too. This means the underlying
doubly-linked list is temporarily inconsistent, page->prev->next or
page->next->prev might not be equal back to page for some page in the
ring buffer.

The resize operation in ring_buffer_resize() can be invoked in parallel.
It calls rb_check_pages() which can detect the described inconsistency
and stop further tracing:

[  190.271762] ------------[ cut here ]------------
[  190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0
[  190.271789] Modules linked in: [...]
[  190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1
[  190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G            E      6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f
[  190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[  190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0
[  190.272023] Code: [...]
[  190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206
[  190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80
[  190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700
[  190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000
[  190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720
[  190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000
[  190.272053] FS:  00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000
[  190.272057] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0
[  190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  190.272077] Call Trace:
[  190.272098]  <TASK>
[  190.272189]  ring_buffer_resize+0x2ab/0x460
[  190.272199]  __tracing_resize_ring_buffer.part.0+0x23/0xa0
[  190.272206]  tracing_resize_ring_buffer+0x65/0x90
[  190.272216]  tracing_entries_write+0x74/0xc0
[  190.272225]  vfs_write+0xf5/0x420
[  190.272248]  ksys_write+0x67/0xe0
[  190.272256]  do_syscall_64+0x82/0x170
[  190.272363]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  190.272373] RIP: 0033:0x7f1bd657d263
[  190.272381] Code: [...]
[  190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263
[  190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001
[  190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000
[  190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500
[  190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002
[  190.272412]  </TASK>
[  190.272414] ---[ end trace 0000000000000000 ]---

Note that ring_buffer_resize() calls rb_check_pages() only if the parent
trace_buffer has recording disabled. Recent commit d78ab792705c
("tracing: Stop current tracer when resizing buffer") causes that it is
now always the case which makes it more likely to experience this issue.

The window to hit this race is nonetheless very small. To help
reproducing it, one can add a delay loop in rb_get_reader_page():

 ret = rb_head_page_replace(reader, cpu_buffer->reader_page);
 if (!ret)
 	goto spin;
 for (unsigned i = 0; i < 1U << 26; i++)  /* inserted delay loop */
 	__asm__ __volatile__ ("" : : : "memory");
 rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list;

.. and then run the following commands on the target system:

 echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable
 while true; do
 	echo 16 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
 	echo 8 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
 done &
 while true; do
 	for i in /sys/kernel/tracing/per_cpu/*; do
 		timeout 0.1 cat $i/trace_pipe; sleep 0.2
 	done
 done

To fix the problem, make sure ring_buffer_resize() doesn't invoke
rb_check_pages() concurrently with a reader operating on the same
ring_buffer_per_cpu by taking its cpu_buffer->reader_lock.

Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-3-petr.pavlu@suse.com

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 659f451ff213 ("ring-buffer: Add integrity check at end of iter read")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
[ Fixed whitespace ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12 11:11:19 +02:00
..
rv tracing/tools: Updates for 6.4 2023-04-28 16:11:26 -07:00
blktrace.c
bpf_trace.c bpf: support deferring bpf_link dealloc to after RCU grace period 2024-04-10 16:36:06 +02:00
bpf_trace.h
error_report-traces.c
fgraph.c tracing: arm64: Avoid missing-prototype warnings 2023-07-12 12:06:04 -04:00
fprobe.c fprobe: Fix to ensure the number of active retprobes is not zero 2023-10-17 10:22:42 +09:00
ftrace_internal.h tracing: arm64: Avoid missing-prototype warnings 2023-07-12 12:06:04 -04:00
ftrace.c ftrace: Fix possible use-after-free issue in ftrace_location() 2024-06-12 11:11:17 +02:00
Kconfig Probes updates for v6.5: 2023-06-30 10:44:53 -07:00
kprobe_event_gen_test.c tracing: Fix wrong return in kprobe_event_gen_test.c 2023-03-19 12:20:48 -04:00
Makefile tracing/probes: Move finding func-proto API and getting func-param API to trace_btf 2023-08-23 09:39:45 +09:00
pid_list.c
pid_list.h
power-traces.c
preemptirq_delay_test.c
rethook.c rethook: Use __rcu pointer for rethook::handler 2023-12-13 18:45:19 +01:00
ring_buffer_benchmark.c
ring_buffer.c ring-buffer: Fix a race between readers and resize checks 2024-06-12 11:11:19 +02:00
rpm-traces.c
synth_event_gen_test.c tracing / synthetic: Disable events after testing in synth_event_gen_test_init() 2024-01-01 12:42:45 +00:00
trace_benchmark.c
trace_benchmark.h
trace_boot.c tracing/boot: Test strscpy() against less than zero for error 2023-07-05 10:30:49 -04:00
trace_branch.c
trace_btf.c tracing/probes: Fix to search structure fields correctly 2024-02-23 09:25:05 +01:00
trace_btf.h tracing/probes: Add a function to search a member of a struct/union 2023-08-23 09:40:16 +09:00
trace_clock.c
trace_dynevent.c
trace_dynevent.h
trace_entries.h tracing: Add back FORTIFY_SOURCE logic to kernel_stack event structure 2023-07-30 18:11:44 -04:00
trace_eprobe.c tracing/probes: Support BTF argument on module functions 2023-08-23 09:39:15 +09:00
trace_event_perf.c
trace_events_filter_test.h
trace_events_filter.c tracing: Have trace_event_file have ref counters 2023-11-08 11:56:21 +01:00
trace_events_hist.c tracing: Fix uaf issue when open the hist or hist_debug file 2024-01-20 11:51:43 +01:00
trace_events_inject.c tracing: Have event inject files inc the trace array ref count 2023-09-07 16:38:54 -04:00
trace_events_synth.c tracing/synthetic: Fix trace_string() return value 2024-02-23 09:25:05 +01:00
trace_events_trigger.c tracing/trigger: Fix to return error if failed to alloc snapshot 2024-02-23 09:24:54 +01:00
trace_events_user.c tracing/user_events: Align set_bit() address for all archs 2023-09-30 16:25:41 -04:00
trace_events.c tracing: hide unused ftrace_event_id_fops 2024-04-17 11:19:33 +02:00
trace_export.c tracing: Add back FORTIFY_SOURCE logic to kernel_stack event structure 2023-07-30 18:11:44 -04:00
trace_fprobe.c tracing: fprobe-event: Fix to check tracepoint event and return 2023-11-28 17:20:13 +00:00
trace_functions_graph.c function_graph: Support recording and printing the return value of function 2023-06-20 18:38:37 -04:00
trace_functions.c
trace_hwlat.c tracing: Remove extra space at the end of hwlat_detector/mode 2023-09-01 21:00:00 -04:00
trace_irqsoff.c tracing: Fix memleak due to race between current_tracer and trace 2023-08-17 13:49:37 -04:00
trace_kdb.c
trace_kprobe_selftest.c tracing: arm64: Avoid missing-prototype warnings 2023-07-12 12:06:04 -04:00
trace_kprobe_selftest.h
trace_kprobe.c tracing/kprobes: Fix the order of argument descriptions 2023-11-20 11:59:38 +01:00
trace_mmiotrace.c
trace_nop.c
trace_osnoise.c tracing/timerlat: Move hrtimer_init to timerlat_fd open() 2024-02-23 09:25:05 +01:00
trace_output.c tracing: Add size check when printing trace_marker output 2024-01-20 11:51:43 +01:00
trace_output.h tracing: Add "fields" option to show raw trace event fields 2023-03-29 06:52:08 -04:00
trace_preemptirq.c
trace_printk.c
trace_probe_kernel.h tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails 2023-07-14 17:04:58 +09:00
trace_probe_tmpl.h tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails 2023-07-14 17:04:58 +09:00
trace_probe.c tracing/probes: Fix to set arg size and fmt after setting type from BTF 2024-02-23 09:25:05 +01:00
trace_probe.h tracing/probes: Fix to show a parse error for bad type for $comm 2024-02-23 09:25:05 +01:00
trace_recursion_record.c
trace_sched_switch.c
trace_sched_wakeup.c tracing: Fix memleak due to race between current_tracer and trace 2023-08-17 13:49:37 -04:00
trace_selftest_dynamic.c
trace_selftest.c tracing: Have function_graph selftest call cond_resched() 2023-05-28 21:15:46 -04:00
trace_seq.c tracing: Fix kernel-doc warnings in trace_seq.c 2023-07-28 19:59:04 -04:00
trace_stack.c
trace_stat.c
trace_stat.h
trace_synth.h
trace_syscalls.c tracing: bpf: use struct trace_entry in struct syscall_tp_t 2023-08-01 10:53:28 -07:00
trace_uprobe.c Probes updates for v6.6: 2023-09-02 11:10:50 -07:00
trace.c tracing: Use .flush() call to wake up readers 2024-04-03 15:28:41 +02:00
trace.h eventfs: Remove eventfs_file and just use eventfs_inode 2024-02-23 09:25:19 +01:00
tracing_map.c tracing: Ensure visibility when inserting an element into tracing_map 2024-01-31 16:19:01 -08:00
tracing_map.h tracing: Remove unused extern declaration tracing_map_set_field_descr() 2023-07-23 11:08:14 -04:00