twx-linux/kernel/trace
Linus Torvalds 8bf722c684 ring-buffer changes for v6.16:
- Allow the persistent ring buffer to be memory mapped
 
   In the last merge window there was issues with the implementation of
   mapping the persistent ring buffer because it was assumed that the
   persistent memory was just physical memory without being part of the
   kernel virtual address space. But this was incorrect and the persistent
   ring buffer can be mapped the same way as the allocated ring buffer is
   mapped.
 
   The meta data for the persistent ring buffer is different than the normal
   ring buffer and the organization of mapping it to user space is a little
   different. Make the updates needed to the meta data to allow the
   persistent ring buffer to be mapped to user space.
 
 - Fix cpus_read_lock() with buffer->mutex and cpu_buffer->mapping_lock
 
   Mapping the ring buffer to user space uses the cpu_buffer->mapping_lock.
   The buffer->mutex can be taken when the mapping_lock is held, giving the
   locking order of: cpu_buffer->mapping_lock -->> buffer->mutex. But there
   also exists the ordering:
 
       buffer->mutex -->> cpus_read_lock()
       mm->mmap_lock -->> cpu_buffer->mapping_lock
       cpus_read_lock() -->> mm->mmap_lock
 
   causing a circular chain of:
 
       cpu_buffer->mapping_lock -> buffer->mutex -->> cpus_read_lock() -->>
          mm->mmap_lock -->> cpu_buffer->mapping_lock
 
   By moving the cpus_read_lock() outside the buffer->mutex where:
   cpus_read_lock() -->> buffer->mutex, breaks the deadlock chain.
 
 - Do not trigger WARN_ON() for commit overrun
 
   When the ring buffer is user space mapped and there's a "commit overrun"
   (where an interrupt preempted an event, and then added so many events it
    filled the buffer having to drop events when it hit the preempted event)
   a WARN_ON() was triggered if this was read via a memory mapped buffer.
   This is due to "missed events" being non zero when the reader page ended
   up with the commit page. The idea was, if the writer is on the reader page,
   there's only one page that has been written to and there should be no
   missed events. But if a commit overrun is done where the writer is off the
   commit page and looped around to the commit page causing missed events, it
   is possible that the reader page is the commit page with missed events.
 
   Instead of triggering a WARN_ON() when the reader page is the commit page
   with missed events, trigger it when the reader page is the tail_page with
   missed events. That's because the writer is always on the tail_page if
   an event was interrupted (which holds the commit event) and continues off
   the commit page.
 
 - Reset the persistent buffer if it is fully consumed
 
   On boot up, if the user fully consumes the last boot buffer of the
   persistent buffer, if it reboots without enabling it, there will still be
   events in the buffer which can cause confusion. Instead, reset the buffer
   when it is fully consumed, so that the data is not read again.
 
 - Clean up some goto out jumps
 
   There's a few cases that the code jumps to the "out:" label that simply
   returns a value. There used to be more work done at those labels but now
   that they simply return a value use a return instead of jumping to a
   label.
 
 - Use guard() to simplify some of the code
 
   Add guard() around some locking instead of jumping to a label to do the
   unlocking.
 
 - Use free() to simplify some of the code
 
   Use free(kfree) on variables that will get freed on error and use
   return_ptr() to return the variable when its not freed. There's one
   instance where free(kfree) simplifies the code on a temp variable that was
   allocated just for the function use.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaDjJMxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qkDzAP468AZOnjIxezfzYEmtcDl8ZUgf2U3I
 XtXjn7aKH/gZiwD/dCCZX2IY2gddqAb6s9Bo4/AWgtYbjacLPL+pWYbTJwQ=
 =DOfF
 -----END PGP SIGNATURE-----

Merge tag 'trace-ringbuffer-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:

 - Allow the persistent ring buffer to be memory mapped

   In the last merge window there was issues with the implementation of
   mapping the persistent ring buffer because it was assumed that the
   persistent memory was just physical memory without being part of the
   kernel virtual address space. But this was incorrect and the
   persistent ring buffer can be mapped the same way as the allocated
   ring buffer is mapped.

   The metadata for the persistent ring buffer is different than the
   normal ring buffer and the organization of mapping it to user space
   is a little different. Make the updates needed to the meta data to
   allow the persistent ring buffer to be mapped to user space.

 - Fix cpus_read_lock() with buffer->mutex and cpu_buffer->mapping_lock

   Mapping the ring buffer to user space uses the
   cpu_buffer->mapping_lock. The buffer->mutex can be taken when the
   mapping_lock is held, giving the locking order of:
   cpu_buffer->mapping_lock -->> buffer->mutex. But there also exists
   the ordering:

       buffer->mutex -->> cpus_read_lock()
       mm->mmap_lock -->> cpu_buffer->mapping_lock
       cpus_read_lock() -->> mm->mmap_lock

   causing a circular chain of:

       cpu_buffer->mapping_lock -> buffer->mutex -->> cpus_read_lock() -->>
          mm->mmap_lock -->> cpu_buffer->mapping_lock

   By moving the cpus_read_lock() outside the buffer->mutex where:
   cpus_read_lock() -->> buffer->mutex, breaks the deadlock chain.

 - Do not trigger WARN_ON() for commit overrun

   When the ring buffer is user space mapped and there's a "commit
   overrun" (where an interrupt preempted an event, and then added so
   many events it filled the buffer having to drop events when it hit
   the preempted event) a WARN_ON() was triggered if this was read via a
   memory mapped buffer.

   This is due to "missed events" being non zero when the reader page
   ended up with the commit page. The idea was, if the writer is on the
   reader page, there's only one page that has been written to and there
   should be no missed events.

   But if a commit overrun is done where the writer is off the commit
   page and looped around to the commit page causing missed events, it
   is possible that the reader page is the commit page with missed
   events.

   Instead of triggering a WARN_ON() when the reader page is the commit
   page with missed events, trigger it when the reader page is the
   tail_page with missed events. That's because the writer is always on
   the tail_page if an event was interrupted (which holds the commit
   event) and continues off the commit page.

 - Reset the persistent buffer if it is fully consumed

   On boot up, if the user fully consumes the last boot buffer of the
   persistent buffer, if it reboots without enabling it, there will
   still be events in the buffer which can cause confusion. Instead,
   reset the buffer when it is fully consumed, so that the data is not
   read again.

 - Clean up some goto out jumps

   There's a few cases that the code jumps to the "out:" label that
   simply returns a value. There used to be more work done at those
   labels but now that they simply return a value use a return instead
   of jumping to a label.

 - Use guard() to simplify some of the code

   Add guard() around some locking instead of jumping to a label to do
   the unlocking.

 - Use free() to simplify some of the code

   Use free(kfree) on variables that will get freed on error and use
   return_ptr() to return the variable when its not freed. There's one
   instance where free(kfree) simplifies the code on a temp variable
   that was allocated just for the function use.

* tag 'trace-ringbuffer-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Simplify functions with __free(kfree) to free allocations
  ring-buffer: Make ring_buffer_{un}map() simpler with guard(mutex)
  ring-buffer: Simplify ring_buffer_read_page() with guard()
  ring-buffer: Simplify reset_disabled_cpu_buffer() with use of guard()
  ring-buffer: Remove jump to out label in ring_buffer_swap_cpu()
  ring-buffer: Removed unnecessary if() goto out where out is the next line
  tracing: Reset last-boot buffers when reading out all cpu buffers
  ring-buffer: Allow reserve_mem persistent ring buffers to be mmapped
  ring-buffer: Do not trigger WARN_ON() due to a commit_overrun
  ring-buffer: Move cpus_read_lock() outside of buffer->mutex
2025-05-30 21:20:11 -07:00
..
rv rv: Fix out-of-bound memory access in rv_is_container_monitor() 2025-04-12 12:13:30 -04:00
blktrace.c traceevent/block: Add REQ_ATOMIC flag to block trace events 2025-05-23 09:18:48 -06:00
bpf_trace.c bpf: Fix error return value in bpf_copy_from_user_dynptr 2025-05-23 13:25:02 -07:00
bpf_trace.h
error_report-traces.c
fgraph.c ftrace: Show subops in enabled_functions 2025-05-08 09:36:08 -04:00
fprobe.c tracing: fprobe: Fix RCU warning message in list traversal 2025-05-10 08:28:02 +09:00
ftrace_internal.h
ftrace.c ftrace: Comment that ftrace_func_mapper is freed with free_ftrace_hash() 2025-05-08 09:36:08 -04:00
Kconfig ftrace: Have tracing function args depend on PROBE_EVENTS_BTF_ARGS 2025-04-02 09:50:56 -04:00
kprobe_event_gen_test.c
Makefile
pid_list.c tracing: Cleanup upper_empty() in pid_list 2025-05-14 11:19:32 -04:00
pid_list.h
power-traces.c
preemptirq_delay_test.c
rethook.c
ring_buffer_benchmark.c ring-buffer: Use str_low_high() helper in ring_buffer_producer() 2024-10-19 11:12:25 -04:00
ring_buffer.c ring-buffer changes for v6.16: 2025-05-30 21:20:11 -07:00
rpm-traces.c
synth_event_gen_test.c
trace_benchmark.c
trace_benchmark.h
trace_boot.c
trace_branch.c tracing: branch: Use trace_tracing_is_on_cpu() instead of "disabled" field 2025-05-09 15:19:10 -04:00
trace_btf.c
trace_btf.h
trace_clock.c tracing: Use atomic64_inc_return() in trace_clock_counter() 2024-10-09 19:59:49 -04:00
trace_dynevent.c tracing: probes: Fix a possible race in trace_probe_log APIs 2025-05-13 22:23:34 +09:00
trace_dynevent.h tracing: probes: Fix a possible race in trace_probe_log APIs 2025-05-13 22:23:34 +09:00
trace_entries.h ftrace: Expose call graph depth as unsigned int 2025-05-08 09:36:08 -04:00
trace_eprobe.c tracing: add missing trace_probe_log_clear for eprobes 2025-05-10 08:44:50 +09:00
trace_event_perf.c perf: Remove unnecessary parameter of security check 2025-02-26 14:13:58 -05:00
trace_events_filter_test.h
trace_events_filter.c tracing: Fix filter string testing 2025-04-17 22:16:56 -04:00
trace_events_hist.c tracing: Rename event_trigger_alloc() to trigger_data_alloc() 2025-05-09 15:19:11 -04:00
trace_events_inject.c
trace_events_synth.c tracing: Do not add length to print format in synthetic events 2025-04-09 11:34:21 -04:00
trace_events_trigger.c tracing updates for v6.16: 2025-05-29 21:04:36 -07:00
trace_events_user.c tracing/user_events: Slightly simplify user_seq_show() 2025-03-06 13:35:27 -05:00
trace_events.c tracing: Add a helper function to handle the dereference arg in verifier 2025-05-09 15:19:11 -04:00
trace_export.c
trace_fprobe.c Probes fixes for v6.14: 2025-04-08 12:51:34 -07:00
trace_functions_graph.c ftrace: Do not disabled function graph based on "disabled" field 2025-05-09 15:19:10 -04:00
trace_functions.c tracing updates for v6.16: 2025-05-29 21:04:36 -07:00
trace_hwlat.c tracing: Remove TRACE_EVENT_FL_FILTERED logic 2024-10-08 15:24:49 -04:00
trace_irqsoff.c tracing: Use atomic_inc_return() for updating "disabled" counter in irqsoff tracer 2025-05-09 15:19:10 -04:00
trace_kdb.c tracing: kdb: Use tracer_tracing_on/off() instead of setting per CPU disabled 2025-05-09 15:18:47 -04:00
trace_kprobe_selftest.c
trace_kprobe_selftest.h
trace_kprobe.c tracing: probes: Fix a possible race in trace_probe_log APIs 2025-05-13 22:23:34 +09:00
trace_mmiotrace.c tracing/mmiotrace: Remove reference to unused per CPU data pointer 2025-05-08 09:36:09 -04:00
trace_nop.c
trace_osnoise.c tracing/osnoise: Allow arbitrarily long CPU string 2025-05-08 09:36:09 -04:00
trace_output.c tracing: Show preempt and irq events callsites from the offsets in field print 2025-05-06 11:34:52 -04:00
trace_output.h ftrace: Add print_function_args() 2025-03-04 11:27:23 -05:00
trace_preemptirq.c tracing: Fix archs that still call tracepoints without RCU watching 2024-12-05 09:28:58 -05:00
trace_printk.c
trace_probe_kernel.h
trace_probe_tmpl.h tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS 2024-12-26 10:50:04 -05:00
trace_probe.c tracing: probes: Fix a possible race in trace_probe_log APIs 2025-05-13 22:23:34 +09:00
trace_probe.h tracing: probe-events: Log error for exceeding the number of arguments 2025-03-27 21:19:54 +09:00
trace_recursion_record.c
trace_sched_switch.c pid: allow pid_max to be set per pid namespace 2024-12-02 11:25:25 +01:00
trace_sched_wakeup.c tracing: Convert the per CPU "disabled" counter to local from atomic 2025-05-09 15:19:10 -04:00
trace_selftest_dynamic.c
trace_selftest.c fgraph: Pass ftrace_regs to retfunc 2024-12-26 10:50:03 -05:00
trace_seq.c
trace_stack.c tracing updates for v6.16: 2025-05-29 21:04:36 -07:00
trace_stat.c tracing: Switch trace_stat.c code over to use guard() 2024-12-26 10:38:37 -05:00
trace_stat.h
trace_synth.h
trace_syscalls.c tracing/perf: Add might_fault check to syscall probes 2024-10-09 17:09:46 -04:00
trace_uprobe.c bpf-next-6.16 2025-05-28 15:52:42 -07:00
trace.c ring-buffer changes for v6.16: 2025-05-30 21:20:11 -07:00
trace.h tracing: Allow the top level trace_marker to write into another instances 2025-05-09 15:19:11 -04:00
tracing_map.c tracing: Fix cmp_entries_dup() to respect sort() comparison rules 2024-12-04 10:38:24 -05:00
tracing_map.h