twx-linux/kernel
Linus Torvalds 8bf722c684 ring-buffer changes for v6.16:
- Allow the persistent ring buffer to be memory mapped
 
   In the last merge window there was issues with the implementation of
   mapping the persistent ring buffer because it was assumed that the
   persistent memory was just physical memory without being part of the
   kernel virtual address space. But this was incorrect and the persistent
   ring buffer can be mapped the same way as the allocated ring buffer is
   mapped.
 
   The meta data for the persistent ring buffer is different than the normal
   ring buffer and the organization of mapping it to user space is a little
   different. Make the updates needed to the meta data to allow the
   persistent ring buffer to be mapped to user space.
 
 - Fix cpus_read_lock() with buffer->mutex and cpu_buffer->mapping_lock
 
   Mapping the ring buffer to user space uses the cpu_buffer->mapping_lock.
   The buffer->mutex can be taken when the mapping_lock is held, giving the
   locking order of: cpu_buffer->mapping_lock -->> buffer->mutex. But there
   also exists the ordering:
 
       buffer->mutex -->> cpus_read_lock()
       mm->mmap_lock -->> cpu_buffer->mapping_lock
       cpus_read_lock() -->> mm->mmap_lock
 
   causing a circular chain of:
 
       cpu_buffer->mapping_lock -> buffer->mutex -->> cpus_read_lock() -->>
          mm->mmap_lock -->> cpu_buffer->mapping_lock
 
   By moving the cpus_read_lock() outside the buffer->mutex where:
   cpus_read_lock() -->> buffer->mutex, breaks the deadlock chain.
 
 - Do not trigger WARN_ON() for commit overrun
 
   When the ring buffer is user space mapped and there's a "commit overrun"
   (where an interrupt preempted an event, and then added so many events it
    filled the buffer having to drop events when it hit the preempted event)
   a WARN_ON() was triggered if this was read via a memory mapped buffer.
   This is due to "missed events" being non zero when the reader page ended
   up with the commit page. The idea was, if the writer is on the reader page,
   there's only one page that has been written to and there should be no
   missed events. But if a commit overrun is done where the writer is off the
   commit page and looped around to the commit page causing missed events, it
   is possible that the reader page is the commit page with missed events.
 
   Instead of triggering a WARN_ON() when the reader page is the commit page
   with missed events, trigger it when the reader page is the tail_page with
   missed events. That's because the writer is always on the tail_page if
   an event was interrupted (which holds the commit event) and continues off
   the commit page.
 
 - Reset the persistent buffer if it is fully consumed
 
   On boot up, if the user fully consumes the last boot buffer of the
   persistent buffer, if it reboots without enabling it, there will still be
   events in the buffer which can cause confusion. Instead, reset the buffer
   when it is fully consumed, so that the data is not read again.
 
 - Clean up some goto out jumps
 
   There's a few cases that the code jumps to the "out:" label that simply
   returns a value. There used to be more work done at those labels but now
   that they simply return a value use a return instead of jumping to a
   label.
 
 - Use guard() to simplify some of the code
 
   Add guard() around some locking instead of jumping to a label to do the
   unlocking.
 
 - Use free() to simplify some of the code
 
   Use free(kfree) on variables that will get freed on error and use
   return_ptr() to return the variable when its not freed. There's one
   instance where free(kfree) simplifies the code on a temp variable that was
   allocated just for the function use.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaDjJMxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qkDzAP468AZOnjIxezfzYEmtcDl8ZUgf2U3I
 XtXjn7aKH/gZiwD/dCCZX2IY2gddqAb6s9Bo4/AWgtYbjacLPL+pWYbTJwQ=
 =DOfF
 -----END PGP SIGNATURE-----

Merge tag 'trace-ringbuffer-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:

 - Allow the persistent ring buffer to be memory mapped

   In the last merge window there was issues with the implementation of
   mapping the persistent ring buffer because it was assumed that the
   persistent memory was just physical memory without being part of the
   kernel virtual address space. But this was incorrect and the
   persistent ring buffer can be mapped the same way as the allocated
   ring buffer is mapped.

   The metadata for the persistent ring buffer is different than the
   normal ring buffer and the organization of mapping it to user space
   is a little different. Make the updates needed to the meta data to
   allow the persistent ring buffer to be mapped to user space.

 - Fix cpus_read_lock() with buffer->mutex and cpu_buffer->mapping_lock

   Mapping the ring buffer to user space uses the
   cpu_buffer->mapping_lock. The buffer->mutex can be taken when the
   mapping_lock is held, giving the locking order of:
   cpu_buffer->mapping_lock -->> buffer->mutex. But there also exists
   the ordering:

       buffer->mutex -->> cpus_read_lock()
       mm->mmap_lock -->> cpu_buffer->mapping_lock
       cpus_read_lock() -->> mm->mmap_lock

   causing a circular chain of:

       cpu_buffer->mapping_lock -> buffer->mutex -->> cpus_read_lock() -->>
          mm->mmap_lock -->> cpu_buffer->mapping_lock

   By moving the cpus_read_lock() outside the buffer->mutex where:
   cpus_read_lock() -->> buffer->mutex, breaks the deadlock chain.

 - Do not trigger WARN_ON() for commit overrun

   When the ring buffer is user space mapped and there's a "commit
   overrun" (where an interrupt preempted an event, and then added so
   many events it filled the buffer having to drop events when it hit
   the preempted event) a WARN_ON() was triggered if this was read via a
   memory mapped buffer.

   This is due to "missed events" being non zero when the reader page
   ended up with the commit page. The idea was, if the writer is on the
   reader page, there's only one page that has been written to and there
   should be no missed events.

   But if a commit overrun is done where the writer is off the commit
   page and looped around to the commit page causing missed events, it
   is possible that the reader page is the commit page with missed
   events.

   Instead of triggering a WARN_ON() when the reader page is the commit
   page with missed events, trigger it when the reader page is the
   tail_page with missed events. That's because the writer is always on
   the tail_page if an event was interrupted (which holds the commit
   event) and continues off the commit page.

 - Reset the persistent buffer if it is fully consumed

   On boot up, if the user fully consumes the last boot buffer of the
   persistent buffer, if it reboots without enabling it, there will
   still be events in the buffer which can cause confusion. Instead,
   reset the buffer when it is fully consumed, so that the data is not
   read again.

 - Clean up some goto out jumps

   There's a few cases that the code jumps to the "out:" label that
   simply returns a value. There used to be more work done at those
   labels but now that they simply return a value use a return instead
   of jumping to a label.

 - Use guard() to simplify some of the code

   Add guard() around some locking instead of jumping to a label to do
   the unlocking.

 - Use free() to simplify some of the code

   Use free(kfree) on variables that will get freed on error and use
   return_ptr() to return the variable when its not freed. There's one
   instance where free(kfree) simplifies the code on a temp variable
   that was allocated just for the function use.

* tag 'trace-ringbuffer-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Simplify functions with __free(kfree) to free allocations
  ring-buffer: Make ring_buffer_{un}map() simpler with guard(mutex)
  ring-buffer: Simplify ring_buffer_read_page() with guard()
  ring-buffer: Simplify reset_disabled_cpu_buffer() with use of guard()
  ring-buffer: Remove jump to out label in ring_buffer_swap_cpu()
  ring-buffer: Removed unnecessary if() goto out where out is the next line
  tracing: Reset last-boot buffers when reading out all cpu buffers
  ring-buffer: Allow reserve_mem persistent ring buffers to be mmapped
  ring-buffer: Do not trigger WARN_ON() due to a commit_overrun
  ring-buffer: Move cpus_read_lock() outside of buffer->mutex
2025-05-30 21:20:11 -07:00
..
bpf bpf-next-6.16 2025-05-28 15:52:42 -07:00
cgroup ARM: 2025-05-29 08:10:01 -07:00
configs Networking changes for 6.16. 2025-05-28 15:24:36 -07:00
debug TTY/Serial driver updates for 6.15-rc1 2025-04-02 18:17:33 -07:00
dma dma-mapping updates for Linux 6.16: 2025-05-27 20:09:06 -07:00
entry entry: Inline syscall_exit_to_user_mode() 2025-04-29 08:27:10 +02:00
events perf: Only dump the throttle log for the leader 2025-05-21 13:57:43 +02:00
futex futex: Correct the kernedoc return value for futex_wait_setup(). 2025-05-21 13:57:41 +02:00
gcov gcov: clang: use correct function param names 2025-01-24 22:47:27 -08:00
irq Updates for the MSI subsystem (core code and PCI): 2025-05-27 08:15:26 -07:00
kcsan treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack() 2025-05-08 19:49:33 +02:00
livepatch sched,livepatch: Untangle cond_resched() and live-patching 2025-05-14 13:16:24 +02:00
locking Locking changes for v6.16: 2025-05-26 14:42:07 -07:00
module 22 hotfixes. 13 are cc:stable and the remainder address post-6.14 issues 2025-05-25 07:48:35 -07:00
power Power management updates for 6.16-rc1 2025-05-27 16:48:47 -07:00
printk printk changes for 6.15 2025-03-27 19:22:24 -07:00
rcu Another set of timer API cleanups: 2025-05-27 08:31:21 -07:00
sched bpf-next-6.16 2025-05-28 15:52:42 -07:00
time Updates for the time/timer core code: 2025-05-27 09:04:15 -07:00
trace ring-buffer changes for v6.16: 2025-05-30 21:20:11 -07:00
.gitignore
acct.c acct: block access to kernel internal filesystems 2025-02-12 12:24:16 +01:00
async.c
audit_fsnotify.c
audit_tree.c fsnotify: create a wrapper fsnotify_find_inode_mark() 2024-04-04 16:24:16 +02:00
audit_watch.c fs: add kern_path_locked_negative() 2025-04-15 11:32:34 +02:00
audit.c audit: record AUDIT_ANOM_* events regardless of presence of rules 2025-04-11 14:14:41 -04:00
audit.h audit: change context data from secid to lsm_prop 2024-10-11 14:34:16 -04:00
auditfilter.c audit: fix suffixed '/' filename matching 2024-12-05 19:22:38 -05:00
auditsc.c fs: dedup handling of struct filename init and refcounts bumps 2025-03-18 15:34:27 +01:00
backtracetest.c backtracetest: add MODULE_DESCRIPTION() 2024-06-24 22:24:55 -07:00
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-04-29 08:29:29 -07:00
capability.c capability: Remove unused has_capability 2025-03-07 22:03:09 -06:00
cfi.c Modules changes for 6.15-rc1 2025-03-30 15:44:36 -07:00
compat.c
configs.c
context_tracking.c context_tracking: Make RCU watch ct_kernel_exit_state() warning 2025-03-04 18:44:29 -08:00
cpu_pm.c
cpu.c perf: Remove too early and redundant CPU hotplug handling 2025-05-08 21:50:19 +02:00
crash_core.c crash: Use note name macros 2025-02-10 16:56:58 -08:00
crash_reserve.c crash: remove an unused argument from reserve_crashkernel_generic() 2025-03-16 22:30:47 -07:00
cred.c cred: remove old {override,revert}_creds() helpers 2024-12-02 11:25:09 +01:00
delayacct.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
dma.c
elfcorehdr.c
exec_domain.c
exit.c Scheduler updates for v6.16: 2025-05-26 15:19:58 -07:00
exit.h
extable.c
fail_function.c
fork.c Locking changes for v6.16: 2025-05-26 14:42:07 -07:00
freezer.c sched/fair: Fix external p->on_rq users 2024-10-14 09:14:35 +02:00
gen_kheaders.sh Revert "kheaders: Ignore silly-rename files" 2025-03-15 21:22:52 +09:00
groups.c
hung_task.c hung_task: show the blocker task if the task is hung on mutex 2025-03-21 22:10:04 -07:00
iomem.c mm/memremap: Pass down MEMREMAP_* flags to arch_memremap_wb() 2025-02-21 15:05:38 +01:00
irq_work.c kasan: make kasan_record_aux_stack_noalloc() the default behaviour 2025-01-13 22:40:36 -08:00
jump_label.c jump_label: Use RCU in all users of __module_text_address(). 2025-03-10 11:54:46 +01:00
kallsyms_internal.h kallsyms: get rid of code for absolute kallsyms 2024-07-20 16:33:21 +09:00
kallsyms_selftest.c kallsyms: Use kthread_run_on_cpu() 2025-01-02 22:12:12 +01:00
kallsyms_selftest.h
kallsyms.c kallsyms: Remove KALLSYMS_ABSOLUTE_PERCPU 2025-02-18 10:16:04 +01:00
kcmp.c kcmp: improve performance adding an unlikely hint to task comparisons 2025-02-21 10:25:33 +01:00
Kconfig.freezer
Kconfig.hz kernel: Fix "select" wording on HZ_250 description 2025-02-21 09:20:30 +01:00
Kconfig.kexec crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32 2024-11-14 22:43:48 -08:00
Kconfig.locks
Kconfig.preempt sched: No PREEMPT_RT=y for all{yes,mod}config 2024-11-07 15:25:05 +01:00
kcov.c kcov: mark in_softirq_really() as __always_inline 2024-12-30 17:59:08 -08:00
kexec_core.c kexec: define functions to map and unmap segments 2025-04-29 15:54:53 -04:00
kexec_elf.c kexec: initialize ELF lowest address to ULONG_MAX 2025-03-16 22:30:47 -07:00
kexec_file.c ima: kexec: move IMA log copy from kexec load to execute 2025-04-29 15:54:54 -04:00
kexec_internal.h kexec: use atomic_try_cmpxchg_acquire() in kexec_trylock() 2024-09-01 20:43:23 -07:00
kexec.c crash: add a new kexec flag for hotplug support 2024-04-23 14:59:01 +10:00
kheaders.c kheaders: Simplify attribute through __BIN_ATTR_SIMPLE_RO() 2024-12-24 09:46:49 +01:00
kprobes.c kprobes: Use RCU in all users of __module_text_address(). 2025-03-10 11:54:46 +01:00
ksyms_common.c
ksysfs.c kernel/ksysfs.c: simplify bin_attribute definition 2025-01-07 16:59:15 +01:00
kthread.c treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
latencytop.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
Makefile tracing: Disable branch profiling in noinstr code 2025-03-22 09:49:26 +01:00
module_signature.c
notifier.c reboot: move reboot_notifier_list to kernel/reboot.c 2024-11-05 17:12:31 -08:00
nsproxy.c kernel/nsproxy: remove unnecessary guards 2025-05-09 13:13:54 +02:00
padata.c padata: do not leak refcount in reorder_work 2025-05-19 13:44:16 +08:00
panic.c panic: Move panic ctl tables into panic.c 2025-04-09 13:32:16 +02:00
params.c module: ensure that kobject_put() is safe for module type kobjects 2025-05-07 20:24:59 +02:00
pid_namespace.c pid: Do not set pid_max in new pid namespaces 2025-03-06 10:18:36 +01:00
pid_sysctl.h treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
pid.c pidfs: detect refcount bugs 2025-05-06 13:59:00 +02:00
profile.c profiling: remove profile=sleep support 2024-08-04 13:36:28 -07:00
ptrace.c
range.c
reboot.c - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
regset.c regset: use kvzalloc() for regset_get_alloc() 2024-04-25 21:07:03 -07:00
relay.c relay: use kasprintf() instead of fixed buffer formatting 2025-03-21 22:10:05 -07:00
resource_kunit.c resource, kunit: fix user-after-free in resource_test_region_intersects() 2024-10-09 12:47:19 -07:00
resource.c resource: replace open coded variant of DEFINE_RES() 2025-03-21 22:10:05 -07:00
rseq.c rseq: Fix segfault on registration when rseq_cs is non-zero 2025-03-06 22:26:49 +01:00
scftorture.c scftorture: Handle NULL argument passed to scf_add_to_free_list(). 2024-11-14 16:09:51 -08:00
scs.c
seccomp.c seccomp: avoid the lock trip seccomp_filter_release in common case 2025-02-24 11:17:10 -08:00
signal.c signal: Move signal ctl tables into signal.c 2025-04-09 13:32:16 +02:00
smp.c CSD-lock pull request for v6.14 2025-01-28 11:34:03 -08:00
smpboot.c
smpboot.h
softirq.c lockdep: Fix wait context check on softirq for PREEMPT_RT 2025-03-25 10:46:44 +01:00
stackleak.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
stacktrace.c
static_call_inline.c Modules changes for 6.15-rc1 2025-03-30 15:44:36 -07:00
static_call.c
stop_machine.c stop-machine: Add comment for rcu_momentary_eqs() 2025-03-11 10:15:52 -07:00
sys_ni.c Probes updates for v6.11: 2024-07-18 12:19:20 -07:00
sys.c futex: Add basic infrastructure for local task local hash 2025-05-03 12:02:07 +02:00
sysctl-test.c sysctl: move u8 register test to lib/test_sysctl.c 2025-04-14 14:13:41 +02:00
sysctl.c sparc: mv sparc sysctls into their own file under arch/sparc/kernel 2025-04-09 13:32:16 +02:00
task_work.c kasan: make kasan_record_aux_stack_noalloc() the default behaviour 2025-01-13 22:40:36 -08:00
taskstats.c fdget(), more trivial conversions 2024-11-03 01:28:06 -05:00
torture.c torture: Add get_torture_init_jiffies() for test-start time 2025-02-05 07:14:24 -08:00
tracepoint.c tracepoint: Print the function symbol when tracepoint_debug is set 2025-03-21 15:30:10 -04:00
tsacct.c tsacct: replace strncpy() with strscpy() 2024-07-12 16:39:53 -07:00
ucount.c ucount: use rcuref_t for reference counting 2025-03-16 22:30:50 -07:00
uid16.c
uid16.h
umh.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
up.c
user_namespace.c uidgid: add map_id_range_up() 2025-02-12 12:12:27 +01:00
user-return-notifier.c
user.c uidgid: make sure we fit into one cacheline 2024-09-12 12:16:09 +02:00
usermode_driver.c
utsname_sysctl.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
utsname.c
vhost_task.c vhost_task: fix vhost_task_create() documentation 2025-04-18 10:08:11 -04:00
vmcore_info.c mm: support only one page_type per page 2024-09-03 21:15:43 -07:00
watch_queue.c vfs-6.15-rc1.pipe 2025-03-24 09:52:37 -07:00
watchdog_buddy.c
watchdog_perf.c - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
watchdog.c A treewide hrtimer timer cleanup 2025-03-25 10:54:15 -07:00
workqueue_internal.h
workqueue.c workqueue: Changes for v6.16 2025-05-27 20:49:06 -07:00