twx-linux/kernel
Frederic Weisbecker f55acb1e44 timers/migration: Fix endless timer requeue after idle interrupts
When a CPU is an idle migrator, but another CPU wakes up before it,
becomes an active migrator and handles the queue, the initial idle
migrator may end up endlessly reprogramming its clockevent, chasing ghost
timers forever such as in the following scenario:

               [GRP0:0]
             migrator = 0
             active   = 0
             nextevt  = T1
              /         \
             0           1
          active        idle (T1)

0) CPU 1 is idle and has a timer queued (T1), CPU 0 is active and is
the active migrator.

               [GRP0:0]
             migrator = NONE
             active   = NONE
             nextevt  = T1
              /         \
             0           1
          idle        idle (T1)
          wakeup = T1

1) CPU 0 is now idle and is therefore the idle migrator. It has
programmed its next timer interrupt to handle T1.

                [GRP0:0]
             migrator = 1
             active   = 1
             nextevt  = KTIME_MAX
              /         \
             0           1
          idle        active
          wakeup = T1

2) CPU 1 has woken up, it is now active and it has just handled its own
timer T1.

3) CPU 0 gets a timer interrupt to handle T1 but tmigr_handle_remote()
realize it is not the migrator anymore. So it early returns without
observing that T1 has been expired already and therefore without
updating its ->wakeup value.

4) CPU 0 goes into tmigr_cpu_new_timer() which also early returns
because it doesn't queue a timer of its own. So ->wakeup is left
unchanged and the next timer is programmed to fire now.

5) goto 3) forever

This results in timer interrupt storms in idle and also in nohz_full (as
observed in rcutorture's TREE07 scenario).

Fix this with forcing a re-evaluation of tmc->wakeup while trying
remote timer handling when the CPU isn't the migrator anymmore. The
check is inherently racy but in the worst case the CPU just races setting
the KTIME_MAX value that a remote expiry also tries to set.

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240318230729.15497-2-frederic@kernel.org
2024-03-19 10:14:55 +01:00
..
bpf bpf: enforce types for __arg_ctx-tagged arguments in global subprogs 2024-01-17 20:20:06 -08:00
cgroup Driver core changes for 6.8-rc1 2024-01-18 09:48:40 -08:00
configs hardening: Provide Kconfig fragments for basic options 2023-09-22 09:50:55 -07:00
debug kdb: Fix a potential buffer overflow in kdb_local() 2024-01-17 17:19:06 +00:00
dma dma-mapping fixes for Linux 6.8 2024-01-18 16:49:34 -08:00
entry entry: Move syscall_enter_from_user_mode() to header file 2023-12-21 23:12:18 +01:00
events uprobes: use pagesize-aligned virtual address when replacing pages 2024-01-25 23:52:20 -08:00
futex futex: Prevent the reuse of stale pi_state 2024-01-19 12:58:17 +01:00
gcov gcov: annotate struct gcov_iterator with __counted_by 2023-10-18 14:43:22 -07:00
irq genirq: Initialize resend_node hlist for all interrupt descriptors 2024-01-24 14:15:41 +01:00
kcsan mm: delete checks for xor_unlock_is_negative_byte() 2023-10-18 14:34:17 -07:00
livepatch livepatch: Fix missing newline character in klp_resolve_symbols() 2023-09-20 11:24:18 +02:00
locking RCU pull request for v6.8 2024-01-12 16:35:58 -08:00
module Modules changes for v6.8-rc1 2024-01-10 18:00:18 -08:00
power PM: hibernate: Repair excess function parameter description warning 2023-12-20 19:19:26 +01:00
printk TTY/Serial changes for 6.7-rc1 2023-11-03 15:44:25 -10:00
rcu rcu: Defer RCU kthreads wakeup when CPU is dying 2024-01-24 22:46:17 +05:30
sched tick: Assume timekeeping is correctly handed over upon last offline idle call 2024-02-26 11:37:32 +01:00
time timers/migration: Fix endless timer requeue after idle interrupts 2024-03-19 10:14:55 +01:00
trace Probes fixes for v6.8-rc4: 2024-02-17 07:59:47 -08:00
.gitignore
acct.c fs: rename __mnt_{want,drop}_write*() helpers 2023-09-11 15:05:50 +02:00
async.c header cleanups for 6.8 2024-01-10 16:43:55 -08:00
audit_fsnotify.c
audit_tree.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
audit_watch.c audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() 2023-11-14 17:34:27 -05:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2023-11-12 22:33:49 -05:00
audit.h audit: correct audit_filter_inodes() definition 2023-07-21 12:17:25 -04:00
auditfilter.c audit: move trailing statements to next line 2023-08-15 18:16:14 -04:00
auditsc.c audit,io_uring: io_uring openat triggers audit reference count underflow 2023-10-13 18:34:46 +02:00
backtracetest.c
bounds.c
capability.c lsm: constify the 'target' parameter in security_capget() 2023-08-08 16:48:47 -04:00
cfi.c
compat.c
configs.c
context_tracking.c
cpu_pm.c
cpu.c tick: Assume timekeeping is correctly handed over upon last offline idle call 2024-02-26 11:37:32 +01:00
crash_core.c kernel/crash_core.c: make __crash_hotplug_lock static 2024-01-12 15:20:47 -08:00
crash_dump.c
cred.c cred: get rid of CONFIG_DEBUG_CREDENTIALS 2023-12-15 14:19:48 -08:00
delayacct.c
dma.c
exec_domain.c
exit.c exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock) 2024-02-07 21:20:33 -08:00
exit.h exit: add internal include file with helpers 2023-09-21 12:03:50 -06:00
extable.c
fail_function.c
fork.c exec: Distinguish in_execve from in_exec 2024-01-24 11:48:52 -08:00
freezer.c Linux 6.7-rc6 2023-12-23 15:52:13 +01:00
gen_kheaders.sh
groups.c groups: Convert group_info.usage to refcount_t 2023-09-29 11:28:39 -07:00
hung_task.c
iomem.c kernel/iomem.c: remove __weak ioremap_cache helper 2023-08-21 13:37:28 -07:00
irq_work.c
jump_label.c
kallsyms_internal.h
kallsyms_selftest.c Modules changes for v6.6-rc1 2023-08-29 17:32:32 -07:00
kallsyms_selftest.h
kallsyms.c kallsyms: Change func signature for cleanup_symbol_name() 2023-08-25 15:00:36 -07:00
kcmp.c file: convert to SLAB_TYPESAFE_BY_RCU 2023-10-19 11:02:48 +02:00
Kconfig.freezer
Kconfig.hz
Kconfig.kexec kexec: select CRYPTO from KEXEC_FILE instead of depending on it 2023-12-20 13:46:19 -08:00
Kconfig.locks
Kconfig.preempt
kcov.c
kexec_core.c kexec: do syscore_shutdown() in kernel_kexec 2024-01-12 15:20:47 -08:00
kexec_elf.c
kexec_file.c kexec_file: fix incorrect temp_start value in locate_mem_hole_top_down() 2023-12-29 12:22:25 -08:00
kexec_internal.h
kexec.c kernel: kexec: copy user-array safely 2023-10-09 16:59:47 +10:00
kheaders.c
kprobes.c kprobes: Remove unnecessary initial values of variables 2024-02-08 23:29:29 +09:00
ksyms_common.c
ksysfs.c crash: hotplug support for kexec_load() 2023-08-24 16:25:14 -07:00
kthread.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
latencytop.c
Makefile kernel/numa.c: Move logging out of numa.h 2023-12-20 19:26:30 -05:00
module_signature.c
notifier.c
nsproxy.c nsproxy: Convert nsproxy.count to refcount_t 2023-08-21 11:29:12 -07:00
numa.c kernel/numa.c: Move logging out of numa.h 2023-12-20 19:26:30 -05:00
padata.c padata: Fix refcnt handling in padata_free_shell() 2023-10-27 18:04:24 +08:00
panic.c panic: use atomic_try_cmpxchg in panic() and nmi_panic() 2023-10-04 10:41:56 -07:00
params.c params: Fix multi-line comment style 2023-12-01 09:51:44 -08:00
pid_namespace.c wait: Remove uapi header file from main header file 2023-12-20 19:26:31 -05:00
pid_sysctl.h memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy 2023-08-21 13:37:59 -07:00
pid.c file: remove __receive_fd() 2023-12-12 14:24:14 +01:00
profile.c
ptrace.c Quite a lot of kexec work this time around. Many singleton patches in 2024-01-09 11:46:20 -08:00
range.c
reboot.c Thermal control updates for 6.8-rc1 2024-01-09 16:20:17 -08:00
regset.c
relay.c kernel: relay: remove relay_file_splice_read dead code, doesn't work 2023-12-29 12:22:27 -08:00
resource_kunit.c
resource.c Quite a lot of kexec work this time around. Many singleton patches in 2024-01-09 11:46:20 -08:00
rseq.c
scftorture.c scftorture: Pause testing after memory-allocation failure 2023-07-14 15:02:57 -07:00
scs.c
seccomp.c file: remove __receive_fd() 2023-12-12 14:24:14 +01:00
signal.c kernel/signal.c: simplify force_sig_info_to_task(), kill recalc_sigpending_and_wake() 2023-12-10 17:21:32 -08:00
smp.c CSD lock commits for v6.7 2023-10-30 17:56:53 -10:00
smpboot.c kthread: add kthread_stop_put 2023-10-04 10:41:57 -07:00
smpboot.h
softirq.c sched/core: introduce sched_core_idle_cpu() 2023-07-13 15:21:50 +02:00
stackleak.c
stacktrace.c stacktrace: fix kernel-doc typo 2023-12-29 12:22:29 -08:00
static_call_inline.c
static_call.c
stop_machine.c
sys_ni.c lsm/stable-6.8 PR 20240105 2024-01-09 12:57:46 -08:00
sys.c getrusage: use sig->stats_lock rather than lock_task_sighand() 2024-02-07 21:20:32 -08:00
sysctl-test.c
sysctl.c asm-generic updates for v6.7 2023-11-01 15:28:33 -10:00
task_work.c task_work: add kerneldoc annotation for 'data' argument 2023-09-19 13:21:32 -07:00
taskstats.c taskstats: fill_stats_for_tgid: use for_each_thread() 2023-10-04 10:41:57 -07:00
torture.c torture: Print out torture module parameters 2023-09-24 17:24:01 +02:00
tracepoint.c
tsacct.c
ucount.c sysctl: Add size to register_sysctl 2023-08-15 15:26:17 -07:00
uid16.c
uid16.h
umh.c sysctl: fix unused proc_cap_handler() function warning 2023-06-29 15:19:43 -07:00
up.c smp: Change function signatures to use call_single_data_t 2023-09-13 14:59:24 +02:00
user_namespace.c mnt_idmapping: decouple from namespaces 2023-11-28 14:08:47 +01:00
user-return-notifier.c
user.c binfmt_misc: enable sandboxed mounts 2023-10-11 08:46:01 -07:00
usermode_driver.c
utsname_sysctl.c
utsname.c
vhost_task.c
watch_queue.c watch_queue: fix kcalloc() arguments order 2023-12-21 13:17:54 +01:00
watchdog_buddy.c watchdog/hardlockup: move SMP barriers from common code to buddy code 2023-06-19 16:25:28 -07:00
watchdog_perf.c
watchdog.c watchdog: if panicking and we dumped everything, don't re-enable dumping 2023-12-29 12:22:30 -08:00
workqueue_internal.h workqueue: Drop the special locking rule for worker->flags and worker_pool->flags 2023-08-07 15:57:22 -10:00
workqueue.c workqueue: Use global variant for add_timer() 2024-02-22 17:52:30 +01:00