twx-linux/lib
Douglas Anderson 1f423c905a watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs
Implement a hardlockup detector that doesn't doesn't need any extra
arch-specific support code to detect lockups.  Instead of using something
arch-specific we will use the buddy system, where each CPU watches out for
another one.  Specifically, each CPU will use its softlockup hrtimer to
check that the next CPU is processing hrtimer interrupts by verifying that
a counter is increasing.

NOTE: unlike the other hard lockup detectors, the buddy one can't easily
show what's happening on the CPU that locked up just by doing a simple
backtrace.  It relies on some other mechanism in the system to get
information about the locked up CPUs.  This could be support for NMI
backtraces like [1], it could be a mechanism for printing the PC of locked
CPUs at panic time like [2] / [3], or it could be something else.  Even
though that means we still rely on arch-specific code, this arch-specific
code seems to often be implemented even on architectures that don't have a
hardlockup detector.

This style of hardlockup detector originated in some downstream Android
trees and has been rebased on / carried in ChromeOS trees for quite a long
time for use on arm and arm64 boards.  Historically on these boards we've
leveraged mechanism [2] / [3] to get information about hung CPUs, but we
could move to [1].

Although the original motivation for the buddy system was for use on
systems without an arch-specific hardlockup detector, it can still be
useful to use even on systems that _do_ have an arch-specific hardlockup
detector.  On x86, for instance, there is a 24-part patch series [4] in
progress switching the arch-specific hard lockup detector from a scarce
perf counter to a less-scarce hardware resource.  Potentially the buddy
system could be a simpler alternative to free up the perf counter but
still get hard lockup detection.

Overall, pros (+) and cons (-) of the buddy system compared to an
arch-specific hardlockup detector (which might be implemented using
perf):
+ The buddy system is usable on systems that don't have an
  arch-specific hardlockup detector, like arm32 and arm64 (though it's
  being worked on for arm64 [5]).
+ The buddy system may free up scarce hardware resources.
+ If a CPU totally goes out to lunch (can't process NMIs) the buddy
  system could still detect the problem (though it would be unlikely
  to be able to get a stack trace).
+ The buddy system uses the same timer function to pet the hardlockup
  detector on the running CPU as it uses to detect hardlockups on
  other CPUs. Compared to other hardlockup detectors, this means it
  generates fewer interrupts and thus is likely better able to let
  CPUs stay idle longer.
- If all CPUs are hard locked up at the same time the buddy system
  can't detect it.
- If we don't have SMP we can't use the buddy system.
- The buddy system needs an arch-specific mechanism (possibly NMI
  backtrace) to get info about the locked up CPU.

[1] https://lore.kernel.org/r/20230419225604.21204-1-dianders@chromium.org
[2] https://issuetracker.google.com/172213129
[3] https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html
[4] https://lore.kernel.org/lkml/20230301234753.28582-1-ricardo.neri-calderon@linux.intel.com/
[5] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/

Link: https://lkml.kernel.org/r/20230519101840.v5.14.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Tzung-Bi Shih <tzungbi@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 17:44:21 -07:00
..
842
crypto modules-6.4-rc1 2023-04-27 16:36:55 -07:00
dim linux/dim: Do nothing if no time delta between samples 2023-05-09 11:06:45 +02:00
fonts lib/fonts: fix undefined behavior in bit shift for get_default_font 2022-11-18 13:55:09 -08:00
kunit kunit: include debugfs header file 2023-06-09 17:44:16 -07:00
livepatch selftests/livepatch: better synchronize test_klp_callbacks_busy 2022-06-15 10:29:10 +02:00
lz4 lib: make LZ4_decompress_safe_forceExtDict() static 2022-07-17 17:31:39 -07:00
lzo lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t() 2022-07-29 18:12:34 -07:00
math math64: favor kernel-doc from header files 2022-11-21 14:30:53 -07:00
mpi lib/mpi: Fix buffer overrun when SG is too long 2023-01-06 17:15:46 +08:00
pldmfw lib: remove MODULE_LICENSE in non-modules 2023-04-13 13:13:53 -07:00
raid6 for-6.2/block-2022-12-08 2022-12-13 10:43:59 -08:00
reed_solomon treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
test_fortify
vdso vdso: Improve cmd_vdso_check to check all dynamic relocations 2023-03-21 21:15:34 +01:00
xz
zlib_deflate lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH 2023-02-27 17:00:14 -08:00
zlib_dfltcc lib/zlib: remove redundation assignement of avail_in dfltcc_gdht() 2023-02-02 22:50:10 -08:00
zlib_inflate lib/zlib: Split deflate and inflate states for DFLTCC 2023-02-02 22:50:09 -08:00
zstd add intptr_t 2023-06-09 17:44:13 -07:00
.gitignore
argv_split.c
ashldi3.c
ashrdi3.c
asn1_decoder.c
asn1_encoder.c
assoc_array.c assoc_array: Fix BUG_ON during garbage collect 2022-06-01 18:29:06 -07:00
atomic64_test.c
atomic64.c
audit.c
base64.c lib/base64: RFC4648-compliant base64 encoding 2022-08-02 17:14:47 -06:00
bcd.c
bch.c
bitfield_kunit.c
bitmap.c lib/bitmap: remove bitmap_ord_to_pos 2022-09-26 12:19:12 -07:00
bitrev.c
bootconfig-data.S
bootconfig.c
bsearch.c
btree.c btree: remove MODULE_LICENSE in non-modules 2023-04-13 13:13:54 -07:00
bucket_locks.c
bug.c cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG 2023-01-31 15:01:45 +01:00
build_OID_registry
buildid.c ELF: fix all "Elf" typos 2023-04-08 13:45:37 -07:00
bust_spinlocks.c kernel/panic: Drop unblank_screen call 2022-09-01 16:55:35 +02:00
check_signature.c
checksum.c
clz_ctz.c
clz_tab.c
cmdline_kunit.c treewide: use get_random_{u8,u16}() when possible, part 1 2022-10-11 17:42:58 -06:00
cmdline.c lib/cmdline: avoid page fault in next_arg 2022-09-11 21:55:06 -07:00
cmpdi2.c
compat_audit.c
cpu_rmap.c lib: cpu_rmap: Add irq_cpu_rmap_remove to complement irq_cpu_rmap_add 2023-03-24 16:04:21 -07:00
cpumask_kunit.c cpumask: re-introduce constant-sized cpumask optimizations 2023-03-05 14:30:34 -08:00
cpumask.c lib/cpumask: update comment for cpumask_local_spread() 2023-02-07 18:20:00 -08:00
crc4.c
crc7.c
crc8.c
crc16.c
crc32.c
crc32defs.h
crc32test.c
crc64-rocksoft.c
crc64.c
crc-ccitt.c
crc-itu-t.c crc-itu-t: fix typo in CRC ITU-T polynomial comment 2022-06-07 10:27:38 +02:00
crc-t10dif.c
ctype.c
debug_info.c
debug_locks.c
debugobjects.c debugobjects: Don't wake up kswapd from fill_pool() 2023-05-22 14:52:58 +02:00
dec_and_lock.c perf: Fix perf_event_pmu_context serialization 2023-01-31 20:37:18 +01:00
decompress_bunzip2.c
decompress_inflate.c decompressor: provide missing prototypes 2023-06-09 17:44:17 -07:00
decompress_unlz4.c
decompress_unlzma.c
decompress_unlzo.c
decompress_unxz.c decompressor: provide missing prototypes 2023-06-09 17:44:17 -07:00
decompress_unzstd.c decompressor: provide missing prototypes 2023-06-09 17:44:17 -07:00
decompress.c
devmem_is_allowed.c lib: devmem_is_allowed: include linux/io.h 2023-06-09 17:44:15 -07:00
devres.c devres: remove devm_ioremap_np 2022-09-01 18:04:43 +02:00
dhry_1.c lib: add Dhrystone benchmark test 2023-02-02 22:50:01 -08:00
dhry_2.c lib: add Dhrystone benchmark test 2023-02-02 22:50:01 -08:00
dhry_run.c lib: dhry: fix unstable smp_processor_id(_) usage 2023-03-23 17:18:35 -07:00
dhry.h lib: add Dhrystone benchmark test 2023-02-02 22:50:01 -08:00
digsig.c
dump_stack.c
dynamic_debug.c dyndbg: use the module notifier callbacks 2023-03-09 12:58:36 -08:00
dynamic_queue_limits.c
earlycpio.c lib: move from strlcpy with unused retval to strscpy 2022-09-11 21:55:10 -07:00
errname.c printf: fix errname.c list 2023-02-15 15:44:43 +01:00
error-inject.c error-injection: remove EI_ETYPE_NONE 2023-02-02 22:50:00 -08:00
errseq.c
extable.c
fault-inject-usercopy.c
fault-inject.c fault-inject: allow configuration via configfs 2023-04-13 07:38:54 -06:00
fdt_addresses.c
fdt_empty_tree.c
fdt_ro.c
fdt_rw.c
fdt_strerror.c
fdt_sw.c
fdt_wip.c
fdt.c
find_bit_benchmark.c treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
find_bit.c cpumask: introduce for_each_cpu_or 2023-03-19 10:02:04 -07:00
flex_proportions.c flex_proportions: Disable preemption entering the write section. 2022-09-19 14:35:08 +02:00
fortify_kunit.c kunit/fortify: Validate __alloc_size attribute results 2022-11-22 21:08:28 -08:00
gen_crc32table.c
gen_crc64table.c
genalloc.c lib/genalloc: use try_cmpxchg in {set,clear}_bits_ll 2023-02-02 22:50:05 -08:00
generic-radix-tree.c
glob.c
globtest.c
group_cpus.c lib/group_cpus: Export group_cpus_evenly() 2023-04-21 03:02:31 -04:00
hashtable_test.c lib/hashtable_test.c: add test for the hashtable structure 2023-02-08 14:28:17 -07:00
hexdump.c
hweight.c
idr.c ida: don't use BUG_ON() for debugging 2022-07-10 13:55:49 -07:00
inflate.c
interval_tree_test.c
interval_tree.c interval-tree: Add a utility to iterate over spans in an interval tree 2022-11-29 16:34:15 -04:00
iomap_copy.c
iomap.c kmsan: add iomap support 2022-10-03 14:03:21 -07:00
iommu-helper.c
iov_iter.c mm: hwpoison: coredump: support recovery from dump_user_range() 2023-05-02 17:21:50 -07:00
irq_poll.c
irq_regs.c
is_signed_type_kunit.c lib: assume char is unsigned 2022-11-19 00:56:15 +01:00
is_single_threaded.c
kasprintf.c
Kconfig Kconfig: introduce HAS_IOPORT option and select it as necessary 2023-04-05 22:15:19 +02:00
Kconfig.debug watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs 2023-06-09 17:44:21 -07:00
Kconfig.kasan kasan: treat meminstrinsic as builtins in uninstrumented files 2023-03-02 21:54:22 -08:00
Kconfig.kcsan Kernel concurrency sanitizer (KCSAN) updates for v6.3 2023-02-25 13:02:20 -08:00
Kconfig.kfence
Kconfig.kgdb parisc: Convert PDC console to an early console 2022-10-11 12:01:24 +02:00
Kconfig.kmsan kmsan: make sure PREEMPT_RT is off 2022-11-08 15:57:24 -08:00
Kconfig.ubsan ubsan: disable UBSAN_DIV_ZERO for clang 2022-07-14 15:45:26 -07:00
kfifo.c
klist.c
kobject_uevent.c
kobject.c kobject: align stacktrace levels to logging message 2023-03-17 15:15:23 +01:00
kstrtox.c
kstrtox.h
libcrc32c.c libcrc32c: remove crc32c_impl 2023-04-17 18:01:23 +02:00
linear_ranges.c
list_debug.c lib/list_debug.c: Detect uninitialized lists 2022-06-16 19:58:20 -07:00
list_sort.c
list-test.c list: test: Test the klist structure 2023-03-31 09:21:35 -06:00
llist.c llist: avoid extra memory read in llist_add_batch 2022-11-18 13:55:06 -08:00
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-rtmutex.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c
lockref.c lockref: stop doing cpu_relax in the cmpxchg loop 2023-01-13 14:35:38 -06:00
logic_iomem.c
logic_pio.c
lru_cache.c lru_cache: remove unused lc_private, lc_set, lc_index_of 2022-11-22 19:38:39 -07:00
lshrdi3.c
Makefile modules-6.4-rc1 2023-04-27 16:36:55 -07:00
maple_tree.c maple_tree: make maple state reusable after mas_empty_area() 2023-05-17 15:24:32 -07:00
memcat_p.c
memcpy_kunit.c kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST 2023-01-25 12:24:40 -08:00
memory-notifier-error-inject.c
memregion.c
memweight.c
muldi3.c
net_utils.c mac_pton: Don't access memory over expected length 2022-11-09 19:28:02 -08:00
netdev-notifier-error-inject.c
nlattr.c netlink: prevent potential spectre v1 gadgets 2023-01-20 17:52:32 -08:00
nmi_backtrace.c x86/nmi: Print reasons why backtrace NMIs are ignored 2023-01-19 15:55:12 -08:00
notifier-error-inject.c lib/notifier-error-inject: fix error when writing -errno to debugfs file 2022-11-30 16:13:16 -08:00
notifier-error-inject.h
objagg.c
of-reconfig-notifier-error-inject.c
oid_registry.c lib/oid_registry.c: remove redundant assignment to variable num 2022-11-18 13:55:06 -08:00
once.c once: rename _SLOW to _SLEEPABLE 2022-10-03 17:34:32 -07:00
overflow_kunit.c overflow: Introduce overflows_type() and castable_to_type() 2022-11-02 12:39:27 -07:00
packing.c lib: packing: remove MODULE_LICENSE in non-modules 2023-03-09 23:08:04 -08:00
parman.c
parser.c lib: parser: update documentation for match_NUMBER functions 2023-03-02 21:54:22 -08:00
pci_iomap.c
percpu_counter.c pcpcntr: remove percpu_counter_sum_all() 2023-03-19 10:02:04 -07:00
percpu_test.c
percpu-refcount.c percpu-refcount: Use call_rcu_hurry() for atomic switch 2022-11-30 13:16:40 -08:00
plist.c
pm-notifier-error-inject.c
polynomial.c
radix-tree.c lib/radix-tree.c: fix uninitialized variable compilation warning 2022-11-30 16:13:17 -08:00
random32.c treewide: use get_random_bytes() when possible 2022-10-11 17:42:58 -06:00
ratelimit.c ratelimit: Fix data-races in ___ratelimit(). 2022-08-24 13:46:57 +01:00
rbtree_test.c
rbtree.c lib/rbtree: use '+' instead of '|' for setting color. 2023-04-18 16:39:33 -07:00
rcuref.c atomics: Provide rcuref - scalable reference counting 2023-03-28 10:39:29 +02:00
ref_tracker.c
refcount.c
rhashtable.c rhashtable: Allow rhashtable to be used from irq-safe contexts 2022-12-09 10:42:56 +00:00
sbitmap.c sbitmap: correct wake_batch recalculation to avoid potential IO hung 2023-01-29 20:03:01 -07:00
scatterlist.c lib/scatterlist: Fix to calculate the last_pg properly 2023-01-16 12:08:31 -04:00
seq_buf.c seq_buf: Add seq_buf_do_printk() helper 2023-04-25 21:03:14 -04:00
sg_pool.c lib/sg_pool: change module_init(sg_pool_init) to subsys_initcall 2022-09-23 16:46:19 +02:00
sg_split.c
show_mem.c lib/show_mem.c: use for_each_populated_zone() simplify code 2023-04-21 14:52:02 -07:00
siphash_kunit.c siphash: Convert selftest to KUnit 2022-11-01 10:04:52 -07:00
siphash.c SPDX changes for 5.19-rc1 2022-06-03 10:34:34 -07:00
slub_kunit.c linux-kselftest-kunit-next-6.2-rc1 2022-12-12 16:42:57 -08:00
smp_processor_id.c lib/smp_processor_id: fix imbalanced instrumentation_end() call 2022-07-17 17:31:41 -07:00
sort.c
stackdepot.c lib/stackdepot: kmsan: mark API outputs as initialized 2023-03-28 16:20:13 -07:00
stackinit_kunit.c kernel/range: Uplevel the cxl subsystem's range_contains() helper 2023-02-10 17:32:37 -08:00
stmp_device.c
string_helpers.c lib/string_helpers: Introduce parse_int_array_user() 2022-09-05 14:51:46 +01:00
string.c lib/string: Use strchr() in strpbrk() 2023-01-27 11:42:57 -08:00
strncpy_from_user.c
strnlen_user.c
strscpy_kunit.c fortify: Short-circuit known-safe calls to strscpy() 2022-11-01 10:04:52 -07:00
syscall.c
test_bitmap.c lib/bitmap: add tests for for_each() loops 2022-10-01 10:22:58 -07:00
test_bitops.c
test_bits.c
test_blackhole_dev.c
test_bpf.c net: remove skb->vlan_present 2022-11-11 18:18:05 -08:00
test_debug_virtual.c
test_dynamic_debug.c dyndbg: test DECLARE_DYNDBG_CLASSMAP, sysfs nodes 2022-09-07 17:04:49 +02:00
test_firmware.c test_firmware: Use kstrtobool() instead of strtobool() 2023-01-20 14:09:47 +01:00
test_fprobe.c tracing updates for 6.4: 2023-04-28 15:57:53 -07:00
test_fpu.c
test_free_pages.c lib/test_free_pages.c: pass a pointer to virt_to_page() 2022-07-17 17:14:36 -07:00
test_hash.c
test_hexdump.c treewide: use get_random_u32_inclusive() when possible 2022-11-18 02:18:02 +01:00
test_hmm_uapi.h hmm-tests: add test for migrate_device_range() 2022-10-12 18:51:50 -07:00
test_hmm.c hmm-tests: add test for migrate_device_range() 2022-10-12 18:51:50 -07:00
test_ida.c
test_kmod.c test_kmod: stop kernel-doc warnings 2023-01-25 14:07:21 -08:00
test_kprobes.c test_kprobes: Add recursed kprobe test case 2023-02-21 08:52:42 +09:00
test_linear_ranges.c lib/test_linear_ranges: Use LINEAR_RANGE() 2022-11-16 13:32:32 +00:00
test_list_sort.c treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
test_lockup.c
test_maple_tree.c test_maple_tree: add more testing for mas_empty_area() 2023-03-23 17:18:32 -07:00
test_memcat_p.c
test_meminit.c lib/test_meminit: add checks for the allocation functions 2022-10-12 18:51:49 -07:00
test_min_heap.c treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
test_module.c
test_objagg.c treewide: use get_random_bytes() when possible 2022-10-11 17:42:58 -06:00
test_parman.c
test_printf.c mm, printk: introduce new format %pGt for page_type 2023-03-28 16:20:09 -07:00
test_ref_tracker.c
test_rhashtable.c Networking changes for 6.2. 2022-12-13 15:47:48 -08:00
test_scanf.c
test_sort.c
test_static_key_base.c
test_static_keys.c
test_string.c
test_sysctl.c testing: use the copyleft-next-0.3.1 SPDX tag 2022-11-08 15:44:02 +01:00
test_ubsan.c
test_user_copy.c
test_uuid.c
test_vmalloc.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
test_xarray.c
test-kstrtox.c
test-string_helpers.c lib/test-string_helpers: replace UNESCAPE_ANY by UNESCAPE_ALL_MASK 2023-04-08 13:45:39 -07:00
textsearch.c
timerqueue.c
trace_readwrite.c asm-generic/io: Add _RET_IP_ to MMIO trace for more accurate debug info 2022-11-21 22:02:10 +01:00
ts_bm.c lib/ts_bm.c: remove redundant store to variable consumed after addition 2022-07-17 17:31:39 -07:00
ts_fsm.c
ts_kmp.c
ubsan.c hardening updates for v6.3-rc1 2023-02-21 11:07:23 -08:00
ubsan.h arm64: Support Clang UBSAN trap codes for better reporting 2023-02-08 15:26:58 -08:00
ucmpdi2.c
ucs2_string.c
usercopy.c uaccess: Add speculation barrier to copy_from_user() 2023-02-21 14:45:22 -08:00
uuid.c treewide: use get_random_bytes() when possible 2022-10-11 17:42:58 -06:00
vsprintf.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
win_minmax.c lib/win_minmax: use /* notation for regular comments 2023-01-11 16:14:21 -08:00
xarray.c mm/huge_memory: Fix xarray node memory leak 2022-06-09 16:24:25 -04:00
xxhash.c