twx-linux/include
Preeti U Murthy 345527b1ed clockevents: Fix cpu_down() race for hrtimer based broadcasting
It was found when doing a hotplug stress test on POWER, that the
machine either hit softlockups or rcu_sched stall warnings.  The
issue was traced to commit:

  7cba160ad789 ("powernv/cpuidle: Redesign idle states management")

which exposed the cpu_down() race with hrtimer based broadcast mode:

  5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")

The race is the following:

Assume CPU1 is the CPU which holds the hrtimer broadcasting duty
before it is taken down.

	CPU0					CPU1

	cpu_down()				take_cpu_down()
						disable_interrupts()

	cpu_die()

	while (CPU1 != CPU_DEAD) {
		msleep(100);
		switch_to_idle();
		stop_cpu_timer();
		schedule_broadcast();
	}

	tick_cleanup_cpu_dead()
		take_over_broadcast()

So after CPU1 disabled interrupts it cannot handle the broadcast
hrtimer anymore, so CPU0 will be stuck forever.

Fix this by explicitly taking over broadcast duty before cpu_die().

This is a temporary workaround. What we really want is a callback
in the clockevent device which allows us to do that from the dying
CPU by pushing the hrtimer onto a different cpu. That might involve
an IPI and is definitely more complex than this immediate fix.

Changelog was picked up from:

    https://lkml.org/lkml/2015/2/16/213

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: nicolas.pitre@linaro.org
Cc: peterz@infradead.org
Cc: rjw@rjwysocki.net
Fixes: http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html
Link: http://lkml.kernel.org/r/20150330092410.24979.59887.stgit@preeti.in.ibm.com
[ Merged it to the latest timer tree, renamed the callback, tidied up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 14:25:39 +02:00
..
acpi Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2015-02-19 11:28:36 -08:00
asm-generic OK, this has the big virtio 1.0 implementation, as specified by OASIS. 2015-02-18 09:24:01 -08:00
clocksource
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2015-02-14 09:47:01 -08:00
drm drm/ttm: device address space != CPU address space 2015-03-05 09:04:39 +10:00
dt-bindings ARM: dts: am43xx: fix SLEWCTRL_FAST pinctrl binding 2015-03-06 09:21:03 -08:00
keys
kvm arm/arm64: KVM: Keep elrsr/aisr in sync with software model 2015-03-14 13:42:07 +01:00
linux clockevents: Fix cpu_down() race for hrtimer based broadcasting 2015-04-02 14:25:39 +02:00
math-emu
media
memory
misc
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2015-03-22 16:57:07 -04:00
pcmcia
ras
rdma
rxrpc
scsi Merge branch 'for-3.20/core' of git://git.kernel.dk/linux-block 2015-02-12 14:13:23 -08:00
soc pm: at91: Workaround DDRSDRC self-refresh bug with LPDDR1 memories. 2015-03-03 19:43:59 +01:00
sound
target target: do not reject FUA CDBs when write cache is enabled but emulate_write_cache is 0 2015-03-19 23:26:46 -07:00
trace regmap: introduce regmap_name to fix syscon regmap trace events 2015-03-19 20:04:55 +00:00
uapi Not entirely surprising: the ongoing QEMU work on virtio 1.0 has revealed 2015-03-17 10:36:01 -07:00
video OMAPDSS: fix regression with display sysfs files 2015-02-26 10:23:15 +02:00
xen xen: Remove trailing semicolon from xenbus_register_frontend() definition 2015-03-02 10:38:59 +00:00
Kbuild