[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <874njh7nd5.fsf@nemi.mork.no>
Date: Wed, 19 Dec 2012 22:17:42 +0100
From: Bjørn Mork <bjorn@...k.no>
To: Don Zickus <dzickus@...hat.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org,
Norbert Warmuth <nwarmuth@...nline.de>,
Joseph Salisbury <joseph.salisbury@...onical.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RESEND][PATCH v3] watchdog: Fix disable/enable regression
Don Zickus <dzickus@...hat.com> writes:
> On Wed, Dec 19, 2012 at 08:51:31PM +0100, Bjørn Mork wrote:
>> commit 8d451690 ("watchdog: Fix CPU hotplug regression") cause
>> an oops or hard lockup when doing
>>
>> echo 0 > /proc/sys/kernel/nmi_watchdog
>> echo 1 > /proc/sys/kernel/nmi_watchdog
>>
>> and the kernel is booted with nmi_watchdog=1 (default)
>>
>> Running laptop-mode-tools and disconnecting/connecting AC power
>> will cause this to trigger, making it a common failure scenario
>> on laptops.
>>
>> Instead of bailing out of watchdog_disable() when !watchdog_enabled
>> we can initialize the hrtimer regardless of watchdog_enabled status.
>> This makes it safe to call watchdog_disable() in the nmi_watchdog=0
>> case, without the negative effect on the enabled => disabled =>
>> enabled case.
>>
>> All these tests pass with this patch:
>> - nmi_watchdog=1
>> echo 0 > /proc/sys/kernel/nmi_watchdog
>> echo 1 > /proc/sys/kernel/nmi_watchdog
>>
>> - nmi_watchdog=0
>> echo 0 > /sys/devices/system/cpu/cpu1/online
>>
>> - nmi_watchdog=0
>> echo mem > /sys/power/state
>
> What about the opposite cases?
> nmi_watchdog=1
> echo 1 > /sys/devices/system/cpu/cpu1/online
I don't see why not. But verifying it would be nice. I thought that it
would be a simple thing to test using qemu-kvm, but it seems that the
CPU hotplugging support there isn't quite ready. The guest just dies
with "Assertion `bus->allow_hotplug' failed."
I'll go digging for alternatives, but if anyone else could verify this
then I'd appreciate it.
> Are we going to leak memory by re-initing hrtimer? Or is that caught
> somewhere?
There are no allocations in hrtimer_init() AFAICS? It just initializes
the preallocated hrtimer struct.
> Other than that, the patch seems reasonable to me. Basically just forcing
> the init of hrtimer so there is something to cancel on the disable side.
> Then again, I don't fully understand the original problem.
I believe the original problem was calling hrtimer_cancel on an
uninitialized hrtimer. That is likely to blow up here:
static inline
void unlock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags)
{
raw_spin_unlock_irqrestore(&timer->base->cpu_base->lock, *flags);
}
Note that the lock_hrtimer_base() function protects access to
timer->base with a NULL test. That should probably be done in unlock as
well, for symmetry? But this is another issue and not at all urgent.
Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists