linux-kernel - Re: [RESEND][PATCH v3] watchdog: Fix disable/enable regression

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <874njh7nd5.fsf@nemi.mork.no>
Date:	Wed, 19 Dec 2012 22:17:42 +0100
From:	Bjørn Mork <bjorn@...k.no>
To:	Don Zickus <dzickus@...hat.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org,
	Norbert Warmuth <nwarmuth@...nline.de>,
	Joseph Salisbury <joseph.salisbury@...onical.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RESEND][PATCH v3] watchdog: Fix disable/enable regression

Don Zickus <dzickus@...hat.com> writes:
> On Wed, Dec 19, 2012 at 08:51:31PM +0100, Bjørn Mork wrote:
>> commit 8d451690 ("watchdog: Fix CPU hotplug regression") cause
>> an oops or hard lockup when doing
>> 
>>  echo 0 > /proc/sys/kernel/nmi_watchdog
>>  echo 1 > /proc/sys/kernel/nmi_watchdog
>> 
>> and the kernel is booted with nmi_watchdog=1 (default)
>> 
>> Running laptop-mode-tools and disconnecting/connecting AC power
>> will cause this to trigger, making it a common failure scenario
>> on laptops.
>> 
>> Instead of bailing out of watchdog_disable() when !watchdog_enabled
>> we can initialize the hrtimer regardless of watchdog_enabled status.
>> This makes it safe to call watchdog_disable() in the nmi_watchdog=0
>> case, without the negative effect on the enabled => disabled =>
>> enabled case.
>> 
>> All these tests pass with this patch:
>> - nmi_watchdog=1
>>   echo 0 > /proc/sys/kernel/nmi_watchdog
>>   echo 1 > /proc/sys/kernel/nmi_watchdog
>> 
>> - nmi_watchdog=0
>>   echo 0 > /sys/devices/system/cpu/cpu1/online
>> 
>> - nmi_watchdog=0
>>   echo mem > /sys/power/state
>
> What about the opposite cases?  
> nmi_watchdog=1
> echo 1 > /sys/devices/system/cpu/cpu1/online

I don't see why not. But verifying it would be nice.  I thought that it
would be a simple thing to test using qemu-kvm, but it seems that the
CPU hotplugging support there isn't quite ready. The guest just dies
with "Assertion `bus->allow_hotplug' failed."

I'll go digging for alternatives, but if anyone else could verify this
then I'd appreciate it.

> Are we going to leak memory by re-initing hrtimer?  Or is that caught
> somewhere?

There are no allocations in hrtimer_init() AFAICS? It just initializes
the preallocated hrtimer struct.

> Other than that, the patch seems reasonable to me.  Basically just forcing
> the init of hrtimer so there is something to cancel on the disable side.
> Then again, I don't fully understand the original problem.

I believe the original problem was calling hrtimer_cancel on an
uninitialized hrtimer.  That is likely to blow up here:

static inline
void unlock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags)
{
        raw_spin_unlock_irqrestore(&timer->base->cpu_base->lock, *flags);
}


Note that the lock_hrtimer_base() function protects access to
timer->base with a NULL test.  That should probably be done in unlock as
well, for symmetry?  But this is another issue and not at all urgent.




Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/