Message-ID: <20090624105223.GO6760@one.firstfloor.org>
Date:	Wed, 24 Jun 2009 12:52:23 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	David Miller <davem@...emloft.net>
Cc:	andi@...stfloor.org, linux-kernel@...r.kernel.org,
	sparclinux@...r.kernel.org
Subject: Re: NMI watchdog + NOHZ question

On Wed, Jun 24, 2009 at 03:32:33AM -0700, David Miller wrote:
> From: Andi Kleen <andi@...stfloor.org>
> Date: Wed, 24 Jun 2009 12:23:25 +0200
> 
> >> And similarly to sparc64, if that 5+ second qla2xxx interrupt
> >> sequence happens after the tick_nohz_stop_sched_tick() call
> >> we can run into the same situation.
> > 
> > Yes it would be probably safer to do the tick disabling with
> > interrupts off already.
> 
> That only makes sense if you're really putting the cpu to sleep
> until an interrupt or similar happens.

That is what the idle loop is supposed to do, isn't it?

> > These days NMI watchdog is not used much on x86 anymore because it's 
> > default off, so probably people never noticed that.
> 
> I really didn't want to provide the feature that way on sparc64 which
> is why I made it on by default.  It would be interesting to reconsider
> x86's default, perhaps even only on a trial basis in -next.

The reason it was turned off is that there are a few systems (e.g.
laptops from a particular vendor) which don't handle NMIs correctly
in the platform. When the NMI happens while SMI is active
they hang. Also there were a few other strange problems
on other systems that went away when it was disabled.

One way to handle all that would be to have a big NMI white/black
list for specific systems. That would be useful because there are
a few cases where NMIs are really useful: one example right now
is panic, which currently cannot stop other CPUs that have
interrupts disabled.

But creating and maintaining such a list would be a lot of 
work (at least initially), and so far nobody was interested
enough to do that.
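
To make the idea concrete, here is a minimal sketch of what a blacklist
lookup could look like. Everything in it is hypothetical: the struct
nmi_quirk type, the nmi_watchdog_default_on() helper and the vendor and
product strings are placeholders for illustration, not existing kernel
code or real hardware. In the kernel itself such a table would more
likely be built on the existing DMI matching (dmi_check_system()) rather
than on plain string compares.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry describing a platform that mishandles NMIs. */
struct nmi_quirk {
	const char *vendor;	/* exact match on the system vendor string */
	const char *product;	/* substring of the product name, NULL = any */
};

/* Placeholder entries for illustration only, not real systems. */
static const struct nmi_quirk nmi_blacklist[] = {
	{ "ExampleVendor", "Laptop 1000" },
	{ "OtherVendor",   NULL },
};

/* Return true if the NMI watchdog may be enabled by default here. */
static bool nmi_watchdog_default_on(const char *vendor, const char *product)
{
	size_t i;

	for (i = 0; i < sizeof(nmi_blacklist) / sizeof(nmi_blacklist[0]); i++) {
		const struct nmi_quirk *q = &nmi_blacklist[i];

		if (strcmp(q->vendor, vendor) != 0)
			continue;
		/* A NULL product blacklists every system from that vendor. */
		if (q->product == NULL || strstr(product, q->product) != NULL)
			return false;
	}
	return true;
}
```

The lookup itself is trivial. The cost is in collecting and maintaining
the entries.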

When you don't have as many different platforms and vendors,
things are a lot easier.

> 
> It's so useful, and in the short time sparc64 has had this NMI code I
> can count at least 8 bugs I've fixed only because it was on all the
> time.

Yes, when it was still on it also found bugs. On the other hand, once
it is on by default the number of new bugs you find with it goes
down quite fast.

-Andi

-- 
ak@...ux.intel.com -- Speaking for myself only.