lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100804140021.GN3353@redhat.com>
Date:	Wed, 4 Aug 2010 10:00:21 -0400
From:	Don Zickus <dzickus@...hat.com>
To:	Robert Richter <robert.richter@....com>
Cc:	Lin Ming <ming.m.lin@...el.com>, Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <peterz@...radead.org>,
	Cyrill Gorcunov <gorcunov@...il.com>,
	"fweisbec@...il.com" <fweisbec@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Huang, Ying" <ying.huang@...el.com>
Subject: Re: A question of perf NMI handler

On Wed, Aug 04, 2010 at 12:01:16PM +0200, Robert Richter wrote:
> There is no general mechanism for recording the NMI source (except if
> it was external triggered, e.g. by the southbridge). Also, all nmis
> are mapped to NMI vector 2 and therefore there is no way to find out
> the reason by using apic mask registers.

This is no different than a shared interrupt, no?  All the nmi handlers
need to check their own sources to see if they triggered it.  You can't
expect the generic nmi handler to determine this.

> 
> Now, if multiple perfctrs trigger an nmi, it may happen that a handler
> has nothing to do because the counter was already handled by the
> previous one. Thus, it is valid to have unhandled nmis caused by
> perfctrs.
> 
> So, with counters enabled we always have to return stop for *all* nmis
> as we cannot detect that it was an perfctr nmi. Otherwise we could
> trigger an unhandled nmi. To ensure that all other nmi handlers are
> called, the perfctr's nmi handler must have the lowest priority. Then,
> the handler will be the last in the chain.

But the cases this break are, external NMI buttons, broken firmware that
causes SERRs on the PCI bus, and any other general hardware failures.

So what the perf handler does is really unacceptable.  The only reason we
are noticing this now is because I put the nmi_watchdog on top of the perf
subsystem, so it always has a user and will trigger NOTIFY_STOP.  Before,
it never had a registerd user so instead returned NOTIFY_DONE and
everything worked great.

Cheers,
Don
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ