Date:   Wed, 8 Mar 2017 11:28:29 +0100
From:   Ingo Molnar <mingo@...nel.org>
To:     Mike Travis <mike.travis@....com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        "H. Peter Anvin" <hpa@...or.com>, Don Zickus <dzickus@...hat.com>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Dimitri Sivanich <dimitri.sivanich@....com>,
        Frank Ramsay <frank.ramsay@....com>,
        Russ Anderson <russ.anderson@....com>,
        Tony Ernst <tony.ernst@....com>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] x86/platform: Add a low priority low frequency NMI call chain


* Mike Travis <mike.travis@....com> wrote:

> 
> 
> On 3/6/2017 11:42 PM, Ingo Molnar wrote:
> > 
> > * Mike Travis <mike.travis@....com> wrote:
> > 
> >> Add a new NMI call chain that is called last after all other NMI handlers
> >> have been checked and did not "handle" the NMI.  This mimics the current
> >> NMI_UNKNOWN call chain except it eliminates the WARNING message about
> >> multiple NMI handlers registering on this call chain.
> >>
> >> This call chain dramatically lowers the NMI call frequency when high
> >> frequency NMI tools are in use, notably the perf tools.  It is required
> >> for NMI handlers that cannot sustain a high NMI call rate without
> >> ramifications to the system operability.
> > 
> > So how about we just turn off that warning instead? I don't remember the last time 
> > it actually _helped_ us find any kernel or hardware bug - and it has caused tons 
> > of problems...
> 
> I can do that, with an even simpler patch...
> 
> > 
> > It's not like we warn about excess regular IRQs either - we either handle them or 
> > at most increase a counter somewhere. We could do the same for NMIs: introduce a 
> > counter somewhere that counts the number of seemingly unhandled NMIs.
> 
> Really "unknown" NMI errors are reported by either the "dazed and
> confused" message or if the panic on unknown nmi is set, then the
> system will panic.  So unknown NMI occurrences are already being
> dealt with.

So I'd even remove the 'dazed and confused' message - has it ever helped us?

If NMIs are generated but not handled properly then developers and users will 
notice it just like when IRQs are lost: either through bad system behavior or via 
weird stats in procfs. The kernel log should not get spammed.

So if you could expose the lost NMI stats via procfs or debugfs then we could 
remove both the warning and the dazed-and-confused spam on the system log.
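
A rough user-space sketch of that counting idea, for illustration only. This is 
not kernel code and not a proposed patch: the names count_unhandled_nmi(), 
show_unhandled_nmis(), SKETCH_NR_CPUS and the "UNH:" row label are all 
hypothetical. It only models "bump a per-CPU counter instead of printing, and 
format the totals on demand when someone reads the stats file":

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical CPU count for the sketch; a real kernel would use per-CPU data. */
#define SKETCH_NR_CPUS 4

/* One counter per CPU; static storage starts out as zero. */
static atomic_ulong unhandled_nmis[SKETCH_NR_CPUS];

/*
 * Would be called on the path where no handler claimed the NMI.
 * It only increments a counter and never logs anything.
 */
static void count_unhandled_nmi(int cpu)
{
	atomic_fetch_add_explicit(&unhandled_nmis[cpu], 1, memory_order_relaxed);
}

/*
 * Format the counters as a single stats row, e.g. "UNH: 2 0 1 0".
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int show_unhandled_nmis(char *buf, size_t len)
{
	size_t off;
	int n;

	n = snprintf(buf, len, "UNH:");
	if (n < 0 || (size_t)n >= len)
		return -1;
	off = (size_t)n;

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		unsigned long v = atomic_load_explicit(&unhandled_nmis[cpu],
						       memory_order_relaxed);

		n = snprintf(buf + off, len - off, " %lu", v);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```

The cost of an unclaimed NMI in this model is a single relaxed increment, and 
the text is only produced when the stats are read, so nothing ends up in the 
system log.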

This should make perf all around more usable on UV systems, right?

Thanks,

	Ingo
