Message-ID: <20130912172731.GR3966@linux.vnet.ibm.com>
Date:	Thu, 12 Sep 2013 10:27:31 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Mike Travis <travis@....com>, Paul Mackerras <paulus@...ba.org>,
	Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	Jason Wessel <jason.wessel@...driver.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Dimitri Sivanich <sivanich@....com>,
	Hedi Berriche <hedi@....com>, x86@...nel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler

On Tue, Sep 10, 2013 at 11:03:49AM +0200, Peter Zijlstra wrote:
> On Mon, Sep 09, 2013 at 10:07:03AM -0700, Mike Travis wrote:
> > On 9/9/2013 5:43 AM, Peter Zijlstra wrote:
> > > On Thu, Sep 05, 2013 at 05:50:41PM -0500, Mike Travis wrote:
> > >> For performance reasons, the NMI handler may be disabled to lessen the
> > >> performance impact caused by the multiple perf tools running concurrently.
> > >> If the system nmi command is issued when the UV NMI handler is disabled,
> > >> the "Dazed and Confused" messages occur for all cpus.  The NMI handler is
> > >> disabled by setting the nmi disabled variable to '1'.  Setting it back to
> > >> '0' will re-enable the NMI handler.
> > > 
> > > I'm not entirely sure why this is still needed now that you've moved all
> > > really expensive bits into the UNKNOWN handler.
> > > 
> > 
> > Yes, it could be considered optional.  My primary use was to isolate
> > new bugs I found to see if my NMI changes were causing them.  But it
> > appears that they are not since the problems occur with or without
> > using the NMI entry into KDB.  So it can be safely removed.
> 
> OK, as a debug option it might make sense, but removing it is (of course)
> fine with me ;-)
> 
> > (The basic problem is that if you hang out in KDB too long the machine
> > locks up.  
> 
> Yeah, known issue. Not much you can do about it either I suspect. The
> system generally isn't built for things like that.
> 
> > Other problems like the rcu stall detector does not have a
> > means to be "touched" like the nmi_watchdog_timer so it fires off a
> > few to many, many messages.  
> 
> That however might be easily cured if you ask Paul nicely ;-)

RCU's grace-period mechanism is supposed to be what touches it.  ;-)

But what is it that you are looking for?  If you want to silence it
completely, the rcu_cpu_stall_suppress boot/sysfs parameter is what
you want to use.

> > Another, any network connections will time
> > out if you are in KDB more than say 20 or 30 seconds.)

Ah, you are looking for RCU to refrain from complaining about grace
periods that have been delayed by breakpoints in the kernel?  Is there
some way that RCU can learn that a breakpoint has happened?  If so,
this should not be hard.

If not, I must fall back on the rcu_cpu_stall_suppress that I mentioned
earlier.

> > One other problem is with the perf tool.  It seems running more than
> > about 2 or 3 perf top instances on a medium (1k cpu threads) sized
> > system, they start behaving badly with a bunch of NMI stackdumps
> > appearing on the console.  Eventually the system becomes unusable.
> 
> Yuck.. I haven't seen anything like that on the 'tiny' systems I have :/

Indeed, with that definition of "medium", large must be truly impressive!

							Thanx, Paul

> > On a large system (4k), the perf tools get an error message (sorry
> > don't have it handy at the moment) that basically implies that the
> > perf config option is not set.  Again, I wanted to remove the new
> > NMI handler to ensure that it wasn't doing something weird, and
> > it wasn't.
> 
> Cute.. 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
