linux-kernel - Re: [PATCH] x86 NMI: Be smarter about invoking panic() inside NMI handler.

Message-ID: <20120327160601.GA19273@redhat.com>
Date:	Tue, 27 Mar 2012 12:06:01 -0400
From:	Don Zickus <dzickus@...hat.com>
To:	"Andrei E. Warkentin" <andrey.warkentin@...il.com>
Cc:	linux-kernel@...r.kernel.org, kgdb-bugreport@...ts.sourceforge.net,
	jason.wessel@...driver.com
Subject: Re: [PATCH] x86 NMI: Be smarter about invoking panic() inside NMI
 handler.

On Tue, Mar 20, 2012 at 01:57:41PM -0400, Andrei E. Warkentin wrote:
> Hi,
> 
> 2012/3/1 Andrei Warkentin <andrey.warkentin@...il.com>:
> > If two (or more) unknown NMIs arrive on different CPUs, there
> > is a large chance both CPUs will wind up inside panic(). This
> > is fine, unless you want to enter KDB - KDB cannot round up
> > all CPUs, because some of them are stuck inside
> > panic_smp_self_stop with NMI latched. This is
> > easy to replicate with QEMU. Boot with -smp 4 and
> > send NMI using the monitor.
> >
> > Solution for this - attempt to enter panic() from NMI
> > handler. If panic() is already active in the system,
> > just exit out of the NMI handler. This lets KDB round
> > up CPUs.
> >
> > Signed-off-by: Andrei Warkentin <andrey.warkentin@...il.com>
> > ---
> 
> Any feedback on this? Who are the right maintainers to bug about this?

Hmm, if try_panic fails, the CPU returns from the NMI handler and keeps
executing code.  This might further corrupt an already broken system.  So
I don't think this patch will work as is.
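
To make the concern concrete, here is a minimal userspace sketch. It is not
the code from the patch (which is not quoted in this message); it assumes
try_panic behaves like a test-and-set on a single "panic in progress" flag.
The names model_try_panic, model_unknown_nmi and panic_owner are
hypothetical and exist only for this illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: records which CPU owns the panic. */
static atomic_int panic_owner = -1;   /* -1 means nobody is panicking yet */

/* Assumed semantics: only the first caller claims the panic path. */
static bool model_try_panic(int cpu)
{
    int expected = -1;
    return atomic_compare_exchange_strong(&panic_owner, &expected, cpu);
}

/*
 * Models the tail of the unknown-NMI handler.
 * Returns 1 if this CPU stops inside panic, 0 if it returns from the NMI
 * and keeps executing code on a system that is already known to be broken.
 */
static int model_unknown_nmi(int cpu)
{
    if (model_try_panic(cpu))
        return 1;
    return 0;   /* the losing CPU carries on running */
}
```

In this model the first CPU to take the NMI stops, and every later CPU
falls through the 0 branch, which is exactly the case described above.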

Perhaps, instead of panicking in NMI context, we could use irq_work and
panic in interrupt context.  We still get the system to stop (though it
might still execute some interrupts), and the panic happens outside NMI
context.
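
The deferral idea can be sketched in userspace as follows. This is only a
model under stated assumptions: in the kernel the request would be queued
with irq_work_queue() from <linux/irq_work.h>, whereas here a plain atomic
flag stands in for the queued work and a counter stands in for panic().
All model_* names and the two variables are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool panic_pending;   /* set by the modeled NMI handler */
static int panics_run;              /* stand-in for calling panic() */

/* Modeled NMI handler: does no heavy work, it only records the request. */
static void model_nmi_handler(void)
{
    atomic_store(&panic_pending, true);
}

/*
 * Modeled interrupt-context callback, run some time after the NMI returns.
 * Returns true if a pending request was found and the panic stand-in ran.
 */
static bool model_irq_work_run(void)
{
    if (!atomic_exchange(&panic_pending, false))
        return false;   /* nothing was queued */
    panics_run++;       /* the real code would call panic() here */
    return true;
}
```

Two NMIs arriving before the callback runs collapse into one pending
request, so the panic stand-in runs once and outside the modeled NMI path.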

However, you will still run into a similar problem in the panic/reboot
case, where we shut down all the remote CPUs and leave them sitting in a
similar cpu_relax() loop in NMI context while the panicking CPU cleans
things up.

Cheers,
Don