linux-kernel - Re: [RFC][PATCH] irq

Message-ID: <20100624154124.GA6647@aftab>
Date:	Thu, 24 Jun 2010 17:41:24 +0200
From:	Borislav Petkov <bp@...64.org>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	Ingo Molnar <mingo@...e.hu>, Borislav Petkov <bp@...64.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Huang Ying <ying.huang@...el.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Borislav Petkov <petkovbb@...glemail.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"mauro@...e.hu" <mauro@...e.hu>
Subject: Re: [RFC][PATCH] irq_work

From: Andi Kleen <andi@...stfloor.org>
Date: Thu, Jun 24, 2010 at 10:01:43AM -0400

> > Please, as Peter and Boris asked you already, quote a concrete, specific 
> > example:
> 
> It was already in my answer to Peter.
> 
> > 
> >   'Specific event X occurs, kernel wants/needs to do Y. This cannot be done
> >    via the suggested method due to Z.'
> > 
> > Your generic arguments look wrong (to the extent they are specified) and it 
> > makes it much easier and faster to address your points if you dont blur them 
> > by vagaries.
> 
> It's one of the fundamental properties of recoverable errors.
> 
> Error happens.
> Machine check or NMI or other exception happens. 
> 	That exception runs on the exception stack
> 	The error is not fatal, but recoverable.
> For example you want to kill a process or call hwpoison or do some other
> 	recovery action. These generally have to sleep to do anything
> 	interesting.
> You cannot do the sleeping on the exception stack, so you push it to
> another context.
> 
> Now just because an error is recoverable doesn't mean it's not critical
> (I think that was the mistake Boris made).

It wasn't a mistake - I was simply trying to lure you into giving a more
concrete example so that we all land on the same page and we know what
the heck you/we/all are talking about.

> If you don't do something
> (like killing or recovery) you could end up in a loop or consume
> corrupted data or something else bad. 
> 
> So the error has to have a fail safe path from detection to handling.

So we are talking about a more involved and "could-sleep" error
recovery.

> That's quite different from logging or performance counting etc.
> where dropping events on overload is normal and expected.
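
To make the "fail safe path" requirement concrete, here is a minimal
userspace sketch of a queue that cannot drop an entry: the list node is
embedded in a caller-owned structure, so queueing needs no allocation and
no lock. All names (struct deferred_work, deferred_queue, deferred_run) are
hypothetical, and C11 atomics stand in for kernel primitives; this is an
illustration under those assumptions, not the code from the patch under
discussion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; a userspace model, not kernel code. */
struct deferred_work {
	struct deferred_work *next;
	void (*func)(struct deferred_work *);
	atomic_bool queued;	/* true while the entry sits on the list */
};

static _Atomic(struct deferred_work *) pending_head;

/*
 * Queue from "exception context": no allocation, no locks, no sleeping.
 * Returns false only when the entry is already queued, in which case the
 * earlier request is still pending and nothing has been lost.
 */
static bool deferred_queue(struct deferred_work *w)
{
	assert(w->func != NULL);

	if (atomic_exchange(&w->queued, true))
		return false;

	struct deferred_work *old = atomic_load(&pending_head);
	do {
		w->next = old;
	} while (!atomic_compare_exchange_weak(&pending_head, &old, w));
	return true;
}

/* Drain from a context that may sleep; returns the number of entries run. */
static int deferred_run(void)
{
	struct deferred_work *w =
		atomic_exchange(&pending_head, (struct deferred_work *)NULL);
	int n = 0;

	while (w) {
		/* Read ->next before clearing ->queued allows a re-queue. */
		struct deferred_work *next = w->next;

		atomic_store(&w->queued, false);
		w->func(w);
		w = next;
		n++;
	}
	return n;
}

/* Example callback for trying the sketch out: counts handled entries. */
static int handled;

static void count_handled(struct deferred_work *w)
{
	(void)w;
	handled++;
}
```

A logging or counting path could instead use a fixed-size ring and discard
on overflow; the point of the embedded node is that the producer side has
no failure mode at all.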

So I went back and reread the whole thread, and correct me if I'm
wrong, but the whole run-softirq-after-NMI thing has one use case for
now: "could-sleep" error handling for MCEs, and _only_ on x86. So you're
changing a bunch of generic and x86 kernel code just for error handling.
Hmm, that's a kinda big hammer in my book.

A slimmer solution is a much better way to go, IMHO. I think Peter said
something about irq_exit(), which should be just fine.

But AFAICT an arch-specific solution would be even better, e.g.
if you call into your deferred work helper from paranoid_exit in
<arch/x86/kernel/entry_64.S>. I.e., something like

#ifdef CONFIG_X86_MCE
	testl $_TIF_NEED_POST_NMI,%ebx
	jnz do_post_nmi_work
#endif
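
The same idea as the assembly sketch above, modelled in plain userspace C
so the control flow is easy to follow: the handler only sets a flag, and
the exit path tests it and runs the deferred work. The flag bit and every
function name here are hypothetical stand-ins, and clearing the flag on
the exit path is an assumption of this model; the sketch above only shows
the test and the branch.

```c
#include <stdatomic.h>

/* Hypothetical flag bit, mirroring the _TIF_NEED_POST_NMI idea. */
#define TIF_NEED_POST_NMI_BIT	(1u << 0)

static atomic_uint thread_flags;	/* stands in for the thread-info flags */
static int post_nmi_runs;		/* how often the deferred work has run */

/* "NMI context": do nothing but mark that follow-up work is needed. */
static void nmi_handler_model(void)
{
	atomic_fetch_or(&thread_flags, TIF_NEED_POST_NMI_BIT);
}

/* Stand-in for the do_post_nmi_work target of the branch. */
static void do_post_nmi_work_model(void)
{
	post_nmi_runs++;
}

/*
 * Exit path: test the flag and, if it was set, clear it and run the work
 * once. Clearing here is an assumption made by this model.
 */
static void paranoid_exit_model(void)
{
	unsigned int old =
		atomic_fetch_and(&thread_flags, ~TIF_NEED_POST_NMI_BIT);

	if (old & TIF_NEED_POST_NMI_BIT)
		do_post_nmi_work_model();
}
```

Note that two handler invocations before one exit collapse into a single
run of the work, so whatever the work function does has to scan for all
outstanding errors rather than assume one event per call.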

Or even slimmer, rewrite paranoidzeroentry as an MCE-specific variant
which implements the added functionality. But that wouldn't be extensible
if other entities want post-NMI work later.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632