Date:	Thu, 24 Jun 2010 16:01:43 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Andi Kleen <andi@...stfloor.org>, Borislav Petkov <bp@...64.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Huang Ying <ying.huang@...el.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Borislav Petkov <petkovbb@...glemail.com>,
	linux-kernel@...r.kernel.org, mauro@...e.hu
Subject: Re: [RFC][PATCH] irq_work

> Please, as Peter and Boris asked you already, quote a concrete, specific 
> example:

It was already in my answer to Peter.

> 
>   'Specific event X occurs, kernel wants/needs to do Y. This cannot be done
>    via the suggested method due to Z.'
> 
> Your generic arguments look wrong (to the extent they are specified) and it
> makes it much easier and faster to address your points if you don't blur them
> by vagaries.

It's one of the fundamental properties of recoverable errors.

An error happens.
A machine check, NMI, or other exception is raised.
	That exception runs on the exception stack.
	The error is not fatal, but recoverable.
You then want to kill a process, call hwpoison, or take some other
	recovery action. These generally have to sleep to do anything
	interesting.
You cannot sleep on the exception stack, so you push the work to
another context.
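
The hand-off described above can be sketched in user-space C. This is only
an illustration under stated assumptions, not the irq_work patch or any
kernel API: struct recovery_req, recovery_post() and recovery_take() are
hypothetical names, and a single preallocated slot stands in for whatever
storage the exception handler would really use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical recovery request; the fields are illustrative only. */
struct recovery_req {
	unsigned long bad_addr;	/* address the error was reported for */
	int victim_pid;		/* process the recovery action targets */
};

enum slot_state { SLOT_EMPTY, SLOT_FILLING, SLOT_READY };

/* Preallocated, so the handler never has to allocate memory. */
static struct recovery_req slot;
static atomic_int slot_state = SLOT_EMPTY;

/*
 * Stand-in for the exception handler. It must not sleep, so it only
 * claims the slot, records the request and publishes it.
 * Returns false if an earlier request is still pending; the caller
 * has to escalate in that case rather than silently drop the error.
 */
bool recovery_post(unsigned long bad_addr, int victim_pid)
{
	int expected = SLOT_EMPTY;

	if (!atomic_compare_exchange_strong(&slot_state, &expected, SLOT_FILLING))
		return false;

	slot.bad_addr = bad_addr;
	slot.victim_pid = victim_pid;
	atomic_store_explicit(&slot_state, SLOT_READY, memory_order_release);
	return true;
}

/*
 * Stand-in for the other context, where sleeping is allowed.
 * Copies the request out and frees the slot; the caller then performs
 * the actual (possibly sleeping) recovery using the copy.
 */
bool recovery_take(struct recovery_req *out)
{
	assert(out != NULL);

	if (atomic_load_explicit(&slot_state, memory_order_acquire) != SLOT_READY)
		return false;

	*out = slot;
	atomic_store_explicit(&slot_state, SLOT_EMPTY, memory_order_release);
	return true;
}
```

The handler side does a bounded amount of work and never blocks. Everything
that might sleep happens after recovery_take() returns in the other context.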

Now just because an error is recoverable doesn't mean it's not critical
(I think that was the mistake Boris made). If you don't do something
(like killing or recovery) you could end up in a loop or consume
corrupted data or something else bad. 

So the error has to have a fail-safe path from detection to handling.

That's quite different from logging or performance counting etc.
where dropping events on overload is normal and expected.

Normally that can only be done by using dedicated resources.
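
To make the contrast concrete, here is a small single-threaded sketch in C.
All names (log_event, post_error, take_error, NR_SOURCES) are made up for
illustration and do not come from the patch under discussion. The shared
ring drops events when it is full, which is fine for logging. The error
path reserves one slot per source, so a flood of log events can never use
up the space a recoverable error needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOG_SLOTS  4	/* shared ring, allowed to overflow */
#define NR_SOURCES 2	/* hypothetical number of error sources, e.g. CPUs */

struct event {
	int source;
	unsigned long data;
};

/* Lossy path: shared ring; on overflow the new event is dropped and counted. */
static struct event log_ring[LOG_SLOTS];
static size_t log_head, log_tail;	/* next write / next read position */
static unsigned long log_dropped;

bool log_event(struct event ev)
{
	if (log_head - log_tail == LOG_SLOTS) {
		log_dropped++;		/* normal and expected under overload */
		return false;
	}
	log_ring[log_head % LOG_SLOTS] = ev;
	log_head++;
	return true;
}

/* Fail-safe path: one dedicated slot per source, never shared with logging. */
static struct event err_slot[NR_SOURCES];
static bool err_busy[NR_SOURCES];

/*
 * Returns false only if this source already has an unhandled error.
 * The caller must then escalate (for example treat it as fatal),
 * because dropping is not an option on this path.
 */
bool post_error(struct event ev)
{
	assert(ev.source >= 0 && ev.source < NR_SOURCES);

	if (err_busy[ev.source])
		return false;
	err_slot[ev.source] = ev;
	err_busy[ev.source] = true;
	return true;
}

/* Handling side: fetch the pending error for a source and free its slot. */
bool take_error(int source, struct event *out)
{
	assert(source >= 0 && source < NR_SOURCES);
	assert(out != NULL);

	if (!err_busy[source])
		return false;
	*out = err_slot[source];
	err_busy[source] = false;
	return true;
}
```

The per-source slot costs memory that is idle almost all of the time. That
is the price of a path that cannot be starved by other traffic.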

-Andi

-- 
ak@...ux.intel.com -- Speaking for myself only.