linux-kernel - Re: [PATCH] i386: improve double fault handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20080728134252.GI5515@elte.hu>
Date:	Mon, 28 Jul 2008 15:42:52 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Jan Beulich <jbeulich@...ell.com>
Cc:	Andi Kleen <andi@...stfloor.org>, tglx@...utronix.de,
	linux-kernel@...r.kernel.org, "H. Peter Anvin" <hpa@...or.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Joerg Roedel <joro@...tes.org>
Subject: Re: [PATCH] i386: improve double fault handling

* Jan Beulich <jbeulich@...ell.com> wrote:

> >>> Ingo Molnar <mingo@...e.hu> 21.07.08 13:05 >>>
> >this still doesnt apply to latest -git. (or tip/master)
> 
> Indeed, tip/master had a __pa -> __phys_addr_const conversion that I 
> now sync-ed the patch with (without another round of testing):
> 
> Make the double fault handler use CPU-specific stacks. Add some 
> abstraction to simplify future change of other exception handlers to 
> go through task gates. Add a new notification of the event through the 
> die notifier chain, also providing some environmental adjustments so 
> that various infrastructural things work independent of the fact that 
> the fault and the callbacks are running on other then the normal 
> kernel stack.
> 
> Signed-Off-By: Jan Beulich <jbeulich@...ell.com>
> Cc: Andi Kleen <andi@...stfloor.org>
> 
> ---
>  arch/x86/kernel/cpu/common.c     |   17 +++++--
>  arch/x86/kernel/doublefault_32.c |   86 ++++++++++++++++++++++++---------------
>  arch/x86/kernel/smpboot.c        |   44 +++++++++++++++++++
>  arch/x86/kernel/traps_32.c       |   51 ++++++++++++++++++++++-
>  drivers/lguest/segments.c        |    3 -
>  include/asm-x86/kdebug.h         |    1 
>  include/asm-x86/processor.h      |    7 ++-
>  include/asm-x86/segment.h        |   15 ++++--
>  include/asm-x86/thread_info_32.h |    9 +++-
>  9 files changed, 187 insertions(+), 46 deletions(-)

I dont know.

All CPUs hitting a double fault simultaneously and corrupting each 
others' kernel stack is a theoretical possibility - but is handling it 
worth the complexity? It appears to me that a lock plus a short stub 
function that takes the lock (with no stack usage) would handle that 
much better.

Also, you seem to be setting things up to turn NMIs and MCEs into task 
gates too, right?

So i'm really uneasy about all this. Breakage in such rarely used code 
gets found very late, and has thus a high risk of losing debug 
information when we need it the most. (i.e. it works in the exact 
_opposite_ way of the intented goal of making things more robust - it 
makes things less robust)

Firstly, 64-bit does not use a task gate for double faults anymore. (but 
uses a separate IST stack for double faults)

Secondly, task gates are really a relic that should not be proliferated. 
Besides the complications in virtualized environments (if more common 
things like Big Real Mode are not supported well in virtual mode what do 
we expect of more esoteric features as task gates?) it does not get 
nearly as much testing on real silicon as other, more mainstream CPU 
features.

Thirdly, NMI based profiling is quite common, so by turning NMIs into 
task gates we'd slow that down quite a lot.

Also, the change to doublefault_fn is quite ugly - that inner block 
should be split out into a separate function.

Plus the notifier - why do we care about that? It's not like we can 
sanely kexec into a safe kernel from double faulting kernels in most 
cases. In real cases where i've seen double faults it was due to us 
corrupting kernel pagetables - kexec has no chance there. To recover 
from that we'd have to set up the TSS with a safe(r) cr3 as well - but 
your patch leaves _that_ untouched. (nor do we want to waste extra 
unswappable memory on such remote possibilities i think)

        Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/