Message-ID: <CA+55aFy2vKrXKo8Q=UU7AB5FujqS83Cb5E1gSjMFPOoom1X6sA@mail.gmail.com>
Date:	Wed, 19 Nov 2014 18:42:24 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	Peter Zijlstra <peterz@...radead.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Don Zickus <dzickus@...hat.com>, Dave Jones <davej@...hat.com>,
	"the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: frequent lockups in 3.18rc4

On Wed, Nov 19, 2014 at 5:16 PM, Andy Lutomirski <luto@...capital.net> wrote:
>
> And you were calling me crazy? :)

Hey, I'm crazy like a fox.

> We could be restarting just about anything if that happens. Except
> that if we double-faulted on a trap gate entry instead of an interrupt
> gate entry, then we can't restart, and, unless we can somehow decode
> the error code usefully (it's woefully undocumented), int 0x80 and
> int3 might be impossible to handle correctly if it double-faults.  And
> please don't suggest moving int 0x80 to an IST stack :)

No, no.  So tell me if this won't work:

 - when forking a new process, make sure we allocate the vmalloc stack
*before* we copy the vm

 - this should guarantee that every new process will at least have its
*own* stack always in its page tables, since vmalloc always fills in
the page tables of the thread doing the vmalloc.
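
To make the ordering argument above concrete, here is a toy user-space model in C. It is only a sketch, not kernel code: `struct toy_mm`, `toy_vmalloc_stack()`, `toy_copy_mm()` and the two `toy_fork_*` helpers are hypothetical names invented for illustration, and the "page table" is just one present bit per page of an 8-page region. The model assumes exactly what the text states: an allocation is entered only into the page tables of the caller, and copying the vm hands the child whatever entries the parent has at that moment.

```c
#include <stdbool.h>
#include <string.h>

#define NPAGES 8  /* toy vmalloc region: 8 pages (arbitrary) */

/* Toy "page table": one present bit per page of the region. */
struct toy_mm {
    bool present[NPAGES];
};

/* Assumption taken from the text: the allocation is entered only
 * into the page tables of the thread doing the allocation. */
static void toy_vmalloc_stack(struct toy_mm *current_mm, int page)
{
    current_mm->present[page] = true;
}

/* Toy "copy the vm": the child inherits the parent's current entries. */
static void toy_copy_mm(struct toy_mm *child, const struct toy_mm *parent)
{
    memcpy(child, parent, sizeof(*child));
}

/* Proposed ordering: allocate the stack first, then copy the vm.
 * Returns whether the child sees its own stack as mapped. */
static bool toy_fork_stack_first(struct toy_mm *parent, struct toy_mm *child,
                                 int stack_page)
{
    toy_vmalloc_stack(parent, stack_page);
    toy_copy_mm(child, parent);
    return child->present[stack_page];
}

/* Opposite ordering, for contrast: copy first, allocate afterwards. */
static bool toy_fork_copy_first(struct toy_mm *parent, struct toy_mm *child,
                                int stack_page)
{
    toy_copy_mm(child, parent);
    toy_vmalloc_stack(parent, stack_page);
    return child->present[stack_page];
}
```

In this model `toy_fork_stack_first()` reports the child's stack as present, while `toy_fork_copy_first()` does not, which is the whole point of allocating before the copy.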

HOWEVER, that leaves the task switch *to* that process, and making
sure that the stack pointer is ok in between the "switch %rsp" and
"switch %cr3".

So then we make the rule be: switch %cr3 *before* switching %rsp, and
only in between those places can we get in trouble. Yes/no?

And that small section is all with interrupts disabled, and nothing
should take an exception. The C code might take a double fault on a
regular access to the old stack (the *new* stack is guaranteed to be
mapped, but the old stack is not), but that should be very similar to
what we already do with "iret". So we can just fill in the page tables
and return.

For safety, add a percpu counter that is cleared before the %cr3
setting, to make sure that we only do a *single* double-fault, but it
really sounds pretty safe. No?
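
The switch ordering and the counter can be sketched as another toy user-space model in C. Again, this is an illustration under stated assumptions, not the kernel's context-switch code: `struct toy_cpu`, `toy_touch_stack()` and `toy_switch_to()` are hypothetical names, `fixup_count` stands in for the percpu counter, and a plain function call stands in for the double-fault handler that fills in the missing page table entry and returns.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 8  /* toy vmalloc region: 8 pages (arbitrary) */

/* Toy "page table": one present bit per page of the region. */
struct toy_mm {
    bool present[NPAGES];
};

struct toy_cpu {
    struct toy_mm *cr3;  /* currently active page tables */
    int rsp;             /* page index holding the current stack */
    int fixup_count;     /* stands in for the percpu counter */
};

/* Stand-in for a stack access plus the double-fault fixup: if the
 * stack page is missing, fill in the entry and carry on. The assert
 * models the rule that only a single fixup is allowed per switch. */
static void toy_touch_stack(struct toy_cpu *cpu)
{
    if (!cpu->cr3->present[cpu->rsp]) {
        cpu->fixup_count++;
        assert(cpu->fixup_count <= 1);
        cpu->cr3->present[cpu->rsp] = true;
    }
}

/* Switch in the proposed order; returns how many fixups were needed. */
static int toy_switch_to(struct toy_cpu *cpu, struct toy_mm *next_mm,
                         int next_stack)
{
    cpu->fixup_count = 0;   /* cleared before the %cr3 write */
    cpu->cr3 = next_mm;     /* switch %cr3 first */
    toy_touch_stack(cpu);   /* window: still running on the old stack */
    cpu->rsp = next_stack;  /* then switch %rsp */
    toy_touch_stack(cpu);   /* new stack is expected to be mapped already */
    return cpu->fixup_count;
}
```

With two address spaces that each map only their own stack, the first switch in each direction needs one fixup for the old stack, and later switches need none because the entry has been filled in. The new stack never needs a fixup as long as it was mapped at fork time.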

The only deadly thing would be NMI, but that's an IST anyway, so not
an issue. No other traps should be able to happen except the double
page table miss.

But hey, maybe I'm not crazy like a fox. Maybe I'm just plain crazy,
and I missed something else.

And no, I don't think the above is necessarily a *good* idea. But it
doesn't seem really overly complicated either.

                      Linus