linux-kernel - Re: [PATCH 05/11] mm: Introduce arch_pgd_init


Date:	Tue, 22 Sep 2015 11:37:18 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Ingo Molnar <mingo@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Brian Gerst <brgerst@...il.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Borislav Petkov <bp@...en8.de>,
	"H. Peter Anvin" <hpa@...or.com>, Oleg Nesterov <oleg@...hat.com>,
	Waiman Long <waiman.long@...com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 05/11] mm: Introduce arch_pgd_init_late()

On Tue, Sep 22, 2015 at 11:26 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Tue, Sep 22, 2015 at 11:00 AM, Andy Lutomirski <luto@...capital.net> wrote:
>>
>> I really really hate the vmalloc fault thing.  It seems to work,
>> rather to my surprise.  It doesn't *deserve* to work, because of
>> things like the percpu TSS accesses in the entry code that happen
>> without a valid stack.
>
> The thing is, I think you're misguided in your hatred.
>
> The reason I say that is because I think we should just embrace the
> fact that faults can and do happen in the kernel in very inconvenient
> places, and not just in code we "control".
>
> Even if you get rid of the vmalloc fault, you'll still have debug
> faults, and you'll still have NMI's and horrible crazy machine check
> faults.
>
> I actually think the vmalloc fault is a good way to just let people
> know "pretty much anything can trap, deal with it".
>
> And I think trying to eliminate them is the wrong thing, because it
> forces us to be so damn synchronized. This whole patch-series is a
> prime example of why that is a bad, bad thing. We want to have _less_
> synchronization.

Sure, pretty much anything can trap, but we need to do *something* to
deal with it.

Debug faults can't happen with bad stacks any more (now that we honor
the kprobe blacklist), which means that debug faults could, in theory,
move off the IST stack.  The SYSENTER + debug mess doesn't have any
stack problem.

NMIs and MCEs are special, and we deal with that using IST and all
kinds of mess.

I don't think that anyone really wants to move #PF to IST, which means
that we simply cannot handle vmalloc faults that happen when switching
stacks after SYSCALL, no matter what fanciness we shove into the
page_fault asm.  If we move #PF to IST, then we have to worry about
page_fault -> nmi -> page_fault, which would be a clusterf*ck.
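
[Illustrative sketch, not part of the original message.  It is a toy
model of the page_fault -> nmi -> page_fault concern, assuming only the
documented x86-64 behaviour that an IST entry always starts at the same
fixed stack top, while a non-IST exception taken in kernel mode pushes
its frame onto whatever stack is currently in use.  The names IstStack
and replay are hypothetical and exist nowhere in the kernel.]

```python
class IstStack:
    """Toy IST slot: every entry lands at the same fixed stack top."""

    def __init__(self):
        self.top_frame = None

    def enter(self, frame):
        # Hardware does not check whether the stack is already in use;
        # whatever frame was at the top is simply overwritten.
        lost = self.top_frame
        self.top_frame = frame
        return lost


def replay(pf_uses_ist):
    """Replay page_fault -> nmi -> page_fault with no handler returning.

    Returns the list of exception frames that were overwritten.
    """
    lost = []
    pf_ist = IstStack()
    nmi_ist = IstStack()
    current_stack = []  # stand-in for "the stack we are already on"

    def page_fault(label):
        if pf_uses_ist:
            old = pf_ist.enter(label)
            if old is not None:
                lost.append(old)
        else:
            # Non-IST: the frame is pushed below whatever is live,
            # so an earlier frame is never overwritten.
            current_stack.append(label)

    page_fault("pf#1")        # first fault, handler still running
    nmi_ist.enter("nmi")      # NMI arrives and switches to its own IST stack
    page_fault("pf#2")        # NMI handler itself faults
    return lost


if __name__ == "__main__":
    print("frames lost with #PF on IST:    ", replay(True))
    print("frames lost with #PF not on IST:", replay(False))
```

[In this model the second fault overwrites the first fault's frame only
when #PF uses IST.  The real entry code has many more constraints (NMI
nesting, the SYSCALL stack switch, and so on) that the sketch ignores.]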

AMD gave us a pile of misguided architectural turds, and we have to
deal with it.  My preference is to simplify dealing with it by getting
rid of vmalloc faults so that we can at least reliably touch percpu
memory without faulting.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC