Date:	Mon, 20 Jun 2016 21:01:40 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Andy Lutomirski <luto@...nel.org>
Cc:	"the arch/x86 maintainers" <x86@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
	Borislav Petkov <bp@...en8.de>,
	Nadav Amit <nadav.amit@...il.com>,
	Kees Cook <keescook@...omium.org>,
	Brian Gerst <brgerst@...il.com>,
	"kernel-hardening@...ts.openwall.com" 
	<kernel-hardening@...ts.openwall.com>,
	Josh Poimboeuf <jpoimboe@...hat.com>,
	Jann Horn <jann@...jh.net>,
	Heiko Carstens <heiko.carstens@...ibm.com>
Subject: Re: [PATCH v3 00/13] Virtually mapped stacks with guard pages (x86, core)

On Mon, Jun 20, 2016 at 4:43 PM, Andy Lutomirski <luto@...nel.org> wrote:
>
> On my laptop, this adds about 1.5µs of overhead to task creation,
> which seems to be mainly caused by vmalloc inefficiently allocating
> individual pages even when a higher-order page is available on the
> freelist.

I really think that problem needs to be fixed before this should be merged.

The easy fix may be to just have a very limited re-use of these stacks
in generic code, rather than try to do anything fancy with multi-page
allocations. Just a few of these allocations held in reserve (perhaps
make the allocations percpu to avoid new locks).

It won't help for a thundering herd problem where you start tons of
new threads, but those don't tend to be short-lived ones anyway. In
contrast, I think one common case is the "run shell scripts" workload
that runs tons and tons of short-lived processes, and having a small
"stack of stacks" would probably catch that case very nicely. Even a
single-entry cache might be ok, but I see no reason to not make it be
perhaps three or four stacks per CPU.

Make the "thread create/exit" sequence go really fast by avoiding the
allocation/deallocation, and hopefully catching a hot cache and TLB
line too.
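
[Editor's illustration, not part of the original message.] The "stack of
stacks" idea above can be sketched in plain userspace C. This is only a
minimal sketch under stated assumptions: the names stack_cache,
stack_alloc and stack_free, the NR_CPUS and STACK_SIZE values, and the use
of malloc/free as the slow path are all hypothetical stand-ins, not kernel
APIs and not the code that was eventually merged. A real per-CPU cache
would also need to keep the task on its CPU (or otherwise exclude
concurrent access) while touching the slots; that is omitted here.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS        4            /* hypothetical CPU count for this sketch */
#define STACKS_PER_CPU 4            /* "three or four stacks per CPU" */
#define STACK_SIZE     (16 * 1024)  /* assumed stack size, illustration only */

/* One small LIFO of spare stacks per CPU, so no shared lock is needed. */
struct stack_cache {
    void *slot[STACKS_PER_CPU];
    int nr;                         /* number of cached stacks in slot[] */
};

static struct stack_cache cache[NR_CPUS];

/* Fast path: reuse the most recently freed stack on this CPU (likely
 * still cache- and TLB-hot). Slow path: fall back to the allocator. */
void *stack_alloc(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    struct stack_cache *c = &cache[cpu];

    if (c->nr > 0)
        return c->slot[--c->nr];
    return malloc(STACK_SIZE);      /* stand-in for the expensive allocation */
}

/* Keep the stack in reserve if there is room, otherwise really free it. */
void stack_free(int cpu, void *stack)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    struct stack_cache *c = &cache[cpu];

    if (c->nr < STACKS_PER_CPU) {
        c->slot[c->nr++] = stack;
        return;
    }
    free(stack);
}
```

With this shape, a create/exit/create sequence on one CPU gets the same
stack back without touching the allocator, which is the short-lived
process case described above. A burst of many simultaneous creations
simply falls through to the slow path, as noted for the thundering herd.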

Performance is not something that we add later. If the first version
of the patch series doesn't perform well, it should not be considered
ready.

            Linus
