Message-ID: <ZfNTSjfE_w50Otnz@casper.infradead.org>
Date: Thu, 14 Mar 2024 19:43:06 +0000
From: Matthew Wilcox <willy@...radead.org>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Pasha Tatashin <pasha.tatashin@...een.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	akpm@...ux-foundation.org, x86@...nel.org, bp@...en8.de,
	brauner@...nel.org, bristot@...hat.com, bsegall@...gle.com,
	dave.hansen@...ux.intel.com, dianders@...omium.org,
	dietmar.eggemann@....com, eric.devolder@...cle.com,
	hca@...ux.ibm.com, hch@...radead.org, jacob.jun.pan@...ux.intel.com,
	jgg@...pe.ca, jpoimboe@...nel.org, jroedel@...e.de,
	juri.lelli@...hat.com, kent.overstreet@...ux.dev,
	kinseyho@...gle.com, kirill.shutemov@...ux.intel.com,
	lstoakes@...il.com, luto@...nel.org, mgorman@...e.de,
	mic@...ikod.net, michael.christie@...cle.com, mingo@...hat.com,
	mjguzik@...il.com, mst@...hat.com, npiggin@...il.com,
	peterz@...radead.org, pmladek@...e.com, rick.p.edgecombe@...el.com,
	rostedt@...dmis.org, surenb@...gle.com, tglx@...utronix.de,
	urezki@...il.com, vincent.guittot@...aro.org, vschneid@...hat.com
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks

On Tue, Mar 12, 2024 at 10:18:10AM -0700, H. Peter Anvin wrote:
> Second, non-dynamic kernel memory is one of the core design decisions in
> Linux from early on. This means there are a lot of deeply embedded assumptions
> which would have to be untangled.

I think there are other ways of getting the benefit that Pasha is seeking
without moving to dynamically allocated kernel memory.  One icky thing
that XFS does is punt work over to a kernel thread in order to use more
stack!  That breaks a number of things including lockdep (because the
kernel thread doesn't own the lock, the thread waiting for the kernel
thread owns the lock).

If we had segmented stacks, XFS could say "I need at least 6kB of stack",
and if less than that was available, we could allocate a temporary
stack and switch to it.  I suspect Google would also be able to use this
API for their rare cases when they need more than 8kB of kernel stack.
Who knows, we might all be able to use such a thing.

I'd been thinking about this from the point of view of allocating more
stack elsewhere in kernel space, but combining what Pasha has done here
with this idea might lead to a hybrid approach that works better; allocate
32kB of vmap space per kernel thread, put 12kB of memory at the top of it,
rely on people using this "I need more stack" API correctly, and free the
excess pages on return to userspace.  No complicated "switch stacks" API
needed, just an "ensure we have at least N bytes of stack remaining" API.
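
To make the bookkeeping behind an "ensure we have at least N bytes of
stack remaining" call concrete, here is a rough user-space model. It is
only a sketch under stated assumptions, not kernel code: the sim_* names
and the 4kB page size are hypothetical, the 32kB and 12kB figures are
the ones above, and a real implementation would have to handle page
allocation failure, which this model ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants; 32kB and 12kB are the sizes from the mail. */
#define SIM_PAGE_SIZE 4096u            /* assumed page size */
#define SIM_VMAP_SIZE (32u * 1024u)    /* virtual space reserved per thread */
#define SIM_BASE_SIZE (12u * 1024u)    /* always populated at the top */

/* Both fields are byte counts measured downwards from the top of the stack. */
struct sim_stack {
	size_t populated;   /* bytes currently backed by pages */
	size_t used;        /* bytes currently in use */
};

static void sim_stack_init(struct sim_stack *s)
{
	s->populated = SIM_BASE_SIZE;
	s->used = 0;
}

/*
 * Make sure at least n bytes are backed below the current stack position.
 * Returns 0 on success, -1 if the request does not fit in the reserved
 * virtual space.  Populating pages is modelled as always succeeding.
 */
static int sim_stack_ensure(struct sim_stack *s, size_t n)
{
	size_t need;

	assert(s->used <= s->populated && s->populated <= SIM_VMAP_SIZE);

	if (n > SIM_VMAP_SIZE - s->used)
		return -1;

	need = s->used + n;
	if (need <= s->populated)
		return 0;           /* already enough stack, nothing to do */

	/* Round up to whole pages and "populate" them. */
	need = (need + SIM_PAGE_SIZE - 1) / SIM_PAGE_SIZE * SIM_PAGE_SIZE;
	s->populated = need;
	return 0;
}

/*
 * Model of the return to userspace: the stack is empty again, so drop
 * everything beyond the base 12kB.  Returns the number of pages freed.
 */
static size_t sim_stack_release(struct sim_stack *s)
{
	size_t freed = 0;

	if (s->populated > SIM_BASE_SIZE) {
		freed = (s->populated - SIM_BASE_SIZE) / SIM_PAGE_SIZE;
		s->populated = SIM_BASE_SIZE;
	}
	s->used = 0;
	return freed;
}
```

In this model, a caller that is 8kB deep and asks for 6kB more ends up
with 16kB populated, and one extra page is released on the way back to
userspace.  A request that cannot fit in the 32kB reservation fails
instead of overrunning it.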
