lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <lkf36cacg2fyy2et424jghci3irtz55gzytt56bcwzbmw372ju@acpa7mtfihfb>
Date: Thu, 14 Mar 2024 15:58:06 -0400
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Matthew Wilcox <willy@...radead.org>
Cc: "H. Peter Anvin" <hpa@...or.com>, 
	Pasha Tatashin <pasha.tatashin@...een.com>, linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	akpm@...ux-foundation.org, x86@...nel.org, bp@...en8.de, brauner@...nel.org, 
	bristot@...hat.com, bsegall@...gle.com, dave.hansen@...ux.intel.com, 
	dianders@...omium.org, dietmar.eggemann@....com, eric.devolder@...cle.com, 
	hca@...ux.ibm.com, hch@...radead.org, jacob.jun.pan@...ux.intel.com, jgg@...pe.ca, 
	jpoimboe@...nel.org, jroedel@...e.de, juri.lelli@...hat.com, kinseyho@...gle.com, 
	kirill.shutemov@...ux.intel.com, lstoakes@...il.com, luto@...nel.org, mgorman@...e.de, 
	mic@...ikod.net, michael.christie@...cle.com, mingo@...hat.com, mjguzik@...il.com, 
	mst@...hat.com, npiggin@...il.com, peterz@...radead.org, pmladek@...e.com, 
	rick.p.edgecombe@...el.com, rostedt@...dmis.org, surenb@...gle.com, tglx@...utronix.de, 
	urezki@...il.com, vincent.guittot@...aro.org, vschneid@...hat.com
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks

On Thu, Mar 14, 2024 at 07:57:22PM +0000, Matthew Wilcox wrote:
> On Thu, Mar 14, 2024 at 03:53:39PM -0400, Kent Overstreet wrote:
> > On Thu, Mar 14, 2024 at 07:43:06PM +0000, Matthew Wilcox wrote:
> > > On Tue, Mar 12, 2024 at 10:18:10AM -0700, H. Peter Anvin wrote:
> > > > Second, non-dynamic kernel memory is one of the core design decisions in
> > > > Linux from early on. This means there are lot of deeply embedded assumptions
> > > > which would have to be untangled.
> > > 
> > > I think there are other ways of getting the benefit that Pasha is seeking
> > > without moving to dynamically allocated kernel memory.  One icky thing
> > > that XFS does is punt work over to a kernel thread in order to use more
> > > stack!  That breaks a number of things including lockdep (because the
> > > kernel thread doesn't own the lock, the thread waiting for the kernel
> > > thread owns the lock).
> > > 
> > > If we had segmented stacks, XFS could say "I need at least 6kB of stack",
> > > and if less than that was available, we could allocate a temporary
> > > stack and switch to it.  I suspect Google would also be able to use this
> > > API for their rare cases when they need more than 8kB of kernel stack.
> > > Who knows, we might all be able to use such a thing.
> > > 
> > > I'd been thinking about this from the point of view of allocating more
> > > stack elsewhere in kernel space, but combining what Pasha has done here
> > > with this idea might lead to a hybrid approach that works better; allocate
> > > 32kB of vmap space per kernel thread, put 12kB of memory at the top of it,
> > > rely on people using this "I need more stack" API correctly, and free the
> > > excess pages on return to userspace.  No complicated "switch stacks" API
> > > needed, just an "ensure we have at least N bytes of stack remaining" API.
> > 
> > Why would we need an "I need more stack" API? Pasha's approach seems
> > like everything we need for what you're talking about.
> 
> Because double faults are hard, possibly impossible, and the FRED approach
> Peter described has extra overhead?  This was all described up-thread.

*nod*

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ