Message-ID: <ZfY8PSnsLtkHBBZF@casper.infradead.org>
Date: Sun, 17 Mar 2024 00:41:33 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Pasha Tatashin <pasha.tatashin@...een.com>
Cc: "H. Peter Anvin" <hpa@...or.com>,
Kent Overstreet <kent.overstreet@...ux.dev>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
akpm@...ux-foundation.org, x86@...nel.org, bp@...en8.de,
brauner@...nel.org, bristot@...hat.com, bsegall@...gle.com,
dave.hansen@...ux.intel.com, dianders@...omium.org,
dietmar.eggemann@....com, eric.devolder@...cle.com,
hca@...ux.ibm.com, hch@...radead.org, jacob.jun.pan@...ux.intel.com,
jgg@...pe.ca, jpoimboe@...nel.org, jroedel@...e.de,
juri.lelli@...hat.com, kinseyho@...gle.com,
kirill.shutemov@...ux.intel.com, lstoakes@...il.com,
luto@...nel.org, mgorman@...e.de, mic@...ikod.net,
michael.christie@...cle.com, mingo@...hat.com, mjguzik@...il.com,
mst@...hat.com, npiggin@...il.com, peterz@...radead.org,
pmladek@...e.com, rick.p.edgecombe@...el.com, rostedt@...dmis.org,
surenb@...gle.com, tglx@...utronix.de, urezki@...il.com,
vincent.guittot@...aro.org, vschneid@...hat.com
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks
On Sat, Mar 16, 2024 at 03:17:57PM -0400, Pasha Tatashin wrote:
> Expanding on Matthew's idea of an interface for dynamic kernel stack
> sizes, here's what I'm thinking:
>
> - Kernel Threads: Create all kernel threads with a fully populated
> THREAD_SIZE stack. (i.e. 16K)
> - User Threads: Create all user threads with THREAD_SIZE kernel stack
> but only the top page mapped. (i.e. 4K)
> - In enter_from_user_mode(): Expand the thread stack to 16K by mapping
> three additional pages from the per-CPU stack cache. This function is
> called early in kernel entry points.
> - exit_to_user_mode(): Unmap the extra three pages and return them to
> the per-CPU cache. This function is called late in the kernel exit
> path.
>
> Both of the above hooks are called with IRQs disabled on all kernel
> entries, whether through interrupts or syscalls, and they are called
> early/late enough that 4K is enough to handle the rest of entry/exit.
At what point do we replenish the per-CPU stash of pages? If we're
12kB deep in the stack and call mutex_lock(), we can be scheduled out,
and then the new thread can make a syscall. Do we just assume that
get_free_page() can sleep at kernel entry (seems reasonable)? I don't
think this is an insurmountable problem; I'd just like it to be described.
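
To make the scenario concrete, here is a toy userspace model of the stash
accounting. It is only a sketch, not kernel code: every name in it
(cpu_stash, enter_kernel, exit_kernel, second_entry_succeeds) is
hypothetical, and it assumes a stash that is only refilled by threads
returning to user mode on the same CPU.

```c
/* Toy model of a per-CPU stash of stack pages. Not kernel code;
 * all names and the refill policy are assumptions for illustration. */
#include <stdbool.h>

#define EXTRA_PAGES 3	/* 16K stack minus the 4K page that is always mapped */

struct cpu_stash { int free_pages; };	/* pages cached on one CPU */
struct thread { int extra_mapped; };	/* extra stack pages a thread holds */

/* Entry hook: take three pages from the stash, or report a shortfall. */
static bool enter_kernel(struct cpu_stash *s, struct thread *t)
{
	if (s->free_pages < EXTRA_PAGES)
		return false;	/* stash empty: a refill would be needed here */
	s->free_pages -= EXTRA_PAGES;
	t->extra_mapped = EXTRA_PAGES;
	return true;
}

/* Exit hook: return the thread's extra pages to this CPU's stash. */
static void exit_kernel(struct cpu_stash *s, struct thread *t)
{
	s->free_pages += t->extra_mapped;
	t->extra_mapped = 0;
}

/* Thread A enters the kernel and then blocks (say in mutex_lock()) while
 * still holding its pages. Thread B is scheduled on the same CPU and makes
 * a syscall. Returns whether B's entry finds enough pages in the stash. */
static bool second_entry_succeeds(int initial_pages)
{
	struct cpu_stash stash = { .free_pages = initial_pages };
	struct thread a = { 0 }, b = { 0 };
	bool ok;

	if (!enter_kernel(&stash, &a))
		return false;
	/* A sleeps here without running exit_kernel(). */
	ok = enter_kernel(&stash, &b);
	if (ok)
		exit_kernel(&stash, &b);
	exit_kernel(&stash, &a);	/* A eventually wakes and returns */
	return ok;
}
```

In this model, second_entry_succeeds(3) is false and
second_entry_succeeds(6) is true: a stash sized for one entry runs dry as
soon as a thread sleeps while holding its pages, so the stash has to be
replenished from somewhere other than the exit path.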