Message-ID: <CA+CK2bATJ7_EUszq4nr0AuZXG76nUhDs9osbxPUs=mLPFtW8Zg@mail.gmail.com>
Date: Sun, 17 Mar 2024 10:19:10 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: "H. Peter Anvin" <hpa@...or.com>, Kent Overstreet <kent.overstreet@...ux.dev>, 
	linux-kernel@...r.kernel.org, linux-mm@...ck.org, akpm@...ux-foundation.org, 
	x86@...nel.org, bp@...en8.de, brauner@...nel.org, bristot@...hat.com, 
	bsegall@...gle.com, dave.hansen@...ux.intel.com, dianders@...omium.org, 
	dietmar.eggemann@....com, eric.devolder@...cle.com, hca@...ux.ibm.com, 
	hch@...radead.org, jacob.jun.pan@...ux.intel.com, jgg@...pe.ca, 
	jpoimboe@...nel.org, jroedel@...e.de, juri.lelli@...hat.com, 
	kinseyho@...gle.com, kirill.shutemov@...ux.intel.com, lstoakes@...il.com, 
	luto@...nel.org, mgorman@...e.de, mic@...ikod.net, 
	michael.christie@...cle.com, mingo@...hat.com, mjguzik@...il.com, 
	mst@...hat.com, npiggin@...il.com, peterz@...radead.org, pmladek@...e.com, 
	rick.p.edgecombe@...el.com, rostedt@...dmis.org, surenb@...gle.com, 
	tglx@...utronix.de, urezki@...il.com, vincent.guittot@...aro.org, 
	vschneid@...hat.com
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks

On Sat, Mar 16, 2024 at 8:41 PM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Sat, Mar 16, 2024 at 03:17:57PM -0400, Pasha Tatashin wrote:
> > Expanding on Matthew's idea of an interface for dynamic kernel stack
> > sizes, here's what I'm thinking:
> >
> > - Kernel Threads: Create all kernel threads with a fully populated
> > THREAD_SIZE stack.  (i.e. 16K)
> > - User Threads: Create all user threads with THREAD_SIZE kernel stack
> > but only the top page mapped. (i.e. 4K)
> > - In enter_from_user_mode(): Expand the thread stack to 16K by mapping
> > three additional pages from the per-CPU stack cache. This function is
> > called early in kernel entry points.
> > - exit_to_user_mode(): Unmap the extra three pages and return them to
> > the per-CPU cache. This function is called late in the kernel exit
> > path.
> >
> > Both of the above hooks are called with IRQs disabled on all kernel
> > entries, whether through interrupts or syscalls, and they are called
> > early/late enough that 4K is enough to handle the rest of entry/exit.
>
> At what point do we replenish the per-CPU stash of pages?  If we're
> 12kB deep in the stack and call mutex_lock(), we can be scheduled out,
> and then the new thread can make a syscall.  Do we just assume that
> get_free_page() can sleep at kernel entry (seems reasonable)?  I don't
> think this is an infeasible problem, I'd just like it to be described.
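The entry/exit hooks proposed in the quoted text can be modelled in user space to make the page accounting concrete. This is only a sketch under stated assumptions: 4K pages, a 16K THREAD_SIZE, and a flat array standing in for the per-CPU stack cache. The names stack_expand(), stack_shrink(), struct thread_stack and struct pcpu_stack_cache are hypothetical and are not kernel APIs; a real implementation would presumably change the kernel mappings of the stack rather than fill an array of pointers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a 16K THREAD_SIZE stack made of four 4K pages. */
#define STACK_PAGES 4
#define EXTRA_PAGES (STACK_PAGES - 1)	/* pages mapped only while in the kernel */
#define CACHE_MAX   16

struct page { int id; };

/* Hypothetical per-CPU cache of free stack pages (LIFO). */
struct pcpu_stack_cache {
	struct page *pages[CACHE_MAX];
	int count;
};

/* slot[STACK_PAGES - 1] is the top page, which stays mapped for user threads. */
struct thread_stack {
	struct page *slot[STACK_PAGES];
};

/* Models the enter_from_user_mode() step: map the three lower pages. */
static bool stack_expand(struct thread_stack *st, struct pcpu_stack_cache *c)
{
	if (c->count < EXTRA_PAGES)
		return false;	/* caller must fall back to another page source */
	for (int i = 0; i < EXTRA_PAGES; i++)
		st->slot[i] = c->pages[--c->count];
	return true;
}

/* Models the exit_to_user_mode() step: unmap the extra pages and return them. */
static void stack_shrink(struct thread_stack *st, struct pcpu_stack_cache *c)
{
	for (int i = 0; i < EXTRA_PAGES; i++) {
		if (st->slot[i] == NULL)
			continue;
		/* If the cache is full, the model just drops the page; a real
		 * cache would hand it back to the page allocator instead. */
		if (c->count < CACHE_MAX)
			c->pages[c->count++] = st->slot[i];
		st->slot[i] = NULL;
	}
}

/* Number of stack pages currently mapped for this thread. */
static int mapped_pages(const struct thread_stack *st)
{
	int n = 0;
	for (int i = 0; i < STACK_PAGES; i++)
		n += st->slot[i] != NULL;
	return n;
}
```

In this model, kernel entry borrows three pages from the cache and exit hands them back, so a user thread that is running in user space has only its top stack page mapped.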

Once IRQs are enabled, it is perfectly OK to sleep and wait for the stack
pages to become available.

The following user entry paths enable interrupts:
do_user_addr_fault()
  local_irq_enable()

do_syscall_64()
  syscall_enter_from_user_mode()
    local_irq_enable()

__do_fast_syscall_32()
  syscall_enter_from_user_mode_prepare()
    local_irq_enable()

exc_debug_user()
  local_irq_enable()

do_int3_user()
  cond_local_irq_enable()

On those paths, it is perfectly OK to sleep and wait for a page to become
available when the per-CPU cache is empty and alloc_page(GFP_NOWAIT)
fails.
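The fallback order described above (per-CPU cache first, then a non-sleeping allocation, then sleeping only on paths that have already enabled IRQs) can be sketched as a small decision function. The enum and the struct alloc_env fields are hypothetical stand-ins for allocator and IRQ state; nothing here calls real kernel allocators, and it only models the policy as stated in the message.

```c
#include <stdbool.h>

/* Hypothetical outcome of trying to obtain one stack page. */
enum stack_page_src {
	SRC_PCPU_CACHE,	/* taken from the per-CPU stack cache */
	SRC_NOWAIT,	/* a non-sleeping, GFP_NOWAIT-style attempt succeeded */
	SRC_SLEEPING,	/* IRQs are enabled, so the caller may sleep for memory */
	SRC_FAIL	/* IRQs still disabled and nothing available */
};

/* Illustrative stand-ins for the real allocator and IRQ state. */
struct alloc_env {
	int  pcpu_cached;	/* pages left in the per-CPU cache */
	bool nowait_succeeds;	/* whether a non-sleeping allocation would succeed */
	bool irqs_enabled;	/* true once a path such as do_syscall_64()
				   has reached local_irq_enable() */
};

static enum stack_page_src get_stack_page(struct alloc_env *env)
{
	if (env->pcpu_cached > 0) {
		env->pcpu_cached--;
		return SRC_PCPU_CACHE;
	}
	if (env->nowait_succeeds)
		return SRC_NOWAIT;
	/* Per the message, sleeping is acceptable only after IRQs are enabled. */
	if (env->irqs_enabled)
		return SRC_SLEEPING;
	return SRC_FAIL;
}
```

SRC_FAIL is the case the message handles separately, for entries that never enable IRQs.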

The other interrupts from userland never enable IRQs. We can reserve
three pages per CPU specifically for these never-enable-IRQ cases, as no
more than one such set can ever be needed at a time.
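The per-CPU reserve for entries that never enable IRQs can be sketched as a single set of three pages guarded by an in-use flag. This is a user-space model that relies on the message's argument: because IRQs stay disabled on these paths, a CPU cannot start a second such entry before the first one exits. struct pcpu_irq_reserve, reserve_take() and reserve_put() are hypothetical names, and page identifiers are plain integers for illustration.

```c
#include <stdbool.h>

#define EXTRA_PAGES 3

/* Hypothetical per-CPU reserve for entries that never enable IRQs. */
struct pcpu_irq_reserve {
	int  pages[EXTRA_PAGES];	/* illustrative page identifiers */
	bool in_use;			/* set while one IRQ-disabled entry holds it */
};

/* Borrow the whole reserve on entry. Returns false if it is already
 * borrowed, which the message argues cannot happen on a single CPU. */
static bool reserve_take(struct pcpu_irq_reserve *r, int out[EXTRA_PAGES])
{
	if (r->in_use)
		return false;
	for (int i = 0; i < EXTRA_PAGES; i++)
		out[i] = r->pages[i];
	r->in_use = true;
	return true;
}

/* Hand the reserve back on exit to user mode. */
static void reserve_put(struct pcpu_irq_reserve *r)
{
	r->in_use = false;
}
```

A single flag suffices in this model because the reserve is never shared: at most one IRQ-disabled entry per CPU can hold it.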

Pasha
