linux-kernel - Re: [RFC 11/14] x86: add support for Dynamic Kernel Stacks

Message-ID: <CA+CK2bC=6GOkCOwJdhH25r-9hb1BQVoLK7LLAgpm2AKqdmStrg@mail.gmail.com>
Date: Thu, 14 Mar 2024 10:03:55 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	akpm@...ux-foundation.org, x86@...nel.org, bp@...en8.de, brauner@...nel.org, 
	bristot@...hat.com, bsegall@...gle.com, dave.hansen@...ux.intel.com, 
	dianders@...omium.org, dietmar.eggemann@....com, eric.devolder@...cle.com, 
	hca@...ux.ibm.com, hch@...radead.org, hpa@...or.com, 
	jacob.jun.pan@...ux.intel.com, jgg@...pe.ca, jpoimboe@...nel.org, 
	jroedel@...e.de, juri.lelli@...hat.com, kent.overstreet@...ux.dev, 
	kinseyho@...gle.com, kirill.shutemov@...ux.intel.com, lstoakes@...il.com, 
	luto@...nel.org, mgorman@...e.de, mic@...ikod.net, 
	michael.christie@...cle.com, mingo@...hat.com, mjguzik@...il.com, 
	mst@...hat.com, npiggin@...il.com, peterz@...radead.org, pmladek@...e.com, 
	rick.p.edgecombe@...el.com, rostedt@...dmis.org, surenb@...gle.com, 
	urezki@...il.com, vincent.guittot@...aro.org, vschneid@...hat.com
Subject: Re: [RFC 11/14] x86: add support for Dynamic Kernel Stacks

On Wed, Mar 13, 2024 at 12:12 PM Thomas Gleixner <tglx@...utronix.de> wrote:
>
> On Wed, Mar 13 2024 at 11:28, Pasha Tatashin wrote:
> > On Wed, Mar 13, 2024 at 9:43 AM Pasha Tatashin
> > <pasha.tatashin@...een.com> wrote:
> >> Here's a potential solution that is fast, avoids locking, and ensures atomicity:
> >>
> >> 1. Kernel Stack VA Space
> >> Dedicate a virtual address range ([KSTACK_START_VA - KSTACK_END_VA])
> >> exclusively for kernel stacks. This simplifies validation of faulting
> >> addresses to be part of a stack.
> >>
> >> 2. Finding the faulty task
> >> - Use ALIGN(fault_address, THREAD_SIZE) to calculate the end of the
> >> topmost stack page (since stack addresses are aligned to THREAD_SIZE).
> >> - Store the task_struct pointer as the last word on this topmost page,
> >> which is always present because it is a pre-allocated stack page.
> >>
> >> 3. Stack Padding
> >> Increase padding to 8 bytes on x86_64 (TOP_OF_KERNEL_STACK_PADDING 8)
> >> to accommodate the task_struct pointer.
> >
> > Alternatively, do not even look-up the task_struct in
> > dynamic_stack_fault(), but only install the mapping to the faulting
> > address, store va in the per-cpu array, and handle the rest in
> > dynamic_stack() during context switching. At that time spin locks can
> > be taken, and we can do a find_vm_area(addr) call.
> >
> > This way, we would not need to modify TOP_OF_KERNEL_STACK_PADDING to
> > keep task_struct in there.
>
> Why not simply do the 'current' update right next to the stack
> switching in __switch_to_asm(), which has no way of faulting?
>
> That needs to validate whether anything uses current between the stack
> switch and the place where current is updated today. I think nothing
> should do so, but I would not be surprised either if it would be the
> case. Such code would already today just work by chance I think.
>
> That should not be hard to analyze and fixup if necessary.
>
> So that's fixable, but I'm not really convinced that all of this is safe
> and correct under all circumstances. That needs a lot more analysis than
> just the trivial one I did for switch_to().

Agreed, if the current task pointer can be switched later, after loads
and stores to the stack, that would be a better solution. I will
incorporate this approach into my next version.

I also concur that this proposal necessitates more rigorous analysis.
This work remains in the investigative phase, where I am seeking a
viable solution to the problem.

The core issue is that kernel stacks consume excessive memory for
certain workloads. However, we cannot simply reduce their size, as
this leads to machine crashes in the infrequent instances where stacks
do run deep.
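
For reference, the address arithmetic from step 2 of the quoted proposal
can be sketched in userspace C as below. This is only an illustration
under stated assumptions, not code from the patch series: THREAD_SIZE is
assumed to be 16 KiB, and struct task, stack_alloc() and stack_owner()
are hypothetical stand-ins. The lookup rounds up from fault_addr + 1
rather than fault_addr, because an address sitting exactly on the stack
base is already aligned and would otherwise resolve to the end of the
neighbouring stack.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed stack size; the real value depends on the kernel configuration. */
#define THREAD_SIZE (16UL * 1024)

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uintptr_t)(a) - 1))

/* Hypothetical stand-in for task_struct. */
struct task {
	int pid;
};

/*
 * Allocate a THREAD_SIZE-aligned stack and record its owner in the last
 * pointer-sized word, mirroring the "task_struct pointer as the last
 * word of the topmost page" idea.
 */
static void *stack_alloc(struct task *owner)
{
	void *base;
	struct task **slot;

	assert(owner != NULL);

	base = aligned_alloc(THREAD_SIZE, THREAD_SIZE);
	if (!base)
		return NULL;

	slot = (struct task **)((char *)base + THREAD_SIZE) - 1;
	*slot = owner;
	return base;
}

/*
 * Map any address inside a stack back to its owner: round up to the end
 * of the stack, then read the word just below that boundary.
 */
static struct task *stack_owner(uintptr_t fault_addr)
{
	uintptr_t end = ALIGN_UP(fault_addr + 1, THREAD_SIZE);

	return *((struct task **)end - 1);
}
```

With a stack obtained from stack_alloc(&t), stack_owner() returns &t for
every address from the stack base up to base + THREAD_SIZE - 1, without
taking any lock. In the kernel the topmost page would have to stay
mapped for that final load to be safe, which is what the pre-allocated
stack page in the proposal is meant to provide.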

Thanks,
Pasha