linux-kernel - Re: [RFC 00/14] Dynamic Kernel Stacks

Message-ID: <CAGudoHHFQPiYkpHrBqSUVDtxaWXLbSc3ZJDOwMEzheBLO8E6Lw@mail.gmail.com>
Date: Mon, 11 Mar 2024 20:21:01 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Pasha Tatashin <pasha.tatashin@...een.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	akpm@...ux-foundation.org, x86@...nel.org, bp@...en8.de, brauner@...nel.org, 
	bristot@...hat.com, bsegall@...gle.com, dave.hansen@...ux.intel.com, 
	dianders@...omium.org, dietmar.eggemann@....com, hca@...ux.ibm.com, 
	hch@...radead.org, hpa@...or.com, jacob.jun.pan@...ux.intel.com, jgg@...pe.ca, 
	jpoimboe@...nel.org, jroedel@...e.de, juri.lelli@...hat.com, 
	kent.overstreet@...ux.dev, kinseyho@...gle.com, 
	kirill.shutemov@...ux.intel.com, lstoakes@...il.com, luto@...nel.org, 
	mgorman@...e.de, mic@...ikod.net, michael.christie@...cle.com, 
	mingo@...hat.com, mst@...hat.com, npiggin@...il.com, peterz@...radead.org, 
	pmladek@...e.com, rick.p.edgecombe@...el.com, rostedt@...dmis.org, 
	surenb@...gle.com, tglx@...utronix.de, urezki@...il.com, 
	vincent.guittot@...aro.org, vschneid@...hat.com
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks

On 3/11/24, Pasha Tatashin <pasha.tatashin@...een.com> wrote:
> On Mon, Mar 11, 2024 at 1:09 PM Mateusz Guzik <mjguzik@...il.com> wrote:
>> 1. what about faults when the thread holds a bunch of arbitrary locks
>> or has preemption disabled? is the allocation lockless?
>
> Each thread has a stack with 4 pages.
> Pre-allocated page: This page is always allocated and mapped at thread
> creation.
> Dynamic pages (3): These pages are mapped dynamically upon stack faults.
>
> A per-CPU data structure holds 3 dynamic pages for each CPU. These
> pages are used to handle stack faults occurring when a running thread
> faults (even within interrupt-disabled contexts). Typically, only one
> page is needed, but in the rare case where the thread accesses beyond
> that, we might use up to all three pages in a single fault. This
> structure allows for atomic handling of stack faults, preventing
> conflicts from other processes. Additionally, the thread's 16K-aligned
> virtual address (VA) and guaranteed pre-allocated page means no page
> table allocation is required during the fault.
>
> When a thread leaves the CPU in normal kernel mode, we check a flag to
> see if it has experienced stack faults. If so, we charge the thread
> for the new stack pages and refill the per-CPU data structure with any
> missing pages.
>

So this also has to happen if the thread holds a bunch of arbitrary
semaphores and goes off cpu with them? Anyhow, see below.
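To make the described mechanism concrete, here is a minimal user-space model. It is a sketch under stated assumptions, not code from the patchset. Every name in it (`struct cpu_stack_pool`, `struct thread_stack`, `stack_fault()`, `stack_switch_out()`, the `system_free` counter) is hypothetical, and it assumes the pre-allocated page is the topmost one because the stack grows down. It shows the property the design relies on. The fault path only takes pages from the per-CPU reserve and never allocates. Charging and refilling are deferred to switch-out, so a refill that comes up short leaves a later deep fault with nothing to map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE   4096u
#define STACK_PAGES 4u                        /* 16K stack: 1 pre-allocated + 3 dynamic */
#define STACK_SIZE  (STACK_PAGES * PAGE_SIZE)
#define POOL_PAGES  (STACK_PAGES - 1u)        /* per-CPU reserve of 3 pages */

/* Hypothetical per-CPU reserve of pages that can be mapped without allocating. */
struct cpu_stack_pool {
    unsigned int nr_free;
};

/* Hypothetical per-thread stack state; index STACK_PAGES-1 is the top page. */
struct thread_stack {
    uintptr_t base;                /* 16K-aligned lowest address of the stack */
    bool mapped[STACK_PAGES];
    unsigned int faulted_pages;    /* pages taken from the pool, charged later */
};

static void stack_init(struct thread_stack *s, uintptr_t base)
{
    assert(base % STACK_SIZE == 0);     /* assumed 16K-aligned VA */
    s->base = base;
    for (unsigned int i = 0; i < STACK_PAGES; i++)
        s->mapped[i] = false;
    s->mapped[STACK_PAGES - 1] = true;  /* pre-allocated page; the stack grows down */
    s->faulted_pages = 0;
}

/* Fault path: never allocates, only consumes pages from the per-CPU pool.
 * Returns false when the fault cannot be served (a fatal stack fault). */
static bool stack_fault(struct thread_stack *s, struct cpu_stack_pool *pool,
                        uintptr_t addr)
{
    if (addr < s->base || addr >= s->base + STACK_SIZE)
        return false;
    unsigned int idx = (unsigned int)((addr - s->base) / PAGE_SIZE);
    unsigned int needed = 0;
    /* Count unmapped pages between the faulting page and the mapped region. */
    for (unsigned int i = idx; i < STACK_PAGES && !s->mapped[i]; i++)
        needed++;
    if (needed > pool->nr_free)
        return false;
    for (unsigned int i = idx; i < idx + needed; i++)
        s->mapped[i] = true;
    pool->nr_free -= needed;
    s->faulted_pages += needed;
    return true;
}

/* Switch-out path (where a real kernel could sleep): charge the thread, then
 * refill the pool from *system_free. Returns false if the pool stays short. */
static bool stack_switch_out(struct thread_stack *s, struct cpu_stack_pool *pool,
                             unsigned int *charged, unsigned int *system_free)
{
    *charged += s->faulted_pages;
    s->faulted_pages = 0;
    while (pool->nr_free < POOL_PAGES && *system_free > 0) {
        (*system_free)--;
        pool->nr_free++;
    }
    return pool->nr_free == POOL_PAGES;
}
```

In this model, a fault two pages below the top consumes two pool pages in one go. If the system can then supply only one page at switch-out, the pool stays short, and the next thread that faults all the way down to the bottom page cannot be served. That is exactly the case the reply below turns into a panic.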

>> 2. what happens if there is no memory from which to map extra pages in
>> the first place? you may be in position where you can't go off cpu
>
> When the per-CPU data structure cannot be refilled, and a new thread
> faults, we issue a message indicating a critical stack fault. This
> triggers a system-wide panic similar to a guard page access violation
>

OOM handling is fundamentally what I was worried about. I'm confident
this failure mode makes the feature unsuitable for general-purpose
deployments.

Now, I have no vote here; it may be that this is perfectly fine as an
optional feature, which it is in your patchset. However, if this is to
go in, the option description definitely needs a big fat warning about
possible panics if enabled.

I fully agree something(tm) should be done about stacks and the
current usage is a massive bummer. I wonder if things would be ok if
stacks shrank to just 12K? Perhaps that would provide a big enough
saving (of course smaller than the one you are getting now), while
avoiding any of the above.
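The trade-off can be put into numbers with some simple arithmetic. The sketch below assumes 4K pages and uses hypothetical helper names. For the dynamic scheme it computes only a lower bound: one pre-allocated page per thread plus the 3-page per-CPU reserve, assuming no thread has faulted in extra stack pages. The thread and CPU counts in the test are arbitrary illustrations, not measurements.

```c
#include <stdint.h>

#define PAGE_SIZE 4096ull

/* Memory consumed by fixed-size kernel stacks of stack_pages pages each. */
static uint64_t fixed_stack_bytes(uint64_t nr_threads, uint64_t stack_pages)
{
    return nr_threads * stack_pages * PAGE_SIZE;
}

/* Lower bound for dynamic stacks: one pre-allocated page per thread plus a
 * 3-page reserve per CPU, assuming no thread has faulted in extra pages. */
static uint64_t dynamic_stack_bytes_min(uint64_t nr_threads, uint64_t nr_cpus)
{
    return (nr_threads * 1 + nr_cpus * 3) * PAGE_SIZE;
}
```

Going from 16K to 12K saves exactly one page (25%) per thread with no fault path at all. The dynamic scheme can save up to three pages per thread, but only while stacks stay shallow and the per-CPU reserve can always be refilled.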

All that said, it's not my call what to do here. Thank you for the explanation.

-- 
Mateusz Guzik <mjguzik gmail.com>