linux-kernel - Re: [PATCH v8 00/38] arm64/gcs: Provide support for GCS in userspace

Message-ID: <4c7bdf8fde9cc45174f10b9221fa58ffb450b755.camel@intel.com>
Date: Tue, 20 Feb 2024 18:41:05 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "corbet@....net" <corbet@....net>, "ardb@...nel.org" <ardb@...nel.org>,
	"maz@...nel.org" <maz@...nel.org>, "shuah@...nel.org" <shuah@...nel.org>,
	"Szabolcs.Nagy@....com" <Szabolcs.Nagy@....com>, "keescook@...omium.org"
	<keescook@...omium.org>, "james.morse@....com" <james.morse@....com>,
	"debug@...osinc.com" <debug@...osinc.com>, "akpm@...ux-foundation.org"
	<akpm@...ux-foundation.org>, "catalin.marinas@....com"
	<catalin.marinas@....com>, "oleg@...hat.com" <oleg@...hat.com>,
	"arnd@...db.de" <arnd@...db.de>, "ebiederm@...ssion.com"
	<ebiederm@...ssion.com>, "will@...nel.org" <will@...nel.org>,
	"suzuki.poulose@....com" <suzuki.poulose@....com>, "sorear@...tmail.com"
	<sorear@...tmail.com>, "oliver.upton@...ux.dev" <oliver.upton@...ux.dev>,
	"broonie@...nel.org" <broonie@...nel.org>
CC: "brauner@...nel.org" <brauner@...nel.org>, "fweimer@...hat.com"
	<fweimer@...hat.com>, "aou@...s.berkeley.edu" <aou@...s.berkeley.edu>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"paul.walmsley@...ive.com" <paul.walmsley@...ive.com>, "hjl.tools@...il.com"
	<hjl.tools@...il.com>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
	"palmer@...belt.com" <palmer@...belt.com>, "kvmarm@...ts.linux.dev"
	<kvmarm@...ts.linux.dev>, "linux-arch@...r.kernel.org"
	<linux-arch@...r.kernel.org>, "thiago.bauermann@...aro.org"
	<thiago.bauermann@...aro.org>, "linux-doc@...r.kernel.org"
	<linux-doc@...r.kernel.org>, "linux-fsdevel@...r.kernel.org"
	<linux-fsdevel@...r.kernel.org>, "linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "linux-kselftest@...r.kernel.org"
	<linux-kselftest@...r.kernel.org>, "musl@...ts.openwall.com"
	<musl@...ts.openwall.com>, "linux-riscv@...ts.infradead.org"
	<linux-riscv@...ts.infradead.org>
Subject: Re: [PATCH v8 00/38] arm64/gcs: Provide support for GCS in userspace

Hi,

I worked on the x86 kernel shadow stack support. I think it is an
interesting suggestion. Some questions below, and I will think more on
it.

On Tue, 2024-02-20 at 11:36 -0500, Stefan O'Rear wrote:
> While discussing the ABI implications of shadow stacks in the context of
> Zicfiss and musl a few days ago, I had the following idea for how to solve
> the source compatibility problems with shadow stacks in POSIX.1-2004 and
> POSIX.1-2017:
> 
> 1. Introduce a "flexible shadow stack handling" option.  For what follows,
>    it doesn't matter if this is system-wide, per-mm, or per-vma.
> 
> 2. Shadow stack faults on non-shadow stack pages, if flexible shadow stack
>    handling is in effect, cause the affected page to become a shadow stack
>    page.  When this happens, the page is filled with invalid address tokens.

Hmm, could the shadow stack underflow onto the real stack then? Not
sure how bad that is. INCSSP (incrementing the SSP register on x86)
loops are not rare so it seems like something that could happen.

> 
>    Faults from non-shadow-stack accesses to a shadow-stack page which was
>    created by the previous paragraph will cause the page to revert to
>    non-shadow-stack usage, with or without clearing.

Won't this prevent catching stack overflows when they happen? An
overflow will just turn the shadow stack into normal stack and only get
detected when the shadow stack unwinds?

A related question would be how to handle the expanding nature of the
initial stack. I guess the initial stack could be special and have a
separate shadow stack.

> 
>    Important: a shadow stack operation can only load a valid address from
>    a page if that page has been in continuous shadow stack use since the
>    address was written by another shadow stack operation; the flexibility
>    delays error reporting in cases of stray writes but it never allows for
>    corruption of shadow stack operation.

Shadow stacks currently have automatic guard gaps to try to prevent one
thread from overflowing onto another thread's shadow stack. This would
somewhat open that up, as the stack guard gaps are usually maintained
by userspace for new threads. It would have to be thought through
whether these could still be enforced with checks at additional spots.
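
To make the delayed-detection concern concrete, below is a toy Python
model of the page transitions described in item 2. It is only a sketch:
the `Page` class, its method names, and the use of `None` as an invalid
token are hypothetical, and it models a single flexible page rather
than any real kernel or hardware behavior.

```python
from enum import Enum

INVALID = None  # hypothetical stand-in for an "invalid address token"


class Kind(Enum):
    NORMAL = 0
    SHADOW = 1


class Page:
    """Toy model of one page under 'flexible shadow stack handling'."""

    def __init__(self, slots=4):
        self.kind = Kind.NORMAL
        self.slots = [0] * slots

    def _to_shadow(self):
        # A shadow stack access to a normal page converts it and fills it
        # with invalid tokens, so old contents can never be popped.
        if self.kind is Kind.NORMAL:
            self.kind = Kind.SHADOW
            self.slots = [INVALID] * len(self.slots)

    def ss_push(self, idx, addr):
        self._to_shadow()
        self.slots[idx] = addr

    def ss_pop(self, idx):
        self._to_shadow()
        value = self.slots[idx]
        if value is INVALID:
            raise RuntimeError("shadow stack fault: invalid token")
        return value

    def write(self, idx, value):
        # A normal store to a flexible shadow page does not fault; it
        # silently reverts the page to normal usage.
        if self.kind is Kind.SHADOW:
            self.kind = Kind.NORMAL
        self.slots[idx] = value
```

In this model a stray normal write to the page after a push succeeds
silently, and the fault only shows up at the next shadow stack pop from
that page.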

> 
> 3. Standards-defined operations which use a user-provided stack
>    (makecontext, sigaltstack, pthread_attr_setstack) use a subrange of the
>    provided stack for shadow stack storage.  I propose to use a shadow
>    stack size of 1/32 of the provided stack size, rounded up to a positive
>    integer number of pages, and place the shadow stack allocation at the
>    lowest page-aligned address inside the provided stack region.
> 
>    Since page usage is flexible, no change in page permissions is
>    immediately needed; this merely sets the initial shadow stack pointer for
>    the new context.
> 
>    If the shadow stack grew in the opposite direction to the architectural
>    stack, it would not be necessary to pick a fixed direction.
> 
> 4. SIGSTKSZ and MINSIGSTKSZ are increased by 2 pages to provide sufficient
>    space for a minimum-sized shadow stack region and worst case alignment.

Do all makecontext() callers ensure the size is greater than this?

I guess glibc's makecontext() could do this scheme to prevent leaking
without any changes to the kernel. Basically steal a little of the
stack address range and overwrite it with a shadow stack mapping. But
only if the apps leave enough room. If they need to be updated, then
they could be updated to manage their own shadow stacks too I think.
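
As a rough illustration of the placement rule in item 3, here is a
minimal Python sketch of the arithmetic. It assumes 4 KiB pages; the
function name `shadow_stack_range` and the choice to return `None` when
the region does not fit are mine, not part of the proposal.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def shadow_stack_range(stack_base, stack_size, page_size=PAGE_SIZE):
    """Return (start, size) of the shadow stack sub-range carved out of a
    user-provided stack, or None if the stack is too small to hold it."""
    stack_end = stack_base + stack_size

    # 1/32 of the provided size, rounded up to a positive number of pages.
    pages = max(1, -(-stack_size // (32 * page_size)))
    ss_size = pages * page_size

    # Lowest page-aligned address inside the provided region.
    ss_start = -(-stack_base // page_size) * page_size

    if ss_start + ss_size > stack_end:
        return None  # not enough room once alignment is accounted for
    return ss_start, ss_size
```

For example, a page-aligned 1 MiB stack yields an 8-page shadow stack
at its base, while a 4 KiB stack that is not page-aligned has no room
at all, which is the case the 2-page increase in item 4 is meant to
cover.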

> 
> _Without_ doing this, sigaltstack cannot be used to recover from stack
> overflows if the shadow stack limit is reached first, and makecontext
> cannot be supported without memory leaks and unreportable error
> conditions.

FWIW, I think the makecontext() shadow stack leaking is a bad idea. I
would prefer the existing makecontext() interface just didn't support
shadow stack, rather than the leaking solution glibc does today.

The situation (for arm and riscv too I think?) is that some
applications will just not work automatically due to custom stack
switching implementations (user-level threading libraries, JITs, etc.).
So I think it should be ok to ask for apps to change to enable shadow
stack and we should avoid doing anything too awkward in pursuit of
getting it to work completely transparently.

For ucontext, there was some discussion about implementing changes to
the makecontext() interface that would allow the app to allocate and
manage its own shadow stacks. The app would then be responsible for
allocating and freeing the shadow stacks. It seems a little more
straightforward.


For x86, due to some existing GCC binaries that jumped ahead of the
kernel support, it will likely require an ABI opt-in to enable alt
shadow stacks. So alt shadow stack support design is still pretty open
on the x86 side. Very glad to get broader input on it.

> 
> Kernel-allocated shadow stacks with a unique VM type are still useful since
> they allow stray writes to crash at the time the stray write is performed,
> rather than delaying the crash until the next shadow stack read.
> 
> The pthread and makecontext changes could be purely libc side, but we would
> need kernel support for sigaltstack and page usage changes.
> 
> Luckily, there is no need to support stacks which are simultaneously used
> from more than one process, so "is this a shadow stack page" can be tracked
> purely at the vma/pte level without any need to involve the inode.  POSIX
> explicitly allows using mmap to obtain stack memory and does not forbid
> MAP_SHARED; I consider this sufficiently perverse application behavior that
> it is not necessary to ensure exclusive use of the underlying pages while
> a shadow stack pte exists.  (Applications that use MAP_SHARED for stacks
> do not get the full benefit of the shadow stack but they keep POSIX.1-2004
> conformance, applications that allocate stacks exclusively in MAP_PRIVATE
> memory lose no security.)

On x86 we don't support MAP_SHARED shadow stacks. There is a whole
snarl around the dirty bit in the PTE. I'm not sure it's impossible but
it was gladly avoided. There is also a benefit in avoiding having them
get mapped as writable in a different context.

> 
> The largest complication of this scheme is likely to be that the shadow
> stack usage property of a page needs to survive the page being swapped out
> and back in, which likely means that it must be present in the swap PTE.
> 
> I am substantially less familiar with GCS and SHSTK than with Zicfiss.
> It is likely that a syscall or other mechanism is needed to initialize the
> shadow stack in flexible memory for makecontext.

The ucontext stacks (and, per the current plan, alt shadow stacks) need
to have a "restore token". So, yeah, you would probably need some
syscall to "convert" the normal stack memory into shadow stack with a
restore token.

> 
> Is there interest on the kernel side in having mechanisms to fully support
> POSIX.1-2004 with GCS or Zicfiss enabled?

Can you clarify: is the goal to comply with the spec, or to make more
apps run with shadow stack automatically?