linux-kernel - Re: [PATCH v7 01/41] Documentation/x86: Add CET shadow stack description

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8153f5d15ec6aa4a221fb945e16d315068bd06e4.camel@intel.com>
Date:   Thu, 2 Mar 2023 21:17:49 +0000
From:   "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To:     "david@...hat.com" <david@...hat.com>,
        "bsingharora@...il.com" <bsingharora@...il.com>,
        "hpa@...or.com" <hpa@...or.com>,
        "Syromiatnikov, Eugene" <esyr@...hat.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "rdunlap@...radead.org" <rdunlap@...radead.org>,
        "keescook@...omium.org" <keescook@...omium.org>,
        "Eranian, Stephane" <eranian@...gle.com>,
        "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
        "szabolcs.nagy@....com" <szabolcs.nagy@....com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "fweimer@...hat.com" <fweimer@...hat.com>,
        "nadav.amit@...il.com" <nadav.amit@...il.com>,
        "jannh@...gle.com" <jannh@...gle.com>,
        "dethoma@...rosoft.com" <dethoma@...rosoft.com>,
        "broonie@...nel.org" <broonie@...nel.org>,
        "kcc@...gle.com" <kcc@...gle.com>,
        "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
        "bp@...en8.de" <bp@...en8.de>, "oleg@...hat.com" <oleg@...hat.com>,
        "hjl.tools@...il.com" <hjl.tools@...il.com>,
        "Yang, Weijiang" <weijiang.yang@...el.com>,
        "Lutomirski, Andy" <luto@...nel.org>,
        "pavel@....cz" <pavel@....cz>, "arnd@...db.de" <arnd@...db.de>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "Schimpe, Christina" <christina.schimpe@...el.com>,
        "mike.kravetz@...cle.com" <mike.kravetz@...cle.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "debug@...osinc.com" <debug@...osinc.com>,
        "jamorris@...ux.microsoft.com" <jamorris@...ux.microsoft.com>,
        "john.allen@....com" <john.allen@....com>,
        "rppt@...nel.org" <rppt@...nel.org>,
        "andrew.cooper3@...rix.com" <andrew.cooper3@...rix.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "corbet@....net" <corbet@....net>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
        "gorcunov@...il.com" <gorcunov@...il.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>
CC:     "Yu, Yu-cheng" <yu-cheng.yu@...el.com>, "nd@....com" <nd@....com>
Subject: Re: [PATCH v7 01/41] Documentation/x86: Add CET shadow stack
 description

On Thu, 2023-03-02 at 16:14 +0000, szabolcs.nagy@....com wrote:
> The 03/01/2023 18:07, Edgecombe, Rick P wrote:
> > On Wed, 2023-03-01 at 14:21 +0000, Szabolcs Nagy wrote:
> > > The 02/27/2023 14:29, Rick Edgecombe wrote:
> > > > +Application Enabling
> > > > +====================
> > > > +
> > > > +An application's CET capability is marked in its ELF note and
> > > > can
> > > > be verified
> > > > +from readelf/llvm-readelf output::
> > > > +
> > > > +    readelf -n <application> | grep -a SHSTK
> > > > +        properties: x86 feature: SHSTK
> > > > +
> > > > +The kernel does not process these applications markers
> > > > directly.
> > > > Applications
> > > > +or loaders must enable CET features using the interface
> > > > described
> > > > in section 4.
> > > > +Typically this would be done in dynamic loader or static
> > > > runtime
> > > > objects, as is
> > > > +the case in GLIBC.
> > > 
> > > Note that this has to be an early decision in libc (ld.so or
> > > static
> > > exe start code), which will be difficult to hook into system wide
> > > security policy settings. (e.g. to force shstk on marked
> > > binaries.)
> > 
> > In the eager enabling (by the kernel) scenario, how is this
> > improved?
> > The loader has to have the option to disable the shadow stack if
> > enabling conditions are not met, so it still has to trust userspace
> > to
> > not do that. Did you have any more specifics on how the policy
> > would
> > work?
> 
> i guess my issue is that the arch prctls only allow self policing.
> there is no kernel mechanism to set policy from outside the process
> that is either inherited or asynchronously set. policy is completely
> managed by libc (and done very early).
> 
> now i understand that async disable does not work (thanks for the
> explanation), but some control for forced enable/locking inherited
> across exec could work.

Is the idea that shadow stack would be forced on regardless of if the
linked libraries support it? In which case it could be allowed to crash
if they do not?

I think the majority of users would prefer the other case where shadow
stack is only used if supported, so this sounds like a special case.
Rather than lose the flexibility for the typical case, I would think
something like this could be an additional enabling mode. glibc could
check if shadow stack is already enabled by the kernel using the
arch_prctl()s in this case.

We are having to work around the existing broken glibc binaries by not
triggering off the elf bits automatically in the kernel, but I suppose
if this was a special "I don't care if it crashes" feature, maybe it
would be ok. Otherwise we would need to change the elf header bit to
exclude the old binaries to even be able to do this, and there was
extreme resistance to this idea from the userspace side.

> 
> > > From userspace POV I'd prefer if a static exe did not have to
> > > parse
> > > its own ELF notes (i.e. kernel enabled shstk based on the
> > > marking).
> > 
> > This is actually exactly what happens in the glibc patches. My
> > understand was that it already been discussed amongst glibc folks.
> 
> there were many glibc patches some of which are committed despite not
> having an accepted linux abi, so i'm trying to review the linux abi
> contracts and expect this patch to be authorative, please bear with
> me.

H.J. has some recent ones that work against this kernel series that
might interest you. The existing upstream glibc support will not get
used due to the enabling interface change to arch_prctl() (this was one
of the inspirations of the change actually).

> 
> > > - I think it's better to have a new limit specifically for shadow
> > >   stack size (which by default can be RLIMIT_STACK) so userspace
> > >   can adjust it if needed (another reason is that stack size is
> > >   not always a good indicator of max call depth).
> > 
> > Hmm, yea. This seems like a good idea, but I don't see why it can't
> > be
> > a follow on. The series is quite big just to get the basics. I have
> > tried to save some of the enhancements (like alt shadow stack) for
> > the
> > future.
> 
> it is actually not obvious how to introduce a limit so it is
> inherited
> or reset in a sensible way so i think it is useful to discuss it
> together with other issues.

Looking at this again, I'm not sure why a new rlimit is needed. It
seems many of those points were just formulations of that the clone3
stack size was not used, but it actually is and just not documented. If
you disagree perhaps you could elaborate on what the requirements are
and we can see if it seems tricky to do in a follow up.

> 
> > > The kernel pushes on the shadow stack on signal entry so shadow
> > > stack
> > > overflow cannot be handled. Please document this as non-
> > > recoverable
> > > failure.
> > 
> > It doesn't hurt to call it out. Please see the below link for
> > future
> > plans to handle this scenario (alt shadow stack).
> > 
> > > 
> > > I think it can be made recoverable if signals with alternate
> > > stack
> > > run
> > > on a different shadow stack. And the top of the thread shadow
> > > stack
> > > is
> > > just corrupted instead of pushed in the overflow case. Then
> > > longjmp
> > > out
> > > can be made to work (common in stack overflow handling cases),
> > > and
> > > reliable crash report from the signal handler works (also
> > > common).
> > > 
> > > Does SSP get stored into the sigcontext struct somewhere?
> > 
> > No, it's pushed to the shadow stack only. See the v2 coverletter of
> > the
> > discussion on the design and reasoning:
> > 
> > 
https://lore.kernel.org/lkml/20220929222936.14584-1-rick.p.edgecombe@intel.com/
> 
> i think this should be part of the initial design as it may be hard
> to change later.

This is actually how it came up. Andy Lutomirski said, paraphrasing,
"what if we want alt shadow stacks someday, does the signal frame ABI
support it?". So I created an ABI that supports it and an initial POC,
and said lets hold off on the implementation for the first version and
just use the sigframe ABI that will allow it for the future. So the
point was to make sure the signal format supported alt shadow stacks to
make it easier in the future.

> 
> "sigaltshstk() is separate from sigaltstack(). You can have one
> without the other, neither or both together. Because the shadow
> stack specific state is pushed to the shadow stack, the two
> features don’t need to know about each other."
> 
> this means they cannot be changed together atomically.

Not sure why this is needed since they can be used separately. So why
tie them together?

> 
> i'd expect most sigaltstack users to want to be resilient
> against shadow stack overflow which means non-portable
> code changes.

Portable between architectures? Or between shadow stack vs non-shadow
stack?

It does seem like it would not be uncommon for users to want both
together, but see below.

> 
> i don't see why automatic alt shadow stack allocation would
> not work (kernel manages it transparently when an alt stack
> is installed or disabled).

Ah, I think I see where maybe I can fill you in. Andy Luto had
discounted this idea out of hand originally, but I didn't see it at
first. sigaltstack lets you set, retrieve, or disable the shadow stack,
right... But this doesn't allocate anything, it just sets where the
next signal will be handled. This is different than things like threads
where there is a new resources being allocated and it makes coming up
with logic to guess when to de-allocate the alt shadow stack difficult.
You probably already know...

But because of this there can be some modes where the shadow stack is
changed while on it. For one example, SS_AUTODISARM will disable the
alt shadow stack while switching to it and restore when sigreturning.
At which point a new altstack can be set. In the non-shadow stack case
this is nice because future signals won't clobber the alt stack if you
switch away from it (swapcontext(), etc). But it also means you can
"change" the alt stack while on it ("change" sort of, the auto disarm
results in the kernel forgetting it temporarily).

I hear where you are coming from with the desire to have it "just work"
with existing code, but I think the resulting ABI around the alt shadow
stack allocation lifecycle would be way too complicated even if it
could be made to work. Hence making a new interface. But also, the idea
was that the x86 signal ABI should support handling alt shadow stacks,
which is what we have done with this series. If a different interface
for configuring it is better than the one from the POC, I'm not seeing
a problem jump out. Is there any specific concern about backwards
compatibility here?

> 
> "Since shadow alt stacks are a new feature, longjmp()ing from an
> alt shadow stack will simply not be supported. If a libc want’s
> to support this it will need to enable WRSS and write it’s own
> restore token."
> 
> i think longjmp should work without enabling writes to the shadow
> stack in the libc. this can also affect unwinding across signal
> handlers (not for c++ but e.g. glibc thread cancellation).

glibc today does not support longjmp()ing from a different stack (for
example even today after a swapcontext()) when shadow stack is used. If
glibc used wrss it could be supported maybe, but otherwise I don't see
how the HW can support it.

HJ and I were actually just discussing this the other day. Are you
looking at this series with respect to the arm shadow stack feature by
any chance? I would love if glibc/tools would document what the shadow
stack limitations are. If the all the arch's have the same or similar
limitations perhaps this could be one developer guide. For the most
part though, the limitations I've encountered are in glibc and the
kernel is more the building blocks.

> 
> i'd prefer overwriting the shadow stack top entry on overflow to
> disallowing longjmp out of a shadow stack overflow handler.
> 
> > > I think the map_shadow_stack syscall should be mentioned in this
> > > document too.
> > 
> > There is a man page prepared for this. I plan to update the docs to
> > reference it when it exists and not duplicate the text. There can
> > be a
> > blurb for the time being but it would be short lived.
> 
> i wanted to comment on the syscall because i think it may be better
> to have a magic mmap MAP_ flag that takes care of everything.
> 
> but i can go comment on the specific patch then.
> 
> thanks.

A general comment. Not sure if you are aware, but this shadow stack
enabling effort is quite old at this point and there have been many
discussions on these topics stretching back years. The latest
conversation was around getting this series into linux-next soon to get
some testing on the MM pieces. I really appreciate getting this ABI
feedback as it is always tricky to get right, but at this stage I would
hope to be focusing mostly on concrete problems.

I also expect to have some amount of ABI growth going forward with all
the normal things that entails. Shadow stack is not special in that it
can come fully finalized without the need for the real world usage
iterative feedback process. At some point we need to move forward with
something, and we have quite a bit of initial changes at this point.

So I would like to minimize the initial implementation unless anyone
sees any likely problems with future growth. Can you be clear if you
see any concrete problems at this point or are more looking to evaluate
the design reasoning? I'm under the assumption there is nothing that
would prohibit linux-next testing while any ABI shakedown happens
concurrently at least?

Thanks,
Rick