linux-kernel - Re: CET shadow stack app compatibility

Message-ID: <87a645prpj.fsf@oldenburg.str.redhat.com>
Date:   Fri, 02 Dec 2022 19:48:24 +0100
From:   Florian Weimer <fweimer@...hat.com>
To:     "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
Cc:     "Torvalds, Linus" <torvalds@...ux-foundation.org>,
        "keescook@...omium.org" <keescook@...omium.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "hjl.tools@...il.com" <hjl.tools@...il.com>,
        "x86@...nel.org" <x86@...nel.org>
Subject: Re: CET shadow stack app compatibility

* Rick P. Edgecombe:

> For IBT, which seems to be in worse shape than shadow stack from an
> existing userspace perspective, I have also seen shared objects with
> issues.
>
> For shadow stack, it was just JITing binaries.

Except that the actual JITters are usually in shared objects, too, and
you just assume here that they get loaded by a main program from the
same build. 8-) I think most of them are reusable independently, or are
bundled into applications built with a different toolchain.

> Of course if glibc is compiled in non-permissive mode there is an
> additional category of issues around dlopen()ing that we haven't even
> discussed yet. And the past issues around makecontext() we have
> already worked around from the kernel. If you are aware of any other
> specific compatibility problems, please share so we can discuss the
> extent.

H.J. ran most of the experiments on Fedora.  We did some early
validation many years ago, using the first ABI iteration.  We didn't
have as much reach as we would have liked in terms of hardening at the
time, if I recall correctly, but there were only very few cases where
something did not work and was also not marked as incompatible.

>>   The posted hack didn't even
>> deal with that case.  If the main executable has the current markers,
>> the kernel will not disable shadow stack, and the process will still
>> crash after loading the incorrectly marked shared object.
>
> The proposed glibc changes would not enable shadow stack unless the
> execing binary has the elf bit marked. So if we block those binaries
> (which the kernel can easily check) from enabling shadow stack, none of
> the linked shared objects will have shadow stack either. So I think we
> are ok to hold this in our back pocket to resolve the known issues if
> anyone complains.

See above: the implicit assumption here, that the JITter and the main
program come from the same build, does not actually hold in practice.
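
To make the failure mode concrete, here is a toy model of the
exec-gated policy discussed above.  It is only a sketch: the Obj type,
the outcome() function, and the file names are hypothetical, and it
assumes every shared object is already loaded and that unmarked shared
objects are out of scope.  It is not how glibc or the kernel implement
anything.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    marked: bool      # carries the SHSTK marker in its ELF properties
    compatible: bool  # actually works with shadow stack enabled


def outcome(exe, shared_objects, kernel_blocks_exe=False):
    """Toy model: shadow stack is enabled only if the main executable
    is marked and the kernel has not blocked that executable."""
    if not exe.marked or kernel_blocks_exe:
        return "shstk-off"
    # Shadow stack is on, so a marked but incompatible shared object
    # (for example a JITter) is now a problem, whoever built it.
    for so in shared_objects:
        if so.marked and not so.compatible:
            return "crash-risk: " + so.name
    return "shstk-on"


jit = Obj("libjit.so", marked=True, compatible=False)
# Same build as the JITter: the kernel can block this executable.
print(outcome(Obj("app-a", True, True), [jit], kernel_blocks_exe=True))
# Different build that bundles the same JITter: nothing blocks it.
print(outcome(Obj("app-b", True, True), [jit]))
```

The second call is the case I am worried about: blocking known-bad
executables does nothing for a correctly marked program that pulls in
the same shared object.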

> Where the shared objects could come into play is, in the event that we
> have to block the old elf bit from the kernel, and a new one is
> properly marked on a new executable, future glibcs could decide to
> honor the old bits when checking shared libraries. So you could have an
> executable with SHSTK2 bit loading a problem SO with just SHSTK1 bit.

Right.  But we can also have policies in userspace to paper over this.
I'm not worried about it.  I want to see how far we can get before
making the flip in an upstream version of glibc, but if the kernel
enforces SHSTK2 (even just on executables), I need a toolchain update
plus a rebuild of a large chunk of the distribution.
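
For reference, the existing (SHSTK1) markup is the SHSTK bit of
GNU_PROPERTY_X86_FEATURE_1_AND in the .note.gnu.property section.  A
minimal sketch of reading it from the raw section bytes follows.  It
assumes a little-endian x86-64 object with 8-byte property alignment
and a 4-byte "GNU" note name; the function names are made up for this
example, and no SHSTK2 value is shown because none has been defined.

```python
import struct

NT_GNU_PROPERTY_TYPE_0 = 5
GNU_PROPERTY_X86_FEATURE_1_AND = 0xC0000002
GNU_PROPERTY_X86_FEATURE_1_IBT = 1 << 0
GNU_PROPERTY_X86_FEATURE_1_SHSTK = 1 << 1


def x86_feature_1(note, align=8):
    """Return the x86 feature-1 bits found in raw .note.gnu.property
    bytes, or 0 if the property is absent."""
    off = 0
    while off + 12 <= len(note):
        namesz, descsz, ntype = struct.unpack_from("<III", note, off)
        off += 12
        name = note[off:off + namesz]
        off += (namesz + 3) & ~3  # name padded to 4 bytes ("GNU\0" fits)
        desc = note[off:off + descsz]
        off += (descsz + align - 1) & ~(align - 1)
        if ntype != NT_GNU_PROPERTY_TYPE_0 or name != b"GNU\0":
            continue
        pos = 0
        while pos + 8 <= len(desc):
            pr_type, pr_datasz = struct.unpack_from("<II", desc, pos)
            pos += 8
            if (pr_type == GNU_PROPERTY_X86_FEATURE_1_AND
                    and pr_datasz == 4 and pos + 4 <= len(desc)):
                return struct.unpack_from("<I", desc, pos)[0]
            pos += (pr_datasz + align - 1) & ~(align - 1)
    return 0


def has_shstk(note):
    return bool(x86_feature_1(note) & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
```

The section bytes can be extracted with any ELF tool; a userspace
policy of the kind mentioned above would make its decision on top of
bits like these.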

So with reusing SHSTK1 markup, it goes like this:

1. Get a Fedora rawhide kernel with userspace SHSTK support.
2. Get the glibc patches from H.J., and gate them behind a tunable
   (off by default).  Kernel behavior should not change with this
   new glibc because the required arch_prctl does not happen
   (and the old ones currently in glibc have different numbers).
3. Run the Fedora graphical desktop with the tunable switched on and a few key
   third-party applications to see where we stand in terms of
   compatibility.
3b. Do the same thing with RHEL and some enterprise applications
   (using the kernel and glibc from 1 & 2 for a start).
4. (Optional.) Flip the default of the tunable to on.

I don't know how quickly we can get past step 1, but it seems fairly
soon, maybe three months, considering the upcoming end-of-year break.

With SHSTK2 markup required by the kernel, it goes like this:

1. Get a Fedora rawhide kernel with userspace SHSTK support.
2. Get a SHSTK2-enabled toolchain.  GCC is currently freezing for the 13
   release, so this is not a good time of the year for that.  It's
   probably going to be a custom compiler, unless we want to wait a
   couple of months, and even then it's got to be a downstream-only
   backport at first because to upstream, this will have a “not
   finished” whiff (it's the umpteenth ABI change).
3. Get the glibc patches from H.J.  We would probably put it behind
   a tunable as well.
4. Rebuild key parts of Fedora, probably directly in rawhide (the
   rolling integration distribution).
5. Run the Fedora rawhide graphical desktop etc.
6. RHEL testing will require a SHSTK2 port to a different compiler
   and another mass rebuild.  ISV application testing is not meaningful
   until the ISVs have switched to a newer compiler.

That's going to take much longer than three months.  Maybe we have to do
this in the end, but even then, we have no way of forcing developers to
test on SHSTK-capable hardware with a new-enough kernel and glibc before
turning on the SHSTK2 bit.

In the end, we might still need SHSTK2, but we don't know that yet, and
the first approach is quite cheap, so I really want to try it.

Keep in mind that just because some useful interface is provided by the
kernel, we can't necessarily use it in glibc immediately because with
all those seccomp filters out there (and other dependencies on internal
glibc/kernel interface details), too much would break if we exposed it
into existing applications without some coordination.  SHSTK isn't
*that* different, except that we have some binary markup to guide us at
run time.

> But I still don't see why doing the order:
> 1. kernel support
> 2. libc support
> 3. compiler support
>
> ...wouldn't have generated a more normal situation where old binaries
> don't break against new kernels and testing can easily happen to reduce
> issues further. So we could still reset and do exactly that.

No matter in which order you do it, some group will want to change ABI
or semantics.  We actually had multiple different iterations in
different orders, and everybody wanted to put their mark onto this
feature, changing the ABI.  I don't care at all about the internal ABI
between glibc and the kernel, but the markup of the binaries (besides
glibc itself) is quite important to me.

In retrospect, separating SHSTK from IBT from the start would have
helped a lot because I think we could have done that in libc without
compiler support.  But I don't think anyone expected this to take four
to five years to implement (or probably longer for IBT).

>>   Instead, we'd have to
>> wait for a rebuild with the new markers, and of course this rebuild
>> will put us in exactly the same position as before: the
>> incompatibilities will be back because they are no longer masked by
>> the kernel.
>
> People building new apps and testing them against upstream kernels and
> finding issues sounds like business as usual. I'm not trying to solve
> all possible userspace mistakes from the kernel.

They also have to test on the right hardware and with a new/unreleased
glibc.

I think it would be helpful to those developers if we could give them an
existing distribution early on they can use for experiments.  Not just
getting SHSTK going, but also playing with the perf integration (which
to me is the real goal here).

Thanks,
Florian