[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c5ae83588a7e107beaf858ab04961e70d16fe32c.camel@intel.com>
Date: Wed, 21 Jun 2023 18:54:17 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "broonie@...nel.org" <broonie@...nel.org>,
"szabolcs.nagy@....com" <szabolcs.nagy@....com>
CC: "Xu, Pengfei" <pengfei.xu@...el.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
"kcc@...gle.com" <kcc@...gle.com>,
"Lutomirski, Andy" <luto@...nel.org>,
"nadav.amit@...il.com" <nadav.amit@...il.com>,
"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
"david@...hat.com" <david@...hat.com>,
"Schimpe, Christina" <christina.schimpe@...el.com>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"corbet@....net" <corbet@....net>, "nd@....com" <nd@....com>,
"dethoma@...rosoft.com" <dethoma@...rosoft.com>,
"jannh@...gle.com" <jannh@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"debug@...osinc.com" <debug@...osinc.com>,
"pavel@....cz" <pavel@....cz>, "bp@...en8.de" <bp@...en8.de>,
"mike.kravetz@...cle.com" <mike.kravetz@...cle.com>,
"linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
"rppt@...nel.org" <rppt@...nel.org>,
"jamorris@...ux.microsoft.com" <jamorris@...ux.microsoft.com>,
"arnd@...db.de" <arnd@...db.de>,
"john.allen@....com" <john.allen@....com>,
"rdunlap@...radead.org" <rdunlap@...radead.org>,
"bsingharora@...il.com" <bsingharora@...il.com>,
"oleg@...hat.com" <oleg@...hat.com>,
"andrew.cooper3@...rix.com" <andrew.cooper3@...rix.com>,
"keescook@...omium.org" <keescook@...omium.org>,
"x86@...nel.org" <x86@...nel.org>,
"gorcunov@...il.com" <gorcunov@...il.com>,
"Yu, Yu-cheng" <yu-cheng.yu@...el.com>,
"fweimer@...hat.com" <fweimer@...hat.com>,
"hpa@...or.com" <hpa@...or.com>,
"mingo@...hat.com" <mingo@...hat.com>,
"hjl.tools@...il.com" <hjl.tools@...il.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"Syromiatnikov, Eugene" <esyr@...hat.com>,
"Torvalds, Linus" <torvalds@...ux-foundation.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"Yang, Weijiang" <weijiang.yang@...el.com>,
"Eranian, Stephane" <eranian@...gle.com>
Subject: Re: [PATCH v9 23/42] Documentation/x86: Add CET shadow stack
description
On Wed, 2023-06-21 at 12:36 +0100, szabolcs.nagy@....com wrote:
> > The 06/20/2023 19:34, Edgecombe, Rick P wrote:
> > > > On Tue, 2023-06-20 at 10:17 +0100, szabolcs.nagy@....com wrote:
> > > > > > if there is a fix that's good, i haven't seen it.
> > > > > >
> > > > > > my point was that the current unwinder works with current
> > > > > > kernel
> > > > > > patches, but does not allow future extensions which
> > > > > > prevents
> > > > > > sigaltshstk to work. the unwinder is not versioned so this
> > > > > > cannot
> > > > > > be fixed later. it only works if distros ensure shstk is
> > > > > > disabled
> > > > > > until the unwinder is fixed. (however there is no way to
> > > > > > detect
> > > > > > old unwinder if somebody builds gcc from source.)
> > > >
> > > > This is a problem the kernel is having to deal with, not
> > > > causing. > > The
> > > > userspace changes were upstreamed before the kernel. Userspace
> > > > > > folks
> > > > are adamantly against moving to a new elf bit, to start over
> > > > with a
> > > > clean slate. I tried everything to influence this and was not
> > > > successful. So I'm still not sure what the proposal here is for
> > > > the
> > > > kernel.
> >
> > i agree, the glibc and libgcc patches should not have been accepted
> > before a linux abi.
> >
> > but the other direction also holds: the linux patches should not be
> > pushed before the userspace design is discussed. (the current code
> > upstream is wrong, and new code for the proposed linux abi is not
> > posted yet. this is not your fault, i'm saying it here, because the
> > discussion is here.)
This series has been discussed with glibc/gcc developers regularly
throughout the enabling effort. In fact there have been ongoing
discussions about future shadow stack functionality.
It's not like this feature has been a fast or hidden effort. You are
just walking into the tail end of it. (much of it predates my
involvement BTW, including the initial glibc support)
AFAIK HJ presented the enabling changes at some glibc meeting. The
signal side of glibc is unchanged from what is already upstream. So I'm
not sure characterizing it that way is fair. It seems you were not part
of those old discussions, but that might be because your interest is
new. In any case we are constrained by some of these earlier outcomes.
More on that below.
> >
> > > > I am guessing that the fnon-call-exceptions/expanded frame size
> > > > incompatibilities could end up causing something to grow an
> > > > opt-in > > at
> > > > some point.
> >
> > there are independent userspace components and not every component
> > has a chance to opt-in.
> >
> > > > > > how does "fixed shadow stack signal frame size" relates to
> > > > > > "-fnon-call-exceptions"?
> > > > > >
> > > > > > if there were instruction boundaries within a function
> > > > > > where the
> > > > > > ret addr is not yet pushed or already poped from the shstk
> > > > > > then
> > > > > > the flag would be relevant, but since push/pop happens
> > > > > > atomically
> > > > > > at function entry/return -fnon-call-exceptions makes no
> > > > > > difference as far as shstk unwinding is concerned.
> > > >
> > > > As I said, the existing unwinding code for fnon-call-
> > > > excecptions
> > > > assumes a fixed shadow stack signal frame size of 8 bytes.
> > > > Since > > the
> > > > exception is thrown out of a signal, it needs to know how to
> > > > unwind
> > > > through the shadow stack signal frame.
> >
> > sorry but there is some misunderstanding about -fnon-call-
> > exceptions.
> >
> > it is for emitting cleanup and exception handler data for a
> > function
> > such that throwing from certain instructions within that function
> > works, while normally only throwing from calls work.
> >
> > it is not about *unwinding* from an async signal handler, which is
> > -fasynchronous-unwind-tables and should always work on linux, nor
> > for
> > dealing with cleanup/exception handlers above the interrupted frame
> > (likewise it works on linux without special cflags).
> >
> > as far as i can tell the current unwinder handles shstk unwinding
> > correctly across signal handlers (sync or async and >
> > cleanup/exceptions
> > handlers too), i see no issue with "fixed shadow stack signal frame
> > size of 8 bytes" other than future extensions and discontinous
> > shstk.
HJ, can you link your patch that makes it extensible and we can clear
this up? Maybe the issue extends beyond fnon-call-exceptions, but that
is where I reproduced it.
> >
> > > > > > there is no magic, longjmp should be implemented as:
> > > > > >
> > > > > > target_ssp = read from jmpbuf;
> > > > > > current_ssp = read ssp;
> > > > > > for (p = target_ssp; p != current_ssp; p--) {
> > > > > > if (*p == restore-token) {
> > > > > > // target_ssp is on a different
> > > > > > shstk.
> > > > > > switch_shstk_to(p);
> > > > > > break;
> > > > > > }
> > > > > > }
> > > > > > for (; p != target_ssp; p++)
> > > > > > // ssp is now on the same shstk as target.
> > > > > > inc_ssp();
> > > > > >
> > > > > > this is what setcontext is doing and longjmp can do the
> > > > > > same:
> > > > > > for programs that always longjmp within the same shstk the
> > > > > > first
> > > > > > loop is just p = current_ssp, but it also works when
> > > > > > longjmp
> > > > > > target is on a different shstk assuming nothing is running
> > > > > > on
> > > > > > that shstk, which is only possible if there is a restore
> > > > > > token
> > > > > > on top.
> > > > > >
> > > > > > this implies if the kernel switches shstk on signal entry
> > > > > > it has
> > > > > > to add a restore-token on the switched away shstk.
> > > >
> > > > I actually did a POC for this, but rejected it. The problem is,
> > > > if
> > > > there is a shadow stack overflow at that point then the kernel
> > > > > > can't
> > > > push the shadow stack token to the old stack. And shadow stack
> > > > > > overflow
> > > > is exactly the alt shadow stack use case. So it doesn't really
> > > > > > solve
> > > > the problem.
> >
> > the restore token in the alt shstk case does not regress anything
> > but
> > makes some use-cases work.
> >
> > alt shadow stack is important if code tries to jump in and out of
> > signal handlers (dosemu does this with swapcontext) and for that a
> > restore token is needed.
> >
> > alt shadow stack is important if the original shstk did not
> > overflow
> > but the signal handler would overflow it (small thread stack, huge
> > sigaltstack case).
> >
> > alt shadow stack is also important for crash reporting on shstk
> > overflow even if longjmp does not work then. longjmp to a
> > makecontext
> > stack would still work and longjmp back to the original stack can
> > be
> > made to mostly work by an altshstk option to overwrite the top
> > entry
> > with a restore token on overflow (this can break unwinding though).
> >
There was previously a request to create an alt shadow stack for the
purpose of handling shadow stack overflow. So you are now suggesting to
to exclude that and instead target a different use case for alt shadow
stack?
But I'm not sure how much we should change the ABI at this point since
we are constrained by existing userspace. If you read the history, we
may end up needing to deprecate the whole elf bit for this and other
reasons.
So should we struggle to find a way to grow the existing ABI without
disturbing the existing userspace? Or should we start with something,
finally, and see where we need to grow and maybe get a chance at a
fresh start to grow it?
Like, maybe 3 people will show up saying "hey, I *really* need to use
shadow stack and longjmp from a ucontext stack", and no one says
anything about shadow stack overflow. Then we know what to do. And
maybe dosemu decides it doesn't need to implement shadow stack (highly
likely I would think). Now that I think about it, AFAIU SS_AUTODISARM
was created for dosemu, and the alt shadow stack patch adopted this
behavior. So it's speculation that there is even a problem in that
scenario.
Or maybe people just enable WRSS for longjmp() and directly jump back
to the setjmp() point. Do most people want fast setjmp/longjmp() at the
cost of a little security?
Even if, with enough discussion, we could optimize for all
hypotheticals without real user feedback, I don't see how it helps
users to hold shadow stack. So I think we should move forward with the
current ABI.
Powered by blists - more mailing lists