linux-kernel - Re: extending ucontext (Re: [PATCH v26 25/30] x86/cet/shstk: Handle signals for shadow stack)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YIrSKlGmH84A/RF5@grain>
Date:   Thu, 29 Apr 2021 18:35:06 +0300
From:   Cyrill Gorcunov <gorcunov@...il.com>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Yu-cheng Yu <yu-cheng.yu@...el.com>,
        linux-arch <linux-arch@...r.kernel.org>, X86 ML <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Linux API <linux-api@...r.kernel.org>,
        Arnd Bergmann <arnd@...db.de>,
        Balbir Singh <bsingharora@...il.com>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Eugene Syromiatnikov <esyr@...hat.com>,
        Florian Weimer <fweimer@...hat.com>,
        "H.J. Lu" <hjl.tools@...il.com>, Jann Horn <jannh@...gle.com>,
        Jonathan Corbet <corbet@....net>,
        Kees Cook <keescook@...omium.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Nadav Amit <nadav.amit@...il.com>,
        Oleg Nesterov <oleg@...hat.com>, Pavel Machek <pavel@....cz>,
        Peter Zijlstra <peterz@...radead.org>,
        Randy Dunlap <rdunlap@...radead.org>,
        "Ravi V. Shankar" <ravi.v.shankar@...el.com>,
        Vedvyas Shanbhogue <vedvyas.shanbhogue@...el.com>,
        Dave Martin <Dave.Martin@....com>,
        Weijiang Yang <weijiang.yang@...el.com>,
        Pengfei Xu <pengfei.xu@...el.com>,
        Haitao Huang <haitao.huang@...el.com>,
        Andrey Vagin <avagin@...il.com>
Subject: Re: extending ucontext (Re: [PATCH v26 25/30] x86/cet/shstk: Handle
 signals for shadow stack)

On Thu, Apr 29, 2021 at 07:44:03AM -0700, Andy Lutomirski wrote:
> On Thu, Apr 29, 2021 at 12:28 AM Cyrill Gorcunov <gorcunov@...il.com> wrote:
> >
> > On Wed, Apr 28, 2021 at 04:03:55PM -0700, Andy Lutomirski wrote:
> > > On Tue, Apr 27, 2021 at 1:44 PM Yu-cheng Yu <yu-cheng.yu@...el.com> wrote:
> > > >
> > > > When shadow stack is enabled, a task's shadow stack states must be saved
> > > > along with the signal context and later restored in sigreturn.  However,
> > > > currently there is no systematic facility for extending a signal context.
> > > > There is some space left in the ucontext, but changing ucontext is likely
> > > > to create compatibility issues and there is not enough space for further
> > > > extensions.
> > > >
> > > > Introduce a signal context extension struct 'sc_ext', which is used to save
> > > > shadow stack restore token address.  The extension is located above the fpu
> > > > states, plus alignment.  The struct can be extended (such as the ibt's
> > > > wait_endbr status to be introduced later), and sc_ext.total_size field
> > > > keeps track of total size.
> > >
> > > I still don't like this.
> > >
> > > Here's how the signal layout works, for better or for worse:
> > >
> > > The kernel has:
> > >
> > > struct rt_sigframe {
> > >     char __user *pretcode;
> > >     struct ucontext uc;
> > >     struct siginfo info;
> > >     /* fp state follows here */
> > > };
> > >
> > > This is roughly the actual signal frame.  But userspace does not have
> > > this struct declared, and user code does not know the sizes of the
> > > fields.  So it's accessed in a nonsensical way.  The signal handler
> >
> > Well, not really. While indeed this is not declared as a part of API
> > the structure is widely used for rt_sigreturn syscall (and we're using
> > it inside criu thus any change here will simply break the restore
> > procedure). Sorry out of time right now, I'll read your mail more
> > carefully once time permit.
> 
> I skimmed the CRIU code.  You appear to declare struct rt_sigframe,
> and you use the offset from the start of rt_sigframe to uc.  You also
> use the offset to the /* fp state follows here */ part, but that's
> unnecessary -- you could just as easily have put the fp state at any
> other address -- the kernel will happily follow the pointer you supply
> regardless of where it points.  So the only issues I can see are if

Yes, since rt_sigreturn is a very late stage of restore we've to preallocate
memory early and surely doing it in one slab is a way more convenient, thus
that's why fp context follows the structure declaration (well, it was at
least I didn't touch this code for years already). Anyway, I meant that
assuming "user code does not know the sizes of the fields" is not exactly
true and strictly speaking current layout became a part of nondeclared api
(simply because kernel use this structure run-time only and even if something
get changed in kernel declaration for this structure the kernel itself will
be fine because it doesn't save it on disk between kernel updates) but for
tools like criu such change might require update. IOW, I mean that changing
rt_sigreturn structure itself is not safe and we can't reorder the members
and theirs size.

Or I misunderstood you, and you meant only the fact that fp may be arbitrary,
pointing for any place in userspace? If so then indeed, fp can be pointing
to context placed anywere since kernel will fetch it from the address during
rt_sigreturn processing.

> you write the fp state on top of something else or if you
> inadvertently fill in the proposed extension part of uc_flags.  Right
> now you seem to be ignoring uc_flags, which I presume means that you
> are filling it in as zero.  Even if the offset of the fp state in the

yes

> kernel rt_sigframe changes, the kernel should still successfully parse
> the signal frame you generate.
> 
> I suppose there is another potential issue: would CRIU have issues if
> the *save* runs on a kernel that uses this proposed extension
> mechanism?  Are you doing something with the saved state that would
> get confused?

You know, actually I guess criu may not work with new extensions, we have
a predefined max xsave frame size and any new extensions to cpu states
will require additional support.

I must confess I didn't follow much on CET internals yet so I must be simply
wrong. CC'ing Anrew just in case.

	Cyrill