[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210320173200.GA4153106@gmail.com>
Date: Sat, 20 Mar 2021 18:32:00 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Len Brown <lenb@...nel.org>
Cc: "Chang S. Bae" <chang.seok.bae@...el.com>,
Borislav Petkov <bp@...e.de>,
Thomas Gleixner <tglx@...utronix.de>,
Andy Lutomirski <luto@...nel.org>, X86 ML <x86@...nel.org>,
"Brown, Len" <len.brown@...el.com>,
Dave Hansen <dave.hansen@...el.com>, hjl.tools@...il.com,
Dave Martin <Dave.Martin@....com>, jannh@...gle.com,
mpe@...erman.id.au, carlos@...hat.com,
"bothersome-borer for tony.luck@...el.com" <tony.luck@...el.com>,
"Ravi V. Shankar" <ravi.v.shankar@...el.com>,
libc-alpha@...rceware.org, linux-arch@...r.kernel.org,
linux-api@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v7 0/6] x86: Improve Minimum Alternate Stack Size
* Len Brown <lenb@...nel.org> wrote:
> On Wed, Mar 17, 2021 at 6:45 AM Ingo Molnar <mingo@...nel.org> wrote:
> >
> >
> > * Ingo Molnar <mingo@...nel.org> wrote:
> >
> > >
> > > * Chang S. Bae <chang.seok.bae@...el.com> wrote:
> > >
> > > > During signal entry, the kernel pushes data onto the normal userspace
> > > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > > which has grown over time as new features and larger registers have been
> > > > added to the architecture.
> > > >
> > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > > constant indicates to userspace how much data the kernel expects to push on
> > > > the user stack, [2][3].
> > > >
> > > > However, this constant is much too small and does not reflect recent
> > > > additions to the architecture. For instance, when AVX-512 states are in
> > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > > >
> > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > > cause user stack overflow when delivering a signal.
> > >
> > > > uapi: Define the aux vector AT_MINSIGSTKSZ
> > > > x86/signal: Introduce helpers to get the maximum signal frame size
> > > > x86/elf: Support a new ELF aux vector AT_MINSIGSTKSZ
> > > > selftest/sigaltstack: Use the AT_MINSIGSTKSZ aux vector if available
> > > > x86/signal: Detect and prevent an alternate signal stack overflow
> > > > selftest/x86/signal: Include test cases for validating sigaltstack
> > >
> > > So this looks really complicated, is this justified?
> > >
> > > Why not just internally round up sigaltstack size if it's too small?
> > > This would be more robust, as it would fix applications that use
> > > MINSIGSTKSZ but don't use the new AT_MINSIGSTKSZ facility.
> > >
> > > I.e. does AT_MINSIGSTKSZ have any other uses than avoiding the
> > > segfault if MINSIGSTKSZ is used to create a small signal stack?
> >
> > I.e. if the kernel sees a too small ->ss_size in sigaltstack() it
> > would ignore ->ss_sp and mmap() a new sigaltstack instead and use that
> > for the signal handler stack.
> >
> > This would automatically make MINSIGSTKSZ - and other too small sizes
> > work today, and in the future.
> >
> > But the question is, is there user-space usage of sigaltstacks that
> > relies on controlling or reading the contents of the stack?
> >
> > longjmp using programs perhaps?
>
> For the legacy binary that requests a too-small sigaltstack, there are
> several choices:
>
> We could detect the too-small stack at sigaltstack(2) invocation and
> return an error.
> This results in two deal-killing problems:
> First, some applications don't check the return value, so the check
> would be fruitless.
> Second, those that check and error-out may be programs that never
> actually take the signal, and so we'd be causing a dusty binary to
> exit, when it didn't exit on another system, or another kernel.
>
> Or we could detect the too small stack at signal registration time.
> This has the same two deal-killers as above.
>
> Then there is the approach in this patch-set, which detects an
> imminent stack overflow at run time.
> It has neither of the two problems above, and the benefit that we now
> prevent data corruption
> that could have been happening on some systems already today. The
> down side is that the dusty binary
> that does request the too-small stack can now die at run time.
>
> So your idea of recognizing the problem and conjuring up a
> sufficient stack is compelling, since it would likely "just work",
> no matter how dumb the program. But where would the the sufficient
> stack come from -- is this a new kernel buffer, or is there a way to
> abscond some user memory? I would expect a signal handler to look
> at the data on its stack and nobody else will look at that stack.
> But this is already an unreasonable program for allocating a special
> signal stack in the first place :-/ So yes, one could imagine the
> signal handler could longjump instead of gracefully completing, and
> if this specially allocated signal stack isn't where the user
> planned, that could be trouble.
We could mmap() (implicitly) new anonymous memory - but I can see why
this is probably more trouble than worth...
> Another idea we discussed was to detect the potential overflow at
> run-time, and instead of killing the process, just push the signal
> onto the regular user stack. this might actually work, but it is
> sort of devious; and it would not work in the case where the user
> overflowed their regular stack already, which may be the most
> (only?) compelling reason that they allocated and declared a special
> sigaltstack in the first place...
Yeah, this doesn't sound deterministic enough.
Ok, thanks for the detailed answers - I withdraw my objections, let's
proceed with the approach you are proposing?
Thanks,
Ingo
Powered by blists - more mailing lists