linux-kernel - Re: [PATCH v7 0/6] x86: Improve Minimum Alternate Stack Size

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210320173200.GA4153106@gmail.com>
Date:   Sat, 20 Mar 2021 18:32:00 +0100
From:   Ingo Molnar <mingo@...nel.org>
To:     Len Brown <lenb@...nel.org>
Cc:     "Chang S. Bae" <chang.seok.bae@...el.com>,
        Borislav Petkov <bp@...e.de>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andy Lutomirski <luto@...nel.org>, X86 ML <x86@...nel.org>,
        "Brown, Len" <len.brown@...el.com>,
        Dave Hansen <dave.hansen@...el.com>, hjl.tools@...il.com,
        Dave Martin <Dave.Martin@....com>, jannh@...gle.com,
        mpe@...erman.id.au, carlos@...hat.com,
        "bothersome-borer for tony.luck@...el.com" <tony.luck@...el.com>,
        "Ravi V. Shankar" <ravi.v.shankar@...el.com>,
        libc-alpha@...rceware.org, linux-arch@...r.kernel.org,
        linux-api@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v7 0/6] x86: Improve Minimum Alternate Stack Size


* Len Brown <lenb@...nel.org> wrote:

> On Wed, Mar 17, 2021 at 6:45 AM Ingo Molnar <mingo@...nel.org> wrote:
> >
> >
> > * Ingo Molnar <mingo@...nel.org> wrote:
> >
> > >
> > > * Chang S. Bae <chang.seok.bae@...el.com> wrote:
> > >
> > > > During signal entry, the kernel pushes data onto the normal userspace
> > > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > > which has grown over time as new features and larger registers have been
> > > > added to the architecture.
> > > >
> > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > > constant indicates to userspace how much data the kernel expects to push on
> > > > the user stack, [2][3].
> > > >
> > > > However, this constant is much too small and does not reflect recent
> > > > additions to the architecture. For instance, when AVX-512 states are in
> > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > > >
> > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > > cause user stack overflow when delivering a signal.
> > >
> > > >   uapi: Define the aux vector AT_MINSIGSTKSZ
> > > >   x86/signal: Introduce helpers to get the maximum signal frame size
> > > >   x86/elf: Support a new ELF aux vector AT_MINSIGSTKSZ
> > > >   selftest/sigaltstack: Use the AT_MINSIGSTKSZ aux vector if available
> > > >   x86/signal: Detect and prevent an alternate signal stack overflow
> > > >   selftest/x86/signal: Include test cases for validating sigaltstack
> > >
> > > So this looks really complicated, is this justified?
> > >
> > > Why not just internally round up sigaltstack size if it's too small?
> > > This would be more robust, as it would fix applications that use
> > > MINSIGSTKSZ but don't use the new AT_MINSIGSTKSZ facility.
> > >
> > > I.e. does AT_MINSIGSTKSZ have any other uses than avoiding the
> > > segfault if MINSIGSTKSZ is used to create a small signal stack?
> >
> > I.e. if the kernel sees a too small ->ss_size in sigaltstack() it
> > would ignore ->ss_sp and mmap() a new sigaltstack instead and use that
> > for the signal handler stack.
> >
> > This would automatically make MINSIGSTKSZ - and other too small sizes
> > work today, and in the future.
> >
> > But the question is, is there user-space usage of sigaltstacks that
> > relies on controlling or reading the contents of the stack?
> >
> > longjmp using programs perhaps?
> 
> For the legacy binary that requests a too-small sigaltstack, there are
> several choices:
> 
> We could detect the too-small stack at sigaltstack(2) invocation and
> return an error.
> This results in two deal-killing problems:
> First, some applications don't check the return value, so the check
> would be fruitless.
> Second, those that check and error-out may be programs that never
> actually take the signal, and so we'd be causing a dusty binary to
> exit, when it didn't exit on another system, or another kernel.
> 
> Or we could detect the too small stack at signal registration time.
> This has the same two deal-killers as above.
> 
> Then there is the approach in this patch-set, which detects an
> imminent stack overflow at run time.
> It has neither of the two problems above, and the benefit that we now
> prevent data corruption
> that could have been happening on some systems already today.  The
> down side is that the dusty binary
> that does request the too-small stack can now die at run time.
> 
> So your idea of recognizing the problem and conjuring up a 
> sufficient stack is compelling, since it would likely "just work", 
> no matter how dumb the program. But where would the the sufficient 
> stack come from -- is this a new kernel buffer, or is there a way to 
> abscond some user memory?  I would expect a signal handler to look 
> at the data on its stack and nobody else will look at that stack.  
> But this is already an unreasonable program for allocating a special 
> signal stack in the first place :-/ So yes, one could imagine the 
> signal handler could longjump instead of gracefully completing, and 
> if this specially allocated signal stack isn't where the user 
> planned, that could be trouble.

We could mmap() (implicitly) new anonymous memory - but I can see why 
this is probably more trouble than worth...

> Another idea we discussed was to detect the potential overflow at 
> run-time, and instead of killing the process, just push the signal 
> onto the regular user stack. this might actually work, but it is 
> sort of devious; and it would not work in the case where the user 
> overflowed their regular stack already, which may be the most 
> (only?) compelling reason that they allocated and declared a special 
> sigaltstack in the first place...

Yeah, this doesn't sound deterministic enough.

Ok, thanks for the detailed answers - I withdraw my objections, let's 
proceed with the approach you are proposing?

Thanks,

	Ingo