Message-ID: <CAJvTdK=6B8fXasshqOoMknAt25vWPDW6LVLovOhnmY10ZEdL1Q@mail.gmail.com>
Date: Tue, 18 May 2021 16:39:27 -0400
From: Len Brown <lenb@...nel.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Borislav Petkov <bp@...en8.de>, Willy Tarreau <w@....eu>,
Andy Lutomirski <luto@...nel.org>,
Florian Weimer <fweimer@...hat.com>,
"Bae, Chang Seok" <chang.seok.bae@...el.com>,
Dave Hansen <dave.hansen@...el.com>, X86 ML <x86@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, linux-abi@...r.kernel.org,
"libc-alpha@...rceware.org" <libc-alpha@...rceware.org>,
Rich Felker <dalias@...c.org>, Kyle Huey <me@...ehuey.com>,
Keno Fischer <keno@...iacomputing.com>
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features
On Sat, May 8, 2021 at 5:45 AM Thomas Gleixner <tglx@...utronix.de> wrote:
> Where is #6 which describes the signal interaction?
#6 Per the current ABI, Linux gives signal handlers access to all of
the hardware architectural state.
#6a Signal Stack is on User Stack
The architectural state is pushed on the user stack in uncompressed
XSTATE format.
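
As a rough illustration of how large that uncompressed area is on a given machine, the sketch below reads CPUID leaf 0xD, sub-leaf 0, whose EBX reports the standard-format XSAVE area size for the features currently enabled in XCR0. The function name xsave_area_size_enabled() is hypothetical, and the actual signal frame is somewhat larger than this value because the kernel adds its own frame data around the XSTATE.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#endif

/*
 * Hypothetical helper: size in bytes of the standard (uncompressed)
 * XSAVE area for the features enabled in XCR0, or 0 if the CPU does
 * not report CPUID leaf 0xD (or this is not an x86 build).
 */
uint32_t xsave_area_size_enabled(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 0xD, sub-leaf 0: EBX = size needed by XCR0-enabled features. */
    if (!__get_cpuid_count(0x0D, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return ebx;
#else
    return 0;
#endif
}
```

On a system where AVX-512 or AMX is enabled, this value alone can exceed 2k, which is the heart of the alt-stack problem described in #6b.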
It is established that application code exists which counts on this
opaque state being complete, so that it can do a user-space XRSTOR
instead of a sigreturn(2). (My opinion is that not breaking that
legacy code is a requirement, and I'm actually shocked this view is
not unanimous.)
If a feature is enabled in XCR0 but is in INIT state, the XSAVE will
transfer zeros.
That behavior is established for AVX-512. For AMX, we optimize this case
by checking for it and not transferring any data.
(This optimization, and the self-test for it, is in AMX patch series v5.)
The signal handler is empowered to alter everything in XSTATE on the
signal stack.
Upon sigreturn, the kernel will dutifully XRSTOR the XSTATE.
#6b Applications that allocate and register a dedicated alternate signal stack
Run-time is similar to above, except the user has allocated a
dedicated signal stack.
The problem is that the user had to decide this stack's size.
Unfortunately, the signal.h ABI contained the #define constants
MINSIGSTKSZ and SIGSTKSZ (2k/8k), which were:
a) constant
b) not updated in decades
The kernel, for its part, also failed to check that an altstack was
big enough before writing to it.
Indeed, AVX-512 made the 2k constant a lie, which Andy points out is
ABI breakage.
This is factual, and there were real programs that broke because of it.
Were AMX to be deployed in this manner without repairing the broken ABI,
the 8K state would exceed both of these constants, and that would be
more severe breakage than AVX-512.
glibc 2.34 addressed both the existing and the future problem by
updating these constants to be calculated at run-time. The run-time
calculation can be done entirely in glibc, or, if glibc is running on
an updated kernel, it will ask the kernel for the size via the
auxiliary vector (AT_MINSIGSTKSZ).
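
A minimal sketch of what that run-time lookup can look like from an application's point of view, assuming a Linux system with getauxval(3). The helper name min_sigstack_size() is hypothetical, and the fallback define of AT_MINSIGSTKSZ to 51 is only there for older headers that lack it. On kernels that do not export the entry, getauxval() returns 0 and the sketch falls back to what libc reports.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <elf.h>       /* AT_MINSIGSTKSZ on recent glibc */
#include <signal.h>    /* MINSIGSTKSZ */
#include <stddef.h>
#include <sys/auxv.h>  /* getauxval() */
#include <unistd.h>    /* sysconf() */

#ifndef AT_MINSIGSTKSZ
#define AT_MINSIGSTKSZ 51   /* assumption: value from the Linux uapi headers */
#endif

/*
 * Hypothetical helper: the largest "minimum signal stack size" that
 * the kernel or libc is willing to tell us about, in bytes.
 */
size_t min_sigstack_size(void)
{
    size_t size = (size_t)MINSIGSTKSZ;   /* libc's view (may be legacy 2k) */
    unsigned long from_kernel = getauxval(AT_MINSIGSTKSZ);

    /* 0 means the kernel did not provide the auxv entry. */
    if (from_kernel > size)
        size = (size_t)from_kernel;

#ifdef _SC_MINSIGSTKSZ
    /* glibc 2.34+ also exposes the computed value through sysconf(). */
    long from_libc = sysconf(_SC_MINSIGSTKSZ);
    if (from_libc > 0 && (size_t)from_libc > size)
        size = (size_t)from_libc;
#endif
    return size;
}
```

Taking the maximum of the available answers is a conservative choice for a sketch: it never returns less than the legacy constant, and it picks up the larger value when the kernel or libc knows about AVX-512 or AMX state.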
Further, the kernel has been updated to check at run-time whether the
alt-stack is too small.
https://lore.kernel.org/lkml/20210518200320.17239-1-chang.seok.bae@intel.com/
I believe that all feedback has been addressed in that patch series,
and that it is ready for linux-next.
There are still two potential failures on systems that have AVX-512/AMX enabled:
1. program, re-compiled or not, that hard-codes its own too-small alt-stack
2. legacy static binary using old signal.h constants to allocate alt-stack.
The kernel will not prohibit these programs from executing, but if they actually
take a signal, the kernel will SIGSEGV them instead of overflowing their stack.
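
For contrast with the two failing cases above, here is a sketch of a program that sizes its alt-stack at run-time rather than hard-coding it. All names (install_sized_altstack, on_sigusr1, ran_on_altstack) are hypothetical, the 64 KiB of headroom is an arbitrary illustrative margin, and the AT_MINSIGSTKSZ fallback value of 51 is an assumption for older headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/auxv.h>

#ifndef AT_MINSIGSTKSZ
#define AT_MINSIGSTKSZ 51   /* assumption: value from the Linux uapi headers */
#endif

static char *alt_base;                         /* start of the alt-stack */
static size_t alt_size;                        /* its size in bytes */
static volatile sig_atomic_t ran_on_altstack;  /* set to 1 by the handler */

/* Records whether the handler's frame lives inside the alt-stack. */
static void on_sigusr1(int signo)
{
    char marker;
    uintptr_t here = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_base;

    (void)signo;
    ran_on_altstack = (here >= lo && here < lo + alt_size);
}

/* Allocates and registers an alt-stack sized from run-time information. */
int install_sized_altstack(void)
{
    unsigned long kernel_min = getauxval(AT_MINSIGSTKSZ); /* 0 if absent */
    unsigned long libc_min = (unsigned long)MINSIGSTKSZ;
    unsigned long min_size = kernel_min > libc_min ? kernel_min : libc_min;
    stack_t ss;
    struct sigaction sa;

    alt_size = (size_t)min_size + 64 * 1024;  /* room for handler frames */
    alt_base = malloc(alt_size);
    if (alt_base == NULL)
        return -1;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_base;
    ss.ss_size = alt_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sa.sa_flags = SA_ONSTACK;     /* deliver SIGUSR1 on the alt-stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}
```

A caller would invoke install_sized_altstack() once at startup; a later raise(SIGUSR1) should then run the handler on the alt-stack and leave ran_on_altstack set to 1.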
Len Brown, Intel Open Source Technology Center