Message-ID: <CALCETrV_sQnu0u+wKZrAL2-500EHoQ6d4LgRhCWwRhK-4Z3X7A@mail.gmail.com>
Date:   Mon, 29 Mar 2021 22:08:59 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Len Brown <lenb@...nel.org>
Cc:     Greg KH <gregkh@...uxfoundation.org>,
        Andy Lutomirski <luto@...nel.org>,
        "Bae, Chang Seok" <chang.seok.bae@...el.com>,
        Dave Hansen <dave.hansen@...el.com>, X86 ML <x86@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        libc-alpha <libc-alpha@...rceware.org>,
        Florian Weimer <fweimer@...hat.com>,
        Rich Felker <dalias@...c.org>, Kyle Huey <me@...ehuey.com>,
        Keno Fischer <keno@...iacomputing.com>,
        Linux API <linux-api@...r.kernel.org>
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

On Mon, Mar 29, 2021 at 3:38 PM Len Brown <lenb@...nel.org> wrote:
>
> On Mon, Mar 29, 2021 at 2:16 PM Andy Lutomirski <luto@...capital.net> wrote:
> >

> Hi Andy,
>
> Can you provide a concise definition of the exact problem(s) this thread
> is attempting to address?

The AVX-512 state, all by itself, is more than 2048 bytes.  Quoting
the POSIX sigaltstack page (man 3p sigaltstack):

       The  value  SIGSTKSZ is a system default specifying the number of bytes
       that would be used to cover the usual case when manually allocating  an
       alternate  stack area. The value MINSIGSTKSZ is defined to be the mini‐
       mum stack size for a signal handler. In computing  an  alternate  stack
       size, a program should add that amount to its stack requirements to al‐
       low for the system implementation overhead. The  constants  SS_ONSTACK,
       SS_DISABLE, SIGSTKSZ, and MINSIGSTKSZ are defined in <signal.h>.

arch/x86/include/uapi/asm/signal.h:#define MINSIGSTKSZ    2048
arch/x86/include/uapi/asm/signal.h:#define SIGSTKSZ    8192

Regrettably, the Linux signal frame uses the uncompacted XSAVE format
and, also regrettably, the layout of the uncompacted format depends on
XCR0 but not on the set of registers that are actually used or wanted.
So, with the current ABI, the signal frame is stuck being quite large
for all programs on a machine that supports AVX-512 and has it enabled
by the kernel.  And it's even larger for AMX, where it violates SIGSTKSZ
as well as MINSIGSTKSZ.

There are apparently real programs that break as a result.  We need to
find a way to handle new, large extended states without breaking user
ABI.  We should also find a way to handle them without consuming silly
amounts of stack space for programs that don't use them.

Sadly, if the solution we settle on involves context switching XCR0,
performance on first-generation hardware will suffer because VMX does
not have any way to allow guests to write XCR0 without exiting.  I
don't consider this to be a showstopper -- if we end up having this
problem, fixing it in subsequent CPUs is straightforward.

>
> Thanks ahead of time for excluding "blow up power consumption",
> since that paranoia is not grounded in fact.
>

I will gladly exclude power consumption from this discussion, since
that's a separate issue that has nothing to do with the user<->kernel
ABI.
