Message-ID: <CALCETrURmk4ZijJVUtJwouj=_0NPiUvUFr9XMvdniRRFqeU+fg@mail.gmail.com>
Date:   Thu, 25 Mar 2021 21:38:24 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     libc-alpha <libc-alpha@...rceware.org>,
        "H. J. Lu" <hjl.tools@...il.com>, X86 ML <x86@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "Bae, Chang Seok" <chang.seok.bae@...el.com>,
        Florian Weimer <fweimer@...hat.com>,
        "Carlos O'Donell" <carlos@...hat.com>,
        Rich Felker <dalias@...c.org>
Subject: Why does glibc use AVX-512?

Hi all-

glibc appears to use AVX512F for memcpy by default.  (Unless
Prefer_ERMS is default-on, but I genuinely can't tell if this is the
case.  I did some searching.)  The commit adding it refers to a 2016
email reporting a 30% improvement on KNL.  Unfortunately, AVX-512 is now
available in normal hardware, and the overhead from switching between
normal and AVX-512 code appears to vary from bad to genuinely
horrible.  And, once anything has used the high parts of YMM and/or
ZMM, those states tend to get stuck with XINUSE=1.
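
[Editor's sketch, not part of the original message.]  One way to see
the "stuck" state described above is to read the XINUSE bitmap with
XGETBV (ECX=1), which returns XCR0 AND XINUSE on CPUs that support it.
The helper names read_xinuse() and zmm_state_in_use() below are
hypothetical; the bit positions (2 for the high halves of YMM, 5 to 7
for the AVX-512 opmask and ZMM components) follow the XSAVE feature
numbering.  This assumes GCC or Clang; on non-x86 targets the read
simply reports failure.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* XSAVE state-component bits, as numbered in XCR0/XINUSE. */
#define XFEATURE_YMM_HI    (1ull << 2)  /* upper halves of YMM0-15     */
#define XFEATURE_OPMASK    (1ull << 5)  /* AVX-512 k0-k7               */
#define XFEATURE_ZMM_HI256 (1ull << 6)  /* upper halves of ZMM0-15     */
#define XFEATURE_HI16_ZMM  (1ull << 7)  /* ZMM16-31                    */

/* Hypothetical helper: stores XCR0 & XINUSE in *mask and returns 1,
 * or returns 0 if XGETBV with ECX=1 cannot be used here. */
static int read_xinuse(uint64_t *mask)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;

    /* CPUID.1:ECX bit 27 (OSXSAVE): the OS has enabled XGETBV. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
        return 0;
    /* CPUID.(EAX=0DH,ECX=1):EAX bit 2: XGETBV with ECX=1 is supported. */
    if (!__get_cpuid_count(0x0d, 1, &eax, &ebx, &ecx, &edx) ||
        !(eax & (1u << 2)))
        return 0;

    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(1));
    *mask = ((uint64_t)hi << 32) | lo;
    return 1;
#else
    (void)mask;
    return 0;
#endif
}

/* Returns 1 if any AVX-512 state component is marked in use. */
static int zmm_state_in_use(uint64_t mask)
{
    return (mask & (XFEATURE_OPMASK | XFEATURE_ZMM_HI256 |
                    XFEATURE_HI16_ZMM)) != 0;
}
```

Calling read_xinuse() before and after a large memcpy() and comparing
zmm_state_in_use() on the two masks would be one rough way to check
whether the library touched ZMM state on a given machine.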

I'm wondering whether glibc should stop using AVX-512 by default.

Meanwhile, some of you may have noticed a little ABI break we have.
On AVX-512 hardware, the size of a signal frame is unreasonably large,
and this is causing problems even for existing software that doesn't
use AVX-512.  Do any of you have any clever ideas for how to fix it?
We have some kernel patches around to try to fail more cleanly, but we
still fail.

I think we should seriously consider solutions in which, for new
tasks, XCR0 has new giant features (e.g. AMX) and possibly even
AVX-512 cleared, and programs need to explicitly request enablement.
This would allow programs to opt into not saving/restoring across
signals or to save/restore in buffers supplied when the feature is
enabled.  This has all kinds of pros and cons, and I'm not sure it's a
great idea.  But, in the absence of some change to the ABI, the
default outcome is that, on AMX-enabled kernels on AMX-enabled
hardware, the signal frame will be more than 8kB, and this will affect
*every* signal regardless of whether AMX is in use.
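
[Editor's sketch, not part of the original message.]  The size concern
can be checked on a given machine by asking CPUID leaf 0xD how large
the XSAVE area is: EBX of sub-leaf 0 is the size for the features
currently enabled in XCR0, and ECX is the size for every feature the
CPU supports.  The XSAVE area is only part of a signal frame, so these
numbers are a lower bound.  The helper names xsave_area_sizes() and
xsave_fits() are hypothetical, and the comparison against MINSIGSTKSZ
is the editor's assumption about where existing software is likely to
break; the message itself does not name that constant.

```c
#include <signal.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: fills in the XSAVE area sizes in bytes and
 * returns 1, or returns 0 if CPUID leaf 0xD is not available. */
static int xsave_area_sizes(unsigned *enabled, unsigned *supported)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;

    if (!__get_cpuid_count(0x0d, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    *enabled = ebx;    /* features enabled in XCR0 right now */
    *supported = ecx;  /* every feature this CPU supports    */
    return 1;
#else
    (void)enabled;
    (void)supported;
    return 0;
#endif
}

/* Returns 1 if an XSAVE area of xsave_bytes fits in stack_bytes.
 * A real signal frame needs more room than the XSAVE area alone. */
static int xsave_fits(unsigned xsave_bytes, unsigned long stack_bytes)
{
    return (unsigned long)xsave_bytes <= stack_bytes;
}

/* Prints the sizes next to the compile-time MINSIGSTKSZ, if defined. */
static void report_xsave_sizes(FILE *out)
{
    unsigned enabled = 0, supported = 0;

    if (!xsave_area_sizes(&enabled, &supported)) {
        fprintf(out, "CPUID leaf 0xD not available\n");
        return;
    }
    fprintf(out, "XSAVE area, enabled features:   %u bytes\n", enabled);
    fprintf(out, "XSAVE area, supported features: %u bytes\n", supported);
#ifdef MINSIGSTKSZ
    fprintf(out, "MINSIGSTKSZ: %ld bytes (XSAVE area %s)\n",
            (long)MINSIGSTKSZ,
            xsave_fits(enabled, (unsigned long)MINSIGSTKSZ)
                ? "fits" : "does not fit");
#endif
}
```

Calling report_xsave_sizes(stdout) from main() prints the figures for
the current CPU; a program that sizes its sigaltstack() buffer from a
compile-time constant can compare that constant against the first
number.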

--Andy
