Message-ID: <CALCETrURmk4ZijJVUtJwouj=_0NPiUvUFr9XMvdniRRFqeU+fg@mail.gmail.com>
Date: Thu, 25 Mar 2021 21:38:24 -0700
From: Andy Lutomirski <luto@...nel.org>
To: libc-alpha <libc-alpha@...rceware.org>,
"H. J. Lu" <hjl.tools@...il.com>, X86 ML <x86@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
"Bae, Chang Seok" <chang.seok.bae@...el.com>,
Florian Weimer <fweimer@...hat.com>,
"Carlos O'Donell" <carlos@...hat.com>,
Rich Felker <dalias@...c.org>
Subject: Why does glibc use AVX-512?
Hi all-

glibc appears to use AVX512F for memcpy by default. (Unless
Prefer_ERMS is default-on, but I genuinely can't tell if this is the
case. I did some searching.) The commit adding it refers to a 2016
email saying that it's 30% faster on KNL. Unfortunately, AVX-512 is now
available in normal hardware, and the overhead from switching between
normal and AVX-512 code appears to vary from bad to genuinely
horrible. And, once anything has used the high parts of YMM and/or
ZMM, those states tend to get stuck with XINUSE=1.

I'm wondering whether glibc should stop using AVX-512 by default.

Meanwhile, some of you may have noticed a little ABI break we have.
On AVX-512 hardware, the size of a signal frame is unreasonably large,
and this is causing problems even for existing software that doesn't
use AVX-512. Do any of you have any clever ideas for how to fix it?
We have some kernel patches around to try to fail more cleanly, but we
still fail.

I think we should seriously consider solutions in which, for new
tasks, XCR0 has new giant features (e.g. AMX) and possibly even
AVX-512 cleared, and programs need to explicitly request enablement.
This would allow programs to opt into not saving/restoring across
signals or to save/restore in buffers supplied when the feature is
enabled. This has all kinds of pros and cons, and I'm not sure it's a
great idea. But, in the absence of some change to the ABI, the
default outcome is that, on AMX-enabled kernels on AMX-enabled
hardware, the signal frame will be more than 8kB, and this will affect
*every* signal regardless of whether AMX is in use.
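
To put rough numbers on the signal frame sizes discussed above, here is a minimal Python sketch that computes the size of the standard-format XSAVE area for a few XCR0 masks. The component offsets and sizes are assumptions: they are the values commonly reported by CPUID leaf 0xD on current Intel hardware, and real code should query CPUID instead of hard-coding them. MINSIGSTKSZ and SIGSTKSZ are likewise assumed to be the traditional glibc x86-64 constants (2048 and 8192). The names `xsave_area_size` and `EXTENDED_COMPONENTS` are made up for this illustration.

```python
# Sketch: lower bound on the XSAVE area carried in an x86-64 signal frame.
# ASSUMPTION: offsets and sizes are the standard-format values commonly
# reported by CPUID leaf 0xD on current Intel hardware; they are hard-coded
# here only for illustration.

LEGACY_AND_HEADER = 512 + 64  # FXSAVE legacy region plus XSAVE header

# XCR0 bit -> (name, offset in bytes, size in bytes)
EXTENDED_COMPONENTS = {
    2: ("AVX (YMM high halves)", 576, 256),
    5: ("AVX-512 opmask", 1088, 64),
    6: ("AVX-512 ZMM_Hi256", 1152, 512),
    7: ("AVX-512 Hi16_ZMM", 1664, 1024),
    17: ("AMX TILECFG", 2752, 64),
    18: ("AMX TILEDATA", 2816, 8192),
}

# ASSUMPTION: traditional glibc x86-64 compile-time constants.
MINSIGSTKSZ = 2048
SIGSTKSZ = 8192


def xsave_area_size(xcr0: int) -> int:
    """Return the standard-format XSAVE area size for an XCR0 mask."""
    end = LEGACY_AND_HEADER
    for bit, (_name, offset, size) in EXTENDED_COMPONENTS.items():
        if xcr0 & (1 << bit):
            # The area must extend to the end of the highest enabled component.
            end = max(end, offset + size)
    return end


X87_SSE = 0b11
AVX = X87_SSE | (1 << 2)
AVX512 = AVX | (1 << 5) | (1 << 6) | (1 << 7)
AMX = AVX512 | (1 << 17) | (1 << 18)

for label, mask in [("SSE", X87_SSE), ("AVX", AVX),
                    ("AVX-512", AVX512), ("AMX", AMX)]:
    size = xsave_area_size(mask)
    print(f"{label:8} XCR0={mask:#07x} xsave={size:5d} bytes "
          f"fits MINSIGSTKSZ={size <= MINSIGSTKSZ} "
          f"fits SIGSTKSZ={size <= SIGSTKSZ}")
```

Under these assumptions the AVX-512 area alone (2688 bytes) already exceeds a 2048-byte MINSIGSTKSZ, and the AMX area (11008 bytes) exceeds an 8192-byte SIGSTKSZ. The real signal frame is larger still, since it also carries siginfo and the rest of the ucontext, so these figures are only a lower bound.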
--Andy