Open Source and information security mailing list archives
 
Message-ID: <CALCETrVbbDm1P21yG4HWPCuipPdBT6-Kdd5sRZpaZjjQr9euDw@mail.gmail.com>
Date:   Fri, 16 Apr 2021 15:03:39 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Len Brown <lenb@...nel.org>
Cc:     Andy Lutomirski <luto@...nel.org>, Willy Tarreau <w@....eu>,
        Florian Weimer <fweimer@...hat.com>,
        "Bae, Chang Seok" <chang.seok.bae@...el.com>,
        Dave Hansen <dave.hansen@...el.com>, X86 ML <x86@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>, linux-abi@...r.kernel.org,
        "libc-alpha@...rceware.org" <libc-alpha@...rceware.org>,
        Rich Felker <dalias@...c.org>, Kyle Huey <me@...ehuey.com>,
        Keno Fischer <keno@...iacomputing.com>
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

On Fri, Apr 16, 2021 at 2:54 PM Len Brown <lenb@...nel.org> wrote:
>
> On Thu, Apr 15, 2021 at 12:24 PM Andy Lutomirski <luto@...nel.org> wrote:
> > On Wed, Apr 14, 2021 at 2:48 PM Len Brown <lenb@...nel.org> wrote:
>
> > > > ... the transition penalty into and out of AMX code
>
> The concept of 'transition' exists between AVX and SSE instructions
> because it is possible to mix both instruction sets and touch different
> parts of the same registers.  The "unused" parts of those registers
> need to be tracked to assure that data is not lost when mixing.

I get it.  That does not explain why LDMXCSR and VLDMXCSR cause
pipeline stalls.

>
> This concept is moot with AMX, which has its own dedicated registers.
>
> > What is the actual impact of a trivial function that initializes the
> > tile config, does one tiny math op, and then does TILERELEASE?


^^^^ "does one tiny math op"


AVX-512 *also* has sort-of-dedicated registers: ZMM16 and up.  I still
can't find any conclusive evidence as to whether that avoids the
performance hit.

Intel's track record at actually explaining what operations cause what
particular performance disasters is poor, and your explanation is not
helping the situation.  Sorry.
