lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrUxc1Y=4XmNthSSCeQfJ_SKNS762Nt3B-CynE-rhRSBGw@mail.gmail.com>
Date:   Thu, 15 Apr 2021 09:24:35 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Len Brown <lenb@...nel.org>
Cc:     Andy Lutomirski <luto@...nel.org>, Willy Tarreau <w@....eu>,
        Florian Weimer <fweimer@...hat.com>,
        "Bae, Chang Seok" <chang.seok.bae@...el.com>,
        Dave Hansen <dave.hansen@...el.com>, X86 ML <x86@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>, linux-abi@...r.kernel.org,
        "libc-alpha@...rceware.org" <libc-alpha@...rceware.org>,
        Rich Felker <dalias@...c.org>, Kyle Huey <me@...ehuey.com>,
        Keno Fischer <keno@...iacomputing.com>
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

On Wed, Apr 14, 2021 at 2:48 PM Len Brown <lenb@...nel.org> wrote:
>

>
> > Then I take the transition penalty into and out of AMX code (I'll
> > believe there is no penalty when I see it -- we've had a penalty with
> > VEX and with AVX-512) and my program runs *slower*.
>
> If you have a clear definition of what "transition penalty" is, please share it.

Given the generally awful state of Intel's documentation about these
issues, it's quite hard to tell for real.  But here are some examples.

VEX: Figures 11-1 ("AVX-SSE Transitions in the Broadwell, and Prior
Generation Microarchitectures") and 11-2 ("AVX-SSE Transitions in the
Skylake Microarchitecture").  We *still* have a performance regression
in the upstream kernel because, despite all common sense, the CPUs
consider LDMXCSR to be an SSE instruction and VLDMXCSR to be an AVX
instruction despite the fact that neither one of them touch the XMM or
YMM state at all.

AVX-512:

https://lore.kernel.org/linux-crypto/CALCETrU06cuvUF5NDSm8--dy3dOkxYQ88cGWaakOQUE4Vkz88w@mail.gmail.com/

https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html





>
> Lacking one, I'll assume you are referring to the
> impact on turbo frequency of using AMX hardware?
>
> Again...
>
> On the hardware that supports AMX, there is zero impact on frequency
> due to the presence of AMX state, whether modified or unmodified.
>
> We resolved on another thread that Linux will never allow entry
> into idle with modified AMX state, and so AMX will have zero impact
> on the ability of the process to enter deep power-saving C-states.
>
> It is true that AMX activity is considered when determining max turbo.
> (as it must be)
> However, the *release* of the turbo credits consumed by AMX is
> "several orders of magnitude" faster on this generation
> than it was for AVX-512 on pre-AMX hardware.

What is the actual impact of a trivial function that initializes the
tile config, does one tiny math op, and then does TILERELEASE?

> Yes, the proposal, and the working patch set on the list, context
> switches XFD -- which is exactly what that hardware was designed to do.
> If the old and new tasks have the same value of XFD, the MSR write is skipped.
>
> I'm not aware of any serious proposal to context-switch XCR0,
> as it would break the current programming model, where XCR0
> advertises what the OS supports.  It would also impact performance,
> as every write to XCR0 necessarily provokes a VMEXIT.

You're arguing against a nonsensical straw man.

In the patches, *as submitted*, if you trip the XFD #NM *once* and you
are the only thread on the system to do so, you will eat the cost of a
WRMSR on every subsequent context switch.  This is not free.  If we
use XCR0 (I'm not saying we will -- I'm just mentioning at a
possibility), then the penalty is presumably worse due to the VMX
issue.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ