Message-ID: <CAJvTdK=p8mgO3xw9sRxu0c7NTNTG109M442b3UZh8TqLLfkC1Q@mail.gmail.com>
Date:   Mon, 19 Apr 2021 14:18:51 -0400
From:   Len Brown <lenb@...nel.org>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Willy Tarreau <w@....eu>, Andy Lutomirski <luto@...nel.org>,
        Florian Weimer <fweimer@...hat.com>,
        "Bae, Chang Seok" <chang.seok.bae@...el.com>,
        Dave Hansen <dave.hansen@...el.com>, X86 ML <x86@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>, linux-abi@...r.kernel.org,
        "libc-alpha@...rceware.org" <libc-alpha@...rceware.org>,
        Rich Felker <dalias@...c.org>, Kyle Huey <me@...ehuey.com>,
        Keno Fischer <keno@...iacomputing.com>
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

On Mon, Apr 19, 2021 at 10:15 AM Borislav Petkov <bp@...en8.de> wrote:

> > Tasks are created without an 8KB AMX buffer.
> > Tasks have to actually touch the AMX TILE registers for us to allocate
> > one for them.
>
> When tasks do that it doesn't matter too much - for the library it does!
>
> If the library does that by default and the processes which comprise
> of that pipe I mentioned earlier, get all 8K buffers because the
> underlying library decided so and swinging those buffers around when
> saving/restoring contexts turns out to be a performance penalty, then we
> have lost.
>
> Lost because if that thing goes upstream in a way where use of AMX is
> allowed implicitly, there's no fixing it anymore once it becomes an
> ABI.
>
> So, that library should ask the kernel whether it supports AMX and only
> use it if it has gotten a positive answer.

Right, the library *does* ask the kernel whether it supports AMX (below).

> And by default that answer
> should be "no" because the majority of processes - that same pipe I keep
> mentioning - don't need it.

Indeed, the default is "no" because most libraries will *not* ask the system
for AMX support (below).  However, if they *did* probe for it,
and they *did* use it, the kernel would not stand in the way of
any of those requests.

> I have no good idea yet how granular that should be - per process, per
> thread, whatever, but there should be a way for the kernel to control
> whether the library uses AMX, AVX512 or whatever fat state is out there
> available.
>
> Then, if a process wants the library to use AMX on its behalf, then it
> can say so and the library can do that but only after having asked for
> explicitly.

The ABI works like this:

0. App or library author decides AMX is useful at build-time.

1. App checks CPUID for AMX CPU feature
2. App checks XCR0 for AMX OS support

(If the app uses AMX without both of these checks being TRUE,
 it takes a #UD on the first AMX instruction it executes.)

This ABI is how AVX works today.
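
Steps 1 and 2 above can be sketched in C roughly as follows. This is only
an illustration, not code from any library: the helper names
(amx_cpu_has_tile, amx_os_enabled, amx_usable_from, amx_usable) are
hypothetical, the bit positions are the ones documented in the Intel SDM
(CPUID leaf 7 EDX bit 24 for AMX-TILE, XCR0 bits 17 and 18 for
XTILECFG/XTILEDATA), and the OSXSAVE check is an addition that is needed
before XGETBV may be executed. It assumes GCC or Clang on x86.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang helper: __get_cpuid_count() */
#endif

/* CPUID.(EAX=1):ECX bit 27: OSXSAVE, i.e. XGETBV is usable. */
#define CPUID1_ECX_OSXSAVE   (1u << 27)
/* CPUID.(EAX=7,ECX=0):EDX bit 24: AMX-TILE. */
#define CPUID7_EDX_AMX_TILE  (1u << 24)
/* XCR0 bits 17 and 18: XTILECFG and XTILEDATA enabled by the OS. */
#define XCR0_XTILECFG        (1ull << 17)
#define XCR0_XTILEDATA       (1ull << 18)
#define XCR0_AMX_MASK        (XCR0_XTILECFG | XCR0_XTILEDATA)

/* Step 1: does the CPU advertise AMX? (hypothetical helper) */
static inline bool amx_cpu_has_tile(uint32_t cpuid7_edx)
{
    return (cpuid7_edx & CPUID7_EDX_AMX_TILE) != 0;
}

/* Step 2: has the OS enabled both AMX state components in XCR0? */
static inline bool amx_os_enabled(uint64_t xcr0)
{
    return (xcr0 & XCR0_AMX_MASK) == XCR0_AMX_MASK;
}

/* Combine both checks from raw register values (pure, easy to test). */
static inline bool amx_usable_from(uint32_t cpuid1_ecx,
                                   uint32_t cpuid7_edx, uint64_t xcr0)
{
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false;           /* XCR0 cannot even be read */
    return amx_cpu_has_tile(cpuid7_edx) && amx_os_enabled(xcr0);
}

/* Read the real registers; reports false on non-x86 builds. */
static inline bool amx_usable(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t cpuid1_ecx, lo, hi;

    if (!__get_cpuid_count(1, 0, &eax, &ebx, &ecx, &edx))
        return false;
    cpuid1_ecx = ecx;
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false;           /* must not execute XGETBV */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;

    /* XGETBV with ECX=0 returns XCR0 in EDX:EAX. */
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return amx_usable_from(cpuid1_ecx, edx, ((uint64_t)hi << 32) | lo);
#else
    return false;
#endif
}
```

A caller would test amx_usable() once at startup and pick an AMX or a
non-AMX code path accordingly, the same pattern used for AVX dispatch.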

What is new with AMX is the ability of the hardware and the OS
to delay allocating the context switch buffer until, and only if,
it is actually needed.

This is transparent, and thus not part of the ABI, unless you count
the absence of a mandated system call to be an ABI.

3. the application then touches an AMX register, triggering...
4. a #NM, handled by the kernel, which allocates a context switch buffer
for that task and disarms XFD.
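
The lazy allocation in steps 3 and 4 can be pictured with a small
user-space toy model. This is not kernel code and does not touch real
hardware: struct toy_task and the toy_* functions are hypothetical names
invented for illustration, and calloc() stands in for whatever allocator
the kernel would use. It only shows the ordering: no buffer at task
creation, one fault on first use, none afterwards.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define AMX_BUF_SIZE 8192       /* the 8KB AMX buffer from the thread */

/* Hypothetical per-task state for the toy model. */
struct toy_task {
    void    *amx_buf;           /* NULL until the task first uses AMX */
    bool     xfd_armed;         /* true: next AMX touch raises #NM */
    unsigned nm_faults;         /* how many #NM faults were taken */
};

/* Tasks are created without an AMX buffer and with XFD armed. */
static inline void toy_task_init(struct toy_task *t)
{
    t->amx_buf = NULL;
    t->xfd_armed = true;
    t->nm_faults = 0;
}

/* Step 4: model of the #NM handler: allocate the buffer, disarm XFD. */
static inline int toy_handle_nm(struct toy_task *t)
{
    t->nm_faults++;
    if (!t->amx_buf) {
        t->amx_buf = calloc(1, AMX_BUF_SIZE);
        if (!t->amx_buf)
            return -1;          /* allocation failed, XFD stays armed */
    }
    t->xfd_armed = false;
    return 0;
}

/* Step 3: model of the task executing an AMX tile instruction. */
static inline int toy_touch_amx(struct toy_task *t)
{
    if (t->xfd_armed)
        return toy_handle_nm(t);    /* first touch faults */
    return 0;                       /* later touches run directly */
}

/* Release the buffer when the task goes away. */
static inline void toy_task_exit(struct toy_task *t)
{
    free(t->amx_buf);
    t->amx_buf = NULL;
    t->xfd_armed = true;
}
```

In this model a task that never calls toy_touch_amx() never pays for the
8KB buffer, which is the point of the grep | awk | sed discussion.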

Yes, we could invent a new system call and mandate that it be called
between #2 and #3.  However, we'd still do #4 in response, so I don't see any
value to that system call.  Indeed, I would advocate that glibc
replace it with a return statement.

So back to the example:
<process> | grep | awk | sed ...

Sure, if grep grows support for some AI feature that we haven't imagined
yet, then something in its code flow is fully empowered to probe for AMX
and use AMX on AMX hardware.  That is hard to imagine for the programs
above as we know them today, but future programs certainly could do this
if they chose to.

thanks,
Len Brown, Intel Open Source Technology Center
