Open Source and information security mailing list archives
 
Date:   Wed, 31 Mar 2021 09:53:50 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Len Brown <lenb@...nel.org>
Cc:     David Laight <David.Laight@...lab.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Andy Lutomirski <luto@...nel.org>,
        Greg KH <gregkh@...uxfoundation.org>,
        "Bae, Chang Seok" <chang.seok.bae@...el.com>,
        X86 ML <x86@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
        libc-alpha <libc-alpha@...rceware.org>,
        Florian Weimer <fweimer@...hat.com>,
        Rich Felker <dalias@...c.org>, Kyle Huey <me@...ehuey.com>,
        Keno Fischer <keno@...iacomputing.com>,
        Linux API <linux-api@...r.kernel.org>
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features


> On Mar 31, 2021, at 9:31 AM, Len Brown <lenb@...nel.org> wrote:
> 
> On Tue, Mar 30, 2021 at 6:01 PM David Laight <David.Laight@...lab.com> wrote:
> 
>>> Can we leave it in live registers?  That would be the speed-of-light
>>> signal handler approach.  But we'd need to teach the signal handler to
>>> not clobber it.  Perhaps that could be part of the contract that a
>>> fast signal handler signs?  INIT=0 AMX state could simply sit
>>> patiently in the AMX registers for the duration of the signal handler.
>>> You can't get any faster than doing nothing :-)
>>> 
>>> Of course part of the contract for the fast signal handler is that it
>>> knows that it can't possibly use XRSTOR of the stuff on the stack to
>>> necessarily get back to the state of the signaled thread (assuming we
>>> even used XSTATE format on the fast signal handler stack, it would
>>> forget the contents of the AMX registers, in this example)
>> 
>> gcc will just use the AVX registers for 'normal' code within
>> the signal handler.
>> So it has to have its own copy of all the registers.
>> (Well, maybe you could make the AMX instructions fault,
>> but that would need a nested signal delivered.)
> 
> This is true, by default, but it doesn't have to be true.
> 
> Today, gcc has an annotation for user-level interrupts
> https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#x86-Function-Attributes
> 
> An analogous annotation could be created for fast signals.
> gcc can be told exactly what registers and instructions it can use for
> that routine.
> 
> Of course, this raises the question of what routines that handler calls,
> and that would need to be constrained too.
> 
> Today signal-safety(7) advises programmers to limit what legacy signal handlers
> can call.  There is no reason that a fast-signal-safety(7) could not be created
> for the fast path.
> 
>> There is also the register save buffer that you need in order
>> to long-jump out of a signal handler.
>> Unfortunately that is required to work.
>> I'm pretty sure the original setjmp/longjmp just saved the stack
>> pointer - but that really doesn't work any more.
>> 
>> OTOH most signal handlers don't care - but there isn't a flag
>> to sigset() (etc.) to ask for a specific register layout.
> 
> Right, the idea is to optimize for *most* signal handlers,
> since making any changes to *all* signal handlers is intractable.
> 
> So the idea is that opting-in to a fast signal handler would opt-out
> of some legacy signal capabilities.  Complete state is one of them,
> and thus long-jump is not supported, because the complete state
> may not automatically be available.
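For reference, here is a minimal C sketch of the legacy contract that signal-safety(7) describes, which a fast-signal-safety(7) would tighten further. The handler only assigns to a `volatile sig_atomic_t` (the one kind of object access ISO C guarantees from a handler) and calls write(2), which is on the async-signal-safe list. The names `on_sigusr1()` and `install_handler()` are illustrative. Nothing here controls which registers the compiler uses; the nearest existing knob on x86 is GCC's `-mgeneral-regs-only` option, which applies to a whole translation unit rather than to a single handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* The only object the handler modifies. */
static volatile sig_atomic_t got_signal = 0;

/* Legacy handler restricted to async-signal-safe operations. */
static void on_sigusr1(int signo)
{
    static const char msg[] = "SIGUSR1 received\n";
    ssize_t n;

    (void)signo;
    got_signal = 1;
    /* write(2) is async-signal-safe; printf(3) is not. */
    n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
}

/* Install the handler for SIGUSR1; returns 0 on success, -1 on error. */
static int install_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}
```

In a single-threaded program, `raise(SIGUSR1)` does not return until the handler has run, so `got_signal` is 1 afterwards.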

Long jump is probably the easiest problem of all: sigsetjmp() is a *function*, following the ABI, so sigsetjmp() is expected to clobber most or all of the extended state.
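A sketch of that point using only standard POSIX calls: the handler escapes with siglongjmp() to a point recorded by sigsetjmp(). Because the caller of sigsetjmp() already treats every caller-saved register, including the vector registers, as clobbered by the call, typical implementations such as glibc's record only the callee-saved general-purpose registers, the stack and instruction pointers, and optionally the signal mask. The names `on_sigusr2()` and `escape_via_longjmp()` are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recovery_point;

/* Handler: leave by jumping back to the sigsetjmp() call site. */
static void on_sigusr2(int signo)
{
    (void)signo;
    siglongjmp(recovery_point, 1);
}

/* Returns 1 if the handler escaped via siglongjmp(), 0 if it did not
 * run, and -1 if the handler could not be installed. */
static int escape_via_longjmp(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr2;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;

    /* savemask = 1: siglongjmp() also restores the signal mask, so
     * SIGUSR2 is unblocked again once we are out of the handler. */
    if (sigsetjmp(recovery_point, 1) != 0)
        return 1;   /* arrived here from the handler */

    raise(SIGUSR2);
    return 0;       /* not reached if the handler ran */
}
```

The `savemask` argument matters: the kernel blocks SIGUSR2 while its handler runs, and without a saved mask the signal would stay blocked after the jump.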

But this whole annotation thing will require serious compiler support. We already have problems with compilers inlining functions and getting confused about attributes.

An API like:

if (get_amx()) {
	/* use AMX */
} else {
	/* don't */
}

Avoids this problem. And making XCR0 dynamic, for all its faults, at least helps force a degree of discipline on user code.
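`get_amx()` above is hypothetical. For comparison, Linux 5.16 later shipped a permission request of roughly this shape, `arch_prctl(ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA)`, and the minimal sketch below wraps it in a `get_amx()`-style helper. The constant values (0x1023 and 18) are taken from the kernel's x86 xstate documentation. `pick_kernel()` and its return strings are illustrative, and the helper simply reports false where `arch_prctl()` does not exist. The sketch says nothing about how the kernel manages the state internally, and a real caller would also confirm AMX support (for example via `ARCH_GET_XCOMP_SUPP` or CPUID) before executing tile instructions.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from the Linux x86 xstate documentation (5.16+). */
#define ARCH_REQ_XCOMP_PERM 0x1023
#define XFEATURE_XTILEDATA  18

/* get_amx()-style helper: ask the kernel for permission to use AMX
 * tile data. Returns true only if the request succeeded. */
static bool get_amx(void)
{
#ifdef SYS_arch_prctl
    return syscall(SYS_arch_prctl, (unsigned long)ARCH_REQ_XCOMP_PERM,
                   (unsigned long)XFEATURE_XTILEDATA) == 0;
#else
    return false;   /* arch_prctl() exists only on x86 */
#endif
}

/* Illustrative dispatch: choose a code path once, up front. */
static const char *pick_kernel(void)
{
    if (get_amx())
        return "amx";        /* AMX tile instructions may be used */
    else
        return "fallback";   /* non-AMX code path */
}
```

Repeating a successful request is harmless, so the helper can be called at each dispatch point; on older kernels or non-AMX hardware the call fails and the fallback path is taken.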


> 
> thanks,
> Len Brown, Intel Open Source Technology Center
