Message-ID: <69721125-4e1c-ca9c-ff59-8e1331933e6c@intel.com>
Date:   Wed, 14 Oct 2020 09:29:21 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Andy Lutomirski <luto@...capital.net>
Cc:     "Brown, Len" <len.brown@...el.com>,
        Andy Lutomirski <luto@...nel.org>,
        "Bae, Chang Seok" <chang.seok.bae@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...e.de>,
        X86 ML <x86@...nel.org>, "Liu, Jing2" <jing2.liu@...el.com>,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 13/22] x86/fpu/xstate: Expand dynamic user state area
 on first use

On 10/14/20 9:10 AM, Andy Lutomirski wrote:
>> Actually, I think the modified optimization would survive such a scheme:
>>
>>  * copy page array into percpu area
>>  * XRSTORS from percpu area, modified optimization tuple is saved
>>  * run userspace
>>  * XSAVES back to percpu area.  tuple matches, modified optimization
>>    is still in play
>>  * copy percpu area back to page array
>>
>> Since the XRSTORS->XSAVES pair is both done to the percpu area, the
>> XSAVE tracking hardware never knows it isn't working on the "canonical"
>> buffer (the page array).
> I was suggesting something a little bit different.  We'd keep XMM,
> YMM, ZMM, etc state stored exactly the way we do now and, for
> AMX-using tasks, we would save the AMX state in an entirely separate
> buffer.  This way the pain of having a variable xstate layout is
> confined just to AMX tasks.

OK, got it.

So, we'd need either a second XSAVES/XRSTORS pair or "manual" copying
of the registers out to memory.  We can preserve the modified
optimization if we're careful about ordering, but only for *ONE* of the
XSAVE buffers (if we use two).
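
To make the "only one buffer" point concrete, here is a rough toy model in
Python.  It is a sketch, not a description of the real hardware: it assumes
the CPU remembers just the buffer used by the most recent XRSTORS, and that
XSAVES may skip unmodified components only when it writes back to that same
buffer.  The names ToyCpu, XsaveBuffer and demo are made up for illustration,
and the real tracking tuple holds more than a buffer identity.

```python
class XsaveBuffer:
    """Hypothetical stand-in for one XSAVE area: component name -> value."""

    def __init__(self, **components):
        self.data = dict(components)


class ToyCpu:
    """Toy model of modified-optimization tracking (assumption, not the SDM)."""

    def __init__(self):
        self.regs = {}          # live register state by component
        self.tracked = None     # buffer used by the most recent XRSTORS
        self.modified = set()   # components written since they were restored

    def xrstors(self, buf):
        # Load the buffer's components and remember which buffer was used.
        self.regs.update(buf.data)
        self.modified -= set(buf.data)
        self.tracked = buf

    def write(self, name, value):
        # Userspace touching a register marks that component as modified.
        self.regs[name] = value
        self.modified.add(name)

    def xsaves(self, buf):
        # Only the tracked buffer may skip components that were not modified.
        wanted = set(buf.data)
        if buf is self.tracked:
            wanted &= self.modified
        for name in wanted:
            buf.data[name] = self.regs[name]
        return sorted(wanted)   # components actually written to memory


def demo():
    main = XsaveBuffer(xmm=0, ymm=0, zmm=0)
    amx = XsaveBuffer(tiledata=0)
    cpu = ToyCpu()
    cpu.xrstors(amx)        # restored first, so its tracking is overwritten
    cpu.xrstors(main)       # restored last, so this is the tracked buffer
    cpu.write("ymm", 7)     # userspace modifies only ymm
    return cpu.xsaves(main), cpu.xsaves(amx)
```

In this model the ordering decides which buffer benefits: whichever buffer is
restored last keeps the optimization, and the other is always written out in
full even if userspace never touched it.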

> I'm okay with vmalloc() too, but I do think we need to deal with the
> various corner cases like allocation failing.

Yeah, agreed about handling the corner cases.  Also, if we preserve
plain old vmalloc() for now, we need good tracepoints or stats so we can
precisely figure out how many vmalloc()s (and IPIs) are due to AMX.
