Message-ID: <f2a0e0e0-79f2-1b5c-2bcd-b6037d479d4e@intel.com>
Date: Mon, 25 Jul 2022 10:44:23 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: David Laight <David.Laight@...LAB.COM>,
'Yi Sun' <yi.sun@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>
Cc: "sohil.mehta@...el.com" <sohil.mehta@...el.com>,
"tony.luck@...el.com" <tony.luck@...el.com>,
"heng.su@...el.com" <heng.su@...el.com>
Subject: Re: [PATCH 1/2] x86/fpu: Measure the Latency of XSAVE and XRSTOR
On 7/24/22 13:54, David Laight wrote:
> I've done some experiments that measure short instruction latencies.
> Basically I found:
Short? The instructions in question can write up to about 12k of data.
That's not "short" by any means.
I'm also not sure precision here is all that important. The main things
we want to know here are when and where the init and modified
optimizations are coming into play. In other words, how often is there
actual data that *needs* to be saved and restored and can't be
optimized away.
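
For reference, XINUSE can be sampled directly with XGETBV(ECX=1) on
CPUs that support it. A rough sketch (not code from the patch, and it
assumes the "xgetbv1" feature, CPUID.(EAX=0DH,ECX=1):EAX[2]) of what
that read looks like:

/*
 * Sketch only: XGETBV with ECX=1 returns XCR0 & XINUSE, i.e. which
 * enabled state components currently hold non-init data and therefore
 * can't be skipped by the init optimization on the next XSAVE.
 */
static inline u64 xinuse_sketch(void)
{
	u32 eax, edx;

	asm volatile("xgetbv" : "=a" (eax), "=d" (edx) : "c" (1));
	return ((u64)edx << 32) | eax;
}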
So, sure, if we were measuring a dozen cycles here, you could make an
argument that this _might_ be problematic.
But, in this case, we really just want to be able to tell when
XSAVE/XRSTOR are getting more or less expensive and also to get out a
minimal amount of data (RFBM/XINUSE) to make a guess as to why that
might be.
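
Something along these lines is really all that's needed: a plain TSC
delta plus the two bitmaps. This is only a sketch with made-up names,
not the patch itself; it leans on kernel types and helpers (u64,
struct xregs_state, rdtsc() from <asm/msr.h>) and on the
xinuse_sketch() helper above:

struct xsave_lat_sample {
	u64 dtsc;	/* TSC delta around the XSAVES		*/
	u64 rfbm;	/* requested-feature bitmap (the mask)	*/
	u64 xinuse;	/* XGETBV(1) just before the save	*/
};

static void xsave_lat_sketch(struct xregs_state *xstate, u64 mask,
			     struct xsave_lat_sample *s)
{
	u64 start;

	s->rfbm   = mask;
	s->xinuse = xinuse_sketch();

	start = rdtsc();
	/* the real code would go through the kernel's own XSAVE
	 * wrappers rather than raw inline asm */
	asm volatile("xsaves %0"
		     : "+m" (*xstate)
		     : "a" ((u32)mask), "d" ((u32)(mask >> 32))
		     : "memory");
	s->dtsc = rdtsc() - start;
}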
Is it *REALLY* worth throwing serializing instructions in and moving
clock sources to do that? Is the added precision worth it?
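
For comparison, the "serialized" flavor being proposed would look
roughly like this (again just a sketch, not anyone's actual code):

static inline u64 rdtsc_serialized_sketch(void)
{
	u64 lo, hi;

	/*
	 * LFENCE on both sides keeps RDTSC from being reordered
	 * around the code under test; that extra ordering is exactly
	 * the cost being questioned above.
	 */
	asm volatile("lfence; rdtsc; lfence" : "=a" (lo), "=d" (hi));
	return (hi << 32) | lo;
}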