Message-ID: <f2a0e0e0-79f2-1b5c-2bcd-b6037d479d4e@intel.com>
Date: Mon, 25 Jul 2022 10:44:23 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: David Laight <David.Laight@...LAB.COM>,
'Yi Sun' <yi.sun@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>
Cc: "sohil.mehta@...el.com" <sohil.mehta@...el.com>,
"tony.luck@...el.com" <tony.luck@...el.com>,
"heng.su@...el.com" <heng.su@...el.com>
Subject: Re: [PATCH 1/2] x86/fpu: Measure the Latency of XSAVE and XRSTOR
On 7/24/22 13:54, David Laight wrote:
> I've done some experiments that measure short instruction latencies.
> Basically I found:
Short? The instructions in question can write up to about 12k of data.
That's not "short" by any means.
I'm also not sure precision here is all that important. The main things
we want to know here are when and where the init and modified
optimizations are coming into play. In other words, how often is there
actual data that *needs* to be saved and restored and can't be
optimized away.
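
For reference, XINUSE can be sampled directly with XGETBV(ECX=1) on
CPUs that support it. A rough sketch (not code from the patch, and it
assumes the "xgetbv1" feature, CPUID.(EAX=0DH,ECX=1):EAX[2]) of what
that read looks like:

/*
 * Sketch only: XGETBV with ECX=1 returns XCR0 & XINUSE, i.e. which
 * enabled state components currently hold non-init data and therefore
 * can't be skipped by the init optimization on the next XSAVE.
 */
static inline u64 xinuse_sketch(void)
{
	u32 eax, edx;

	asm volatile("xgetbv" : "=a" (eax), "=d" (edx) : "c" (1));
	return ((u64)edx << 32) | eax;
}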
So, sure, if we were measuring a dozen cycles here, you could make an
argument that this _might_ be problematic.
But, in this case, we really just want to be able to tell when
XSAVE/XRSTOR are getting more or less expensive and also to get out a
minimal amount of data (RFBM/XINUSE) to make a guess as to why that
might be.
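
Something along these lines is really all that's needed: a plain TSC
delta plus the two bitmaps. This is only a sketch with made-up names,
not the patch itself; it leans on kernel types and helpers (u64,
struct xregs_state, rdtsc() from <asm/msr.h>) and on the
xinuse_sketch() helper above:

struct xsave_lat_sample {
	u64 dtsc;	/* TSC delta around the XSAVES		*/
	u64 rfbm;	/* requested-feature bitmap (the mask)	*/
	u64 xinuse;	/* XGETBV(1) just before the save	*/
};

static void xsave_lat_sketch(struct xregs_state *xstate, u64 mask,
			     struct xsave_lat_sample *s)
{
	u64 start;

	s->rfbm   = mask;
	s->xinuse = xinuse_sketch();

	start = rdtsc();
	/* the real code would go through the kernel's own XSAVE
	 * wrappers rather than raw inline asm */
	asm volatile("xsaves %0"
		     : "+m" (*xstate)
		     : "a" ((u32)mask), "d" ((u32)(mask >> 32))
		     : "memory");
	s->dtsc = rdtsc() - start;
}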
Is it *REALLY* worth throwing serializing instructions in and moving
clock sources to do that? Is the added precision worth it?
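
For comparison, the "serialized" flavor being proposed would look
roughly like this (again just a sketch, not anyone's actual code):

static inline u64 rdtsc_serialized_sketch(void)
{
	u64 lo, hi;

	/*
	 * LFENCE on both sides keeps RDTSC from being reordered
	 * around the code under test; that extra ordering is exactly
	 * the cost being questioned above.
	 */
	asm volatile("lfence; rdtsc; lfence" : "=a" (lo), "=d" (hi));
	return (hi << 32) | lo;
}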