Message-ID: <87bkws6hmc.ffs@tglx>
Date: Fri, 22 Apr 2022 21:30:19 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Tom Lendacky <thomas.lendacky@....com>,
 Dave Hansen <dave.hansen@...el.com>,
 LKML <linux-kernel@...r.kernel.org>
Cc: x86@...nel.org, Andrew Cooper <andrew.cooper3@...rix.com>,
 "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
Subject: Re: [patch 3/3] x86/fpu/xsave: Optimize XSAVEC/S when XGETBV1 is
 supported

On Wed, Apr 20 2022 at 13:15, Tom Lendacky wrote:
> On 4/19/22 16:22, Thomas Gleixner wrote:
>>> That was bare metal and I just checked that this was a production config
>>> and not some weird debug muck which breaks large pages. I'll look deeper
>>> into that.
>>
>> I can't find any reasonable explanation. The pages are definitely large
>> pages, so yes the dTLB miss count does not make sense, but it's
>> consistently faster and it's always the dTLB miss count which makes the
>> big difference according to perf.
>>
>> For enhanced fun, I ran the lot on a AMD Zen3 machine and with the same
>> test case (hackbench -l 10000) repeated 10 times by perf stat this is
>> consistently slower than the non optimized variant. There is at least an
>> explanation for that. A tight loop of 1 Mio xgetbv(1) invocations takes
>> 9 Mio cycles on a SKL-X and 50 Mio cycles on a AMD Zen3.
>
> I'll take a look into this and see what I find. Might be interesting to
> see if the actual XSAVES is slower or quicker, too, based on the input mask.
>
> If the performance slowdown shows up in real world benchmarks, we might
> want to consider not using the xgetbv() call on AMD.

As things stand now, I'm not going to pursue this any further.
The effect on SKL-X is not explainable; in particular, the dTLB miss count
decrease does not make any sense. Aside from that, I just figured out that
it is very sensitive to the kernel configuration, and I have no idea yet
which screw exactly has to be turned to make the effect come and go.

So I'll just go and add the XSAVEC support alone, as that's actually
something which _is_ beneficial for guests.

Thanks,

        tglx