Message-ID: <58011258.5000202@linux.intel.com>
Date: Fri, 14 Oct 2016 10:14:00 -0700
From: Dave Hansen <dave.hansen@...ux.intel.com>
To: riel@...hat.com, linux-kernel@...r.kernel.org
Cc: hpa@...or.com, mingo@...nel.org, bp@...en8.de, luto@...nel.org,
oleg@...hat.com
Subject: Re: [PATCH 2/2] x86/fpu: split old & new fpu handling into separate
functions
On 10/14/2016 05:15 AM, riel@...hat.com wrote:
> From: Rik van Riel <riel@...hat.com>
>
> By moving all of the new fpu state handling into switch_fpu_finish,
> the code can be simplified some more. This does get rid of the
> prefetch, but given the size of the fpu register state on modern
> CPUs, and the amount of work done by __switch_to in-between both
> functions, the value of a single cache line prefetch seems somewhat
> dubious anyway.
...
> -
> - if (fpu.preload) {
> - if (fpregs_state_valid(new_fpu, cpu))
> - fpu.preload = 0;
> - else
> - prefetch(&new_fpu->state);
> - fpregs_activate(new_fpu);
> - }
> -
> - return fpu;
> }
Yeah, that prefetch is highly dubious. XRSTOR might not even be
_reading_ that cacheline if the state isn't present (xstate->xfeatures
bit is 0). If we had to pick *a* cacheline to prefetch for XRSTOR, it
would be the XSAVE header, *not* the FPU state.
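As a rough illustration of the point above (a user-space sketch, not part of the patch and not the kernel's own definitions): in the standard XSAVE layout the 64-byte header sits at offset 512, after the legacy FXSAVE region, and its first quadword (XSTATE_BV, which the kernel calls xfeatures) tells XRSTOR which components are present. The struct names and the helper xrstor_reads_component() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model only; these names are hypothetical, not the kernel's. */
#define XFEATURE_MASK_FP   (1ULL << 0)	/* x87 state */
#define XFEATURE_MASK_SSE  (1ULL << 1)	/* SSE state */
#define XFEATURE_MASK_YMM  (1ULL << 2)	/* AVX state */
#define XFEATURE_MASK_PKRU (1ULL << 9)	/* protection keys */

struct xsave_header_model {
	uint64_t xfeatures;	/* XSTATE_BV: components present in the buffer */
	uint64_t xcomp_bv;	/* compaction bitmap */
	uint64_t reserved[6];
};

struct xsave_area_model {
	uint8_t legacy[512];			/* FXSAVE-format legacy region */
	struct xsave_header_model header;	/* 64-byte XSAVE header */
};

static_assert(offsetof(struct xsave_area_model, header) == 512,
	      "header follows the legacy region");
static_assert(sizeof(struct xsave_header_model) == 64,
	      "header occupies one 64-byte cacheline");

/*
 * Returns 1 if XRSTOR would load this component's data from memory,
 * 0 otherwise. A component that is requested (bit set in rfbm) but
 * absent (bit clear in xfeatures) is put into its initial
 * configuration instead of being read from the buffer.
 */
static int xrstor_reads_component(const struct xsave_header_model *hdr,
				  uint64_t rfbm, uint64_t component)
{
	if (!(rfbm & component))
		return 0;	/* not requested: state left untouched */
	return (hdr->xfeatures & component) != 0;
}
```

In this model, the header is the one cacheline that is always consulted, while a component's data area is only read when its xfeatures bit is set. That is the sense in which a prefetch of the FPU state itself may fetch a line XRSTOR never touches.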
I actually made some attempts to optimize the PKRU handling by touching
and prefetching the state before calling XRSTOR. Touching it before the
XRSTOR made things _worse_ overall.
It would be ideal to have some data on whether this actually _does_
anything, but I can't imagine it being a real delta in either direction.
Acked-by: Dave Hansen <dave.hansen@...el.com>