Message-ID: <b14a9c82-3336-0f13-1a27-fba929e6b4fb@linux.intel.com>
Date: Wed, 19 Dec 2018 00:28:11 +0800
From: "Li, Aubrey" <aubrey.li@...ux.intel.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Aubrey Li <aubrey.li@...el.com>, mingo@...hat.com,
peterz@...radead.org, hpa@...or.com, ak@...ux.intel.com,
tim.c.chen@...ux.intel.com, dave.hansen@...el.com,
arjan@...ux.intel.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks
On 2018/12/18 23:32, Thomas Gleixner wrote:
> On Tue, 18 Dec 2018, Li, Aubrey wrote:
>
>> On 2018/12/18 22:14, Thomas Gleixner wrote:
>>> On Tue, 18 Dec 2018, Aubrey Li wrote:
>>>> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
>>>> index a38bf5a1e37a..8778ac172255 100644
>>>> --- a/arch/x86/include/asm/fpu/internal.h
>>>> +++ b/arch/x86/include/asm/fpu/internal.h
>>>> @@ -411,6 +411,13 @@ static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
>>>>  {
>>>>  	if (likely(use_xsave())) {
>>>>  		copy_xregs_to_kernel(&fpu->state.xsave);
>>>> +
>>>> +		/*
>>>> +		 * AVX512 state is tracked here because its use is
>>>> +		 * known to slow the max clock speed of the core.
>>>> +		 */
>>>> +		if (fpu->state.xsave.header.xfeatures & XFEATURE_MASK_AVX512)
>>>> +			fpu->avx512_timestamp = jiffies_64;
>>>
>>> Even if unlikely this is incorrect when running a 32 bit kernel because
>>> there jiffies_64 cannot be atomically loaded vs. a concurrent update. See
>>> the comment in include/linux/jiffies.h right above the jiffies_64
>>> declaration.
>>>
>> Yeah, I noticed this, because this is under use_xsave() condition, also need
>> valid AVX512 state, so a 32 bit kernel won't enter this branch.
>
> What exactly prevents a 32bit kernel from having the AVX512 feature bit
> set? And if it cannot be set on 32bit, then why are you compiling that code
> in at all?
>
I misunderstood: you meant a 32-bit kernel, not a 32-bit machine. Theoretically
a 32-bit kernel can use AVX-512, but I am not sure anyone uses it that way.
get_jiffies_64() involves jiffies_lock operations, which is not good in the
context switch path, so I wanted to use raw jiffies_64 here. jiffies is a good
candidate, but it has the wraparound issue. Other time sources are expensive
here.

Should I limit this code to run only on 64-bit kernels?
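
To make the wraparound concern with plain jiffies concrete, here is a minimal
user-space sketch, not kernel code. It assumes a 32-bit tick counter, modelled
with uint32_t as a stand-in for unsigned long on a 32-bit kernel. The helper
names elapsed_ticks() and used_recently() are hypothetical and exist only for
this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: ticks elapsed between two samples of a 32-bit
 * counter. Unsigned subtraction is modulo 2^32, so the result stays
 * correct across a single wrap of the counter.
 */
static uint32_t elapsed_ticks(uint32_t now, uint32_t then)
{
	return (uint32_t)(now - then);
}

/*
 * Hypothetical helper: was the timestamp taken within the last
 * 'window' ticks? This is only reliable while the true elapsed time
 * is below 2^32 ticks. Once a full wrap has passed, an old timestamp
 * aliases to a small elapsed value and looks recent again.
 */
static bool used_recently(uint32_t now, uint32_t then, uint32_t window)
{
	return elapsed_ticks(now, then) <= window;
}
```

For example, a timestamp of 0xFFFFFFF0 compared against a current value of
0x00000005 gives 21 elapsed ticks, even though the raw value of "now" is the
smaller one. The remaining problem is a task that stays idle for a full wrap
of the counter, which a 64-bit timestamp avoids.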
Thanks,
-Aubrey