[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <89AB5807-E295-4AB2-9568-9B6306E896F8@amacapital.net>
Date: Tue, 13 Oct 2020 18:47:13 -0700
From: Andy Lutomirski <luto@...capital.net>
To: "Brown, Len" <len.brown@...el.com>
Cc: Andy Lutomirski <luto@...nel.org>,
"Bae, Chang Seok" <chang.seok.bae@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...e.de>,
X86 ML <x86@...nel.org>,
"Hansen, Dave" <dave.hansen@...el.com>,
"Liu, Jing2" <jing2.liu@...el.com>,
"Shankar, Ravi V" <ravi.v.shankar@...el.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 13/22] x86/fpu/xstate: Expand dynamic user state area on first use
> On Oct 13, 2020, at 3:31 PM, Brown, Len <len.brown@...el.com> wrote:
>
>
>>
>> From: Andy Lutomirski <luto@...nel.org>
>
>>> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
>>> @@ -518,3 +518,40 @@ int fpu__exception_code(struct fpu *fpu, int trap_nr)
> ..
>>> +bool xfirstuse_event_handler(struct fpu *fpu)
>>> +{
>>> + bool handled = false;
>>> + u64 event_mask;
>>> +
>>> + /* Check whether the first-use detection is running. */
>>> + if (!static_cpu_has(X86_FEATURE_XFD) || !xfirstuse_enabled())
>>> + return handled;
>>> +
>
>> MSR_IA32_XFD_ERR needs to be wired up in the exception handler, not in
>> some helper called farther down the stack
>
> xfirstuse_event_handler() is called directly from the IDTENTRY exc_device_not_available():
>
>>> @@ -1028,6 +1028,9 @@ DEFINE_IDTENTRY(exc_device_not_available)
>>> {
>>> unsigned long cr0 = read_cr0();
>>>
>>> + if (xfirstuse_event_handler(¤t->thread.fpu))
>>> + return;
>
> Are you suggesting we should instead open code it inside that routine?
MSR_IA32_XFD_ERR is like CR2 and DR6 — it’s functionally a part of the exception. Let’s handle it as such. (And, a bit like DR6, it’s a bit broken.)
>
>> But this raises an interesting point -- what happens if allocation
>> fails? I think that, from kernel code, we simply cannot support this
>> exception mechanism. If kernel code wants to use AMX (and that would
>> be very strange indeed), it should call x86_i_am_crazy_amx_begin() and
>> handle errors, not rely on exceptions. From user code, I assume we
>> send a signal if allocation fails.
>
> The XFD feature allows us to transparently expand the kernel context switch buffer
> for a user task, when that task first touches this associated hardware.
> It allows applications to operate as if the kernel had allocated the backing AMX
> context switch buffer at initialization time. However, since we expect only
> a sub-set of tasks to actually use AMX, we instead defer allocation until
> we actually see first use for that task, rather than allocating for all tasks.
I sure hope that not all tasks use it. Context-switching it will be absurdly expensive.
>
> While we currently don't propose AMX use inside the kernel, it is conceivable
> that could be done in the future, just like AVX is used by the RAID code;
> and it would be done the same way -- kernel_fpu_begin()/kernel_fpu_end().
> Such future Kernel AMX use would _not_ arm XFD, and would _not_ provoke this fault.
> (note that we context switch the XFD-armed state per task)
How expensive is *that*? Can you give approximate cycle counts for saving, restoring, arming and disarming?
This reminds me of TS. Writing TS was more expensive than saving the whole FPU state. And, for kernel code, we can’t just “not arm” the XFD — we would have to disarm it.
>
> vmalloc() does not fail, and does not return an error, and so there is no concept
> of returning a signal. If we got to the point where vmalloc() sleeps, then the system
> has bigger OOM issues, and the OOM killer would be on the prowl.
>
> If we were concerned about using vmalloc for a couple of pages in the task structure,
> Then we could implement a routine to harvest unused buffers and free them
Kind of like we vmalloc a couple pages for kernel stacks, and we carefully cache them. And we handle failure gracefully.
Powered by blists - more mailing lists