[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9e9c629f-eab0-6643-4e47-0a4b39e2a3f8@intel.com>
Date: Fri, 18 Aug 2023 14:35:38 -0500
From: Lijun Pan <lijun.pan@...el.com>
To: Rick Edgecombe <rick.p.edgecombe@...el.com>, <tglx@...utronix.de>,
<mingo@...hat.com>, <bp@...en8.de>, <x86@...nel.org>,
<hpa@...or.com>, <dave.hansen@...ux.intel.com>, <luto@...nel.org>,
<peterz@...radead.org>
CC: <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86: Use __fpu_invalidate_fpregs_state() in exec
Hi Rick,
On 8/18/2023 12:03 PM, Rick Edgecombe wrote:
> The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
> valid and should be reloaded when returning to userspace. However, the
> kernel will skip doing this if the FPU registers are already valid as
> determined by fpregs_state_valid(). The logic embedded there considers
> the state valid if two cases are both true:
> 1: fpu_fpregs_owner_ctx points to the current tasks FPU state
> 2: the last CPU the registers were live in was the current CPU.
>
> This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
> the current FPU during the fpregs_restore_userregs() operation, so it
> indicates that the registers have been restored on this CPU. But this
> alone doesn’t preclude that the task hasn’t been rescheduled to a
> different CPU, where the registers were modified, and then back to the
> current CPU. To verify that this was not the case the logic relies on the
> second condition. So the assumption is that if the registers have been
> restored, AND they haven’t had the chance to be modified (by being
> loaded on another CPU), then they MUST be valid on the current CPU.
>
> Besides the lazy FPU optimizations, the other cases where the FPU
> registers might not be valid are when the kernel modifies the FPU register
> state or the FPU saved buffer. In this case the operation modifying the
> FPU state needs to let the kernel know the correspondence has been
> broken. The comment in “arch/x86/kernel/fpu/context.h” has:
> /*
> ...
> * If the FPU register state is valid, the kernel can skip restoring the
> * FPU state from memory.
> *
> * Any code that clobbers the FPU registers or updates the in-memory
> * FPU state for a task MUST let the rest of the kernel know that the
> * FPU registers are no longer valid for this task.
> *
> * Either one of these invalidation functions is enough. Invalidate
> * a resource you control: CPU if using the CPU for something else
> * (with preemption disabled), FPU for the current task, or a task that
> * is prevented from running by the current task.
> */
>
> However, this is not completely true. When the kernel modifies the
> registers or saved FPU state, it can only rely on
> __fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
> tracking. The exec path instead relies on fpregs_deactivate(), which sets
> the CPU’s FPU context to NULL. This was observed to fail to restore the
> reset FPU state to the registers when returning to userspace in the
> following scenario:
>
> 1. A task is executing in userspace on CPU0
> - CPU0’s FPU context points to tasks
> - fpu->last_cpu=CPU0
> 2. The task exec()’s
> 3. While in the kernel the task reschedules to CPU1
> - CPU0 gets a thread executing in the kernel (such that no other
> FPU context is activated)
> - Scheduler sets task’s fpu->last_cpu=CPU0
> 4. Continuing the exec, the task gets to
> fpu_flush_thread()->fpu_reset_fpregs()
> - Sets CPU1’s fpu context to NULL
> - Copies the init state to the task’s FPU buffer
> - Sets TIF_NEED_FPU_LOAD on the task
> 5. The task reschedules back to CPU0 before completing the exec and
> returning to userspace
> - During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
> - Skips saving the registers and updating task’s fpu→last_cpu,
> because TIF_NEED_FPU_LOAD is the canonical source.
> 6. Now CPU0’s FPU context is still pointing to the task’s, and
> fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
> though the reset FPU state has not been restored.
>
> So the root cause is that exec() is doing the wrong kind of invalidate. It
> should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
> fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
> going away, they are just getting reset as part of an exec. So switch to
> __fpu_invalidate_fpregs_state().
>
> Also, delete the misleading comment that says that either kind of
> invalidate will be enough, because it’s not always the case.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
>
> ---
> Hi,
>
> This was spotted on a specific pre-production setup running an
> out-of-tree glibc and the x86/shstk tip branch. The symptom observed
> was a shadow stack segfault in ld-linux. The test case was a kernel
> build with a high number of threads and it was able to generate the
> segfault relatively reliably.
>
> I was surprised to find that the root cause was not related to supervisor
> xsave and instead seems to be a general FPU bug where the FPU state will
> not be reset during exec if rescheduling happens twice in certain points
> during the operation. It seems to be so old that I had a hard time
> figuring which commit to blame.
>
> A guess at how this was able to lurk so long is the combination of two
> factors. One is that this specific test environment and workload seemed
> to like to generate this specific pattern of scheduling for some reason.
> So the fact it was reliably reproduced there could be not be indicative of
> the typical case. The other factor is that CET features will rather loudly
> alert to any corrupted FPU state due to the enforcement nature of that
> state. So maybe this FPU reset miss during exec happened less commonly in
> the wild, but most existing apps can survive it silently.
>
> But since it's still a bit surprising, I would appreciate some extra
> scrutiny on the reasoning. I verified the FPU state was not getting reset
> during exec’s that experienced rescheduling to another CPU and back at
> times as described in the commit log. Then following the logic in the
> code, failing to restore the FPU would be expected. And fixing that logic
> fixed the observed issue. But still surprised this wasn't seen before now.
>
> Thanks,
> Rick
> ---
> arch/x86/kernel/fpu/context.h | 3 +--
> arch/x86/kernel/fpu/core.c | 2 +-
> 2 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
> index af5cbdd9bd29..f6d856bd50bc 100644
> --- a/arch/x86/kernel/fpu/context.h
> +++ b/arch/x86/kernel/fpu/context.h
> @@ -19,8 +19,7 @@
> * FPU state for a task MUST let the rest of the kernel know that the
> * FPU registers are no longer valid for this task.
> *
> - * Either one of these invalidation functions is enough. Invalidate
> - * a resource you control: CPU if using the CPU for something else
> + * Invalidate a resource you control: CPU if using the CPU for something else
> * (with preemption disabled), FPU for the current task, or a task that
> * is prevented from running by the current task.
> */
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index e03b6b107b20..a86d37052a64 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -713,7 +713,7 @@ static void fpu_reset_fpregs(void)
> struct fpu *fpu = ¤t->thread.fpu;
>
> fpregs_lock();
> - fpu__drop(fpu);
> + __fpu_invalidate_fpregs_state(fpu);
> /*
> * This does not change the actual hardware registers. It just
> * resets the memory image and sets TIF_NEED_FPU_LOAD so a
Thanks for the patch. Let me get back to my server (offline now) next
Monday and will add a "Test-by: Lijun Pan <lijun.pan@...el.com>" if it
passes.
In our bug case, probably just switching to
__fpu_invalidate_fpregs_state(fpu) from fpu__drop(fpu) is enough.
Maybe there are some other cases that need
__this_cpu_write(fpu_fpregs_owner_ctx, NULL) through fpu__drop() call,
which I don't know.
Here is the excerpt of the call sequence:
fpu__drop() -> fpregs_deactivate() ->
__this_cpu_write(fpu_fpregs_owner_ctx, NULL);
arch/x86/kernel/fpu/context.h has
static inline void __cpu_invalidate_fpregs_state(void)
{
__this_cpu_write(fpu_fpregs_owner_ctx, NULL);
}
static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
{
fpu->last_cpu = -1;
}
static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
{
return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu ==
fpu->last_cpu;
}
So, I am thinking if it is more rigorous to have both
(__cpu_invalidate_fpregs_state, __fpu_invalidate_fpregs_state)
invalidated, similarly as fpregs_state_valid checks both conditions.
code changes like below:
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index 958accf2ccf0..fd3304bed0a2 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -19,8 +19,8 @@
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
- * Either one of these invalidation functions is enough. Invalidate
- * a resource you control: CPU if using the CPU for something else
+ * To be more rigorous and to prevent from future corner case, Invalidate
+ * both resources you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 97a899bf98b9..08b9cef0e076 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -725,6 +725,7 @@ static void fpu_reset_fpregs(void)
fpregs_lock();
fpu__drop(fpu);
+ __fpu_invalidate_fpregs_state(fpu);
/*
* This does not change the actual hardware registers. It just
* resets the memory image and sets TIF_NEED_FPU_LOAD so a
Thanks,
Lijun
Powered by blists - more mailing lists