linux-kernel - Re: [PATCH] x86: Use __fpu_invalidate_fpregs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9e9c629f-eab0-6643-4e47-0a4b39e2a3f8@intel.com>
Date:   Fri, 18 Aug 2023 14:35:38 -0500
From:   Lijun Pan <lijun.pan@...el.com>
To:     Rick Edgecombe <rick.p.edgecombe@...el.com>, <tglx@...utronix.de>,
        <mingo@...hat.com>, <bp@...en8.de>, <x86@...nel.org>,
        <hpa@...or.com>, <dave.hansen@...ux.intel.com>, <luto@...nel.org>,
        <peterz@...radead.org>
CC:     <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86: Use __fpu_invalidate_fpregs_state() in exec

Hi Rick,

On 8/18/2023 12:03 PM, Rick Edgecombe wrote:
> The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
> valid and should be reloaded when returning to userspace. However, the
> kernel will skip doing this if the FPU registers are already valid as
> determined by fpregs_state_valid(). The logic embedded there considers
> the state valid if two cases are both true:
>    1: fpu_fpregs_owner_ctx points to the current tasks FPU state
>    2: the last CPU the registers were live in was the current CPU.
> 
> This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
> the current FPU during the fpregs_restore_userregs() operation, so it
> indicates that the registers have been restored on this CPU. But this
> alone doesn’t preclude that the task hasn’t been rescheduled to a
> different CPU, where the registers were modified, and then back to the
> current CPU. To verify that this was not the case the logic relies on the
> second condition. So the assumption is that if the registers have been
> restored, AND they haven’t had the chance to be modified (by being
> loaded on another CPU), then they MUST be valid on the current CPU.
> 
> Besides the lazy FPU optimizations, the other cases where the FPU
> registers might not be valid are when the kernel modifies the FPU register
> state or the FPU saved buffer. In this case the operation modifying the
> FPU state needs to let the kernel know the correspondence has been
> broken. The comment in “arch/x86/kernel/fpu/context.h” has:
> /*
> ...
>   * If the FPU register state is valid, the kernel can skip restoring the
>   * FPU state from memory.
>   *
>   * Any code that clobbers the FPU registers or updates the in-memory
>   * FPU state for a task MUST let the rest of the kernel know that the
>   * FPU registers are no longer valid for this task.
>   *
>   * Either one of these invalidation functions is enough. Invalidate
>   * a resource you control: CPU if using the CPU for something else
>   * (with preemption disabled), FPU for the current task, or a task that
>   * is prevented from running by the current task.
>   */
> 
> However, this is not completely true. When the kernel modifies the
> registers or saved FPU state, it can only rely on
> __fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
> tracking. The exec path instead relies on fpregs_deactivate(), which sets
> the CPU’s FPU context to NULL. This was observed to fail to restore the
> reset FPU state to the registers when returning to userspace in the
> following scenario:
> 
> 1. A task is executing in userspace on CPU0
> 	- CPU0’s FPU context points to tasks
> 	- fpu->last_cpu=CPU0
> 2. The task exec()’s
> 3. While in the kernel the task reschedules to CPU1
> 	- CPU0 gets a thread executing in the kernel (such that no other
> 		FPU context is activated)
> 	- Scheduler sets task’s fpu->last_cpu=CPU0
> 4. Continuing the exec, the task gets to
>     fpu_flush_thread()->fpu_reset_fpregs()
> 	- Sets CPU1’s fpu context to NULL
> 	- Copies the init state to the task’s FPU buffer
> 	- Sets TIF_NEED_FPU_LOAD on the task
> 5. The task reschedules back to CPU0 before completing the exec and
> 	returning to userspace
> 	- During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
> 	- Skips saving the registers and updating task’s fpu→last_cpu,
> 		because TIF_NEED_FPU_LOAD is the canonical source.
> 6. Now CPU0’s FPU context is still pointing to the task’s, and
>     fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
>     though the reset FPU state has not been restored.
> 
> So the root cause is that exec() is doing the wrong kind of invalidate. It
> should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
> fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
> going away, they are just getting reset as part of an exec. So switch to
> __fpu_invalidate_fpregs_state().
> 
> Also, delete the misleading comment that says that either kind of
> invalidate will be enough, because it’s not always the case.
> 
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
> 
> ---
> Hi,
> 
> This was spotted on a specific pre-production setup running an
> out-of-tree glibc and the x86/shstk tip branch. The symptom observed
> was a shadow stack segfault in ld-linux. The test case was a kernel
> build with a high number of threads and it was able to generate the
> segfault relatively reliably.
> 
> I was surprised to find that the root cause was not related to supervisor
> xsave and instead seems to be a general FPU bug where the FPU state will
> not be reset during exec if rescheduling happens twice in certain points
> during the operation. It seems to be so old that I had a hard time
> figuring which commit to blame.
> 
> A guess at how this was able to lurk so long is the combination of two
> factors. One is that this specific test environment and workload seemed
> to like to generate this specific pattern of scheduling for some reason.
> So the fact it was reliably reproduced there could be not be indicative of
> the typical case. The other factor is that CET features will rather loudly
> alert to any corrupted FPU state due to the enforcement nature of that
> state. So maybe this FPU reset miss during exec happened less commonly in
> the wild, but most existing apps can survive it silently.
> 
> But since it's still a bit surprising, I would appreciate some extra
> scrutiny on the reasoning. I verified the FPU state was not getting reset
> during exec’s that experienced rescheduling to another CPU and back at
> times as described in the commit log. Then following the logic in the
> code, failing to restore the FPU would be expected. And fixing that logic
> fixed the observed issue. But still surprised this wasn't seen before now.
> 
> Thanks,
> Rick
> ---
>   arch/x86/kernel/fpu/context.h | 3 +--
>   arch/x86/kernel/fpu/core.c    | 2 +-
>   2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
> index af5cbdd9bd29..f6d856bd50bc 100644
> --- a/arch/x86/kernel/fpu/context.h
> +++ b/arch/x86/kernel/fpu/context.h
> @@ -19,8 +19,7 @@
>    * FPU state for a task MUST let the rest of the kernel know that the
>    * FPU registers are no longer valid for this task.
>    *
> - * Either one of these invalidation functions is enough. Invalidate
> - * a resource you control: CPU if using the CPU for something else
> + * Invalidate a resource you control: CPU if using the CPU for something else
>    * (with preemption disabled), FPU for the current task, or a task that
>    * is prevented from running by the current task.
>    */
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index e03b6b107b20..a86d37052a64 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -713,7 +713,7 @@ static void fpu_reset_fpregs(void)
>   	struct fpu *fpu = &current->thread.fpu;
>   
>   	fpregs_lock();
> -	fpu__drop(fpu);
> +	__fpu_invalidate_fpregs_state(fpu);
>   	/*
>   	 * This does not change the actual hardware registers. It just
>   	 * resets the memory image and sets TIF_NEED_FPU_LOAD so a


Thanks for the patch. Let me get back to my server (offline now) next 
Monday and will add a "Test-by: Lijun Pan <lijun.pan@...el.com>" if it 
passes.

In our bug case, probably just switching to
__fpu_invalidate_fpregs_state(fpu) from fpu__drop(fpu) is enough.

Maybe there are some other cases that need
__this_cpu_write(fpu_fpregs_owner_ctx, NULL) through fpu__drop() call, 
which I don't know.

Here is the excerpt of the call sequence:
fpu__drop() -> fpregs_deactivate() -> 
__this_cpu_write(fpu_fpregs_owner_ctx, NULL);
arch/x86/kernel/fpu/context.h has
static inline void __cpu_invalidate_fpregs_state(void)
{
     __this_cpu_write(fpu_fpregs_owner_ctx, NULL);
}
static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
{
     fpu->last_cpu = -1;
}
static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
{
    return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == 
fpu->last_cpu;
}

So, I am thinking if it is more rigorous to have both 
(__cpu_invalidate_fpregs_state, __fpu_invalidate_fpregs_state) 
invalidated, similarly as fpregs_state_valid checks both conditions.

code changes like below:
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index 958accf2ccf0..fd3304bed0a2 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -19,8 +19,8 @@
   * FPU state for a task MUST let the rest of the kernel know that the
   * FPU registers are no longer valid for this task.
   *
- * Either one of these invalidation functions is enough. Invalidate
- * a resource you control: CPU if using the CPU for something else
+ * To be more rigorous and to prevent from future corner case, Invalidate
+ * both resources you control: CPU if using the CPU for something else
   * (with preemption disabled), FPU for the current task, or a task that
   * is prevented from running by the current task.
   */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 97a899bf98b9..08b9cef0e076 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -725,6 +725,7 @@ static void fpu_reset_fpregs(void)

         fpregs_lock();
         fpu__drop(fpu);
+       __fpu_invalidate_fpregs_state(fpu);
         /*
          * This does not change the actual hardware registers. It just
          * resets the memory image and sets TIF_NEED_FPU_LOAD so a

Thanks,
Lijun