linux-kernel - Re: [PATCH v10 13/28] x86/fpu/xstate: Use feature disable (XFD) to protect dynamic user state

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87zgrofw81.ffs@tglx>
Date:   Mon, 04 Oct 2021 21:03:58 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     "Bae, Chang Seok" <chang.seok.bae@...el.com>
Cc:     "bp@...e.de" <bp@...e.de>, "Lutomirski, Andy" <luto@...nel.org>,
        "mingo@...nel.org" <mingo@...nel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "Brown, Len" <len.brown@...el.com>,
        "lenb@...nel.org" <lenb@...nel.org>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        "Macieira, Thiago" <thiago.macieira@...el.com>,
        "Liu, Jing2" <jing2.liu@...el.com>,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v10 13/28] x86/fpu/xstate: Use feature disable (XFD) to
 protect dynamic user state

On Sun, Oct 03 2021 at 22:39, Chang Seok Bae wrote:
> On Oct 1, 2021, at 13:20, Thomas Gleixner <tglx@...utronix.de> wrote:
>
> Looking at the changelog of the patch to delay XSTATE [1] load:
>
>     This gives the kernel the potential to skip loading FPU state for tasks
>     that stay in kernel mode, or for tasks that end up with repeated
>     invocations of kernel_fpu_begin() & kernel_fpu_end().

Correct.

> But I think XFD state is different from XSTATE. There is no use case for
> XFD-enabled features in kernel mode.

Correct, but your patch does not ensure that XFD features are disabled
on context switch. You write the XFD mask of the next task when it
differs frome the XFD mask of the previous task. So we have the following:

        prev XFD	next XFD
        DISABLED  	DISABLED         XFD features stay disabled
        ENABLED         DISABLED         XFD features are disabled
        DISABLED        ENABLED          XFD features are enabled
        ENABLED         ENABLED          XFD features stay enabled

So it still runs in the kernel with XFD features enabled including
interrupts, soft interupts, exceptions and NMIs. So what's the problem
when it does a user -> kernel -> user transition with the user XFD on?

> So, XFD state was considered to be switched under switch_to() just
> like other user states. E.g. user FSBASE is switched here as kernel
> does not use it.

That's not really a justification.

> But user GSBASE is loaded at returning to userspace.

And so is XSTATE

> Potentially, it is also beneficial as XFD-armed states will hold
> INIT-state [3]:
>
>     If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i, the
>     instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1; instead,
>     it saves bit i of XSTATE_BV field of the XSAVE header as 0 (indicating
>     that the state component is in its initialized state).

How does that matter? The point is that if the FPU registers are
unmodified then a task can return to user space without doing anything
even if it went through five context switches. So how is XFD any
different?

Where is the kernel doing XSAVE / XSAVES:

     1) On context switch which sets TIF_NEED_FPU_LOAD

        Once TIF_NEED_FPU_LOAD is set the kernel does not do XSAVES in
        the context of the task simply because it knows that the content
        is in the memory buffer.

     2) In signal handling

        Only happens when TIF_NEED_FPU_LOAD == 0

Where is the kernel doing XRSTOR / XRSTORS:

     1) On return to user space if the FPU registers are not up to date

        So this can restore XFD as well

     2) In signal handling and related functions

        Only happens when TIF_NEED_FPU_LOAD == 0

So what's the win?

No wrmsrl() on context switch, which means for the user -> kthread ->
user context switch scenario for which the register preserving is
optimized you spare two wrmsrl() invocations, run less code with less
conditionals.

What's the price?

A few trivial XFD sanity checks for debug enabled kernels to ensure that
XFD is correct on XSAVE[S] and XRSTOR[S], which have no runtime overhead
on production systems.

Even if we decide that these checks should be permanent then they happen
in code pathes which are doing a slow X* operation anyway.

Thanks,

        tglx