[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADyq12xG=mRcdkBU9Y=4dW5j8U8xuzU4x+1d_qSy-opcJHZduw@mail.gmail.com>
Date: Tue, 15 Feb 2022 16:36:12 -0500
From: Brian Geffon <bgeffon@...gle.com>
To: Guenter Roeck <groeck@...gle.com>
Cc: Dave Hansen <dave.hansen@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Willis Kung <williskung@...gle.com>,
Borislav Petkov <bp@...e.de>,
Andy Lutomirski <luto@...nel.org>,
"# v4 . 10+" <stable@...r.kernel.org>,
"the arch/x86 maintainers" <x86@...nel.org>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86/fpu: Correct pkru/xstate inconsistency
On Tue, Feb 15, 2022 at 4:14 PM Guenter Roeck <groeck@...gle.com> wrote:
>
> On Tue, Feb 15, 2022 at 9:13 AM Dave Hansen <dave.hansen@...el.com> wrote:
> >
> > On 2/15/22 07:36, Brian Geffon wrote:
> > > There are two issues with PKRU handling prior to 5.13.
> >
> > Are you sure both of these issues were introduced by 0cecca9d03c? I'm
> > surprised that the get_xsave_addr() issue is not older.
> >
> > Should this be two patches?
> >
> > > The first is that when eagerly switching PKRU we check that current
> >
> > Don't forget to write in imperative mood. No "we's", please.
> >
> > https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
> >
> > This goes for changelogs and comments too.
> >
> > > is not a kernel thread as kernel threads will never use PKRU. It's
> > > possible that this_cpu_read_stable() on current_task (ie.
> > > get_current()) is returning an old cached value. By forcing the read
> > > with this_cpu_read() the correct task is used. Without this it's
> > > possible when switching from a kernel thread to a userspace thread
> > > that we'll still observe the PF_KTHREAD flag and never restore the
> > > PKRU. And as a result this issue only occurs when switching from a
> > > kernel thread to a userspace thread, switching from a non kernel
> > > thread works perfectly fine because all we consider in that situation
> > > is the flags from some other non kernel task and the next fpu is
> > > passed in to switch_fpu_finish().
> >
> > It makes *sense* that there would be a place in the context switch code
> > where 'current' is wonky, but I never realized this. This seems really
> > fragile, but *also* trivially detectable.
> >
> > Is the PKRU code really the only code to use 'current' in a buggy way
> > like this?
> >
> > > The second issue is when using write_pkru() we only write to the
> > > xstate when the feature bit is set because get_xsave_addr() returns
> > > NULL when the feature bit is not set. This is problematic as the CPU
> > > is free to clear the feature bit when it observes the xstate in the
> > > init state, this behavior seems to be documented a few places throughout
> > > the kernel. If the bit was cleared then in write_pkru() we would happily
> > > write to PKRU without ever updating the xstate, and the FPU restore on
> > > return to userspace would load the old value agian.
> >
> >
> > ^ again
> >
> > It's probably worth noting that the AMD init tracker is a lot more
> > aggressive than Intel's. On Intel, I think XRSTOR is the only way to
> > get back to the init state. You're obviously hitting this on AMD.
> >
>
> Brian should correct me here, but I think we have seen this with one
> specific Intel CPU.
>
> Brian, would it make sense to list the affected CPU model(s), or at
> least the ones where we have observed the problem ?
The only CPU I have access to at the moment with OSPKE is an 11th Gen
Core i5-1135G7, so that's the only one I've observed it on. I can try
to search around for other hardware.
Brian
>
> Thanks,
> Guenter
>
> > It's also *very* unlikely that PKRU gets back to a value of 0. I think
> > we added a selftest for this case in later kernels.
> >
> > That helps explain why this bug hung around for so long.
> >
> > > diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> > > index 03b3de491b5e..540bda5bdd28 100644
> > > --- a/arch/x86/include/asm/fpu/internal.h
> > > +++ b/arch/x86/include/asm/fpu/internal.h
> > > @@ -598,7 +598,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
> > > * PKRU state is switched eagerly because it needs to be valid before we
> > > * return to userland e.g. for a copy_to_user() operation.
> > > */
> > > - if (!(current->flags & PF_KTHREAD)) {
> > > + if (!(this_cpu_read(current_task)->flags & PF_KTHREAD)) {
> >
> > This really deserves a specific comment.
> >
> > > /*
> > > * If the PKRU bit in xsave.header.xfeatures is not set,
> > > * then the PKRU component was in init state, which means
> > > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> > > index 9e71bf86d8d0..aa381b530de0 100644
> > > --- a/arch/x86/include/asm/pgtable.h
> > > +++ b/arch/x86/include/asm/pgtable.h
> > > @@ -140,16 +140,22 @@ static inline void write_pkru(u32 pkru)
> > > if (!boot_cpu_has(X86_FEATURE_OSPKE))
> > > return;
> > >
> > > - pk = get_xsave_addr(¤t->thread.fpu.state.xsave, XFEATURE_PKRU);
> > > -
> > > /*
> > > * The PKRU value in xstate needs to be in sync with the value that is
> > > * written to the CPU. The FPU restore on return to userland would
> > > * otherwise load the previous value again.
> > > */
> > > fpregs_lock();
> > > - if (pk)
> > > - pk->pkru = pkru;
> > > + /*
> > > + * The CPU is free to clear the feature bit when the xstate is in the
> > > + * init state. For this reason, we need to make sure the feature bit is
> > > + * reset when we're explicitly writing to pkru. If we did not then we
> > > + * would write to pkru and it would not be saved on a context switch.
> > > + */
> > > + current->thread.fpu.state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
> >
> > I don't think we need to describe how the init optimization works again.
> > I'm also not sure it's worth mentioning context switches here. It's a
> > wider problem than that. Maybe:
> >
> > /*
> > * All fpregs will be XRSTOR'd from this buffer before returning
> > * to userspace. Ensure that XRSTOR does not init PKRU and that
> > * get_xsave_addr() will work.
> > */
> >
> > > + pk = get_xsave_addr(¤t->thread.fpu.state.xsave, XFEATURE_PKRU);
> > > + BUG_ON(!pk);
> >
> > A BUG_ON() a line before a NULL pointer dereference doesn't tend to do
> > much good.
> >
> > > + pk->pkru = pkru;
> > > __write_pkru(pkru);
> > > fpregs_unlock();
> > > }
> >
Powered by blists - more mailing lists