linux-kernel - Re: [tip: x86/fpu] x86/fpu: Deactivate FPU state after failure during state load


Message-ID: <9f3c7f106963ce7f8a74fc084f6de5ad2d4380ed.camel@intel.com>
Date:   Tue, 07 Jan 2020 12:38:20 -0800
From:   Yu-cheng Yu <yu-cheng.yu@...el.com>
To:     Andy Lutomirski <luto@...capital.net>, linux-kernel@...r.kernel.org
Cc:     linux-tip-commits@...r.kernel.org,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Borislav Petkov <bp@...e.de>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Fenghua Yu <fenghua.yu@...el.com>,
        "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        Jann Horn <jannh@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Ravi V. Shankar" <ravi.v.shankar@...el.com>,
        Rik van Riel <riel@...riel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Tony Luck <tony.luck@...el.com>, x86-ml <x86@...nel.org>
Subject: Re: [tip: x86/fpu] x86/fpu: Deactivate FPU state after failure
 during state load

On Tue, 2020-01-07 at 10:41 -1000, Andy Lutomirski wrote:
> > On Jan 7, 2020, at 2:52 AM, tip-bot2 for Sebastian Andrzej Siewior <tip-bot2@...utronix.de> wrote:
> > 
> > The following commit has been merged into the x86/fpu branch of tip:
> > 
> > Commit-ID:     bbc55341b9c67645d1a5471506370caf7dd4a203
> > Gitweb:        https://git.kernel.org/tip/bbc55341b9c67645d1a5471506370caf7dd4a203
> > Author:        Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> > AuthorDate:    Fri, 20 Dec 2019 20:59:06 +01:00
> > Committer:     Borislav Petkov <bp@...e.de>
> > CommitterDate: Tue, 07 Jan 2020 13:44:42 +01:00
> > 
> > x86/fpu: Deactivate FPU state after failure during state load
> > 
> > In __fpu__restore_sig(), fpu_fpregs_owner_ctx needs to be reset if the
> > FPU state was not fully restored. Otherwise the following may happen (on
> > the same CPU):
> > 
> >  Task A                     Task B               fpu_fpregs_owner_ctx
> >  *active*                                        A.fpu
> >  __fpu__restore_sig()
> >                             ctx switch           load B.fpu
> >                             *active*             B.fpu
> >  fpregs_lock()
> >  copy_user_to_fpregs_zeroing()
> >    copy_kernel_to_xregs() *modify*
> >    copy_user_to_xregs() *fails*
> >  fpregs_unlock()
> >                             ctx switch           skip loading B.fpu,
> >                             *active*             B.fpu
> > 
> > In the success case, fpu_fpregs_owner_ctx is set to the current task.
> > 
> > In the failure case, the FPU state might have been modified by loading
> > the init state.
> > 
> > In this case, fpu_fpregs_owner_ctx needs to be reset in order to ensure
> > that the FPU state of the following task is loaded from saved state (and
> > not skipped because it was the previous state).
> > 
> > Reset fpu_fpregs_owner_ctx after a failure during restore occurred, to
> > ensure that the FPU state for the next task is always loaded.
> > 
> > The problem was debugged-by Yu-cheng Yu <yu-cheng.yu@...el.com>.
> 
> Wow, __fpu__restore_sig is a mess. We have __copy_from... that is Obviously Incorrect (tm) even though it’s not obviously exploitable. (It’s wrong because the *wrong pointer* is checked with access_ok().) We have a fast path that will execute just enough of the time to make debugging the slow path really annoying. (We should probably delete the fast path.)  There are pagefault_disable() calls in there mostly to confuse people. (So we take a fault and sleep. Big deal.  We have temporarily corrupt state, but no one will ever read it.  The retry after sleeping will clobber xstate, but lazy save is long gone and this should be fine now.  The real issue is that, if we’re preempted after a successful restore, then the new state will get lost.)
> 
> So either we should delete the fast path or we should make it work reliably and delete the slow path.  And we should get rid of the __copy. And we should have some test cases.
> 
> BTW, how was the bug in here discovered?  It looks like it only affects signal restore failure, which is usually not survivable unless the user program is really trying.

It causes corruption in other tasks, e.g. a non-CET task gets a control-protection fault.

Yu-cheng
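
The sequence in the quoted commit message can be replayed with a small userspace model. This is only a sketch, not kernel code: `struct fpu_model`, `hw_regs`, `owner_ctx`, `switch_to()`, `restore_fails()` and `run_scenario()` are hypothetical names invented for illustration, with `owner_ctx` standing in for fpu_fpregs_owner_ctx. It assumes that a task resuming in kernel mode does not reload its registers right away, which is why the cached owner still points at task B when task A's restore fails.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not kernel code. */
struct fpu_model {
    int saved;  /* FPU state saved in memory for this task */
};

static int hw_regs;                        /* stands in for the CPU's FPU registers */
static const struct fpu_model *owner_ctx;  /* stands in for fpu_fpregs_owner_ctx */

/* Switch to a task: skip the register load when the cached owner already matches. */
static void switch_to(const struct fpu_model *next)
{
    if (owner_ctx != next) {
        hw_regs = next->saved;
        owner_ctx = next;
    }
}

/* Failed restore: the registers are overwritten, then the copy from user space fails. */
static int restore_fails(bool deactivate_on_failure)
{
    hw_regs = 0;                 /* init state loaded over whatever was there */
    if (deactivate_on_failure)
        owner_ctx = NULL;        /* modeled fix: no task owns the registers now */
    return -1;
}

/* Replays the table; returns the register value task B ends up running with. */
static int run_scenario(bool deactivate_on_failure)
{
    struct fpu_model a = { .saved = 0xA };
    struct fpu_model b = { .saved = 0xB };

    hw_regs = 0;
    owner_ctx = NULL;

    switch_to(&a);               /* A active, owner = A */
    switch_to(&b);               /* A preempted during restore; B's state loaded */
    /* A resumes in kernel mode; the model assumes its registers are not
     * reloaded yet, so owner_ctx still points at B. */
    (void)restore_fails(deactivate_on_failure);
    switch_to(&b);               /* reload is skipped unless the owner was reset */
    return hw_regs;
}
```

In this model, `run_scenario(false)` leaves task B running with the init value 0 instead of its own saved 0xB, while `run_scenario(true)` forces the reload and returns 0xB.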