linux-kernel - Re: [PATCH 02/23] x86/fpu: Remove fpu->initialized usage in __fpu__restore


Message-ID: <20181109232521.l2ll2n3coxygkxv4@linutronix.de>
Date:   Sat, 10 Nov 2018 00:25:21 +0100
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
        x86@...nel.org, Andy Lutomirski <luto@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        kvm@...r.kernel.org, "Jason A. Donenfeld" <Jason@...c4.com>,
        Rik van Riel <riel@...riel.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH 02/23] x86/fpu: Remove fpu->initialized usage in
 __fpu__restore_sig()

On 2018-11-09 19:52:02 [+0100], Borislav Petkov wrote:
> On Fri, Nov 09, 2018 at 06:35:21PM +0100, Sebastian Andrzej Siewior wrote:
> > fpu__drop() sets ->initialized to 0. As a result the context switch
> 
> "... the context switch path landing in switch_fpu_prepare()... " is what you
> mean, right?
I mean both. switch_fpu_prepare() while the task is leaving and then
switch_fpu_finish() while the task is coming back. But yes.

> > will not save current FPU registers and so _not_ write to fpu->state.
> > This also means that CPU's FPU register will be random (inherited from
> > the last context)
> 
> You mean, the FPU regs will have random values, yes.
Correct. Same as for kernel threads.

> > after the context switch. This is also true for usage
> > in softirq via kernel_fpu_begin().
> 
> So far so good.
> 
> Except maybe because I'm dense about FPU, I still am missing something.
> 
> You have this path:
> 
> __fpu__restore_sig
> |-> fpu__clear
>  |-> fpu__drop
> 
> and that happens on the sigreturn() path.
> 
> Now, the context switch happens ... when exactly?
> 
> After the sigreturn is done?

Is fpu__clear() correct here? If so, a context switch after
->initialized has been set to 1 wouldn't matter, because in the end the
register state is restored from init_fpstate and not from the task's FPU
struct.

> 
> It must be because then you'd get that ->state corruption after
> ->initialized has been cleared.
> 
> Right?

I might have gotten your question wrong. If you quote the code and try
again, I will do so, too :)

> <snip a bunch of stuff, we'll get back to it later>
> 
> > So. The fix would be:
> > @@ -344,10 +344,10 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
> >                         sanitize_restored_xstate(tsk, &env, xfeatures, fx_only);
> >                 }
> >  
> > +               local_bh_disable();
> >                 fpu->initialized = 1;
> > -               preempt_disable();
> >                 fpu__restore(fpu);
> > -               preempt_enable();
> > +               local_bh_enable();
> >  
> >                 return err;
> >         } else {
> > 
> > local_bh_disable() due to possible kernel_fpu_begin() usage in softirq.
> > How much do we care here about a theoretical race on 32bit anyway? I
> > don't think someone complained :) I would have to rebase my queue…
> > otherwise…
> 
> Funny, you should mention that.
> 
> But this very much rings a bell about a very elusive bug we had on
> 32-bit at the time. See attached mbox (yeah, the web archives were crap
> and couldn't find the links so I'm sending you the whole thread).
> 
> And at the time Ingo said that there's something still missing about
> *why* it would happen.
> 
> And I think it is this context switch happening right after the
> sigreturn - *AFAICT* - which would cause this.
> 
> I could very well be off but this smells very similar to your thing.

So, checking out v4.5-rc3-15-g58122bf1d856a, __fpu__restore_sig() looks
something like this:

|	fpu__drop(fpu);
…
|	fpu->fpstate_active = 1;
X
|	if (use_eager_fpu()) {
|		preempt_disable();
|		fpu__restore(fpu);
|		preempt_enable();
|	}

fpu__drop() sets fpstate_active & fpregs_active to 0[¹]. A context switch
at X would _not_ save the current FPU registers, and so would not
overwrite what was prepared, because fpregs_active should still be zero.
Now on the switch back to the task, fpstate_active was set which means
fpu.preload might be true. If so it would load the FPU registers and set
fpregs_active to 1. Later fpu__restore() would try the same and
fpregs_activate() would trigger the warning because fpregs_active was
already set to 1.
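
To make the sequence above easier to follow, here is a toy user-space
model of it. It is only a sketch and not kernel code: struct toy_fpu and
all toy_*() helpers are made up for illustration, the warning is reduced
to a counter, and the model assumes that fpu.preload is always true on
the switch back to the task (in the real code it only might be).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two flags discussed above. */
struct toy_fpu {
	bool fpstate_active;	/* task has FPU state prepared in memory */
	bool fpregs_active;	/* that state is live in the CPU registers */
	int warnings;		/* how often the "already active" check fired */
};

/* Models fpregs_activate(): complain if the registers are already active. */
static void toy_fpregs_activate(struct toy_fpu *fpu)
{
	if (fpu->fpregs_active)
		fpu->warnings++;
	fpu->fpregs_active = true;
}

/* Models fpu__drop(): both flags go back to 0. */
static void toy_fpu_drop(struct toy_fpu *fpu)
{
	fpu->fpstate_active = false;
	fpu->fpregs_active = false;
}

/* Models the task being switched out and then back in. */
static void toy_context_switch(struct toy_fpu *fpu)
{
	/* Leaving: registers are only saved (and released) if active. */
	if (fpu->fpregs_active)
		fpu->fpregs_active = false;

	/* Coming back: assumed preload, so load whenever state is active. */
	if (fpu->fpstate_active)
		toy_fpregs_activate(fpu);
}

/* Models the quoted __fpu__restore_sig() excerpt; returns the warning count. */
int toy_restore_sig(bool switch_at_x)
{
	struct toy_fpu fpu = { false, false, 0 };

	toy_fpu_drop(&fpu);
	assert(!fpu.fpregs_active);

	fpu.fpstate_active = true;
	if (switch_at_x)		/* the point marked X above */
		toy_context_switch(&fpu);

	toy_fpregs_activate(&fpu);	/* what fpu__restore() ends up doing */
	return fpu.warnings;
}
```

In this model toy_restore_sig(false) returns 0 and toy_restore_sig(true)
returns 1: the warning only fires when the switch lands between setting
fpstate_active and the restore.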

> Hmmm.
So I just came up with a possible, hard-to-trigger case, and a robot had
already triggered it a while back. Well, CONFIG_PREEMPT=y is also there,
so it matches this part of the story. But you connected the dots.

[¹] side note: in my early research it took a while to notice that
    fpstate_active and fpregs_active were two different things. My brain
    used fp.*_active for matching. It also added to my confusion that
    those were renamed and removed…

Sebastian