[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <587463c4e5fa82dff8748e5f753890ac390e993e.camel@intel.com>
Date: Wed, 18 Dec 2019 12:53:59 -0800
From: Yu-cheng Yu <yu-cheng.yu@...el.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Tony Luck <tony.luck@...el.com>,
Andy Lutomirski <luto@...nel.org>,
Borislav Petkov <bp@...en8.de>,
Rik van Riel <riel@...riel.com>,
"Ravi V. Shankar" <ravi.v.shankar@...el.com>,
Fenghua Yu <fenghua.yu@...el.com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v2 3/3] x86/fpu/xstate: Invalidate fpregs when
__fpu_restore_sig() fails
On Wed, 2019-12-18 at 16:54 +0100, Sebastian Andrzej Siewior wrote:
> On 2019-12-12 13:08:55 [-0800], Yu-cheng Yu wrote:
> > In __fpu_restore_sig(),'init_fpstate.xsave' and part of 'fpu->state.xsave'
> > are restored separately to xregs. However, as stated in __cpu_invalidate_
> > fpregs_state(),
> >
> > Any code that clobbers the FPU registers or updates the in-memory
> > FPU state for a task MUST let the rest of the kernel know that the
> > FPU registers are no longer valid for this task.
> >
> > and this code violates that rule. Should the restoration fail, the other
> > task's context is corrupted.
> >
> > This problem does not occur very often because copy_*_to_xregs() succeeds
> > most of the time.
>
> why "most of the time"? It should always succeed. We talk here about
> __fpu__restore_sig() correct? Using init_fpstate as part of restore
> process isn't the "default" case. If the restore _here_ fails then it
> fails.
>
> > It occurs, for instance, in copy_user_to_fpregs_
> > zeroing() when the first half of the restoration succeeds and the other
> > half fails. This can be triggered by running glibc tests, where a non-
> > present user stack page causes the XRSTOR to fail.
>
> So if copy_user_to_fpregs_zeroing() fails then we go to the slowpath.
> Then we load the FPU register with copy_kernel_to_xregs_err().
> In the end they are either enabled (fpregs_mark_activate()) or cleared
> if it failed (fpu__clear()). Don't see here a problem.
I could have explained this better, sorry! I will explain the first
case below; other cases are similar.
In copy_user_to_fpregs_zeroing(), we have:
if (user_xsave()) {
...
if (unlikely(init_bv))
copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
return copy_user_to_xregs(buf, xbv);
...
}
The copy_user_to_xregs() may fail, and when that happens, before going to
the slow path, there is fpregs_unlock() and context switches may happen.
However, at this point, fpu_fpregs_owner_ctx has not been changed; it could
still be another task's FPU. For this to happen and to be detected, the user
stack page needs to be non-present, fpu_fpregs_owner_ctx need to be another task,
and that other task needs to be able to detect its registers are modified.
The last factor is not easy to reproduce, and a CET control-protection fault
helps.
>
> Can you tell me which glibc test? I would like to reproduce this.
>
> > The introduction of supervisor xstates and CET, while not contributing to
> > the problem, makes it more detectable. After init_fpstate and the Shadow
> > Stack pointer have been restored to xregs, the XRSTOR from user stack
> > fails and fpu_fpregs_owner_ctx is not updated. The task currently owning
> > fpregs then uses the corrupted Shadow Stack pointer and triggers a control-
> > protection fault.
>
> So I don't need new HW with supervisor and CET? A plain KVM box with
> SSE2 and so should be enough?
What I do is, clone the whole glibc source, and run mutiple copies of
"make check". In about 40 minutes or so, there are unexplained seg faults,
or a few control-protection faults (if you enable CET). Please let me
know if more clarification is needed.
Thanks,
Yu-cheng
Powered by blists - more mailing lists