Message-ID: <CAJhGHyDL2SppJRL83JzKcEf_Zy9zDRMMW_1PrsEwu2ZaHQAkNQ@mail.gmail.com>
Date: Tue, 15 Sep 2020 15:55:51 +0800
From: Lai Jiangshan <jiangshanlai@...il.com>
To: Jann Horn <jannh@...gle.com>
Cc: Lai Jiangshan <laijs@...ux.alibaba.com>,
Andy Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"the arch/x86 maintainers" <x86@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>,
kernel list <linux-kernel@...r.kernel.org>,
Kees Cook <keescook@...omium.org>
Subject: Re: [PATCH V3 1/3] x86/entry: avoid calling into sync_regs() when
entering from userspace
On Sat, Sep 12, 2020 at 5:24 AM Jann Horn <jannh@...gle.com> wrote:
Hello Jann
Thanks for your review.
> As far as I can see, on systems affected by Meltdown, this patch fixes
> register state leakage between tasks because any data that is written
> to the per-CPU trampoline stacks must be considered visible to all
> userspace. I think that makes this a fix that should go into stable
> kernels.
I know what you mean, since I did something similar to what you
describe in another project. But I hesitated to make that claim.
Not everyone is paranoid enough to want all registers hidden. I
consider these patches a nice cleanup that makes future entry_64.S
work better.
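As a rough sketch of the exposure difference being discussed (plain
Python; the register list mirrors the GP-register portion of x86-64
struct pt_regs, and the word counts are illustrative arithmetic, not
measurements): before the patch, PUSH_AND_CLEAR_REGS runs while still
on the per-CPU trampoline stack, so every general-purpose register is
written to memory that must be considered user-visible on
Meltdown-affected CPUs; with the patch, only the single scratch push
lands there.

```python
# Illustrative sketch: how many words of GP-register state land on the
# per-CPU trampoline stack before vs. after the patch. The register
# list mirrors the GP-register portion of x86-64 struct pt_regs.
GP_REGS = ["r15", "r14", "r13", "r12", "rbp", "rbx", "r11", "r10",
           "r9", "r8", "rax", "rcx", "rdx", "rsi", "rdi"]

# Old path: PUSH_AND_CLEAR_REGS runs on the trampoline stack, so all
# GP registers are written there (plus the hardware IRET frame).
old_exposed_words = len(GP_REGS)

# Patched path: only %rdx is pushed as a scratch register before
# switching to the thread stack; the rest is pushed after the switch.
new_exposed_words = len(["rdx"])

print(old_exposed_words, new_exposed_words)  # 15 vs. 1 words exposed
```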
>
> Therefore, please add:
>
> Fixes: 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for
> IDT entries")
> Cc: stable@...r.kernel.org
>
>
> > ---
> > arch/x86/entry/entry_64.S | 43 +++++++++++++++++++++++++++++++--------
> > 1 file changed, 34 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > index 70dea9337816..1a7715430da3 100644
> > --- a/arch/x86/entry/entry_64.S
> > +++ b/arch/x86/entry/entry_64.S
> > @@ -928,19 +928,42 @@ SYM_CODE_END(paranoid_exit)
> > SYM_CODE_START_LOCAL(error_entry)
> > UNWIND_HINT_FUNC
> > cld
> > - PUSH_AND_CLEAR_REGS save_ret=1
> > - ENCODE_FRAME_POINTER 8
> > - testb $3, CS+8(%rsp)
> > + testb $3, CS-ORIG_RAX+8(%rsp)
> > jz .Lerror_kernelspace
> >
> > - /*
> > - * We entered from user mode or we're pretending to have entered
> > - * from user mode due to an IRET fault.
> > - */
>
> As far as I can tell, this comment is still correct, and it is
> helpful. Why are you removing it?
This comment actually describes .Lerror_entry_from_usermode_after_swapgs,
which is quite far from here.
"We entered from user mode" is already self-evident from the code above.
"we're pretending to have entered from user mode due to an IRET fault"
reminds code reviewers that .Lerror_bad_iret can jump to
.Lerror_entry_from_usermode_after_swapgs.
.Lerror_entry_from_usermode_after_swapgs will be removed in the next
patch.
Since the comment is too far from its target, and the target will be
removed in the next patch, I removed the comment here.
Maybe I should move the removal of the comment into the next patch?
But should I rewrite the comment before that?
>
> > SWAPGS
> > FENCE_SWAPGS_USER_ENTRY
> > - /* We have user CR3. Change to kernel CR3. */
> > - SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
> > + /*
> > + * Switch to the thread stack. The IRET frame and orig_ax are
> > + * on the stack, as well as the return address. RDI..R12 are
>
> Did you mean RDI..R15?
Good catch!
The whole code, along with the comments, was copied from the original
interrupt_entry. The "RDI..R12" in the comment has been moved
several times and can be traced back to 7f2590a110b8
("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")
(the same commit you referred to above).
>
> > + * not (yet) on the stack and space has not (yet) been
> > + * allocated for them.
> > + */
> > + pushq %rdx
> > +
> > + /* Need to switch before accessing the thread stack. */
> > + SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
> > + movq %rsp, %rdx
> > + movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>
> Can we avoid spilling %rdx to the meltdown-readable entry stack here?
> We could do something similar to what entry_SYSCALL_64 does, roughly
> like this:
We must spill at least one register besides %rsp. I once used %rbp
as the scratch_reg in a not-yet-open-sourced project. I think
spilling %rbp is safer than spilling other registers, but I can't
find strong enough reasoning for it.
The feeble reason at hand is that %rbp is often used as the frame
pointer in userspace, so it is safer to leak, given that %rsp is
already leaked.
See https://github.com/google/gvisor/pull/2256#discussion_r399005782
("implement KPTI for gvisor"),
where I once recommended spilling %rbp in gVisor.
In that gVisor pull request, even Google didn't show much
eagerness to hide the kernel, along with the registers, from
applications.
If you also consider spilling %rbp better than spilling
%rdx, and the other x86 people want that too, I can do it
in other patches:
1) This patch uses "PUSH_AND_CLEAR_REGS rdx=(%rdx), save_ret=1",
and using %rbp would require changing PUSH_AND_CLEAR_REGS, which
is better done in a separate commit.
2) The code in entry_64.S also spills other registers (%rdi, %rdx,
etc.) to the entry stack. They would need to be changed to %rbp too.
>
>
> /*
> * While there is an iret frame, it won't be easy to find for a
> * few instructions, so let's pretend it doesn't exist.
> */
> UNWIND_HINT_EMPTY
>
> /*
> * Switch to kernel CR3 and stack. To avoid spilling secret
> * userspace register state to the trampoline stack, we use
> * RSP as scratch - we can reconstruct the old RSP afterwards
> * using TSS_sp0.
> */
> SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
The old %rsp is already in a fine state, in the proper place
on the stack.
Abusing %rsp is the source of perdition. Linus, tglx and
many others who have touched entry_64.S have hated the
syscall instruction for not automatically switching %rsp,
and it is the source of many hateful IST exceptions.
See https://lore.kernel.org/lkml/875z98jkof.fsf@nanos.tec.linutronix.de/
I won't raise a formal objection for this reason alone,
even though I also hate abusing %rsp.
But I expect you would get tonnes of replies if tglx ever
noticed such a suggestion.
> movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>
> pushq %rdx /* scratch, will be replaced with regs->ss later */
> mov PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rdx
> sub $7*8, %rdx /* return address, orig_ax, IRET frame */
> /*
> * We have return address and orig_ax on the stack on
> * top of the IRET frame. That means offset=2*8
> */
> UNWIND_HINT_IRET_REGS base=%rdx offset=-5*8
>
> pushq -2*8(%rdx) /* regs->rsp */
> pushq -3*8(%rdx) /* regs->eflags */
> pushq -4*8(%rdx) /* regs->cs */
> pushq -5*8(%rdx) /* regs->ip */
> pushq -6*8(%rdx) /* regs->orig_ax */
> pushq -7*8(%rdx) /* return address */
> UNWIND_HINT_FUNC
>
> PUSH_AND_CLEAR_REGS rdx=7*8(%rsp), save_ret=1
>
> /* copy regs->ss from trampoline stack */
> movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rax
> mov -1*8(%rax), %rax
> movq %rax, 20*8(%rsp)
>
> ENCODE_FRAME_POINTER 8
>
> ret
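For readers following the offset arithmetic in the snippet above, a
small sketch may help (plain Python; the slot names are my own
descriptive labels, not kernel identifiers): the seven words below
TSS_sp0 on the trampoline stack are regs->ss, regs->rsp, regs->eflags,
regs->cs, regs->ip, regs->orig_ax and the return address, which is
where the "sub $7*8, %rdx" and the -1*8 ... -7*8 displacements come
from.

```python
# Sketch of the trampoline-stack layout relative to TSS_sp0 (the top
# of the entry stack). TSS_sp0 points just above the highest slot, so
# slot i (counting down from the top) lives at offset -(i + 1) * 8.
TOP_DOWN = ["ss", "rsp", "eflags", "cs", "ip", "orig_ax", "ret_addr"]
OFFSET = {name: -(i + 1) * 8 for i, name in enumerate(TOP_DOWN)}

# Matches the displacements in the snippet: regs->ss at -1*8(%rdx),
# regs->rsp at -2*8(%rdx), ... return address at -7*8(%rdx); the
# whole block is 7 words, hence "sub $7*8, %rdx".
print(OFFSET["ss"], OFFSET["ret_addr"], len(TOP_DOWN) * 8)  # -8 -56 56
```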
Thanks for your review and your recommendations.
Lai.