Message-ID: <CAJhGHyBh3Ley20+_c=jxrxCq+h1OzTshw+QnEVawiTFOUkSx=A@mail.gmail.com>
Date: Mon, 7 Mar 2022 22:39:33 +0800
From: Lai Jiangshan <jiangshanlai@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
Lai Jiangshan <jiangshan.ljs@...group.com>,
Andy Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns
On Thu, Mar 3, 2022 at 4:00 PM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Thu, Mar 03, 2022 at 11:54:29AM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan <jiangshan.ljs@...group.com>
> >
> > error_entry() calls sync_regs() to settle/copy the pt_regs and switches
> > the stack directly after sync_regs(). But error_entry() itself is also
> > a function call, so the stack switch also has to preserve its return
> > address, which makes the code complicated and tangled.
> >
> > Switching the stack after error_entry() returns makes the code simpler
> > and more intuitive.
> >
> > Signed-off-by: Lai Jiangshan <jiangshan.ljs@...group.com>
> > ---
> > arch/x86/entry/entry_64.S | 16 ++++++----------
> > 1 file changed, 6 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > index 24846284eda5..a51cad2b7fff 100644
> > --- a/arch/x86/entry/entry_64.S
> > +++ b/arch/x86/entry/entry_64.S
> > @@ -323,6 +323,8 @@ SYM_CODE_END(ret_from_fork)
> > .macro idtentry_body cfunc has_error_code:req
> >
> > call error_entry
> > + movq %rax, %rsp /* switch stack settled by sync_regs() */
> > + ENCODE_FRAME_POINTER
> > UNWIND_HINT_REGS
> >
> > movq %rsp, %rdi /* pt_regs pointer into 1st argument*/
> > @@ -980,14 +982,10 @@ SYM_CODE_START_LOCAL(error_entry)
> > /* We have user CR3. Change to kernel CR3. */
> > SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
> >
> > + leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
> > .Lerror_entry_from_usermode_after_swapgs:
> > /* Put us onto the real thread stack. */
> > - popq %r12 /* save return addr in %12 */
> > - movq %rsp, %rdi /* arg0 = pt_regs pointer */
> > call sync_regs
> > - movq %rax, %rsp /* switch stack */
> > - ENCODE_FRAME_POINTER
> > - pushq %r12
> > RET
> >
> > /*
> > @@ -1019,6 +1017,7 @@ SYM_CODE_START_LOCAL(error_entry)
> > */
> > .Lerror_entry_done_lfence:
> > FENCE_SWAPGS_KERNEL_ENTRY
> > + leaq 8(%rsp), %rax /* return pt_regs pointer */
> > RET
> >
> > .Lbstep_iret:
> > @@ -1039,12 +1038,9 @@ SYM_CODE_START_LOCAL(error_entry)
> > * Pretend that the exception came from user mode: set up pt_regs
> > * as if we faulted immediately after IRET.
> > */
> > - popq %r12 /* save return addr in %12 */
> > - movq %rsp, %rdi /* arg0 = pt_regs pointer */
> > + leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
> > call fixup_bad_iret
> > - mov %rax, %rsp
> > - ENCODE_FRAME_POINTER
> > - pushq %r12
> > + mov %rax, %rdi
> > jmp .Lerror_entry_from_usermode_after_swapgs
> > SYM_CODE_END(error_entry)
>
> I can't seem to follow; perhaps I need more wake-up-juice?
>
> Suppose we have .Lerror_bad_iret, then the code flow is something like:
>
>
> old new
>
> .Lerror_bad_iret
> SWAPGS business
>
> popq %r12 leaq 8(%rsp), %rdi
> movq %rsp, %rdi
> call fixup_bad_iret call fixup_bad_iret
> mov %rax, %rsp mov %rax, %rdi
> ENCODE_FRAME_POINTER
> pushq %r12
>
> jmp .Lerror_entry_from_usermode_after_swapgs
>
> .Lerror_entry_from_usermode_after_swapgs
> pop %r12
> movq %rsp, %rdi
> call sync_regs call sync_regs
> mov %rax, %rsp
> ENCODE_FRAME_POINTER
> push %r12
> RET RET
>
>
> The thing that appears to me is that sync_regs is called one stack switch
> short. This isn't mentioned in the Changelog nor in the comments. Why is
> this OK?
>
>
I was not confident about the meaning of "short" or "called ... short"
in my understanding of English, so I tried hard to find any semantic
difference between the code before and after the patch.

The logic is the same:
1) feed fixup_bad_iret() with the pt_regs pushed by the ASM code
2) fixup_bad_iret() moves the partial pt_regs up
3) feed sync_regs() with the pt_regs returned by fixup_bad_iret()
4) sync_regs() copies the whole pt_regs to the kernel stack if needed
5) after error_entry() returns, we are on the kernel stack with the
   settled pt_regs
Differences:

The old code switches to the copied pt_regs immediately, up to twice,
inside error_entry(), while the new code switches to the copied
pt_regs only once, after error_entry() returns.

This is correct since sync_regs() doesn't need to run on the stack
that holds the pt_regs it handles. And that is the point of this
patch: switch the stack only once.
The old code stashes the return address of error_entry() in a scratch
register; the new code doesn't stash it. It relies on the fact that
fixup_bad_iret() and sync_regs() don't corrupt the return address of
error_entry(). But even the old code relied on fixup_bad_iret() and
sync_regs() not corrupting their own return addresses, so it is the
same reliance.
The code has been tested with my additional tracing code in the kernel
and with the x86 selftest that covers all kinds of cases of bad iret
(tools/testing/selftests/x86/sigreturn.c).