linux-kernel - Re: [PATCH] entry: Fix missed trap after single-step on system call return

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALCETrWpouBd+DqVu594B-94MQH_D0D7sECXZHEoAa+=X-_0=A@mail.gmail.com>
Date:   Wed, 3 Feb 2021 10:18:11 -0800
From:   Andy Lutomirski <luto@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Gabriel Krisman Bertazi <krisman@...labora.com>,
        Kyle Huey <me@...ehuey.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andy Lutomirski <luto@...nel.org>,
        open list <linux-kernel@...r.kernel.org>,
        "Robert O'Callahan" <rocallahan@...il.com>
Subject: Re: [PATCH] entry: Fix missed trap after single-step on system call return

On Wed, Feb 3, 2021 at 10:10 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Wed, Feb 3, 2021 at 10:00 AM Gabriel Krisman Bertazi
> <krisman@...labora.com> wrote:
> >
> > Does the patch below follows your suggestion?  I'm setting the
> > SYSCALL_WORK shadowing TIF_SINGLESTEP every time, instead of only when
> > the child is inside a system call.  Is this acceptable?
>
> Looks sane to me.
>
> My main worry would be about "what about the next system call"? It's
> not what Kyle's case cares about, but let me just give an example:
>
>  - task A traces task B, and starts single-stepping. Task B was *not*
> in a system call at this point.
>
>  - task B happily executes one instruction at a time, takes a TF
> fault, everything is good
>
>  - task B now does a system call. That will disable single-stepping
> while in the kernel
>
>  - task B returns from the system call. TF will be set in eflags, but
> the first instruction *after* the system call will execute unless we
> go through the system call exit path
>
> So I think the tracer basically misses one instruction when single-stepping.

I was hoping you wouldn't ask this :)

The x86 architecture is fundamentally a bit busted here.  If we return
from a system call with SYSRET and TF is set in R11, then SYSRET
traps, which means that #DB is delivered before executing a user
instruction.  I have been asking Intel for quite a while to document
this, and they said they did, but I still can't find it.  IRET is the
opposite: if we return from a system call with IRET and TF is set on
the stack, we execute one user instruction and then trap.

So if we want to reliably single-step a system call and trap after the
system call, we just need to synthesize a trap on the way out.  Doing
this and getting all the nasty corners (e.g. sigreturn setting TF,
sigreturn *clearing* TF, signal delivery as part of the syscall,
ptrace mucking with TF) etc right might be nontrivial.

I suspect the behavior back in the bad old asm-entry-path days was at
best inconsistent.

--Andy