linux-kernel - Re: [RFC PATCH] ptrace: add PTRACE_GET_SYSCALL

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALCETrV1v-DPRfDRwiH=xn29bxWxiHdZtAH1nw=dsmDtnT0YGQ@mail.gmail.com>
Date:   Wed, 7 Nov 2018 12:44:58 -0800
From:   Andy Lutomirski <luto@...nel.org>
To:     lineprinter@...linux.org
Cc:     Oleg Nesterov <oleg@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        "Dmitry V. Levin" <ldv@...linux.org>,
        Eugene Syromiatnikov <esyr@...hat.com>,
        Andrew Lutomirski <luto@...nel.org>,
        strace-devel@...ts.strace.io
Subject: Re: [RFC PATCH] ptrace: add PTRACE_GET_SYSCALL_INFO request

> On Nov 6, 2018, at 7:27 PM, Elvira Khabirova <lineprinter@...linux.org> wrote:
>
> PTRACE_GET_SYSCALL_INFO lets ptracer obtain details of the syscall
> the tracee is blocked in. The request returns meaningful data only
> when the tracee is in a syscall-enter-stop or a syscall-exit-stop.
>
> There are two reasons for a special syscall-related ptrace request.
>
> Firstly, with the current ptrace API there are cases when ptracer cannot
> retrieve necessary information about syscalls. Some examples include:
> * The notorious int-0x80-from-64-bit-task issue. See [1] for details.
> In short, if a 64-bit task performs a syscall through int 0x80, its tracer
> has no reliable means to find out that the syscall was, in fact,
> a compat syscall, and misidentifies it.
> * Syscall-enter-stop and syscall-exit-stop look the same for the tracer.
> Common practice is to keep track of the sequence of ptrace-stops in order
> not to mix the two syscall-stops up. But it is not as simple as it looks;
> for example, strace had a (just recently fixed) long-standing bug where
> attaching strace to a tracee that is performing the execve system call
> led to the tracer identifying the following syscall-exit-stop as
> syscall-enter-stop, which messed up all the state tracking.
> * Since the introduction of commit 84d77d3f06e7e8dea057d10e8ec77ad71f721be3
> ("ptrace: Don't allow accessing an undumpable mm"), both PTRACE_PEEKDATA
> and process_vm_readv become unavailable when the process dumpable flag
> is cleared. On ia64 this results in all syscall arguments being unavailable.
>
> Secondly, ptracers also have to support a lot of arch-specific code for
> obtaining information about the tracee. For some architectures, this
> requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
> argument and return value.
>
> PTRACE_GET_SYSCALL_INFO returns the following structure:
>
> struct ptrace_syscall_info {
>    __u8 op; /* 0 for entry, 1 for exit */

Please consider adding another op for a seccomp stop.

>    __u8 __pad0[7];
>    union {
>        struct {
>            __u64 nr;
>            __u64 ip;
>            __u64 args[6];
>            __u8 is_compat;
>            __u8 __pad1[7];
>        } entry_info;
>        struct {
>            __s64 rval;
>            __u8 is_error;
>            __u8 __pad2[7];
>        } exit_info;
>    };
> };
>
> The structure was chosen according to [2], except for two changes.
> First: instead of an arch field with a value of AUDIT_ARCH_*, a boolean
> is_compat value is returned, because a) not all arches have an AUDIT_ARCH_*
> defined for them, b) the tracer already knows what *arch* it is running on,
> but it does not know whether the tracee/syscall is in compat mode or not.

I don’t like this for a few reasons:

1. A 32-bit tracer can’t readily tell what is_compat == 0 means.

2. There is no actual guarantee that there are only two syscall
architectures available.  In fact, I think that arm64 is seriously
considering adding a third.  x86 ought to have three, but, for
arguably dubious historical reasons, it only has two, and x32 is
distinguished only by nr.

3. Your patch will be a whole lot shorter if you use
syscall_get_arch().  You'd have to add syscall_get_arch()
implementations for the remaining architectures, but that's still less
code.

> Second: a boolean is_error value is added to rval. This way the tracer can
> more reliably distinguish a return value from an error value.

Sounds reasonable to me.

Also, maybe use the extra parameter to ptrace to have userspace pass
in the size of the structure so that more fields can be added later if
needed.