linux-kernel - Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87inmmbjsq.fsf@xmission.com>
Date:   Sun, 02 Apr 2017 16:07:17 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Aleksa Sarai <asarai@...e.com>,
        Andy Lutomirski <luto@...capital.net>,
        Attila Fazekas <afazekas@...hat.com>,
        Jann Horn <jann@...jh.net>, Kees Cook <keescook@...omium.org>,
        Michal Hocko <mhocko@...nel.org>,
        Ulrich Obergfell <uobergfe@...hat.com>,
        linux-kernel@...r.kernel.org, linux-api@...r.kernel.org
Subject: Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.

Oleg Nesterov <oleg@...hat.com> writes:

> Perhaps I am wrong, but I think you underestimate the problems, and it is
> not clear to me if we really want this.

I worked through quite a bit of it and I realized a few fundamental
issues.  The task struct must remain visible until it is reaped
and we use siglock to protect in unhash process to protect that reaping.
Further tsk->sighand == NULL winds up being a flag used to tell
if release_task has been called.

To get an usable count on sighand struct all that needed to be done
was to change the reference counting of sighand_struct to count
processes and not threads.

Which is what I wound up posting.

> Anyway, Eric, even if we can and want to do this, why we can't do this on
> top of my fix?

Because your reduction in scope of cred_guard_mutex is fundamentally
broken and unnecessary.

> I simply fail to understand why you dislike it that much. Yes it is not
> pretty, I said this many times, but it is safe in that it doesn't really
> change the current behaviour.

No it is not safe.  And it promotes wrong thinking which is even more
dangerous.

I reviewed the code and cred_guard_mutex needs to cover what it covers.

> I am much more worried about 2/2 you didn't argue with, this patch _can_
> break something and this is obviously not good even if PTRACE_EVENT_EXIT
> was always broken.

I don't know who actually useses PTRACE_O_TRACEEXIT so I don't actually
know what the implications of changing it are.  Let's see...

gdb - no
upstart - no
lldb - yes
strace - no

It looks like lldb is worth testing with your PTRACE_EVENT_EXIT change
to see if anything breaks.

I think we can get away with changing the exec case but it does look
worth testing.  I hadn't realized you hadn't looked to see what was
using PTRACE_O_TRACEEXIT to see if any part of userspace cares.

Hmm.  This is interesting.  From the strace documentation:
> Tracer cannot assume that ptrace-stopped tracee exists. There are many
> scenarios when tracee may die while stopped (such as SIGKILL).
> Therefore, tracer must always be prepared to handle ESRCH error on any
> ptrace operation. Unfortunately, the same error is returned if tracee
> exists but is not ptrace-stopped (for commands which require stopped
> tracee), or if it is not traced by process which issued ptrace call.
> Tracer needs to keep track of stopped/running state, and interpret
> ESRCH as "tracee died unexpectedly" only if it knows that tracee has
> been observed to enter ptrace-stop. Note that there is no guarantee
> that waitpid(WNOHANG) will reliably report tracee's death status if
> ptrace operation returned ESRCH. waitpid(WNOHANG) may return 0 instead.
> IOW: tracee may be "not yet fully dead" but already refusing ptrace
> ops.

If delivering a second SIGKILL to a ptraced stopped processes will
make it continue we have a very interesting out..

When we stop in ptrace_stop we stop in TASK_TRACED == (TASK_WAKEKILL|__TASK_TRACED)

Delivery of a SIGKILL to that task has queue SIGKILL
and call signal_wake_up_state(t, TASK_WAKEKILL).
Which becomes wake_up_state(t, TASK_INTERRUPTIBLE | TASK_WAKEKILL)
Which wakes up the process.

So userspace can absolutely kill a processes in PTRACE_EVENT_EXIT
before the tracers find it.

Therefore we are only talking a quality of implementation issue
if we actually stop and wait for the tracer or not.

....

Which brings us to your PTRACE_EVENT_EXIT patch.

I think may_ptrace_stop is tested in the wrong place,
and is probably buggy.

- We should send the signal in all cases except when the ptracing parent
  does not exist aka (!current->ptrace).  The siginfo contains enough
  information to understand what happened if anyone is listening.

- Then we should send the group stop.

- Then if we don't want to wait we should:
  __set_current_state(TASK_RUNNING)

- Then we should drop the locks and only call freezable_schedule if
  we want to wait.

That way userspace thinks someone else just sent a SIGKILL and killed
the thread before it had a chance to look (which is effectively what
we are doing).  That sounds idea for both core-dumps and this case.

Eric