linux-kernel - Re: [PATCH] signal: SIGKILL can cause signal effects to appear at PTRACE_EVENT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP045ApAX725ZfujaK-jJNkfCo5s+oVFpBvNfPJk+DKY8K7d=Q@mail.gmail.com>
Date:   Tue, 2 Nov 2021 12:09:56 -0700
From:   Kyle Huey <me@...ehuey.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     Jens Axboe <axboe@...nel.dk>,
        Peter Zijlstra <peterz@...radead.org>,
        Marco Elver <elver@...gle.com>,
        Oleg Nesterov <oleg@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Collingbourne <pcc@...gle.com>,
        Alexey Gladkov <legion@...nel.org>,
        "Robert O'Callahan" <rocallahan@...il.com>,
        Marko Mäkelä <marko.makela@...iadb.com>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] signal: SIGKILL can cause signal effects to appear at
 PTRACE_EVENT_EXIT without tracer notification

On Tue, Nov 2, 2021 at 11:07 AM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>
> Kyle Huey <me@...ehuey.com> writes:
>
> > On Tue, Nov 2, 2021 at 7:09 AM Eric W. Biederman <ebiederm@...ssion.com> wrote:
> >>
> >> Kyle Huey <me@...ehuey.com> writes:
> >>
> >> > rr, a userspace record and replay debugger[0], uses the recorded register
> >> > state at PTRACE_EVENT_EXIT to find the point in time at which to cease
> >> > executing the program during replay.
> >> >
> >> > If a SIGKILL races with processing another signal in get_signal, it is
> >> > possible for the kernel to decline to notify the tracer of the original
> >> > signal. But if the original signal had a handler, the kernel proceeds
> >> > with setting up a signal handler frame as if the tracer had chosen to
> >> > deliver the signal unmodified to the tracee. When the kernel goes to
> >> > execute the signal handler that it has now modified the stack and registers
> >> > for, it will discover the pending SIGKILL, and terminate the tracee
> >> > without executing the handler. When PTRACE_EVENT_EXIT is delivered to
> >> > the tracer, however, the effects of handler setup will be visible to
> >> > the tracer.
> >> >
> >> > Because rr (the tracer) was never notified of the signal, it is not aware
> >> > that a signal handler frame was set up and expects the state of the program
> >> > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
> >> > by allowing the program to execute from the last event. When that fails
> >> > to happen during replay, rr will assert and die.
> >> >
> >> > The following patches add an explicit check for a newly pending SIGKILL
> >> > after the ptracer has been notified and the siglock has been reacquired.
> >> > If this happens, we stop processing the current signal and proceed
> >> > immediately to handling the SIGKILL. This makes the state reported at
> >> > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
> >> > work to set up a signal handler frame that will never be used.
> >> >
> >> > This issue was originally reported by the credited rr user.
> >> >
> >> > [0] https://rr-project.org/
> >>
> >> If I read this correctly the problem is not precisely that the rr
> >> debugger is never notified about the signal, but rather that the program
> >> is killed with SIGKILL before rr can read the notification and see which
> >> signal it is.
> >
> > The precise problem is that the kernel made a modification to the
> > tracee state (setting up the signal handler frame) without telling the
> > tracer about it (delivering the ptrace notification for the pending
> > non-SIGKILL signal).
>
> Except the kernel did make it to ptrace_stop.  The stop just did not
> fully happen because of SIGKILL.  I expect SIGCHLD was sent to the
> tracer as part of that stop that never fully happened.

I don't know whether SIGCHLD was sent to the tracer (rr doesn't use it
directly) but waiting on the process does not produce a wait status
corresponding to the signal delivery stop for the original signal.
Waiting on the tracee skips immediately from whatever the preceding
ptrace event was to the PTRACE_EVENT_EXIT.

(In our particular case, if it had been notified of the signal, we
would have chosen to suppress the signal, because the signal in
question is a SIGSEGV from an rdtsc instruction that has been disabled
via prctl(PR_SET_TSC, PR_TSC_SIGSEGV) and we emulate it in the tracer
due to its non-deterministic behavior. So we really don't expect to
see the tracee signal handler.)

> > That can be fixed either by not modifying the
> > tracee state here or by telling the tracer about the signal (that will
> > never actually run). I suspect we'll all agree that the former seems
> > preferable.
> >
> >> This definitely sounds like a quality of implementation issue.
> >>
> >> The solution that is proposed in your patches simply drops the signal
> >> when SIGKILL is pending.
> >
> > That's right.
> >
> >> I think we can have a slightly better of quality of implementation
> >> than that (as well as a simpler implementation) by requeuing the
> >> signal instead of simply dropping it.  Something like the below.
> >
> > What is the benefit of requeueing the signal? All pending signals will
> > be dropped when the SIGKILL is processed, no?
>
> Not before PTRACE_EVENT_EXIT.  In fact the pending signals are not
> actually flushed until the thread or the entire process is reaped.
>
> Further the coredump code makes some attempt to write out the
> pending signals.  The code appears to predate siginfo support
> in the kernel so it misses a lot but it is there.
>
> The real advantage is that it keeps the logic of dealing with weird
> ptrace_stop logic in ptrace_signal where it belongs.  It also allows the
> common (and missing in this case) idiom of goto relock to be used.
>
> So I think changing ptrace_signal will be much more maintainable.

Ok.

> >> Can you test that and see if it works for you?
> >
> > It does not work. This triggers an infinite loop in get_signal, as we
> > dequeue the signal, attempt to notify the ptracer, see the pending
> > sigkill, requeue the signal, go around the loop, dequeue the original
> > signal ...
>
> Apologies I made a bit of a thinko.  That change also needs to change
> the handling of if (signr == 0) after ptrace_signal.
>
> Which means it would need to be something like the below.
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 056a107e3cbc..eddb745b34a7 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2610,7 +2610,8 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)
>         }
>
>         /* If the (new) signal is now blocked, requeue it.  */
> -       if (sigismember(&current->blocked, signr)) {
> +       if (sigismember(&current->blocked, signr) ||
> +           signal_group_exit(current->signal)) {
>                 send_signal(signr, info, current, PIDTYPE_PID);
>                 signr = 0;
>         }
> @@ -2764,8 +2765,10 @@ bool get_signal(struct ksignal *ksig)
>                 if (unlikely(current->ptrace) && (signr != SIGKILL) &&
>                     !(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) {
>                         signr = ptrace_signal(signr, &ksig->info);
> -                       if (!signr)
> -                               continue;
> +                       if (!signr) {
> +                               spin_unlock_irq(&sighand->siglock);
> +                               goto relock;
> +                       }
>                 }
>
>                 ka = &sighand->action[signr-1];
>
> Eric

Yeah that appears to fix the issue.

- Kyle