[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87pmp67y4r.fsf@email.froward.int.ebiederm.org>
Date: Wed, 05 Jan 2022 13:58:44 -0600
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Dmitry Osipenko <digetx@...il.com>
Cc: linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
linux-api@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Alexey Gladkov <legion@...nel.org>,
Kyle Huey <me@...ehuey.com>, Oleg Nesterov <oleg@...hat.com>,
Kees Cook <keescook@...omium.org>,
Al Viro <viro@...IV.linux.org.uk>
Subject: Re: [PATCH 1/8] signal: Make SIGKILL during coredumps an explicit special case
Dmitry Osipenko <digetx@...il.com> writes:
> 14.12.2021 01:53, Eric W. Biederman пишет:
>> Simplify the code that allows SIGKILL during coredumps to terminate
>> the coredump. As far as I can tell I have avoided breaking it
>> by dumb luck.
>>
>> Historically with all of the other threads stopping in exit_mm the
>> wants_signal loop in complete_signal would find the dumper task and
>> then complete_signal would wake the dumper task with signal_wake_up.
>>
>> After moving the coredump_task_exit above the setting of PF_EXITING in
>> commit 92307383082d ("coredump: Don't perform any cleanups before
>> dumping core") wants_signal will consider all of the threads in a
>> multi-threaded process for waking up, not just the core dumping task.
>>
>> Luckily complete_signal short circuits SIGKILL during a coredump marks
>> every thread with SIGKILL and signal_wake_up. This code is arguably
>> buggy however as it tries to skip creating a group exit when is already
>> present, and it fails that a coredump is in progress.
>>
>> Ever since commit 06af8679449d ("coredump: Limit what can interrupt
>> coredumps") was added dump_interrupted needs not just TIF_SIGPENDING
>> set on the dumper task but also SIGKILL set in it's pending bitmap.
>> This means that if the code is ever fixed not to short-circuit and
>> kill a process after it has already been killed the special case
>> for SIGKILL during a coredump will be broken.
>>
>> Sort all of this out by making the coredump special case more special,
>> and perform all of the work in prepare_signal and leave the rest of
>> the signal delivery path out of it.
>>
>> In prepare_signal when the process coredumping is sent SIGKILL find
>> the task performing the coredump and use sigaddset and signal_wake_up
>> to ensure that task reports fatal_signal_pending.
>>
>> Return false from prepare_signal to tell the rest of the signal
>> delivery path to ignore the signal.
>>
>> Update wait_for_dump_helpers to perform a wait_event_killable wait
>> so that if signal_pending gets set spuriously the wait will not
>> be interrupted unless fatal_signal_pending is true.
>>
>> I have tested this and verified I did not break SIGKILL during
>> coredumps by accident (before or after this change). I actually
>> thought I had and I had to figure out what I had misread that kept
>> SIGKILL during coredumps working.
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@...ssion.com>
>> ---
>> fs/coredump.c | 4 ++--
>> kernel/signal.c | 11 +++++++++--
>> 2 files changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/coredump.c b/fs/coredump.c
>> index a6b3c196cdef..7b91fb32dbb8 100644
>> --- a/fs/coredump.c
>> +++ b/fs/coredump.c
>> @@ -448,7 +448,7 @@ static void coredump_finish(bool core_dumped)
>> static bool dump_interrupted(void)
>> {
>> /*
>> - * SIGKILL or freezing() interrupt the coredumping. Perhaps we
>> + * SIGKILL or freezing() interrupted the coredumping. Perhaps we
>> * can do try_to_freeze() and check __fatal_signal_pending(),
>> * but then we need to teach dump_write() to restart and clear
>> * TIF_SIGPENDING.
>> @@ -471,7 +471,7 @@ static void wait_for_dump_helpers(struct file *file)
>> * We actually want wait_event_freezable() but then we need
>> * to clear TIF_SIGPENDING and improve dump_interrupted().
>> */
>> - wait_event_interruptible(pipe->rd_wait, pipe->readers == 1);
>> + wait_event_killable(pipe->rd_wait, pipe->readers == 1);
>>
>> pipe_lock(pipe);
>> pipe->readers--;
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 8272cac5f429..7e305a8ec7c2 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -907,8 +907,15 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
>> sigset_t flush;
>>
>> if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
>> - if (!(signal->flags & SIGNAL_GROUP_EXIT))
>> - return sig == SIGKILL;
>> + struct core_state *core_state = signal->core_state;
>> + if (core_state) {
>> + if (sig == SIGKILL) {
>> + struct task_struct *dumper = core_state->dumper.task;
>> + sigaddset(&dumper->pending.signal, SIGKILL);
>> + signal_wake_up(dumper, 1);
>> + }
>> + return false;
>> + }
>> /*
>> * The process is in the middle of dying, nothing to do.
>> */
>>
>
> Hi,
>
> This patch breaks userspace, in particular it breaks gst-plugin-scanner
> of GStreamer which hangs now on next-20211224. IIUC, this tool builds a
> registry of good/working GStreamer plugins by loading them and
> blacklisting those that don't work (crash). Before the hang I see
> systemd-coredump process running, taking snapshot of gst-plugin-scanner
> and then gst-plugin-scanner gets stuck.
>
> Bisection points at this patch, reverting it restores
> gst-plugin-scanner. Systemd-coredump still running, but there is no hang
> anymore and everything works properly as before.
>
> I'm seeing this problem on ARM32 and haven't checked other arches.
> Please fix, thanks in advance.
I have not yet been able to figure out how to run gst-pluggin-scanner in
a way that triggers this yet. In truth I can't figure out how to
run gst-pluggin-scanner in a useful way.
I am going to set up some unit tests and see if I can reproduce your
hang another way, but if you could give me some more information on what
you are doing to trigger this I would appreciate it.
Eric
Powered by blists - more mailing lists