linux-kernel - Re: [PATCH 1/8] signal: Make SIGKILL during coredumps an explicit special case

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5bbb54c4-7504-cd28-5dde-4e5965496625@gmail.com>
Date:   Thu, 6 Jan 2022 00:39:10 +0300
From:   Dmitry Osipenko <digetx@...il.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
        linux-api@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Alexey Gladkov <legion@...nel.org>,
        Kyle Huey <me@...ehuey.com>, Oleg Nesterov <oleg@...hat.com>,
        Kees Cook <keescook@...omium.org>,
        Al Viro <viro@...IV.linux.org.uk>
Subject: Re: [PATCH 1/8] signal: Make SIGKILL during coredumps an explicit
 special case

05.01.2022 22:58, Eric W. Biederman пишет:
> Dmitry Osipenko <digetx@...il.com> writes:
> 
>> 14.12.2021 01:53, Eric W. Biederman пишет:
>>> Simplify the code that allows SIGKILL during coredumps to terminate
>>> the coredump.  As far as I can tell I have avoided breaking it
>>> by dumb luck.
>>>
>>> Historically with all of the other threads stopping in exit_mm the
>>> wants_signal loop in complete_signal would find the dumper task and
>>> then complete_signal would wake the dumper task with signal_wake_up.
>>>
>>> After moving the coredump_task_exit above the setting of PF_EXITING in
>>> commit 92307383082d ("coredump: Don't perform any cleanups before
>>> dumping core") wants_signal will consider all of the threads in a
>>> multi-threaded process for waking up, not just the core dumping task.
>>>
>>> Luckily complete_signal short circuits SIGKILL during a coredump marks
>>> every thread with SIGKILL and signal_wake_up.  This code is arguably
>>> buggy however as it tries to skip creating a group exit when is already
>>> present, and it fails that a coredump is in progress.
>>>
>>> Ever since commit 06af8679449d ("coredump: Limit what can interrupt
>>> coredumps") was added dump_interrupted needs not just TIF_SIGPENDING
>>> set on the dumper task but also SIGKILL set in it's pending bitmap.
>>> This means that if the code is ever fixed not to short-circuit and
>>> kill a process after it has already been killed the special case
>>> for SIGKILL during a coredump will be broken.
>>>
>>> Sort all of this out by making the coredump special case more special,
>>> and perform all of the work in prepare_signal and leave the rest of
>>> the signal delivery path out of it.
>>>
>>> In prepare_signal when the process coredumping is sent SIGKILL find
>>> the task performing the coredump and use sigaddset and signal_wake_up
>>> to ensure that task reports fatal_signal_pending.
>>>
>>> Return false from prepare_signal to tell the rest of the signal
>>> delivery path to ignore the signal.
>>>
>>> Update wait_for_dump_helpers to perform a wait_event_killable wait
>>> so that if signal_pending gets set spuriously the wait will not
>>> be interrupted unless fatal_signal_pending is true.
>>>
>>> I have tested this and verified I did not break SIGKILL during
>>> coredumps by accident (before or after this change).  I actually
>>> thought I had and I had to figure out what I had misread that kept
>>> SIGKILL during coredumps working.
>>>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@...ssion.com>
>>> ---
>>>  fs/coredump.c   |  4 ++--
>>>  kernel/signal.c | 11 +++++++++--
>>>  2 files changed, 11 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/coredump.c b/fs/coredump.c
>>> index a6b3c196cdef..7b91fb32dbb8 100644
>>> --- a/fs/coredump.c
>>> +++ b/fs/coredump.c
>>> @@ -448,7 +448,7 @@ static void coredump_finish(bool core_dumped)
>>>  static bool dump_interrupted(void)
>>>  {
>>>  	/*
>>> -	 * SIGKILL or freezing() interrupt the coredumping. Perhaps we
>>> +	 * SIGKILL or freezing() interrupted the coredumping. Perhaps we
>>>  	 * can do try_to_freeze() and check __fatal_signal_pending(),
>>>  	 * but then we need to teach dump_write() to restart and clear
>>>  	 * TIF_SIGPENDING.
>>> @@ -471,7 +471,7 @@ static void wait_for_dump_helpers(struct file *file)
>>>  	 * We actually want wait_event_freezable() but then we need
>>>  	 * to clear TIF_SIGPENDING and improve dump_interrupted().
>>>  	 */
>>> -	wait_event_interruptible(pipe->rd_wait, pipe->readers == 1);
>>> +	wait_event_killable(pipe->rd_wait, pipe->readers == 1);
>>>  
>>>  	pipe_lock(pipe);
>>>  	pipe->readers--;
>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>> index 8272cac5f429..7e305a8ec7c2 100644
>>> --- a/kernel/signal.c
>>> +++ b/kernel/signal.c
>>> @@ -907,8 +907,15 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
>>>  	sigset_t flush;
>>>  
>>>  	if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
>>> -		if (!(signal->flags & SIGNAL_GROUP_EXIT))
>>> -			return sig == SIGKILL;
>>> +		struct core_state *core_state = signal->core_state;
>>> +		if (core_state) {
>>> +			if (sig == SIGKILL) {
>>> +				struct task_struct *dumper = core_state->dumper.task;
>>> +				sigaddset(&dumper->pending.signal, SIGKILL);
>>> +				signal_wake_up(dumper, 1);
>>> +			}
>>> +			return false;
>>> +		}
>>>  		/*
>>>  		 * The process is in the middle of dying, nothing to do.
>>>  		 */
>>>
>>
>> Hi,
>>
>> This patch breaks userspace, in particular it breaks gst-plugin-scanner
>> of GStreamer which hangs now on next-20211224. IIUC, this tool builds a
>> registry of good/working GStreamer plugins by loading them and
>> blacklisting those that don't work (crash). Before the hang I see
>> systemd-coredump process running, taking snapshot of gst-plugin-scanner
>> and then gst-plugin-scanner gets stuck.
>>
>> Bisection points at this patch, reverting it restores
>> gst-plugin-scanner. Systemd-coredump still running, but there is no hang
>> anymore and everything works properly as before.
>>
>> I'm seeing this problem on ARM32 and haven't checked other arches.
>> Please fix, thanks in advance.
> 
> 
> I have not yet been able to figure out how to run gst-pluggin-scanner in
> a way that triggers this yet.  In truth I can't figure out how to
> run gst-pluggin-scanner in a useful way.
> 
> I am going to set up some unit tests and see if I can reproduce your
> hang another way, but if you could give me some more information on what
> you are doing to trigger this I would appreciate it.

Thanks, Eric. The distro is Arch Linux, but it's a development
environment where I'm running latest GStreamer from git master. I'll try
to figure out the reproduction steps and get back to you.