linux-kernel - Re: [PATCH] sched: __fatal_signal_pending() should also check PF

Open Source and information security mailing list archives



Message-ID: <871qu6bjp3.fsf@email.froward.int.ebiederm.org>
Date:   Wed, 27 Jul 2022 11:32:08 -0500
From:   "Eric W. Biederman" <ebiederm@...ssion.com>
To:     Tycho Andersen <tycho@...ho.pizza>
Cc:     "Serge E. Hallyn" <serge@...lyn.com>,
        Miklos Szeredi <miklos@...redi.hu>,
        linux-kernel@...r.kernel.org, Oleg Nesterov <oleg@...hat.com>
Subject: Re: [PATCH] sched: __fatal_signal_pending() should also check
 PF_EXITING

Tycho Andersen <tycho@...ho.pizza> writes:

> Hi all,
>
> On Wed, Jul 20, 2022 at 08:54:59PM -0500, Serge E. Hallyn wrote:
>> Oh - I didn't either - checking the sigkill in shared signals *seems*
>> legit if they can be put there - but since you posted the new patch I
>> assumed his reasoning was clear to you.  I know Eric's busy, cc:ing Oleg
>> for his interpretation too.
>
> Any thoughts on this?

Having __fatal_signal_pending check SIGKILL in shared signals is
completely and utterly wrong.

What __fatal_signal_pending reports is whether a signal has gone through
short circuit delivery after it was determined that delivering the
signal will terminate the process.

Using "sigismember(&tsk->pending.signal, SIGKILL)" to report that a
fatal signal has experienced short circuit delivery is a bit of an
abuse, but it is essentially harmless, as a tkill of SIGKILL to a thread
will result in every thread in the process experiencing short circuit
delivery of the fatal SIGKILL.  So a pending SIGKILL can't really mean
anything else.

After having looked at the code a little more, I can unfortunately also
say that testing PF_EXITING in __fatal_signal_pending will cause
kernel_wait4 in zap_pid_ns_processes not to sleep but instead to return
0, which will cause zap_pid_ns_processes to busy wait.  That seems very
unfortunate.
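A toy simulation of why that turns into a busy wait. It assumes the caller runs inside do_exit, so PF_EXITING is always set, and that a wait call which sees "fatal signal pending" returns 0 immediately instead of sleeping. Time is simulated in ticks, and every name here (`model_wait_once`, `model_reap_all`, the two check functions) is hypothetical; this is not kernel_wait4 itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the waiting task inside do_exit(). */
struct model_waiter {
	bool sigkill_pending; /* SIGKILL in the per-thread pending set */
	bool pf_exiting;      /* always true once do_exit() has started */
};

static bool check_sigkill_only(const struct model_waiter *w)
{
	return w->sigkill_pending;
}

/* The proposed variant: also treat PF_EXITING as a pending fatal signal. */
static bool check_with_pf_exiting(const struct model_waiter *w)
{
	return w->sigkill_pending || w->pf_exiting;
}

/* Simulated children, each exiting at a fixed tick. */
struct model_children {
	const unsigned *exit_time; /* sorted ascending */
	size_t nr;
	size_t reaped;
	unsigned now;
};

/*
 * One wait call: reap a child that has already exited; otherwise either
 * sleep until the next exit, or, if the check reports a fatal signal,
 * return 0 at once and burn one tick of CPU.
 */
static int model_wait_once(struct model_children *c,
			   const struct model_waiter *w,
			   bool (*fatal)(const struct model_waiter *))
{
	if (c->exit_time[c->reaped] <= c->now) {
		c->reaped++;
		return 1;
	}
	if (fatal(w)) {
		c->now++;
		return 0;
	}
	c->now = c->exit_time[c->reaped]; /* sleep until the child exits */
	c->reaped++;
	return 1;
}

/* Caller loop: keep waiting until every child is reaped; count the calls. */
static unsigned model_reap_all(struct model_children *c,
			       const struct model_waiter *w,
			       bool (*fatal)(const struct model_waiter *))
{
	unsigned calls = 0;

	while (c->reaped < c->nr) {
		model_wait_once(c, w, fatal);
		calls++;
	}
	return calls;
}
```

With two children exiting at ticks 5 and 10, the SIGKILL-only check needs 2 wait calls (both sleep), while the PF_EXITING variant needs 12, one per elapsed tick plus the two reaps: the loop spins instead of sleeping.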

I hadn't realized it when I wrote zap_pid_ns_processes, but I think
anything called from do_exit that cares about signal pending state is
pretty much broken and needs to be fixed.

So the question is: how do we fix the problem in fuse that shows up
during pid namespace exit without relying on interruptible sleeps that
we need to wake up?

What are the code paths that experience the problem?

Will refactoring zap_pid_ns_processes as I have proposed, so that it
does not use kernel_wait4, help sort this out?  That is, make it work
something like the thread group leader of a process: do not allow wait
to reap the init process of a pid namespace until all of the processes
in the pid namespace are gone.  Now that I see the problem with using
kernel_wait4, it looks like zap_pid_ns_processes needs to stop calling
kernel_wait4 regardless of the fuse problem.
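One way to picture that proposal is as a check made at reap time rather than a wait loop inside do_exit. The kernel already defers reaping a zombie thread group leader with delay_group_leader(), which is essentially `thread_group_leader(p) && !thread_group_empty(p)`. The sketch below applies the same idea to a namespace's init. It is only an illustration under assumptions: the structs, the `model_*` functions, and the single `nr_live` counter are hypothetical, not the actual refactoring.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a pid namespace. */
struct model_pidns {
	unsigned nr_live; /* tasks not yet released, including init */
};

/* Hypothetical stand-in for the namespace's init process. */
struct model_init {
	bool is_zombie;         /* init has finished do_exit() */
	struct model_pidns *ns;
};

/*
 * Analogue of delay_group_leader(): a zombie init stays unreapable
 * while any other task in its namespace is still around.
 */
static bool model_delay_pidns_init(const struct model_init *init)
{
	return init->ns->nr_live > 1;
}

/* May the parent's wait() reap init right now? */
static bool model_can_reap_init(const struct model_init *init)
{
	return init->is_zombie && !model_delay_pidns_init(init);
}
```

Under this model init never has to sleep in do_exit waiting for its children. It simply exits, and the parent's wait only reports it once `nr_live` drops to 1.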

Eric