linux-kernel - Re: [PATCH] sched: __fatal_signal_pending() should also check PF

Message-ID: <20220720150328.GA30749@mail.hallyn.com>
Date:   Wed, 20 Jul 2022 10:03:28 -0500
From:   "Serge E. Hallyn" <serge@...lyn.com>
To:     Tycho Andersen <tycho@...ho.pizza>
Cc:     "Eric W . Biederman" <ebiederm@...ssion.com>,
        Miklos Szeredi <miklos@...redi.hu>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched: __fatal_signal_pending() should also check
 PF_EXITING

On Wed, Jul 13, 2022 at 11:53:05AM -0600, Tycho Andersen wrote:
> The wait_* code uses signal_pending_state() to test whether a thread has
> been interrupted, which ultimately uses __fatal_signal_pending() to detect
> if there is a fatal signal.
> 
> When a pid ns dies, it does:
> 
>     group_send_sig_info(SIGKILL, SEND_SIG_PRIV, task, PIDTYPE_MAX);
> 
> for all the tasks in the pid ns. That calls through:
> 
>     group_send_sig_info() ->
>       do_send_sig_info() ->
>         send_signal_locked() ->
>           __send_signal_locked()
> 
> which does:
> 
>     pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
> 
> which puts SIGKILL in the set of shared signals, but not the individual
> pending ones. When complete_signal() is called at the end of
> __send_signal_locked(), if the task already had PF_EXITING (i.e. was
> already waiting on something in its fd closing path like a fuse flush),
> complete_signal() will not wake up the thread, since wants_signal() checks
> PF_EXITING before testing for SIGKILL.
> 
> If tasks are stuck in a killable wait (e.g. a fuse flush operation), they
> won't see this shared signal, and will hang forever, since TIF_SIGPENDING
> is set, but the fatal signal can't be detected. So, let's also look for
> PF_EXITING in __fatal_signal_pending().
> 
> Signed-off-by: Tycho Andersen <tycho@...ho.pizza>

Cool, thanks for nailing this down!

I assume you've been running this on some boxes with no weird effects?
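
For anyone following along, here is a rough user-space model of the
failure mode as I read it from the commit message. It is only a sketch:
struct model_task, MODEL_PF_EXITING and the model_* helpers are made-up
stand-ins, not kernel definitions, and the step where a willing thread
gets SIGKILL copied into its private pending set is my assumption about
what complete_signal() does for fatal signals.

```c
#include <stdbool.h>
#include <signal.h>	/* SIGKILL */

/* Hypothetical stand-in for PF_EXITING; the value is illustrative only. */
#define MODEL_PF_EXITING 0x4u

/* Hypothetical, heavily simplified model of the relevant task state. */
struct model_task {
	unsigned int flags;	/* models task_struct::flags */
	bool private_sigkill;	/* models SIGKILL in p->pending.signal */
	bool shared_sigkill;	/* models SIGKILL in p->signal->shared_pending */
	bool woken;		/* did the send path wake the task? */
};

/* Models the ordering described above: PF_EXITING is tested before SIGKILL. */
static bool model_wants_signal(const struct model_task *t, int sig)
{
	if (t->flags & MODEL_PF_EXITING)
		return false;
	return sig == SIGKILL;
}

/* Models the queue choice in __send_signal_locked() plus complete_signal(). */
static void model_send_sigkill(struct model_task *t, bool group_wide)
{
	if (group_wide)
		t->shared_sigkill = true;	/* type != PIDTYPE_PID */
	else
		t->private_sigkill = true;	/* type == PIDTYPE_PID */

	/* Assumption: only a thread that wants the signal is marked and woken. */
	if (model_wants_signal(t, SIGKILL)) {
		t->private_sigkill = true;
		t->woken = true;
	}
}

/* Models the check before the patch: only the private set is consulted. */
static bool model_fatal_pending_old(const struct model_task *t)
{
	return t->private_sigkill;
}

/* Models the check after the patch: an exiting task also counts. */
static bool model_fatal_pending_new(const struct model_task *t)
{
	return t->private_sigkill || (t->flags & MODEL_PF_EXITING) != 0;
}
```

With a task that already has MODEL_PF_EXITING set, a group-wide
model_send_sigkill() leaves it unwoken and model_fatal_pending_old()
false, which is the hang described above; model_fatal_pending_new()
returns true for the same state.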

> ---
>  include/linux/sched/signal.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index cafbe03eed01..c20b7e1d89ef 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -402,7 +402,8 @@ static inline int signal_pending(struct task_struct *p)
>  
>  static inline int __fatal_signal_pending(struct task_struct *p)
>  {
> -	return unlikely(sigismember(&p->pending.signal, SIGKILL));
> +	return unlikely(sigismember(&p->pending.signal, SIGKILL) ||
> +			p->flags & PF_EXITING);

Looking around at the callers this does seem safe, but the name does
now seem misleading.  Should this be renamed to something like
exiting_or_fatal_signal_pending()?  
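
To make the naming suggestion concrete, something along these lines is
what I have in mind. This is a user-space sketch only: struct
sketch_task and SKETCH_PF_EXITING are hypothetical stand-ins for
task_struct and PF_EXITING, and the real helper would keep using
sigismember() on p->pending.signal as in the hunk above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for PF_EXITING; the value is illustrative only. */
#define SKETCH_PF_EXITING 0x4u

/* Hypothetical, simplified stand-in for the fields the helper reads. */
struct sketch_task {
	unsigned int flags;	/* models task_struct::flags */
	bool sigkill_pending;	/* models sigismember(&p->pending.signal, SIGKILL) */
};

/*
 * Same predicate as the patched __fatal_signal_pending(), under a name
 * that says it is also true for a task that is already exiting.
 */
static inline bool exiting_or_fatal_signal_pending(const struct sketch_task *p)
{
	return p->sigkill_pending || (p->flags & SKETCH_PF_EXITING) != 0;
}
```

The explicit parentheses around the flag test do not change behavior
compared to the hunk above, since & binds tighter than ||; they are
there for readability.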

>  }
>  
>  static inline int fatal_signal_pending(struct task_struct *p)
> 
> base-commit: 32346491ddf24599decca06190ebca03ff9de7f8
> -- 
> 2.34.1
>