[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240116152210.GA12342@redhat.com>
Date: Tue, 16 Jan 2024 16:22:11 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Bernd Edlinger <bernd.edlinger@...mail.de>
Cc: Alexander Viro <viro@...iv.linux.org.uk>,
Alexey Dobriyan <adobriyan@...il.com>,
Kees Cook <keescook@...omium.org>,
Andy Lutomirski <luto@...capital.net>,
Will Drewry <wad@...omium.org>,
Christian Brauner <brauner@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.com>, Serge Hallyn <serge@...lyn.com>,
James Morris <jamorris@...ux.microsoft.com>,
Randy Dunlap <rdunlap@...radead.org>,
Suren Baghdasaryan <surenb@...gle.com>,
Yafang Shao <laoar.shao@...il.com>, Helge Deller <deller@....de>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Adrian Reber <areber@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>, Jens Axboe <axboe@...nel.dk>,
Alexei Starovoitov <ast@...nel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
linux-kselftest@...r.kernel.org, linux-mm@...ck.org,
tiozhang <tiozhang@...iglobal.com>,
Luis Chamberlain <mcgrof@...nel.org>,
"Paulo Alcantara (SUSE)" <pc@...guebit.com>,
Sergey Senozhatsky <senozhatsky@...omium.org>,
Frederic Weisbecker <frederic@...nel.org>,
YueHaibing <yuehaibing@...wei.com>,
Paul Moore <paul@...l-moore.com>, Aleksa Sarai <cyphar@...har.com>,
Stefan Roesch <shr@...kernel.io>, Chao Yu <chao@...nel.org>,
xu xin <xu.xin16@....com.cn>, Jeff Layton <jlayton@...nel.org>,
Jan Kara <jack@...e.cz>, David Hildenbrand <david@...hat.com>,
Dave Chinner <dchinner@...hat.com>, Shuah Khan <shuah@...nel.org>,
Zheng Yejian <zhengyejian1@...wei.com>,
Elena Reshetova <elena.reshetova@...el.com>,
David Windsor <dwindsor@...il.com>,
Mateusz Guzik <mjguzik@...il.com>, Ard Biesheuvel <ardb@...nel.org>,
"Joel Fernandes (Google)" <joel@...lfernandes.org>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Hans Liljestrand <ishkamiel@...il.com>
Subject: Re: [PATCH v14] exec: Fix dead-lock in de_thread with ptrace_attach
I'll try to recall this problem and actually read the patch tommorrow...
Hmm. but it doesn't apply to Linus's tree, you need to rebase it.
In particular, please note the recent commit 5431fdd2c181dd2eac2
("ptrace: Convert ptrace_attach() to use lock guards")
On 01/15, Bernd Edlinger wrote:
>
> The problem happens when a tracer tries to ptrace_attach
> to a multi-threaded process, that does an execve in one of
> the threads at the same time, without doing that in a forked
> sub-process. That means: There is a race condition, when one
> or more of the threads are already ptraced, but the thread
> that invoked the execve is not yet traced. Now in this
> case the execve locks the cred_guard_mutex and waits for
> de_thread to complete. But that waits for the traced
> sibling threads to exit, and those have to wait for the
> tracer to receive the exit signal, but the tracer cannot
> call wait right now, because it is waiting for the ptrace
> call to complete, and this never does not happen.
> The traced process and the tracer are now in a deadlock
> situation, and can only be killed by a fatal signal.
This looks very confusing to me. And even misleading.
So IIRC the problem is "simple".
de_thread() sleeps with cred_guard_mutex waiting for other threads to
exit and pass release_task/__exit_signal.
If one of the sub-threads is traced, debugger should do ptrace_detach()
or wait() to release this tracee, the killed tracee won't autoreap.
Now. If debugger tries to take the same cred_guard_mutex before
detach/wait we have a deadlock. This is not specific to ptrace_attach(),
proc_pid_attr_write() takes this lock too.
Right? Or are there other issues?
> -static int de_thread(struct task_struct *tsk)
> +static int de_thread(struct task_struct *tsk, struct linux_binprm *bprm)
> {
> struct signal_struct *sig = tsk->signal;
> struct sighand_struct *oldsighand = tsk->sighand;
> spinlock_t *lock = &oldsighand->siglock;
> + struct task_struct *t = tsk;
> + bool unsafe_execve_in_progress = false;
>
> if (thread_group_empty(tsk))
> goto no_thread_group;
> @@ -1066,6 +1068,19 @@ static int de_thread(struct task_struct *tsk)
> if (!thread_group_leader(tsk))
> sig->notify_count--;
>
> + while_each_thread(tsk, t) {
for_other_threads()
> + if (unlikely(t->ptrace)
> + && (t != tsk->group_leader || !t->exit_state))
> + unsafe_execve_in_progress = true;
The !t->exit_state is not right... This sub-thread can already be a zombie
with ->exit_state != 0 but see above, it won't be reaped until the debugger
does wait().
> + if (unlikely(unsafe_execve_in_progress)) {
> + spin_unlock_irq(lock);
> + sig->exec_bprm = bprm;
> + mutex_unlock(&sig->cred_guard_mutex);
> + spin_lock_irq(lock);
I don't understand why do we need to unlock and lock siglock here...
But my main question is why do we need the unsafe_execve_in_progress boolean.
If this patch is correct and de_thread() can drop and re-acquire cread_guard_mutex
when one of the threads is traced, then why can't we do this unconditionally ?
Oleg.
Powered by blists - more mailing lists