linux-kernel - Re: [PATCH v17] exec: Fix dead-lock in de_thread with ptrace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aRs4zYDhddBQFiXZ@redhat.com>
Date: Mon, 17 Nov 2025 16:01:33 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Bernd Edlinger <bernd.edlinger@...mail.de>
Cc: Christian Brauner <brauner@...nel.org>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Alexey Dobriyan <adobriyan@...il.com>, Kees Cook <kees@...nel.org>,
	Andy Lutomirski <luto@...capital.net>,
	Will Drewry <wad@...omium.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Michal Hocko <mhocko@...e.com>, Serge Hallyn <serge@...lyn.com>,
	James Morris <jamorris@...ux.microsoft.com>,
	Randy Dunlap <rdunlap@...radead.org>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Yafang Shao <laoar.shao@...il.com>, Helge Deller <deller@....de>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Adrian Reber <areber@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>, Jens Axboe <axboe@...nel.dk>,
	Alexei Starovoitov <ast@...nel.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	linux-kselftest@...r.kernel.org, linux-mm@...ck.org,
	linux-security-module@...r.kernel.org,
	tiozhang <tiozhang@...iglobal.com>,
	Luis Chamberlain <mcgrof@...nel.org>,
	"Paulo Alcantara (SUSE)" <pc@...guebit.com>,
	Sergey Senozhatsky <senozhatsky@...omium.org>,
	Frederic Weisbecker <frederic@...nel.org>,
	YueHaibing <yuehaibing@...wei.com>,
	Paul Moore <paul@...l-moore.com>, Aleksa Sarai <cyphar@...har.com>,
	Stefan Roesch <shr@...kernel.io>, Chao Yu <chao@...nel.org>,
	xu xin <xu.xin16@....com.cn>, Jeff Layton <jlayton@...nel.org>,
	Jan Kara <jack@...e.cz>, David Hildenbrand <david@...hat.com>,
	Dave Chinner <dchinner@...hat.com>, Shuah Khan <shuah@...nel.org>,
	Elena Reshetova <elena.reshetova@...el.com>,
	David Windsor <dwindsor@...il.com>,
	Mateusz Guzik <mjguzik@...il.com>, Ard Biesheuvel <ardb@...nel.org>,
	"Joel Fernandes (Google)" <joel@...lfernandes.org>,
	"Matthew Wilcox (Oracle)" <willy@...radead.org>,
	Hans Liljestrand <ishkamiel@...il.com>,
	Penglei Jiang <superman.xpt@...il.com>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Adrian Ratiu <adrian.ratiu@...labora.com>,
	Ingo Molnar <mingo@...nel.org>,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>,
	Cyrill Gorcunov <gorcunov@...il.com>,
	Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH v17] exec: Fix dead-lock in de_thread with ptrace_attach

On 11/17, Bernd Edlinger wrote:
>
> On 11/11/25 10:21, Christian Brauner wrote:
> > On Wed, Nov 05, 2025 at 03:32:10PM +0100, Oleg Nesterov wrote:
>
> >> But this is minor. Why do we need "bool unsafe_execve_in_progress" ?
> >> If this patch is correct, de_thread() can drop/reacquire cred_guard_mutex
> >> unconditionally.
> >>
>
> I would not like to drop the mutex when no absolutely necessary for performance reasons.

OK, I won't insist... But I don't really understand how this can help to
improve the performance. If nothing else, this adds another for_other_threads()
loop.

And again, the unsafe_execve_in_progress == T case is unlikely. I'm afraid this
case (de_thread() without cred_guard_mutex) won't have enough testing.

In any case, why you dislike the suggestion to add this unsafe_execve_in_progress
logic in a separate patch?

> >>> +	if (unlikely(unsafe_execve_in_progress)) {
> >>> +		spin_unlock_irq(lock);
> >>> +		sig->exec_bprm = bprm;
> >>> +		mutex_unlock(&sig->cred_guard_mutex);
> >>> +		spin_lock_irq(lock);
> >>
> >> I don't think spin_unlock_irq() + spin_lock_irq() makes any sense...
> >>
>
> Since the spin lock was acquired while holding the mutex, both should be
> unlocked in reverse sequence and the spin lock re-acquired after releasing
> the mutex.

Why?

> I'd expect the scheduler to do a task switch after the cred_guard_mutex is
> unlocked, at least in the RT-linux variant, while the spin lock is not yet
> unlocked.

I must have missed something, but I still don't understand why this would
be wrong...

> >>> @@ -1114,13 +1139,31 @@ int begin_new_exec(struct linux_binprm * bprm)
> >>>  	 */
> >>>  	trace_sched_prepare_exec(current, bprm);
> >>>
> >>> +	/* If the binary is not readable then enforce mm->dumpable=0 */
> >>> +	would_dump(bprm, bprm->file);
> >>> +	if (bprm->have_execfd)
> >>> +		would_dump(bprm, bprm->executable);
> >>> +
> >>> +	/*
> >>> +	 * Figure out dumpability. Note that this checking only of current
> >>> +	 * is wrong, but userspace depends on it. This should be testing
> >>> +	 * bprm->secureexec instead.
> >>> +	 */
> >>> +	if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
> >>> +	    is_dumpability_changed(current_cred(), bprm->cred) ||
> >>> +	    !(uid_eq(current_euid(), current_uid()) &&
> >>> +	      gid_eq(current_egid(), current_gid())))
> >>> +		set_dumpable(bprm->mm, suid_dumpable);
> >>> +	else
> >>> +		set_dumpable(bprm->mm, SUID_DUMP_USER);
> >>> +
> >>
> >> OK, we need to do this before de_thread() drops cred_guard_mutex.
> >> But imo this too should be done in a separate patch, the changelog should
> >> explain this change.
> >>
>
> The dumpability need to be determined before de_thread, because ptrace_may_access
> needs this information to determine if the tracer is allowed to ptrace. That is
> part of the core of the patch, it would not work without that.

Yes,

> I will add more comments to make that more easy to understand.

But again, why this change can't come in a separate patch? Before the patch which
drops cred_guard_mutex in de_thread().

> >> 	int lock_current_cgm(void)
> >> 	{
> >> 		if (mutex_lock_interruptible(&current->signal->cred_guard_mutex))
> >> 			return -ERESTARTNOINTR;
> >>
> >> 		if (!current->signal->group_exec_task)
> >> 			return 0;
> >>
> >> 		WARN_ON(!fatal_signal_pending(current));
> >> 		mutex_unlock(&current->signal->cred_guard_mutex);
> >> 		return -ERESTARTNOINTR;
> >> 	}
> >>
> >> ?
> >>
>
> Some use mutex_lock_interruptible and some use mutex_lock_killable here,
> so it wont work for all of them.  I would not consider this a new kind
> of dead-lock free mutex, but just an open-coded state machine, handling
> the state that the tasks have whild de_thread is running.

OK. and we don't have mutex_lock_state(). I think that all users could
use mutex_lock_killable(), but you are right anyway, and this is minor.

> >> Note that it checks ->group_exec_task, not ->exec_bprm. So this change can
> >> come in a separate patch too, but I won't insist.

Yes. Although this is minor too ;)

> >> This is the most problematic change which I can't review...
> >>
> >> Firstly, it changes task->mm/real_cred for __ptrace_may_access() and this
> >> looks dangerous to me.
> >
> > Yeah, that is not ok. This is effectively override_creds for real_cred
> > and that is not a pattern I want to see us establish at all! Temporary
> > credential overrides for the subjective credentials is already terrible
> > but at least we have the explicit split between real_cred and cred
> > expressely for that. So no, that's not an acceptable solution.
> >
>
> Okay I understand your point.
> I did this originally just to avoid to have to change the interface to all
> the security engines, but instead I could add a flag PTRACE_MODE_BPRMCREDS to
> the ptrace_may_access which must be handled in all security engines, to use
> child->signal->exec_bprm->creds instead of __task_cred(child).

Can't comment... I don't understand your idea, but this is my fault. I guess
this needs more changes, in particular __ptrace_may_access_mm_cred(), but
most probably I misunderstood your idea.

>
> >> Or. check_unsafe_exec() sets LSM_UNSAFE_PTRACE if ptrace. Is it safe to
> >> ptrace the execing task after that? I have no idea what the security hooks
> >> can do...
>
> That means the tracee is already ptraced before the execve, and SUID-bits
> do not work as usual, and are more or less ignored.  But in this patch
> the tracee is not yet ptraced.

Well. I meant that if LSM_UNSAFE_PTRACE is not set, then currently (say)
security_bprm_committing_creds() has all rights to assume that the execing
task is not ptraced. Yes, I don't see any potential problem right now, but
still.

And just in case... Lets look at this code

	+                               rcu_assign_pointer(task->real_cred, bprm->cred);
	+                               task->mm = bprm->mm;
	+                               retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
	+                               rcu_assign_pointer(task->real_cred, old_cred);
	+                               task->mm = old_mm;

again.

This is mostly theoretical, but what if begin_new_exec() fails after de_thread()
and before exec_mmap() and/or commit_creds(bprm->cred) ? In this case the execing
thread will report SIGSEGV to debugger which can (say) read old_mm.

No?

I am starting to think that ptrace_attach() should simply fail with -EWOULDBLOCK
if it detects "unsafe_execve_in_progress" ... And perhaps this is what you already
tried to do in the past, I can't recall :/

Oleg.