Message-ID:
<GV2PPF74270EBEEDD43083BE45C6E26F674E4DDA@GV2PPF74270EBEE.EURP195.PROD.OUTLOOK.COM>
Date: Sat, 29 Nov 2025 16:06:57 +0100
From: Bernd Edlinger <bernd.edlinger@...mail.de>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Christian Brauner <brauner@...nel.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Alexey Dobriyan <adobriyan@...il.com>, Kees Cook <kees@...nel.org>,
Andy Lutomirski <luto@...capital.net>, Will Drewry <wad@...omium.org>,
Andrew Morton <akpm@...ux-foundation.org>, Michal Hocko <mhocko@...e.com>,
Serge Hallyn <serge@...lyn.com>, James Morris
<jamorris@...ux.microsoft.com>, Randy Dunlap <rdunlap@...radead.org>,
Suren Baghdasaryan <surenb@...gle.com>, Yafang Shao <laoar.shao@...il.com>,
Helge Deller <deller@....de>, "Eric W. Biederman" <ebiederm@...ssion.com>,
Adrian Reber <areber@...hat.com>, Thomas Gleixner <tglx@...utronix.de>,
Jens Axboe <axboe@...nel.dk>, Alexei Starovoitov <ast@...nel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
linux-kselftest@...r.kernel.org, linux-mm@...ck.org,
linux-security-module@...r.kernel.org, tiozhang <tiozhang@...iglobal.com>,
Luis Chamberlain <mcgrof@...nel.org>,
"Paulo Alcantara (SUSE)" <pc@...guebit.com>,
Sergey Senozhatsky <senozhatsky@...omium.org>,
Frederic Weisbecker <frederic@...nel.org>, YueHaibing
<yuehaibing@...wei.com>, Paul Moore <paul@...l-moore.com>,
Aleksa Sarai <cyphar@...har.com>, Stefan Roesch <shr@...kernel.io>,
Chao Yu <chao@...nel.org>, xu xin <xu.xin16@....com.cn>,
Jeff Layton <jlayton@...nel.org>, Jan Kara <jack@...e.cz>,
David Hildenbrand <david@...hat.com>, Dave Chinner <dchinner@...hat.com>,
Shuah Khan <shuah@...nel.org>, Elena Reshetova <elena.reshetova@...el.com>,
David Windsor <dwindsor@...il.com>, Mateusz Guzik <mjguzik@...il.com>,
Ard Biesheuvel <ardb@...nel.org>,
"Joel Fernandes (Google)" <joel@...lfernandes.org>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Hans Liljestrand <ishkamiel@...il.com>,
Penglei Jiang <superman.xpt@...il.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Adrian Ratiu <adrian.ratiu@...labora.com>, Ingo Molnar <mingo@...nel.org>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Cyrill Gorcunov <gorcunov@...il.com>, Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH v17] exec: Fix dead-lock in de_thread with ptrace_attach
On 11/23/25 19:32, Oleg Nesterov wrote:
> Hi Bernd,
>
> sorry for delay, I am on PTO, didn't read emails this week...
>
> On 11/17, Bernd Edlinger wrote:
>>
>> On 11/17/25 16:01, Oleg Nesterov wrote:
>>> On 11/17, Bernd Edlinger wrote:
>>>>
>>>> On 11/11/25 10:21, Christian Brauner wrote:
>>>>> On Wed, Nov 05, 2025 at 03:32:10PM +0100, Oleg Nesterov wrote:
>>>>
>>>>>> But this is minor. Why do we need "bool unsafe_execve_in_progress" ?
>>>>>> If this patch is correct, de_thread() can drop/reacquire cred_guard_mutex
>>>>>> unconditionally.
>>>>>>
>>>>
>>>> I would not like to drop the mutex when not absolutely necessary, for performance reasons.
>>>
>>> OK, I won't insist... But I don't really understand how this can help to
>>> improve the performance. If nothing else, this adds another for_other_threads()
>>> loop.
>>>
>>
>> If no dead-lock is possible, it is better to complete the de_thread without
>> releasing the mutex. For the debugger it is also the better experience:
>> no matter when the ptrace_attach happens, it will succeed rather quickly,
>> either before the execve or after the execve.
>
> I still disagree, I still don't understand the "performance reasons", but since I can't
> convince you I won't really argue.
>
>>>>>>> + if (unlikely(unsafe_execve_in_progress)) {
>>>>>>> + spin_unlock_irq(lock);
>>>>>>> + sig->exec_bprm = bprm;
>>>>>>> + mutex_unlock(&sig->cred_guard_mutex);
>>>>>>> + spin_lock_irq(lock);
>>>>>>
>>>>>> I don't think spin_unlock_irq() + spin_lock_irq() makes any sense...
>>>>>>
>>>>
>>>> Since the spin lock was acquired while holding the mutex, both should be
>>>> unlocked in reverse sequence and the spin lock re-acquired after releasing
>>>> the mutex.
>>>
>>> Why?
>>>
>>
>> It is generally safer when each thread acquires its mutexes in order and
>> releases them in reverse order.
>> Consider this:
>> Thread A:
>> holds spin_lock_irq(siglock);
>> does mutex_unlock(cred_guard_mutex); with irqs disabled.
>> a task switch happens to Thread B, which has irqs enabled
>> and is waiting for cred_guard_mutex.
>> Thread B:
>> does mutex_lock(cred_guard_mutex);
>> but is interrupted at this point, and the interrupt handler I executes.
>> Now interrupt handler I wants to take siglock and is blocked,
>> because the system has one single CPU core.
>
> I don't follow. Do you mean PREEMPT_RT ?
>
> If yes. In this case spin_lock_irq() is rt_spin_lock() which doesn't disable irqs,
> it does rt_lock_lock() (takes rt_mutex) + migrate_disable().
>
> I do think that spin/mutex/whatever_unlock() is always safe. In any order, and
> regardless of RT.
>
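To restate the interleaving quoted above as a timeline (just a sketch of the hypothetical single-CPU scenario, not real code):

```
/*
 * Thread A                         Thread B
 * --------                         --------
 * spin_lock_irq(siglock);          // irqs off on this CPU
 * mutex_unlock(cred_guard_mutex);  // may wake B / switch tasks
 *                                  mutex_lock(cred_guard_mutex);
 *                                  // interrupt fires here;
 *                                  // handler wants siglock
 *                                  // -> blocked: A still holds it,
 *                                  //    and A cannot run until the
 *                                  //    handler returns (one CPU)
 */
```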
Well, based on my experience with other embedded real-time O/S-es, I would
expect that something named spin_lock_irq locks the task-specific IRQ and
prevents task switches due to time-slicing, while something called
mutex_unlock may cause an explicit task switch when another task is waiting
for the mutex.
It is hard to follow exactly how Linux implements spin_lock_irq, but
to me it looks like it is done this way:
In include/linux/spinlock_api_smp.h:

static inline void __raw_spin_lock_irq(raw_spinlock_t *lock)
{
	local_irq_disable();
	preempt_disable();
	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
	LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
}
so an explicit task switch while local_irq_disable() is in effect looks
very dangerous to me. Do you know other places where such
a code pattern is used?
I only ask because a close look at those might reveal
some serious bugs, WDYT?
Thanks
Bernd.