linux-kernel - Re: [RFC][PATCH] exec: Move cred computation under exec_update

lists.openwall.net
Open Source and information security mailing list archives
Message-ID:
 <GV2PPF74270EBEE63737C12FBE0B07D6DDAE4D1A@GV2PPF74270EBEE.EURP195.PROD.OUTLOOK.COM>
Date: Tue, 25 Nov 2025 17:19:09 +0100
From: Bernd Edlinger <bernd.edlinger@...mail.de>
To: "Eric W. Biederman" <ebiederm@...ssion.com>,
 Oleg Nesterov <oleg@...hat.com>
Cc: Alexander Viro <viro@...iv.linux.org.uk>,
 Alexey Dobriyan <adobriyan@...il.com>, Kees Cook <kees@...nel.org>,
 Andy Lutomirski <luto@...capital.net>, Will Drewry <wad@...omium.org>,
 Christian Brauner <brauner@...nel.org>,
 Andrew Morton <akpm@...ux-foundation.org>, Michal Hocko <mhocko@...e.com>,
 Serge Hallyn <serge@...lyn.com>, James Morris
 <jamorris@...ux.microsoft.com>, Randy Dunlap <rdunlap@...radead.org>,
 Suren Baghdasaryan <surenb@...gle.com>, Yafang Shao <laoar.shao@...il.com>,
 Helge Deller <deller@....de>, Adrian Reber <areber@...hat.com>,
 Thomas Gleixner <tglx@...utronix.de>, Jens Axboe <axboe@...nel.dk>,
 Alexei Starovoitov <ast@...nel.org>,
 "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 linux-kselftest@...r.kernel.org, linux-mm@...ck.org,
 linux-security-module@...r.kernel.org, tiozhang <tiozhang@...iglobal.com>,
 Luis Chamberlain <mcgrof@...nel.org>,
 "Paulo Alcantara (SUSE)" <pc@...guebit.com>,
 Sergey Senozhatsky <senozhatsky@...omium.org>,
 Frederic Weisbecker <frederic@...nel.org>, YueHaibing
 <yuehaibing@...wei.com>, Paul Moore <paul@...l-moore.com>,
 Aleksa Sarai <cyphar@...har.com>, Stefan Roesch <shr@...kernel.io>,
 Chao Yu <chao@...nel.org>, xu xin <xu.xin16@....com.cn>,
 Jeff Layton <jlayton@...nel.org>, Jan Kara <jack@...e.cz>,
 David Hildenbrand <david@...hat.com>, Dave Chinner <dchinner@...hat.com>,
 Shuah Khan <shuah@...nel.org>, Elena Reshetova <elena.reshetova@...el.com>,
 David Windsor <dwindsor@...il.com>, Mateusz Guzik <mjguzik@...il.com>,
 Ard Biesheuvel <ardb@...nel.org>,
 "Joel Fernandes (Google)" <joel@...lfernandes.org>,
 "Matthew Wilcox (Oracle)" <willy@...radead.org>,
 Hans Liljestrand <ishkamiel@...il.com>,
 Penglei Jiang <superman.xpt@...il.com>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 Adrian Ratiu <adrian.ratiu@...labora.com>, Ingo Molnar <mingo@...nel.org>,
 "Peter Zijlstra (Intel)" <peterz@...radead.org>,
 Cyrill Gorcunov <gorcunov@...il.com>, Eric Dumazet <edumazet@...gle.com>
Subject: Re: [RFC][PATCH] exec: Move cred computation under exec_update_lock

On 11/24/25 00:22, Eric W. Biederman wrote:
> Oleg Nesterov <oleg@...hat.com> writes:
> 
>> Eric,
>>
>> sorry for the delay, I am on PTO, didn't read emails this week...
>>
>> On 11/20, Eric W. Biederman wrote:
>>>
>>> Instead of computing the new cred before we pass the point of no
>>> return compute the new cred just before we use it.
>>>
>>> This allows the removal of fs_struct->in_exec and cred_guard_mutex.
>>>
>>> I am not certain why we wanted to compute the cred for the new
>>> executable so early.  Perhaps I missed something but I did not see any
>>> common errors being signaled.   So I don't think we lose anything by
>>> computing the new cred later.
>>>
>>> We gain a lot.
>>
>> Yes. I LIKE your approach after a quick glance. And I swear, I thought about
>> it too ;)
>>
>> But is it correct? I don't know. I'll try to actually read your patch next
>> week (I am on PTO until the end of November), but I am not sure I can
>> provide valuable feedback.
>>
>> One "obvious" problem is that, after this patch, the execing process can crash
>> in cases where exec() currently returns an error...
> 
> Yes.
> 
> I have been testing and looking at it, and I have found a few issues,
> and I am trying to see if I can resolve them.
> 
> The good news is that with the advent of AT_EXECVE_CHECK we have a
> really clear API boundary between errors that must be diagnosed
> and errors of happenstance like running out of memory.
> 
> The bad news is that the implementation of AT_EXECVE_CHECK seems to have been
> rather hackish, especially with respect to security_bprm_creds_for_exec.
> 
> What I am hoping for is to get the 3 causes of errors in bprm->unsafe
> ( LSM_UNSAFE_SHARE, LSM_UNSAFE_PTRACE, and LSM_UNSAFE_NO_NEW_PRIVS )
> handled cleanly outside of the cred_guard_mutex, and simply
> retested when it is time to build the credentials of the new process.
> 
> In practice that should give the same failure modes as we have now
> but it would get SIGSEGV in rare instances where things changed
> during exec.  That feels acceptable.
> 
> 
> 
> I thought of one other approach that might be enough to put the issue to
> bed if cleaning up exec is too much work.  We could have ptrace_attach
> use a trylock and fail when it doesn't succeed.  That would solve the
> worst of the symptoms.
> 
> I think this would be a complete patch:
> 
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 75a84efad40f..5dd2144e5789 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -444,7 +444,7 @@ static int ptrace_attach(struct task_struct *task, long request,
>  	 * SUID, SGID and LSM creds get determined differently
>  	 * under ptrace.
>  	 */
> -	scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR,
> +	scoped_cond_guard (mutex_try, return -EAGAIN,
>  			   &task->signal->cred_guard_mutex) {
>  
>  		scoped_guard (task_lock, task) {
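The following is a toy userspace model of the "simply retested" idea quoted above. It is not kernel code. It hypothetically treats the three bprm->unsafe causes as plain booleans. The early check stands for what a caller such as AT_EXECVE_CHECK would be told. The late retest happens after the point of no return, where a change can only become a fatal signal. All model_* names and the struct are invented for illustration. The bit names mirror LSM_UNSAFE_* but are not taken from the kernel headers.

```c
#include <stdbool.h>

/* Illustrative bits named after the kernel's LSM_UNSAFE_* causes. */
enum {
    MODEL_UNSAFE_SHARE        = 1u << 0,
    MODEL_UNSAFE_PTRACE       = 1u << 1,
    MODEL_UNSAFE_NO_NEW_PRIVS = 1u << 2,
};

/* Hypothetical snapshot of the task state that feeds bprm->unsafe. */
struct model_task {
    bool fs_shared;    /* fs_struct shared outside the thread group */
    bool ptraced;      /* a tracer is attached */
    bool no_new_privs; /* no_new_privs has been set */
};

/* Compute the unsafe mask from the current task state. */
static unsigned model_check_unsafe(const struct model_task *t)
{
    unsigned unsafe = 0;

    if (t->fs_shared)
        unsafe |= MODEL_UNSAFE_SHARE;
    if (t->ptraced)
        unsafe |= MODEL_UNSAFE_PTRACE;
    if (t->no_new_privs)
        unsafe |= MODEL_UNSAFE_NO_NEW_PRIVS;
    return unsafe;
}

enum model_outcome {
    MODEL_CREDS_OK,      /* same answer as the early check */
    MODEL_FATAL_SIGSEGV, /* state changed during exec: kill the task */
};

/*
 * Late retest, run when the new credentials are built.  Past the point of
 * no return exec can no longer return an error, so (in this model) any
 * difference from the early answer becomes a fatal signal.
 */
static enum model_outcome model_build_creds(unsigned early_unsafe,
                                            const struct model_task *now)
{
    if (model_check_unsafe(now) != early_unsafe)
        return MODEL_FATAL_SIGSEGV;
    return MODEL_CREDS_OK;
}
```

Treating every difference as fatal is the simplest reading of "simply retested". A real implementation might only react to changes that would alter the resulting credentials.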

This is very similar to my initial attempt at fixing the problem. As you
can see, the expectation of the currently failing test in vmattach.c
is that ptrace(PTRACE_ATTACH, pid, 0L, 0L) returns -1 with errno = EAGAIN.
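The following userspace model makes the user-visible behaviour of the mutex_try guard concrete. A pthread mutex stands in for cred_guard_mutex, which is an assumption: the kernel uses its own mutex and the scoped_cond_guard() helper shown in the diff. Instead of sleeping on the lock, the attach returns at once and maps "already held" to -EAGAIN. Userspace would then see that as errno == EAGAIN. model_cred_guard and model_ptrace_attach_try() are hypothetical names.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for task->signal->cred_guard_mutex. */
static pthread_mutex_t model_cred_guard = PTHREAD_MUTEX_INITIALIZER;

/*
 * Model of ptrace_attach() with the mutex_try guard: never sleep on the
 * lock; report -EAGAIN if exec (or anyone else) currently holds it.
 */
static int model_ptrace_attach_try(void)
{
    int rc = pthread_mutex_trylock(&model_cred_guard);

    if (rc == EBUSY)
        return -EAGAIN;
    if (rc != 0)
        return -rc;

    /* ... the real attach work would run here, under the lock ... */

    pthread_mutex_unlock(&model_cred_guard);
    return 0;
}
```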

The disadvantage of that approach was not only that it is a user-visible API
change, but also that the debugger does not know when to retry the
PTRACE_ATTACH. In the worst case it will go into an endless loop, not knowing
that a waitpid and/or PTRACE_CONT is necessary to unblock the traced process.
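The following sketches the retry loop such a debugger would be pushed into. It assumes a hypothetical attach callback that returns 0 or a negative errno. The loop has to be bounded, because sleeping between attempts does nothing to unblock an exec that is waiting on this very debugger. attach_with_retry() and fake_attach() are illustrative names; fake_attach() is only a test double, not a real API.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical attach operation: returns 0 or a negative errno. */
typedef int (*attach_fn)(void *ctx);

/* Retry on -EAGAIN at most max_tries times, sleeping 1 ms in between. */
static int attach_with_retry(attach_fn attach, void *ctx, int max_tries)
{
    int rc = -EAGAIN;

    for (int i = 0; i < max_tries; i++) {
        rc = attach(ctx);
        if (rc != -EAGAIN)
            return rc;

        /* Backing off does not help if the tracee is blocked on us. */
        struct timespec delay = { .tv_sec = 0, .tv_nsec = 1000000L };
        nanosleep(&delay, NULL);
    }
    return rc;
}

/* Test double: fails with -EAGAIN while *busy_left > 0, then succeeds. */
static int fake_attach(void *ctx)
{
    int *busy_left = ctx;

    if (*busy_left > 0) {
        (*busy_left)--;
        return -EAGAIN;
    }
    return 0;
}
```

A real debugger would also have to service its existing tracees inside this loop (waitpid(), PTRACE_CONT). That is exactly the knowledge a generic debugger retrying PTRACE_ATTACH does not have.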

But the main reason why I preferred the overlapping lifetime of the current
and the new credentials is that the tracee can escape the PTRACE_ATTACH
if it is very short-lived. Indeed, I had to cheat a little to make the
test case function TEST(attach) pass reliably:

The traced process does execlp("sleep", "sleep", "2", NULL);

If it did execlp("true", "true", NULL); like the first test case, it would
have failed randomly because the debugger could not attach quickly enough.
IMHO the debugger's expectation is probably to be able to stop the new
process at the first instruction after the execve.
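For comparison only: when the debugger starts the program itself, this race does not arise. The child can call PTRACE_TRACEME before execlp(). Without PTRACE_O_TRACEEXEC set, the kernel then stops the child with SIGTRAP after the successful execve, before the new program runs. This sketch is not what TEST(attach) in vmattach.c does, since that test attaches to an already-running process. It assumes ptrace is permitted in the environment and that "true" is on PATH. stop_signal_after_exec() is a hypothetical helper name.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns the signal that stopped the child right after execve, 0 if the
 * child exited without stopping (e.g. TRACEME was denied), or -1 on error.
 */
static int stop_signal_after_exec(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: make the parent our tracer, then exec a short-lived program. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execlp("true", "true", (char *)NULL);
        _exit(126); /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFSTOPPED(status))
        return 0;

    int sig = WSTOPSIG(status);

    /* Detach without re-injecting the SIGTRAP; "true" then runs and exits. */
    ptrace(PTRACE_DETACH, pid, NULL, NULL);
    waitpid(pid, &status, 0);
    return sig;
}
```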


Bernd.