Message-ID: <20250323-haftverschonung-rochen-22c230317a23@brauner>
Date: Sun, 23 Mar 2025 21:57:23 +0100
From: Christian Brauner <brauner@...nel.org>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Al Viro <viro@...iv.linux.org.uk>, Kees Cook <kees@...nel.org>, 
	jack@...e.cz, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, syzkaller-bugs@...glegroups.com, 
	syzbot <syzbot+1c486d0b62032c82a968@...kaller.appspotmail.com>
Subject: Re: [syzbot] [fs?] [mm?] KCSAN: data-race in bprm_execve / copy_fs
 (4)

On Sun, Mar 23, 2025 at 07:14:21PM +0100, Oleg Nesterov wrote:
> On 03/22, Al Viro wrote:
> >
> > On Sat, Mar 22, 2025 at 04:55:39PM +0100, Oleg Nesterov wrote:
> >
> > > And this means that we just need to ensure that ->in_exec is cleared
> > > before this mutex is dropped, no? Something like below?
> >
> > Probably should work, but I wonder if it would be cleaner to have
> > ->in_exec replaced with pointer to task_struct responsible.  Not
> > "somebody with that fs_struct for ->fs is trying to do execve(),
> > has verified that nothing outside of their threads is using this
> > and had been holding ->signal->cred_guard_mutex ever since then",
> > but "this is the thread that..."
> 
> perhaps... or something else to make this "not immediately obvious"
> fs->in_exec more clear.

Well, it would certainly help to document that cred_guard_mutex
serializes concurrent exec.

This is kind of important information given that begin_new_exec() and
finalize_exec() are only called from ->load_binary() and are thus always
located in the individual binfmt_*.c files. That makes this pretty
implicit information.

Not to mention that the unlocking depends entirely on whether
bprm->cred is set.

Otherwise the patch looks good to me.
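
To make the ordering rule discussed above concrete, here is a minimal
userspace sketch, not kernel code. Every name in it (fs_model, exec_model,
the model_* helpers) is hypothetical and only mirrors the roles of
fs->in_exec, cred_guard_mutex, bprm->cred, begin_new_exec(), free_bprm()
and copy_fs(). It assumes the success path drops the mutex in one later
step, and it skips the check for whether the fs is shared outside the
thread group. The point it illustrates is that in_exec is cleared on both
exits while the mutex is still held.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; these are not the kernel's structs. */
struct fs_model {
	pthread_mutex_t lock;	/* plays the role of fs->lock */
	int users;
	int in_exec;
};

struct exec_model {
	pthread_mutex_t cred_guard_mutex;	/* serializes concurrent exec */
	struct fs_model *fs;
	bool have_cred;		/* models "bprm->cred is set": mutex is held */
};

/*
 * Store to in_exec. The model takes fs->lock to stay data-race-free in
 * userspace C; the patch in this thread uses a plain store instead.
 */
static void model_set_in_exec(struct fs_model *fs, int val)
{
	pthread_mutex_lock(&fs->lock);
	fs->in_exec = val;
	pthread_mutex_unlock(&fs->lock);
}

/* Start of exec: take the mutex, then mark the fs as being in exec. */
static void model_exec_begin(struct exec_model *e)
{
	pthread_mutex_lock(&e->cred_guard_mutex);
	e->have_cred = true;
	model_set_in_exec(e->fs, 1);
}

/* Success path: clear in_exec after the de_thread() check, mutex still held. */
static int model_begin_new_exec(struct exec_model *e, int de_thread_retval)
{
	if (de_thread_retval)
		return de_thread_retval;
	model_set_in_exec(e->fs, 0);
	return 0;
}

/* Success path, later: creds are committed and the mutex is dropped. */
static void model_exec_commit(struct exec_model *e)
{
	e->have_cred = false;
	pthread_mutex_unlock(&e->cred_guard_mutex);
}

/* Cleanup: creds still pending means exec failed early; clear, then unlock. */
static void model_free_bprm(struct exec_model *e)
{
	if (e->have_cred) {
		model_set_in_exec(e->fs, 0);
		e->have_cred = false;
		pthread_mutex_unlock(&e->cred_guard_mutex);
	}
}

/* Sharing the fs (CLONE_FS style) is refused while an exec is in flight. */
static int model_copy_fs(struct fs_model *fs)
{
	int ret = 0;

	pthread_mutex_lock(&fs->lock);
	if (fs->in_exec)
		ret = -EAGAIN;
	else
		fs->users++;
	pthread_mutex_unlock(&fs->lock);
	return ret;
}

/* Walks both exits; returns 0 if in_exec was clear before each unlock. */
int model_demo(void)
{
	struct fs_model fs = { .users = 1, .in_exec = 0 };
	struct exec_model e = { .fs = &fs, .have_cred = false };

	pthread_mutex_init(&fs.lock, NULL);
	pthread_mutex_init(&e.cred_guard_mutex, NULL);

	/* Failure before the point of no return. */
	model_exec_begin(&e);
	if (model_copy_fs(&fs) != -EAGAIN)
		return 1;
	if (model_begin_new_exec(&e, -EAGAIN) == 0)
		return 2;
	model_free_bprm(&e);
	if (fs.in_exec || e.have_cred)
		return 3;

	/* Successful exec. */
	model_exec_begin(&e);
	if (model_begin_new_exec(&e, 0) != 0 || fs.in_exec)
		return 4;
	model_exec_commit(&e);
	model_free_bprm(&e);	/* no-op: creds were already committed */
	if (model_copy_fs(&fs) != 0 || fs.users != 2)
		return 5;
	return 0;
}
```

Calling model_demo() from a small main() and checking for a zero return
exercises both paths. In this model there is no window in which one exec
has dropped the mutex but has yet to clear in_exec, which is the window
the fix is meant to close.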

> 
> But I guess we need something simple for -stable, so will you agree
> with this fix for now? Apart from changelog/comments.
> 
> 	retval = de_thread(me);
> +	current->fs->in_exec = 0;
> 	if (retval)
> 		goto out;
> 
> is correct but looks confusing. See "V2" below, it clears fs->in_exec
> after the "if (retval)" check.
> 
> syzbot says:
> 
> 	Unfortunately, I don't have any reproducer for this issue yet.
> 
> so I guess "#syz test: " is pointless right now...
> 
> Oleg.
> ---
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 506cd411f4ac..02e8824fc9cd 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1236,6 +1236,7 @@ int begin_new_exec(struct linux_binprm * bprm)
>  	if (retval)
>  		goto out;
>  
> +	current->fs->in_exec = 0;
>  	/*
>  	 * Cancel any io_uring activity across execve
>  	 */
> @@ -1497,6 +1498,8 @@ static void free_bprm(struct linux_binprm *bprm)
>  	}
>  	free_arg_pages(bprm);
>  	if (bprm->cred) {
> +		// for the case exec fails before de_thread()
> +		current->fs->in_exec = 0;
>  		mutex_unlock(&current->signal->cred_guard_mutex);
>  		abort_creds(bprm->cred);
>  	}
> @@ -1862,7 +1865,6 @@ static int bprm_execve(struct linux_binprm *bprm)
>  
>  	sched_mm_cid_after_execve(current);
>  	/* execve succeeded */
> -	current->fs->in_exec = 0;
>  	current->in_execve = 0;
>  	rseq_execve(current);
>  	user_events_execve(current);
> @@ -1881,7 +1883,6 @@ static int bprm_execve(struct linux_binprm *bprm)
>  		force_fatal_sig(SIGSEGV);
>  
>  	sched_mm_cid_after_execve(current);
> -	current->fs->in_exec = 0;
>  	current->in_execve = 0;
>  
>  	return retval;
>