linux-kernel - Re: [PATCH v2] fs: always return zero on success from replace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20250806015449.GA1638962@ZenIV>
Date: Wed, 6 Aug 2025 02:54:49 +0100
From: Al Viro <viro@...iv.linux.org.uk>
To: Christian Brauner <brauner@...nel.org>
Cc: Thomas Weißschuh <thomas.weissschuh@...utronix.de>,
	Jan Kara <jack@...e.cz>, Sargun Dhillon <sargun@...gun.me>,
	Kees Cook <kees@...nel.org>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH v2] fs: always return zero on success from replace_fd()

On Tue, Aug 05, 2025 at 08:50:03PM +0100, Al Viro wrote:
> On Tue, Aug 05, 2025 at 04:34:57PM +0100, Al Viro wrote:
> 
> > They do no allow to express things like "foo() consumes lock X".
> > >From time to time, we *do* need that, and when that happens guards
> > become a menace.
> > 
> > Another case is
> > 	lock
> > 	if (lock-dependent condition)
> > 		some work
> > 		unlock
> > 		work that can't be under that lock
> > 	else
> > 		some other work
> > 		unlock
> > 		more work that can't be under that lock
> > 
> > Fairly common, especially when that's a spinlock and "can't be under that
> > lock" includes blocking operations.  Can't be expressed with guards, not
> > without a massage that often ends up with bloody awful results.
> 
> FWIW, I'm looking through the raw data I've got during ->d_lock audit.
> Except for a few functions (all static in fs/dcache.c), all scopes
> terminate in the same function where they begin.

... and for ->file_lock we have the following:
	expand_fdtable(): drops and regains
	expand_files(): either nothing or drops and regains
	do_dup2(): drops
everything else is neutral.

20 functions touching that stuff total.  Convertible to guard() or
scoped_guard(): put_unused_fd(), fd_install(), close_fd() (scoped_guard
only), __range_cloexec(), file_close_fd(), set_close_on_exec(),
iterate_fd(), fcntl_setlk() and fcntl_setlk64() (scoped_guard only), 
seq_show() - 10 out of 20.

Anything that uses expand_fdtable() is in the best case an abuse of
guard/scoped_guard; in the worst, it's actively confusing, since
there's *not* one continuous scope there.  expand_files() is in
the same boat.  That covers alloc_fd(), replace_fd() and ksys_dup3().
That's 5 out of remaining 10.  BTW, trying to eliminate gotos from
alloc_fd() is not fun either.

dup_fd():
	spin_lock(&oldf->file_lock);
	...
	while (unlikely(open_files > new_fdt->max_fds)) {
		spin_unlock(&oldf->file_lock);
		... (blocking, possibly return on failure here)
		spin_lock(&oldf->file_lock);
		...
	}
	...
	spin_unlock(&oldf->file_lock);
	...
No way to do that with guard() - not unless you mix it with explicit
unlock/lock in the middle of scope, and even that will be bitch to
deal with due to failure exit after allocation failure.  We'd need
to do this:
	spin_unlock(&oldf->file_lock);
	if (new_fdt != &newf->fdtab)
		__free_fdtable(new_fdt);
	new_fdt = alloc_fdtable(open_files);
	spin_lock(&oldf->file_lock);
	if (IS_ERR(new_fdt)) {
		kmem_cache_free(files_cachep, newf);
		return ERR_CAST(new_fdt);
	}
all of that under guard(spinlock)(&oldf->file_lock).  IMO that would
be too confusing and brittle.

__range_close():
	spin_lock(&files->file_lock);
	...
	for (; fd <= max_fd; fd++) { 
		file = file_close_fd_locked(files, fd);
		if (file) {
			spin_unlock(&files->file_lock);
			filp_close(file, files);
			cond_resched();
			spin_lock(&files->file_lock);
		} else if (need_resched()) {
			spin_unlock(&files->file_lock);
			cond_resched();
			spin_lock(&files->file_lock);
		}
	}
	spin_unlock(&files->file_lock);
Not a good fit for guard(), for the same reasons.

do_close_on_exec():
	...
	spin_lock(&files->file_lock);
	for (i = 0; ; i++) {
		....
		for ( ; set ; fd++, set >>= 1) {
			....
			spin_unlock(&files->file_lock);
			filp_close(file, files);
			cond_resched();
			spin_lock(&files->file_lock);
		}
	}
	spin_unlock(&files->file_lock);
Same story.

io_close():
	might be convertible to scoped_guard; won't be pretty,
AFAICS - that -EAGAIN case in the middle makes it very clumsy.

do_dup2(): well... we could lift filp_close() into the callers,
but that ends up with fairly unpleasant boilerplate in the
callers, and one of those callers is a fairly hot syscall.

And that's the remaining 5.  For some locks scoped_guard() is
a decent fit; for some it really isn't ;-/