[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250806015449.GA1638962@ZenIV>
Date: Wed, 6 Aug 2025 02:54:49 +0100
From: Al Viro <viro@...iv.linux.org.uk>
To: Christian Brauner <brauner@...nel.org>
Cc: Thomas Weißschuh <thomas.weissschuh@...utronix.de>,
Jan Kara <jack@...e.cz>, Sargun Dhillon <sargun@...gun.me>,
Kees Cook <kees@...nel.org>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH v2] fs: always return zero on success from replace_fd()
On Tue, Aug 05, 2025 at 08:50:03PM +0100, Al Viro wrote:
> On Tue, Aug 05, 2025 at 04:34:57PM +0100, Al Viro wrote:
>
> > They do no allow to express things like "foo() consumes lock X".
> > >From time to time, we *do* need that, and when that happens guards
> > become a menace.
> >
> > Another case is
> > lock
> > if (lock-dependent condition)
> > some work
> > unlock
> > work that can't be under that lock
> > else
> > some other work
> > unlock
> > more work that can't be under that lock
> >
> > Fairly common, especially when that's a spinlock and "can't be under that
> > lock" includes blocking operations. Can't be expressed with guards, not
> > without a massage that often ends up with bloody awful results.
>
> FWIW, I'm looking through the raw data I've got during ->d_lock audit.
> Except for a few functions (all static in fs/dcache.c), all scopes
> terminate in the same function where they begin.
... and for ->file_lock we have the following:
expand_fdtable(): drops and regains
expand_files(): either nothing or drops and regains
do_dup2(): drops
everything else is neutral.
20 functions touching that stuff total. Convertible to guard() or
scoped_guard(): put_unused_fd(), fd_install(), close_fd() (scoped_guard
only), __range_cloexec(), file_close_fd(), set_close_on_exec(),
iterate_fd(), fcntl_setlk() and fcntl_setlk64() (scoped_guard only),
seq_show() - 10 out of 20.
Anything that uses expand_fdtable() is in the best case an abuse of
guard/scoped_guard; in the worst, it's actively confusing, since
there's *not* one continuous scope there. expand_files() is in
the same boat. That covers alloc_fd(), replace_fd() and ksys_dup3().
That's 5 out of remaining 10. BTW, trying to eliminate gotos from
alloc_fd() is not fun either.
dup_fd():
spin_lock(&oldf->file_lock);
...
while (unlikely(open_files > new_fdt->max_fds)) {
spin_unlock(&oldf->file_lock);
... (blocking, possibly return on failure here)
spin_lock(&oldf->file_lock);
...
}
...
spin_unlock(&oldf->file_lock);
...
No way to do that with guard() - not unless you mix it with explicit
unlock/lock in the middle of scope, and even that will be bitch to
deal with due to failure exit after allocation failure. We'd need
to do this:
spin_unlock(&oldf->file_lock);
if (new_fdt != &newf->fdtab)
__free_fdtable(new_fdt);
new_fdt = alloc_fdtable(open_files);
spin_lock(&oldf->file_lock);
if (IS_ERR(new_fdt)) {
kmem_cache_free(files_cachep, newf);
return ERR_CAST(new_fdt);
}
all of that under guard(spinlock)(&oldf->file_lock). IMO that would
be too confusing and brittle.
__range_close():
spin_lock(&files->file_lock);
...
for (; fd <= max_fd; fd++) {
file = file_close_fd_locked(files, fd);
if (file) {
spin_unlock(&files->file_lock);
filp_close(file, files);
cond_resched();
spin_lock(&files->file_lock);
} else if (need_resched()) {
spin_unlock(&files->file_lock);
cond_resched();
spin_lock(&files->file_lock);
}
}
spin_unlock(&files->file_lock);
Not a good fit for guard(), for the same reasons.
do_close_on_exec():
...
spin_lock(&files->file_lock);
for (i = 0; ; i++) {
....
for ( ; set ; fd++, set >>= 1) {
....
spin_unlock(&files->file_lock);
filp_close(file, files);
cond_resched();
spin_lock(&files->file_lock);
}
}
spin_unlock(&files->file_lock);
Same story.
io_close():
might be convertible to scoped_guard; won't be pretty,
AFAICS - that -EAGAIN case in the middle makes it very clumsy.
do_dup2(): well... we could lift filp_close() into the callers,
but that ends up with fairly unpleasant boilerplate in the
callers, and one of those callers is a fairly hot syscall.
And that's the remaining 5. For some locks scoped_guard() is
a decent fit; for some it really isn't ;-/
Powered by blists - more mailing lists