Message-ID: <20251215-irdisch-aufkochen-d97a7a3ed4a3@brauner>
Date: Mon, 15 Dec 2025 12:55:12 +0100
From: Christian Brauner <brauner@...nel.org>
To: Jan Kara <jack@...e.cz>
Cc: me@...ck-desk.cn, Alexander Viro <viro@...iv.linux.org.uk>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH] vfs: fix EBUSY on FSCONFIG_CMD_CREATE retry
On Mon, Dec 15, 2025 at 09:46:19AM +0100, Jan Kara wrote:
> On Sat 13-12-25 02:03:56, Chen Linxuan via B4 Relay wrote:
> > From: Chen Linxuan <me@...ck-desk.cn>
> >
> > When using fsconfig(..., FSCONFIG_CMD_CREATE, ...), the filesystem
> > context is retrieved from the file descriptor. Since the file structure
> > persists across syscall restarts, the context state is preserved:
> >
> > // fs/fsopen.c
> > SYSCALL_DEFINE5(fsconfig, ...)
> > {
> >     ...
> >     fc = fd_file(f)->private_data;
> >     ...
> >     ret = vfs_fsconfig_locked(fc, cmd, &param);
> >     ...
> > }
> >
> > In vfs_cmd_create(), the context phase is transitioned to
> > FS_CONTEXT_CREATING before calling vfs_get_tree():
> >
> > // fs/fsopen.c
> > static int vfs_cmd_create(struct fs_context *fc, bool exclusive)
> > {
> >     ...
> >     fc->phase = FS_CONTEXT_CREATING;
> >     ...
> >     ret = vfs_get_tree(fc);
> >     ...
> > }
> >
> > However, vfs_get_tree() may return -ERESTARTNOINTR if the filesystem
> > implementation needs to restart the syscall. For example, cgroup v1 does
> > this when it encounters a race condition where the root is dying:
> >
> > // kernel/cgroup/cgroup-v1.c
> > int cgroup1_get_tree(struct fs_context *fc)
> > {
> >     ...
> >     if (unlikely(ret > 0)) {
> >         msleep(10);
> >         return restart_syscall();
> >     }
> >     return ret;
> > }
> >
> > If the syscall is restarted, fsconfig() is called again and retrieves
> > the *same* fs_context. However, vfs_cmd_create() rejects the call
> > because the phase was left as FS_CONTEXT_CREATING during the first
> > attempt:
>
> Well, not quite. The phase is actually set to FS_CONTEXT_FAILED if
> vfs_get_tree() returns any error. Still the effect is the same.
Uh, I'm not sure we should do this. If this only affects cgroup v1, then
I say we should simply not care at all. It's a deprecated API; anyone
using it relies on something that is inherently broken, and a big portion
of userspace has already migrated. The current or upcoming systemd release
has dropped all cgroup v1 support.

Generally, making fsconfig() restartable is not as trivial as it looks,
because once you have called into the filesystem, the config that was set
up might already have been consumed. That's definitely the case for
overlayfs and others. So no, that patch won't work. And btw, I
remember that we already had this discussion a few years ago and I was
right:
https://lore.kernel.org/20200923201958.b27ecda5a1e788fb5f472bcd@virtuozzo.com
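
To make the two points in this thread concrete, here is a minimal
userspace sketch in C. It is a toy model, not kernel code: every toy_*
name, the TOY_ERESTART value, and the single "source" option are
assumptions made up for illustration. It only mirrors the behaviour
described above: a failed create leaves the context in a failed phase so
a retry gets -EBUSY, and simply resetting the phase on restart does not
help if the filesystem has already consumed the configuration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the fs_context phases; not the kernel's enum. */
enum toy_phase {
    TOY_CREATE_PARAMS,   /* options may still be set */
    TOY_CREATING,        /* get_tree is in progress */
    TOY_AWAITING_MOUNT,  /* superblock was created */
    TOY_FAILED,          /* an earlier create attempt failed */
};

struct toy_fs_context {
    enum toy_phase phase;
    const char *source;  /* hypothetical option owned by the context */
};

/* Hypothetical stand-in for the kernel-internal restart error code. */
#define TOY_ERESTART 513

/*
 * Models a filesystem that takes ownership of the configuration and
 * then, if the root is dying, asks for the syscall to be restarted.
 */
int toy_get_tree(struct toy_fs_context *fc, bool root_dying)
{
    assert(fc != NULL);
    if (fc->source == NULL)
        return -EINVAL;          /* config was consumed by an earlier call */
    fc->source = NULL;           /* ownership moves to the filesystem */
    return root_dying ? -TOY_ERESTART : 0;
}

/* Models the behaviour discussed above: any error ends in TOY_FAILED. */
int toy_cmd_create(struct toy_fs_context *fc, bool root_dying)
{
    int ret;

    if (fc->phase != TOY_CREATE_PARAMS)
        return -EBUSY;
    fc->phase = TOY_CREATING;
    ret = toy_get_tree(fc, root_dying);
    if (ret) {
        fc->phase = TOY_FAILED;
        return ret;
    }
    fc->phase = TOY_AWAITING_MOUNT;
    return 0;
}

/* Naive "fix": rewind the phase when a restart is requested. */
int toy_cmd_create_restartable(struct toy_fs_context *fc, bool root_dying)
{
    int ret = toy_cmd_create(fc, root_dying);

    if (ret == -TOY_ERESTART)
        fc->phase = TOY_CREATE_PARAMS;
    return ret;
}
```

In this model, a second toy_cmd_create() after a restart returns -EBUSY,
while the "restartable" variant gets past the phase check on retry and
then fails with -EINVAL because the option is gone. Whether a real
filesystem behaves this way depends on what its get_tree implementation
does with the context before returning.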