Message-ID: <20251215-irdisch-aufkochen-d97a7a3ed4a3@brauner>
Date: Mon, 15 Dec 2025 12:55:12 +0100
From: Christian Brauner <brauner@...nel.org>
To: Jan Kara <jack@...e.cz>
Cc: me@...ck-desk.cn, Alexander Viro <viro@...iv.linux.org.uk>, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH] vfs: fix EBUSY on FSCONFIG_CMD_CREATE retry

On Mon, Dec 15, 2025 at 09:46:19AM +0100, Jan Kara wrote:
> On Sat 13-12-25 02:03:56, Chen Linxuan via B4 Relay wrote:
> > From: Chen Linxuan <me@...ck-desk.cn>
> > 
> > When using fsconfig(..., FSCONFIG_CMD_CREATE, ...), the filesystem
> > context is retrieved from the file descriptor. Since the file structure
> > persists across syscall restarts, the context state is preserved:
> > 
> > 	// fs/fsopen.c
> > 	SYSCALL_DEFINE5(fsconfig, ...)
> > 	{
> > 		...
> > 		fc = fd_file(f)->private_data;
> > 		...
> > 		ret = vfs_fsconfig_locked(fc, cmd, &param);
> > 		...
> > 	}
> > 
> > In vfs_cmd_create(), the context phase is transitioned to
> > FS_CONTEXT_CREATING before calling vfs_get_tree():
> > 
> > 	// fs/fsopen.c
> > 	static int vfs_cmd_create(struct fs_context *fc, bool exclusive)
> > 	{
> > 		...
> > 		fc->phase = FS_CONTEXT_CREATING;
> > 		...
> > 		ret = vfs_get_tree(fc);
> > 		...
> > 	}
> > 
> > However, vfs_get_tree() may return -ERESTARTNOINTR if the filesystem
> > implementation needs to restart the syscall. For example, cgroup v1 does
> > this when it encounters a race condition where the root is dying:
> > 
> > 	// kernel/cgroup/cgroup-v1.c
> > 	int cgroup1_get_tree(struct fs_context *fc)
> > 	{
> > 		...
> > 		if (unlikely(ret > 0)) {
> > 			msleep(10);
> > 			return restart_syscall();
> > 		}
> > 		return ret;
> > 	}
> > 
> > If the syscall is restarted, fsconfig() is called again and retrieves
> > the *same* fs_context. However, vfs_cmd_create() rejects the call
> > because the phase was left as FS_CONTEXT_CREATING during the first
> > attempt:
> 
> Well, not quite. The phase is actually set to FS_CONTEXT_FAILED if
> vfs_get_tree() returns any error. Still the effect is the same.

Uh, I'm not sure we should do this. If this only affects cgroup v1 then
I say we should simply not care at all. It's a deprecated API, anyone
using it is relying on something that is inherently broken, and a big
portion of userspace has already migrated. The current or upcoming
systemd release has dropped all cgroup v1 support.

Generally, making fsconfig() restartable is not as trivial as it looks
because once you have called into the filesystem, the config that was
set up might have already been consumed. That's definitely the case for
overlayfs and others. So no, that patch won't work. And btw, I
remembered that we already had that discussion a few years ago and I was
right:

https://lore.kernel.org/20200923201958.b27ecda5a1e788fb5f472bcd@virtuozzo.com
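
To make the two failure modes in this thread concrete, here is a toy
user-space model, not kernel code. Everything prefixed toy_ or TOY_ is
made up for illustration, including the error value, the "source" field
standing in for consumed config, and the reset_phase_on_restart flag,
which models a hypothetical naive fix and not the actual patch:

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for the fs_context phases named in the thread. */
enum toy_phase {
	TOY_CREATE_PARAMS,
	TOY_CREATING,
	TOY_AWAITING_MOUNT,
	TOY_FAILED,
};

struct toy_fs_context {
	enum toy_phase phase;
	const char *source;	/* assumption: stand-in for config set up earlier */
	int restarts_left;	/* how often the toy get_tree asks for a restart */
};

/* Illustrative value only; stands in for -ERESTARTNOINTR. */
#define TOY_ERESTART 513

/* The "filesystem" consumes the config before it decides to restart. */
static int toy_get_tree(struct toy_fs_context *fc)
{
	if (!fc->source)
		return -EINVAL;		/* config was already consumed */
	fc->source = NULL;		/* filesystem takes ownership */
	if (fc->restarts_left > 0) {
		fc->restarts_left--;
		return -TOY_ERESTART;
	}
	return 0;
}

/* Simplified model of the phase checks around the get_tree call. */
static int toy_cmd_create(struct toy_fs_context *fc, int reset_phase_on_restart)
{
	int ret;

	if (fc->phase != TOY_CREATE_PARAMS)
		return -EBUSY;

	fc->phase = TOY_CREATING;
	ret = toy_get_tree(fc);
	if (ret) {
		/* Hypothetical naive fix: rewind the phase on a restart. */
		if (ret == -TOY_ERESTART && reset_phase_on_restart)
			fc->phase = TOY_CREATE_PARAMS;
		else
			fc->phase = TOY_FAILED;
		return ret;
	}

	fc->phase = TOY_AWAITING_MOUNT;
	return 0;
}
```

Calling toy_cmd_create() twice on the same context with the flag clear
gives the restart error and then -EBUSY, which is the reported symptom.
With the flag set, the second call gets past the phase check but fails
in toy_get_tree() because the config is already gone, which is the
concern raised above about overlayfs and others.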
