linux-kernel - [RFC] {do_,}lock_mount() behaviour wrt races and move_mount(2) with empty to_path (was Re: [PATCH] fs/namespace.c: fix mountpath handling in do_lock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250818201428.GC222315@ZenIV>
Date: Mon, 18 Aug 2025 21:14:28 +0100
From: Al Viro <viro@...iv.linux.org.uk>
To: Ryan Chung <seokwoo.chung130@...il.com>
Cc: brauner@...nel.org, jack@...e.cz, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-kernel-mentees@...ts.linux.dev,
	kernel test robot <lkp@...el.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: [RFC] {do_,}lock_mount() behaviour wrt races and move_mount(2) with
 empty to_path (was Re: [PATCH] fs/namespace.c: fix mountpath handling in
 do_lock_mount())

On Tue, Aug 19, 2025 at 02:22:35AM +0900, Ryan Chung wrote:
> Updates documentation for do_lock_mount() in fs/namespace.c
> to clarify its parameters and return description to fix
> warning reported by syzbot.
> 
> Reported-by: kernel test robot <lkp@...el.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202506301911.uysRaP8b-lkp@intel.com/
> Signed-off-by: Ryan Chung <seokwoo.chung130@...il.com>
> ---
>  fs/namespace.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index ddfd4457d338..577fdff9f1a8 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2741,6 +2741,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>  /**
>   * do_lock_mount - lock mount and mountpoint
>   * @path:    target path
> + * @pinned: on success, holds a pin guarding the mountpoint

I'm not sure if 'pin' is suitable here and in any case, that's not the
only problem in that description - take a look at "Return:" part in there.

The underlying problem is the semantics of function itself.  lock_mount()
assumed that it was called on the result of pathname resolution; the
question is what to do if we race with somebody mounting something
on top of the same location while we had been grabbing namespace_sem?
"Follow through to the root of whatever's been mounted on top, same as
we'd done if pathname resolution happened slightly later" used to be a
reasonable answer, but these days we have move_mount(2), where we have
	* MOVE_MOUNT_T_EMPTY_PATH combined with empty pathname, which
will have us start with whatever the descriptor is pointing to, mounts
or no mounts.  Choosing to treat that as "follow mounts anyway" is not
a big deal.
	* MOVE_MOUNT_BENEATH - treated as "follow mounts and slip the
damn thing under the topmost one".  Again, OK for non-empty pathname,
but... for empty ones the rationale is weaker.

Alternative would be to treat these races as "act as if we'd won and
the other guy had overmounted ours", i.e. *NOT* follow mounts.  Again,
for old syscalls that's fine - if another thread has raced with us and
mounted something on top of the place we want to mount on, it could just
as easily have come *after* we'd completed mount(2) and mounted their
stuff on top of ours.  If userland is not fine with such outcome, it needs
to provide serialization between the callers.  For move_mount(2)... again,
the only real question is empty to_path case.

Comments?

Note, BTW, that attach_recursive_mnt() used to require dest_mnt/dest_mp
to be on the very top; since 6.16 it treats that as "slip it under
whatever's on top of that" - that's exactly what happens in 'beneath'
case.  So the second alternative is easily doable these days.  And
it would really simplify the lock_mount()/do_lock_mount()...