Message-ID: <20250818205606.GD222315@ZenIV>
Date: Mon, 18 Aug 2025 21:56:06 +0100
From: Al Viro <viro@...iv.linux.org.uk>
To: linux-fsdevel@...r.kernel.org
Cc: brauner@...nel.org, jack@...e.cz, linux-kernel@...r.kernel.org,
	linux-kernel-mentees@...ts.linux.dev,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Ryan Chung <seokwoo.chung130@...il.com>
Subject: Re: [RFC] {do_,}lock_mount() behaviour wrt races and move_mount(2)
 with empty to_path (was Re: [PATCH] fs/namespace.c: fix mountpath handling
 in do_lock_mount())

On Mon, Aug 18, 2025 at 09:14:28PM +0100, Al Viro wrote:

> Alternative would be to treat these races as "act as if we'd won and
> the other guy had overmounted ours", i.e. *NOT* follow mounts.  Again,
> for old syscalls that's fine - if another thread has raced with us and
> mounted something on top of the place we want to mount on, it could just
> as easily have come *after* we'd completed mount(2) and mounted their
> stuff on top of ours.  If userland is not fine with such outcome, it needs
> to provide serialization between the callers.  For move_mount(2)... again,
> the only real question is empty to_path case.
> 
> Comments?

Thinking about it a bit more...  Unfortunately, there's another corner
case: "." as mountpoint.  That would affect the old syscalls as well
and I'm not sure that there's no userland code that relies upon the
current behaviour.

Background: pathname resolution does *NOT* follow mounts on the starting
point and it does not follow mounts after ".":

; mkdir /tmp/foo
; mount -t tmpfs none /tmp/foo
; cd /tmp/foo
; echo under > a
; cat /tmp/foo/a
under
; mount -t tmpfs none /tmp/foo
; cat a
under
; cat /tmp/foo/a
cat: /tmp/foo/a: no such file or directory
; echo under > b
; cat b
under
; cat /tmp/foo/b
cat: /tmp/foo/b: no such file or directory
;

It's been a bad decision (if it can be called that - it's been more
of an accident, AFAICT), but it's decades too late to change it.
And interaction with mount is also fun: mount(2) *DOES* follow mounts
on the end of any pathname, no matter what.  So when we are
standing in an overmounted directory, ls . will show the contents of
that directory, but mount <something> . will mount on top of whatever's
mounted there.

So the alternative I've mentioned above would change the behaviour of
old syscalls in a corner case that just might be actually used in userland
code - including the scripts run at boot time, of all things ;-/

IOW, it probably falls under "can't touch that, no matter how much we'd
like to" ;-/  Pity, that...

That leaves the question of MOVE_MOUNT_BENEATH with empty pathname -
do we want a variant that would say "slide precisely under the opened
directory I gave you, no matter what might overmount it"?

At the very least this corner case needs to be documented in move_mount(2)
- behaviour of
	move_mount(_, _, dir_fd, "",
		   MOVE_MOUNT_T_EMPTY_PATH | MOVE_MOUNT_BENEATH)
has two a priori reasonable variants ("slide right under the top of
whatever pile there might be over dir_fd" and "slide right under dir_fd
itself, no matter what pile might be on top of that") and leaving it
unspecified is not good, IMO...
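
A minimal sketch of the call in question, for anyone who wants to
experiment.  Assumptions: the source mount is passed as an fd with an
empty from_path (MOVE_MOUNT_F_EMPTY_PATH), e.g. something obtained from
open_tree(2) or fsmount(2); the wrapper name move_beneath_fd() is made up
here; the fallback defines are only for older headers, and 429 is the
syscall number on x86-64 and the asm-generic architectures.  Which of the
two variants you get is exactly the open question, so the sketch does not
tell you where the mount ends up.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers; values as in the kernel UAPI headers. */
#ifndef SYS_move_mount
#define SYS_move_mount 429	/* x86-64 and asm-generic; an assumption elsewhere */
#endif
#ifndef MOVE_MOUNT_F_EMPTY_PATH
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
#endif
#ifndef MOVE_MOUNT_T_EMPTY_PATH
#define MOVE_MOUNT_T_EMPTY_PATH 0x00000040
#endif
#ifndef MOVE_MOUNT_BENEATH
#define MOVE_MOUNT_BENEATH 0x00000200
#endif

/*
 * Hypothetical wrapper: ask the kernel to move the mount referred to by
 * from_fd beneath the location given by dir_fd, with both pathnames
 * empty.  Returns 0 on success, -1 with errno set on failure.
 */
int move_beneath_fd(int from_fd, int dir_fd)
{
	return (int)syscall(SYS_move_mount, from_fd, "", dir_fd, "",
			    MOVE_MOUNT_F_EMPTY_PATH |
			    MOVE_MOUNT_T_EMPTY_PATH |
			    MOVE_MOUNT_BENEATH);
}
```

The caller needs the usual mount privileges, and dir_fd would typically
come from open(2) with O_PATH | O_DIRECTORY on the directory of interest.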