Message-ID: <aNAKRIcAirFMXWmO@gmail.com>
Date: Sun, 21 Sep 2025 23:23:00 +0900
From: Ryan Chung <seokwoo.chung130@...il.com>
To: Christian Brauner <brauner@...nel.org>
Cc: Al Viro <viro@...iv.linux.org.uk>, linux-fsdevel@...r.kernel.org,
jack@...e.cz, linux-kernel@...r.kernel.org,
linux-kernel-mentees@...ts.linux.dev,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [RFC] {do_,}lock_mount() behaviour wrt races and move_mount(2)
with empty to_path (was Re: [PATCH] fs/namespace.c: fix mountpath handling
in do_lock_mount())

On Tue, Aug 19, 2025 at 11:40:14AM +0200, Christian Brauner wrote:
> On Mon, Aug 18, 2025 at 09:56:06PM +0100, Al Viro wrote:
> > On Mon, Aug 18, 2025 at 09:14:28PM +0100, Al Viro wrote:
> >
> > > Alternative would be to treat these races as "act as if we'd won and
> > > the other guy had overmounted ours", i.e. *NOT* follow mounts. Again,
> > > for old syscalls that's fine - if another thread has raced with us and
> > > mounted something on top of the place we want to mount on, it could just
> > > as easily have come *after* we'd completed mount(2) and mounted their
> > > stuff on top of ours. If userland is not fine with such outcome, it needs
> > > to provide serialization between the callers. For move_mount(2)... again,
> > > the only real question is empty to_path case.
> > >
> > > Comments?
> >
> > Thinking about it a bit more... Unfortunately, there's another corner
> > case: "." as mountpoint. That would affect the old syscalls as well
> > and I'm not sure that there's no userland code that relies upon the
> > current behaviour.
> >
> > Background: pathname resolution does *NOT* follow mounts on the starting
> > point and it does not follow mounts after "."
> >
> > ; mkdir /tmp/foo
> > ; mount -t tmpfs none /tmp/foo
> > ; cd /tmp/foo
> > ; echo under > a
> > ; cat /tmp/foo/a
> > under
> > ; mount -t tmpfs none /tmp/foo
> > ; cat a
> > under
> > ; cat /tmp/foo/a
> > cat: /tmp/foo/a: no such file or directory
> > ; echo under > b
> > ; cat b
> > under
> > ; cat /tmp/foo/b
> > cat: /tmp/foo/b: no such file or directory
> > ;
> >
> > It's been a bad decision (if it can be called that - it's been more
> > of an accident, AFAICT), but it's decades too late to change it.
> > And interaction with mount is also fun: mount(2) *DOES* follow mounts
> > on the end of any pathname, no matter what. So in case when we are
> > standing in an overmounted directory, ls . will show the contents of
> > that directory, but mount <something> . will mount on top of whatever's
> > mounted there.
> >
> > So the alternative I've mentioned above would change the behaviour of
> > old syscalls in a corner case that just might be actually used in userland
> > code - including the scripts run at the boot time, of all things ;-/
> >
> > IOW, it probably falls under "can't touch that, no matter how much we'd
> > like to" ;-/ Pity, that...
> >
> > That leaves the question of MOVE_MOUNT_BENEATH with empty pathname -
> > do we want a variant that would say "slide precisely under the opened
> > directory I gave you, no matter what might overmount it"?
>
> Afaict, right now MOVE_MOUNT_BENEATH will take the overmount into
> account even for "." just like mount(2) will lookup the topmost mount no
> matter what. That is what userspace expects. I don't think we need a
> variant where "." ignores overmounts for MOVE_MOUNT_BENEATH and really
> not unless someone has a specific use-case for it. If it comes to that
> we should probably add a new flag.
>
> >
> > At the very least this corner case needs to be documented in move_mount(2)
> > - behaviour of
> > 	move_mount(_, _, dir_fd, "",
> > 		   MOVE_MOUNT_T_EMPTY_PATH | MOVE_MOUNT_BENEATH)
> > has two a priori reasonable variants ("slide right under the top of
> > whatever pile there might be over dir_fd" and "slide right under dir_fd
>
> Yes, that's what's intended and documented; it's also what I wrote in my
> commit messages and what the selftests should test for. I specifically
> did not make it deviate from standard mount(2) behavior.
>
> > itself, no matter what pile might be on top of that") and leaving it
> > unspecified is not good, IMO...
>
> Sure, Aleksa can pull that into his documentation patches.

Hello all,

I am writing to follow up on this RFC patch. The last reply was about a
month ago, and the conversation seems to have stalled.
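
For anyone catching up on the thread, the call under discussion can be
sketched roughly as below. This is only an illustration and not code from
any patch: move_mount_beneath_fd(), beneath_empty_path_flags() and from_fd
are names made up here, the use of MOVE_MOUNT_F_EMPTY_PATH on the source
side is an assumption (the quoted call leaves the source unspecified), the
flag values are copied from <linux/mount.h> as a fallback, and the syscall
number 429 is assumed to apply only on architectures that use the common
syscall table. As Christian describes above, the current behaviour is that
the mount slides under the top of whatever is stacked over dir_fd, not
under dir_fd itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback definitions; values as in the kernel UAPI <linux/mount.h>. */
#ifndef MOVE_MOUNT_F_EMPTY_PATH
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 /* empty from_path is allowed */
#endif
#ifndef MOVE_MOUNT_T_EMPTY_PATH
#define MOVE_MOUNT_T_EMPTY_PATH 0x00000040 /* empty to_path is allowed */
#endif
#ifndef MOVE_MOUNT_BENEATH
#define MOVE_MOUNT_BENEATH      0x00000200 /* mount beneath the top mount */
#endif

/* Assumption: 429 is move_mount on the common syscall table (x86_64, arm64). */
#ifndef SYS_move_mount
#define SYS_move_mount 429
#endif

/* Hypothetical helper: flags for "empty to_path + beneath". */
unsigned int beneath_empty_path_flags(void)
{
	return MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH |
	       MOVE_MOUNT_BENEATH;
}

/*
 * Hypothetical helper: ask the kernel to place the mount referred to by
 * from_fd beneath the mount stack at the directory referred to by dir_fd.
 * Returns 0 on success, -1 with errno set on failure.
 */
int move_mount_beneath_fd(int from_fd, int dir_fd)
{
	return (int)syscall(SYS_move_mount, from_fd, "", dir_fd, "",
			    beneath_empty_path_flags());
}
```

A caller would typically obtain from_fd from fsmount(2) or
open_tree(2) with OPEN_TREE_CLONE, and dir_fd from open(2) with
O_DIRECTORY or O_PATH; the call needs CAP_SYS_ADMIN in the mount
namespace's user namespace and a kernel that knows MOVE_MOUNT_BENEATH.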

Thank you.

Best regards,
Ryan Chung