[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250530.euz5beesaSha@digikod.net>
Date: Fri, 30 May 2025 14:20:39 +0200
From: Mickaël Salaün <mic@...ikod.net>
To: Song Liu <song@...nel.org>
Cc: Al Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>,
bpf@...r.kernel.org, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-security-module@...r.kernel.org, kernel-team@...a.com, andrii@...nel.org, eddyz87@...il.com,
ast@...nel.org, daniel@...earbox.net, martin.lau@...ux.dev, brauner@...nel.org,
kpsingh@...nel.org, mattbobrowski@...gle.com, amir73il@...il.com, repnop@...gle.com,
jlayton@...nel.org, josef@...icpanda.com, gnoack@...gle.com,
Tingmao Wang <m@...wtm.org>
Subject: Re: [PATCH bpf-next 3/4] bpf: Introduce path iterator
On Thu, May 29, 2025 at 05:42:16PM -0700, Song Liu wrote:
> On Thu, May 29, 2025 at 4:10 PM Al Viro <viro@...iv.linux.org.uk> wrote:
> >
> > On Thu, May 29, 2025 at 03:13:10PM -0700, Song Liu wrote:
> >
> > > Is it an issue if we only hold a reference to a MNT_LOCKED mount for
> > > short period of time? "Short period" means it may get interrupted, page
> > > faults, or wait for an IO (read xattr), but it won't hold a reference to the
> > > mount and sleep indefinitely.
> >
> > MNT_LOCKED mount itself is not a problem. What shouldn't be done is
> > looking around in the mountpoint it covers. It depends upon the things
> > you are going to do with that, but it's very easy to get an infoleak
> > that way.
> >
> > > > OTOH, there's a good cause for moving some of the flags, MNT_LOCKED
> > > > included, out of ->mnt_flags and into a separate field in struct mount.
> > > > However, that would conflict with any code using that to deal with
> > > > your iterator safely.
> > > >
> > > > What's more, AFAICS in case of a stack of mounts each covering the root
> > > > of parent mount, you stop in each of those. The trouble is, umount(2)
> > > > propagation logics assumes that intermediate mounts can be pulled out of
> > > > such stack without causing trouble. For pathname resolution that is
> > > > true; it goes through the entire stack atomically wrt that stuff.
> > > > For your API that's not the case; somebody who has no idea about an
> > > > intermediate mount being there might get caught on it while it's getting
> > > > pulled from the stack.
> > > >
> > > > What exactly do you need around the mountpoint crossing?
> > >
> > > I thought about skipping intermediate mounts (that are hidden by
> > > other mounts). AFAICT, not skipping them will not cause any issue.
> >
> > It can. Suppose e.g. that /mnt gets propagation from another namespace,
> > but not the other way round and you mount something on /mnt.
> >
> > Later, in that another namespace, somebody mounts something on wherever
> > your /mnt gets propagation to. A copy will be propagated _between_
> > your /mnt and whatever you've mounted on top of it; it will be entirely
> > invisible until you umount your /mnt. At that point the propagated
> > copy will show up there, same as if it had appeared just after your
> > umount. Prior to that it's entirely invisible. If its original
> > counterpart in another namespace gets unmounted first, the copy will
> > be quietly pulled out.
>
> Thanks for sharing this information!
>
> > Note that choose_mountpoint_rcu() callers (including choose_mountpoint())
> > will have mount_lock seqcount sampled before the traversal _and_ recheck
> > it after having reached the bottom of stack. IOW, if you traverse ..
> > on the way to root, you won't get caught on the sucker being pulled out.
>
> In some of our internal discussions, we talked about using
> choose_mountpoint() instead of follow_up(). I didn't go that direction in this
> version because it requires holding "root". But if it makes more sense
> to use, choose_mountpoint(), we sure can hold "root".
>
> Alternatively, I think it is also OK to pass a zero'ed root to
> choose_mountpoint().
>
> > Your iterator, OTOH, would stop in that intermediate mount - and get
> > an unpleasant surprise when it comes back to do the next step (towards
> > /mnt on root filesystem, that is) and finds that path->mnt points
> > to something that is detached from everything - no way to get from
> > it any further. That - despite the fact that location you've started
> > from is still mounted, still has the same pathname, etc. and nothing
> > had been disrupted for it.
> >
> > And yes, landlock has a narrow race in the matching place. Needs to
> > be fixed. At least it does ignore those as far as any decisions are
> > concerned...
Thanks for pointing this out. In the case of Landlock, walking to a
disconnected mount point (because of this umount race condition) would
deny the requested access whereas it may be allowed otherwise. This is
not a security issue but still an issue because an event unrelated to
the request (umount) can abort a path resolution, which should not be
the case.
Without access to mount_lock, what would be the best way to fix this
Landlock issue while making it backportable?
>
> If we update path_parent in this patchset with choose_mountpoint(),
> and use it in Landlock, we will close this race condition, right?
choose_mountpoint() is currently private, but if we add a new filesystem
helper, I think the right approach would be to expose follow_dotdot(),
updating its arguments with public types. This way the intermediates
mount points will not be exposed, RCU optimization will be leveraged,
and usage of this new helper will be simplified.
>
> >
> > Note, BTW, that it might be better off by doing that similar to
> > d_path.c - without arseloads of dget_parent/dput et.al.; not sure
> > how feasible it is, but if everything in it can be done under
> > rcu_read_lock(), that's something to look into.
>
> I don't think we can do everything here inside rcu_read_lock().
> But d_path.c does have some code we can probably reuse or
> learn from. Also, we probably need two variations of iterators,
> one walk until absolute root, while the other walk until root of
> current->fs, just like d_path() vs. d_absolute_path(). Does this
> sound reasonable?
Passing the root to a public follow_dotdot() helper should do the job.
>
> Thanks,
> Song
>
Powered by blists - more mailing lists