linux-kernel - Re: [PATCH bpf-next 3/4] bpf: Introduce path iterator

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250602-beunruhigen-sichtweise-9effb0388fdb@brauner>
Date: Mon, 2 Jun 2025 11:30:13 +0200
From: Christian Brauner <brauner@...nel.org>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: Song Liu <song@...nel.org>, Jan Kara <jack@...e.cz>, 
	bpf@...r.kernel.org, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-security-module@...r.kernel.org, kernel-team@...a.com, andrii@...nel.org, eddyz87@...il.com, 
	ast@...nel.org, daniel@...earbox.net, martin.lau@...ux.dev, kpsingh@...nel.org, 
	mattbobrowski@...gle.com, amir73il@...il.com, repnop@...gle.com, jlayton@...nel.org, 
	josef@...icpanda.com, mic@...ikod.net, gnoack@...gle.com
Subject: Re: [PATCH bpf-next 3/4] bpf: Introduce path iterator

On Fri, May 30, 2025 at 12:10:18AM +0100, Al Viro wrote:
> On Thu, May 29, 2025 at 03:13:10PM -0700, Song Liu wrote:
> 
> > Is it an issue if we only hold a reference to a MNT_LOCKED mount for
> > short period of time? "Short period" means it may get interrupted, page
> > faults, or wait for an IO (read xattr), but it won't hold a reference to the
> > mount and sleep indefinitely.
> 
> MNT_LOCKED mount itself is not a problem.  What shouldn't be done is
> looking around in the mountpoint it covers.  It depends upon the things
> you are going to do with that, but it's very easy to get an infoleak
> that way.
> 
> > > OTOH, there's a good cause for moving some of the flags, MNT_LOCKED
> > > included, out of ->mnt_flags and into a separate field in struct mount.
> > > However, that would conflict with any code using that to deal with
> > > your iterator safely.
> > >
> > > What's more, AFAICS in case of a stack of mounts each covering the root
> > > of parent mount, you stop in each of those.  The trouble is, umount(2)
> > > propagation logics assumes that intermediate mounts can be pulled out of
> > > such stack without causing trouble.  For pathname resolution that is
> > > true; it goes through the entire stack atomically wrt that stuff.
> > > For your API that's not the case; somebody who has no idea about an
> > > intermediate mount being there might get caught on it while it's getting
> > > pulled from the stack.
> > >
> > > What exactly do you need around the mountpoint crossing?
> > 
> > I thought about skipping intermediate mounts (that are hidden by
> > other mounts). AFAICT, not skipping them will not cause any issue.
> 
> It can.  Suppose e.g. that /mnt gets propagation from another namespace,
> but not the other way round and you mount something on /mnt.
> 
> Later, in that another namespace, somebody mounts something on wherever
> your /mnt gets propagation to.  A copy will be propagated _between_
> your /mnt and whatever you've mounted on top of it; it will be entirely
> invisible until you umount your /mnt.  At that point the propagated
> copy will show up there, same as if it had appeared just after your
> umount.  Prior to that it's entirely invisible.  If its original
> counterpart in another namespace gets unmounted first, the copy will
> be quietly pulled out.

Fwiw, I have explained these and similar issues at length multiple times.

> 
> Note that choose_mountpoint_rcu() callers (including choose_mountpoint())
> will have mount_lock seqcount sampled before the traversal _and_ recheck
> it after having reached the bottom of stack.  IOW, if you traverse ..
> on the way to root, you won't get caught on the sucker being pulled out.
> 
> Your iterator, OTOH, would stop in that intermediate mount - and get
> an unpleasant surprise when it comes back to do the next step (towards
> /mnt on root filesystem, that is) and finds that path->mnt points
> to something that is detached from everything - no way to get from
> it any further.  That - despite the fact that location you've started
> from is still mounted, still has the same pathname, etc. and nothing
> had been disrupted for it.

Same...