linux-kernel - Re: [PATCH v5 bpf-next 0/5] bpf path iterator

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <175089992300.2280845.10831299451925894203@noble.neil.brown.name>
Date: Thu, 26 Jun 2025 11:05:23 +1000
From: "NeilBrown" <neil@...wn.name>
To: "Tingmao Wang" <m@...wtm.org>
Cc: Mickaël Salaün <mic@...ikod.net>,
 "Song Liu" <song@...nel.org>, bpf@...r.kernel.org,
 linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-security-module@...r.kernel.org, brauner@...nel.org,
 kernel-team@...a.com, andrii@...nel.org, eddyz87@...il.com, ast@...nel.org,
 daniel@...earbox.net, martin.lau@...ux.dev, viro@...iv.linux.org.uk,
 jack@...e.cz, kpsingh@...nel.org, mattbobrowski@...gle.com,
 Günther Noack <gnoack@...gle.com>
Subject: Re: [PATCH v5 bpf-next 0/5] bpf path iterator

On Thu, 26 Jun 2025, Tingmao Wang wrote:
> On 6/26/25 00:04, NeilBrown wrote:
> > On Wed, 25 Jun 2025, Mickaël Salaün wrote:
> >> On Wed, Jun 25, 2025 at 07:38:53AM +1000, NeilBrown wrote:
> >>>
> >>> Can you spell out the minimum that you need?
> >>
> >> Sure.  We'd like to call this new helper in a RCU
> >> read-side critical section and leverage this capability to speed up path
> >> walk when there is no concurrent hierarchy modification.  This use case
> >> is similar to handle_dots() with LOOKUP_RCU calling follow_dotdot_rcu().
> >>
> >> The main issue with this approach is to keep some state of the path walk
> >> to know if the next call to "path_walk_parent_rcu()" would be valid
> >> (i.e. something like a very light version of nameidata, mainly sequence
> >> integers), and to get back to the non-RCU version otherwise.
> >>
> >>>
> >>> My vague impression is that you want to search up from a given strut path,
> >>> no further then some other given path, looking for a dentry that matches
> >>> some rule.  Is that correct?
> >>
> >> Yes
> >>
> >>>
> >>> In general, the original dentry could be moved away from under the
> >>> dentry you find moments after the match is reported.  What mechanisms do
> >>> you have in place to ensure this doesn't happen, or that it doesn't
> >>> matter?
> >>
> >> In the case of Landlock, by default, a set of access rights are denied
> >> and can only be allowed by an element in the file hierarchy.  The goal
> >> is to only allow access to files under a specific directory (or directly
> >> a specific file).  That's why we only care of the file hierarchy at the
> >> time of access check.  It's not an issue if the file/directory was
> >> moved or is being moved as long as we can walk its "current" hierarchy.
> >> Furthermore, a sandboxed process is restricted from doing arbitrary
> >> mounts (and renames/links are controlled with the
> >> LANDLOCK_ACCESS_FS_REFER right).
> >>
> >> However, we need to get a valid "snapshot" of the set of dentries that
> >> (could) lead to the evaluated file/directory.
> > 
> > A "snapshot" is an interesting idea - though looking at the landlock
> > code you one need inodes, not dentries.
> > I imagine an interface where you give it a starting path, a root, and
> > and array of inode pointers, and it fills in the pointers with the path
> > - all under rcu so no references are needed.
> > But you would need some fallback if the array isn't big enough, so maybe
> > that isn't a good idea.
> > 
> > Based on the comments by Al and Christian, I think the only viable
> > approach is to pass a callback to some vfs function that does the
> > walking.
> > 
> >    vfs_walk_ancestors(struct path *path, struct path *root,
> > 		      int (*walk_cb)(struct path *ancestor, void *data),
> > 		      void *data)
> > 
> > where walk_cb() returns a negative number if it wants to abort, and is
> > given a NULL ancestor if vfs_walk_ancestors() needed to restart.
> > 
> > vfs_walk_ancestors() would initialise a "struct nameidata" and
> > effectively call handle_dots(&nd, LAST_DOTDOT) repeatedly, calling
> >     walk_cb(&nd.path, data)
> > each time.
> 
> handle_dots semantically does more than dget_parent + choose_mountpoint
> tho (which is what Landlock currently does, and is also what Song's
> iterator will do).  There is the step_into which will step into
> mountpoints (there is also code to handle symlinks, although I'm not sure
> if that's relevant for following ".."), and it will also return ENOENT if
> the path is disconnected.

Is any of this a problem for you?

> 
> Also I guess we might not need to have an entire nameidata?  In theory it
> only needs to do what follow_dotdot_rcu does without the path_connected
> check.  So it seems like given we have path and root as function argument,
> it would only need nd->{seq,m_seq}.

Those are implementation details internal to namei.c.  Certainly this
function wouldn't use all of the fields in nameidata, but it doesn't
hurt to have a few fields in a struct on the stack which don't get used.
Keeping the code simple and uniform is much more important.  Using
nameidata would help achieve that.

> 
> I might be wrong tho, but certainly the behaviour is different.

Here we get back to the question of "precisely what behaviour do you
need?".
"The same as what a previous patch did" is not a reasonable answer.

If, from userspace, you repeatedly did chdir("..") and then examined the
current directory you would get exactly the sequence of directories
provided by repeatedly calling handle_dots(..., LAST_DOTDOT).  If there
is some circumstance where that would be not acceptable for your use
case, you need to explain (and we need to document) what differences you
need and why use need it.

> 
> > 
> > How would you feel about that sort of interface?
> 
> I can't speak for Mickaël, but a callback-based interface is less flexible
> (and _maybe_ less performant?).  Also, probably we will want to fallback
> to a reference-taking walk if the walk fails (rather than, say, retry
> infinitely), and this should probably use Song's proposed iterator.  I'm
> not sure if Song would be keen to rewrite this iterator patch series in
> callback style (to be clear, it doesn't necessarily seem like a good idea
> to me, and I'm not asking him to), which means that we will end up with
> the reference walk API being a "call this function repeatedly", and the
> rcu walk API taking a callback.  I think it is still workable (after all,
> if Landlock wants to reuse the code in the callback it can just call the
> callback function itself when doing the reference walk), but it seems a
> bit "ugly" to me.

call-back can have a performance impact (less opportunity for compiler
optimisation and CPU speculation), though less than taking spinlock and
references.  However Al and Christian have drawn a hard line against
making seq numbers visible outside VFS code so I think it is the
approach most likely to be accepted.

Certainly vfs_walk_ancestors() would fallback to ref-walk if rcu-walk
resulted in -ECHILD - just like all other path walking code in namei.c.
This would be largely transparent to the caller - the caller would only
see that the callback received a NULL path indicating a restart.  It
wouldn't need to know why.

NeilBrown