[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4577db64-64f2-4102-b00e-2e7921638a7c@maowtm.org>
Date: Thu, 26 Jun 2025 01:07:36 +0100
From: Tingmao Wang <m@...wtm.org>
To: NeilBrown <neil@...wn.name>, Mickaël Salaün
<mic@...ikod.net>
Cc: Song Liu <song@...nel.org>, bpf@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-security-module@...r.kernel.org, brauner@...nel.org,
kernel-team@...a.com, andrii@...nel.org, eddyz87@...il.com, ast@...nel.org,
daniel@...earbox.net, martin.lau@...ux.dev, viro@...iv.linux.org.uk,
jack@...e.cz, kpsingh@...nel.org, mattbobrowski@...gle.com,
Günther Noack <gnoack@...gle.com>
Subject: Re: [PATCH v5 bpf-next 0/5] bpf path iterator
On 6/26/25 00:04, NeilBrown wrote:
> On Wed, 25 Jun 2025, Mickaël Salaün wrote:
>> On Wed, Jun 25, 2025 at 07:38:53AM +1000, NeilBrown wrote:
>>>
>>> Can you spell out the minimum that you need?
>>
>> Sure. We'd like to call this new helper in a RCU
>> read-side critical section and leverage this capability to speed up path
>> walk when there is no concurrent hierarchy modification. This use case
>> is similar to handle_dots() with LOOKUP_RCU calling follow_dotdot_rcu().
>>
>> The main issue with this approach is to keep some state of the path walk
>> to know if the next call to "path_walk_parent_rcu()" would be valid
>> (i.e. something like a very light version of nameidata, mainly sequence
>> integers), and to get back to the non-RCU version otherwise.
>>
>>>
>>> My vague impression is that you want to search up from a given strut path,
>>> no further then some other given path, looking for a dentry that matches
>>> some rule. Is that correct?
>>
>> Yes
>>
>>>
>>> In general, the original dentry could be moved away from under the
>>> dentry you find moments after the match is reported. What mechanisms do
>>> you have in place to ensure this doesn't happen, or that it doesn't
>>> matter?
>>
>> In the case of Landlock, by default, a set of access rights are denied
>> and can only be allowed by an element in the file hierarchy. The goal
>> is to only allow access to files under a specific directory (or directly
>> a specific file). That's why we only care of the file hierarchy at the
>> time of access check. It's not an issue if the file/directory was
>> moved or is being moved as long as we can walk its "current" hierarchy.
>> Furthermore, a sandboxed process is restricted from doing arbitrary
>> mounts (and renames/links are controlled with the
>> LANDLOCK_ACCESS_FS_REFER right).
>>
>> However, we need to get a valid "snapshot" of the set of dentries that
>> (could) lead to the evaluated file/directory.
>
> A "snapshot" is an interesting idea - though looking at the landlock
> code you one need inodes, not dentries.
> I imagine an interface where you give it a starting path, a root, and
> and array of inode pointers, and it fills in the pointers with the path
> - all under rcu so no references are needed.
> But you would need some fallback if the array isn't big enough, so maybe
> that isn't a good idea.
>
> Based on the comments by Al and Christian, I think the only viable
> approach is to pass a callback to some vfs function that does the
> walking.
>
> vfs_walk_ancestors(struct path *path, struct path *root,
> int (*walk_cb)(struct path *ancestor, void *data),
> void *data)
>
> where walk_cb() returns a negative number if it wants to abort, and is
> given a NULL ancestor if vfs_walk_ancestors() needed to restart.
>
> vfs_walk_ancestors() would initialise a "struct nameidata" and
> effectively call handle_dots(&nd, LAST_DOTDOT) repeatedly, calling
> walk_cb(&nd.path, data)
> each time.
handle_dots semantically does more than dget_parent + choose_mountpoint
tho (which is what Landlock currently does, and is also what Song's
iterator will do). There is the step_into which will step into
mountpoints (there is also code to handle symlinks, although I'm not sure
if that's relevant for following ".."), and it will also return ENOENT if
the path is disconnected.
Also I guess we might not need to have an entire nameidata? In theory it
only needs to do what follow_dotdot_rcu does without the path_connected
check. So it seems like given we have path and root as function argument,
it would only need nd->{seq,m_seq}.
I might be wrong tho, but certainly the behaviour is different.
>
> How would you feel about that sort of interface?
I can't speak for Mickaël, but a callback-based interface is less flexible
(and _maybe_ less performant?). Also, probably we will want to fallback
to a reference-taking walk if the walk fails (rather than, say, retry
infinitely), and this should probably use Song's proposed iterator. I'm
not sure if Song would be keen to rewrite this iterator patch series in
callback style (to be clear, it doesn't necessarily seem like a good idea
to me, and I'm not asking him to), which means that we will end up with
the reference walk API being a "call this function repeatedly", and the
rcu walk API taking a callback. I think it is still workable (after all,
if Landlock wants to reuse the code in the callback it can just call the
callback function itself when doing the reference walk), but it seems a
bit "ugly" to me.
But this is just my opinion, and if there is a stronger desire to not
expose any VFS seqcount integers then maybe we will just need to work with
a callback.
Quick note in case anyone reading this has not seen it, a while ago I made
a POC of a non-callback style API for rcu parent walk based on Song's
series:
https://lore.kernel.org/all/dbc7ee0f1f483b7bc2ec9757672a38d99015e9ae.1749402769@maowtm.org/#iZ31fs:namei.c
>
> NeilBrown
>
Powered by blists - more mailing lists