[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250611-bindung-pulver-6158a3053c87@brauner>
Date: Wed, 11 Jun 2025 13:36:46 +0200
From: Christian Brauner <brauner@...nel.org>
To: Tingmao Wang <m@...wtm.org>
Cc: Song Liu <song@...nel.org>,
Mickaël Salaün <mic@...ikod.net>, Al Viro <viro@...iv.linux.org.uk>, amir73il@...il.com,
andrii@...nel.org, ast@...nel.org, bpf@...r.kernel.org, daniel@...earbox.net,
eddyz87@...il.com, gnoack@...gle.com, jack@...e.cz, jlayton@...nel.org,
josef@...icpanda.com, kernel-team@...a.com, kpsingh@...nel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-security-module@...r.kernel.org, martin.lau@...ux.dev, mattbobrowski@...gle.com,
repnop@...gle.com
Subject: Re: [PATCH v3 bpf-next 0/5] bpf path iterator
On Mon, Jun 09, 2025 at 09:08:34AM +0100, Tingmao Wang wrote:
> On 6/9/25 07:23, Song Liu wrote:
> > On Sun, Jun 8, 2025 at 10:34 AM Tingmao Wang <m@...wtm.org> wrote:
> > [...]
> >> Hi Song, Christian, Al and others,
> >>
> >> Previously I proposed in [1] to add ability to do a reference-less parent
> >> walk for Landlock. However, as Christian pointed out and I do agree in
> >> hindsight, it is not a good idea to do things like this in non-VFS code.
> >>
> >> However, I still think this is valuable to consider given the performance
> >> improvement, and after some discussion with Mickaël, I would like to
> >> propose extending Song's helper to support such usage. While I recognize
> >> that this patch series is already in its v3, and I do not want to delay it
> >> by too much, putting this proposal out now is still better than after this
> >> has merged, so that we may consider signature changes.
> >>
> >> I've created a proof-of-concept and did some brief testing. The
> >> performance improvement attained here is the same as in [1] (with a "git
> >> status" workload, median landlock overhead 35% -> 28%, median time in
> >> landlock decreases by 26.6%).
> >>
> >> If this idea is accepted, I'm happy to work on it further, split out this
> >> patch, update the comments and do more testing etc, potentially in
> >> collaboration with Song.
> >>
> >> An alternative to this is perhaps to add a new helper
> >> path_walk_parent_rcu, also living in namei.c, that will be used directly
> >> by Landlock. I'm happy to do it either way, but with some experimentation
> >> I personally think that the code in this patch is still clean enough, and
> >> can avoid some duplication.
> >>
> >> Patch title: path_walk_parent: support reference-less walk
> >>
> >> A later commit will update the BPF path iterator to use this.
> >>
> >> Signed-off-by: Tingmao Wang <m@...wtm.org>
> > [...]
> >>
> >> -bool path_walk_parent(struct path *path, const struct path *root);
> >> +struct parent_iterator {
> >> + struct path path;
> >> + struct path root;
> >> + bool rcu;
> >> + /* expected seq of path->dentry */
> >> + unsigned next_seq;
> >> + unsigned m_seq, r_seq;
> >
> > Most of parent_iterator is not really used by reference walk.
> > So it is probably just separate the two APIs?
>
> I don't mind either way, but I feel like it might be nice to just have one
> style of APIs (i.e. an iterator with start / end / next vs just one
> function), even though this is not totally necessary for the ref-taking
> walk. After all, the BPF use case is iterator-based. This also means
> that the code at the user's side (mostly thinking of Landlock here) is
> slightly simpler.
>
> But I've not experimented with the other way. I'm open to both, and I'm
> happy to send a patch later for a separate API (in that case that would
> not depend on this and I might just start a new series).
>
> Would like to hear what VFS folks thinks of this first tho, and whether
> there's any preference in one or two APIs.
I really dislike exposing the sequence number for mounts an for
dentries. That's just nonsense and a non-VFS low-level consumer of this
API has zero business caring about any of that. It's easy to
misunderstand, it's easy to abuse so that's not a good way of doing
this. It's the wrong API.
>
> >
> > Also, is it ok to make m_seq and r_seq available out of fs/?
No, it's not.
>
> The struct is not intended to be used directly by code outside. Not sure
That doesn't mean anything. It's simply the wrong API if it has to spill
so much of its bowels.
> what is the standard way to do this but we can make it private by e.g.
> putting the seq values in another struct, if needed. Alternatively I
> think we can hide the entire struct behind an opaque pointer by doing the
> allocation ourselves.
>
> >
> >> +};
> >> +
> >> +#define PATH_WALK_PARENT_UPDATED 0
> >> +#define PATH_WALK_PARENT_ALREADY_ROOT -1
> >> +#define PATH_WALK_PARENT_RETRY -2
> >> +
> >> +void path_walk_parent_start(struct parent_iterator *pit,
> >> + const struct path *path, const struct path *root,
> >> + bool ref_less);
> >> +int path_walk_parent(struct parent_iterator *pit, struct path *next_parent);
> >> +int path_walk_parent_end(struct parent_iterator *pit);
> >
> > I think it is better to make this rcu walk a separate set of APIs.
> > IOW, we will have:
> >
> > int path_walk_parent(struct path *path, struct path *root);
> >
> > and
> >
> > void path_walk_parent_rcu_start(struct parent_iterator *pit,
> > const struct path *path, const struct path *root);
> > int path_walk_parent_rcu_next(struct parent_iterator *pit, struct path
> > *next_parent);
> > int path_walk_parent_rcu_end(struct parent_iterator *pit);
>
> (replied above)
Exposing two sets of different APIs for essentially the same things is
not going to happen.
The VFS doesn't expose a rcu variant and a non-rcu variant for itself so
we are absolutely not going to do that for outside stuff.
It always does the try RCU first, then try to continue the walk by
falling back to REF walk (e.g., via try_to_unlazy()). If that doesn't
work then let the caller know and require them to decide whether to
abort or redo everything in ref-walk.
There's zero need in that scheme for the caller to see any of the
internals of the VFS and that's what you should aim for.
Powered by blists - more mailing lists