lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250611.faich0Chohg3@digikod.net>
Date: Wed, 11 Jun 2025 17:39:00 +0200
From: Mickaël Salaün <mic@...ikod.net>
To: Christian Brauner <brauner@...nel.org>
Cc: Tingmao Wang <m@...wtm.org>, Song Liu <song@...nel.org>, 
	Al Viro <viro@...iv.linux.org.uk>, amir73il@...il.com, andrii@...nel.org, ast@...nel.org, 
	bpf@...r.kernel.org, daniel@...earbox.net, eddyz87@...il.com, gnoack@...gle.com, 
	jack@...e.cz, jlayton@...nel.org, josef@...icpanda.com, kernel-team@...a.com, 
	kpsingh@...nel.org, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-security-module@...r.kernel.org, martin.lau@...ux.dev, mattbobrowski@...gle.com, 
	repnop@...gle.com
Subject: Re: [PATCH v3 bpf-next 0/5] bpf path iterator

On Wed, Jun 11, 2025 at 01:36:46PM +0200, Christian Brauner wrote:
> On Mon, Jun 09, 2025 at 09:08:34AM +0100, Tingmao Wang wrote:
> > On 6/9/25 07:23, Song Liu wrote:
> > > On Sun, Jun 8, 2025 at 10:34 AM Tingmao Wang <m@...wtm.org> wrote:
> > > [...]
> > >> Hi Song, Christian, Al and others,
> > >>
> > >> Previously I proposed in [1] to add ability to do a reference-less parent
> > >> walk for Landlock.  However, as Christian pointed out and I do agree in
> > >> hindsight, it is not a good idea to do things like this in non-VFS code.
> > >>
> > >> However, I still think this is valuable to consider given the performance
> > >> improvement, and after some discussion with Mickaël, I would like to
> > >> propose extending Song's helper to support such usage.  While I recognize
> > >> that this patch series is already in its v3, and I do not want to delay it
> > >> by too much, putting this proposal out now is still better than after this
> > >> has merged, so that we may consider signature changes.
> > >>
> > >> I've created a proof-of-concept and did some brief testing.  The
> > >> performance improvement attained here is the same as in [1] (with a "git
> > >> status" workload, median landlock overhead 35% -> 28%, median time in
> > >> landlock decreases by 26.6%).
> > >>
> > >> If this idea is accepted, I'm happy to work on it further, split out this
> > >> patch, update the comments and do more testing etc, potentially in
> > >> collaboration with Song.
> > >>
> > >> An alternative to this is perhaps to add a new helper
> > >> path_walk_parent_rcu, also living in namei.c, that will be used directly
> > >> by Landlock.  I'm happy to do it either way, but with some experimentation
> > >> I personally think that the code in this patch is still clean enough, and
> > >> can avoid some duplication.
> > >>
> > >> Patch title: path_walk_parent: support reference-less walk
> > >>
> > >> A later commit will update the BPF path iterator to use this.
> > >>
> > >> Signed-off-by: Tingmao Wang <m@...wtm.org>
> > > [...]
> > >>
> > >> -bool path_walk_parent(struct path *path, const struct path *root);
> > >> +struct parent_iterator {
> > >> +       struct path path;
> > >> +       struct path root;
> > >> +       bool rcu;
> > >> +       /* expected seq of path->dentry */
> > >> +       unsigned next_seq;
> > >> +       unsigned m_seq, r_seq;
> > > 
> > > Most of parent_iterator is not really used by reference walk.
> > > So it is probably just separate the two APIs?
> > 
> > I don't mind either way, but I feel like it might be nice to just have one
> > style of APIs (i.e. an iterator with start / end / next vs just one
> > function), even though this is not totally necessary for the ref-taking
> > walk.  After all, the BPF use case is iterator-based.  This also means
> > that the code at the user's side (mostly thinking of Landlock here) is
> > slightly simpler.
> > 
> > But I've not experimented with the other way.  I'm open to both, and I'm
> > happy to send a patch later for a separate API (in that case that would
> > not depend on this and I might just start a new series).
> > 
> > Would like to hear what VFS folks thinks of this first tho, and whether
> > there's any preference in one or two APIs.
> 
> I really dislike exposing the sequence number for mounts an for
> dentries. That's just nonsense and a non-VFS low-level consumer of this
> API has zero business caring about any of that. It's easy to
> misunderstand, it's easy to abuse so that's not a good way of doing
> this. It's the wrong API.
> 
> > 
> > > 
> > > Also, is it ok to make m_seq and r_seq available out of fs/?
> 
> No, it's not.
> 
> > 
> > The struct is not intended to be used directly by code outside.  Not sure
> 
> That doesn't mean anything. It's simply the wrong API if it has to spill
> so much of its bowels.
> 
> > what is the standard way to do this but we can make it private by e.g.
> > putting the seq values in another struct, if needed.  Alternatively I
> > think we can hide the entire struct behind an opaque pointer by doing the
> > allocation ourselves.
> > 
> > > 
> > >> +};
> > >> +
> > >> +#define PATH_WALK_PARENT_UPDATED               0
> > >> +#define PATH_WALK_PARENT_ALREADY_ROOT  -1
> > >> +#define PATH_WALK_PARENT_RETRY                 -2
> > >> +
> > >> +void path_walk_parent_start(struct parent_iterator *pit,
> > >> +                           const struct path *path, const struct path *root,
> > >> +                           bool ref_less);
> > >> +int path_walk_parent(struct parent_iterator *pit, struct path *next_parent);
> > >> +int path_walk_parent_end(struct parent_iterator *pit);
> > > 
> > > I think it is better to make this rcu walk a separate set of APIs.
> > > IOW, we will have:
> > > 
> > > int path_walk_parent(struct path *path, struct path *root);
> > > 
> > > and
> > > 
> > > void path_walk_parent_rcu_start(struct parent_iterator *pit,
> > >                            const struct path *path, const struct path *root);
> > > int path_walk_parent_rcu_next(struct parent_iterator *pit, struct path
> > > *next_parent);
> > > int path_walk_parent_rcu_end(struct parent_iterator *pit);
> > 
> > (replied above)
> 
> Exposing two sets of different APIs for essentially the same things is
> not going to happen.
> 
> The VFS doesn't expose a rcu variant and a non-rcu variant for itself so
> we are absolutely not going to do that for outside stuff.
> 
> It always does the try RCU first, then try to continue the walk by
> falling back to REF walk (e.g., via try_to_unlazy()). If that doesn't
> work then let the caller know and require them to decide whether to
> abort or redo everything in ref-walk.

That's indeed what is done by choose_mountpoint() (relying on
choose_mountpoint_rcu() when possible), but this proposal is about doing
a full path walk (i.e. multiple calls to path_walk_parent) within RCU.

> 
> There's zero need in that scheme for the caller to see any of the
> internals of the VFS and that's what you should aim for.

Yes, but how could we detect if a full path walk is invalid (because of
a rename or mount change)?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ