[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <175209985487.2234665.6008354090530669455@noble.neil.brown.name>
Date: Thu, 10 Jul 2025 08:24:14 +1000
From: "NeilBrown" <neil@...wn.name>
To: "Song Liu" <songliubraving@...a.com>
Cc: Mickaël Salaün <mic@...ikod.net>,
"Christian Brauner" <brauner@...nel.org>, "Tingmao Wang" <m@...wtm.org>,
"Song Liu" <song@...nel.org>, "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-security-module@...r.kernel.org"
<linux-security-module@...r.kernel.org>, "Kernel Team" <kernel-team@...a.com>,
"andrii@...nel.org" <andrii@...nel.org>,
"eddyz87@...il.com" <eddyz87@...il.com>, "ast@...nel.org" <ast@...nel.org>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"martin.lau@...ux.dev" <martin.lau@...ux.dev>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
"jack@...e.cz" <jack@...e.cz>, "kpsingh@...nel.org" <kpsingh@...nel.org>,
"mattbobrowski@...gle.com" <mattbobrowski@...gle.com>,
Günther Noack <gnoack@...gle.com>,
"Jann Horn" <jannh@...gle.com>
Subject: Re: [PATCH v5 bpf-next 0/5] bpf path iterator
On Thu, 10 Jul 2025, Song Liu wrote:
>
>
> > On Jul 9, 2025, at 9:06 AM, Mickaël Salaün <mic@...ikod.net> wrote:\
>
> [...]
>
> >> If necessary, we hide “root" inside @data. This is good.
> >>
> >>> @path would be updated with latest ancestor path (e.g. @root).
> >>
> >> Update @path to the last ancestor and hold proper references.
> >> I missed this part earlier. With this feature, vfs_walk_ancestors
> >> should work usable with open-codeed bpf path iterator.
> >>
> >> I have a question about this behavior with RCU walk. IIUC, RCU
> >> walk does not hold reference to @ancestor when calling walk_cb().
> >
> > I think a reference to the mount should be held, but not necessarily to
> > the dentry if we are still in the same mount as the original path.
>
> If we update @path and do path_put() after the walk, we have to hold
> reference to both the mnt and the dentry, no?
>
> >
> >> If walk_cb() returns false, shall vfs_walk_ancestors() then
> >> grab a reference on @ancestor? This feels a bit weird to me.
> >
> > If walk_cb() checks for a root, it will return false when the path will
> > match, and the caller would expect to get this root path, right?
>
> If the user want to walk to the global root, walk_cb() may not
> return false at all, IIUC. walk_cb() may also return false on
> other conditions.
>
> >
> > In general, it's safer to always have the same behavior when holding or
> > releasing a reference. I think the caller should then always call
> > path_put() after vfs_walk_ancestors() whatever the return code is.
> >
> >> Maybe “updating @path to the last ancestor” should only apply to
> >> LOOKUP_RCU==false case?
> >>
> >>> @flags could contain LOOKUP_RCU or not, which enables us to have
> >>> walk_cb() not-RCU compatible.
> >>>
> >>> When passing LOOKUP_RCU, if the first call to vfs_walk_ancestors()
> >>> failed with -ECHILD, the caller can restart the walk by calling
> >>> vfs_walk_ancestors() again but without LOOKUP_RCU.
> >>
> >>
> >> Given we want callers to handle -ECHILD and call vfs_walk_ancestors
> >> again without LOOKUP_RCU, I think we should keep @path not changed
> >> With LOOKUP_RCU==true, and only update it to the last ancestor
> >> when LOOKUP_RCU==false.
> >
> > As Neil said, we don't want to explicitly pass LOOKUP_RCU as a public
> > flag. Instead, walk_cb() should never sleep (and then potentially be
> > called under RCU by the vfs_walk_ancestors() implementation).
>
> How should the user handle -ECHILD without LOOKUP_RCU flag? Say the
> following code in landlocked:
>
> /* Try RCU walk first */
> err = vfs_walk_ancestors(path, ll_cb, data, LOOKUP_RCU);
>
> if (err == -ECHILD) {
> struct path walk_path = *path;
>
> /* reset any data changed by the walk */
> reset_data(data);
>
> /* now do ref-walk */
> err = vfs_walk_ancestors(&walk_path, ll_cb, data, 0);
> }
>
> Or do you mean vfs_walk_ancestors will never return -ECHILD?
> Then we need vfs_walk_ancestors to call reset_data logic, right?
It isn't clear to me that vfs_walk_ancestors() needs to return anything.
All the communication happens through walk_cb()
walk_cb() is called with a path, the data, and a "may_sleep" flag.
If it needs to sleep but may_sleep is not set, it returns "-ECHILD"
which causes the walk to restart and use refcounts.
If it wants to stop, it returns 0.
If it wants to continue, it returns 1.
If it wants a reference to the path then it can use (new)
vfs_legitimize_path() which might fail.
If it wants a reference to the path and may_sleep is true, it can use
path_get() which won't fail.
When returning -ECHILD (either because of a need to sleep or because
vfs_legitimize_path() fails), walk_cb() would reset_data().
NeilBrown
Powered by blists - more mailing lists