[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <B33A07A6-6133-486D-B333-970E1C4C5CA3@meta.com>
Date: Thu, 10 Jul 2025 06:28:03 +0000
From: Song Liu <songliubraving@...a.com>
To: NeilBrown <neil@...wn.name>
CC: Mickaël Salaün <mic@...ikod.net>,
Christian Brauner
<brauner@...nel.org>, Tingmao Wang <m@...wtm.org>,
Song Liu
<song@...nel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-security-module@...r.kernel.org"
<linux-security-module@...r.kernel.org>,
Kernel Team <kernel-team@...a.com>,
"andrii@...nel.org" <andrii@...nel.org>,
"eddyz87@...il.com"
<eddyz87@...il.com>,
"ast@...nel.org" <ast@...nel.org>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"martin.lau@...ux.dev"
<martin.lau@...ux.dev>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
"jack@...e.cz" <jack@...e.cz>,
"kpsingh@...nel.org" <kpsingh@...nel.org>,
"mattbobrowski@...gle.com" <mattbobrowski@...gle.com>,
Günther Noack <gnoack@...gle.com>,
Jann Horn
<jannh@...gle.com>
Subject: Re: [PATCH v5 bpf-next 0/5] bpf path iterator
> On Jul 9, 2025, at 5:58 PM, NeilBrown <neil@...wn.name> wrote:
>
> On Thu, 10 Jul 2025, Song Liu wrote:
>>
>>
>>> On Jul 9, 2025, at 3:24 PM, NeilBrown <neil@...wn.name> wrote:
>> [...]
>>>>
>>>> How should the user handle -ECHILD without LOOKUP_RCU flag? Say the
>>>> following code in landlocked:
>>>>
>>>> /* Try RCU walk first */
>>>> err = vfs_walk_ancestors(path, ll_cb, data, LOOKUP_RCU);
>>>>
>>>> if (err == -ECHILD) {
>>>> struct path walk_path = *path;
>>>>
>>>> /* reset any data changed by the walk */
>>>> reset_data(data);
>>>>
>>>> /* now do ref-walk */
>>>> err = vfs_walk_ancestors(&walk_path, ll_cb, data, 0);
>>>> }
>>>>
>>>> Or do you mean vfs_walk_ancestors will never return -ECHILD?
>>>> Then we need vfs_walk_ancestors to call reset_data logic, right?
>>>
>>> It isn't clear to me that vfs_walk_ancestors() needs to return anything.
>>> All the communication happens through walk_cb()
>>>
>>> walk_cb() is called with a path, the data, and a "may_sleep" flag.
>>> If it needs to sleep but may_sleep is not set, it returns "-ECHILD"
>>> which causes the walk to restart and use refcounts.
>>> If it wants to stop, it returns 0.
>>> If it wants to continue, it returns 1.
>>> If it wants a reference to the path then it can use (new)
>>> vfs_legitimize_path() which might fail.
>>> If it wants a reference to the path and may_sleep is true, it can use
>>> path_get() which won't fail.
>>>
>>> When returning -ECHILD (either because of a need to sleep or because
>>> vfs_legitimize_path() fails), walk_cb() would reset_data().
>>
>> This might actually work.
>>
>> My only concern is with vfs_legitimize_path. It is probably safer if
>> we only allow taking references with may_sleep==true, so that path_get
>> won’t fail. In this case, we will not need walk_cb() to call
>> vfs_legitimize_path. If the user want a reference, the walk_cb will
>> first return -ECHILD, and call path_get when may_sleep is true.
>
> What is your concern with vfs_legitimize_path() ??
>
> I've since realised that always restarting in response to -ECHILD isn't
> necessary and isn't how normal path-walk works. Restarting might be
> needed, but the first response to -ECHILD is to try legitimize_path().
> If that succeeds, then it is safe to sleep.
> So returning -ECHILD might just result in vfs_walk_ancestors() calling
> legitimize_path() and then calling walk_cb() again. Why not have
> walk_cb() do the vfs_legitimize_path() call (which will almost always
> succeed in practice).
After reading the emails and the code more, I think I misunderstood
why we need to call vfs_legitimize_path(). The goal of “legitimize”
is to get a reference on @path, so a reference-less walk may not
need legitimize_path() at all. Do I get this right this time?
However, I still have some concern with legitimize_path: it requires
m_seq and r_seq recorded at the beginning of the walk, do we want
to pass those to walk_cb()? IIUC, one of the reason we prefer a
callback based solution is that it doesn’t expose nameidata (or a
subset of it). Letting walk_cb to call legitimize_path appears to
defeat this benefit, no?
A separate question below.
I still have some question about how vfs_walk_ancestors() and the
walk_cb() interact. Let’s look at the landlock use case: the user
(landlock) just want to look at each ancestor, but doesn’t need to
take any references. walk_cb() will check @path against @root, and
return 0 when @path is the same as @root.
IIUC, in this case, we will record m_seq and r_seq at the beginning
of vfs_walk_ancestors(), and check them against mount_lock and
rename_lock at the end of the walk. (Maybe we also need to check
them at some points before the end of the walk?) If either seq
changed during the walk, we need to restart the walk, and take
reference on each step. Did I get this right so far?
If the above is right, here are my questions about the
reference-less walk above:
1. Which function (vfs_walk_ancestors or walk_cb) will check m_seq
and r_seq? I think vfs_walk_ancestors should check them.
2. When either seq changes, which function will call reset_data?
I think there are 3 options here:
2.a: vfs_walk_ancestors calls reset_data, which will be another
callback function the caller passes to vfs_walk_ancestors.
2.b: walk_cb will call reset_data(), but we need a mechanism to
tell walk_cb to do it, maybe a “restart” flag?
2.c: Caller of vfs_walk_ancestors will call reset_data(). In
this case, vfs_walk_ancestors will return -ECHILD to its
caller. But I think this option is NACKed.
I think the right solution is to have vfs_walk_ancestors check
m_seq and r_seq, and have walk_cb call reset_data. But this is
Different to the proposal above.
Do my questions above make any sense? Or maybe I totally
misunderstood something?
Thanks,
Song
Powered by blists - more mailing lists