lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <A4F6961B-452E-4B0E-B7AC-866B27FA732A@meta.com>
Date: Wed, 9 Jul 2025 22:41:06 +0000
From: Song Liu <songliubraving@...a.com>
To: NeilBrown <neil@...wn.name>
CC: Christian Brauner <brauner@...nel.org>, Tingmao Wang <m@...wtm.org>,
        Mickaël Salaün <mic@...ikod.net>,
        Song Liu
	<song@...nel.org>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-security-module@...r.kernel.org"
	<linux-security-module@...r.kernel.org>,
        Kernel Team <kernel-team@...a.com>,
        "andrii@...nel.org" <andrii@...nel.org>,
        "eddyz87@...il.com"
	<eddyz87@...il.com>,
        "ast@...nel.org" <ast@...nel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        "martin.lau@...ux.dev"
	<martin.lau@...ux.dev>,
        "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
        "jack@...e.cz" <jack@...e.cz>,
        "kpsingh@...nel.org" <kpsingh@...nel.org>,
        "mattbobrowski@...gle.com" <mattbobrowski@...gle.com>,
        Günther Noack <gnoack@...gle.com>
Subject: Re: [PATCH v5 bpf-next 0/5] bpf path iterator



> On Jul 9, 2025, at 3:14 PM, NeilBrown <neil@...wn.name> wrote:
> 
> On Tue, 08 Jul 2025, Song Liu wrote:
>> Hi Christian, 
>> 
>> Thanks for your comments! 
>> 
>>> On Jul 7, 2025, at 4:17 AM, Christian Brauner <brauner@...nel.org> wrote:
>> 
>> [...]
>> 
>>>>> 3/ Extend vfs_walk_ancestors() to pass a "may sleep" flag to the callback.
>>>> 
>>>> I think that's fine.
>>> 
>>> Ok, sorry for the delay but there's a lot of different things going on
>>> right now and this one isn't exactly an easy thing to solve.
>>> 
>>> I mentioned this before and so did Neil: the lookup implementation
>>> supports two modes sleeping and non-sleeping. That api is abstracted
>>> away as heavily as possible by the VFS so that non-core code will not be
>>> exposed to it other than in exceptional circumstances and doesn't have
>>> to care about it.
>>> 
>>> It is a conceptual dead-end to expose these two modes via separate APIs
>>> and leak this implementation detail into non-core code. It will not
>>> happen as far as I'm concerned.
>>> 
>>> I very much understand the urge to get the refcount step-by-step thing
>>> merged asap. Everyone wants their APIs merged fast. And if it's
>>> reasonable to move fast we will (see the kernfs xattr thing).
>>> 
>>> But here are two use-cases that ask for the same thing with different
>>> constraints that closely mirror our unified approach. Merging one
>>> quickly just to have something and then later bolting the other one on
>>> top, augmenting, or replacing, possible having to deprecate the old API
>>> is just objectively nuts. That's how we end up with a spaghetthi helper
>>> collection. We want as little helper fragmentation as possible.
>>> 
>>> We need a unified API that serves both use-cases. I dislike
>>> callback-based APIs generally but we have precedent in the VFS for this
>>> for cases where the internal state handling is delicate enough that it
>>> should not be exposed (see __iterate_supers() which does exactly work
>>> like Neil suggested down to the flag argument itself I added).
>>> 
>>> So I'm open to the callback solution.
>>> 
>>> (Note for really absurd perf requirements you could even make it work
>>> with static calls I'm pretty sure.)
>> 
>> I guess we will go with Mickaël’s idea:
>> 
>>> int vfs_walk_ancestors(struct path *path,
>>>                      bool (*walk_cb)(const struct path *ancestor, void *data),
>>>                      void *data, int flags)
>>> 
>>> The walk continue while walk_cb() returns true.  walk_cb() can then
>>> check if @ancestor is equal to a @root, or other properties.  The
>>> walk_cb() return value (if not bool) should not be returned by
>>> vfs_walk_ancestors() because a walk stop doesn't mean an error.
>> 
>> If necessary, we hide “root" inside @data. This is good. 
>> 
>>> @path would be updated with latest ancestor path (e.g. @root).
>> 
>> Update @path to the last ancestor and hold proper references. 
>> I missed this part earlier. With this feature, vfs_walk_ancestors 
>> should work usable with open-codeed bpf path iterator.
> 
> I don't think path should be updated.  That adds complexity which might
> not be needed.  The original (landlock) requirements were only to look
> at each ancestor, not to take a reference to any of them.

I think this is the ideal case that landlock wants in the long term. 
But we may need to take references when the attempt fails. Also, 
current landlock code takes reference at each step. 

> If the caller needs a reference to any of the ancestors I think that
> walk_cb() needs to take that reference and store it in data.
> Note that attempting to take the reference might fail.  See
> legitimize_path() in fs/namei.c.
> 
> It isn't yet clear to me what would be a good API for requesting the
> reference.
> One option would be for vfs_walk_ancestors() to pass another void* to
> walk_cb(), and it passed it on to vfs_legitimize_path() which extracts
> the seq numbers from there.
> Another might be that the path passed to walk_cb is always
> nameidata.path, and so when that is passed to vfs_legitimize_path() path
> it can use container_of() to find the seq numbers.

Letting walk_cb() call vfs_legitimize_path() seems suboptimal to me. 
I think the original goal is to have vfs_walk_ancestors() to:
  1. Try to walk the ancestors without taking any references;
  2. Detect when the not-taking-reference walk is not reliable;
  3. Maybe, retry the walk from beginning, but takes references on 
     each step. 

With walk_cb() calling vfs_legitimize_path(), we are moving #2 above 
to walk_cb(). I think this is not what we want? 

> If vfs_legitimize_path() fail, walk_cb() might want to ask for the walk
> to be restarted.
> 
>> 
>> I have a question about this behavior with RCU walk. IIUC, RCU 
>> walk does not hold reference to @ancestor when calling walk_cb().
>> If walk_cb() returns false, shall vfs_walk_ancestors() then
>> grab a reference on @ancestor? This feels a bit weird to me. 
>> Maybe “updating @path to the last ancestor” should only apply to
>> LOOKUP_RCU==false case? 
>> 
>>> @flags could contain LOOKUP_RCU or not, which enables us to have
>>> walk_cb() not-RCU compatible.
>>> 
>>> When passing LOOKUP_RCU, if the first call to vfs_walk_ancestors()
>>> failed with -ECHILD, the caller can restart the walk by calling
>>> vfs_walk_ancestors() again but without LOOKUP_RCU.
>> 
>> 
>> Given we want callers to handle -ECHILD and call vfs_walk_ancestors
>> again without LOOKUP_RCU, I think we should keep @path not changed
>> With LOOKUP_RCU==true, and only update it to the last ancestor 
>> when LOOKUP_RCU==false.
> 
> No, we really don't want to pass a LOOKUP_RCU() flag to
> vfs_walk_ancestors().
> vfs_walk_ancestors() might choose to pass that flag to walk_cb().

In this case, we will need vfs_walk_ancestors to handle #3 above. 

Thanks,
Song

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ