lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250724.ij7AhF9quoow@digikod.net>
Date: Thu, 24 Jul 2025 19:35:54 +0200
From: Mickaël Salaün <mic@...ikod.net>
To: Song Liu <songliubraving@...a.com>
Cc: NeilBrown <neil@...wn.name>, Christian Brauner <brauner@...nel.org>, 
	Tingmao Wang <m@...wtm.org>, Song Liu <song@...nel.org>, 
	"bpf@...r.kernel.org" <bpf@...r.kernel.org>, "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, 
	"linux-security-module@...r.kernel.org" <linux-security-module@...r.kernel.org>, Kernel Team <kernel-team@...a.com>, 
	"andrii@...nel.org" <andrii@...nel.org>, "eddyz87@...il.com" <eddyz87@...il.com>, 
	"ast@...nel.org" <ast@...nel.org>, "daniel@...earbox.net" <daniel@...earbox.net>, 
	"martin.lau@...ux.dev" <martin.lau@...ux.dev>, "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>, 
	"jack@...e.cz" <jack@...e.cz>, "kpsingh@...nel.org" <kpsingh@...nel.org>, 
	"mattbobrowski@...gle.com" <mattbobrowski@...gle.com>, Günther Noack <gnoack@...gle.com>, 
	Jann Horn <jannh@...gle.com>
Subject: Re: [PATCH v5 bpf-next 0/5] bpf path iterator

On Mon, Jul 14, 2025 at 09:09:42PM +0000, Song Liu wrote:
> 
> > On Jul 9, 2025, at 11:28 PM, Song Liu <songliubraving@...a.com> wrote:
> 
> [...]
> 
> >>>> It isn't clear to me that vfs_walk_ancestors() needs to return anything.
> >>>> All the communication happens through walk_cb()
> >>>> 
> >>>> walk_cb() is called with a path, the data, and a "may_sleep" flag.
> >>>> If it needs to sleep but may_sleep is not set, it returns "-ECHILD"
> >>>> which causes the walk to restart and use refcounts.
> >>>> If it wants to stop, it returns 0.
> >>>> If it wants to continue, it returns 1.
> >>>> If it wants a reference to the path then it can use (new)
> >>>> vfs_legitimize_path() which might fail.
> >>>> If it wants a reference to the path and may_sleep is true, it can use
> >>>> path_get() which won't fail.
> >>>> 
> >>>> When returning -ECHILD (either because of a need to sleep or because
> >>>> vfs_legitimize_path() fails), walk_cb() would reset_data().
> >>> 
> >>> This might actually work. 
> >>> 
> >>> My only concern is with vfs_legitimize_path. It is probably safer if 
> >>> we only allow taking references with may_sleep==true, so that path_get
> >>> won’t fail. In this case, we will not need walk_cb() to call 
> >>> vfs_legitimize_path. If the user want a reference, the walk_cb will 
> >>> first return -ECHILD, and call path_get when may_sleep is true.
> >> 
> >> What is your concern with vfs_legitimize_path() ??
> >> 
> >> I've since realised that always restarting in response to -ECHILD isn't
> >> necessary and isn't how normal path-walk works.  Restarting might be
> >> needed, but the first response to -ECHILD is to try legitimize_path().
> >> If that succeeds, then it is safe to sleep.
> >> So returning -ECHILD might just result in vfs_walk_ancestors() calling
> >> legitimize_path() and then calling walk_cb() again.  Why not have
> >> walk_cb() do the vfs_legitimize_path() call (which will almost always
> >> succeed in practice).
> > 
> > After reading the emails and the code more, I think I misunderstood 
> > why we need to call vfs_legitimize_path(). The goal of “legitimize” 
> > is to get a reference on @path, so a reference-less walk may not
> > need legitimize_path() at all. Do I get this right this time? 
> > 
> > However, I still have some concern with legitimize_path: it requires
> > m_seq and r_seq recorded at the beginning of the walk, do we want
> > to pass those to walk_cb()? IIUC, one of the reason we prefer a 
> > callback based solution is that it doesn’t expose nameidata (or a
> > subset of it). Letting walk_cb to call legitimize_path appears to 
> > defeat this benefit, no? 

Yes, walk_cb() should be very light and non-blocking/non-sleepable.  If
the caller cannot give these guarantees, then it can just pass NULL
instead of a valid walk_cb(), and continue the walk (if needed) by
calling the vfs_walk_ancentors() helper again, which would not benefit
from the RCU optimization in this case.

Before this patch series land, handling of disconnected directories
should be well defined, or at least let the caller deal with it.  How do
you plan to handle disconnected directories for the eBPF use case?  See
https://lore.kernel.org/all/20250719104204.545188-1-mic@digikod.net/
Unfortunately, this issue is not solved for Landlock yet.

> > 
> > 
> > A separate question below. 
> > 
> > I still have some question about how vfs_walk_ancestors() and the 
> > walk_cb() interact. Let’s look at the landlock use case: the user 
> > (landlock) just want to look at each ancestor, but doesn’t need to 
> > take any references. walk_cb() will check @path against @root, and 
> > return 0 when @path is the same as @root. 
> > 
> > IIUC, in this case, we will record m_seq and r_seq at the beginning
> > of vfs_walk_ancestors(), and check them against mount_lock and 
> > rename_lock at the end of the walk. (Maybe we also need to check 
> > them at some points before the end of the walk?) If either seq
> > changed during the walk, we need to restart the walk, and take
> > reference on each step. Did I get this right so far? 

I think so.  You should get some inspiration from prepend_path().

> > 
> > If the above is right, here are my questions about the 
> > reference-less walk above: 
> > 
> > 1. Which function (vfs_walk_ancestors or walk_cb) will check m_seq 
> >   and r_seq? I think vfs_walk_ancestors should check them. 

Yes, walk_cb() should be as simple as possible: the simpler version
should just return a constant.

> > 2. When either seq changes, which function will call reset_data?
> >   I think there are 3 options here:
> >  2.a: vfs_walk_ancestors calls reset_data, which will be another
> >       callback function the caller passes to vfs_walk_ancestors. 
> >  2.b: walk_cb will call reset_data(), but we need a mechanism to
> >       tell walk_cb to do it, maybe a “restart” flag?
> >  2.c: Caller of vfs_walk_ancestors will call reset_data(). In 
> >       this case, vfs_walk_ancestors will return -ECHILD to its
> >       caller. But I think this option is NACKed. 
> > 
> > I think the right solution is to have vfs_walk_ancestors check
> > m_seq and r_seq, and have walk_cb call reset_data. But this is
> > Different to the proposal above. 

I'm not sure a reset_data() would be useful if walk_cb() never sleep.

If we really need such reset_data(), a fourth option would be for
walk_cb() to return a specific value (an enum instead of a bool) to
trigger the reset.

> > 
> > Do my questions above make any sense? Or maybe I totally 
> > misunderstood something?
> 
> Hi Neil, 
> 
> Did my questions/comments above make sense? I am hoping we can 
> agree on some design soon. 
> 
> Christian and Mickaël, 
> 
> Could you please also share your thoughts on this?
> 
> Current requirements from BPF side is straightforward: we just
> need a mechanism to “walk up one level and hold reference”. So
> most of the requirement comes from LandLock side. 

Have you thought about how to handle disconnected directories?

> 
> Thanks,
> Song
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ