Message-id: <175093334910.2280845.2994364473463803565@noble.neil.brown.name>
Date: Thu, 26 Jun 2025 20:22:29 +1000
From: "NeilBrown" <neil@...wn.name>
To: "Song Liu" <songliubraving@...a.com>
Cc: "Tingmao Wang" <m@...wtm.org>,
 Mickaël Salaün <mic@...ikod.net>,
 "Song Liu" <song@...nel.org>, "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
 "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "linux-security-module@...r.kernel.org"
 <linux-security-module@...r.kernel.org>,
 "brauner@...nel.org" <brauner@...nel.org>,
 "Kernel Team" <kernel-team@...a.com>, "andrii@...nel.org" <andrii@...nel.org>,
 "eddyz87@...il.com" <eddyz87@...il.com>, "ast@...nel.org" <ast@...nel.org>,
 "daniel@...earbox.net" <daniel@...earbox.net>,
 "martin.lau@...ux.dev" <martin.lau@...ux.dev>,
 "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
 "jack@...e.cz" <jack@...e.cz>, "kpsingh@...nel.org" <kpsingh@...nel.org>,
 "mattbobrowski@...gle.com" <mattbobrowski@...gle.com>,
 Günther Noack <gnoack@...gle.com>
Subject: Re: [PATCH v5 bpf-next 0/5] bpf path iterator

On Thu, 26 Jun 2025, Song Liu wrote:
> 
> 
> > On Jun 25, 2025, at 6:05 PM, NeilBrown <neil@...wn.name> wrote:
> 
> [...]
> 
> >> 
> >> I can't speak for Mickaël, but a callback-based interface is less flexible
> >> (and _maybe_ less performant?).  Also, probably we will want to fallback
> >> to a reference-taking walk if the walk fails (rather than, say, retry
> >> infinitely), and this should probably use Song's proposed iterator.  I'm
> >> not sure if Song would be keen to rewrite this iterator patch series in
> >> callback style (to be clear, it doesn't necessarily seem like a good idea
> >> to me, and I'm not asking him to), which means that we will end up with
> >> the reference walk API being a "call this function repeatedly", and the
> >> rcu walk API taking a callback.  I think it is still workable (after all,
> >> if Landlock wants to reuse the code in the callback it can just call the
> >> callback function itself when doing the reference walk), but it seems a
> >> bit "ugly" to me.
> > 
> > call-back can have a performance impact (less opportunity for compiler
> > optimisation and CPU speculation), though less than taking spinlock and
> > references.  However Al and Christian have drawn a hard line against
> > making seq numbers visible outside VFS code so I think it is the
> > approach most likely to be accepted.
> > 
> > Certainly vfs_walk_ancestors() would fallback to ref-walk if rcu-walk
> > resulted in -ECHILD - just like all other path walking code in namei.c.
> > This would be largely transparent to the caller - the caller would only
> > see that the callback received a NULL path indicating a restart.  It
> > wouldn't need to know why.
> 
> I guess I misunderstood the proposal of vfs_walk_ancestors() 
> initially, so some clarification:
> 
> I think vfs_walk_ancestors() is good for the rcu-walk, and some 
> rcu-then-ref-walk. However, I don’t think it fits all use cases. 
> A reliable step-by-step ref-walk, like this set, works well with 
> BPF, and we want to keep it. 

The distinction between rcu-walk and ref-walk is an internal
implementation detail.  You as a caller shouldn't need to think about
the difference.  You just want to walk.  Note that LOOKUP_RCU is
documented in namei.h as "semi-internal".  The only uses outside of
core-VFS code are in individual filesystems' d_revalidate handlers - they
are checking if they are allowed to sleep or not.  You should never
expect to pass LOOKUP_RCU to a VFS API - no other code does.

It might be reasonable for you as a caller to have some control over
whether the call can sleep or not.  LOOKUP_CACHED is a bit like that.
But for dotdot lookup the code will never sleep - so that is not
relevant.

I strongly suggest you stop thinking about rcu-walk vs ref-walk.  Think
about the needs of your code.  If you need a high-performance API, then
ask for a high-performance API, don't assume what form it will take or
what the internal implementation details will be.

I think you already have a clear answer that a step-by-step API will not
be read-only on the dcache (i.e.  it will adjust refcounts) and so will
not be high performance.  If you want high performance, you need to
accept a different style of API.
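
As a rough illustration only, here is a userspace toy (not kernel code)
contrasting the two API styles.  Every name in it (toy_node,
toy_walk_ancestors, toy_step_to_parent, toy_count_cb) is hypothetical and
is not the proposed vfs_walk_ancestors() or the iterator from this
series.  It only shows why a step-by-step walk has to write to every node
it passes through, while a callback walk can stay read-only.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a dentry: a parent link plus a reference count. */
struct toy_node {
	struct toy_node *parent;	/* NULL at the root */
	int refcount;
	unsigned long ref_ops;		/* refcount writes that hit this node */
};

/* Callback style: the walker owns the loop; a nonzero return stops it. */
typedef int (*toy_walk_cb)(const struct toy_node *node, void *data);

/* Visits each ancestor of @start without writing to any node. */
static int toy_walk_ancestors(const struct toy_node *start,
			      toy_walk_cb cb, void *data)
{
	const struct toy_node *n;

	for (n = start->parent; n; n = n->parent) {
		int ret = cb(n, data);

		if (ret)
			return ret;
	}
	return 0;
}

/*
 * Step-by-step style: the caller holds a reference on @node between
 * calls, so each step must take a reference on the parent and drop the
 * one on the child.  Returns the parent, or NULL once past the root.
 */
static struct toy_node *toy_step_to_parent(struct toy_node *node)
{
	struct toy_node *parent = node->parent;

	assert(node->refcount > 0);
	if (parent) {
		parent->refcount++;
		parent->ref_ops++;
	}
	node->refcount--;
	node->ref_ops++;
	return parent;
}

/* Example callback: counts the ancestors visited. */
static int toy_count_cb(const struct toy_node *node, void *data)
{
	(void)node;
	++*(int *)data;
	return 0;
}
```

In this toy, the callback walk leaves every ref_ops counter at zero,
while the step-by-step walk performs a take and a drop on each node it
passes.  In the real dcache those are writes to shared cache lines.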

NeilBrown
