linux-kernel - Re: [PATCH bpf-next 3/4] bpf: Introduce path iterator

Message-ID: <CAPhsuW7LFP0ddFg_oqkDyO9s7DZX89GFQBOnX=4n5mV=VCP5oA@mail.gmail.com>
Date: Thu, 29 May 2025 12:46:00 -0700
From: Song Liu <song@...nel.org>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: Jan Kara <jack@...e.cz>, bpf@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-security-module@...r.kernel.org, 
	kernel-team@...a.com, andrii@...nel.org, eddyz87@...il.com, ast@...nel.org, 
	daniel@...earbox.net, martin.lau@...ux.dev, brauner@...nel.org, 
	kpsingh@...nel.org, mattbobrowski@...gle.com, amir73il@...il.com, 
	repnop@...gle.com, jlayton@...nel.org, josef@...icpanda.com, mic@...ikod.net, 
	gnoack@...gle.com
Subject: Re: [PATCH bpf-next 3/4] bpf: Introduce path iterator

On Thu, May 29, 2025 at 11:35 AM Al Viro <viro@...iv.linux.org.uk> wrote:
>
> On Thu, May 29, 2025 at 11:00:51AM -0700, Song Liu wrote:
> > On Thu, May 29, 2025 at 10:38 AM Al Viro <viro@...iv.linux.org.uk> wrote:
> > >
> > > On Thu, May 29, 2025 at 09:53:21AM -0700, Song Liu wrote:
> > >
> > > > Current version of path iterator only supports walking towards the root,
> > > > with helper path_parent. But the path iterator API can be extended
> > > > to cover other use cases.
> > >
> > > Clarify the last part, please - call me paranoid, but that sounds like
> > > a beginning of something that really should be discussed upfront.
> >
> > We don't have any plan with future use cases yet. The only example
> > I mentioned in the original version of the commit log is "walk the
> > mount tree". IOW, it is similar to the current iterator, but skips non
> > mount point iterations.
> >
> > Since we call it "path iterator", it might make sense to add ways to
> > iterate the VFS tree in different patterns. For example, we may
> > have an iterator that iterates all files within a directory. Again, we
> > don't see urgent use cases other than the current "walk to root"
> > iterator.
>
> What kinds of locking environments can that end up used in?

This will start with a referenced "struct path", in a sleepable context.

> The reason why I'm getting more and more unhappy with this thing is
> that it sounds like a massive headache for any correctness analysis in
> VFS work.
>
> Going straight to the root starting at a point you already have pinned
> is relatively mild - you can't do path_put() in any blocking contexts,
> obviously, and you'd better be careful with what you are doing on
> mountpoint traversal (e.g. combined with "now let's open that directory
> and read it" it's an instant "hell, no" - you could easily bypass MNT_LOCKED
> restrictions that way), but if there's a threat of that getting augmented
> with other things (iterating through all files in directory would be
> a very different beast from the locking POV, if nothing else)... ouch.

We are fully aware that a "files in the directory" iterator may need
different locking. This is the exact reason we want to provide this
logic as an iterator in the kernel: to get locking/etc correct in the
first place, so that the users can avoid making mistakes.

> Basically, you are creating a spot we will need to watch very carefully
> from now on.  And the rationale appears to include "so that we could
> expose that to random out-of-tree code that decided to call itself LSM",
> so pardon me for being rather suspicious about the details.

No matter what we call them, these use cases exist, out-of-tree or
in-tree, as BPF programs or kernel modules. We are learning from
Landlock here, simply because it is probably the best way to achieve
this.

This particular patch set introduces a safer API than ad hoc
combinations of existing APIs (follow_up(), dget_parent(), etc.).
It guarantees that all memory accesses are to properly referenced
kernel objects; it also guarantees that all acquired references are
released. Therefore, I don't see how it adds risk.
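
To make the two guarantees concrete, here is a rough userspace sketch.
It is not the code in this patch and it models none of the VFS locking
or mount traversal; struct node, its refcount field, and the
parent_iter_*() helpers are made-up names for illustration only. The
point is just the shape: the iterator always holds exactly one
reference, takes the parent's reference before dropping the child's,
and releases whatever it still holds in destroy, even on early exit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted kernel object (not struct path). */
struct node {
	struct node *parent;	/* NULL at the root */
	int refcount;
};

/* Hypothetical iterator: holds one reference on cur while cur != NULL. */
struct parent_iter {
	struct node *cur;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n) {
		assert(n->refcount > 0);	/* catch unbalanced puts */
		n->refcount--;
	}
}

static void parent_iter_new(struct parent_iter *it, struct node *start)
{
	/* The caller keeps its own reference; the iterator takes another. */
	it->cur = node_get(start);
}

/*
 * Step one level towards the root. The parent is referenced before the
 * child is released, so the iterator never touches an unreferenced node.
 * Returns NULL once the walk has stepped past the root.
 */
static struct node *parent_iter_next(struct parent_iter *it)
{
	struct node *parent;

	if (!it->cur)
		return NULL;
	parent = node_get(it->cur->parent);
	node_put(it->cur);
	it->cur = parent;
	return parent;
}

/* Drop whatever reference is still held; safe after an early exit. */
static void parent_iter_destroy(struct parent_iter *it)
{
	node_put(it->cur);
	it->cur = NULL;
}
```

A caller would pair parent_iter_new() with parent_iter_destroy() around
a loop over parent_iter_next(); breaking out of the loop early leaves
every refcount where it started.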

Thanks,
Song