Message-ID: <CAADnVQ+nzeyitBVgz2a0K=kSMi5HOSVwxBuuHMqSqmT6H+oqnw@mail.gmail.com>
Date: Fri, 15 Nov 2024 12:04:02 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Amir Goldstein <amir73il@...il.com>
Cc: Song Liu <song@...nel.org>, bpf <bpf@...r.kernel.org>,
Linux-Fsdevel <linux-fsdevel@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
LSM List <linux-security-module@...r.kernel.org>, Kernel Team <kernel-team@...a.com>,
Andrii Nakryiko <andrii@...nel.org>, Eddy Z <eddyz87@...il.com>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Martin KaFai Lau <martin.lau@...ux.dev>, Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>, KP Singh <kpsingh@...nel.org>,
Matt Bobrowski <mattbobrowski@...gle.com>, "repnop@...gle.com" <repnop@...gle.com>,
Jeff Layton <jlayton@...nel.org>, Josef Bacik <josef@...icpanda.com>,
Mickaël Salaün <mic@...ikod.net>,
"gnoack@...gle.com" <gnoack@...gle.com>
Subject: Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for
BPF based fanotify fastpath handler
On Thu, Nov 14, 2024 at 11:26 PM Amir Goldstein <amir73il@...il.com> wrote:
>
> On Thu, Nov 14, 2024 at 9:14 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Thu, Nov 14, 2024 at 12:44 AM Song Liu <song@...nel.org> wrote:
> > >
> > > +
> > > + if (bpf_is_subdir(dentry, v->dentry))
> > > + ret = FAN_FP_RET_SEND_TO_USERSPACE;
> > > + else
> > > + ret = FAN_FP_RET_SKIP_EVENT;
> >
> > It seems to me that all these patches and feature additions
> > to fanotify, new kfuncs, etc are done just to do the above
> > filtering by subdir ?
> >
> > If so, just hard code this logic as an extra flag to fanotify ?
> > So it can filter all events by subdir.
> > bpf programmability makes sense when it needs to express
> > user space policy. Here it's just a filter by subdir.
> > bpf hammer doesn't look like the right tool for this use case.
>
> Good question.
>
> Speaking as someone who has made several attempts to design
> efficient subtree filtering in fanotify, it is not as easy as it sounds.
>
> I recently implemented a method that could be used for "practical"
> subdir filtering in userspace, though not before Jan questioned whether
> we should go directly to subtree filtering with bpf [1].
>
> This is not the only filter that was proposed for fanotify, where bpf
> filter came as an alternative proposal [2], but subtree filtering is by far
> the most wanted filter.
Interesting :)
Thanks for the links. Long threads back then.
Looks like bpf filtering was proposed 2 and 4 years ago and
Song's current attempt is the first real prototype that actually
implements what was discussed for so long?
Cool. I guess it has legs then.
> The problem with implementing a naive is_subtree() filter in fanotify
> is the unbounded cost to be paid by every user for every fs access
> when M such filters are installed deep in the fs tree.
Agree that the concern is real.
I just don't see how filtering in bpf vs filtering in the kernel
is going to be any different.
is_subdir() is fast, so either kernel or bpf prog calling it
on every file_open doesn't seem like a big deal.
Are you saying that a 'naive is_subdir' would call is_subdir()
M times, once for every such filter?
I guess it can be optimized. When these M filters are installed
the kernel can check whether they're subdirs of each other
and filter a subset of M filters with a single is_subdir() call.
Same logic can be in either bpf prog or in the kernel.
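
To make the optimization above concrete, here is a rough userspace
sketch. It is not kernel or bpf code: struct node, toy_is_subdir(),
prune_nested() and event_matches() are hypothetical stand-ins, and it
assumes all M filters feed one listener, so the only question is
whether an event is under any watched root. Filters nested inside
another filter are dropped once at install time, so the per-event path
walks only the outermost roots.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry: a name and a parent pointer (hypothetical). */
struct node {
    const char *name;
    const struct node *parent;   /* NULL for the root */
};

/*
 * Toy analogue of is_subdir(): true if n is anc or lies below it.
 * The walk costs one step per level of depth of n.
 */
static bool toy_is_subdir(const struct node *n, const struct node *anc)
{
    for (; n; n = n->parent)
        if (n == anc)
            return true;
    return false;
}

/*
 * Copy into out[] only the filter roots that are not covered by another
 * root. out[] must have room for m entries. Returns the number kept.
 * This does O(M^2) walks, but only once, when filters are installed.
 */
static size_t prune_nested(const struct node *const *roots, size_t m,
                           const struct node **out)
{
    size_t kept = 0;

    for (size_t i = 0; i < m; i++) {
        bool covered = false;

        for (size_t j = 0; j < m && !covered; j++) {
            if (j == i)
                continue;
            if (roots[j] == roots[i])
                covered = j < i;   /* duplicate: keep the first copy */
            else
                covered = toy_is_subdir(roots[i], roots[j]);
        }
        if (!covered)
            out[kept++] = roots[i];
    }
    return kept;
}

/* Per-event check: one walk per remaining (outermost) root. */
static bool event_matches(const struct node *target,
                          const struct node *const *roots, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (toy_is_subdir(target, roots[i]))
            return true;
    return false;
}
```

With filters on /a, /a/b and /c, prune_nested() keeps only /a and /c,
and an open of /a/b/d still matches through /a. If the filters belong
to different listeners, the nested ones cannot simply be dropped; the
outer check can then only serve as an early "none of these match"
test.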
> Making this more efficient then becomes a matter of trading off
> memory (inode/path cache size) and performance and depends
> on the size and depth of the watched filesystem.
> This engineering decision *is* the userspace policy that can be
> expressed by a bpf program.
>
> As you may know, Linux is lagging behind Win and MacOS w.r.t
> subtree filtering for fs events.
>
> MacOS/FreeBSD took the userspace approach with fseventsd [3].
> If you Google "fseventsd", you will get results with "High CPU and
> Memory Usage" for as far as the browser can scroll.
Yeah. Fun.
Looking at your 2-year-old attempt, it seems the key part was to
make sure inodes are not pinned for filtering purposes?
How would you call is_subdir() then, if the parent dentry is not
held via dget()? Or is the meaning of 'pinning' different?
> I hope the bpf-aided early subtree filtering technology would be
> able to reduce some of this overhead to facilitate a better engineering
> solution, but that remains to be proven...
Sure. Let's explore this further.
> Thanks,
> Amir.
>
> [1] https://lore.kernel.org/linux-fsdevel/20220228140556.ae5rhgqsyzm5djbp@quack3.lan/
> [2] https://lore.kernel.org/linux-fsdevel/20200828084603.GA7072@quack2.suse.cz/
> [3] https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/FSEvents_ProgGuide/TechnologyOverview/TechnologyOverview.html#//apple_ref/doc/uid/TP40005289-CH3-SW1