linux-kernel - Re: [RFC bpf-next fanotify 5/5] selftests/bpf: Add test for BPF based fanotify fastpath handler

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAOQ4uxjXjjkKMa1xcPyxE5vxh1U5oGZJWtofRCwp-3ikCA6cgg@mail.gmail.com>
Date: Fri, 8 Nov 2024 09:18:56 +0100
From: Amir Goldstein <amir73il@...il.com>
To: Song Liu <song@...nel.org>
Cc: Song Liu <songliubraving@...a.com>, bpf <bpf@...r.kernel.org>, 
	Linux-Fsdevel <linux-fsdevel@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, 
	Kernel Team <kernel-team@...a.com>, "andrii@...nel.org" <andrii@...nel.org>, 
	"eddyz87@...il.com" <eddyz87@...il.com>, "ast@...nel.org" <ast@...nel.org>, 
	"daniel@...earbox.net" <daniel@...earbox.net>, "martin.lau@...ux.dev" <martin.lau@...ux.dev>, 
	"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>, "brauner@...nel.org" <brauner@...nel.org>, 
	"jack@...e.cz" <jack@...e.cz>, "kpsingh@...nel.org" <kpsingh@...nel.org>, 
	"mattbobrowski@...gle.com" <mattbobrowski@...gle.com>, "repnop@...gle.com" <repnop@...gle.com>, 
	"jlayton@...nel.org" <jlayton@...nel.org>, Josef Bacik <josef@...icpanda.com>
Subject: Re: [RFC bpf-next fanotify 5/5] selftests/bpf: Add test for BPF based
 fanotify fastpath handler

On Thu, Nov 7, 2024 at 9:48 PM Song Liu <song@...nel.org> wrote:
>
> On Thu, Nov 7, 2024 at 12:34 PM Amir Goldstein <amir73il@...il.com> wrote:
> >
> > On Thu, Nov 7, 2024 at 8:53 PM Song Liu <songliubraving@...a.com> wrote:
> > >
> > >
> > >
> > > > On Nov 7, 2024, at 3:10 AM, Amir Goldstein <amir73il@...il.com> wrote:
> > > >
> > > > On Wed, Oct 30, 2024 at 12:13 AM Song Liu <song@...nel.org> wrote:
> > > >>
> > > >> This test shows a simplified logic that monitors a subtree. This is
> > > >> simplified as it doesn't handle all the scenarios, such as:
> > > >>
> > > >>  1) moving a subsubtree into/outof the being monitoring subtree;
> > > >
> > > > There is a solution for that (see below)
> > > >
> > > >>  2) mount point inside the being monitored subtree
> > > >
> > > > For that we will need to add the MOUNT/UNMOUNT/MOVE_MOUNT events,
> > > > but those have been requested by userspace anyway.
> > > >
> > > >>
> > > >> Therefore, this is not to show a way to reliably monitor a subtree.
> > > >> Instead, this is to test the functionalities of bpf based fastpath.
> > > >> To really monitor a subtree reliably, we will need more complex logic.
> > > >
> > > > Actually, this example is the foundation of my vision for efficient and race
> > > > free subtree filtering:
> > > >
> > > > 1. The inode map is to be treated as a cache for the is_subdir() query
> > >
> > > Using is_subdir() as the truth and managing the cache in inode map seems
> > > promising to me.
> > >
> > > > 2. Cache entries can also have a "distance from root" (i.e. depth) value
> > > > 3. Each unknown queried path can call is_subdir() and populate the cache
> > > >    entries for all ancestors
> > > > 4. The cache/map size should be limited and when limit is reached,
> > > >    evicting entries by depth priority makes sense
> > > > 5. A rename event for a directory whose inode is in the map and whose
> > > >   new parent is not in the map or has a different value than old parent
> > > >   needs to invalidate the entire map
> > > > 6. fast_path also needs a hook from inode evict to clear cache entries
> > >
> > > The inode map is physically attached to the inode itself. So the evict
> > > event is automatically handled. IOW, an inode's entry in the inode map
> > > is automatically removed when the inode is freed. For the same reason,
> > > we don't need to set a limit in map size and add evicting logic. Of
> > > course, this works based on the assumption that we don't use too much
> > > memory for each inode. I think this assumption is true.
> >
> > Oh no, it is definitely wrong each inode is around 1K and this was the
> > main incentive to implement FAN_MARK_EVICTABLE and
> > FAN_MARK_FILESYSTEK in the first place, because recursive
> > pinning of all inodes in a large tree is not scalable to large trees.
>
> The inode map does not pin the inode in memory. The inode will
> still be evicted as normal. The entry associated with the being
> evicted inode in the map will be freed together as the inoide is freed.
> The overhead I mentioned was the extra few bytes (for the flag, etc.)
> per inode. When an evicted inode is loaded again, we will use
> is_subdir() to recreate the entry in the inode map.
>
> Does this make sense and address the concern?
>

Yes, that sounds good.

Thanks,
Amir.