linux-kernel - Re: [PATCH v3 fanotify 2/2] samples/fanotify: Add a sample fanotify fiter

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241128171001.xamzdpqlumqdqdkl@quack3>
Date: Thu, 28 Nov 2024 18:10:01 +0100
From: Jan Kara <jack@...e.cz>
To: Song Liu <songliubraving@...a.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>,
	Amir Goldstein <amir73il@...il.com>, Song Liu <song@...nel.org>,
	bpf <bpf@...r.kernel.org>,
	Linux-Fsdevel <linux-fsdevel@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	LSM List <linux-security-module@...r.kernel.org>,
	Kernel Team <kernel-team@...a.com>,
	Andrii Nakryiko <andrii@...nel.org>, Eddy Z <eddyz87@...il.com>,
	Alexei Starovoitov <ast@...nel.org>,
	Daniel Borkmann <daniel@...earbox.net>,
	Martin KaFai Lau <martin.lau@...ux.dev>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
	KP Singh <kpsingh@...nel.org>,
	Matt Bobrowski <mattbobrowski@...gle.com>,
	"repnop@...gle.com" <repnop@...gle.com>,
	Jeff Layton <jlayton@...nel.org>,
	Josef Bacik <josef@...icpanda.com>,
	Mickaël Salaün <mic@...ikod.net>,
	"gnoack@...gle.com" <gnoack@...gle.com>
Subject: Re: [PATCH v3 fanotify 2/2] samples/fanotify: Add a sample fanotify
 fiter

On Wed 27-11-24 02:16:09, Song Liu wrote:
> > On Nov 26, 2024, at 4:50 PM, Alexei Starovoitov <alexei.starovoitov@...il.com> wrote:
> > 
> 
> [...]
> 
> >>> +
> >>> +static void sample_filter_free(struct fanotify_filter_hook *filter_hook)
> >>> +{
> >>> +       struct fan_filter_sample_data *data = filter_hook->data;
> >>> +
> >>> +       path_put(&data->subtree_path);
> >>> +       kfree(data);
> >>> +}
> >>> +
> >> 
> >> Hi Song,
> >> 
> >> This example looks fine but it raises a question.
> >> This filter will keep the mount of subtree_path busy until the group is closed
> >> or the filter is detached.
> >> This is probably fine for many services that keep the mount busy anyway.
> >> 
> >> But what if this wasn't the intention?
> >> What if an Anti-malware engine that watches all mounts wanted to use that
> >> for configuring some ignore/block subtree filters?
> >> 
> >> One way would be to use a is_subtree() variant that looks for a
> >> subtree root inode
> >> number and then verifies it with a subtree root fid.
> >> A production subtree filter will need to use a variant of is_subtree()
> >> anyway that
> >> looks for a set of subtree root inodes, because doing a loop of is_subtree() for
> >> multiple paths is a no go.
> >> 
> >> Don't need to change anything in the example, unless other people
> >> think that we do need to set a better example to begin with...
> > 
> > I think we have to treat this patch as a real filter and not as an example
> > to make sure that the whole approach is workable end to end.
> > The point about not holding path/dentry is very valid.
> > The algorithm needs to support that.
> 
> Hmm.. I am not sure whether we cannot hold a refcount. If that is a 
> requirement, the algorithm will be more complex. 

Well, for production use that would certainly be a requirement. Many years
ago dnotify (the first fs notification subsystem) was preventing
filesystems from being unmounted because it required open file and it was a
pain.

> IIUC, fsnotify_mark on a inode does not hold a refcount to inode. 

The connector (head of the mark list) does hold inode reference. But we
have a hook in the unmount path (fsnotify_unmount_inodes()) which drops all
the marks and connectors for the filesystem.

> And when the inode is evicted, the mark is freed. I guess this 
> requires the user space, the AntiVirus scanner for example, to 
> hold a reference to the inode? If this is the case, I think it 
> is OK for the filter, either bpf or kernel module, to hold a 
> reference to the subtree root.

No, fsnotify pins the inodes in memory (which if fine) but releases them
when unmount should happen. Userspace doesn't need to pin anything.

> > It may very well turn out that the logic of handling many filters
> > without a loop and not grabbing a path refcnt is too complex for bpf.
> > Then this subtree filtering would have to stay as a kernel module
> > or extra flag/feature for fanotify.
> 
> Handling multiple subtrees is indeed an issue. Since we rely on 
> the mark in the SB, multiple subtrees under the same SB will share
> that mark. Unless we use some cache, accessing a file will 
> trigger multiple is_subdir() calls. 
> 
> One possible solution is that have a new helper that checks
> is_subdir() for a list of parent subtrees with a single series
> of dentry walk. IOW, something like:
> 
> bool is_subdir_of_any(struct dentry *new_dentry, 
>                       struct list_head *list_of_dentry).
> 
> For BPF, one possible solution is to walk the dentry tree 
> up to the root, under bpf_rcu_read_lock().

I can see two possible issues with this. Firstly, you don't have list_head
in a dentry you could easily use to pass dentries to a function like this.
Probably you'll need an external array with dentry pointers or something
like that.

Second issue is more inherent in the BPF filter approach - if there would
be more notification groups each watching for some subtree (like users
watching their home dirs, apps watching their subtrees with data etc.), then
we'd still end up traversing the directory tree for each such notification
group. That seems suboptimal but I have to think how much we care how we
could possibly avoid that.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR