Message-ID: <8ae11e3e0d9339e6c60556fcd2734a37da3b4a11.camel@kernel.org>
Date: Tue, 19 Nov 2024 09:21:47 -0500
From: Jeff Layton <jlayton@...nel.org>
To: Song Liu <songliubraving@...a.com>, Jan Kara <jack@...e.cz>
Cc: Christian Brauner <brauner@...nel.org>, Song Liu <song@...nel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-security-module@...r.kernel.org"
<linux-security-module@...r.kernel.org>, Kernel Team
<kernel-team@...a.com>, "andrii@...nel.org" <andrii@...nel.org>,
"eddyz87@...il.com" <eddyz87@...il.com>, "ast@...nel.org" <ast@...nel.org>,
"daniel@...earbox.net" <daniel@...earbox.net>, "martin.lau@...ux.dev"
<martin.lau@...ux.dev>, "viro@...iv.linux.org.uk"
<viro@...iv.linux.org.uk>, "kpsingh@...nel.org" <kpsingh@...nel.org>,
"mattbobrowski@...gle.com" <mattbobrowski@...gle.com>,
"amir73il@...il.com" <amir73il@...il.com>, "repnop@...gle.com"
<repnop@...gle.com>, Josef Bacik <josef@...icpanda.com>, "mic@...ikod.net"
<mic@...ikod.net>, "gnoack@...gle.com" <gnoack@...gle.com>
Subject: Re: [PATCH bpf-next 2/4] bpf: Make bpf inode storage available to
tracing program
On Fri, 2024-11-15 at 17:35 +0000, Song Liu wrote:
> Hi Jan,
>
> > On Nov 15, 2024, at 3:19 AM, Jan Kara <jack@...e.cz> wrote:
>
> [...]
>
> > > AFAICT, we need to modify how lsm blob are managed with
> > > CONFIG_BPF_SYSCALL=y && CONFIG_BPF_LSM=n case. The solution, even
> > > if it gets accepted, doesn't really save any memory. Instead of
> > > growing struct inode by 8 bytes, the solution will allocate 8
> > > more bytes to inode->i_security. So the total memory consumption
> > > is the same, but the memory is more fragmented.
> >
> > I guess you've found a better solution for this based on James' suggestion.
> >
> > > Therefore, I think we should really step back and consider adding
> > > the i_bpf_storage to struct inode. While this does increase the
> > > size of struct inode by 8 bytes, it may end up with less overall
> > > memory consumption for the system. This is why.
> > >
> > > When the user cannot use inode local storage, the alternative is
> > > to use hash maps (use inode pointer as key). AFAICT, all hash maps
> > > come with non-trivial overhead, in memory consumption, in access
> > > latency, and in extra code to manage the memory. OTOH, inode local
> > > storage doesn't have these issues, and is usually much more efficient:
> > > - memory is only allocated for inodes with actual data,
> > > - O(1) latency,
> > > - per inode data is freed automatically when the inode is evicted.
> > > Please refer to [1] where Amir mentioned all the work needed to
> > > properly manage a hash map, and I explained why we don't need to
> > > worry about these with inode local storage.
> >
> > Well, but here you are speaking of a situation where bpf inode storage
> > space gets actually used for most inodes. Then I agree i_bpf_storage is the
> > most economic solution. But I'd also expect that for vast majority of
> > systems the bpf inode storage isn't used at all and if it does get used, it
> > is used only for a small fraction of inodes. So we are weighing 8 bytes
> > per inode for all those users that don't need it against more significant
> > memory savings for users that actually do need per inode bpf storage. A
> > factor in this is that a lot of people are running some distribution kernel
> > which generally enables most config options that are at least somewhat
> > useful. So hiding the cost behind CONFIG_FOO doesn't really help such
> > people.
>
> Agreed that an extra pointer will be used if there is no actual users
> of it. However, in longer term, "most users do not use bpf inode
> storage" may not be true. As kernel engineers, we may not always notice
> when user space is using some BPF features. For example, systemd has
> a BPF LSM program "restrict_filesystems" [1]. It is enabled if the
> user has lsm=bpf in kernel args. I personally noticed it as a
> surprise when we enabled lsm=bpf.
>
> > I'm personally not *so* hung up about a pointer in struct inode but I can
> > see why Christian is and I agree adding a pointer there isn't a win for
> > everybody.
>
> I can also understand Christian's motivation. However, I am a bit
> frustrated because similar approach (adding a pointer to the struct)
> worked fine for other popular data structures: task_struct, sock,
> cgroup.
>
There are (usually) a lot more inodes on a host than all of those other
structs combined. Worse, struct inode is often embedded in other
structs, and adding fields can cause alignment problems there.

> > Longer term, I think it may be beneficial to come up with a way to attach
> > private info to the inode in a way that doesn't cost us one pointer per
> > functionality that may possibly attach info to the inode. We already have
> > i_crypt_info, i_verity_info, i_flctx, i_security, etc. It's always a tough
> > call where the space overhead for everybody is worth the runtime &
> > complexity overhead for users using the functionality...
>
> It does seem to be the right long term solution, and I am willing to
> work on it. However, I would really appreciate some positive feedback
> on the idea, so that I have better confidence that my weeks of work
> have a better chance of being worth it.
>
> Thanks,
> Song
>
> [1] https://github.com/systemd/systemd/blob/main/src/core/bpf/restrict_fs/restrict-fs.bpf.c

fsnotify is somewhat similar to file locking in that few inodes on the
machine actually utilize these fields.

For file locking, we allocate and populate the inode->i_flctx field on
an as-needed basis. The kernel then hangs on to that struct until the
inode is freed. We could do something similar here. We have this now:

#ifdef CONFIG_FSNOTIFY
	__u32			i_fsnotify_mask; /* all events this inode cares about */
	/* 32-bit hole reserved for expanding i_fsnotify_mask */
	struct fsnotify_mark_connector __rcu	*i_fsnotify_marks;
#endif

What if you were to turn these fields into a pointer to a new struct:

	struct fsnotify_inode_context {
		struct fsnotify_mark_connector __rcu	*i_fsnotify_marks;
		struct bpf_local_storage __rcu		*i_bpf_storage;
		__u32					i_fsnotify_mask; /* all events this inode cares about */
	};

Then whenever you have to populate any of these fields, you just
allocate one of these structs and set the inode up to point to it.
They're tiny too, so don't bother freeing it until the inode is
deallocated.
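
To make the allocate-on-first-use idea concrete, here is a rough userspace
sketch, loosely modeled on how i_flctx is populated. It is not kernel code:
the toy_* names and the i_fsnotify_ctx field are hypothetical, C11 atomics
and calloc()/free() stand in for whatever the kernel side would really use,
and the __rcu annotations are dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Opaque stand-ins for the kernel types; never dereferenced here. */
struct fsnotify_mark_connector;
struct bpf_local_storage;

/* The struct proposed above, minus the kernel-only __rcu annotations. */
struct fsnotify_inode_context {
	struct fsnotify_mark_connector *i_fsnotify_marks;
	struct bpf_local_storage *i_bpf_storage;
	uint32_t i_fsnotify_mask;
};

/* Toy inode: i_fsnotify_ctx is a hypothetical field name. */
struct toy_inode {
	_Atomic(struct fsnotify_inode_context *) i_fsnotify_ctx;
};

/* Return the inode's context, allocating it on first use. NULL on OOM. */
static struct fsnotify_inode_context *toy_get_ctx(struct toy_inode *inode)
{
	struct fsnotify_inode_context *ctx = atomic_load(&inode->i_fsnotify_ctx);
	struct fsnotify_inode_context *new;

	if (ctx)
		return ctx;

	new = calloc(1, sizeof(*new));
	if (!new)
		return NULL;

	/* Install only if still unset; on failure ctx holds the winner. */
	if (atomic_compare_exchange_strong(&inode->i_fsnotify_ctx, &ctx, new))
		return new;

	free(new);
	return ctx;
}

/* Called once, when the inode itself is torn down. */
static void toy_destroy_inode(struct toy_inode *inode)
{
	free(atomic_exchange(&inode->i_fsnotify_ctx, NULL));
}

/* Self-check: nothing is allocated until first use, then the struct is reused. */
static void toy_selftest(void)
{
	struct toy_inode inode;
	struct fsnotify_inode_context *ctx;

	atomic_init(&inode.i_fsnotify_ctx, NULL);
	assert(atomic_load(&inode.i_fsnotify_ctx) == NULL);

	ctx = toy_get_ctx(&inode);
	assert(ctx != NULL);
	ctx->i_fsnotify_mask |= 0x2;		/* arbitrary bit, illustration only */
	assert(toy_get_ctx(&inode) == ctx);	/* second call reuses it */

	toy_destroy_inode(&inode);
	assert(atomic_load(&inode.i_fsnotify_ctx) == NULL);
}
```

The compare-and-exchange is only there so that two racing first users end
up sharing one context; the loser frees its copy. Since the struct is never
freed before the inode goes away, readers never see the pointer change once
it is set.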

It'd mean rejiggering a fair bit of fsnotify code, but it would give
the fsnotify code an easier way to expand per-inode info in the future.
It would also slightly shrink struct inode.
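
As a quick sanity check on the size claim, a small standalone sketch that
compares the fsnotify footprint inside the inode before and after. The
struct names and the i_fsnotify_ctx field are made up for illustration, and
the kernel's __u32 is replaced by uint32_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Opaque stand-ins for the kernel types. */
struct fsnotify_mark_connector;
struct fsnotify_inode_context;

/* Fields embedded in struct inode today (under CONFIG_FSNOTIFY). */
struct embedded_today {
	uint32_t i_fsnotify_mask;
	/* the compiler pads here when pointers need 8-byte alignment */
	struct fsnotify_mark_connector *i_fsnotify_marks;
};

/* What would remain in struct inode: one pointer (hypothetical name). */
struct embedded_proposed {
	struct fsnotify_inode_context *i_fsnotify_ctx;
};

/* Print both footprints for the platform this is compiled on. */
static void print_fsnotify_footprint(void)
{
	printf("today: %zu bytes, proposed: %zu bytes\n",
	       sizeof(struct embedded_today),
	       sizeof(struct embedded_proposed));
}
```

On a typical LP64 build that should come out as 16 bytes versus 8. The
saving applies to every inode, while the separately allocated context is
only paid for by inodes that actually use fsnotify or bpf storage.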
--
Jeff Layton <jlayton@...nel.org>