[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzakHh3qm2JBsWE8qnMmZMeM7w5vZGneKAsLM_vktPbc9g@mail.gmail.com>
Date: Fri, 31 Mar 2023 11:19:45 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Matthew Wilcox <willy@...radead.org>,
Jiri Olsa <olsajiri@...il.com>,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Hao Luo <haoluo@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, bpf@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-perf-users@...r.kernel.org,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...omium.org>,
Stanislav Fomichev <sdf@...gle.com>,
Daniel Borkmann <daniel@...earbox.net>,
Namhyung Kim <namhyung@...il.com>,
Dave Chinner <david@...morbit.com>
Subject: Re: [PATCHv3 bpf-next 0/9] mm/bpf/perf: Store build id in file object
On Wed, Mar 22, 2023 at 8:45 AM Arnaldo Carvalho de Melo
<acme@...nel.org> wrote:
>
> Em Sat, Mar 18, 2023 at 03:16:45PM +0000, Matthew Wilcox escreveu:
> > On Sat, Mar 18, 2023 at 09:33:49AM +0100, Jiri Olsa wrote:
> > > On Thu, Mar 16, 2023 at 05:34:41PM +0000, Matthew Wilcox wrote:
> > > > On Thu, Mar 16, 2023 at 06:01:40PM +0100, Jiri Olsa wrote:
> > > > > hi,
> > > > > this patchset adds build id object pointer to struct file object.
> > > > >
> > > > > We have several use cases for build id to be used in BPF programs
> > > > > [2][3].
> > > >
> > > > Yes, you have use cases, but you never answered the question I asked:
> > > >
> > > > Is this going to be enabled by every distro kernel, or is it for special
> > > > use-cases where only people doing a very specialised thing who are
> > > > willing to build their own kernels will use it?
> > >
> > > I hope so, but I guess only time tell.. given the response by Ian and Andrii
> > > there are 3 big users already
> >
> > So the whole "There's a config option to turn it off" shtick is just a
> > fig-leaf. I won't ever see it turned off. You're imposing the cost of
> > this on EVERYONE who runs a distro kernel. And almost nobody will see
> > any benefits from it. Thanks for admitting that.
>
> I agree that build-ids are not useful for all 'struct file' uses, just
> for executable files and for people wanting to have better observability
> capabilities.
>
> Having said that, it seems there will be no extra memory overhead at
> least for a fedora:36 x86_64 kernel:
>
> void __init files_init(void)
> {
> filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
> SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL);
> percpu_counter_init(&nr_files, 0, GFP_KERNEL);
> }
>
> [root@...co ~]# pahole file | grep size: -A2
> /* size: 232, cachelines: 4, members: 20 */
> /* sum members: 228, holes: 1, sum holes: 4 */
> /* last cacheline: 40 bytes */
> [acme@...co perf-tools]$ uname -a
> Linux quaco 6.1.11-100.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Feb 9 20:36:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
> [root@...co ~]# head -2 /proc/slabinfo
> slabinfo - version: 2.1
> # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> [root@...co ~]# grep -w filp /proc/slabinfo
> filp 12452 13056 256 32 2 : tunables 0 0 0 : slabdata 408 408 0
> [root@...co ~]#
>
> so there are 24 bytes on the 4th cacheline that are not being used,
> right?
Well, even better then!
>
> One other observation is that maybe we could do it as the 'struct sock'
> hierachy in networking, where we would have a 'struct exec_file' that
> would be:
>
> struct exec_file {
> struct file file;
> char build_id[20];
> }
>
> say, and then when we create the 'struct file' in __alloc_file() we
> could check some bit in 'flags' like Al Viro suggested and pick a
> different slab than 'filp_cachep', that has that extra space for the
> build_id (and whatever else exec related state we may end up wanting, if
> ever).
>
> No core fs will need to know about that except when we go free it, to
> free from the right slab cache.
>
> In current distro configs, no overhead would take place if I read that
> SLAB_HWCACHE_ALIGN thing right, no?
Makes sense to me as well. Whatever the solution, as long as it's
usable from NMI contexts would be fine for the purposes of fetching
build ID. It would be good to hear from folks that are opposing adding
a pointer field to struct file whether they prefer this way instead?
>
> - Arnaldo
Powered by blists - more mailing lists