lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJD7tkYzo_K9uF7GOO3yoKzTSbFWuNTUG3O6w1VrGCQvgWhsoQ@mail.gmail.com>
Date: Mon, 14 Oct 2024 17:15:39 -0700
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Steven Rostedt <rostedt@...dmis.org>, Andrew Morton <akpm@...ux-foundation.org>, 
	Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>, 
	Roman Gushchin <roman.gushchin@...ux.dev>, Muchun Song <muchun.song@...ux.dev>, 
	JP Kobryn <inwardvessel@...il.com>, linux-mm@...ck.org, cgroups@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Meta kernel team <kernel-team@...a.com>, 
	Daniel Xu <dxu@...uu.xyz>, bpf@...r.kernel.org, 
	Martin KaFai Lau <martin.lau@...ux.dev>
Subject: Re: [PATCH] memcg: add tracing for memcg stat updates

On Thu, Oct 10, 2024 at 10:26 AM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
>
> On Wed, Oct 09, 2024 at 06:24:55PM GMT, Yosry Ahmed wrote:
> > On Wed, Oct 9, 2024 at 6:08 PM Steven Rostedt <rostedt@...dmis.org> wrote:
> > >
> > > On Wed, 9 Oct 2024 17:46:22 -0700
> > > Yosry Ahmed <yosryahmed@...gle.com> wrote:
> > >
> > > > > +++ b/mm/memcontrol.c
> > > > > @@ -71,6 +71,10 @@
> > > > >
> > > > >  #include <linux/uaccess.h>
> > > > >
> > > > > +#define CREATE_TRACE_POINTS
> > > > > +#include <trace/events/memcg.h>
> > > > > +#undef CREATE_TRACE_POINTS
> > > > > +
> > > > >  #include <trace/events/vmscan.h>
> > > > >
> > > > >  struct cgroup_subsys memory_cgrp_subsys __read_mostly;
> > > > > @@ -682,7 +686,9 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
> > > > >                 return;
> > > > >
> > > > >         __this_cpu_add(memcg->vmstats_percpu->state[i], val);
> > > > > -       memcg_rstat_updated(memcg, memcg_state_val_in_pages(idx, val));
> > > > > +       val = memcg_state_val_in_pages(idx, val);
> > > > > +       memcg_rstat_updated(memcg, val);
> > > > > +       trace_mod_memcg_state(memcg, idx, val);
> > > >
> > > > Is it too unreasonable to include the stat name?
> > > >
> > > > The index has to be correlated with the kernel config and perhaps even
> > > > version. It's not a big deal, but if performance is not a concern when
> > > > tracing is enabled anyway, maybe we can lookup the name here (or in
> > > > TP_fast_assign()).
> > >
> > > What name? Is it looked up from idx? If so, you can do it on the reading of
>
> Does reading side mean the one reading /sys/kernel/tracing/trace will do
> the translation from enums to string?
>
> > > the trace event where performance is not an issue. See the __print_symbolic()
> > > and friends in samples/trace_events/trace-events-sample.h
> >
> > Yeah they can be found using idx. Thanks for referring us to
> > __print_symbolic(), I suppose for this to work we need to construct an
> > array of {idx, name}. I think we can replace the existing memory_stats
> > and memcg1_stats/memcg1_stat_names arrays with something that we can
> > reuse for tracing, so we wouldn't need to consume extra space.
> >
> > Shakeel, what do you think?
>
> Cc Daniel & Martin
>
> I was planning to use bpftrace which can use dwarf/btf to convert the
> raw int to its enum string. Martin provided the following command to
> extract the translation from the kernel.
>
> $ bpftool btf dump file /sys/kernel/btf/vmlinux | grep -A10 node_stat_item
> [2264] ENUM 'node_stat_item' encoding=UNSIGNED size=4 vlen=46
>         'NR_LRU_BASE' val=0
>         'NR_INACTIVE_ANON' val=0
>         'NR_ACTIVE_ANON' val=1
>         'NR_INACTIVE_FILE' val=2
>         'NR_ACTIVE_FILE' val=3
>         'NR_UNEVICTABLE' val=4
>         'NR_SLAB_RECLAIMABLE_B' val=5
>         'NR_SLAB_UNRECLAIMABLE_B' val=6
>         'NR_ISOLATED_ANON' val=7
>         'NR_ISOLATED_FILE' val=8
> ...
>
> My point is userspace tools can use existing infra to extract this
> information.
>
> However I am not against adding __print_symbolic() (but without any
> duplication), so users reading /sys/kernel/tracing/trace directly can
> see more useful information as well. Please post a follow up patch after
> this one.

I briefly looked into this and I think it would be annoying to have
this, unfortunately. Even if we rework the existing arrays with memcg
stat names to be in a format conforming to tracing, we would need to
move them out to a separate header to avoid a circular dependency.

Additionally, for __count_memcg_events() things will be more
complicated because the names are not in an array in memcontrol.c, but
we use vm_event_name() and the relevant names are part of a larger
array, vmstat_text, which we would need to rework similarly.

I think this would be easier to implement if we can somehow provide a
callback that returns the name based on the index, rather than an
array. But even then, we would need to specify a different callback
for each event, so it won't be as simple as specifying the callback in
the event class.

All in all, unless we realize there is something that is fundamentally
more difficult to do in userspace, I think it's not worth adding this
unfortunately :/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ