netdev - Re: [PATCH v2 net-next 2/2] mm/vmpressure: add tracepoint for socket pressure detection

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAAVpQUDj23KHKpMFA4J7gV=H_BnvG4z0aVxf6-B04KsYtBL=1w@mail.gmail.com>
Date: Tue, 15 Jul 2025 10:17:11 -0700
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: Daniel Sedlak <daniel.sedlak@...77.com>
Cc: "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Jonathan Corbet <corbet@....net>, Neal Cardwell <ncardwell@...gle.com>, David Ahern <dsahern@...nel.org>, 
	Andrew Morton <akpm@...ux-foundation.org>, Shakeel Butt <shakeel.butt@...ux.dev>, 
	Yosry Ahmed <yosry.ahmed@...ux.dev>, linux-mm@...ck.org, netdev@...r.kernel.org, 
	Matyas Hurtik <matyas.hurtik@...77.com>, Daniel Sedlak <danie.sedlak@...77.com>
Subject: Re: [PATCH v2 net-next 2/2] mm/vmpressure: add tracepoint for socket
 pressure detection

On Tue, Jul 15, 2025 at 12:01 AM Daniel Sedlak <daniel.sedlak@...77.com> wrote:
>
> Hi Kuniyuki,
>
> On 7/14/25 8:02 PM, Kuniyuki Iwashima wrote:
> >> +TRACE_EVENT(memcg_socket_under_pressure,
> >> +
> >> +       TP_PROTO(const struct mem_cgroup *memcg, unsigned long scanned,
> >> +               unsigned long reclaimed),
> >> +
> >> +       TP_ARGS(memcg, scanned, reclaimed),
> >> +
> >> +       TP_STRUCT__entry(
> >> +               __field(u64, id)
> >> +               __field(unsigned long, scanned)
> >> +               __field(unsigned long, reclaimed)
> >> +       ),
> >> +
> >> +       TP_fast_assign(
> >> +               __entry->id = cgroup_id(memcg->css.cgroup);
> >> +               __entry->scanned = scanned;
> >> +               __entry->reclaimed = reclaimed;
> >> +       ),
> >> +
> >> +       TP_printk("memcg_id=%llu scanned=%lu reclaimed=%lu",
> >> +               __entry->id,
> >
> > Maybe a noob question: How can we translate the memcg ID
> > to the /sys/fs/cgroup/... path ?
>
> IMO this should be really named `cgroup_id` instead of `memcg_id`, but
> we kept the latter to keep consistency with the rest of the file.
>
> To find cgroup path you can use:
> - find /sys/fs/cgroup/ -inum `memcg_id`, and it will print "path" to the
> affected cgroup.
> - or you can use bpftrace tracepoint hooks and there is a helper
> function [1].

Thanks, this is good to know and worth in the commit message.

>
> Or we can put the cgroup_path to the tracepoint instead of that ID, but
> I feel it can be too much overhead, the paths can be pretty long.

Agree, the ID is good enough given we can find the cgroup by oneliner.

>
> Link: https://bpftrace.org/docs/latest#functions-cgroup_path [1]
> > It would be nice to place this patch first and the description of
> > patch 2 has how to use the new stat with this tracepoint.
>
> Sure, can do that. However, I am unsure how a good idea is to
> cross-reference commits, since each may go through a different tree
> because each commit is for a different subsystem. They would have to go
> through one tree, right?

Right.  Probably you can just assume both patches will be merged
and post the tracepoint patch to mm ML first and then add its
lore.kernel.org link and howto in the stat patch and post it to netdev ML.


>
>
> >> +               __entry->scanned,
> >> +               __entry->reclaimed)
> >> +);
> >> +
> >>   #endif /* _TRACE_MEMCG_H */
> >>
> >>   /* This part must be outside protection */
> >> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> >> index bd5183dfd879..aa9583066731 100644
> >> --- a/mm/vmpressure.c
> >> +++ b/mm/vmpressure.c
> >> @@ -21,6 +21,8 @@
> >>   #include <linux/printk.h>
> >>   #include <linux/vmpressure.h>
> >>
> >> +#include <trace/events/memcg.h>
> >> +
> >>   /*
> >>    * The window size (vmpressure_win) is the number of scanned pages before
> >>    * we try to analyze scanned/reclaimed ratio. So the window is used as a
> >> @@ -317,6 +319,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, bool tree,
> >>                           * pressure events can occur.
> >>                           */
> >>                          WRITE_ONCE(memcg->socket_pressure, jiffies + HZ);
> >> +                       trace_memcg_socket_under_pressure(memcg, scanned, reclaimed);
> >
> > This is triggered only when we enter the memory pressure state
> > and not when we leave the state, right ?  Is it possible to issue
> > such an event ?
>
> AFAIK, the currently used API in the vmpressure function does not have
> anything like enter or leave the socket memory pressure state. It only
> periodically re-arms the socket pressure for a specific duration. So the
> current tracepoint is called when the socket pressure is re-armed.
>
> I am not sure how feasible it is to rewrite it so we have enter and
> leave logic. I am not that familiar with those code paths.

Unlike tcp mem pressure, the memcg pressure only persists for one
second, so it won't make sense to add the "leave" event, and I guess
the assumption would be that socket_pressure will keep updated under
memory pressure.