Message-ID: <29defc2f-1dd4-4269-8677-34bb1ce44a55@cdn77.com>
Date: Thu, 21 Aug 2025 20:44:07 +0200
From: Matyas Hurtik <matyas.hurtik@...77.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Tejun Heo <tj@...nel.org>, Michal Koutný
<mkoutny@...e.com>, Daniel Sedlak <daniel.sedlak@...77.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, Jonathan Corbet <corbet@....net>,
Neal Cardwell <ncardwell@...gle.com>, Kuniyuki Iwashima <kuniyu@...gle.com>,
David Ahern <dsahern@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
Yosry Ahmed <yosry.ahmed@...ux.dev>, linux-mm@...ck.org,
netdev@...r.kernel.org, Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>, Roman Gushchin <roman.gushchin@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>, cgroups@...r.kernel.org
Subject: Re: [PATCH v4] memcg: expose socket memory pressure in a cgroup
Hello,
On 8/20/25 11:34 PM, Shakeel Butt wrote:
> On Wed, Aug 20, 2025 at 10:37:49PM +0200, Matyas Hurtik wrote:
>> Result of mem_cgroup_under_socket_pressure() depends on whether self
>> or any ancestors have had socket_pressure set. So any duration of an
>> ancestor being throttled would also mean the child was being
>> throttled. By summing our own and our ancestors' socket_pressure_duration
>> we should get our total time being throttled (possibly more because
>> of overlaps).
> This is not how memcg stats (and their semantics) work and maybe that
> is not what you want. In the memcg stats semantics, for a given memcg
> the socket_pressure_duration metric is not the stall duration faced by
> sockets in that memcg but the stall duration caused by the
> memcg and its descendants. If that is not what we want, we need to do
> something different and orthogonal to memcg stats.
By memcg stats, do you mean only the contents of the memory.stat file?
Would it be semantically consistent if we were to put it into
a separate file (like memory.net.throttled) instead?
To summarize the proposed methods of hierarchical propagation:
1) None - keeping the reported duration local to that cgroup:
value = self
This would not be too out of place, since memory.events.local
already does not accumulate hierarchically.
To determine whether sockets in a memcg were throttled,
we would traverse the /sys/fs/cgroup/ hierarchy from root to
the cgroup of interest and sum those local durations.
2) Propagating the duration upwards (using rstat or simple iteration
towards root memcg during write):
value = self + sum of children
Most semantically consistent with other exposed stat files.
Could be added as an entry into memory.stat.
Since the pressure gets applied from ancestors to children
(see mem_cgroup_under_socket_pressure()), determining the duration of
throttling for sockets in some cgroup would be hardest in this variant.
It would involve iterating from the root to the examined cgroup and,
at each node, subtracting the values of its children from that node's
value; the sum of those differences would then correspond to the total
duration throttled.
3) Propagating the duration downwards (write only locally,
read traversing hierarchy upwards):
value = self + sum of ancestors
This mirrors the logic used in mem_cgroup_under_socket_pressure():
an increase in the reported value for a memcg would coincide with more
throttling being done to the sockets of that memcg.
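To make the comparison concrete, here is a rough userspace sketch of the
three variants on a toy hierarchy. It is not kernel code: the Cgroup class,
the helper names and the duration values are all made up for illustration,
and it only shows how the reported value and the "time sockets in this
cgroup were throttled" would be derived in each case. As noted earlier,
summing along the path can overcount when throttling periods overlap.

```python
class Cgroup:
    """Toy cgroup node; 'local' is a hypothetical locally recorded duration."""

    def __init__(self, name, local, parent=None):
        self.name = name
        self.local = local
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def path_from_root(cg):
    """Return the list of nodes from the root down to cg (inclusive)."""
    path = []
    while cg is not None:
        path.append(cg)
        cg = cg.parent
    return path[::-1]


# What each variant would report for a given cgroup.
def reported_v1(cg):
    # Variant 1: value = self
    return cg.local


def reported_v2(cg):
    # Variant 2: value = self + sum of children (children are hierarchical too)
    return cg.local + sum(reported_v2(c) for c in cg.children)


def reported_v3(cg):
    # Variant 3: value = self + sum of ancestors
    return sum(n.local for n in path_from_root(cg))


# How a reader would recover the throttled duration for sockets in cg.
def throttled_from_v1(cg):
    # Sum the local values from the root to the cgroup of interest.
    return sum(reported_v1(n) for n in path_from_root(cg))


def throttled_from_v2(cg):
    # At each node on the path, subtract the children's values from the
    # node's value to get back its local part, then sum those parts.
    total = 0
    for n in path_from_root(cg):
        total += reported_v2(n) - sum(reported_v2(c) for c in n.children)
    return total


def throttled_from_v3(cg):
    # The reported value already is the answer.
    return reported_v3(cg)


# Made-up hierarchy: root(5) -> a(3) -> a1(2), and root -> b(7).
root = Cgroup("root", 5)
a = Cgroup("a", 3, root)
a1 = Cgroup("a1", 2, a)
b = Cgroup("b", 7, root)

for name, fn in (("v1", throttled_from_v1),
                 ("v2", throttled_from_v2),
                 ("v3", throttled_from_v3)):
    print(name, fn(a1))
```

All three recover the same number for a1; the difference is only in how
much work the reader has to do, with variant 3 needing a single read.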
I think that variant 3 would be the most useful for diagnosing
when this socket throttling happens in a certain memcg.
I'm not sure if I understand the use case of variant 2.
Thanks,
Matyas