Message-ID: <e65222c1-83f9-4d23-b9af-16db7e6e8a42@cdn77.com>
Date: Wed, 20 Aug 2025 18:51:07 +0200
From: Matyas Hurtik <matyas.hurtik@...77.com>
To: Tejun Heo <tj@...nel.org>, Michal Koutný
<mkoutny@...e.com>
Cc: Daniel Sedlak <daniel.sedlak@...77.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, Jonathan Corbet <corbet@....net>,
Neal Cardwell <ncardwell@...gle.com>, Kuniyuki Iwashima <kuniyu@...gle.com>,
David Ahern <dsahern@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
Shakeel Butt <shakeel.butt@...ux.dev>, Yosry Ahmed <yosry.ahmed@...ux.dev>,
linux-mm@...ck.org, netdev@...r.kernel.org,
Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>, cgroups@...r.kernel.org
Subject: Re: [PATCH v4] memcg: expose socket memory pressure in a cgroup
Hello,
On 8/13/25 8:03 PM, Tejun Heo wrote:
> On Wed, Aug 13, 2025 at 02:03:28PM +0200, Michal Koutný wrote:
> ...
>> One more point to clarify -- should the value include throttling from
>> ancestors or not. (I think both are fine but) this semantic should also
>> be described in the docs. I.e. current proposal is
>> value = sum_children + self
>> and if you see that C's value is 0, it doesn't mean its sockets
>> weren't subject to throttling. It just means you need to check also
>> values in C ancestors. Does that work?
> I was more thinking that it would account for all throttled durations, but
> it's true that we only count locally originating events for e.g.
> memory.events::low or pids.events::max. Hmm... I'm unsure. So, for events, I
> think local sources make sense as it's tracking what limits are triggering
> where. However, I'm not sure that translates well to throttle duration which
> is closer to pressure metrics than event counters. We don't distinguish the
> sources of contention when presenting pressure metrics after all.
I think calculating the value using self and ancestors would better match
the logic in mem_cgroup_under_socket_pressure() and it would avoid the
issue Michal outlined without relying on an explanation in the docs -
checking a single value per cgroup to confirm whether sockets belonging
to that cgroup were being throttled looks more intuitive to me.
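For reference, the ancestor walk I have in mind can be modelled in userspace roughly as below. This is only a sketch, not the kernel code: struct cg, under_socket_pressure() and the tick values are made up for illustration, and jiffies wraparound (which the kernel handles with its time_before() style helpers) is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct mem_cgroup. */
struct cg {
	struct cg *parent;             /* NULL for the root */
	unsigned long socket_pressure; /* tick at which the pressure window ends */
};

/*
 * A cgroup counts as under socket pressure if it, or any of its
 * ancestors, still has an unexpired pressure window at tick 'now'.
 */
bool under_socket_pressure(const struct cg *cg, unsigned long now)
{
	for (; cg; cg = cg->parent)
		if (now < cg->socket_pressure)
			return true;
	return false;
}

/* Small self-check: the child never saw pressure, only its parent did. */
void pressure_example(void)
{
	struct cg root = { NULL, 0 };
	struct cg parent = { &root, 150 }; /* window open until tick 150 */
	struct cg child = { &parent, 0 };

	assert(under_socket_pressure(&child, 100));  /* inherited from parent */
	assert(!under_socket_pressure(&child, 150)); /* window has expired */
	assert(!under_socket_pressure(&root, 100));  /* root is unaffected */
}
```

The point is that the child is throttled even though its own value is 0, which is why a per-cgroup stat that ignores ancestors would need the extra explanation in the docs.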
If we were to have the write side of the stat in vmpressure() look
something like:
    new_socket_pressure = jiffies + HZ;
    old_socket_pressure = atomic_long_xchg(
            &memcg->socket_pressure, new_socket_pressure);

    duration_to_add = jiffies_to_usecs(
            min(new_socket_pressure - old_socket_pressure, HZ));
    atomic_long_add(duration_to_add, &memcg->socket_pressure_duration);
And the read side:
    total_duration = 0;
    for (; !mem_cgroup_is_root(memcg); memcg = parent_mem_cgroup(memcg))
        total_duration += atomic_long_read(&memcg->socket_pressure_duration);
Would that work?
One issue: the reported value could be larger than the real duration
of the throttling, because a cgroup's socket_pressure interval can
overlap with an ancestor's and the overlap is then counted once per
level. Is that a problem?
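To put a number on the overlap, here is a small userspace model of the write and read sides. It is a sketch under assumptions, not the proposed patch: struct cg, note_pressure() and read_duration() are hypothetical names, HZ is assumed to be 1000 so that one tick is 1000 us, and plain fields replace the atomics.

```c
#include <assert.h>
#include <stddef.h>

#define HZ 1000UL /* assumed tick rate: 1 tick == 1000 us */

/* Hypothetical userspace stand-in for struct mem_cgroup. */
struct cg {
	struct cg *parent;                      /* NULL for the root */
	unsigned long socket_pressure;          /* tick at which the window ends */
	unsigned long socket_pressure_duration; /* accumulated usecs */
};

/* Write side: move the window end to now + HZ, add the newly covered time. */
void note_pressure(struct cg *cg, unsigned long now)
{
	unsigned long new_end = now + HZ;
	unsigned long old_end = cg->socket_pressure;
	unsigned long delta = new_end - old_end;

	cg->socket_pressure = new_end;
	if (delta > HZ) /* old window already expired: count a full HZ */
		delta = HZ;
	cg->socket_pressure_duration += delta * (1000000UL / HZ);
}

/* Read side: sum self and all ancestors, stopping before the root. */
unsigned long read_duration(const struct cg *cg)
{
	unsigned long total = 0;

	for (; cg->parent; cg = cg->parent)
		total += cg->socket_pressure_duration;
	return total;
}

/* Parent window [1000, 2000) and child window [1500, 2500) overlap. */
void overlap_example(void)
{
	struct cg root = { NULL, 0, 0 };
	struct cg parent = { &root, 0, 0 };
	struct cg child = { &parent, 0, 0 };

	note_pressure(&parent, 1000);
	note_pressure(&child, 1500);

	assert(read_duration(&parent) == 1000000UL);
	assert(read_duration(&child) == 2000000UL); /* union is only 1500000 us */
}
```

In this model the child's sockets are throttled for 1500 ticks (1.5 s) of wall time, but the summed value reads 2.0 s, since the 500 ticks where both windows are open are counted at both levels.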
Thanks,
Matyas