Message-ID: <20250813202142.GB115258@cmpxchg.org>
Date: Wed, 13 Aug 2025 16:21:42 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Kuniyuki Iwashima <kuniyu@...gle.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Neal Cardwell <ncardwell@...gle.com>,
Paolo Abeni <pabeni@...hat.com>,
Willem de Bruijn <willemb@...gle.com>,
Matthieu Baerts <matttbe@...nel.org>,
Mat Martineau <martineau@...nel.org>,
Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeel.butt@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Koutný <mkoutny@...e.com>,
Tejun Heo <tj@...nel.org>, Simon Horman <horms@...nel.org>,
Geliang Tang <geliang@...nel.org>,
Muchun Song <muchun.song@...ux.dev>,
Mina Almasry <almasrymina@...gle.com>,
Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org,
mptcp@...ts.linux.dev, cgroups@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v3 net-next 12/12] net-memcg: Decouple controlled memcg from global protocol memory accounting.
On Wed, Aug 13, 2025 at 11:43:15AM -0700, Kuniyuki Iwashima wrote:
> On Wed, Aug 13, 2025 at 6:00 AM Johannes Weiner <hannes@...xchg.org> wrote:
> This change stop double-charging by opting out of _the
> networking layer one_ because it interferes with memcg
> and complicates configuration of memory.max and the
> global networking limit.
No, we do want the global limits as a backstop - even if every single
cgroup in the system has its own memory limit.

Sure, from a fairness POV, we want socket buffers accounted towards
the containers' memory footprint and subject to their limits.

But that doesn't imply that we can let the cgroup limit be the only
thing curbing an explosion in socket buffers.

This isn't about fairness, but about host stability.

The MM can easily get rid of file cache and heap pages, but it has
limited to no control over the socket buffer lifetime. If you split a
1TB host into 8 containers limited to ~128G, that doesn't mean you
want to allow up to 1TB of memory in socket buffers. That could make
low memory situations unrecoverable.
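
To put rough numbers on the example above, here is a minimal sketch of
the arithmetic. It is an illustration only, not kernel code: the helper
name and the 64G backstop value are made up, and the model simply
assumes socket buffers can grow until they hit whichever limit applies
first.

```python
GIB = 1 << 30

def worst_case_socket_bytes(cgroup_limits, global_backstop=None):
    """Upper bound on total socket buffer memory under a simplified model.

    Hypothetical helper: each cgroup may fill its whole limit with socket
    buffers, and an optional host-wide backstop caps the sum.
    """
    total = sum(cgroup_limits)
    if global_backstop is None:
        return total
    return min(total, global_backstop)

host_ram = 1024 * GIB        # the 1TB host from the example
limits = [128 * GIB] * 8     # 8 containers limited to ~128G each
backstop = 64 * GIB          # assumed global limit, value is illustrative

without_backstop = worst_case_socket_bytes(limits)
with_backstop = worst_case_socket_bytes(limits, backstop)

print(f"cgroup limits only:   {without_backstop // GIB} GiB of {host_ram // GIB} GiB")
print(f"with global backstop: {with_backstop // GIB} GiB of {host_ram // GIB} GiB")
```

With only the per-cgroup limits, the bound equals the whole host; the
global limit is what keeps the bound below that.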
> > Maybe their socket buffers is the only thing that happens
> > to matter to *you*, but this is in no way a generic, universal,
> > upstreamable solution. Knob or auto-detection is not the issue.
> >
> > Nacked-by: Johannes Weiner <hannes@...xchg.org>
>
> Please let me know if this nack still applies with the
> explanation above.
Yes, for one I think it's an unacceptable behavioral change of the
sysctl semantics.

But my wider point is that I think you're trying to fix something that
is a direct result of a flawed approach to containerization, and it
would make much more sense to address that instead.