Message-ID: <20250811173116.2829786-1-kuniyu@google.com>
Date: Mon, 11 Aug 2025 17:30:28 +0000
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Neal Cardwell <ncardwell@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Willem de Bruijn <willemb@...gle.com>, Matthieu Baerts <matttbe@...nel.org>,
Mat Martineau <martineau@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>, Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeel.butt@...ux.dev>, Andrew Morton <akpm@...ux-foundation.org>,
"Michal Koutný" <mkoutny@...e.com>, Tejun Heo <tj@...nel.org>
Cc: Simon Horman <horms@...nel.org>, Geliang Tang <geliang@...nel.org>,
Muchun Song <muchun.song@...ux.dev>, Mina Almasry <almasrymina@...gle.com>,
Kuniyuki Iwashima <kuniyu@...gle.com>, Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org,
mptcp@...ts.linux.dev, cgroups@...r.kernel.org, linux-mm@...ck.org
Subject: [PATCH v2 net-next 00/12] net-memcg: Decouple controlled memcg from sk->sk_prot->memory_allocated.
Some protocols (e.g., TCP, UDP) have their own memory accounting for
socket buffers and charge memory to global per-protocol counters, which
are bounded by limits such as /proc/sys/net/ipv4/tcp_mem.
When running under a non-root cgroup, this memory is also charged to
the memcg and reported as "sock" in memory.stat.
Sockets of such protocols are still subject to the global limits, and
are thus affected by noisy neighbours outside the cgroup.
This makes it difficult to accurately estimate and configure appropriate
global limits.
If all workloads were guaranteed to be controlled under memcg, the issue
could be worked around by setting tcp_mem[0~2] to UINT_MAX.
However, this assumption does not always hold, and processes that belong
to the root cgroup or opt out of memcg can consume memory up to the global
limit, which is problematic.
This series decouples memcg from the global memory accounting if its
memory.max is not "max". This simplifies the memcg configuration while
keeping the global limits within a reasonable range, which is only 10% of
the physical memory by default.
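
As a rough illustration of the intended behaviour (a simplified user-space
model, not the code in this series), the charge path can be thought of as
below. All names here (memcg_model, proto_model, MEMCG_UNLIMITED,
charge_pages()) are hypothetical, and memory pressure handling is ignored:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker modelling memory.max == "max". */
#define MEMCG_UNLIMITED (-1L)

/* Hypothetical stand-in for a memcg: its limit and its socket charge. */
struct memcg_model {
	long max_pages;
	long charged_pages;
};

/* Hypothetical stand-in for a protocol's global counter and limit. */
struct proto_model {
	long limit_pages;
	long allocated_pages;
};

/* A memcg is treated as isolated when its memory.max is not "max". */
static bool memcg_isolated(const struct memcg_model *memcg)
{
	return memcg != NULL && memcg->max_pages != MEMCG_UNLIMITED;
}

/* Returns true if the charge succeeds; nothing is charged on failure. */
static bool charge_pages(struct proto_model *proto,
			 struct memcg_model *memcg, long pages)
{
	bool isolated = memcg_isolated(memcg);

	assert(pages >= 0);

	/* An isolated memcg is bounded by its own memory.max only. */
	if (isolated && memcg->charged_pages + pages > memcg->max_pages)
		return false;

	/* Everyone else remains subject to the global protocol limit. */
	if (!isolated) {
		if (proto->allocated_pages + pages > proto->limit_pages)
			return false;
		proto->allocated_pages += pages;
	}

	if (memcg != NULL)
		memcg->charged_pages += pages;

	return true;
}
```

In this model, a socket in a memcg with a finite memory.max never touches
proto->allocated_pages, while sockets in the root cgroup or in a memcg with
memory.max set to "max" keep counting against the global limit.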
Overview of the series:
  patch 1 is a bug fix for MPTCP
  patch 2 ~ 9 move sk->sk_memcg accesses to a single place
  patch 10 moves sk_memcg under CONFIG_MEMCG
  patch 11 stores a flag in the lowest bit of sk->sk_memcg
  patch 12 decouples memcg from sk_prot->memory_allocated based on the flag
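
For readers unfamiliar with the technique used in patch 11, keeping a flag
in the lowest bit of a pointer can be sketched as follows. This is a generic
user-space sketch that assumes the pointed-to object is aligned to at least
2 bytes. Only the flag name MEMCG_SOCK_ISOLATED comes from the series; its
value here, struct memcg_stub and the helpers are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mem_cgroup. */
struct memcg_stub {
	long charged_pages;
};

/* Flag kept in bit 0 of the stored word (value assumed for this sketch). */
#define MEMCG_SOCK_ISOLATED 1UL

/* Pack a pointer and the flag into one word. */
static uintptr_t sk_memcg_pack(struct memcg_stub *memcg, bool isolated)
{
	uintptr_t val = (uintptr_t)memcg;

	/* Bit 0 must be free, which holds for suitably aligned objects. */
	assert((val & MEMCG_SOCK_ISOLATED) == 0);

	return isolated ? (val | MEMCG_SOCK_ISOLATED) : val;
}

/* Recover the real pointer by masking the flag bit off. */
static struct memcg_stub *sk_memcg_ptr(uintptr_t val)
{
	return (struct memcg_stub *)(val & ~(uintptr_t)MEMCG_SOCK_ISOLATED);
}

/* Test the flag without dereferencing the pointer. */
static bool sk_memcg_isolated(uintptr_t val)
{
	return (val & MEMCG_SOCK_ISOLATED) != 0;
}
```

Because the flag shares a word with the pointer, every reader has to mask it
off before dereferencing, which is presumably why the accesses are first
funnelled through a single helper in patches 2 ~ 9.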
Changes:
  v2:
    * Remove per-memcg knob
    * Patch 11
      * Set flag on sk_memcg based on memory.max
    * Patch 12
      * Add sk_should_enter_memory_pressure() and cover
        tcp_enter_memory_pressure() calls
      * Update examples in changelog

  v1: https://lore.kernel.org/netdev/20250721203624.3807041-1-kuniyu@google.com/
Kuniyuki Iwashima (12):
  mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.
  mptcp: Use tcp_under_memory_pressure() in mptcp_epollin_ready().
  tcp: Simplify error path in inet_csk_accept().
  net: Call trace_sock_exceed_buf_limit() for memcg failure with
    SK_MEM_RECV.
  net: Clean up __sk_mem_raise_allocated().
  net-memcg: Introduce mem_cgroup_from_sk().
  net-memcg: Introduce mem_cgroup_sk_enabled().
  net-memcg: Pass struct sock to mem_cgroup_sk_(un)?charge().
  net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure().
  net: Define sk_memcg under CONFIG_MEMCG.
  net-memcg: Store MEMCG_SOCK_ISOLATED in sk->sk_memcg.
  net-memcg: Decouple controlled memcg from global protocol memory
    accounting.
 include/linux/memcontrol.h      | 45 +++++++++-------
 include/net/proto_memory.h      | 15 ++++--
 include/net/sock.h              | 67 +++++++++++++++++++++++
 include/net/tcp.h               | 10 ++--
 mm/memcontrol.c                 | 48 +++++++++++++----
 net/core/sock.c                 | 94 +++++++++++++++++++++------------
 net/ipv4/inet_connection_sock.c | 35 +++++++-----
 net/ipv4/tcp.c                  |  3 +-
 net/ipv4/tcp_output.c           | 13 +++--
 net/mptcp/protocol.c            |  4 +-
 net/mptcp/protocol.h            |  4 +-
 net/mptcp/subflow.c             | 11 ++--
 net/tls/tls_device.c            |  3 +-
 13 files changed, 253 insertions(+), 99 deletions(-)
--
2.51.0.rc0.155.g4a0f42376b-goog