Message-ID: <20250910192057.1045711-1-kuniyu@google.com>
Date: Wed, 10 Sep 2025 19:19:27 +0000
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: Alexei Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, Martin KaFai Lau <martin.lau@...ux.dev>
Cc: John Fastabend <john.fastabend@...il.com>, Stanislav Fomichev <sdf@...ichev.me>,
Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>, Shakeel Butt <shakeel.butt@...ux.dev>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Neal Cardwell <ncardwell@...gle.com>, Willem de Bruijn <willemb@...gle.com>,
Mina Almasry <almasrymina@...gle.com>, Kuniyuki Iwashima <kuniyu@...gle.com>,
Kuniyuki Iwashima <kuni1840@...il.com>, bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: [PATCH v8 bpf-next/net 0/6] bpf: Allow decoupling memcg from sk->sk_prot->memory_allocated.
Some protocols (e.g., TCP, UDP) have their own memory accounting for
socket buffers and charge memory to global per-protocol counters, which
are limited by knobs such as /proc/sys/net/ipv4/tcp_mem.

If the socket has sk->sk_memcg, this memory is also charged to the memcg
as "sock" in memory.stat.

We do not need to pay the cost of two orthogonal memory accounting
mechanisms.

This series allows decoupling memcg from the global memory accounting
(memcg + tcp_mem -> memcg) if the socket is configured as such by sysctl
or a BPF prog.

Overview of the series:

  patches 1 & 2 prepare for decoupling memcg from sk_prot->memory_allocated
                based on the SK_MEMCG_EXCLUSIVE flag
  patch 3 introduces net.core.memcg_exclusive
  patches 4 & 5 support flagging SK_MEMCG_EXCLUSIVE via bpf_setsockopt()
  patch 6 is a selftest

Changes:
  v8:
    * Patch 3: Fix build failure when CONFIG_NET=n

  v7: https://lore.kernel.org/netdev/20250909204632.3994767-1-kuniyu@google.com/
    * Rename s/ISOLATED/EXCLUSIVE/
    * Add patch 3 (net.core.memcg_exclusive sysctl)
    * Reorder the core patch 2 before sysctl + bpf changes
    * Patch 6
      * Add test for sysctl

  v6: https://lore.kernel.org/netdev/20250908223750.3375376-1-kuniyu@google.com/
    * Patch 4
      * Update commit message
    * Patch 5
      * Trace sk_prot->memory_allocated + sk_prot->memory_per_cpu_fw_alloc

  v5: https://lore.kernel.org/netdev/20250903190238.2511885-1-kuniyu@google.com/
    * Patch 2
      * Rename new variants to bpf_sock_create_{get,set}sockopt()
    * Patch 3
      * Limit getsockopt() to BPF_CGROUP_INET_SOCK_CREATE
    * Patch 5
      * Use kern_sync_rcu()
      * Double NR_SEND to 128

  v4: https://lore.kernel.org/netdev/20250829010026.347440-1-kuniyu@google.com/
    * Patch 2
      * Use __bpf_setsockopt() instead of _bpf_setsockopt()
      * Add getsockopt() for a cgroup with multiple bpf progs running
    * Patch 3
      * Only allow inet_create() to set flags
      * Inherit flags from listener to child in sk_clone_lock()
      * Support clearing flags
    * Patch 5
      * Only use inet_create() hook
      * Test bpf_getsockopt()
      * Add serial_ prefix
      * Reduce sleep() and the amount of sent data

  v3: https://lore.kernel.org/netdev/20250826183940.3310118-1-kuniyu@google.com/
    * Drop patches for accept() hook
    * Patch 1
      * Merge if blocks
    * Patch 2
      * Drop bpf_func_proto for accept()
    * Patch 3
      * Allow flagging without sk->sk_memcg
      * Inherit SK_BPF_MEMCG_SOCK_ISOLATED in __inet_accept()

  v2: https://lore.kernel.org/bpf/20250825204158.2414402-1-kuniyu@google.com/
    * Patch 2
      * Define BPF_CGROUP_RUN_PROG_INET_SOCK_ACCEPT() when CONFIG_CGROUP_BPF=n
    * Patch 5
      * Make 2 new bpf_func_proto static
    * Patch 6
      * s/mem_cgroup_sk_set_flag/mem_cgroup_sk_set_flags/ when CONFIG_MEMCG=n
      * Use finer CONFIG_CGROUP_BPF instead of CONFIG_BPF_SYSCALL for ifdef

  v1: https://lore.kernel.org/netdev/20250822221846.744252-1-kuniyu@google.com/

Kuniyuki Iwashima (6):
  tcp: Save lock_sock() for memcg in inet_csk_accept().
  net-memcg: Allow decoupling memcg from global protocol memory
    accounting.
  net-memcg: Introduce net.core.memcg_exclusive sysctl.
  bpf: Support bpf_setsockopt() for BPF_CGROUP_INET_SOCK_CREATE.
  bpf: Introduce SK_BPF_MEMCG_FLAGS and SK_BPF_MEMCG_EXCLUSIVE.
  selftest: bpf: Add test for SK_MEMCG_EXCLUSIVE.

 Documentation/admin-guide/sysctl/net.rst     |   9 +
 include/net/netns/core.h                     |   3 +
 include/net/proto_memory.h                   |  15 +-
 include/net/sock.h                           |  47 +++-
 include/net/tcp.h                            |  10 +-
 include/uapi/linux/bpf.h                     |   6 +
 mm/memcontrol.c                              |  15 +-
 net/core/filter.c                            |  82 ++++++
 net/core/sock.c                              |  65 +++--
 net/core/sysctl_net_core.c                   |  11 +
 net/ipv4/af_inet.c                           |  37 +++
 net/ipv4/inet_connection_sock.c              |  26 +-
 net/ipv4/tcp.c                               |   3 +-
 net/ipv4/tcp_output.c                        |  10 +-
 net/mptcp/protocol.c                         |   3 +-
 net/tls/tls_device.c                         |   4 +-
 tools/include/uapi/linux/bpf.h               |   6 +
 .../selftests/bpf/prog_tests/sk_memcg.c      | 261 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/sk_memcg.c | 146 ++++++++++
 19 files changed, 701 insertions(+), 58 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sk_memcg.c
 create mode 100644 tools/testing/selftests/bpf/progs/sk_memcg.c

--
2.51.0.384.g4c02a37b29-goog