[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAVpQUBO8TXjjtt++kF0R-qs-Utn-eY5o321NyAALEYTfq0xGw@mail.gmail.com>
Date: Fri, 15 Aug 2025 10:24:12 -0700
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Neal Cardwell <ncardwell@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Willem de Bruijn <willemb@...gle.com>, Matthieu Baerts <matttbe@...nel.org>,
Mat Martineau <martineau@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>, Roman Gushchin <roman.gushchin@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>, Michal Koutný <mkoutny@...e.com>,
Tejun Heo <tj@...nel.org>, Simon Horman <horms@...nel.org>, Geliang Tang <geliang@...nel.org>,
Muchun Song <muchun.song@...ux.dev>, Mina Almasry <almasrymina@...gle.com>,
Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org, mptcp@...ts.linux.dev,
cgroups@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.
On Thu, Aug 14, 2025 at 7:31 PM Kuniyuki Iwashima <kuniyu@...gle.com> wrote:
>
> On Thu, Aug 14, 2025 at 6:06 PM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
> >
> > On Thu, Aug 14, 2025 at 05:05:56PM -0700, Kuniyuki Iwashima wrote:
> > > On Thu, Aug 14, 2025 at 4:46 PM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
> > > >
> > > > On Thu, Aug 14, 2025 at 04:27:31PM -0700, Kuniyuki Iwashima wrote:
> > > > > On Thu, Aug 14, 2025 at 2:44 PM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
> > > > > >
> > > > > > On Thu, Aug 14, 2025 at 08:08:33PM +0000, Kuniyuki Iwashima wrote:
> > > > > > > When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets
> > > > > > > sk->sk_memcg based on the current task.
> > > > > > >
> > > > > > > MPTCP subflow socket creation is triggered from userspace or
> > > > > > > an in-kernel worker.
> > > > > > >
> > > > > > > In the latter case, sk->sk_memcg is not what we want. So, we fix
> > > > > > > it up from the parent socket's sk->sk_memcg in mptcp_attach_cgroup().
> > > > > > >
> > > > > > > Although the code is placed under #ifdef CONFIG_MEMCG, it is buried
> > > > > > > under #ifdef CONFIG_SOCK_CGROUP_DATA.
> > > > > > >
> > > > > > > The two configs are orthogonal. If CONFIG_MEMCG is enabled without
> > > > > > > CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged
> > > > > > > correctly.
> > > > > > >
> > > > > > > Let's wrap sock_create_kern() for subflow with set_active_memcg()
> > > > > > > using the parent sk->sk_memcg.
> > > > > > >
> > > > > > > Fixes: 3764b0c5651e3 ("mptcp: attach subflow socket to parent cgroup")
> > > > > > > Suggested-by: Michal Koutný <mkoutny@...e.com>
> > > > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@...gle.com>
> > > > > > > ---
> > > > > > > mm/memcontrol.c | 5 ++++-
> > > > > > > net/mptcp/subflow.c | 11 +++--------
> > > > > > > 2 files changed, 7 insertions(+), 9 deletions(-)
> > > > > > >
> > > > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > > > > index 8dd7fbed5a94..450862e7fd7a 100644
> > > > > > > --- a/mm/memcontrol.c
> > > > > > > +++ b/mm/memcontrol.c
> > > > > > > @@ -5006,8 +5006,11 @@ void mem_cgroup_sk_alloc(struct sock *sk)
> > > > > > > if (!in_task())
> > > > > > > return;
> > > > > > >
> > > > > > > + memcg = current->active_memcg;
> > > > > > > +
> > > > > >
> > > > > > Use active_memcg() instead of current->active_memcg and do before the
> > > > > > !in_task() check.
> > > > >
> > > > > Why not reuse the !in_task() check here ?
> > > > > We never use int_active_memcg for socket and also
> > > > > know int_active_memcg is always NULL here.
> > > > >
> > > >
> > > > If we are making mem_cgroup_sk_alloc() work with set_active_memcg()
> > > > infra then make it work for both in_task() and !in_task() contexts.
> > >
> > > Considering e876ecc67db80, then I think we should add
> > > set_active_memcg_in_task() and active_memcg_in_task().
> > >
> > > or at least we need WARN_ON() if we want to place active_memcg()
> > > before the in_task() check, but this looks ugly.
> > >
> > > memcg = active_memcg();
> > > if (!in_task() && !memcg)
> > > return;
> > > DEBUG_NET_WARN_ON_ONCE(!in_task() && memcg))
> >
> > You don't have to use the code as is. It is just an example. Basically I
> > am asking if in future someone does the following:
> >
> > // in !in_task() context
> > old_memcg = set_active_memcg(new_memcg);
> > sk = sk_alloc();
> > set_active_memcg(old_memcg);
> >
> > mem_cgroup_sk_alloc() should work and associate the sk with the
> > new_memcg.
> >
> > You can manually inline active_memcg() function to avoid multiple
> > in_task() checks like below:
>
> Will do so, thanks!
I noticed this won't work with the bpf approach as the
hook is only called for !sk_kern socket (MPTCP subflow
is sk_kern == 1) and we need to manually copy the
memcg anyway.. so I'll use the original patch 1 in the
next version.
> >
> > void mem_cgroup_sk_alloc(struct sock *sk)
> > {
> > struct mem_cgroup *memcg;
> >
> > if (!mem_cgroup_sockets_enabled)
> > return;
> >
> > if (!in_task()) {
> > memcg = this_cpu_read(int_active_memcg);
> >
> > /*
> > * Do not associate the sock with unrelated interrupted
> > * task's memcg.
> > */
> > if (!memcg)
> > return;
> > } else {
> > memcg = current->active_memcg;
> > }
> >
> > rcu_read_lock();
> > if (likely(!memcg))
> > memcg = mem_cgroup_from_task(current);
> > if (mem_cgroup_is_root(memcg))
> > goto out;
> > if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !memcg1_tcpmem_active(memcg))
> > goto out;
> > if (css_tryget(&memcg->css))
> > sk->sk_memcg = memcg;
> > out:
> > rcu_read_unlock();
> > }
> >
Powered by blists - more mailing lists