Message-ID: <41ed390c-884e-4158-9fe8-ce3af53cf77b@kernel.org>
Date: Fri, 15 Aug 2025 19:30:21 +0200
From: Matthieu Baerts <matttbe@...nel.org>
To: Kuniyuki Iwashima <kuniyu@...gle.com>,
 Shakeel Butt <shakeel.butt@...ux.dev>
Cc: "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Neal Cardwell <ncardwell@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
 Willem de Bruijn <willemb@...gle.com>, Mat Martineau <martineau@...nel.org>,
 Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>,
 Roman Gushchin <roman.gushchin@...ux.dev>,
 Andrew Morton <akpm@...ux-foundation.org>, Michal Koutný
 <mkoutny@...e.com>, Tejun Heo <tj@...nel.org>,
 Simon Horman <horms@...nel.org>, Geliang Tang <geliang@...nel.org>,
 Muchun Song <muchun.song@...ux.dev>, Mina Almasry <almasrymina@...gle.com>,
 Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org,
 mptcp@...ts.linux.dev, cgroups@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when
 CONFIG_SOCK_CGROUP_DATA=n.

Hi Kuniyuki,

On 15/08/2025 19:24, Kuniyuki Iwashima wrote:
> On Thu, Aug 14, 2025 at 7:31 PM Kuniyuki Iwashima <kuniyu@...gle.com> wrote:
>>
>> On Thu, Aug 14, 2025 at 6:06 PM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
>>>
>>> On Thu, Aug 14, 2025 at 05:05:56PM -0700, Kuniyuki Iwashima wrote:
>>>> On Thu, Aug 14, 2025 at 4:46 PM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
>>>>>
>>>>> On Thu, Aug 14, 2025 at 04:27:31PM -0700, Kuniyuki Iwashima wrote:
>>>>>> On Thu, Aug 14, 2025 at 2:44 PM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
>>>>>>>
>>>>>>> On Thu, Aug 14, 2025 at 08:08:33PM +0000, Kuniyuki Iwashima wrote:
>>>>>>>> When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets
>>>>>>>> sk->sk_memcg based on the current task.
>>>>>>>>
>>>>>>>> MPTCP subflow socket creation is triggered from userspace or
>>>>>>>> an in-kernel worker.
>>>>>>>>
>>>>>>>> In the latter case, sk->sk_memcg is not what we want.  So, we fix
>>>>>>>> it up from the parent socket's sk->sk_memcg in mptcp_attach_cgroup().
>>>>>>>>
>>>>>>>> Although the code is placed under #ifdef CONFIG_MEMCG, it is buried
>>>>>>>> under #ifdef CONFIG_SOCK_CGROUP_DATA.
>>>>>>>>
>>>>>>>> The two configs are orthogonal.  If CONFIG_MEMCG is enabled without
>>>>>>>> CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged
>>>>>>>> correctly.
>>>>>>>>
>>>>>>>> Let's wrap sock_create_kern() for subflow with set_active_memcg()
>>>>>>>> using the parent sk->sk_memcg.
>>>>>>>>
>>>>>>>> Fixes: 3764b0c5651e3 ("mptcp: attach subflow socket to parent cgroup")
>>>>>>>> Suggested-by: Michal Koutný <mkoutny@...e.com>
>>>>>>>> Signed-off-by: Kuniyuki Iwashima <kuniyu@...gle.com>
>>>>>>>> ---
>>>>>>>>  mm/memcontrol.c     |  5 ++++-
>>>>>>>>  net/mptcp/subflow.c | 11 +++--------
>>>>>>>>  2 files changed, 7 insertions(+), 9 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>>>>>>> index 8dd7fbed5a94..450862e7fd7a 100644
>>>>>>>> --- a/mm/memcontrol.c
>>>>>>>> +++ b/mm/memcontrol.c
>>>>>>>> @@ -5006,8 +5006,11 @@ void mem_cgroup_sk_alloc(struct sock *sk)
>>>>>>>>       if (!in_task())
>>>>>>>>               return;
>>>>>>>>
>>>>>>>> +     memcg = current->active_memcg;
>>>>>>>> +
>>>>>>>
>>>>>>> Use active_memcg() instead of current->active_memcg and do it before
>>>>>>> the !in_task() check.
>>>>>>
>>>>>> Why not reuse the !in_task() check here?
>>>>>> We never use int_active_memcg for socket and also
>>>>>> know int_active_memcg is always NULL here.
>>>>>>
>>>>>
>>>>> If we are making mem_cgroup_sk_alloc() work with set_active_memcg()
>>>>> infra then make it work for both in_task() and !in_task() contexts.
>>>>
>>>> Considering e876ecc67db80, then I think we should add
>>>> set_active_memcg_in_task() and active_memcg_in_task().
>>>>
>>>> or at least we need WARN_ON() if we want to place active_memcg()
>>>> before the in_task() check, but this looks ugly.
>>>>
>>>>         memcg = active_memcg();
>>>>         if (!in_task() && !memcg)
>>>>                 return;
>>>>         DEBUG_NET_WARN_ON_ONCE(!in_task() && memcg);
>>>
>>> You don't have to use the code as is. It is just an example. Basically I
>>> am asking if in future someone does the following:
>>>
>>>         // in !in_task() context
>>>         old_memcg = set_active_memcg(new_memcg);
>>>         sk = sk_alloc();
>>>         set_active_memcg(old_memcg);
>>>
>>> mem_cgroup_sk_alloc() should work and associate the sk with the
>>> new_memcg.
>>>
>>> You can manually inline active_memcg() function to avoid multiple
>>> in_task() checks like below:
>>
>> Will do so, thanks!
> 
> I noticed this won't work with the bpf approach, as the
> hook is only called for !sk_kern sockets (an MPTCP subflow
> has sk_kern == 1) and we need to manually copy the
> memcg anyway, so I'll use the original patch 1 in the
> next version.

Thank you for having checked that!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
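
Illustrative note: the behaviour requested in the quoted review (a socket allocated inside a set_active_memcg() scope is associated with the override memcg, in both in_task() and !in_task() contexts) can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the patch and not kernel code: every model_* function and global below is a hypothetical stand-in, and the two structs carry only the fields needed here. The thread itself ends with the author going back to the original patch 1, so nothing below should be read as what was merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel's definitions. */
struct mem_cgroup { const char *name; };
struct sock { struct mem_cgroup *sk_memcg; };

static bool model_in_task = true;            /* models in_task() */
static struct mem_cgroup *task_active_memcg; /* models current->active_memcg */
static struct mem_cgroup *int_active_memcg;  /* models the interrupt-context slot */
static struct mem_cgroup *task_memcg;        /* models the current task's own memcg */

/* Models set_active_memcg(): install an override, return the previous one. */
static struct mem_cgroup *model_set_active_memcg(struct mem_cgroup *memcg)
{
	struct mem_cgroup **slot =
		model_in_task ? &task_active_memcg : &int_active_memcg;
	struct mem_cgroup *old = *slot;

	*slot = memcg;
	return old;
}

/* Models active_memcg(): read the override for the current context. */
static struct mem_cgroup *model_active_memcg(void)
{
	return model_in_task ? task_active_memcg : int_active_memcg;
}

/*
 * Models the lookup order asked for in the review: check the override
 * first, and only then fall back to the !in_task() early return and the
 * current task's memcg.
 */
static void model_mem_cgroup_sk_alloc(struct sock *sk)
{
	struct mem_cgroup *memcg = model_active_memcg();

	if (!memcg) {
		if (!model_in_task)
			return;		/* no override, no task: leave sk uncharged */
		memcg = task_memcg;
	}
	sk->sk_memcg = memcg;
}

/* Models wrapping subflow socket creation with the parent's memcg. */
static struct sock model_create_subflow(struct mem_cgroup *parent_memcg)
{
	struct sock sk = { .sk_memcg = NULL };
	struct mem_cgroup *old = model_set_active_memcg(parent_memcg);

	model_mem_cgroup_sk_alloc(&sk);
	model_set_active_memcg(old);	/* always restore the previous override */
	return sk;
}
```

In this model, calling model_create_subflow(&parent) from a "worker" whose task_memcg differs from the parent yields a socket whose sk_memcg is the parent's, whether model_in_task is true or false, and the override is cleared again afterwards. A socket allocated outside such a scope falls back to task_memcg in task context and stays unassociated otherwise.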

