linux-kernel - Re: [PATCH v2] memcg: add charging of already allocated slab objects

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a21841c4-6d40-42cb-88cb-eee5f80ccf11@suse.cz>
Date: Thu, 29 Aug 2024 19:39:32 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
 Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>,
 Roman Gushchin <roman.gushchin@...ux.dev>,
 Muchun Song <muchun.song@...ux.dev>, David Rientjes <rientjes@...gle.com>,
 Hyeonggon Yoo <42.hyeyoo@...il.com>, Eric Dumazet <edumazet@...gle.com>,
 "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, Meta kernel team <kernel-team@...a.com>,
 cgroups@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH v2] memcg: add charging of already allocated slab objects

On 8/29/24 18:10, Shakeel Butt wrote:
> On Thu, Aug 29, 2024 at 11:42:10AM GMT, Vlastimil Babka wrote:
>> On 8/28/24 01:52, Shakeel Butt wrote:
>> > At the moment, the slab objects are charged to the memcg at the
>> > allocation time. However there are cases where slab objects are
>> > allocated at the time where the right target memcg to charge it to is
>> > not known. One such case is the network sockets for the incoming
>> > connection which are allocated in the softirq context.
>> > 
>> > Couple hundred thousand connections are very normal on large loaded
>> > server and almost all of those sockets underlying those connections get
>> > allocated in the softirq context and thus not charged to any memcg.
>> > However later at the accept() time we know the right target memcg to
>> > charge. Let's add new API to charge already allocated objects, so we can
>> > have better accounting of the memory usage.
>> > 
>> > To measure the performance impact of this change, tcp_crr is used from
>> > the neper [1] performance suite. Basically it is a network ping pong
>> > test with new connection for each ping pong.
>> > 
>> > The server and the client are run inside 3 level of cgroup hierarchy
>> > using the following commands:
>> > 
>> > Server:
>> >  $ tcp_crr -6
>> > 
>> > Client:
>> >  $ tcp_crr -6 -c -H ${server_ip}
>> > 
>> > If the client and server run on different machines with 50 GBPS NIC,
>> > there is no visible impact of the change.
>> > 
>> > For the same machine experiment with v6.11-rc5 as base.
>> > 
>> >           base (throughput)     with-patch
>> > tcp_crr   14545 (+- 80)         14463 (+- 56)
>> > 
>> > It seems like the performance impact is within the noise.
>> > 
>> > Link: https://github.com/google/neper [1]
>> > Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
>> > ---
>> > v1: https://lore.kernel.org/all/20240826232908.4076417-1-shakeel.butt@linux.dev/
>> > Changes since v1:
>> > - Correctly handle large allocations which bypass slab
>> > - Rearrange code to avoid compilation errors for !CONFIG_MEMCG builds
>> > 
>> > RFC: https://lore.kernel.org/all/20240824010139.1293051-1-shakeel.butt@linux.dev/
>> > Changes since the RFC:
>> > - Added check for already charged slab objects.
>> > - Added performance results from neper's tcp_crr
>> > 
>> >  include/linux/slab.h            |  1 +
>> >  mm/slub.c                       | 51 +++++++++++++++++++++++++++++++++
>> >  net/ipv4/inet_connection_sock.c |  5 ++--
>> >  3 files changed, 55 insertions(+), 2 deletions(-)
>> 
>> I can take the v3 in slab tree, if net people ack?
> 
> Thanks.
> 
>> 
>> BTW, will this be also useful for Linus's idea of charging struct files only
>> after they exist? But IIRC there was supposed to be also a part where we
>> have a way to quickly determine if we're not over limit (while allowing some
>> overcharge to make it quicker).
>>
> 
> Do you have link to those discussions or pointers to the code? From what
> you have described, I think this should work. We have the relevant gfp
> flags to control the charging behavior (with some caveats).

I think this was the last part of the discussion, and in the cover letter of
that there are links to the older threads for more context

https://lore.kernel.org/all/CAHk-%3DwhgFtbTxCAg2CWQtDj7n6CEyzvdV1wcCj2qpMfpw0%3Dm1A@mail.gmail.com/

>> Because right now this just overcharges unconditionally, but that's
>> understandable when the irq context creating the socket can't know the memcg
>> upfront. In the open() situation this is different.
>> 
> 
> For networking we deliberately overcharges in the irq context (if
> needed) and the course correct in the task context. However networking
> stack is very robust due to mechanisms like backoff, retransmit to handle
> situations like packet drops, allocation failures, congestion etc. Other
> subsystem are not that robust against ENOMEM. Once I have more detail I
> can follow up on the struct files case.

Ack. Agreed with Roman that it would be a separate followup change.

> thanks,
> Shakeel
> 
>