Message-ID: <a21841c4-6d40-42cb-88cb-eee5f80ccf11@suse.cz>
Date: Thu, 29 Aug 2024 19:39:32 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>, David Rientjes <rientjes@...gle.com>,
Hyeonggon Yoo <42.hyeyoo@...il.com>, Eric Dumazet <edumazet@...gle.com>,
"David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Meta kernel team <kernel-team@...a.com>,
cgroups@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH v2] memcg: add charging of already allocated slab objects
On 8/29/24 18:10, Shakeel Butt wrote:
> On Thu, Aug 29, 2024 at 11:42:10AM GMT, Vlastimil Babka wrote:
>> On 8/28/24 01:52, Shakeel Butt wrote:
>> > At the moment, slab objects are charged to the memcg at allocation
>> > time. However, there are cases where slab objects are allocated at a
>> > time when the right target memcg to charge them to is not known. One
>> > such case is network sockets for incoming connections, which are
>> > allocated in softirq context.
>> >
>> > A couple hundred thousand connections are very normal on a large
>> > loaded server, and almost all of the sockets underlying those
>> > connections get allocated in softirq context and are thus not charged
>> > to any memcg. However, later at accept() time we know the right target
>> > memcg to charge. Let's add a new API to charge already allocated
>> > objects, so we can have better accounting of the memory usage.
>> >
>> > To measure the performance impact of this change, tcp_crr from the
>> > neper [1] performance suite is used. It is essentially a network
>> > ping-pong test with a new connection for each ping-pong.
>> >
>> > The server and the client are run inside a 3-level cgroup hierarchy
>> > using the following commands:
>> >
>> > Server:
>> > $ tcp_crr -6
>> >
>> > Client:
>> > $ tcp_crr -6 -c -H ${server_ip}
>> >
>> > If the client and server run on different machines with a 50 Gbps
>> > NIC, there is no visible impact from the change.
>> >
>> > For the same-machine experiment, v6.11-rc5 is used as the base.
>> >
>> > base (throughput) with-patch
>> > tcp_crr 14545 (+- 80) 14463 (+- 56)
>> >
>> > It seems like the performance impact is within the noise.
>> >
>> > Link: https://github.com/google/neper [1]
>> > Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
>> > ---
>> > v1: https://lore.kernel.org/all/20240826232908.4076417-1-shakeel.butt@linux.dev/
>> > Changes since v1:
>> > - Correctly handle large allocations which bypass slab
>> > - Rearrange code to avoid compilation errors for !CONFIG_MEMCG builds
>> >
>> > RFC: https://lore.kernel.org/all/20240824010139.1293051-1-shakeel.butt@linux.dev/
>> > Changes since the RFC:
>> > - Added check for already charged slab objects.
>> > - Added performance results from neper's tcp_crr
>> >
>> > include/linux/slab.h | 1 +
>> > mm/slub.c | 51 +++++++++++++++++++++++++++++++++
>> > net/ipv4/inet_connection_sock.c | 5 ++--
>> > 3 files changed, 55 insertions(+), 2 deletions(-)
>>
>> I can take the v3 in the slab tree, if net people ack?
>
> Thanks.
>
>>
>> BTW, will this also be useful for Linus's idea of charging struct files
>> only after they exist? But IIRC there was also supposed to be a part
>> where we have a way to quickly determine that we're not over the limit
>> (while allowing some overcharge to make it quicker).
>>
>
> Do you have a link to those discussions or pointers to the code? From
> what you have described, I think this should work. We have the relevant
> gfp flags to control the charging behavior (with some caveats).
I think this was the last part of the discussion, and in the cover letter
of that there are links to the older threads for more context:
https://lore.kernel.org/all/CAHk-%3DwhgFtbTxCAg2CWQtDj7n6CEyzvdV1wcCj2qpMfpw0%3Dm1A@mail.gmail.com/
>> Because right now this just overcharges unconditionally, but that's
>> understandable when the irq context creating the socket can't know the memcg
>> upfront. In the open() situation this is different.
>>
>
> For networking we deliberately overcharge in irq context (if needed)
> and then course correct in task context. However, the networking stack
> is very robust due to mechanisms like backoff and retransmit to handle
> situations like packet drops, allocation failures, congestion etc.
> Other subsystems are not that robust against ENOMEM. Once I have more
> detail I can follow up on the struct files case.
Ack. Agreed with Roman that it would be a separate followup change.
> thanks,
> Shakeel
>
>
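[Editor's note] The charge-after-allocation flow discussed above can be
sketched in userspace C. This is only an illustrative model of the pattern
described in the commit message (allocate uncharged in softirq context,
charge to the right memcg at accept() time, skip already-charged objects);
the struct layouts and function names here are assumptions for the sketch,
not the actual kernel API from the patch.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins for kernel structures; fields are assumptions. */
struct memcg {
	long charged_bytes;
};

struct sock_obj {
	struct memcg *memcg;	/* NULL => not yet charged to any memcg */
	size_t size;
};

/*
 * Allocation in softirq context: the right target memcg is not known
 * yet, so the object starts out uncharged.
 */
static struct sock_obj *alloc_sock_uncharged(size_t size)
{
	struct sock_obj *sk = calloc(1, sizeof(*sk));

	if (sk)
		sk->size = size;
	return sk;
}

/*
 * Later, at accept() time, charge the already-allocated object to the
 * accepting task's memcg. Objects that were already charged (the check
 * added since the RFC) are left alone.
 */
static bool sock_charge(struct sock_obj *sk, struct memcg *memcg)
{
	if (sk->memcg)		/* already charged: nothing to do */
		return true;
	memcg->charged_bytes += sk->size;
	sk->memcg = memcg;
	return true;
}
```

A caller in this model would allocate the socket object with
alloc_sock_uncharged() in the softirq path and invoke sock_charge() once
the accepting task's memcg is known; calling it twice is harmless.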