Message-ID: <7551924f-a9b6-4bb8-bfe9-e3efcf0da438@bytedance.com>
Date: Tue, 3 Oct 2023 20:49:08 +0800
From: Abel Wu <wuyun.abel@...edance.com>
To: Shakeel Butt <shakeelb@...gle.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Kuniyuki Iwashima <kuniyu@...zon.com>,
Breno Leitao <leitao@...ian.org>,
Alexander Mikhalitsyn <alexander@...alicyn.com>,
David Howells <dhowells@...hat.com>,
Jason Xing <kernelxing@...cent.com>,
Xin Long <lucien.xin@...il.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujtsu.com>,
"open list:NETWORKING [GENERAL]" <netdev@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: Re: [PATCH net-next 2/2] sock: Fix improper heuristic on raising
memory
On 9/24/23 3:28 PM, Shakeel Butt wrote:
> On Fri, Sep 22, 2023 at 06:10:06PM +0800, Abel Wu wrote:
> [...]
>>
>> On second thought, it is still unclear to me what role memcg
>> pressure should play in socket memory allocation. It lacks a
>> convincing design. I think the above hunk helps, but not much.
>>
>> I wonder if we should take option (3) first. Thoughts?
>>
>
> Let's take a step further. Let's decouple the memcg accounting and
> global skmem accounting. __sk_mem_raise_allocated is already very hard
> to reason about. There are a couple of heuristics in it which may or
> may not apply to both accounting infrastructures.
>
> Let's explicitly document which heuristics allow allocations to
> forcefully succeed, i.e. irrespective of pressure or over limit, for
> both accounting infras. I think decoupling them would make the flow of
> the code very clear.
I couldn't agree more.
>
> There are three heuristics:
I found all of them were first introduced in linux-2.4.0-test7pre1 for
TCP only, and then migrated to socket core in linux-2.6.8-rc1 without
functional change.
>
> 1. minimum buffer size even under pressure.
This is required by RFC 7323 (TCP Extensions for High Performance) to
make features like the Window Scale option work as expected, and it
should succeed under global pressure by tcp_{r,w}mem's definition. IMHO,
for the same reason, it should also succeed under memcg pressure, or
else workloads might suffer a performance drop due to a bottleneck on
the network.

The allocation must not succeed if it would exceed either the global or
the memcg's hard limit, or else a DoS attack could be mounted by
spawning lots of sockets that are under the minimum buffer size.
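
To make the rule above concrete, here is a rough sketch in plain
userspace C, not kernel code. The struct, its fields, and the function
name are hypothetical and are not taken from __sk_mem_raise_allocated;
it only encodes the behavior argued for above: hard limits always win,
and under pressure in either domain only sockets below their minimum
buffer size are let through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of one accounting domain (global or memcg). */
struct acct_state {
	bool under_pressure;	/* domain is under memory pressure */
	bool over_hard_limit;	/* the charge would exceed the hard limit */
};

/*
 * Heuristic 1 as described above: a socket below its minimum buffer
 * size may still allocate under global or memcg pressure, but never
 * past a hard limit in either domain.
 */
bool min_buf_allows(const struct acct_state *global,
		    const struct acct_state *memcg,
		    long sk_alloc, long sk_min)
{
	assert(global != NULL && memcg != NULL);

	/* Hard limits always win, otherwise many tiny sockets are a DoS. */
	if (global->over_hard_limit || memcg->over_hard_limit)
		return false;

	/* No pressure anywhere: nothing for the heuristic to override. */
	if (!global->under_pressure && !memcg->under_pressure)
		return true;

	/* Under pressure: only sockets below the minimum get through. */
	return sk_alloc < sk_min;
}
```

Keeping the hard-limit check first means no later heuristic can
accidentally bypass it.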
>
> 2. allow allocation for a socket whose usage is below average of the
> system.
Since 'average' is within the scope of global accounting, this one
only makes sense under global memory pressure. Actually it existed
before cgroups were born, hence it doesn't take memcg into
consideration.

OTOH, the intention of throttling under memcg pressure is to relieve
the memcg from heavy reclaim pressure, and this heuristic does not help
with that. There also seems to be no reason to let the allocation
succeed when the global or the memcg's hard limit is exceeded.
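
As a rough illustration of the scoping argued for above, the following
userspace C sketch applies the below-average heuristic only under
global pressure. All names are hypothetical. It also assumes that
"below average" means the socket's page usage multiplied by the number
of sockets stays under the global hard limit; that is my reading and
should be checked against the actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the state relevant to heuristic 2. */
struct skmem_view {
	bool global_pressure;		/* global accounting under pressure */
	bool memcg_pressure;		/* memcg accounting under pressure */
	bool over_hard_limit;		/* global or memcg hard limit exceeded */
	long global_limit_pages;	/* global hard limit, in pages */
	long nr_sockets;		/* sockets sharing the global pool */
	long sk_pages;			/* pages used by this socket */
};

/*
 * Returns true only when heuristic 2 should force the allocation to
 * succeed. It returns false when the heuristic is not in play.
 */
bool below_average_allows(const struct skmem_view *v)
{
	assert(v != NULL);

	if (v->over_hard_limit)
		return false;	/* never past a hard limit */
	if (v->memcg_pressure)
		return false;	/* 'average' is a global notion only */
	if (!v->global_pressure)
		return false;	/* no global pressure, nothing to override */

	/* Assumed meaning of "below average": usage < limit / nr_sockets. */
	return v->global_limit_pages > v->nr_sockets * v->sk_pages;
}
```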
>
> 3. socket is over its sndbuf.
TBH I don't get the point of this one.
>
> Let's discuss which heuristic applies to which accounting infra and
> under which state (under pressure or over limit).
I will follow your suggestion and post a patch that explicitly
documents the behaviors once things are clear.
Thanks,
Abel