lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <82c0a442-c7d7-d0f1-54de-7a5e7e6a31d5@bytedance.com>
Date:   Fri, 22 Sep 2023 16:36:04 +0800
From:   Abel Wu <wuyun.abel@...edance.com>
To:     Shakeel Butt <shakeelb@...gle.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Kuniyuki Iwashima <kuniyu@...zon.com>,
        Breno Leitao <leitao@...ian.org>,
        Alexander Mikhalitsyn <alexander@...alicyn.com>,
        David Howells <dhowells@...hat.com>,
        Jason Xing <kernelxing@...cent.com>,
        Xin Long <lucien.xin@...il.com>,
        KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujtsu.com>,
        "open list:NETWORKING [GENERAL]" <netdev@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: Re: [PATCH net-next 2/2] sock: Fix improper heuristic on raising
 memory

On 9/22/23 3:01 AM, Shakeel Butt wrote:
> On Wed, Sep 20, 2023 at 09:25:41PM +0800, Abel Wu wrote:
>> Before sockets became aware of net-memcg's memory pressure since
>> commit e1aab161e013 ("socket: initial cgroup code."), the memory
>> usage would be granted to raise if below average even when under
>> protocol's pressure. This provides fairness among the sockets of
>> same protocol.
>>
>> That commit changes this because the heuristic will also be
>> effective when only memcg is under pressure which makes no sense.
>> Fix this by skipping this heuristic when under memcg pressure.
>>
>> Fixes: e1aab161e013 ("socket: initial cgroup code.")
>> Signed-off-by: Abel Wu <wuyun.abel@...edance.com>
>> ---
>>   net/core/sock.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 379eb8b65562..ef5cf6250f17 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -3093,8 +3093,16 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>>   	if (sk_has_memory_pressure(sk)) {
>>   		u64 alloc;
>>   
>> -		if (!sk_under_memory_pressure(sk))
>> +		if (memcg && mem_cgroup_under_socket_pressure(memcg))
>> +			goto suppress_allocation;
>> +
>> +		if (!sk_under_global_memory_pressure(sk))
>>   			return 1;
> 
> I am onboard with replacing sk_under_memory_pressure() with
> sk_under_global_memory_pressure(). However suppressing on memcg pressure
> is a behavior change from status quo and need more thought and testing.
> 
> I think there are three options for this hunk:
> 
> 1. proposed patch
> 2. Consider memcg pressure only for !in_softirq().
> 3. Don't consider memcg pressure at all.
> 
> All three options are behavior change from the status quo but with
> different risk levels. (1) may reintroduce the regression fixed by
> 720ca52bcef22 ("net-memcg: avoid stalls when under memory pressure").

Just for the record, it is same for the current upstream implementation
if the socket reaches average usage. Taking option 2 will fix this too.

> (2) is more inlined with 720ca52bcef22. (3) has the risk to making memcg
> limits ineffective.
> 
> IMHO we should go with (2) as there is already a precedence in
> 720ca52bcef22.

Yes, I agree. Actually applying option(2) would make this patch quite
similar to the previous version[a], except the below part:

  	/* Under limit. */
  	if (allocated <= sk_prot_mem_limits(sk, 0)) {
  		sk_leave_memory_pressure(sk);
-		return 1;
+		if (!under_memcg_pressure)
+			return 1;
  	}

My original thought is to inherit the behavior of tcpmem pressure.
There are also 3 levels of memcg pressure named low/medium/critical,
but considering that the 'low' level is too much conservative for
socket allocation, I made the following match:

	PROTOCOL	MEMCG		ACTION
	-----------------------------------------------------
	low		<medium		allow allocation
	pressure	medium		be more conservative
	high		critical	throttle

which also seems align with the design[b] of memcg pressure. Anyway
I will take option (2) and post v2.

Thanks & Best,
	Abel

[a] 
https://lore.kernel.org/lkml/20230901062141.51972-4-wuyun.abel@bytedance.com/
[b] 
https://docs.kernel.org/admin-guide/cgroup-v1/memory.html#memory-pressure

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ