lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <71ac08d3-9f36-e0de-870e-3e252abcb66a@bytedance.com>
Date: Fri, 22 Sep 2023 18:10:06 +0800
From: Abel Wu <wuyun.abel@...edance.com>
To: Shakeel Butt <shakeelb@...gle.com>
Cc: "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, Kuniyuki Iwashima <kuniyu@...zon.com>,
 Breno Leitao <leitao@...ian.org>,
 Alexander Mikhalitsyn <alexander@...alicyn.com>,
 David Howells <dhowells@...hat.com>, Jason Xing <kernelxing@...cent.com>,
 Xin Long <lucien.xin@...il.com>,
 KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujtsu.com>,
 "open list:NETWORKING [GENERAL]" <netdev@...r.kernel.org>,
 open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next 2/2] sock: Fix improper heuristic on raising
 memory

On 9/22/23 4:36 PM, Abel Wu wrote:
> On 9/22/23 3:01 AM, Shakeel Butt wrote:
>> On Wed, Sep 20, 2023 at 09:25:41PM +0800, Abel Wu wrote:
>>> Before sockets became aware of net-memcg's memory pressure since
>>> commit e1aab161e013 ("socket: initial cgroup code."), the memory
>>> usage would be granted to raise if below average even when under
>>> protocol's pressure. This provides fairness among the sockets of
>>> same protocol.
>>>
>>> That commit changes this because the heuristic will also be
>>> effective when only memcg is under pressure which makes no sense.
>>> Fix this by skipping this heuristic when under memcg pressure.
>>>
>>> Fixes: e1aab161e013 ("socket: initial cgroup code.")
>>> Signed-off-by: Abel Wu <wuyun.abel@...edance.com>
>>> ---
>>>   net/core/sock.c | 10 +++++++++-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/core/sock.c b/net/core/sock.c
>>> index 379eb8b65562..ef5cf6250f17 100644
>>> --- a/net/core/sock.c
>>> +++ b/net/core/sock.c
>>> @@ -3093,8 +3093,16 @@ int __sk_mem_raise_allocated(struct sock *sk, 
>>> int size, int amt, int kind)
>>>       if (sk_has_memory_pressure(sk)) {
>>>           u64 alloc;
>>> -        if (!sk_under_memory_pressure(sk))
>>> +        if (memcg && mem_cgroup_under_socket_pressure(memcg))
>>> +            goto suppress_allocation;
>>> +
>>> +        if (!sk_under_global_memory_pressure(sk))
>>>               return 1;
>>
>> I am onboard with replacing sk_under_memory_pressure() with
>> sk_under_global_memory_pressure(). However suppressing on memcg pressure
>> is a behavior change from status quo and need more thought and testing.
>>
>> I think there are three options for this hunk:
>>
>> 1. proposed patch
>> 2. Consider memcg pressure only for !in_softirq().
>> 3. Don't consider memcg pressure at all.
>>
>> All three options are behavior change from the status quo but with
>> different risk levels. (1) may reintroduce the regression fixed by
>> 720ca52bcef22 ("net-memcg: avoid stalls when under memory pressure").
> 
> Just for the record, it is same for the current upstream implementation
> if the socket reaches average usage. Taking option 2 will fix this too.
> 
>> (2) is more inlined with 720ca52bcef22. (3) has the risk to making memcg
>> limits ineffective.
>>
>> IMHO we should go with (2) as there is already a precedence in
>> 720ca52bcef22.
> 
> Yes, I agree. Actually applying option(2) would make this patch quite
> similar to the previous version[a], except the below part:
> 
>       /* Under limit. */
>       if (allocated <= sk_prot_mem_limits(sk, 0)) {
>           sk_leave_memory_pressure(sk);
> -        return 1;
> +        if (!under_memcg_pressure)
> +            return 1;
>       }

After a second thought, it is still vague to me about the position
the memcg pressure should be in socket memory allocation. It lacks
convincing design. I think the above hunk helps, but not much.

I wonder if we should take option (3) first. Thoughts?

Thanks,
	Abel

> 
> My original thought is to inherit the behavior of tcpmem pressure.
> There are also 3 levels of memcg pressure named low/medium/critical,
> but considering that the 'low' level is too much conservative for
> socket allocation, I made the following match:
> 
>      PROTOCOL    MEMCG        ACTION
>      -----------------------------------------------------
>      low        <medium        allow allocation
>      pressure    medium        be more conservative
>      high        critical    throttle
> 
> which also seems align with the design[b] of memcg pressure. Anyway
> I will take option (2) and post v2.
> 
> Thanks & Best,
>      Abel
> 
> [a] 
> https://lore.kernel.org/lkml/20230901062141.51972-4-wuyun.abel@bytedance.com/
> [b] 
> https://docs.kernel.org/admin-guide/cgroup-v1/memory.html#memory-pressure

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ