lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8c6a71aaaabc0a8ea4c36ce609cb097857b68a96.camel@redhat.com>
Date:   Thu, 19 Oct 2023 10:02:12 +0200
From:   Paolo Abeni <pabeni@...hat.com>
To:     Abel Wu <wuyun.abel@...edance.com>,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Shakeel Butt <shakeelb@...gle.com>
Cc:     netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v2 3/3] sock: Fix improper heuristic on raising
 memory

On Mon, 2023-10-16 at 21:28 +0800, Abel Wu wrote:
> Before sockets became aware of net-memcg's memory pressure since
> commit e1aab161e013 ("socket: initial cgroup code."), the memory
> usage would be granted to raise if below average even when under
> protocol's pressure. This provides fairness among the sockets of
> same protocol.
> 
> That commit changes this because the heuristic will also be
> effective when only memcg is under pressure which makes no sense.
> Fix this by reverting to the behavior before that commit.
> 
> After this fix, __sk_mem_raise_allocated() no longer considers
> memcg's pressure. As memcgs are isolated from each other w.r.t.
> memory accounting, consuming one's budget won't affect others.
> So except the places where buffer sizes are needed to be tuned,
> allow workloads to use the memory they are provisioned.
> 
> Fixes: e1aab161e013 ("socket: initial cgroup code.")
> Signed-off-by: Abel Wu <wuyun.abel@...edance.com>
> ---
> v2:
>   - Ignore memcg pressure when raising memory allocated.
> ---
>  net/core/sock.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 9f969e3c2ddf..1d28e3e87970 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -3035,7 +3035,13 @@ EXPORT_SYMBOL(sk_wait_data);
>   *	@amt: pages to allocate
>   *	@kind: allocation type
>   *
> - *	Similar to __sk_mem_schedule(), but does not update sk_forward_alloc
> + *	Similar to __sk_mem_schedule(), but does not update sk_forward_alloc.
> + *
> + *	Unlike the globally shared limits among the sockets under same protocol,
> + *	consuming the budget of a memcg won't have direct effect on other ones.
> + *	So be optimistic about memcg's tolerance, and leave the callers to decide
> + *	whether or not to raise allocated through sk_under_memory_pressure() or
> + *	its variants.
>   */
>  int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>  {
> @@ -3093,7 +3099,11 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>  	if (sk_has_memory_pressure(sk)) {
>  		u64 alloc;
>  
> -		if (!sk_under_memory_pressure(sk))
> +		/* The following 'average' heuristic is within the
> +		 * scope of global accounting, so it only makes
> +		 * sense for global memory pressure.
> +		 */
> +		if (!sk_under_global_memory_pressure(sk))
>  			return 1;

Since the whole logic is fairly non trivial I'd like to explicitly note
(for my own future memory) that I think this is the correct approach. 

The memcg granted the current allocation via the
mem_cgroup_charge_skmem() call above, the heuristic to eventually
suppress the allocation should be outside the memcg scope.

LGTM, thanks!

Paolo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ