lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 16 Apr 2021 10:14:39 +0530
From:   Bharata B Rao <bharata@...ux.ibm.com>
To:     Dave Chinner <david@...morbit.com>
Cc:     akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
        aneesh.kumar@...ux.ibm.com
Subject: Re: High kmalloc-32 slab cache consumption with 10k containers

On Wed, Apr 07, 2021 at 08:28:07AM +1000, Dave Chinner wrote:
> On Mon, Apr 05, 2021 at 11:18:48AM +0530, Bharata B Rao wrote:
> 
> > As an alternative approach, I have this below hack that does lazy
> > list_lru creation. The memcg-specific list is created and initialized
> > only when there is a request to add an element to that particular
> > list. Though I am not sure about the full impact of this change
> > on the owners of the lists and also the performance impact of this,
> > the overall savings look good.
> 
> Avoiding memory allocation in list_lru_add() was one of the main
> reasons for up-front static allocation of memcg lists. We cannot do
> memory allocation while callers are holding multiple spinlocks in
> core system algorithms (e.g. dentry_kill -> retain_dentry ->
> d_lru_add -> list_lru_add), let alone while holding an internal
> spinlock.
> 
> Putting a GFP_ATOMIC allocation inside 3-4 nested spinlocks in a
> path we know might have memory demand in the *hundreds of GB* range
> gets an NACK from me. It's a great idea, but it's just not a

I do understand that GFP_ATOMIC allocations are really not preferrable
but want to point out that the allocations in the range of hundreds of
GBs get reduced to tens of MBs when we do lazy list_lru head allocations
under GFP_ATOMIC.

As shown earlier, this is what I see in my experimental setup with
10k containers:

Number of kmalloc-32 allocations
		Before		During		After
W/o patch	178176		3442409472	388933632
W/  patch	190464		468992		468992

So 3442409472*32=102GB upfront list_lru creation-time GFP_KERNEL allocations
get reduced to 468992*32=14MB dynamic list_lru addtion-time GFP_ATOMIC
allocations.

This does really depend and vary on the type of the container and
the number of mounts it does, but I suspect we are looking
at GFP_ATOMIC allocations in the MB range. Also the number of
GFP_ATOMIC slab allocation requests matter I suppose.

There are other users of list_lru, but I was just looking at
dentry and inode list_lru usecase. It appears to me that for both
dentry and inode, we can tolerate the failure from list_lru_add
due to GFP_ATOMIC allocation failure. The failure to add dentry
or inode to the lru list means that they won't be retained in
the lru list, but would be freed immediately. Is this understanding
correct?

If so, would that likely impact the subsequent lookups adversely?
We failed to retain a dentry or inode in the lru list because
we failed to allocate memory, presumably under memory pressure.
Even in such a scenario, is failure to add dentry or inode to
lru list so bad and not tolerable? 

Regards,
Bharata.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ