linux-kernel - Re: Unbounded growth of slab caches and how to shrink them

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Wed, 29 Jun 2016 17:17:51 +0300
From:	Nikolay Borisov <kernel@...p.com>
To:	Christoph Lameter <cl@...ux.com>
Cc:	"Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>
Subject: Re: Unbounded growth of slab caches and how to shrink them



On 06/29/2016 05:00 PM, Christoph Lameter wrote:
> On Wed, 29 Jun 2016, Nikolay Borisov wrote:
> 
>> I've observed a rather strange unbounded growth of the kmalloc-192
>> slab cache:
>>
>> OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>> 711124869 411527215   3%    0.19K 16934908       42 135479264K kmalloc-192
>>
>> Essentially the kmalloc is around 130 GB , yet only 3 percent of this are
>> being used. In this case I'd like to essentially shrink the overall size
>> of the cache. How is it possible to achieve that? I tried echoing '1'
>> to /sys/kernel/slab/kmalloc-192/shrink but nothing changed.
> 
> Ok this probably means that most slabs have just a few or one objects?
> Some workloads can result in situations like that. Can you enable
> debugging and get a list of functions where these objects are allocated?

Right, so what debugging concretely do you have in mind. So far what I
did was reboot the machine with SLUB merging disabled, since there are
quite a lot of slabs being merged into that particular one:

:t-0000192   <- cred_jar pid_3 inet_peer_cache request_sock_TCPv6
kmalloc-192 file_lock_cache bio-0 ip_dst_cache key_jar

I'm quite sure it's likely it's one of the either networking or bio-0
slab cache, since the others seems generally not very used.

> 
>> This is on 3.12 which is rather old kernel, but still I believe it is
>> entirely possible for someone to find a way to flood a machine with
>> network requests which would cause a lot of objects to be allocate,
>> resulting in a particular slab cache growing, then later when the request
>> flood stops the cache would be almost empty, yet the memory won't be usable
>> for anything other than satisfying memory allocation from this cache.
> 
> True. Long known problem and all my attempts to facilitate a solution here
> did not go anywhere. The essential solution would require objects being
> movable or removable from the sparsely allocated page frames. And this
> goes way beyond my subsystem.
> 
> If you can figure out which subsystem allocates or frees these objects
> (through the call traces) then we may find a knob in the subsystem to
> clear those out once in a while.
> 
>