Message-ID: <5773D88F.1030005@kyup.com>
Date:	Wed, 29 Jun 2016 17:17:51 +0300
From:	Nikolay Borisov <kernel@...p.com>
To:	Christoph Lameter <cl@...ux.com>
Cc:	"Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>
Subject: Re: Unbounded growth of slab caches and how to shrink them



On 06/29/2016 05:00 PM, Christoph Lameter wrote:
> On Wed, 29 Jun 2016, Nikolay Borisov wrote:
> 
>> I've observed a rather strange unbounded growth of the kmalloc-192
>> slab cache:
>>
>>      OBJS    ACTIVE  USE OBJ SIZE    SLABS OBJ/SLAB CACHE SIZE NAME
>> 711124869 411527215   3%    0.19K 16934908       42 135479264K kmalloc-192
>>
>> Essentially the kmalloc-192 cache is around 130 GB, yet only 3 percent of
>> it is being used. In this case I'd like to shrink the overall size of the
>> cache. How is it possible to achieve that? I tried echoing '1' to
>> /sys/kernel/slab/kmalloc-192/shrink but nothing changed.
> 
> OK, this probably means that most slabs hold just one or a few objects?
> Some workloads can result in situations like that. Can you enable
> debugging and get a list of functions where these objects are allocated?

Right, so what debugging do you have in mind, concretely? So far I have
rebooted the machine with SLUB merging disabled, since quite a lot of
caches are merged into that particular one:

:t-0000192   <- cred_jar pid_3 inet_peer_cache request_sock_TCPv6
kmalloc-192 file_lock_cache bio-0 ip_dst_cache key_jar
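
A minimal sketch of how an alias list like the one above can be derived, assuming merged caches show up under /sys/kernel/slab as symlinks pointing at a shared ":t-..." directory. The function name merged_slab_aliases and its slab_root parameter are made up for illustration:

```python
import os
from collections import defaultdict


def merged_slab_aliases(slab_root="/sys/kernel/slab"):
    """Group cache names by the directory their symlink resolves to.

    Assumption: each merged cache is a symlink to a shared directory
    (e.g. ":t-0000192"); unmerged caches are plain directories and
    are skipped here.
    """
    groups = defaultdict(list)
    for name in sorted(os.listdir(slab_root)):
        path = os.path.join(slab_root, name)
        if os.path.islink(path):
            # Use the final path component of the target as the group key.
            target = os.path.basename(os.path.realpath(path))
            groups[target].append(name)
    return dict(groups)
```

Calling merged_slab_aliases() and printing the entry that contains kmalloc-192 should give roughly the same information as the list above. Reading /sys/kernel/slab may require root.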

I'm fairly sure it is either one of the networking caches or the bio-0
cache, since the others generally seem to be lightly used.
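
One possible way to narrow this down, sketched under assumptions: if the kernel is booted with SLUB user tracking enabled (for example slub_debug=U), each cache directory exposes an alloc_calls file listing allocation call sites with a count. The sketch below assumes each line starts with a count followed by a symbol, and ranks the call sites by count. The helper name top_alloc_sites is hypothetical, and the exact line layout may differ between kernel versions:

```python
import re

# Assumed line layout: "<count> <symbol+offset/size> age=... pid=... cpus=..."
_LINE = re.compile(r"^\s*(\d+)\s+(\S+)")


def top_alloc_sites(text, limit=10):
    """Return up to `limit` (count, call_site) pairs, largest count first."""
    sites = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            sites.append((int(match.group(1)), match.group(2)))
    # Sort by count only, so ties keep their original order.
    sites.sort(key=lambda site: site[0], reverse=True)
    return sites[:limit]
```

It could be fed the contents of /sys/kernel/slab/kmalloc-192/alloc_calls (or of the individual unmerged caches) read as a string. The call sites with the largest counts are the first candidates to look at.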

> 
>> This is on 3.12, which is a rather old kernel, but I still believe it is
>> entirely possible for someone to find a way to flood a machine with
>> network requests which would cause a lot of objects to be allocated,
>> resulting in a particular slab cache growing. Later, when the request
>> flood stops, the cache would be almost empty, yet the memory would not be
>> usable for anything other than satisfying allocations from this cache.
> 
> True. Long known problem and all my attempts to facilitate a solution here
> did not go anywhere. The essential solution would require objects being
> movable or removable from the sparsely allocated page frames. And this
> goes way beyond my subsystem.
> 
> If you can figure out which subsystem allocates or frees these objects
> (through the call traces) then we may find a knob in the subsystem to
> clear those out once in a while.
> 
> 
