Message-ID: <20241007163507.647617031@linutronix.de>
Date: Mon,  7 Oct 2024 18:49:51 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: LKML <linux-kernel@...r.kernel.org>
Cc: Zhen Lei <thunder.leizhen@...wei.com>,
 Waiman Long <longman@...hat.com>
Subject: [patch 00/25] debugobjects: Rework object handling

Zhen reported that the global lock in debug objects is problematic. There
are several issues:

    1) Parallel pool refill attempts result in long wait times

    2) The operations under the lock move batches of objects by moving
       them one by one from one list to another

       This takes quite some time and is also a cache line dirtying
       festival.

For further context see:

  https://lore.kernel.org/all/20240904133944.2124-1-thunder.leizhen@huawei.com
  https://lore.kernel.org/all/20240904134152.2141-1-thunder.leizhen@huawei.com

Address this with the following changes:

    1) Avoid parallel pool refills unless the fill level is critical

    2) Release and reacquire the pool lock between batches in the worker
       thread.

    3) Convert the pool handling to a stack of batches which can be moved
       with trivial hlist operations, which are fast and do not touch a
       gazillion cache lines

While working on this, I noticed that the kmem_cache allocation/free rate
is rather high. This is addressed by:

    1) Doubling the per CPU pool size

    2) Aggressively refilling the per CPU pool from the free list

    3) Throttling the kmem_cache_free() operations by monitoring the object
       usage with an exponentially weighted moving average

The resulting reduction for a full kernel compile:

        kmem_cache_alloc()   kmem_cache_free()
  Base: 380k                 330k
  #1:   295k                 245k
  #2:   225k                 245k
  #3:   170k                 117k

Especially the reduction of allocations makes a difference, as those happen
in the hot path.

There are further possibilities to enhance this:

    1) Move the lock into the new global pool data structure
    2) Provide a per-node "global" pool which is brought up
       before the first CPU of a node is brought up

That's left as an exercise for the reader. :)

The series has incorporated the latest changes from Zhen:

  https://lore.kernel.org/all/20240911083521.2257-1-thunder.leizhen@huawei.com

to avoid conflicts.

It is based on v6.12-rc1 and also available from git:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git core/debugobjects

Thanks,

	tglx
---
 include/linux/debugobjects.h |   12 
 lib/debugobjects.c           |  866 ++++++++++++++++++++++++-------------------
 2 files changed, 503 insertions(+), 375 deletions(-)
