Message-Id: <20190520141450.7575-1-longman@redhat.com>
Date:   Mon, 20 May 2019 10:14:45 -0400
From:   Waiman Long <longman@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     linux-kernel@...r.kernel.org,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        "Joel Fernandes (Google)" <joel@...lfernandes.org>,
        Qian Cai <cai@....us>, Zhong Jiang <zhongjiang@...wei.com>,
        Waiman Long <longman@...hat.com>
Subject: [PATCH 0/5] debugobjects: Reduce global pool_lock contention

Many large workloads require the kernel to acquire a lot of objects
and then free them when the work is done. When the debug objects code
is enabled, this causes a lot of debug objects to be allocated and
then freed later on. For instance, after a kernel boot and 3 parallel
kernel builds, the partial output of the debug objects stats file was:

  pool_free     :3082
  pool_min_free :498
  pool_used     :108488
  pool_max_used :170127
  on_free_list  :0
  objs_allocated:34954917
  objs_freed    :34844371

It can be seen that a lot of debug objects were allocated and freed
during those operations. All these debug object allocation and free
operations require grabbing the global pool_lock. On systems with many
CPUs, the contention on this single global lock can become one of the
bottlenecks.

This patchset tries to reduce the level of contention on this global
pool_lock by the following means:
 1) Add a percpu free object pool to serve as a cache so that object
    allocation and freeing can be done without acquiring pool_lock when
    the percpu pool is not empty or full.
 2) Batch up multiple operations in a single lock/unlock critical
    section to reduce the number of times the pool_lock has to be
    acquired.
 3) Make the actual freeing of the debug objects via the workqueue less
    aggressive to minimize the actual number of slab allocation and
    freeing calls.
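
To illustrate points 1) and 2), here is a minimal userspace sketch, not
the code from lib/debugobjects.c. It assumes a pthread mutex standing in
for pool_lock and a thread-local list standing in for the percpu pool;
the names PCP_CAPACITY, BATCH, POOL_SIZE, obj_alloc() and obj_free() are
hypothetical. The global lock is only taken when the local cache is empty
(refill) or full (flush), and each such acquisition moves a whole batch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define PCP_CAPACITY 8   /* hypothetical size of the per-thread cache */
#define BATCH        4   /* hypothetical refill/flush batch size */
#define POOL_SIZE    64  /* hypothetical size of the global pool */

struct obj { struct obj *next; };

static struct obj storage[POOL_SIZE];
static struct obj *global_free;               /* protected by pool_lock */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long lock_acquisitions;       /* protected by pool_lock */

/* Stand-in for a percpu pool: one private free list per thread. */
static _Thread_local struct obj *pcp_free;
static _Thread_local int pcp_count;

/* Reset the global pool and the calling thread's cache. */
void pool_init(void)
{
    global_free = NULL;
    for (int i = 0; i < POOL_SIZE; i++) {
        storage[i].next = global_free;
        global_free = &storage[i];
    }
    lock_acquisitions = 0;
    pcp_free = NULL;
    pcp_count = 0;
}

struct obj *obj_alloc(void)
{
    if (pcp_count == 0) {
        /* Cache empty: refill a whole batch under one lock acquisition. */
        pthread_mutex_lock(&pool_lock);
        lock_acquisitions++;
        for (int i = 0; i < BATCH && global_free; i++) {
            struct obj *o = global_free;
            global_free = o->next;
            o->next = pcp_free;
            pcp_free = o;
            pcp_count++;
        }
        pthread_mutex_unlock(&pool_lock);
    }
    if (!pcp_free)
        return NULL;                          /* global pool exhausted */

    /* Fast path: pop from the local cache without touching pool_lock. */
    struct obj *o = pcp_free;
    pcp_free = o->next;
    pcp_count--;
    return o;
}

void obj_free(struct obj *o)
{
    assert(o != NULL);
    if (pcp_count == PCP_CAPACITY) {
        /* Cache full: flush a whole batch under one lock acquisition. */
        pthread_mutex_lock(&pool_lock);
        lock_acquisitions++;
        for (int i = 0; i < BATCH; i++) {
            struct obj *victim = pcp_free;
            pcp_free = victim->next;
            victim->next = global_free;
            global_free = victim;
            pcp_count--;
        }
        pthread_mutex_unlock(&pool_lock);
    }
    /* Fast path: push onto the local cache without touching pool_lock. */
    o->next = pcp_free;
    pcp_free = o;
    pcp_count++;
}
```

In this sketch, allocating 8 objects on a fresh pool takes the lock twice
(two refills of 4), and freeing those 8 objects again takes it zero times
because the local cache never exceeds its capacity.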

In addition, this patchset also tries to move the printk() call out
of the raw db->lock critical section to reduce the lock hold time as
much as possible.

The times to do a parallel kernel build on a 2-socket 36-core 72-thread
Haswell system, without and with these changes, were:

   Kernel         Elapsed time      System time
   ------         ------------      -----------
   Pre-patch        4m51.01s         83m11.53s
   Post-patch       4m47.45s         80m25.78s

The post-patch partial debug objects stats file for the same operations
was:

  pool_free     :5901
  pool_pcp_free :3742
  pool_min_free :1022
  pool_used     :104805
  pool_max_used :168081
  on_free_list  :0
  objs_allocated:5796864
  objs_freed    :5687182
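
For readers who want the relative changes, the figures quoted above can
be reduced with a few lines of arithmetic. This is only a sketch over the
numbers in this message; the helper names to_seconds(), pct_reduction()
and print_summary() are made up for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a "XmY.YYs" time into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Percentage reduction going from 'before' to 'after'. */
static double pct_reduction(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

/* Print the relative changes for the numbers quoted in this message. */
void print_summary(void)
{
    double elapsed = pct_reduction(to_seconds(4, 51.01), to_seconds(4, 47.45));
    double systime = pct_reduction(to_seconds(83, 11.53), to_seconds(80, 25.78));
    double alloc_ratio = 34954917.0 / 5796864.0;  /* objs_allocated, pre/post */

    printf("elapsed time reduction: %.2f%%\n", elapsed);
    printf("system time reduction:  %.2f%%\n", systime);
    printf("objs_allocated ratio:   %.2fx\n", alloc_ratio);
}
```

With these inputs the elapsed time drops by a bit over 1%, the system
time by a bit over 3%, and the objs_allocated counter is roughly 6 times
smaller post-patch.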

Waiman Long (5):
  debugobjects: Add percpu free pools
  debugobjects: Percpu pool lookahead freeing/allocation
  debugobjects: Reduce number of pool_lock acquisitions in fill_pool()
  debugobjects: Less aggressive freeing of excess debug objects
  debugobjects: Move printk out of db lock critical sections

 lib/debugobjects.c | 308 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 252 insertions(+), 56 deletions(-)

-- 
2.18.1
