Message-Id: <1480713290-4678-1-git-send-email-longman@redhat.com>
Date:   Fri,  2 Dec 2016 16:14:47 -0500
From:   Waiman Long <longman@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     linux-kernel@...r.kernel.org,
        "Du Changbin" <changbin.du@...el.com>,
        Christian Borntraeger <borntraeger@...ibm.com>,
        Jan Stancek <jstancek@...hat.com>,
        Waiman Long <longman@...hat.com>
Subject: [PATCH v2 0/3] debugobjects: Reduce global pool_lock contention

 v1->v2:
  - Move patch 2 in front of patch 1.
  - Fix merge conflict with linux-next.

This patchset aims to reduce contention on the global pool_lock
while also improving performance. It is intended to resolve the
following soft lockup seen with a debug kernel on some large SMP
systems:

 NMI watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [rcuos/1:21]
 ...
 RIP: 0010:[<ffffffff817c216b>]  [<ffffffff817c216b>]
	_raw_spin_unlock_irqrestore+0x3b/0x60
 ...
 Call Trace:
  [<ffffffff813f40d1>] free_object+0x81/0xb0
  [<ffffffff813f4f33>] debug_check_no_obj_freed+0x193/0x220
  [<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
  [<ffffffff81284996>] ? file_free_rcu+0x36/0x60
  [<ffffffff81251712>] kmem_cache_free+0xd2/0x380
  [<ffffffff81284960>] ? fput+0x90/0x90
  [<ffffffff81284996>] file_free_rcu+0x36/0x60
  [<ffffffff81124c23>] rcu_nocb_kthread+0x1b3/0x550
  [<ffffffff81124b71>] ? rcu_nocb_kthread+0x101/0x550
  [<ffffffff81124a70>] ? sync_exp_work_done.constprop.63+0x50/0x50
  [<ffffffff810c59d1>] kthread+0x101/0x120
  [<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
  [<ffffffff817c2d32>] ret_from_fork+0x22/0x50

On an 8-socket IvyBridge-EX system (120 cores, 240 threads), the
elapsed time of a 4.9-rc7 kernel parallel build (make -j 240) was
reduced from 7m57s to 7m19s with a patched 4.9-rc7 kernel. The
number of debug objects allocated from or freed to the kmem_cache
during the kernel build also dropped by about 10X.
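
To put the build-time numbers above in relative terms, here is a small
sketch of the arithmetic. The helper names (to_seconds,
relative_reduction) are made up for illustration and are not part of
the patchset; only the two elapsed times come from the measurement
quoted above.

```python
def to_seconds(elapsed: str) -> int:
    """Convert an 'XmYs' string such as '7m57s' to whole seconds."""
    minutes, _, rest = elapsed.partition("m")
    return int(minutes) * 60 + int(rest.rstrip("s"))


def relative_reduction(before: str, after: str) -> float:
    """Fraction of the original elapsed time that was saved."""
    b = to_seconds(before)
    a = to_seconds(after)
    return (b - a) / b


# Elapsed times quoted in the cover letter (unpatched vs. patched).
saved = to_seconds("7m57s") - to_seconds("7m19s")   # 477 - 439 seconds
pct = relative_reduction("7m57s", "7m19s") * 100

print(f"{saved}s saved, {pct:.1f}% shorter build")
```

That works out to 38 seconds saved, or roughly an 8% shorter build on
this particular machine and workload.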

Waiman Long (3):
  debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done
  debugobjects: Scale thresholds with # of CPUs
  debugobjects: Reduce contention on the global pool_lock

 lib/debugobjects.c | 57 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 45 insertions(+), 12 deletions(-)

-- 
1.8.3.1
