Message-Id: <20190520141450.7575-1-longman@redhat.com>
Date: Mon, 20 May 2019 10:14:45 -0400
From: Waiman Long <longman@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org,
Yang Shi <yang.shi@...ux.alibaba.com>,
"Joel Fernandes (Google)" <joel@...lfernandes.org>,
Qian Cai <cai@....us>, Zhong Jiang <zhongjiang@...wei.com>,
Waiman Long <longman@...hat.com>
Subject: [PATCH 0/5] debugobjects: Reduce global pool_lock contention
Many large workloads require the kernel to acquire a lot of objects
and then free them when the work is done. When the debug objects code
is configured, this will cause a lot of debug objects to be allocated and
then freed later on. For instance, after a kernel boot up and 3 parallel
kernel builds, the partial output of the debug objects stats file was:
pool_free :3082
pool_min_free :498
pool_used :108488
pool_max_used :170127
on_free_list :0
objs_allocated:34954917
objs_freed :34844371
It can be seen that a lot of debug objects were allocated and freed
during those operations. All these debug object allocation and free
operations require grabbing the global pool_lock. On systems with many
CPUs, the contention on this single global lock can become one of the
bottlenecks.
This patchset tries to reduce the level of contention on this global
pool_lock by the following means:
1) Add a percpu free object pool to serve as a cache so that object
allocation and freeing can be done without acquiring pool_lock when
the percpu pool is not empty (for allocation) or not full (for freeing).
2) Batch up multiple operations in a single lock/unlock critical
section to reduce the number of times the pool_lock has to be
acquired.
3) Make the actual freeing of the debug objects via the workqueue less
aggressive to minimize the actual number of slab allocation and
freeing calls.
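
To illustrate points 1) and 2), here is a minimal userspace sketch of a
per-CPU cache sitting in front of a global pool. It is not the code in
lib/debugobjects.c: every name and size in it (PCP_POOL_SIZE, BATCH_SIZE,
obj_alloc(), obj_free(), the lock_acquisitions counter that stands in for
taking pool_lock) is an assumption made for the example.

```c
#include <stddef.h>

/* Illustrative sizes only; not the values used by the patchset. */
#define PCP_POOL_SIZE 8   /* capacity of one per-CPU cache */
#define BATCH_SIZE    4   /* objects moved per global-lock acquisition */
#define GLOBAL_OBJS   64  /* objects preallocated in the global pool */

struct obj  { struct obj *next; };
struct pool { struct obj *head; int count; };

static struct obj  storage[GLOBAL_OBJS];
static struct pool global_pool;          /* would be guarded by pool_lock */
static unsigned long lock_acquisitions;  /* counts simulated pool_lock takes */

static void pool_push(struct pool *p, struct obj *o)
{
    o->next = p->head;
    p->head = o;
    p->count++;
}

static struct obj *pool_pop(struct pool *p)
{
    struct obj *o = p->head;

    if (o) {
        p->head = o->next;
        p->count--;
    }
    return o;
}

static void global_pool_init(void)
{
    global_pool.head = NULL;
    global_pool.count = 0;
    lock_acquisitions = 0;
    for (int i = 0; i < GLOBAL_OBJS; i++)
        pool_push(&global_pool, &storage[i]);
}

/* Allocate from the per-CPU cache; refill a whole batch when it is empty. */
static struct obj *obj_alloc(struct pool *pcp)
{
    if (pcp->count == 0) {
        lock_acquisitions++;             /* slow path: one lock, many objects */
        for (int i = 0; i < BATCH_SIZE; i++) {
            struct obj *o = pool_pop(&global_pool);

            if (!o)
                break;
            pool_push(pcp, o);
        }
    }
    return pool_pop(pcp);                /* NULL if both pools are empty */
}

/* Free into the per-CPU cache; drain a whole batch when it is full. */
static void obj_free(struct pool *pcp, struct obj *o)
{
    if (pcp->count == PCP_POOL_SIZE) {
        lock_acquisitions++;             /* slow path: one lock, many objects */
        for (int i = 0; i < BATCH_SIZE; i++)
            pool_push(&global_pool, pool_pop(pcp));
    }
    pool_push(pcp, o);
}
```

In this sketch a CPU that repeatedly allocates and frees a single object
touches the simulated lock once for the initial refill and never again,
because the cache is then neither empty nor full.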
In addition, this patchset also tries to move the printk() call out
of the raw db->lock critical section to reduce the lock hold time as
much as possible.
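
The general pattern behind that last change can be sketched as follows:
decide under the lock whether something needs to be reported, drop the
lock, and only then print. This is a userspace illustration, not the
patch itself; struct bucket, bucket_lock(), bucket_unlock(), report() and
track_object() are hypothetical stand-ins for a hash bucket protected by
a raw lock and for the printk() call.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a hash bucket and its raw lock. */
struct bucket {
    int lock_held;   /* 1 while the simulated lock is held */
    int max_chain;   /* longest chain length seen so far */
};

static int printed_under_lock;   /* counts reports issued with the lock held */

static void bucket_lock(struct bucket *b)   { b->lock_held = 1; }
static void bucket_unlock(struct bucket *b) { b->lock_held = 0; }

/* Stand-in for printk(): slow, so it should run outside the lock. */
static void report(const struct bucket *b, const char *msg)
{
    if (b->lock_held)
        printed_under_lock++;
    fprintf(stderr, "%s\n", msg);
}

/* Update the bucket under the lock; defer the report until after unlock. */
static bool track_object(struct bucket *b, int chain_len)
{
    bool warn = false;

    bucket_lock(b);
    if (chain_len > b->max_chain) {
        b->max_chain = chain_len;
        warn = true;             /* remember the decision, print later */
    }
    bucket_unlock(b);

    if (warn)
        report(b, "new longest chain");
    return warn;
}
```

The lock hold time then covers only the state update, and the cost of
formatting and emitting the message is paid after the lock is released.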
With or without these changes, the times to do a parallel kernel build
on a 2-socket 36-core 72-thread Haswell system were:
  Kernel        Elapsed time    System time
  ------        ------------    -----------
  Pre-patch       4m51.01s       83m11.53s
  Post-patch      4m47.45s       80m25.78s
The post-patch partial debug objects stats file for the same operations
was:
pool_free :5901
pool_pcp_free :3742
pool_min_free :1022
pool_used :104805
pool_max_used :168081
on_free_list :0
objs_allocated:5796864
objs_freed :5687182
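
For readers who want the relative changes, the figures quoted in this
message work out to roughly a 1.2% reduction in elapsed time, a 3.3%
reduction in system time, and about 6 times fewer objs_allocated. The
small helper below only redoes that arithmetic from the numbers above;
the function names are made up for the example.

```c
#include <stdio.h>

/* Convert a time printed as "<minutes>m<seconds>s" into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Percentage by which `after` is smaller than `before`. */
static double percent_reduction(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* Print the relative changes for the figures quoted in this message. */
static void print_summary(void)
{
    double elapsed = percent_reduction(to_seconds(4, 51.01),
                                       to_seconds(4, 47.45));
    double systime = percent_reduction(to_seconds(83, 11.53),
                                       to_seconds(80, 25.78));
    double alloc_ratio = 34954917.0 / 5796864.0;

    printf("elapsed time reduction: %.2f%%\n", elapsed);
    printf("system time reduction:  %.2f%%\n", systime);
    printf("objs_allocated ratio:   %.2fx\n", alloc_ratio);
}
```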
Waiman Long (5):
debugobjects: Add percpu free pools
debugobjects: Percpu pool lookahead freeing/allocation
debugobjects: Reduce number of pool_lock acquisitions in fill_pool()
debugobjects: Less aggressive freeing of excess debug objects
debugobjects: Move printk out of db lock critical sections
lib/debugobjects.c | 308 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 252 insertions(+), 56 deletions(-)
--
2.18.1