Message-Id: <1460436666-20462-1-git-send-email-iamjoonsoo.kim@lge.com>
Date:	Tue, 12 Apr 2016 13:50:55 +0900
From:	js1304@...il.com
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Christoph Lameter <cl@...ux.com>,
	Pekka Enberg <penberg@...nel.org>,
	David Rientjes <rientjes@...gle.com>,
	Jesper Dangaard Brouer <brouer@...hat.com>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Joonsoo Kim <iamjoonsoo.kim@....com>
Subject: [PATCH v2 00/11] mm/slab: reduce lock contention in alloc path

From: Joonsoo Kim <iamjoonsoo.kim@....com>

Major changes from v1
o hold node lock instead of slab_mutex in kmem_cache_shrink()
o fix suspend-to-ram issue reported by Nishanth
o use synchronize_sched() instead of kick_all_cpus_sync()

Under concurrent allocation, SLAB can suffer heavy lock contention
because it does a lot of work while holding a lock. This patchset
tries to shorten and reduce the number of critical sections in order
to cut that contention. The major changes are a lockless decision on
whether to allocate more slabs, and a lockless cpu cache refill from
the newly allocated slab.
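
To illustrate the idea, below is a minimal user-space sketch (this is
not the actual mm/slab.c code; the names, the fixed-size slab layout,
and the pthread locking are all made up for the example): the "should
we grow?" check is done without the lock, the new slab is built and
drawn from while still private to the allocating thread, and the lock
is taken only for the short step that publishes it. A real version
would use an atomic load (READ_ONCE) for the unlocked check.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define OBJS_PER_SLAB 8

struct slab {
	struct slab *next;
	int free;			/* objects still available */
	void *obj[OBJS_PER_SLAB];
};

struct cache {
	pthread_mutex_t lock;		/* models the per-node list lock */
	struct slab *partial;		/* slabs that still have objects */
	size_t obj_size;
};

static struct slab *slab_create(size_t obj_size)
{
	struct slab *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	for (int i = 0; i < OBJS_PER_SLAB; i++)
		s->obj[i] = malloc(obj_size);
	s->free = OBJS_PER_SLAB;
	return s;
}

static void *slab_take(struct slab *s)
{
	return s->free ? s->obj[--s->free] : NULL;
}

static void *cache_alloc(struct cache *c)
{
	void *obj;

	/* Lockless grow decision: a stale answer only costs an
	 * occasional extra slab, never correctness. */
	if (c->partial == NULL) {
		struct slab *s = slab_create(c->obj_size);

		if (!s)
			return NULL;
		/* The slab is still private to this thread, so the
		 * "cpu cache refill" from it needs no lock. */
		obj = slab_take(s);

		/* The lock is held only for the short publish step. */
		pthread_mutex_lock(&c->lock);
		s->next = c->partial;
		c->partial = s;
		pthread_mutex_unlock(&c->lock);
		return obj;
	}

	/* Shared slabs are still accessed under the lock. */
	pthread_mutex_lock(&c->lock);
	obj = c->partial ? slab_take(c->partial) : NULL;
	pthread_mutex_unlock(&c->lock);
	return obj;
}

int main(void)
{
	struct cache c = { PTHREAD_MUTEX_INITIALIZER, NULL, 64 };

	printf("obj = %p\n", cache_alloc(&c));
	return 0;
}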

Below are the results of the concurrent allocation/free test in the
slab allocation benchmark that Christoph wrote a long time ago. I have
simplified the output. The numbers are cycle counts per alloc/free
respectively, so lower is better.

* Before
Kmalloc N*alloc N*free(32): Average=365/806
Kmalloc N*alloc N*free(64): Average=452/690
Kmalloc N*alloc N*free(128): Average=736/886
Kmalloc N*alloc N*free(256): Average=1167/985
Kmalloc N*alloc N*free(512): Average=2088/1125
Kmalloc N*alloc N*free(1024): Average=4115/1184
Kmalloc N*alloc N*free(2048): Average=8451/1748
Kmalloc N*alloc N*free(4096): Average=16024/2048

* After
Kmalloc N*alloc N*free(32): Average=344/792
Kmalloc N*alloc N*free(64): Average=347/882
Kmalloc N*alloc N*free(128): Average=390/959
Kmalloc N*alloc N*free(256): Average=393/1067
Kmalloc N*alloc N*free(512): Average=683/1229
Kmalloc N*alloc N*free(1024): Average=1295/1325
Kmalloc N*alloc N*free(2048): Average=2513/1664
Kmalloc N*alloc N*free(4096): Average=4742/2172

It shows that allocation performance improves greatly (cycle counts
drop by roughly 50% or more) for object classes larger than 128 bytes.

Joonsoo Kim (11):
  mm/slab: fix the theoretical race by holding proper lock
  mm/slab: remove BAD_ALIEN_MAGIC again
  mm/slab: drain the free slab as much as possible
  mm/slab: factor out kmem_cache_node initialization code
  mm/slab: clean-up kmem_cache_node setup
  mm/slab: don't keep free slabs if free_objects exceeds free_limit
  mm/slab: racy access/modify the slab color
  mm/slab: make cache_grow() handle the page allocated on arbitrary node
  mm/slab: separate cache_grow() to two parts
  mm/slab: refill cpu cache through a new slab without holding a node
    lock
  mm/slab: lockless decision to grow cache

 mm/slab.c | 562 +++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 295 insertions(+), 267 deletions(-)

-- 
1.9.1
