linux-kernel - [RFC 00/26] SLUB: use local_lock for kmem_cache

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20210524233946.20352-1-vbabka@suse.cz>
Date:   Tue, 25 May 2021 01:39:20 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Christoph Lameter <cl@...ux.com>,
        David Rientjes <rientjes@...gle.com>,
        Pekka Enberg <penberg@...nel.org>,
        Joonsoo Kim <iamjoonsoo.kim@....com>
Cc:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Thomas Gleixner <tglx@...utronix.de>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Jann Horn <jannh@...gle.com>, Vlastimil Babka <vbabka@...e.cz>
Subject: [RFC 00/26] SLUB: use local_lock for kmem_cache_cpu protection and reduce disabling irqs

This series was inspired by Mel's pcplist local_lock rewrite, and also interest
to better understand SLUB's locking and the new primitives and RT variants and
implications. It should make SLUB more preemption-friendly, especially for RT,
hopefully without noticeable regressions, as the fast paths are not affected.

Series is based on 5.13-rc3 and also available as a git branch:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-local-lock-v1r9
It received some light stability testing and also basic performance screening
(thanks Mel) that didn't show major regressions. But I'm interested in e.g.
Jesper's tests whether the bulk allocator didn't regress.

Before the series, SLUB is lockless in both allocation and free fast paths, but
elsewhere, it's disabling irqs for considerable periods of time - especially in
allocation slowpath and the bulk allocation, where IRQs are re-enabled only
when a new page from the page allocator is needed, and the context allows
blocking. The irq disabled sections can then include deactivate_slab() which
walks a full freelist and frees the slab back to page allocator or
unfreeze_partials() going through a list of percpu partial slabs. The RT tree
currently has some patches mitigating these, but we can do much better in
mainline too.

Patches 1-2 are straightforward optimizations removing unnecessary usages of
object_map_lock.

Patch 3 is a cleanup of an obviously unnecessary local_irq_save/restore
instance.

Patch 4 simplifies the fast paths on systems with preemption, based on
(hopefully correct) observation that the current loops to verify tid are
unnecessary.

Patches 5-18 focus on allocation slowpath. Patches 5-8 are preparatory code
refactoring.

Patch 9 moves disabling of irqs into ___slab_alloc() from its callers, which
are the allocation slowpath, and bulk allocation. Instead these callers only
disable migration to stabilize the cpu. The following patches then gradually
reduce the scope of disabled irqs in ___slab_alloc() and the functions called
from there. As of patch 12, the re-enabling of irqs based on gfp flags before
calling the page allocator is removed from allocate_slab(). As of patch 15,
it's possible to reach the page allocator (in case of existing slabs depleted)
without disabling and re-enabling irqs a single time.

Patches 19-24 reduce the scope of disabled irqs in remaining functions. Patch
25 replaces a preempt_disable with migrate_disable in put_cpu_partial().

Patch 26 replaces the remaining explicitly irq disabled sections that protect
percpu variables with a local_lock, and updates the locking documentation in
the file's comment.

The result is that irq disabling is only done for minimum amount of time needed
and as part of spin lock or local lock operations to make them irq-safe, except
one case around slab_lock which is a bit spinlock.

This should have obvious implications for better preemption, especially on RT.
Also some RT patches should now be unnecessary, IIUC:

  mm: slub: Enable irqs for __GFP_WAIT [1] becomes unnecessary as of patch 12.

The following two once the IPI flush_slab() handler is dealt with, as discussed
later:

  mm: sl[au]b: Change list_lock to raw_spinlock_t [2] - the SLAB part can be
  dropped as a different patch restricts RT to SLUB anyway. And after this series
  the list_lock in SLUB is never used with irqs disabled before taking the lock.

  mm: slub: Move discard_slab() invocations out of IRQ-off sections [3] should be
  unnecessary as this series does move these invocations outside irq disabled
  sections

Some caveats that will probably have to be solved on PREEMPT_RT - I'm just not
sure enough from reading Documentation/locking/locktypes.rst how some things
work. Advice welcome.

* There are paths such as:
  get_partial_node() - does spin_lock_irqsave(&n->list_lock);
    acquire_slab()
      __cmpxchg_double_slab()
        slab_lock() - a bit spinlock without explicit irqsave

On !PREEMPT_RT this is fine as spin_lock_irqsave() disables irq so slab_lock()
doesn't need to and it's still irq-safe. I assume there are no such guarantees on
PREEMPT_RT where spin_lock_irqsave() is just a mutex with disabled migration?
So RT will have to make sure all paths to slab_lock go through explicit irqsave?

* There is this path involving IPI:
  flush_all()
    on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1);
      IPI with interrupts disabled (is it still true also on RT?)
        flush_cpu_slab()
          flush_slab()
            manipulate kmem_cache_cpu variables
	    deactivate_slab();

The problems here are that in flush_slab() we manipulate variables normally
protected by the local_lock. On !PREEMPT_RT we don't need the local_lock here
because local_lock_irqsave() just disables irqs and we already got them
disabled from the IPI. On PREEMPT_RT we IIUC actually even can't take the
local_lock due to the irqs already disabled. So that's a problem.

Another issue is that deactivate_slab() above will take the node_lock spinlock,
so with irqs disabled it would still have to be a raw spinlock as patch [2]
does. And it will also call discard_slab() which should be also called without
irqs disabled. So for these reasons, the RT patch "mm: slub: Move
flush_cpu_slab() invocations __free_slab() invocations out of IRQ context" [4]
converting IPIs to workqueues will still be needed. Then the work handler
can use local_lock normally and that should solve the issues with flush_all()
and hopefully allow ditching patch [2].

Or is there perhaps a simpler way to make this flush IPI not disable IRQ on
PREEMPT_RT?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0003-mm-slub-Enable-irqs-for-__GFP_WAIT.patch?h=linux-5.12.y-rt-patches
[2] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0001-mm-sl-au-b-Change-list_lock-to-raw_spinlock_t.patch?h=linux-5.12.y-rt-patches
[3] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0004-mm-slub-Move-discard_slab-invocations-out-of-IRQ-off.patch?h=linux-5.12.y-rt-patches
[4] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0005-mm-slub-Move-flush_cpu_slab-invocations-__free_slab-.patch?h=linux-5.12.y-rt-patches

Vlastimil Babka (26):
  mm, slub: allocate private object map for sysfs listings
  mm, slub: allocate private object map for validate_slab_cache()
  mm, slub: don't disable irq for debug_check_no_locks_freed()
  mm, slub: simplify kmem_cache_cpu and tid setup
  mm, slub: extract get_partial() from new_slab_objects()
  mm, slub: dissolve new_slab_objects() into ___slab_alloc()
  mm, slub: return slab page from get_partial() and set c->page
    afterwards
  mm, slub: restructure new page checks in ___slab_alloc()
  mm, slub: move disabling/enabling irqs to ___slab_alloc()
  mm, slub: do initial checks in  ___slab_alloc() with irqs enabled
  mm, slub: move disabling irqs closer to get_partial() in
    ___slab_alloc()
  mm, slub: restore irqs around calling new_slab()
  mm, slub: validate partial and newly allocated slabs before loading
    them
  mm, slub: check new pages with restored irqs
  mm, slub: stop disabling irqs around get_partial()
  mm, slub: move reset of c->page and freelist out of deactivate_slab()
  mm, slub: make locking in deactivate_slab() irq-safe
  mm, slub: call deactivate_slab() without disabling irqs
  mm, slub: move irq control into unfreeze_partials()
  mm, slub: discard slabs in unfreeze_partials() without irqs disabled
  mm, slub: detach whole partial list at once in unfreeze_partials()
  mm, slub: detach percpu partial list in unfreeze_partials() using
    this_cpu_cmpxchg()
  mm, slub: only disable irq with spin_lock in __unfreeze_partials()
  mm, slub: don't disable irqs in slub_cpu_dead()
  mm, slub: use migrate_disable() in put_cpu_partial()
  mm, slub: convert kmem_cpu_slab protection to local_lock

 include/linux/slub_def.h |   2 +
 mm/slub.c                | 496 ++++++++++++++++++++++++---------------
 2 files changed, 314 insertions(+), 184 deletions(-)

-- 
2.31.1