linux-kernel - Re: [RFC PATCH 0/7] Reviving the slab destructor to tackle the percpu allocator scalability problem

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAGudoHF8-tpc3nJeJ3gF2_GZZGp_raMBu4GXC_5omWMc7LhN1w@mail.gmail.com>
Date: Thu, 24 Apr 2025 17:20:59 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Pedro Falcato <pfalcato@...e.de>
Cc: Harry Yoo <harry.yoo@...cle.com>, Vlastimil Babka <vbabka@...e.cz>, 
	Christoph Lameter <cl@...two.org>, David Rientjes <rientjes@...gle.com>, 
	Andrew Morton <akpm@...ux-foundation.org>, Dennis Zhou <dennis@...nel.org>, Tejun Heo <tj@...nel.org>, 
	Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang <xiyou.wangcong@...il.com>, 
	Jiri Pirko <jiri@...nulli.us>, Vlad Buslov <vladbu@...dia.com>, 
	Yevgeny Kliteynik <kliteyn@...dia.com>, Jan Kara <jack@...e.cz>, Byungchul Park <byungchul@...com>, 
	linux-mm@...ck.org, netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/7] Reviving the slab destructor to tackle the percpu
 allocator scalability problem

On Thu, Apr 24, 2025 at 1:28 PM Pedro Falcato <pfalcato@...e.de> wrote:
> > How to do this with slab constructors and destructors: the constructor
> > allocates percpu memory, and the destructor frees it when the slab pages
> > are reclaimed; this slightly alters the constructor’s semantics,
> > as it can now fail.
> >
>
> I really really really really don't like this. We're opening a pandora's box
> of locking issues for slab deadlocks and other subtle issues. IMO the best
> solution there would be, what, failing dtors? which says a lot about the whole
> situation...
>

I noted the need to use leaf spin locks in my IRC conversations with
Harry and later in this very thread, it is a bummer this bit did not
make into the cover letter -- hopefully it would have avoided this
exchange.

I'm going to summarize this again here:

By API contract the dtor can only take a leaf spinlock, in this case one which:
1. disables irqs
2. is the last lock in the dependency chain, as in no locks are taken
while holding it

That way there is no possibility of a deadlock.

This poses a question on how to enforce it and this bit is easy: for
example one can add leaf-spinlock notion to lockdep. Then a misuse on
allocation side is going to complain immediately even without
triggering reclaim. Further, if one would feel so inclined, a test
module can walk the list of all slab caches and do a populate/reclaim
cycle on those with the ctor/dtor pair.

Then there is the matter of particular consumers being ready to do
what they need to on the dtor side only with the spinlock. Does not
sound like a fundamental problem.

> Case in point:
> What happens if you allocate a slab and start ->ctor()-ing objects, and then
> one of the ctors fails? We need to free the ctor, but not without ->dtor()-ing
> everything back (AIUI this is not handled in this series, yet). Besides this
> complication, if failing dtors were added into the mix, we'd be left with a
> half-initialized slab(!!) in the middle of the cache waiting to get freed,
> without being able to.
>

Per my previous paragraph failing dtors would be a self-induced problem.

I can agree one has to roll things back if ctors don't work out, but I
don't think this poses a significant problem.

> Then there are obviously other problems like: whatever you're calling must
> not ever require the slab allocator (directly or indirectly) and must not
> do direct reclaim (ever!), at the risk of a deadlock. The pcpu allocator
> is a no-go (AIUI!) already because of such issues.
>

I don't see how that's true.

> Then there's the separate (but adjacent, particularly as we're considering
> this series due to performance improvements) issue that the ctor() and
> dtor() interfaces are terrible, in the sense that they do not let you batch
> in any way shape or form (requiring us to lock/unlock many times, allocate
> many times, etc). If this is done for performance improvements, I would prefer
> a superior ctor/dtor interface that takes something like a slab iterator and
> lets you do these things.
>

Batching this is also something I mentioned and indeed is a "nice to
have" change. Note however that the work you are suggesting to batch
now also on every alloc/free cycle, so doing it once per creation of a
given object instead is already a win.

-- 
Mateusz Guzik <mjguzik gmail.com>