Message-ID: <CAADnVQK+3GLbq4GjOYO0Q6vhURPyNyy70bZKUUwRpLuK-R8NAA@mail.gmail.com>
Date: Thu, 23 Oct 2025 18:17:19 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Harry Yoo <harry.yoo@...cle.com>
Cc: Vlastimil Babka <vbabka@...e.cz>, Andrew Morton <akpm@...ux-foundation.org>,
Christoph Lameter <cl@...two.org>, David Rientjes <rientjes@...gle.com>,
Roman Gushchin <roman.gushchin@...ux.dev>, Alexei Starovoitov <ast@...nel.org>, linux-mm <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
On Thu, Oct 23, 2025 at 5:00 PM Harry Yoo <harry.yoo@...cle.com> wrote:
>
> On Thu, Oct 23, 2025 at 04:13:37PM -0700, Alexei Starovoitov wrote:
> > On Thu, Oct 23, 2025 at 5:01 AM Vlastimil Babka <vbabka@...e.cz> wrote:
> > >
> > > Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
> > > kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
> > > that we discard the newly allocated slab if we can't spin and we fail to
> > > trylock. As a result we don't perform inc_slabs_node() later in the
> > > function. Instead we perform a deferred deactivate_slab() which can
> > > either put the unaccounted slab on partial list, or discard it
> > > immediately while performing dec_slabs_node(). Either way will cause an
> > > accounting imbalance.
> > >
> > > Fix this by not marking the slab as frozen, and using free_slab()
> > > instead of deactivate_slab() for non-frozen slabs in
> > > free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
> > > case. By not using discard_slab() we avoid dec_slabs_node().
> > >
> > > Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> > > Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
> > > ---
> > > Changes in v2:
> > > - Fix the problem differently. Harry pointed out that we can't move
> > > inc_slabs_node() outside of list_lock protected regions as that would
> > > reintroduce issues fixed by commit c7323a5ad078
> > > - Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
> > > ---
> > > mm/slub.c | 8 +++++---
> > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 23d8f54e9486..87a1d2f9de0d 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -3422,7 +3422,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
> > >
> > > if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
> > > /* Unlucky, discard newly allocated slab */
> > > - slab->frozen = 1;
> > > defer_deactivate_slab(slab, NULL);
> > > return NULL;
> > > }
> > > @@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
> > > struct slab *slab = container_of(pos, struct slab, llnode);
> > >
> > > #ifdef CONFIG_SLUB_TINY
> > > - discard_slab(slab->slab_cache, slab);
> > > + free_slab(slab->slab_cache, slab);
> > > #else
> > > - deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> > > + if (slab->frozen)
> > > + deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> > > + else
> > > + free_slab(slab->slab_cache, slab);
> >
> > A bit odd to use 'frozen' flag as such a signal.
> > I guess I'm worried that truly !frozen slab can come here
> > via ___slab_alloc() -> retry_load_slab: -> defer_deactivate_slab().
> > And things will be much worse than just accounting.
>
> But the cpu slab must have been frozen before it's attached to
> c->slab?
Is it?
The path is:
    c = slub_get_cpu_ptr(s->cpu_slab);
    if (unlikely(c->slab)) {
        struct slab *flush_slab = c->slab;
        defer_deactivate_slab(flush_slab, ...);
I don't see why it would be frozen.
> > Maybe add
> > inc_slabs_node(s, nid, slab->objects);
> > right before
> > defer_deactivate_slab(slab, NULL);
> > return NULL;
> >
> > I don't quite get why c7323a5ad078 is doing everything under n->list_lock.
> > It's been 3 years since.
>
> When n->nr_slabs is inconsistent, validate_slab_node() might report an
> error (false positive) when someone wrote '1' to
> /sys/kernel/slab/<cache name>/validate
OK, I see it now. The actual number of elements on the n->full
list needs to match n->nr_slabs.
But then how is it not broken already?
I see that alloc_single_from_new_slab() unconditionally does
inc_slabs_node(), but the slab itself is added to either the n->full
or the n->partial list.
And validate_slab_node() should be complaining already.
Anyway, I'm not arguing, just trying to understand.
If you think the fix is fine, then go ahead.
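
For illustration only, the imbalance described in the patch can be sketched as a userspace toy model. This is not kernel code: struct toy_node and every toy_* function are hypothetical names, and the rule in toy_counts_match() (the counter must equal the number of slabs actually linked on the node's lists) is an assumption made for the sketch, not a statement of what validate_slab_node() checks.

```c
#include <stdbool.h>

/* Toy stand-in for struct kmem_cache_node; only two counters are modeled. */
struct toy_node {
    long nr_slabs;  /* models n->nr_slabs */
    long on_lists;  /* slabs actually linked on the partial or full list */
};

/* Assumed rule for this sketch: counter must match what is on the lists. */
static bool toy_counts_match(const struct toy_node *n)
{
    return n->nr_slabs == n->on_lists;
}

/* Normal path: lock held, slab is accounted and linked together. */
static void toy_alloc_locked(struct toy_node *n)
{
    n->nr_slabs++;  /* models inc_slabs_node() */
    n->on_lists++;  /* models adding the slab to the full or partial list */
}

/* Pre-fix deferred path, discard case: decrement without a prior increment. */
static void toy_deferred_discard(struct toy_node *n)
{
    n->nr_slabs--;  /* models dec_slabs_node() reached via discard_slab() */
}

/* Pre-fix deferred path, other case: unaccounted slab lands on partial list. */
static void toy_deferred_add_partial(struct toy_node *n)
{
    n->on_lists++;  /* linked, but nr_slabs was never incremented for it */
}

/* Post-fix deferred path: the slab is freed and no counter is touched. */
static void toy_deferred_free(struct toy_node *n)
{
    (void)n;        /* models free_slab() on a slab that was never accounted */
}
```

Starting from a node with one properly accounted slab, both pre-fix outcomes leave the two counters out of step, while the post-fix path keeps them equal. The model says nothing about whether a truly non-frozen slab can reach the deferred path by another route, which is the open question above.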