Message-ID: <aWY7K0SmNsW1O3mv@hyeyoo>
Date: Tue, 13 Jan 2026 21:31:39 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Petr Tesarik <ptesarik@...e.com>, Christoph Lameter <cl@...two.org>,
        David Rientjes <rientjes@...gle.com>,
        Roman Gushchin <roman.gushchin@...ux.dev>, Hao Li <hao.li@...ux.dev>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Uladzislau Rezki <urezki@...il.com>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Alexei Starovoitov <ast@...nel.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev,
        bpf@...r.kernel.org, kasan-dev@...glegroups.com,
        kernel test robot <oliver.sang@...el.com>, stable@...r.kernel.org
Subject: Re: [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to
 kvfree_rcu_barrier_on_cache()

On Tue, Jan 13, 2026 at 10:32:33AM +0100, Vlastimil Babka wrote:
> On 1/13/26 3:08 AM, Harry Yoo wrote:
> > On Mon, Jan 12, 2026 at 04:16:55PM +0100, Vlastimil Babka wrote:
> >> After we submit the rcu_free sheaves to call_rcu() we need to make sure
> >> the rcu callbacks complete. kvfree_rcu_barrier() does that via
> >> flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
> >> that.
> > 
> > Oops, my bad.
> > 
> >> Reported-by: kernel test robot <oliver.sang@...el.com>
> >> Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
> >> Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
> >> Cc: stable@...r.kernel.org
> >> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
> >> ---
> > 
> > The fix looks good to me, but I wonder why
> > `if (s->sheaf_capacity) rcu_barrier();` in __kmem_cache_shutdown()
> > didn't prevent the bug from happening?
> 
> Hmm good point, didn't notice it's there.
> 
> I think it doesn't help because it happens only after
> flush_all_cpus_locked(). And the callback from rcu_free_sheaf_nobarn()
> will do sheaf_flush_unused() and end up installing the cpu slab again.

I thought about it a little bit more...

It's not because a cpu slab was installed again (for list_slab_objects()
to be called on a slab, the slab must be on the n->partial list), but
because flush_slab() cannot handle concurrent frees to the cpu slab.

CPU X                                CPU Y

- flush_slab() reads
  c->freelist
                                     rcu_free_sheaf_nobarn()
                                     ->sheaf_flush_unused()
                                     ->__kmem_cache_free_bulk()
                                     ->do_slab_free()
                                       -> sees slab == c->slab
                                       -> frees to c->freelist
- c->slab = NULL,
  c->freelist = NULL
- call deactivate_slab()
  ^ the object freed by sheaf_flush_unused() is leaked,
    thus slab->inuse != 0

In other words, flush_slab() works correctly only when it is guaranteed
that there will be no concurrent frees to the cpu slab. Acquiring the
local_lock in flush_slab() doesn't help, because the free fastpath
doesn't take it.

Calling rcu_barrier() before flush_all_cpus_locked() ensures
there will be no concurrent frees.
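
To make the interleaving above concrete, here is a minimal single-threaded
user-space model of it. This is only a sketch, not kernel code: every name
in it (toy_slab, toy_cpu, toy_free(), toy_deactivate(),
toy_inuse_after_flush()) is hypothetical, and the accounting is simplified
to a single inuse counter. It replays the steps in a fixed order, once with
the free landing in the middle of the flush and once with the pending free
drained first, which is the role rcu_barrier() plays.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all types and functions here are hypothetical. */
struct obj { struct obj *next; };

struct toy_slab { int inuse; };          /* objects not yet returned to the slab */

struct toy_cpu {
	struct toy_slab *slab;           /* current "cpu slab" */
	struct obj *freelist;            /* lockless per-cpu freelist */
};

/* Free fastpath as in the diagram: if the object's slab is the current
 * cpu slab, push onto c->freelist. The slowpath is not modeled. */
static void toy_free(struct toy_cpu *c, struct toy_slab *slab, struct obj *o)
{
	if (c->slab == slab) {
		o->next = c->freelist;
		c->freelist = o;
	}
}

/* Return every object on the given freelist to the slab. */
static void toy_deactivate(struct toy_slab *slab, struct obj *freelist)
{
	for (struct obj *o = freelist; o; o = o->next)
		slab->inuse--;
}

/* Replay the flush; returns slab.inuse afterwards (0 means nothing leaked). */
int toy_inuse_after_flush(bool drain_first)
{
	struct obj objs[4];
	struct toy_slab slab = { .inuse = 4 };
	struct toy_cpu cpu = { .slab = &slab, .freelist = NULL };

	/* objs[0..2] sit on the per-cpu freelist; objs[3] is a pending free. */
	for (int i = 0; i < 3; i++) {
		objs[i].next = cpu.freelist;
		cpu.freelist = &objs[i];
	}

	if (drain_first)                 /* barrier analogue: free completes first */
		toy_free(&cpu, &slab, &objs[3]);

	struct obj *snapshot = cpu.freelist;   /* flush reads c->freelist */

	if (!drain_first)                /* CPU Y frees in the middle of the flush */
		toy_free(&cpu, &slab, &objs[3]);

	cpu.slab = NULL;                 /* flush clears the per-cpu state ... */
	cpu.freelist = NULL;
	toy_deactivate(&slab, snapshot); /* ... and deactivates the stale snapshot */

	return slab.inuse;
}
```

In this model toy_inuse_after_flush(false) returns 1, because the object
pushed after the snapshot was taken is never handed back, while
toy_inuse_after_flush(true) returns 0.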

A side question: I'm not sure how __kmem_cache_shrink(),
validate_slab_cache(), and cpu_partial_store() are supposed to work
correctly. They call flush_all() without guaranteeing that there will be
no concurrent frees to the cpu slab.

...probably doesn't matter after sheaves-for-all :)

> Because the bot flagged commit "slab: add sheaves to most caches" where
> cpu slabs still exist. It's thus possible that with the full series, the
> bug is gone. But we should prevent it upfront anyway.

> The rcu_barrier() in __kmem_cache_shutdown() however is probably
> unnecessary then and we can remove it, right?

Agreed. Since rcu_barrier() is now called (after flushing the rcu
sheaves) in kvfree_rcu_barrier_on_cache(), it's not necessary in
__kmem_cache_shutdown().

> >>  mm/slab_common.c | 5 ++++-
> >>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> index eed7ea556cb1..ee994ec7f251 100644
> >> --- a/mm/slab_common.c
> >> +++ b/mm/slab_common.c
> >> @@ -2133,8 +2133,11 @@ EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
> >>   */
> >>  void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
> >>  {
> >> -	if (s->cpu_sheaves)
> >> +	if (s->cpu_sheaves) {
> >>  		flush_rcu_sheaves_on_cache(s);
> >> +		rcu_barrier();
> >> +	}
> >> +
> >>  	/*
> >>  	 * TODO: Introduce a version of __kvfree_rcu_barrier() that works
> >>  	 * on a specific slab cache.

-- 
Cheers,
Harry / Hyeonggon
