linux-kernel - Re: [PATCH] slub: avoid list_lock contention from __refill_objects

Message-ID: <00e257f0-d80d-408e-963c-7962a1cef9d8@suse.cz>
Date: Thu, 29 Jan 2026 11:39:04 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Harry Yoo <harry.yoo@...cle.com>, Hao Li <hao.li@...ux.dev>
Cc: Mateusz Guzik <mjguzik@...il.com>,
 Andrew Morton <akpm@...ux-foundation.org>, Christoph Lameter
 <cl@...two.org>, David Rientjes <rientjes@...gle.com>,
 Roman Gushchin <roman.gushchin@...ux.dev>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, kernel test robot <oliver.sang@...el.com>
Subject: Re: [PATCH] slub: avoid list_lock contention from
 __refill_objects_any()

On 1/29/26 10:30, Harry Yoo wrote:
> On Thu, Jan 29, 2026 at 05:21:21PM +0800, Hao Li wrote:
>> On Thu, Jan 29, 2026 at 10:07:57AM +0100, Vlastimil Babka wrote:
>> > Kernel test robot has reported a regression in the patch "slab: refill
>> > sheaves from all nodes". When taken in isolation like this, there is
>> > indeed a tradeoff - we prefer to use remote objects prior to allocating
>> > new local slabs. It is replicating a behavior that existed before
>> > sheaves for replenishing cpu (partial) slabs - now called
>> > get_from_any_partial() to allocate a single object.
>> > 
>> > So the possibility of allocating remote objects is intended even if
>> > remote accesses are then slower. But the profiles in the report also
>> > suggested a contention on the list_lock spinlock. And that's something
>> > we can try to avoid without much tradeoff - if someone else has the
>> > spin_lock, it's more likely they are allocating from the node than
>> > freeing to it, so we can skip it even if it means allocating a new local
>> > slab - contributing to that lock's contention isn't worth it. It should
>> > not result in partial slabs accumulating on the remote node.
>> > 
>> > Thus add an allow_spin parameter to __refill_objects_node() and
>> > get_partial_node_bulk() to make the attempts from __refill_objects_any()
>> > use only a trylock.
>> > 
>> > Reported-by: kernel test robot <oliver.sang@...el.com>
>> > Link: https://lore.kernel.org/oe-lkp/202601132136.77efd6d7-lkp@intel.com
>> > Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
>> 
>> In my testing, this patch improved performance by:
>> 
>> will-it-scale.64.processes +14.2%
>> will-it-scale.128.processes +9.6%
>> will-it-scale.192.processes +10.8%
>> will-it-scale.per_process_ops +11.6%
>>
>> Tested-by: Hao Li <hao.li@...ux.dev>
> 
> I wonder if using spin_is_contended() or spin_is_locked()
> would be better than trylock by avoiding an atomic operation?

I checked and found that spin_trylock() itself already performs a non-atomic
check before the atomic operation. So adding a spin_is_locked() check would
only help the caller bail out slightly faster, and this is not a fastpath. It
wouldn't reduce the cache coherency traffic, AFAIU.
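
To illustrate the point, here is a minimal userspace sketch in C11 of a trylock
that does a plain load before attempting the atomic compare-and-swap, plus a
refill helper that only trylocks when spinning is not allowed. All demo_*
names and the struct layouts are hypothetical and made up for this
illustration; this is not the kernel's spinlock or SLUB code, only the general
shape of the idea.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a spinlock; not the kernel implementation. */
struct demo_lock {
	atomic_int val;		/* 0 = unlocked, 1 = locked */
};

/*
 * Trylock with a cheap pre-check: a relaxed load first, and the atomic
 * read-modify-write only if the lock looked free.
 */
static bool demo_trylock(struct demo_lock *l)
{
	int expected = 0;

	if (atomic_load_explicit(&l->val, memory_order_relaxed) != 0)
		return false;	/* looked held: skip the atomic RMW */

	return atomic_compare_exchange_strong_explicit(&l->val, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void demo_lock_spin(struct demo_lock *l)
{
	while (!demo_trylock(l))
		;	/* busy-wait until the lock is acquired */
}

static void demo_unlock(struct demo_lock *l)
{
	atomic_store_explicit(&l->val, 0, memory_order_release);
}

/* Hypothetical per-node pool of free objects guarded by list_lock. */
struct demo_node {
	struct demo_lock list_lock;
	size_t nr_free;
};

/*
 * Take up to 'max' objects from the node. With allow_spin == false the
 * caller gives up immediately (returns 0) if the lock is held, rather than
 * adding to the contention on it.
 */
static size_t demo_refill_from_node(struct demo_node *n, size_t max,
				    bool allow_spin)
{
	size_t taken;

	if (allow_spin)
		demo_lock_spin(&n->list_lock);
	else if (!demo_trylock(&n->list_lock))
		return 0;

	taken = n->nr_free < max ? n->nr_free : max;
	n->nr_free -= taken;
	demo_unlock(&n->list_lock);
	return taken;
}
```

In this sketch a separate "is it locked?" test in front of demo_trylock()
would just repeat the relaxed load that demo_trylock() already does, so the
only gain would be returning from the caller a little earlier.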