linux-kernel - Re: [PATCH v2] slub: keep empty main sheaf as spare in __pcs_replace_empty

Message-ID: <ohyhyc6tldamsaon2sq2l5hslyhtnovquxt36gcidlumtio44p@uwvk27px4trp>
Date: Fri, 16 Jan 2026 17:11:58 +0800
From: Hao Li <hao.li@...ux.dev>
To: Zhao Liu <zhao1.liu@...el.com>
Cc: Vlastimil Babka <vbabka@...e.cz>, Hao Li <haolee.swjtu@...il.com>, 
	akpm@...ux-foundation.org, harry.yoo@...cle.com, cl@...two.org, rientjes@...gle.com, 
	roman.gushchin@...ux.dev, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	tim.c.chen@...el.com, yu.c.chen@...el.com
Subject: Re: [PATCH v2] slub: keep empty main sheaf as spare in
 __pcs_replace_empty_main()

On Fri, Jan 16, 2026 at 05:07:30PM +0800, Zhao Liu wrote:
> > > The following is the perf data comparing 2 tests w/o fix & with this fix:
> > > 
> > > # Baseline  Delta Abs  Shared Object            Symbol
> > > # ........  .........  .......................  ....................................
> > > #
> > >     61.76%     +4.78%  [kernel.vmlinux]         [k] native_queued_spin_lock_slowpath
> > >      0.93%     -0.32%  [kernel.vmlinux]         [k] __slab_free
> > >      0.39%     -0.31%  [kernel.vmlinux]         [k] barn_get_empty_sheaf
> > >      1.35%     -0.30%  [kernel.vmlinux]         [k] mas_leaf_max_gap
> > >      3.22%     -0.30%  [kernel.vmlinux]         [k] __kmem_cache_alloc_bulk
> > >      1.73%     -0.20%  [kernel.vmlinux]         [k] __cond_resched
> > >      0.52%     -0.19%  [kernel.vmlinux]         [k] _raw_spin_lock_irqsave
> > >      0.92%     +0.18%  [kernel.vmlinux]         [k] _raw_spin_lock
> > >      1.91%     -0.15%  [kernel.vmlinux]         [k] zap_pmd_range.isra.0
> > >      1.37%     -0.13%  [kernel.vmlinux]         [k] mas_wr_node_store
> > >      1.29%     -0.12%  [kernel.vmlinux]         [k] free_pud_range
> > >      0.92%     -0.11%  [kernel.vmlinux]         [k] __mmap_region
> > >      0.12%     -0.11%  [kernel.vmlinux]         [k] barn_put_empty_sheaf
> > >      0.20%     -0.09%  [kernel.vmlinux]         [k] barn_replace_empty_sheaf
> > >      0.31%     +0.09%  [kernel.vmlinux]         [k] get_partial_node
> > >      0.29%     -0.07%  [kernel.vmlinux]         [k] __rcu_free_sheaf_prepare
> > >      0.12%     -0.07%  [kernel.vmlinux]         [k] intel_idle_xstate
> > >      0.21%     -0.07%  [kernel.vmlinux]         [k] __kfree_rcu_sheaf
> > >      0.26%     -0.07%  [kernel.vmlinux]         [k] down_write
> > >      0.53%     -0.06%  libc.so.6                [.] __mmap
> > >      0.66%     -0.06%  [kernel.vmlinux]         [k] mas_walk
> > >      0.48%     -0.06%  [kernel.vmlinux]         [k] mas_prev_slot
> > >      0.45%     -0.06%  [kernel.vmlinux]         [k] mas_find
> > >      0.38%     -0.06%  [kernel.vmlinux]         [k] mas_wr_store_type
> > >      0.23%     -0.06%  [kernel.vmlinux]         [k] do_vmi_align_munmap
> > >      0.21%     -0.05%  [kernel.vmlinux]         [k] perf_event_mmap_event
> > >      0.32%     -0.05%  [kernel.vmlinux]         [k] entry_SYSRETQ_unsafe_stack
> > >      0.19%     -0.05%  [kernel.vmlinux]         [k] downgrade_write
> > >      0.59%     -0.05%  [kernel.vmlinux]         [k] mas_next_slot
> > >      0.31%     -0.05%  [kernel.vmlinux]         [k] __mmap_new_vma
> > >      0.44%     -0.05%  [kernel.vmlinux]         [k] kmem_cache_alloc_noprof
> > >      0.28%     -0.05%  [kernel.vmlinux]         [k] __vma_enter_locked
> > >      0.41%     -0.05%  [kernel.vmlinux]         [k] memcpy
> > >      0.48%     -0.04%  [kernel.vmlinux]         [k] mas_store_gfp
> > >      0.14%     +0.04%  [kernel.vmlinux]         [k] __put_partials
> > >      0.19%     -0.04%  [kernel.vmlinux]         [k] mas_empty_area_rev
> > >      0.30%     -0.04%  [kernel.vmlinux]         [k] do_syscall_64
> > >      0.25%     -0.04%  [kernel.vmlinux]         [k] mas_preallocate
> > >      0.15%     -0.04%  [kernel.vmlinux]         [k] rcu_free_sheaf
> > >      0.22%     -0.04%  [kernel.vmlinux]         [k] entry_SYSCALL_64
> > >      0.49%     -0.04%  libc.so.6                [.] __munmap
> > >      0.91%     -0.04%  [kernel.vmlinux]         [k] rcu_all_qs
> > >      0.21%     -0.04%  [kernel.vmlinux]         [k] __vm_munmap
> > >      0.24%     -0.04%  [kernel.vmlinux]         [k] mas_store_prealloc
> > >      0.19%     -0.04%  [kernel.vmlinux]         [k] __kmalloc_cache_noprof
> > >      0.34%     -0.04%  [kernel.vmlinux]         [k] build_detached_freelist
> > >      0.19%     -0.03%  [kernel.vmlinux]         [k] vms_complete_munmap_vmas
> > >      0.36%     -0.03%  [kernel.vmlinux]         [k] mas_rev_awalk
> > >      0.05%     -0.03%  [kernel.vmlinux]         [k] shuffle_freelist
> > >      0.19%     -0.03%  [kernel.vmlinux]         [k] down_write_killable
> > >      0.19%     -0.03%  [kernel.vmlinux]         [k] kmem_cache_free
> > >      0.27%     -0.03%  [kernel.vmlinux]         [k] up_write
> > >      0.13%     -0.03%  [kernel.vmlinux]         [k] vm_area_alloc
> > >      0.18%     -0.03%  [kernel.vmlinux]         [k] arch_get_unmapped_area_topdown
> > >      0.08%     -0.03%  [kernel.vmlinux]         [k] userfaultfd_unmap_complete
> > >      0.10%     -0.03%  [kernel.vmlinux]         [k] tlb_gather_mmu
> > >      0.30%     -0.02%  [kernel.vmlinux]         [k] ___slab_alloc
> > > 
> > > I think the interesting item is "get_partial_node". It seems this fix
> > > makes "get_partial_node" slightly more frequent. Hmm, however, I still
> > > can't figure out why this is happening. Do you have any thoughts on it?
> > 
> > I'm not sure if it's statistically significant or just noise, +0.09% could
> > be noise?
> 
> A small number doesn't always mean it's noise. When perf samples
> get_partial_node on the spin lock call chain, most samples land in its
> subroutines (the spin lock), so the subroutines account for the larger
> share. If get_partial_node itself (excluding subroutines) executes very
> quickly, its own share is lower.
> 
> I also expanded the perf data with call chains:
> 
> * w/o fix:
> 
> We can calculate that the proportion of spin lock overhead introduced by
> get_partial_node is: 31.05% / 49.91% = 62.21%
> 
>     49.91%  mmap2_processes  [kernel.vmlinux]         [k] native_queued_spin_lock_slowpath
>             |
>              --49.91%--native_queued_spin_lock_slowpath
>                        |
>                         --49.91%--_raw_spin_lock_irqsave
>                                   |
>                                   |--31.05%--get_partial_node
>                                   |          |
>                                   |          |--23.66%--get_any_partial
>                                   |          |          ___slab_alloc
>                                   |          |
>                                   |           --7.40%--___slab_alloc
>                                   |                     __kmem_cache_alloc_bulk
>                                   |
>                                   |--10.84%--barn_get_empty_sheaf
>                                   |          |
>                                   |          |--6.18%--__kfree_rcu_sheaf
>                                   |          |          kvfree_call_rcu
>                                   |          |
>                                   |           --4.66%--__pcs_replace_empty_main
>                                   |                     kmem_cache_alloc_noprof
>                                   |
>                                   |--5.10%--barn_put_empty_sheaf
>                                   |          |
>                                   |           --5.09%--__pcs_replace_empty_main
>                                   |                     kmem_cache_alloc_noprof
>                                   |
>                                   |--2.01%--barn_replace_empty_sheaf
>                                   |          __pcs_replace_empty_main
>                                   |          kmem_cache_alloc_noprof
>                                   |
>                                    --0.78%--__put_partials
>                                              |
>                                               --0.78%--__kmem_cache_free_bulk.part.0
>                                                         rcu_free_sheaf
> 
> 
> * with fix:
> 
> Similarly, the proportion of spin locks introduced by get_partial_node
> is: 39.91% / 42.82% = 93.20%
> 
>     42.82%  mmap2_processes  [kernel.vmlinux]         [k] native_queued_spin_lock_slowpath
>             |
>             ---native_queued_spin_lock_slowpath
>                |
>                 --42.82%--_raw_spin_lock_irqsave
>                           |
>                           |--39.91%--get_partial_node
>                           |          |
>                           |          |--28.25%--get_any_partial
>                           |          |          ___slab_alloc
>                           |          |
>                           |           --11.66%--___slab_alloc
>                           |                     __kmem_cache_alloc_bulk
>                           |
>                           |--1.09%--barn_get_empty_sheaf
>                           |          |
>                           |           --0.90%--__kfree_rcu_sheaf
>                           |                     kvfree_call_rcu
>                           |
>                           |--0.96%--barn_replace_empty_sheaf
>                           |          __pcs_replace_empty_main
>                           |          kmem_cache_alloc_noprof
>                           |
>                            --0.77%--__put_partials
>                                      __kmem_cache_free_bulk.part.0
>                                      rcu_free_sheaf
> 
> 
> So, 62.21% -> 93.20% could reflect that get_partial_node contributes more
> overhead at this point.
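
For reference, the ratios above can be re-derived from the quoted call graphs
with a few lines of Python. This is only a sketch: the numbers are copied from
the two perf call chains above, while the dictionary layout and the helper
names (share, barn_total) are made up for illustration and are not part of
perf or the kernel.

```python
# Percentages copied from the two call graphs quoted above.
# Each key (other than "total") is a direct caller of _raw_spin_lock_irqsave
# on the native_queued_spin_lock_slowpath chain.
without_fix = {
    "total": 49.91,
    "get_partial_node": 31.05,
    "barn_get_empty_sheaf": 10.84,
    "barn_put_empty_sheaf": 5.10,
    "barn_replace_empty_sheaf": 2.01,
    "__put_partials": 0.78,
}
with_fix = {
    "total": 42.82,
    "get_partial_node": 39.91,
    "barn_get_empty_sheaf": 1.09,
    "barn_replace_empty_sheaf": 0.96,
    "__put_partials": 0.77,
}


def share(profile, caller):
    """Share (in percent) of the slowpath samples that come via `caller`."""
    return 100.0 * profile[caller] / profile["total"]


def barn_total(profile):
    """Sum of the slowpath percentages that come via the barn_* helpers."""
    return sum(v for k, v in profile.items() if k.startswith("barn_"))


for name, profile in (("w/o fix", without_fix), ("with fix", with_fix)):
    print(
        f"{name}: get_partial_node share = "
        f"{share(profile, 'get_partial_node'):.2f}%, "
        f"barn_* total = {barn_total(profile):.2f}%"
    )
```

Printing the barn_* totals next to the share makes it visible that the ratio
moves for two reasons at once: the barn_* lock contention mostly disappears
with the fix, and the get_partial_node figure itself grows from 31.05% to
39.91%.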

Thanks for the detailed notes. I'll try to reproduce it to see what exactly
happened.

-- 
Thanks,
Hao

> 
> > > So, I'd like to know if you think dynamically or adaptively adjusting
> > > capacity is a worthwhile idea.
> > 
> > In the followup series, there will be automatically determined capacity to
> > roughly match the current capacity of cpu partial slabs:
> > 
> > https://lore.kernel.org/all/20260112-sheaves-for-all-v2-4-98225cfb50cf@suse.cz/
> > 
> > We can use that as starting point for further tuning. But I suspect making
> > it adjust dynamically would be complicated.
> 
> Thanks, will continue to evaluate this series.
> 
> Regards,
> Zhao
> 
>