Message-ID: <eu7erwjzoflxb7wzm7j3iitrwjoukajixasel2s3isfav4i3rv@ko2c2dtmnj2l>
Date: Tue, 10 Feb 2026 17:55:02 +0100
From: Michal Koutný <mkoutny@...e.com>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: tj@...nel.org, hannes@...xchg.org, brauner@...nel.org, 
	linux-kernel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: [PATCH v2] cgroup: avoid css_set_lock in cgroup_css_set_fork()

On Tue, Feb 10, 2026 at 12:19:27PM +0100, Mateusz Guzik <mjguzik@...il.com> wrote:
> This is going to depend on the scale you test on. I was testing on
> south of 32. But I also got a miniscule win from removing css set lock
> as the problem for me, instead everything shifted to tasklist.

To be on the same page -- that means you have nr_cpus >= 32?

> Per my other e-mail tasklist lock retains the terrible 3-times locking
> and it is doing rather expensive work while holding it. It is
> plausible it happens to be at the top at that scale, but that's only
> an argument for fixing it. Even if you don't see the css thing at the
> top at the moment, it will be there once someone(tm) sorts out the
> tasklist problem.

I did a quick test (with 6.18.8-1.g886f4c4-default), first `perf top`
while will-it-scale was running:

  74.23%  [kernel]                        [k] native_queued_spin_lock_slowpath
   6.91%  [kernel]                        [k] intel_idle_irq
   0.87%  [kernel]                        [k] update_sd_lb_stats.constprop.0
   0.68%  [kernel]                        [k] _raw_spin_lock
   0.63%  [kernel]                        [k] clear_page_erms
   0.56%  [kernel]                        [k] sched_balance_find_dst_group
   0.40%  [kernel]                        [k] alloc_vmap_area

and then bpftrace for the waiters:
  $ bpftrace -e 'kprobe:native_queued_spin_lock_slowpath {@[arg0]=count();}
                 END {for($kv : @) {printf("%s\t%d\n", ksym($kv.0), (int64)$kv.1);} clear(@); }'\
                 >bpftrace.out
  $ sort -k2 -r -n bpftrace.out | head | column -t
  pidmap_lock         10482583
  nft_pcpu_tun_ctx    3693517
  css_set_lock        1511164
  input_pool          976252
  tasklist_lock       798578
  nft_pcpu_tun_ctx    481962
  0xffff8abc3ffd55b0  95371
  0xffff8a6d3ffd65b0  93686
  0xffff8a5e218f0840  29501
  0xffff8a5e451dca40  29421

or measured by cumulative waiting time:
  $ bpftrace -e 'kprobe:native_queued_spin_lock_slowpath {@[cpu]=arg0; @st[cpu]=nsecs;}
                 kretprobe:native_queued_spin_lock_slowpath /@[cpu]/ {$lat=nsecs-@st[cpu]; @lats[@[cpu]]=sum($lat);}
                 END {for($kv : @lats) {printf("%s\t%d\n", ksym($kv.0), (int64)$kv.1);} clear(@lats); clear(@st); clear(@) }'\
                 >bpftrace2.out
  
  $ sort -k2 -r -n bpftrace2.out | head -n15 | column -t
  pidmap_lock         1931209805
  rcu_state           1823286316
  rcu_state           1581455156
  rcu_state           1328804835
  rcu_state           1299517157
  rcu_state           1134101627
  nft_pcpu_tun_ctx    1027837665
  0xffff8abc3ffd55b0  861441978
  0xffff8a6d3ffd65b0  850732998
  css_set_lock        520009479
  input_pool          316598763
  tasklist_lock       127161061
  0xffff8aac40023200  32380418
  0xffff8a5e002ab600  30194951
  rcu_state           18334578

Hm, interesting. This suggests why I saw no big change with css_set_lock
in my setup.


Michal
