Message-ID: <02811bd7-b401-4e16-bb7d-4edeb0b89ffd@arm.com>
Date: Thu, 18 Sep 2025 16:15:45 +0100
From: Christian Loehle <christian.loehle@....com>
To: Peter Zijlstra <peterz@...radead.org>, tj@...nel.org
Cc: linux-kernel@...r.kernel.org, mingo@...hat.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com, rostedt@...dmis.org,
bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
longman@...hat.com, hannes@...xchg.org, mkoutny@...e.com,
void@...ifault.com, arighi@...dia.com, changwoo@...lia.com,
cgroups@...r.kernel.org, sched-ext@...ts.linux.dev, liuwenfang@...or.com,
tglx@...utronix.de
Subject: Re: [PATCH 00/14] sched: Support shared runqueue locking
On 9/10/25 16:44, Peter Zijlstra wrote:
> Hi,
>
> As mentioned [1], a fair amount of sched ext weirdness (current and proposed)
> is down to the core code not quite working right for shared runqueue stuff.
>
> Instead of endlessly hacking around that, bite the bullet and fix it all up.
>
> With these patches, it should be possible to clean up pick_task_scx() to not
> rely on balance_scx(). Additionally it should be possible to fix that RT issue,
> and the dl_server issue without further propagating lock breaks.
>
> As is, these patches boot and run/pass selftests/sched_ext with lockdep on.
>
> I meant to do more sched_ext cleanups, but since this has all already taken
> longer than I would've liked (real life interrupted :/), I figured I should
> post this as is and let TJ/Andrea poke at it.
>
> Patches are also available at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/cleanup
>
>
> [1] https://lkml.kernel.org/r/20250904202858.GN4068168@noisy.programming.kicks-ass.net
>
>
> ---
> include/linux/cleanup.h | 5 +
> include/linux/sched.h | 6 +-
> kernel/cgroup/cpuset.c | 2 +-
> kernel/kthread.c | 15 +-
> kernel/sched/core.c | 370 +++++++++++++++++++++--------------------------
> kernel/sched/deadline.c | 26 ++--
> kernel/sched/ext.c | 104 +++++++------
> kernel/sched/fair.c | 23 ++-
> kernel/sched/idle.c | 14 +-
> kernel/sched/rt.c | 13 +-
> kernel/sched/sched.h | 225 ++++++++++++++++++++--------
> kernel/sched/stats.h | 2 +-
> kernel/sched/stop_task.c | 14 +-
> kernel/sched/syscalls.c | 80 ++++------
> 14 files changed, 495 insertions(+), 404 deletions(-)
>
>
Hi Peter,

A couple of issues popped up when testing this [1] that don't trigger on [2]:
When booting (arm64, Radxa Orion O6) I get:
[ 1.298020] sched: DL replenish lagged too much
[ 1.298364] ------------[ cut here ]------------
[ 1.298377] WARNING: CPU: 4 PID: 0 at kernel/sched/deadline.c:239 inactive_task_timer+0x3d0/0x474
[ 1.298413] Modules linked in:
[ 1.298436] CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Tainted: G S 6.17.0-rc4-cix-build+ #56 PREEMPT
[ 1.298455] Tainted: [S]=CPU_OUT_OF_SPEC
[ 1.298463] Hardware name: Radxa Computer (Shenzhen) Co., Ltd. Radxa Orion O6/Radxa Orion O6, BIOS 0.3.0-1 2025-04-28T03:35:34+00:00
[ 1.298473] pstate: 034000c9 (nzcv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 1.298486] pc : inactive_task_timer+0x3d0/0x474
[ 1.298505] lr : inactive_task_timer+0x394/0x474
[ 1.298522] sp : ffff800083d4be00
[ 1.298530] x29: ffff800083d4be20 x28: ffff00008362d888 x27: ffff800082ab1f88
[ 1.298561] x26: ffff800082ab4a98 x25: ffff0001fef50c18 x24: 0000000000019999
[ 1.298589] x23: 000000000000cccc x22: ffff0001fef51708 x21: ffff00008362d640
[ 1.298616] x20: ffff0001fef50c00 x19: ffff00008362d7f0 x18: fffffffffff0b580
[ 1.298642] x17: ffff80017c966000 x16: ffff800083d48000 x15: 0000000000000028
[ 1.298669] x14: 0000000000000000 x13: 00000000000c4000 x12: 00000000000000c5
[ 1.298695] x11: 0000000000004bb8 x10: 0000000000004bb8 x9 : 0000000000000000
[ 1.298722] x8 : 0000000000000000 x7 : 0000000000000011 x6 : ffff0001fef51bc0
[ 1.298747] x5 : ffff0001fef50c00 x4 : 00000000000000cc x3 : 0000000000000000
[ 1.298773] x2 : ffff80017c966000 x1 : 0000000000000000 x0 : ffffffffffff3333
[ 1.298800] Call trace:
[ 1.298808] inactive_task_timer+0x3d0/0x474 (P)
[ 1.298830] __hrtimer_run_queues+0x3c4/0x440
[ 1.298852] hrtimer_interrupt+0xe4/0x244
[ 1.298871] arch_timer_handler_phys+0x2c/0x44
[ 1.298893] handle_percpu_devid_irq+0xa8/0x1f0
[ 1.298916] handle_irq_desc+0x40/0x58
[ 1.298933] generic_handle_domain_irq+0x1c/0x28
[ 1.298950] gic_handle_irq+0x4c/0x11c
[ 1.298965] call_on_irq_stack+0x30/0x48
[ 1.298982] do_interrupt_handler+0x80/0x84
[ 1.299001] el1_interrupt+0x34/0x64
[ 1.299022] el1h_64_irq_handler+0x18/0x24
[ 1.299037] el1h_64_irq+0x6c/0x70
[ 1.299052] finish_task_switch.isra.0+0xac/0x2bc (P)
[ 1.299070] __schedule+0x45c/0xffc
[ 1.299088] schedule_idle+0x28/0x48
[ 1.299104] do_idle+0x184/0x27c
[ 1.299121] cpu_startup_entry+0x34/0x3c
[ 1.299137] secondary_start_kernel+0x134/0x154
[ 1.299158] __secondary_switched+0xc0/0xc4
[ 1.299179] irq event stamp: 1634
[ 1.299189] hardirqs last enabled at (1633): [<ffff800081486354>] el1_interrupt+0x54/0x64
[ 1.299210] hardirqs last disabled at (1634): [<ffff800081486324>] el1_interrupt+0x24/0x64
[ 1.299229] softirqs last enabled at (1614): [<ffff8000800bf7b0>] handle_softirqs+0x4a0/0x4b8
[ 1.299248] softirqs last disabled at (1609): [<ffff800080010600>] __do_softirq+0x14/0x20
[ 1.299262] ---[ end trace 0000000000000000 ]---
and when running actual tests (e.g. iterating through all scx schedulers under load):
[ 146.532691] ================================
[ 146.536947] WARNING: inconsistent lock state
[ 146.541204] 6.17.0-rc4-cix-build+ #58 Tainted: G S W
[ 146.547457] --------------------------------
[ 146.551713] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ 146.557705] rcu_tasks_trace/79 [HC0[0]:SC0[0]:HE0:SE1] takes:
[ 146.563438] ffff000089c90e58 (&dsq->lock){?.-.}-{2:2}, at: __task_rq_lock+0x88/0x194
[ 146.571179] {IN-HARDIRQ-W} state was registered at:
[ 146.576042] lock_acquire+0x1c8/0x338
[ 146.579788] _raw_spin_lock+0x48/0x60
[ 146.583536] dispatch_enqueue+0x130/0x3e8
[ 146.587632] do_enqueue_task+0x2f0/0x464
[ 146.591629] enqueue_task_scx+0x1b0/0x290
[ 146.595712] enqueue_task+0x84/0x18c
[ 146.599360] ttwu_do_activate+0x84/0x25c
[ 146.603361] try_to_wake_up+0x310/0x5f8
[ 146.607271] wake_up_process+0x18/0x24
[ 146.611094] kick_pool+0x9c/0x17c
[ 146.614483] __queue_work+0x544/0x7a8
[ 146.618223] __queue_delayed_work+0x118/0x15c
[ 146.622653] mod_delayed_work_on+0xcc/0xe0
[ 146.626823] kblockd_mod_delayed_work_on+0x20/0x30
[ 146.631696] blk_mq_kick_requeue_list+0x1c/0x28
[ 146.636307] blk_flush_complete_seq+0xd4/0x2a4
[ 146.640824] flush_end_io+0x1e0/0x3f4
[ 146.644559] blk_mq_end_request+0x60/0x154
[ 146.648733] nvme_end_req+0x30/0x78
[ 146.652306] nvme_complete_rq+0x7c/0x218
[ 146.656302] nvme_pci_complete_rq+0x98/0x110
[ 146.660650] nvme_poll_cq+0x1cc/0x3b4
[ 146.664385] nvme_irq+0x34/0x88
[ 146.667600] __handle_irq_event_percpu+0x88/0x304
[ 146.672384] handle_irq_event+0x4c/0xa8
[ 146.676293] handle_fasteoi_irq+0x108/0x20c
[ 146.680555] handle_irq_desc+0x40/0x58
[ 146.684378] generic_handle_domain_irq+0x1c/0x28
[ 146.689068] gic_handle_irq+0x4c/0x11c
[ 146.692891] call_on_irq_stack+0x30/0x48
[ 146.696891] do_interrupt_handler+0x80/0x84
[ 146.701151] el1_interrupt+0x34/0x64
[ 146.704810] el1h_64_irq_handler+0x18/0x24
[ 146.708979] el1h_64_irq+0x6c/0x70
[ 146.712453] cpuidle_enter_state+0x12c/0x53c
[ 146.716796] cpuidle_enter+0x38/0x50
[ 146.720458] do_idle+0x204/0x27c
[ 146.723759] cpu_startup_entry+0x38/0x3c
[ 146.727755] secondary_start_kernel+0x134/0x154
[ 146.732370] __secondary_switched+0xc0/0xc4
[ 146.736638] irq event stamp: 1754
[ 146.739938] hardirqs last enabled at (1753): [<ffff800081497184>] _raw_spin_unlock_irqrestore+0x6c/0x70
[ 146.749405] hardirqs last disabled at (1754): [<ffff8000814965e4>] _raw_spin_lock_irqsave+0x84/0x88
[ 146.758437] softirqs last enabled at (1664): [<ffff800080195598>] rcu_tasks_invoke_cbs+0x100/0x394
[ 146.767476] softirqs last disabled at (1660): [<ffff800080195598>] rcu_tasks_invoke_cbs+0x100/0x394
[ 146.776506]
[ 146.776506] other info that might help us debug this:
[ 146.783019] Possible unsafe locking scenario:
[ 146.783019]
[ 146.788923] CPU0
[ 146.791356] ----
[ 146.793788] lock(&dsq->lock);
[ 146.796915] <Interrupt>
[ 146.799521] lock(&dsq->lock);
[ 146.802821]
[ 146.802821] *** DEADLOCK ***
[ 146.802821]
[ 146.808725] 3 locks held by rcu_tasks_trace/79:
[ 146.813242] #0: ffff800082e6e650 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{4:4}, at: rcu_tasks_one_gp+0x328/0x570
[ 146.823403] #1: ffff800082adc1f0 (cpu_hotplug_lock){++++}-{0:0}, at: cpus_read_lock+0x10/0x1c
[ 146.832014] #2: ffff000089c90e58 (&dsq->lock){?.-.}-{2:2}, at: __task_rq_lock+0x88/0x194
[ 146.840178]
[ 146.840178] stack backtrace:
[ 146.844521] CPU: 10 UID: 0 PID: 79 Comm: rcu_tasks_trace Tainted: G S W 6.17.0-rc4-cix-build+ #58 PREEMPT
[ 146.855463] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
[ 146.860240] Hardware name: Radxa Computer (Shenzhen) Co., Ltd. Radxa Orion O6/Radxa Orion O6, BIOS 0.3.0-1 2025-04-28T03:35:34+00:00
[ 146.872136] Sched_ext: simple (enabled+all), task: runnable_at=-4ms
[ 146.872138] Call trace:
[ 146.880822] show_stack+0x18/0x24 (C)
[ 146.884471] dump_stack_lvl+0x90/0xd0
[ 146.888131] dump_stack+0x18/0x24
[ 146.891432] print_usage_bug.part.0+0x29c/0x364
[ 146.895950] mark_lock+0x778/0x978
[ 146.899338] mark_held_locks+0x58/0x90
[ 146.903074] lockdep_hardirqs_on_prepare+0x100/0x210
[ 146.908025] trace_hardirqs_on+0x5c/0x1cc
[ 146.912025] _raw_spin_unlock_irqrestore+0x6c/0x70
[ 146.916803] task_call_func+0x110/0x164
[ 146.920625] trc_wait_for_one_reader.part.0+0x5c/0x3b8
[ 146.925750] check_all_holdout_tasks_trace+0x124/0x480
[ 146.930874] rcu_tasks_wait_gp+0x1f0/0x3b4
[ 146.934957] rcu_tasks_one_gp+0x4a4/0x570
[ 146.938953] rcu_tasks_kthread+0xd4/0xe0
[ 146.942862] kthread+0x148/0x208
[ 146.946079] ret_from_fork+0x10/0x20
(FWIW, this actually locks up the system without any further output.)
I'll keep testing and start debugging now, but if there's anything I can help with
immediately, please do shout.
[1]
This refers to sched/cleanup at the time of writing:
e127838bf8f9 sched: Cleanup NOCLOCK
ce024feefe1c sched/ext: Implement p->srq_lock support
6ef342071dd7 sched: Add {DE,EN}QUEUE_LOCKED
ed738ce6f9fb sched: Add shared runqueue locking to __task_rq_lock()
94f197f28834 sched: Add flags to {put_prev,set_next}_task() methods
254d43c94105 sched: Add locking comments to sched_class methods
f8864b505a17 sched: Make __do_set_cpus_allowed() use the sched_change pattern
d0e9cfb835d3 sched: Rename do_set_cpus_allowed()
cfcabf45249d sched: Fix do_set_cpus_allowed() locking
f7b9b39041fb sched: Fix migrate_disable_switch() locking
91128b33456a sched: Move sched_class::prio_changed() into the change pattern
c59dc6ce071b sched: Cleanup sched_delayed handling for class switches
13ea43940095 sched: Fold sched_class::switch{ing,ed}_{to,from}() into the change pattern
f0b336327a1b sched: Re-arrange the {EN,DE}QUEUE flags
b55442cb4ec1 sched: Employ sched_change guards
[2]
5b726e9bf954 sched/fair: Get rid of throttled_lb_pair()