Message-ID: <20250925090001.GX4067720@noisy.programming.kicks-ass.net>
Date: Thu, 25 Sep 2025 11:00:01 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Christian Loehle <christian.loehle@....com>
Cc: tj@...nel.org, linux-kernel@...r.kernel.org, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, longman@...hat.com,
hannes@...xchg.org, mkoutny@...e.com, void@...ifault.com,
arighi@...dia.com, changwoo@...lia.com, cgroups@...r.kernel.org,
sched-ext@...ts.linux.dev, liuwenfang@...or.com, tglx@...utronix.de
Subject: Re: [PATCH 00/14] sched: Support shared runqueue locking
On Thu, Sep 18, 2025 at 04:15:45PM +0100, Christian Loehle wrote:
> Hi Peter,
>
> A couple of issues popped up when testing this [1] (that don't trigger on [2]):
>
> When booting (arm64 orion o6) I get:
>
> [ 1.298020] sched: DL replenish lagged too much
> [ 1.298364] ------------[ cut here ]------------
> [ 1.298377] WARNING: CPU: 4 PID: 0 at kernel/sched/deadline.c:239 inactive_task_timer+0x3d0/0x474
Ah, right. The robot reported this one too. I'll look into it. It could
be that one of the patches in sched/urgent cures it, but who knows. I'll
have a poke.
> and when running actual tests (e.g. iterating through all scx schedulers under load):
>
> [ 146.532691] ================================
> [ 146.536947] WARNING: inconsistent lock state
> [ 146.541204] 6.17.0-rc4-cix-build+ #58 Tainted: G S W
> [ 146.547457] --------------------------------
> [ 146.551713] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [ 146.557705] rcu_tasks_trace/79 [HC0[0]:SC0[0]:HE0:SE1] takes:
> [ 146.563438] ffff000089c90e58 (&dsq->lock){?.-.}-{2:2}, at: __task_rq_lock+0x88/0x194
> [ 146.813242] #0: ffff800082e6e650 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{4:4}, at: rcu_tasks_one_gp+0x328/0x570
> [ 146.823403] #1: ffff800082adc1f0 (cpu_hotplug_lock){++++}-{0:0}, at: cpus_read_lock+0x10/0x1c
> [ 146.832014] #2: ffff000089c90e58 (&dsq->lock){?.-.}-{2:2}, at: __task_rq_lock+0x88/0x194
>
> [ 146.840178] stack backtrace:
> [ 146.844521] CPU: 10 UID: 0 PID: 79 Comm: rcu_tasks_trace Tainted: G S W 6.17.0-rc4-cix-build+ #58 PREEMPT
> [ 146.855463] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
> [ 146.860240] Hardware name: Radxa Computer (Shenzhen) Co., Ltd. Radxa Orion O6/Radxa Orion O6, BIOS 0.3.0-1 2025-04-28T03:35:34+00:00
> [ 146.872136] Sched_ext: simple (enabled+all), task: runnable_at=-4ms
> [ 146.872138] Call trace:
> [ 146.880822] show_stack+0x18/0x24 (C)
> [ 146.884471] dump_stack_lvl+0x90/0xd0
> [ 146.888131] dump_stack+0x18/0x24
> [ 146.891432] print_usage_bug.part.0+0x29c/0x364
> [ 146.895950] mark_lock+0x778/0x978
> [ 146.899338] mark_held_locks+0x58/0x90
> [ 146.903074] lockdep_hardirqs_on_prepare+0x100/0x210
> [ 146.908025] trace_hardirqs_on+0x5c/0x1cc
> [ 146.912025] _raw_spin_unlock_irqrestore+0x6c/0x70
> [ 146.916803] task_call_func+0x110/0x164
Ooh, yeah, that's buggered. Let me go fix!
Thanks for testing!
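[Editorial note: for readers unfamiliar with this class of lockdep splat,
the {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} inconsistency means the same lock was
observed taken from hardirq context in one path, and later held with
interrupts enabled in another. The sketch below is illustrative only and
not code from the patch set; all names besides dsq->lock are hypothetical.]

```c
/*
 * Illustrative sketch of the usage pattern lockdep is complaining
 * about, under the assumption that dsq->lock is also taken from
 * hardirq context elsewhere.  Not actual kernel code.
 */

/* Path A: lock taken in hardirq context -> {IN-HARDIRQ-W} usage. */
static void some_irq_handler(void)
{
	raw_spin_lock(&dsq->lock);	/* irqs already off in hardirq */
	/* ... */
	raw_spin_unlock(&dsq->lock);
}

/*
 * Path B: the same lock ends up held while interrupts get re-enabled,
 * e.g. an inner unlock_irqrestore() (cf. _raw_spin_unlock_irqrestore
 * under task_call_func in the trace) while an outer caller still holds
 * dsq->lock -> {HARDIRQ-ON-W} usage.  A hardirq arriving here that
 * runs Path A would then deadlock on the held lock.
 */
static void some_task_path(void)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&other_lock, flags);
	raw_spin_lock(&dsq->lock);
	raw_spin_unlock_irqrestore(&other_lock, flags);	/* irqs back on,
							   dsq->lock held */
	/* ... window where Path A's hardirq can fire ... */
	raw_spin_unlock(&dsq->lock);
}
```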