Message-ID: <20250923220215.GH3419281@noisy.programming.kicks-ass.net>
Date: Wed, 24 Sep 2025 00:02:15 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Marek Szyprowski <m.szyprowski@...sung.com>
Cc: linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org,
	John Stultz <jstultz@...gle.com>, x86@...nel.org,
	'Linux Samsung SOC' <linux-samsung-soc@...r.kernel.org>,
	Mark Rutland <mark.rutland@....com>
Subject: Re: [tip: sched/urgent] sched/deadline: Fix dl_server getting stuck

On Mon, Sep 22, 2025 at 11:57:02PM +0200, Marek Szyprowski wrote:
> On 18.09.2025 08:56, tip-bot2 for Peter Zijlstra wrote:
> > The following commit has been merged into the sched/urgent branch of tip:
> >
> > Commit-ID:     077e1e2e0015e5ba6538d1c5299fb299a3a92d60
> > Gitweb:        https://git.kernel.org/tip/077e1e2e0015e5ba6538d1c5299fb299a3a92d60
> > Author:        Peter Zijlstra <peterz@...radead.org>
> > AuthorDate:    Tue, 16 Sep 2025 23:02:41 +02:00
> > Committer:     Peter Zijlstra <peterz@...radead.org>
> > CommitterDate: Thu, 18 Sep 2025 08:50:05 +02:00
> >
> > sched/deadline: Fix dl_server getting stuck
> >
> > John found it was easy to hit lockup warnings when running locktorture
> > on a 2 CPU VM, which he bisected down to: commit cccb45d7c429
> > ("sched/deadline: Less agressive dl_server handling").
> >
> > While debugging it seems there is a chance where we end up with the
> > dl_server dequeued, with dl_se->dl_server_active. This causes
> > dl_server_start() to return without enqueueing the dl_server, thus it
> > fails to run when RT tasks starve the cpu.
> >
> > When this happens, dl_server_timer() catches the
> > '!dl_se->server_has_tasks(dl_se)' case, which then calls
> > replenish_dl_entity() and dl_server_stopped() and finally return
> > HRTIMER_NO_RESTART.
> >
> > This ends in no new timer and also no enqueue, leaving the dl_server
> > 'dead', allowing starvation.
> >
> > What should have happened is for the bandwidth timer to start the
> > zero-laxity timer, which in turn would enqueue the dl_server and cause
> > dl_se->server_pick_task() to be called -- which will stop the
> > dl_server if no fair tasks are observed for a whole period.
> >
> > IOW, it is totally irrelevant if there are fair tasks at the moment of
> > bandwidth refresh.
> >
> > This removes all dl_se->server_has_tasks() users, so remove the whole
> > thing.
> >
> > Fixes: cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
> > Reported-by: John Stultz <jstultz@...gle.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> > Tested-by: John Stultz <jstultz@...gle.com>
> > ---
> 
> This patch landed in today's linux-next as commit 077e1e2e0015 
> ("sched/deadline: Fix dl_server getting stuck"). In my tests I found 
> that it breaks CPU hotplug on some of my systems. On 64bit 
> Exynos5433-based TM2e board I've captured the following lock dep warning 
> (which unfortunately doesn't look like really related to CPU hotplug):

Right -- I've looked at this patch a few times over the day, and the
only thing I can think of is that we keep the dl_server timer running.
But I already gave you a patch that *should've* stopped it.

There were a few issues with it -- notably if you've booted with
something like isolcpus / nohz_full it might not have worked because the
site where I put the dl_server_stop() would only get run if there was a
root domain attached to the CPU.

I've put it in a different spot (patch below), just to make sure.

There is also the fact that dl_server_stop() uses
hrtimer_try_to_cancel(), which can 'fail' when the timer is actively
running. But if that is the case, it must be spin-waiting on rq->lock
-- since the caller of dl_server_stop() will be holding that. Once
dl_server_stop() completes and the rq->lock is released, the timer will
see !dl_se->dl_throttled and immediately stop without restarting.

So that *should* not be a problem.
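
To make the ordering argument above concrete, here is a minimal user-space
sketch of it. It is not kernel code: every model_* name is hypothetical, the
lock is a plain flag, and the interleaving is run by hand in a single thread.
The only things taken from the discussion are the -1 return of
hrtimer_try_to_cancel() for a running callback and the timer checking
dl_throttled once it gets the lock. That the stop path clears dl_throttled is
an assumption inferred from the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel API. */
enum model_restart { MODEL_NORESTART, MODEL_RESTART };

struct model_rq {
	bool lock_held;        /* stands in for rq->lock */
	bool timer_queued;     /* timer armed, callback not started yet */
	bool callback_running; /* callback started, possibly waiting for the lock */
	bool throttled;        /* stands in for dl_se->dl_throttled */
	bool server_active;    /* stands in for dl_se->dl_server_active */
};

/*
 * Same return convention as hrtimer_try_to_cancel():
 * 1 = queued timer cancelled, 0 = timer was not active,
 * -1 = callback is currently running and cannot be stopped.
 */
static int model_try_to_cancel(struct model_rq *rq)
{
	if (rq->callback_running)
		return -1;
	if (rq->timer_queued) {
		rq->timer_queued = false;
		return 1;
	}
	return 0;
}

/* Stop path; the caller must hold the (modelled) rq lock. */
static int model_server_stop(struct model_rq *rq)
{
	int ret;

	assert(rq->lock_held);
	ret = model_try_to_cancel(rq);
	/* Assumption: stopping clears the throttled and active state. */
	rq->throttled = false;
	rq->server_active = false;
	return ret;
}

/* Timer callback body, run once the lock becomes available. */
static enum model_restart model_timer_body(struct model_rq *rq)
{
	enum model_restart ret;

	assert(!rq->lock_held);
	rq->lock_held = true;
	/* Not throttled any more: nothing to replenish, do not re-arm. */
	ret = rq->throttled ? MODEL_RESTART : MODEL_NORESTART;
	rq->lock_held = false;
	rq->callback_running = false;
	return ret;
}

/* Hand-run interleaving: the timer fires while the stop path holds the lock. */
static enum model_restart model_stop_while_timer_fires(int *cancel_ret)
{
	struct model_rq rq = {
		.timer_queued = true,
		.throttled = true,
		.server_active = true,
	};

	rq.lock_held = true;          /* stop path takes the lock */
	rq.timer_queued = false;      /* timer fires on another CPU ... */
	rq.callback_running = true;   /* ... and spins waiting for the lock */
	*cancel_ret = model_server_stop(&rq);
	rq.lock_held = false;         /* stop path releases the lock */
	return model_timer_body(&rq); /* callback finally gets to run */
}
```

Calling model_stop_while_timer_fires() reports the 'failed' cancel through
its argument, while the callback itself still ends without re-arming, which
is the outcome argued for above. A real race would need threads; this only
replays the one interleaving of interest.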

Anyway, clutching at straws here etc.

> # for i in /sys/devices/system/cpu/cpu[1-9]; do echo 0 >$i/online; done
> Detected VIPT I-cache on CPU7
> CPU7: Booted secondary processor 0x0000000101 [0x410fd031]
> ------------[ cut here ]------------
> WARNING: CPU: 7 PID: 0 at kernel/rcu/tree.c:4329 
> rcutree_report_cpu_starting+0x1e8/0x348

This is really weird; this does indeed look like CPU7 decides to boot
again. AFAICT it is not hotplug failing and bringing the CPU back again,
but it is really starting again.

I'm not well versed enough in ARM64 foo to know what would cause a CPU
to boot -- but on x86_64 this isn't something that would easily happen
by accident.

Not stopping a timer would certainly not be sufficient -- notably
hrtimers_cpu_dying() would have migrated the thing.

> (system is frozen at this point).

The whole lockdep and freezing thing is typically printk choking on
itself.

My personal way around this is this set of patches:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/experimental

They don't apply cleanly anymore, but the conflict isn't hard, so I've
not bothered to rebase them yet. They rely on the platform having
earlyprintk configured; then add force_early_printk to your kernel
cmdline to have earlyprintk completely take over.

Typical early serial drivers are lock-free and don't suffer from
lockups.

If you get it to work, you might get more data out of it.


---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3eb6faa91d06..c0b1dc360e68 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8363,6 +8363,7 @@ static inline void sched_set_rq_offline(struct rq *rq, int cpu)
 		BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
 		set_rq_offline(rq);
 	}
+	dl_server_stop(&rq->fair_server);
 	rq_unlock_irqrestore(rq, &rf);
 }