Message-ID: <CANDhNCqJbnemY8EBYu=4w3ABfrDkuc+dUShDDcjufFpsh7qv1g@mail.gmail.com>
Date: Tue, 16 Sep 2025 10:35:46 -0700
From: John Stultz <jstultz@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Juri Lelli <juri.lelli@...hat.com>, LKML <linux-kernel@...r.kernel.org>, 
	Ingo Molnar <mingo@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Valentin Schneider <vschneid@...hat.com>, 
	Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Xuewen Yan <xuewen.yan94@...il.com>, K Prateek Nayak <kprateek.nayak@....com>, 
	Suleiman Souhlal <suleiman@...gle.com>, Qais Yousef <qyousef@...alina.io>, 
	Joel Fernandes <joelagnelf@...dia.com>, kuyo chang <kuyo.chang@...iatek.com>, 
	hupu <hupu.gm@...il.com>, kernel-team@...roid.com
Subject: Re: [RFC][PATCH] sched/deadline: Fix dl_server getting stuck,
 allowing cpu starvation

On Tue, Sep 16, 2025 at 4:02 AM Peter Zijlstra <peterz@...radead.org> wrote:
> Now, the case John trips seems to be that there were tasks, we ran tasks
> until budget exhausted, dequeued the server and did start_dl_timer().
>
> Then the bandwidth timer fires at a point where there are no more fair
> tasks, replenish_dl_entity() gets called, which *should* set the
> 0-laxity timer, but doesn't -- because !server_has_tasks() -- and then
> nothing.
>
> So perhaps we should do something like the below. Simply continue
> as normal, until we do a whole cycle without having seen a task.
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 5b64bc621993..269ca2eb5ba9 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -875,7 +875,7 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se)
>          */
>         if (dl_se->dl_defer && !dl_se->dl_defer_running &&
>             dl_time_before(rq_clock(dl_se->rq), dl_se->deadline - dl_se->runtime)) {
> -               if (!is_dl_boosted(dl_se) && dl_se->server_has_tasks(dl_se)) {
> +               if (!is_dl_boosted(dl_se)) {
>
>                         /*
>                          * Set dl_se->dl_defer_armed and dl_throttled variables to
> @@ -1171,12 +1171,6 @@ static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_
>                 if (!dl_se->dl_runtime)
>                         return HRTIMER_NORESTART;
>
> -               if (!dl_se->server_has_tasks(dl_se)) {
> -                       replenish_dl_entity(dl_se);
> -                       dl_server_stopped(dl_se);
> -                       return HRTIMER_NORESTART;
> -               }
> -
>                 if (dl_se->dl_defer_armed) {
>                         /*
>                          * First check if the server could consume runtime in background.
>
>
> Notably, this removes all ->server_has_tasks() users, so if this works
> and is correct, we can completely remove that callback and simplify
> more.
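
For illustration only, the failure mode described above can be sketched as a
small userspace toy model. This is not kernel code: struct toy_server, its
fields, and the toy_* functions are all hypothetical names made up for this
sketch, and where the kernel actually performs the "idle for a whole cycle"
check is not modeled. It only assumes what the quoted text says: the old path
gives up re-arming when there are no tasks, while the proposed path keeps
going until one full period passes without seeing a task.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state; names only loosely echo the thread. */
struct toy_server {
    bool has_tasks;   /* are there fair tasks to serve right now? */
    bool timer_armed; /* is a timer pending that will wake the server? */
    bool idle_seen;   /* did the previous period pass without a task? */
    bool active;      /* is the server still considered started? */
};

/* Old behaviour as described: with no tasks, nothing gets re-armed. */
void toy_timer_fire_old(struct toy_server *s)
{
    s->timer_armed = false;      /* the timer has just fired */
    if (!s->has_tasks)
        return;                  /* still active, but no timer: stuck */
    s->timer_armed = true;
}

/* Proposed behaviour: keep re-arming; stop after a whole idle period. */
void toy_timer_fire_new(struct toy_server *s)
{
    s->timer_armed = false;
    if (!s->has_tasks) {
        if (s->idle_seen) {      /* second idle period in a row */
            s->active = false;   /* stop cleanly instead of hanging */
            return;
        }
        s->idle_seen = true;
    } else {
        s->idle_seen = false;
    }
    s->timer_armed = true;
}

/* "Stuck" means: marked active, yet nothing will ever wake it again. */
bool toy_stuck(const struct toy_server *s)
{
    return s->active && !s->timer_armed;
}

/* Fire the timer up to `periods` times with no tasks present. */
bool toy_ends_stuck(void (*fire)(struct toy_server *), int periods)
{
    struct toy_server s = { .has_tasks = false, .timer_armed = true,
                            .idle_seen = false, .active = true };

    assert(periods > 0);
    for (int i = 0; i < periods && s.timer_armed; i++)
        fire(&s);
    return toy_stuck(&s);
}
```

In this toy, toy_ends_stuck(toy_timer_fire_old, 2) reports a stuck server,
while toy_ends_stuck(toy_timer_fire_new, 2) ends with the server stopped
rather than stuck.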

In my initial testing, this does seem to avoid the lockup warning I've
been seeing. I haven't done much other testing with it, though.

I of course still see the thread spawning issues with my
ksched_football test that come from keeping the dl_server running for
the whole period, but that's a separate thing I'm trying to isolate.

Tested-by: John Stultz <jstultz@...gle.com>

thanks
-john
