Message-ID: <aMvUAQ81WEBZdTQ7@jlelli-thinkpadt14gen4.remote.csb>
Date: Thu, 18 Sep 2025 11:42:25 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: John Stultz <jstultz@...gle.com>, LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Valentin Schneider <vschneid@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Xuewen Yan <xuewen.yan94@...il.com>,
	K Prateek Nayak <kprateek.nayak@....com>,
	Suleiman Souhlal <suleiman@...gle.com>,
	Qais Yousef <qyousef@...alina.io>,
	Joel Fernandes <joelagnelf@...dia.com>,
	kuyo chang <kuyo.chang@...iatek.com>, hupu <hupu.gm@...il.com>,
	kernel-team@...roid.com
Subject: Re: [RFC][PATCH] sched/deadline: Fix dl_server getting stuck,
 allowing cpu starvation

On 18/09/25 11:04, Peter Zijlstra wrote:
> On Thu, Sep 18, 2025 at 10:37:04AM +0200, Juri Lelli wrote:
> > On 17/09/25 19:30, Peter Zijlstra wrote:
> > > On Wed, Sep 17, 2025 at 03:56:20PM +0200, Juri Lelli wrote:
> > > 
> > > > > + * By stopping at this point the dl_server retains bandwidth, which, if a new
> > > > > + * task wakes up imminently (starting the server again), can be used --
> > > > > + * subject to CBS wakeup rules -- without having to wait for the next period.
> > > > 
> > > > In both cases we still defer until either the new period or the current
> > > > 0-laxity, right?
> > > > 
> > > > The stop cleans all the flags, so subsequent start calls
> > > > enqueue(ENQUEUE_WAKEUP) -> update_dl_entity() which sets dl_throttled
> > > > and dl_defer_armed in both cases and then we start_dl_timer (defer
> > > > timer) after it (without enqueueing right away).
> > > > 
> > > > Or maybe I am still a bit lost. :)
> > > 
> > > The way I read it earlier today:
> > > 
> > >   dl_server_start()
> > >     enqueue_dl_entity(WAKEUP)
> > >       if (WAKEUP)
> > >         task_contending();
> > >         update_dl_entity()
> > >           dl_entity_overflows() := true
> > >           update_dl_revised_wakeup();
> > > 
> > > In that case, it is possible to continue running with a slight
> > > adjustment to the runtime (it gets scaled back to account for 'lost'
> > > time or somesuch IIRC).
> > > 
> > 
> > Hummm, but this is for !implicit (dl_deadline != dl_period) tasks, is
> > it? And dl-servers are implicit.
> 
> Bah. You're right.
> 
> So how about this:
> 
>   dl_server_timer()
>     if (dl_se->dl_defer_armed)
>       dl_se->dl_defer_running = 1;
> 
>     enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH)
> 
>  __pick_task_dl()
>    p = dl_se->server_pick_task(dl_se);
>    if (!p)
>      dl_server_stop()
>        dl_se->dl_defer_armed = 0;
>        dl_se->dl_throttled = 0;
>        dl_se->dl_server_active = 0;
>        /* notably it leaves dl_defer_running == 1 */
> 
>  dl_server_start()
>    dl_se->dl_server_active = 1;
>    enqueue_dl_entity(WAKEUP)
>      if (WAKEUP)
>        task_contending();
>        update_dl_entity()
>          if (dl_server() && dl_se->dl_defer)
>            if (!dl_se->dl_defer_running) /* !true := false */
>              /* do not set dl_defer_armed / dl_throttled */
> 
> Note: update_curr_dl_se() will eventually clear dl_defer_running when it
> gets throttled.

Ah right! I indeed missed the dl_defer_running condition check.
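
To make the flag transitions above concrete, here is a toy user-space
model of them. It is only a sketch of the pseudo-code quoted in this
thread, not kernel code: the toy_* names are made up for illustration,
toy_server_start() assumes the wakeup check in update_dl_entity() always
fires, and the ENQUEUE_REPLENISH side of the timer is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the handful of sched_dl_entity flags discussed here. */
struct toy_dl_se {
	bool dl_defer;		/* server uses the deferred start mode */
	bool dl_defer_armed;	/* defer timer is pending */
	bool dl_defer_running;	/* server already ran after a defer */
	bool dl_throttled;
	bool dl_server_active;
};

/* Defer timer fired: an armed server is now considered running. */
static void toy_server_timer(struct toy_dl_se *se)
{
	if (se->dl_defer_armed)
		se->dl_defer_running = true;
}

/* Nothing to pick: stop the server, but keep dl_defer_running as is. */
static void toy_server_stop(struct toy_dl_se *se)
{
	se->dl_defer_armed = false;
	se->dl_throttled = false;
	se->dl_server_active = false;
}

/*
 * Start the server again. Returns true if it has to wait for the defer
 * timer, false if it may continue with its previous reservation.
 */
static bool toy_server_start(struct toy_dl_se *se)
{
	se->dl_server_active = true;
	if (se->dl_defer && !se->dl_defer_running) {
		se->dl_defer_armed = true;
		se->dl_throttled = true;
		return true;
	}
	return false;
}

/* Runtime exhausted: throttling also drops dl_defer_running. */
static void toy_throttle(struct toy_dl_se *se)
{
	se->dl_throttled = true;
	se->dl_defer_running = false;
}
```

With this model, start -> timer -> stop -> start does not defer the
second time, while a throttle in between makes the next start defer
again, which matches the reading above.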

> And so it continues with the previous reservation. And I suppose the
> question is, should it do update_dl_revised_wakeup() in this case?

No, it can keep using its current reservation, so there is no need to
shrink it. Also, the revised rule only applies to constrained tasks anyway.
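
For reference, a rough sketch of what the revised wakeup rule computes,
as I recall it: the remaining runtime is scaled to density * laxity,
where density is runtime / deadline in fixed point and laxity is the
time left until the absolute deadline. The toy_* names, the struct and
the shift constant are assumptions made for this example, not the
kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed-point shift used for the density ratio (assumed value). */
#define TOY_BW_SHIFT 20

/* Static reservation parameters, all in nanoseconds. */
struct toy_params {
	uint64_t dl_runtime;
	uint64_t dl_deadline;
	uint64_t dl_period;
};

/* Implicit deadline: relative deadline equals the period. */
static bool toy_is_implicit(const struct toy_params *p)
{
	return p->dl_deadline == p->dl_period;
}

/* density = runtime / deadline, as a TOY_BW_SHIFT fixed-point ratio. */
static uint64_t toy_density(const struct toy_params *p)
{
	assert(p->dl_deadline != 0);
	return (p->dl_runtime << TOY_BW_SHIFT) / p->dl_deadline;
}

/*
 * Revised wakeup rule for a constrained entity: shrink the runtime so
 * that the density is respected over the remaining laxity.
 */
static uint64_t toy_revised_runtime(const struct toy_params *p,
				    uint64_t laxity)
{
	return (toy_density(p) * laxity) >> TOY_BW_SHIFT;
}
```

For a constrained entity with 5ms runtime, 10ms deadline and 20ms
period, waking up with 4ms of laxity would leave 2ms of runtime. An
implicit entity (deadline == period) never takes this path, which is why
it does not matter for the dl-server.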

I think we are good! Thanks a lot for fixing this and for bearing with
me. :)

Feel free to add my Reviewed/Tested/Acked-by as you see fit.

Juri

