lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <20250918090445.GF3289052@noisy.programming.kicks-ass.net>
Date: Thu, 18 Sep 2025 11:04:45 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: John Stultz <jstultz@...gle.com>, LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Valentin Schneider <vschneid@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Xuewen Yan <xuewen.yan94@...il.com>,
	K Prateek Nayak <kprateek.nayak@....com>,
	Suleiman Souhlal <suleiman@...gle.com>,
	Qais Yousef <qyousef@...alina.io>,
	Joel Fernandes <joelagnelf@...dia.com>,
	kuyo chang <kuyo.chang@...iatek.com>, hupu <hupu.gm@...il.com>,
	kernel-team@...roid.com
Subject: Re: [RFC][PATCH] sched/deadline: Fix dl_server getting stuck,
 allowing cpu starvation

On Thu, Sep 18, 2025 at 10:37:04AM +0200, Juri Lelli wrote:
> On 17/09/25 19:30, Peter Zijlstra wrote:
> > On Wed, Sep 17, 2025 at 03:56:20PM +0200, Juri Lelli wrote:
> > 
> > > > + * By stopping at this point the dl_server retains bandwidth, which, if a new
> > > > + * task wakes up imminently (starting the server again), can be used --
> > > > + * subject to CBS wakeup rules -- without having to wait for the next period.
> > > 
> > > In both cases we still defer until either the new period or the current
> > > 0-laxity, right?
> > > 
> > > The stop cleans all the flags, so subsequent start calls
> > > enqueue(ENQUEUE_WAKEUP) -> update_dl_entity() which sets dl_throttled
> > > and dl_defer_armed in both cases and then we start_dl_timer (defer
> > > timer) after it (without enqueueing right away).
> > > 
> > > Or maybe I am still a bit lost. :)
> > 
> > The way I read it earlier today:
> > 
> >   dl_server_start()
> >     enqueue_dl_entity(WAKEUP)
> >       if (WAKEUP)
> >         task_contending();
> >         update_dl_entity()
> >           dl_entity_overflows() := true
> >           update_dl_revised_wakeup();
> > 
> > In that case, it is possible to continue running with a slight
> > adjustment to the runtime (it gets scaled back to account for 'lost'
> > time or somesuch IIRC).
> > 
> 
> Hummm, but this is for !implicit (dl_deadline != dl_period) tasks, isn't
> it? And dl-servers are implicit.

Bah. You're right.

So how about this:

  dl_server_timer()
    if (dl_se->dl_defer_armed)
      dl_se->dl_defer_running = 1;

    enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH)

  __pick_task_dl()
    p = dl_se->server_pick_task(dl_se);
    if (!p)
      dl_server_stop()
        dl_se->dl_defer_armed = 0;
        dl_se->dl_throttled = 0;
        dl_se->dl_server_active = 0;
        /* notably it leaves dl_defer_running == 1 */

  dl_server_start()
    dl_se->dl_server_active = 1;
    enqueue_dl_entity(WAKEUP)
      if (WAKEUP)
        task_contending();
        update_dl_entity()
          if (dl_server() && dl_se->dl_defer)
            if (!dl_se->dl_defer_running) /* !true := false */
              /* do not set dl_defer_armed / dl_throttled */

Note: update_curr_dl_se() will eventually clear dl_defer_running when it
gets throttled.

And so it continues with the previous reservation. And I suppose the
question is, should it do update_dl_revised_wakeup() in this case?
