Message-ID: <20251031130543.GV4068168@noisy.programming.kicks-ass.net>
Date: Fri, 31 Oct 2025 14:05:43 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Gabriele Monaco <gmonaco@...hat.com>
Cc: Juri Lelli <juri.lelli@...hat.com>, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...hat.com>,
Clark Williams <williams@...hat.com>, arighi@...dia.com
Subject: Re: [RFC PATCH] sched/deadline: Avoid dl_server boosting with expired deadline
On Thu, Oct 30, 2025 at 07:42:05PM +0100, Peter Zijlstra wrote:
> On Wed, Oct 22, 2025 at 12:11:51PM +0200, Gabriele Monaco wrote:
>
> Sorry, finally cycling back to this.
>
> > > So how about something like this for starters?
> > >
> >
> > Thanks Peter for sharing this patch, I run it through my test and the model
> > seems to pass (i.e. no more boosting after deadline). What I found curious
> > however, is that throughout the test, servers went only through replenish
> > events.
> > The system under test is mostly idle (6 periodic dl tasks on a 16 CPUs virtme-ng
> > VM), so I expect not to see any task boosted by the servers, but in 5 minutes I
> > didn't even observe any start/stop for the server.
> >
> > I'm not sure why this is happening, but looking at traces it seems replenish
> > occurs more often and perhaps doesn't let the server stop:
> >
> > <idle>-0 [009] d.h3. 14.312395: (+950124) event_nomiss: -9: idle x dl_replenish_idle -> idle
> > <idle>-0 [009] d.h3. 14.312401: (+6) sched_dl_replenish: comm=server pid=-9 runtime=50000000 deadline=15253307235 yielded=0
> > <idle>-0 [009] d.h3. 15.262771: (+950370) event_nomiss: -9: idle x dl_replenish_idle -> idle
> > <idle>-0 [009] d.h3. 15.262781: (+10) sched_dl_replenish: comm=server pid=-9 runtime=50000000 deadline=16203668554 yielded=0
> > <idle>-0 [009] d.h3. 16.213117: (+950336) event_nomiss: -9: idle x dl_replenish_idle -> idle
> > <idle>-0 [009] d.h3. 16.213123: (+6) sched_dl_replenish: comm=server pid=-9 runtime=50000000 deadline=17154029879 yielded=0
> >
> > Is this expected?
>
> Sort of, that was next on the list. Let me see if I can make it stop a
> little more.
OK, so I've gone over things again and all I got was a comment.

That is, today I think it all works as expected.

The dl_server will stop once the fair class goes idle long enough. Can
you confirm this?
---
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1152,6 +1152,94 @@ static void __push_dl_task(struct rq *rq
 /* a defer timer will not be reset if the runtime consumed was < dl_server_min_res */
 static const u64 dl_server_min_res = 1 * NSEC_PER_MSEC;
+
+/*
+ * dl_server && dl_defer:
+ *   dl_defer_armed = 0
+ *   dl_defer_running = 0
+ *   dl_throttled = 0
+ *
+ * [1] dl_server_start()
+ *       dl_server_active = 1;
+ *       enqueue_dl_entity()
+ *         update_dl_entity(WAKEUP)
+ *           if (!dl_defer_running)
+ *             dl_defer_armed = 1;
+ *             dl_throttled = 1;
+ *         if (dl_throttled && start_dl_timer())
+ *           return;
+ *     // start server into waiting for zero-laxity
+ *
+ *     // deplete server runtime from fair-class
+ * [2] update_curr_dl_se()
+ *       if (dl_defer && dl_throttled && dl_runtime_exceeded())
+ *         dl_defer_running = 0;
+ *         hrtimer_try_to_cancel();   // stop timer
+ *         replenish_dl_new_period()
+ *           // advance period
+ *           dl_throttled = 1;
+ *           dl_defer_armed = 1;
+ *         start_dl_timer();          // restart timer
+ *     // back into waiting for zero-laxity
+ *
+ *     // timer actually fires means we have runtime
+ * [4] dl_server_timer()
+ *       if (dl_defer_armed)
+ *         dl_defer_running = 1;
+ *       enqueue_dl_entity(REPLENISH)
+ *         replenish_dl_entity()
+ *           opt-fwd-period
+ *           if (dl_throttled)
+ *             dl_throttled = 0;
+ *           if (dl_defer_armed)
+ *             dl_defer_armed = 0;
+ *         __enqueue_dl_entity();
+ *     // server queued
+ *
+ *     // schedule server
+ * [5] pick_task_dl()
+ *       p = server_pick_task();
+ *       if (!p)
+ *         dl_server_stop()
+ *           dequeue_dl_entity();
+ *           hrtimer_try_to_cancel();
+ *           dl_defer_armed = 0;
+ *           dl_throttled = 0;
+ *           dl_server_active = 0;
+ *           // goto [1]
+ *
+ *     // server running
+ * [6] update_curr_dl_se()
+ *       if (dl_runtime_exceeded())
+ *         dl_throttled = 1;
+ *         dequeue_dl_entity();
+ *         start_dl_timer();
+ *     // replenish-timer
+ *
+ *     // goto [2]
+ *
+ * [7] dl_server_timer()
+ *       enqueue_dl_entity(REPLENISH)
+ *         replenish_dl_entity()
+ *           fwd-period
+ *           if (dl_throttled)
+ *             dl_throttled = 0;
+ *         __enqueue_dl_entity();
+ *     // goto [5]
+ *
+ * Notes:
+ *
+ *  - When there are fair tasks running, the most likely loop is [2]->[2]:
+ *    the dl_server never actually runs and the timer never fires.
+ *
+ *  - When there is actual fair starvation, the timer fires and starts the
+ *    dl_server. This will then throttle and replenish like a normal DL
+ *    task. Notably it will not 'defer' again.
+ *
+ *  - When fair goes idle, it will not consume dl_server budget so the server
+ *    will start. However, it will find there are no fair tasks to run and
+ *    stop itself.
+ */
 static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_dl_entity *dl_se)
 {
 	struct rq *rq = rq_of_dl_se(dl_se);
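
The three cases in the Notes can be walked through with a small userspace toy model. This is only a sketch of the flag transitions as the comment describes them, not kernel code: `struct toy_dl_server`, every `toy_*` function, and the `timer_pending` and `queued` fields are hypothetical stand-ins (for the hrtimer and for `__enqueue_dl_entity()`), `dl_defer` is assumed to be always set, and `start_dl_timer()` is assumed to always succeed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the flags named in the comment; NOT struct sched_dl_entity. */
struct toy_dl_server {
	bool active;        /* dl_server_active */
	bool defer_armed;   /* dl_defer_armed */
	bool defer_running; /* dl_defer_running */
	bool throttled;     /* dl_throttled */
	bool timer_pending; /* hypothetical: the hrtimer is queued */
	bool queued;        /* hypothetical: __enqueue_dl_entity() was done */
};

/* [1] dl_server_start(): arm the defer timer and wait for zero-laxity. */
static void toy_start(struct toy_dl_server *s)
{
	s->active = true;
	if (!s->defer_running) {
		s->defer_armed = true;
		s->throttled = true;
	}
	if (s->throttled)
		s->timer_pending = true; /* start_dl_timer() assumed to succeed */
	else
		s->queued = true;
}

/* [2] update_curr_dl_se(): fair tasks used up the budget of a throttled server. */
static void toy_fair_depletes_budget(struct toy_dl_server *s)
{
	if (!s->throttled)
		return;
	s->defer_running = false;
	s->throttled = true;     /* replenish_dl_new_period() */
	s->defer_armed = true;
	s->timer_pending = true; /* cancel, then restart the timer */
}

/* [4] and [7] dl_server_timer(): the timer fired, replenish and queue. */
static void toy_timer_fires(struct toy_dl_server *s)
{
	if (!s->timer_pending)
		return;
	s->timer_pending = false;
	if (s->defer_armed)
		s->defer_running = true;
	s->throttled = false;
	s->defer_armed = false;
	s->queued = true;
}

/* [5] pick_task_dl(): run a fair task, or stop the server if there is none. */
static bool toy_pick(struct toy_dl_server *s, bool fair_runnable)
{
	if (!s->queued)
		return false;
	assert(!s->throttled); /* in this model a queued server is never throttled */
	if (fair_runnable)
		return true;
	s->queued = false;       /* dl_server_stop() */
	s->timer_pending = false;
	s->defer_armed = false;
	s->throttled = false;
	s->active = false;
	return false;
}

/* [6] update_curr_dl_se(): the running server exhausted its own budget. */
static void toy_server_depletes_budget(struct toy_dl_server *s)
{
	s->throttled = true;
	s->queued = false;
	s->timer_pending = true;
}

/* Note 1: fair keeps running, loop [2]->[2]. Returns whether the server was ever queued. */
static bool toy_scenario_fair_busy(int periods)
{
	struct toy_dl_server s = {0};
	bool ever_queued = false;

	toy_start(&s);
	for (int i = 0; i < periods; i++) {
		toy_fair_depletes_budget(&s);
		if (s.queued)
			ever_queued = true;
	}
	return ever_queued;
}

/* Note 2: starvation, [1]->[4]->[5]->[6]->[7]. Returns whether the server is
 * queued again without having been re-armed for deferral. */
static bool toy_scenario_starved(void)
{
	struct toy_dl_server s = {0};

	toy_start(&s);
	toy_timer_fires(&s);
	if (!toy_pick(&s, true))
		return false;
	toy_server_depletes_budget(&s);
	toy_timer_fires(&s);
	return s.defer_running && s.queued && !s.defer_armed;
}

/* Note 3: fair is idle, [1]->[4]->[5]. Returns whether the server is still active. */
static bool toy_scenario_fair_idle(void)
{
	struct toy_dl_server s = {0};

	toy_start(&s);
	toy_timer_fires(&s);
	toy_pick(&s, false);
	return s.active;
}
```

In this model `toy_scenario_fair_busy()` never sees the server queued, `toy_scenario_starved()` ends with the server queued and not re-armed for deferral, and `toy_scenario_fair_idle()` ends with the server inactive, which matches the three notes. It says nothing about timing, which is what the quoted trace is actually about.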