linux-kernel - Re: [RFC PATCH] sched/deadline: Avoid dl_server boosting with expired deadline

Message-ID: <1f2ad071e59db2ed8bc0b382ae202b7474d07afc.camel@redhat.com>
Date: Fri, 31 Oct 2025 14:24:17 +0100
From: Gabriele Monaco <gmonaco@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Juri Lelli <juri.lelli@...hat.com>, linux-kernel@...r.kernel.org, Ingo
 Molnar <mingo@...hat.com>, Clark Williams <williams@...hat.com>,
 arighi@...dia.com
Subject: Re: [RFC PATCH] sched/deadline: Avoid dl_server boosting with
 expired deadline

On Fri, 2025-10-31 at 14:05 +0100, Peter Zijlstra wrote:
> On Thu, Oct 30, 2025 at 07:42:05PM +0100, Peter Zijlstra wrote:
> > On Wed, Oct 22, 2025 at 12:11:51PM +0200, Gabriele Monaco wrote:
> > > 
> > > Is this expected?
> > 
> > Sort of, that was next on the list. Let me see if I can make it stop a
> > little more.
> 
> OK, so I've gone over things again and all I got was a comment.
> 
> That is, today I think it all works as expected.
> 
> The dl_server will stop once the fair class goes idle long enough. Can
> you confirm this?
> 

I'm going to go through your comment more carefully, but what I can observe now
is a bit different:

After this patch, consuming bandwidth in the background while fair tasks run
and while idle is equivalent. Updating idle time effectively replenishes after
the runtime is exhausted, and we never stop the server (IMO this is correct
behaviour only for fair tasks, since there's potentially something to do).
At least this is the behaviour I get on a mostly idle system.

The scenario is different if I keep the CPU busy with other tasks (e.g. RT
policies): there I can see the server stopping and starting again.
After doing this I seem to get different behaviour (even some boosting after
idle); I'm still trying to understand what's going on.

Does this behaviour make sense to you?

Thanks,
Gabriele

> ---
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1152,6 +1152,94 @@ static void __push_dl_task(struct rq *rq
>  /* a defer timer will not be reset if the runtime consumed was < dl_server_min_res */
>  static const u64 dl_server_min_res = 1 * NSEC_PER_MSEC;
>  
> +
> +/*
> + * dl_server && dl_defer:
> + *   dl_defer_armed = 0
> + *   dl_defer_running = 0
> + *   dl_throttled = 0
> + *
> + * [1] dl_server_start()
> + *   dl_server_active = 1;
> + *   enqueue_dl_entity()
> + *     update_dl_entity(WAKEUP)
> + *       if (!dl_defer_running)
> + *         dl_defer_armed = 1;
> + *         dl_throttled = 1;
> + *     if (dl_throttled && start_dl_timer())
> + *       return;
> + *       // start server into waiting for zero-laxity
> + *
> + * // deplete server runtime from fair-class
> + * [2] update_curr_dl_se()
> + *   if (dl_defer && dl_throttled && dl_runtime_exceeded())
> + *     dl_defer_running = 0;
> + *     hrtimer_try_to_cancel();   // stop timer
> + *     replenish_dl_new_period()
> + *       // advance period
> + *       dl_throttled = 1;
> + *       dl_defer_armed = 1;
> + *       start_dl_timer();        // restart timer
> + *       // back into waiting for zero-laxity
> + *
> + * // timer actually fires means we have runtime
> + * [4] dl_server_timer()
> + *   if (dl_defer_armed)
> + *     dl_defer_running = 1;
> + *   enqueue_dl_entity(REPLENISH)
> + *     replenish_dl_entity()
> + *       opt-fwd-period
> + *       if (dl_throttled)
> + *         dl_throttled = 0;
> + *       if (dl_defer_armed)
> + *         dl_defer_armed = 0;
> + *     __enqueue_dl_entity();
> + *     // server queued
> + *
> + * // schedule server
> + * [5] pick_task_dl()
> + *   p = server_pick_task();
> + *   if (!p)
> + *     dl_server_stop()
> + *       dequeue_dl_entity();
> + *       hrtimer_try_to_cancel();
> + *       dl_defer_armed = 0;
> + *       dl_throttled = 0;
> + *       dl_server_active = 0;
> + *       // goto [1]
> + *
> + * // server running
> + * [6] update_curr_dl_se()
> + *   if (dl_runtime_exceeded())
> + *     dl_throttled = 1;
> + *     dequeue_dl_entity();
> + *     start_dl_timer();
> + *     // replenish-timer
> + *
> + * // goto [2]
> + *
> + * [7] dl_server_timer()
> + *   enqueue_dl_entity(REPLENISH)
> + *     replenish_dl_entity()
> + *       fwd-period
> + *       if (dl_throttled)
> + *         dl_throttled = 0;
> + *     __enqueue_dl_entity();
> + *     // goto [5]
> + *
> + * Notes:
> + *
> + *  - When there are fair tasks running, the most likely loop is [2]->[2]:
> + *    the dl_server never actually runs and the timer never fires.
> + *
> + *  - When there is actual fair starvation, the timer fires and starts the
> + *    dl_server. This will then throttle and replenish like a normal DL
> + *    task. Notably it will not 'defer' again.
> + *
> + *  - When fair goes idle, it will not consume dl_server budget so the server
> + *    will start. However, it will find there are no fair tasks to run and
> + *    stop itself.
> + */
>  static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_dl_entity *dl_se)
>  {
>  	struct rq *rq = rq_of_dl_se(dl_se);
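
The flag transitions in the quoted comment can be traced with a small
user-space model. This is only a sketch, not kernel code: the flag names are
taken from the comment, while struct toy_server, the toy_* functions, and the
`queued` and `timer_armed` fields are hypothetical stand-ins for the runqueue
and the hrtimer. It assumes dl_defer is always set and does not model runtime,
deadlines, or idle-time accounting, so it cannot reproduce the idle behaviour
reported above; it only helps to follow steps [1] to [7] as written.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: flags as named in the quoted comment; `queued` and
 * `timer_armed` are hypothetical stand-ins for the rq and the hrtimer. */
struct toy_server {
	bool dl_server_active;
	bool dl_defer_armed;
	bool dl_defer_running;
	bool dl_throttled;
	bool queued;
	bool timer_armed;
};

/* [1] dl_server_start(): a deferred server starts throttled and waits
 * for the zero-laxity timer instead of being queued right away. */
void toy_server_start(struct toy_server *s)
{
	s->dl_server_active = true;
	if (!s->dl_defer_running) {
		s->dl_defer_armed = true;
		s->dl_throttled = true;
	}
	if (s->dl_throttled)
		s->timer_armed = true;
	else
		s->queued = true;
}

/* [2] update_curr_dl_se(): fair tasks used up the server runtime while
 * the server was still waiting; push the period and re-arm the timer. */
void toy_fair_depletes_runtime(struct toy_server *s)
{
	if (!s->dl_throttled)
		return;
	s->dl_defer_running = false;
	s->timer_armed = false;		/* hrtimer_try_to_cancel() */
	s->dl_throttled = true;		/* replenish_dl_new_period() */
	s->dl_defer_armed = true;
	s->timer_armed = true;		/* start_dl_timer() */
}

/* [4] and [7] dl_server_timer(): the timer fired, so the server is
 * replenished and queued. Only an armed defer sets dl_defer_running. */
void toy_timer_fires(struct toy_server *s)
{
	assert(s->timer_armed);
	if (s->dl_defer_armed)
		s->dl_defer_running = true;
	s->dl_throttled = false;
	s->dl_defer_armed = false;
	s->timer_armed = false;
	s->queued = true;
}

/* [5] pick_task_dl(): with nothing to run the server stops itself.
 * Returns true if the server keeps running a fair task. */
bool toy_pick_task(struct toy_server *s, bool has_fair_task)
{
	assert(s->queued);
	if (has_fair_task)
		return true;
	s->queued = false;		/* dl_server_stop() */
	s->timer_armed = false;
	s->dl_defer_armed = false;
	s->dl_throttled = false;
	s->dl_server_active = false;
	return false;
}

/* [6] update_curr_dl_se(): the running server exhausted its runtime;
 * throttle it and wait for the replenish timer. */
void toy_server_runtime_exceeded(struct toy_server *s)
{
	assert(s->queued);
	s->dl_throttled = true;
	s->queued = false;
	s->timer_armed = true;
}
```

Calling toy_server_start() followed by repeated toy_fair_depletes_runtime()
keeps the model in the [2]->[2] loop with the server never queued. Letting
toy_timer_fires() run instead queues the server, after which
toy_pick_task(&s, false) stops it, matching the third note in the comment.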