Message-ID: <CAEXW_YS=PrWDx+YGVR7bmq0_SoKNztzGrreApCd9qk1yBLA5bA@mail.gmail.com>
Date:   Mon, 6 Nov 2023 14:32:02 -0500
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Daniel Bristot de Oliveira <bristot@...nel.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        linux-kernel@...r.kernel.org,
        Luca Abeni <luca.abeni@...tannapisa.it>,
        Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vineeth Pillai <vineeth@...byteword.org>,
        Shuah Khan <skhan@...uxfoundation.org>,
        Phil Auld <pauld@...hat.com>
Subject: Re: [PATCH v5 6/7] sched/deadline: Deferrable dl server

Hi Daniel,

On Sat, Nov 4, 2023 at 6:59 AM Daniel Bristot de Oliveira
<bristot@...nel.org> wrote:
>
> Among the motivations for the DL servers is the real-time throttling
> mechanism. This mechanism works by throttling the rt_rq after
> running for a long period without leaving space for fair tasks.
>
> The base dl server avoids this problem by boosting fair tasks instead
> of throttling the rt_rq. The point is that it boosts without waiting
> for potential starvation, causing some non-intuitive cases.
>
> For example, an IRQ dispatches two tasks on an idle system, a fair
> and an RT. The DL server will be activated, running the fair task
> before the RT one. This problem can be avoided by deferring the
> dl server activation.
>
> By setting the zerolax option, the dl_server will dispatch a
> SCHED_DEADLINE reservation with replenished runtime, but throttled.
>
> The dl_timer will be set for (period - runtime) ns from the start
> time, thus boosting the fair rq at its 0-laxity time with respect
> to the rt_rq.
>
> If the fair scheduler has the opportunity to run while waiting
> for the zerolax time, the dl server runtime will be consumed. If
> the runtime is completely consumed before the zerolax time, the
> server will be replenished while still in a throttled state. Then,
> the dl_timer will be reset to the new zerolax time.
>
> If the fair server reaches the zerolax time without consuming
> its runtime, the server will be boosted, following CBS rules
> (thus without breaking SCHED_DEADLINE).
>
> Signed-off-by: Daniel Bristot de Oliveira <bristot@...nel.org>
> ---
>  include/linux/sched.h   |   2 +
>  kernel/sched/deadline.c | 100 +++++++++++++++++++++++++++++++++++++++-
>  kernel/sched/fair.c     |   3 ++
>  3 files changed, 103 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 5ac1f252e136..56e53e6fd5a0 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -660,6 +660,8 @@ struct sched_dl_entity {
>         unsigned int                    dl_non_contending : 1;
>         unsigned int                    dl_overrun        : 1;
>         unsigned int                    dl_server         : 1;
> +       unsigned int                    dl_zerolax        : 1;
> +       unsigned int                    dl_zerolax_armed  : 1;
>
>         /*
>          * Bandwidth enforcement timer. Each -deadline task has its
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 1d7b96ca9011..69ee1fbd60e4 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -772,6 +772,14 @@ static inline void replenish_dl_new_period(struct sched_dl_entity *dl_se,
>         /* for non-boosted task, pi_of(dl_se) == dl_se */
>         dl_se->deadline = rq_clock(rq) + pi_of(dl_se)->dl_deadline;
>         dl_se->runtime = pi_of(dl_se)->dl_runtime;
> +
> +       /*
> +        * If it is a zerolax reservation, throttle it.
> +        */
> +       if (dl_se->dl_zerolax) {
> +               dl_se->dl_throttled = 1;
> +               dl_se->dl_zerolax_armed = 1;
> +       }
>  }
>
>  /*
> @@ -828,6 +836,7 @@ static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
>   * could happen are, typically, an entity voluntarily trying to overcome its
>   * runtime, or it just underestimated it during sched_setattr().
>   */
> +static int start_dl_timer(struct sched_dl_entity *dl_se);
>  static void replenish_dl_entity(struct sched_dl_entity *dl_se)
>  {
>         struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> @@ -874,6 +883,28 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se)
>                 dl_se->dl_yielded = 0;
>         if (dl_se->dl_throttled)
>                 dl_se->dl_throttled = 0;
> +
> +       /*
> +        * If this is the replenishment of a zerolax reservation,
> +        * clear the flag and return.
> +        */
> +       if (dl_se->dl_zerolax_armed) {
> +               dl_se->dl_zerolax_armed = 0;
> +               return;
> +       }
> +
> +       /*
> +        * At this point, if the zerolax server is not armed and the deadline
> +        * is in the future, throttle the server and arm the zerolax timer.
> +        */
> +       if (dl_se->dl_zerolax &&
> +           dl_time_before(dl_se->deadline - dl_se->runtime, rq_clock(rq))) {
> +               if (!is_dl_boosted(dl_se)) {
> +                       dl_se->dl_zerolax_armed = 1;
> +                       dl_se->dl_throttled = 1;
> +                       start_dl_timer(dl_se);
> +               }
> +       }
>  }
>
>  /*
> @@ -1024,6 +1055,13 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
>                 }
>
>                 replenish_dl_new_period(dl_se, rq);
> +       } else if (dl_server(dl_se) && dl_se->dl_zerolax) {
> +               /*
> +                * The server can still use its previous deadline, so throttle
> +                * and arm the zero-laxity timer.
> +                */
> +               dl_se->dl_zerolax_armed = 1;
> +               dl_se->dl_throttled = 1;
>         }
>  }
>
> @@ -1056,8 +1094,20 @@ static int start_dl_timer(struct sched_dl_entity *dl_se)
>          * We want the timer to fire at the deadline, but considering
>          * that it is actually coming from rq->clock and not from
>          * hrtimer's time base reading.
> +        *
> +        * The zerolax reservation will have its timer set to the
> +        * deadline - runtime. At that point, the CBS rule will decide
> +        * if the current deadline can be used, or if a replenishment
> +        * is required to avoid adding too much pressure on the system
> +        * (current u > U).
>          */
> -       act = ns_to_ktime(dl_next_period(dl_se));
> +       if (dl_se->dl_zerolax_armed) {
> +               WARN_ON_ONCE(!dl_se->dl_throttled);
> +               act = ns_to_ktime(dl_se->deadline - dl_se->runtime);

Just a question: here, if dl_se->deadline - dl_se->runtime is large,
does that mean that the server activation will be much further into
the future? So say I want to give CFS 30%; then it will take 70% of
the period before CFS preempts RT, thus "starving" CFS for this
duration. I think that's OK for smaller periods and runtimes, though.

I think it does reserve the required amount of CFS bandwidth, so it
is probably OK, though it is perhaps letting RT run more initially
(say, if CFS tasks are not CPU-bound and only occasionally wake up,
they will always be hit by the 70% latency AFAICS, which may be
large for long periods and small runtimes).

We are currently trying these patches on ChromeOS as well.

Just started going over it to understand the patch. Looking nice so
far and thanks,

 - Joel
