Message-ID: <CAEXW_YRjNAUXFD8rPQQ53Vu8j8ka=Z-WEZS54JXEroVH7TPVjg@mail.gmail.com>
Date: Wed, 10 May 2023 23:19:27 -0700
From: Joel Fernandes <joel@...lfernandes.org>
To: Vineeth Pillai <vineeth@...byteword.org>
Cc: luca.abeni@...tannapisa.it, Juri Lelli <juri.lelli@...hat.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Steven Rostedt <rostedt@...dmis.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Jonathan Corbet <corbet@....net>, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org
Subject: Re: [PATCH v2 1/2] sched/deadline: Improve reclaim bandwidth accuracy
for GRUB
Hi Vineeth,
Nice work. ;-) A few quick comments below:
On Wed, May 10, 2023 at 6:46 PM Vineeth Pillai <vineeth@...byteword.org> wrote:
>
> Current reclaim calculation for GRUB is a bit inaccurate and the
> inaccuracy gets larger as the bandwidth of tasks becomes smaller.
> I have a test program to show the issue - it runs one or more
> deadline threads and observes the utilization. The following tests
> were run on an isolated CPU (isolcpus=3) in a 4-CPU system, with
> the results shown below:
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.29
>
> RUN 2: runtime=2ms, deadline=period=10ms, RT capacity = 95%
> TID[704]: RECLAIM=1, (r=2ms, d=10ms, p=10ms), Util: 79.96
> TID[704]: RECLAIM=1, (r=2ms, d=10ms, p=10ms), Util: 80.06
> TID[704]: RECLAIM=1, (r=2ms, d=10ms, p=10ms), Util: 80.00
>
> RUN 3: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.70
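These numbers line up with what the current depletion rule predicts at
steady state for a single reclaiming task alone on a CPU, where Uinact = 0
and Uextra = Umax - u. A quick back-of-envelope in Python (just my estimate
of the steady state, not kernel code):

```python
# Steady-state estimate of achieved utilization under the current GRUB
# rule, dq = -max{ u/Umax, (1 - Uinact - Uextra) } dt, for one reclaiming
# task alone on a CPU: Uinact = 0 and Uextra = Umax - u, so the second
# term of the max becomes 1 - (Umax - u).
def grub_util_single(u, umax=0.95):
    depletion = max(u / umax, 1.0 - (umax - u))  # charged per unit wall time
    return u / depletion  # fraction of wall-clock time the task gets to run

for run, u in (("RUN 1", 0.7), ("RUN 2", 0.2), ("RUN 3", 0.01)):
    print(f"{run}: u={u} -> {grub_util_single(u) * 100:.2f}%")
# RUN 1: u=0.7 -> 93.33%
# RUN 2: u=0.2 -> 80.00%
# RUN 3: u=0.01 -> 16.67%
```

That matches the ~93.3%, ~80% and ~16.7% you measured, so the shortfall
really does come from the (1 - Uinact - Uextra) term winning the max.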
>
> When running multiple tasks, the reclaimed bandwidth is divided
> proportionately, but is not reclaimed to the max allowable limit:
>
> RUN 4: 2 SCHED_FLAG_RECLAIM tasks, 1 normal task
> Task 1: runtime=1ms, deadline=period=10ms
> Task 2: runtime=1ms, deadline=period=10ms
> Task 3: runtime=5ms, deadline=period=20ms(normal)
> TID[624]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 20.10
> TID[625]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 20.10
> TID[626]: RECLAIM=0, (r=5ms, d=20ms, p=20ms), Util: 25.07
> TID[624]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 20.06
> TID[625]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 20.13
> TID[626]: RECLAIM=0, (r=5ms, d=20ms, p=20ms), Util: 25.12
> TID[624]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 19.95
> TID[625]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 19.93
> TID[626]: RECLAIM=0, (r=5ms, d=20ms, p=20ms), Util: 25.04
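The RUN 4 numbers also fit the same steady-state estimate once the total
running bandwidth is plugged in: with running_bw = 0.1 + 0.1 + 0.25 = 0.45
and Uextra = Umax - running_bw (my reading of how extra_bw ends up here,
so take it with a grain of salt):

```python
# Per-task utilization estimate under the current rule when several DL
# tasks share a CPU. Assumes Uinact = 0 and Uextra = Umax - running_bw,
# i.e. the extra bandwidth left after admitting all tasks.
def grub_util_shared(u, running_bw, umax=0.95):
    depletion = max(u / umax, 1.0 - (umax - running_bw))
    return u / depletion

running_bw = 0.1 + 0.1 + 0.25  # two reclaiming tasks + one normal task
print(f"reclaim task: {grub_util_shared(0.1, running_bw) * 100:.2f}%")
# reclaim task: 20.00%
```

which is the ~20% per reclaiming task you observe above.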
>
> I have also tested multiple tasks on all cpus, allowing tasks to
> migrate, and I see the same issue there as well. Running 10 tasks on 3
> cpus with 6 SCHED_FLAG_RECLAIM and 4 normal tasks, top shows:
> %Cpu0 : 70.1 us, 0.3 sy, 0.0 ni, 29.3 id, 0.0 wa
> %Cpu1 : 69.1 us, 0.3 sy, 0.0 ni, 30.3 id, 0.3 wa
> %Cpu2 : 70.5 us, 0.3 sy, 0.0 ni, 29.2 id, 0.0 wa
>
> The max{} logic in the existing implementation seems to not fully
> capture the GRUB algorithm.
>
> This patch fixes the issue by appropriately capping the max allowed
> utilization and also slightly adjusting the GRUB algorithm to account
> for a mix of normal deadline and SCHED_FLAG_RECLAIM tasks.
Looked at the patch quickly as I am due for bedtime ;-). I feel like
this does more than should be done in one patch. So you should
consider splitting it IMHO.
> According to the GRUB rule, the runtime is depleted as a factor
> of the active bandwidth of the runqueue: "dq = -dt", where U is the
Saying "dq = -dt", where U is the active bandwidth, does not make sense
because U does not appear in the equation ;-). I suggest rephrasing it
(presumably "dq = -U dt" was intended).
> active bandwidth. Also, we do not allocate the full bandwidth of a
> cpu to deadline task, but only a portion(Umax) to it, so as to avoid
> deadline tasks starving lower class tasks. The equation could be
> re-written as "dq = -(U / Umax) * dt"
Isn't the equation currently implemented in the code:
dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
?
That's what the kernel docs say [1].
So what do you mean by "could be re-written" ?
[1] https://docs.kernel.org/scheduler/sched-deadline.html
>
> Since both normal deadline and SCHED_FLAG_RECLAIM tasks can share
> cpu, we need to consider bandwidth of only SCHED_FLAG_RECLAIM tasks
> in the equation:
> "dq = -(Ureclaim / Umax_reclaim) * dt"
This makes sense, we deplete only as a percent of what's reclaimable
and not the whole damn thing. You did test a mix of reclaim and
non-reclaim tasks so that's good, also since Luca approves that's
great. ;-)
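To convince myself, I redid the arithmetic for your RUN 4 mix under the
proposed rule. I am assuming here that Ureclaim is the summed bandwidth of
the SCHED_FLAG_RECLAIM tasks and Umax_reclaim = Umax minus the bandwidth of
the non-reclaiming DL tasks (my reading of the patch, correct me if wrong):

```python
# Steady-state estimate under the proposed rule
# dq = -(Ureclaim / Umax_reclaim) * dt, for the RUN 4 task mix.
# Assumed definitions (not stated verbatim in the patch text):
#   Ureclaim     = sum of bandwidths of SCHED_FLAG_RECLAIM tasks
#   Umax_reclaim = Umax - bandwidth of non-reclaiming DL tasks
umax = 0.95
u_reclaim_tasks = [0.1, 0.1]   # 1ms / 10ms each
u_normal = 0.25                # 5ms / 20ms, not reclaiming
u_reclaim = sum(u_reclaim_tasks)
umax_reclaim = umax - u_normal
depletion = u_reclaim / umax_reclaim  # budget charged per unit wall time
for u in u_reclaim_tasks:
    print(f"reclaim task: {u / depletion * 100:.2f}%")
total = sum(u / depletion for u in u_reclaim_tasks) + u_normal
print(f"total: {total:.2f}")
# reclaim task: 35.00%
# reclaim task: 35.00%
# total: 0.95
```

So the two reclaiming tasks grow to 35% each, the normal task keeps its
25%, and the CPU tops out at exactly the 95% cap, which is the behavior
we want.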
>
> Following are the results with this patch:
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[616]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 94.98
> TID[616]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.04
> TID[616]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.01
All these look 100% correct to me. Do these tasks start at the same
time or are they shifted in their respective activations? Just wanted
to be sure it behaves the same either way...
[...]
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3e8df6d31c1e..13d85af0f42b 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -257,6 +257,11 @@ static inline bool dl_entity_is_special(const struct sched_dl_entity *dl_se)
> #endif
> }
>
> +static inline bool dl_entity_is_reclaim(const struct sched_dl_entity *dl_se)
> +{
> + return dl_se->flags & SCHED_FLAG_RECLAIM;
> +}
> +
Can this helper addition be split out to a different patch?
> /*
> * Tells if entity @a should preempt entity @b.
> */
> @@ -754,10 +759,20 @@ struct dl_rq {
> u64 extra_bw;
>
> /*
> - * Inverse of the fraction of CPU utilization that can be reclaimed
> - * by the GRUB algorithm.
> + * Maximum available bandwidth for this runqueue. This is used to
> + * calculate reclaimable bandwidth for SCHED_FLAG_RECLAIM tasks.
> + * By restricting maximum usable bandwidth, we aim to give other
> + * tasks on lower classes a chance to run, when competing with
> + * SCHED_FLAG_RECLAIM tasks.
> */
> - u64 bw_ratio;
> + u64 max_bw;
> +
> + /*
> + * Active bandwidth of SCHED_FLAG_RECLAIM tasks on this rq.
> + * This will be a subset of running_bw.
> + */
> + u64 reclaim_bw;
> +
And perhaps the addition and use of these new fields could be split out
as well, if it makes sense. I will take a closer look at your patches
later, or after v2.
- Joel
> };
>
> #ifdef CONFIG_FAIR_GROUP_SCHED
> --
> 2.40.1
>