Message-ID: <20180601174526.GA105687@joelaf.mtv.corp.google.com>
Date: Fri, 1 Jun 2018 10:45:26 -0700
From: Joel Fernandes <joelaf@...gle.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Juri Lelli <juri.lelli@...hat.com>,
Patrick Bellasi <patrick.bellasi@....com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Morten Rasmussen <Morten.Rasmussen@....com>,
viresh kumar <viresh.kumar@...aro.org>,
Valentin Schneider <valentin.schneider@....com>,
Quentin Perret <quentin.perret@....com>,
Luca Abeni <luca.abeni@...tannapisa.it>,
Claudio Scordino <claudio@...dence.eu.com>,
Joel Fernandes <joel@...lfernandes.org>,
kernel-team@...roid.com,
Alessio Balsini <alessio.balsini@...tannapisa.it>
Subject: Re: [PATCH v5 05/10] cpufreq/schedutil: get max utilization
On Fri, Jun 01, 2018 at 03:53:07PM +0200, Vincent Guittot wrote:
> > >> >> The example with a RT task described in the cover letter can be
> > >> >> run with a DL task and will give similar results.
> > >
> > > In the cover letter you say:
> > >
> > > An rt-app use case which creates an always running cfs thread and an
> > > rt thread that wakes up periodically, with both threads pinned on the
> > > same CPU, shows a lot of frequency switches of the CPU whereas the CPU
> > > never goes idle during the test.
> > >
> > > I would say that's a quite specific corner case where your always
> > > running CFS task has never accumulated a util_est sample.
> > >
> > > Do we really have these cases in real systems?
> >
> > My example is deliberately an extreme one because it makes the problem
> > easier to highlight.
> >
> > >
> > > Otherwise, it seems to me that we are trying to solve quite specific
> > > corner cases by adding a non-negligible level of "complexity".
> >
> > By complexity, do you mean:
> >
> > Taking into account the number of running cfs tasks to choose between
> > rq->dl.running_bw and avg_dl.util_avg?
> >
> > I'm preparing a patchset that will provide the cfs waiting time in
> > addition to dl/rt util_avg for almost no additional cost. I will try
> > to send the proposal later today.
>
>
> The code below adds tracking of the waiting level of cfs tasks caused by
> rt/dl preemption. This waiting time can then be used when selecting an OPP
> instead of the dl util_avg, which can become higher than the dl bandwidth
> with a "long" runtime.
>
> We need only one new call, for the 1st cfs task that is enqueued, to get
> these additional metrics (a sketch of the assumed call site follows the
> diff). The call to arch_scale_cpu_capacity() can be removed once the latter
> is taken into account when computing the load (which currently scales only
> with frequency).
>
> For rt tasks, we must keep taking util_avg into account to have an idea of
> the rt level on the cpu, which for dl is given by the bandwidth.
>
> ---
> kernel/sched/fair.c | 27 +++++++++++++++++++++++++++
> kernel/sched/pelt.c | 8 ++++++--
> kernel/sched/sched.h | 4 +++-
> 3 files changed, 36 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index eac1f9a..1682ea7 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5148,6 +5148,30 @@ static inline void hrtick_update(struct rq *rq)
> }
> #endif
>
> +static inline void update_cfs_wait_util_avg(struct rq *rq)
> +{
> + /*
> + * If cfs is already enqueued, we don't have anything to do because
> + * we already updated the non waiting time
> + */
> + if (rq->cfs.h_nr_running)
> + return;
> +
> + /*
> +	 * If rt is running, we update the non waiting time before increasing
> +	 * cfs.h_nr_running
> + */
> + if (rq->curr->sched_class == &rt_sched_class)
> + update_rt_rq_load_avg(rq_clock_task(rq), rq, 1);
> +
> + /*
> +	 * If dl is running, we update the non waiting time before increasing
> +	 * cfs.h_nr_running
> + */
> + if (rq->curr->sched_class == &dl_sched_class)
> + update_dl_rq_load_avg(rq_clock_task(rq), rq, 1);
> +}
> +
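For reference, my reading of the "only one new call" mentioned above is
something like the snippet below at the top of enqueue_task_fair(). That hunk
is not part of the quoted diff, so the placement is an assumption on my part:

static void
enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
{
	/*
	 * Hypothetical placement: call the helper while cfs.h_nr_running
	 * is still 0, so its early return does not trigger and the rt/dl
	 * signals are synced right before the first cfs task becomes
	 * runnable; from here on, preempted time counts as cfs waiting.
	 */
	update_cfs_wait_util_avg(rq);

	/* ... rest of the existing enqueue path ... */
}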
Please correct me if I'm wrong, but the CFS preemption decay happens in
set_next_entity -> update_load_avg when the CFS task is scheduled again after
the preemption. Could we not fix this issue by doing our UTIL_EST magic from
set_next_entity? But yeah, we probably need to be careful with the overhead...
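To make that a bit more concrete, here is a very rough sketch of what I mean.
The hook point (set_next_entity(), before the utilization is decayed for the
preempted time) and the helper name are hypothetical; only cfs_rq->avg.util_avg
and cfs_rq->avg.util_est.enqueued are existing fields, and a real version would
also need to keep the per-task util_est consistent:

/*
 * Rough sketch: remember the pre-decay cfs utilization in util_est so
 * that schedutil's max(util_avg, util_est) does not drop the OPP just
 * because the always-running cfs task was preempted by rt/dl.
 */
static inline void util_est_snapshot_preempted(struct cfs_rq *cfs_rq)
{
	unsigned long util = READ_ONCE(cfs_rq->avg.util_avg);
	unsigned int enqueued = READ_ONCE(cfs_rq->avg.util_est.enqueued);

	if (util > enqueued)
		WRITE_ONCE(cfs_rq->avg.util_est.enqueued, (unsigned int)util);
}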
IMO it feels like overkill to account a dl_avg when we already have DL's
running bandwidth we can use. I understand it may be too instantaneous, but
perhaps we can fix CFS's problems within CFS itself and not have to do this
kind of extra external accounting?
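In other words, something along these lines for the DL contribution to
frequency selection (just a sketch: running_bw and BW_SHIFT are what the DL
class already maintains via to_ratio(); whether schedutil should consume them
like this is exactly the question):

/*
 * Sketch: scale the reserved DL bandwidth to a capacity-like value.
 * running_bw is fixed point with BW_SHIFT fractional bits, so this is
 * only a change of units; the result can then be summed with the cfs
 * utilization instead of a dl util_avg signal.
 */
static inline unsigned long dl_bw_to_capacity(struct rq *rq)
{
	return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
}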
I also feel it's better if we don't have to call update_{rt,dl}_rq_load_avg
from within the CFS class as is done above.
thanks,
- Joel