Message-ID: <20180517151701.GC162290@joelaf.mtv.corp.google.com>
Date: Thu, 17 May 2018 08:17:01 -0700
From: Joel Fernandes <joel@...lfernandes.org>
To: Patrick Bellasi <patrick.bellasi@....com>
Cc: linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
"Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
Viresh Kumar <viresh.kumar@...aro.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Morten Rasmussen <morten.rasmussen@....com>,
Juri Lelli <juri.lelli@...hat.com>,
Joel Fernandes <joelaf@...gle.com>,
Todd Kjos <tkjos@...gle.com>, kernel-team@...roid.com,
Steve Muckle <smuckle@...gle.com>
Subject: Re: [PATCH 3/3] sched/fair: schedutil: explicit update only when required

Hi Patrick,

On Mon, May 14, 2018 at 05:32:06PM +0100, Patrick Bellasi wrote:
> On 12-May 23:25, Joel Fernandes wrote:
> > On Sat, May 12, 2018 at 11:04:43PM -0700, Joel Fernandes wrote:
> > > On Thu, May 10, 2018 at 04:05:53PM +0100, Patrick Bellasi wrote:
> > > > Schedutil updates for FAIR tasks are triggered implicitly each time a
> > > > cfs_rq's utilization is updated via cfs_rq_util_change(), currently
> > > > called by update_cfs_rq_load_avg(), when the utilization of a cfs_rq has
> > > > changed, and {attach,detach}_entity_load_avg().
> > > >
> > > > This design is based on the idea that "we should callback schedutil
> > > > frequently enough" to properly update the CPU frequency at every
> > > > utilization change. However, such an integration strategy has also
> > > > some downsides:
> > >
> > > I agree making the call explicit would make schedutil integration easier so
> > > that's really awesome. However I also fear that if some path in the fair
> > > class in the future changes the utilization but forgets to update schedutil
> > > explicitly (because they forgot to call the explicit public API) then the
> > > schedutil update wouldn't go through. In this case the previous design of
> > > doing the schedutil update in the wrapper kind of was a nice to have
>
> I cannot see right now other possible future paths where we can
> actually change the utilization signal without considering that,
> eventually, we should call an existing API to update schedutil if it
> makes sense.
>
> What I can see more likely instead, also because it already happened a
> couple of time, is that because of code changes in fair.c we end up
> calling (implicitly) schedutil with a wrong utilization value.
>
> To note this kind of broken dependency it has already been more
> difficult than possibly noticing an update of the utilization without
> a corresponding explicit call of the public API.

Ok, we are in agreement that this is a good thing to do :)

> > > > > @@ -5397,9 +5366,27 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > update_cfs_group(se);
> > > > }
> > > >
> > > > - if (!se)
> > > > + /* The task is visible from the root cfs_rq */
> > > > + if (!se) {
> > > > + unsigned int flags = 0;
> > > > +
> > > > add_nr_running(rq, 1);
> > > >
> > > > + if (p->in_iowait)
> > > > + flags |= SCHED_CPUFREQ_IOWAIT;
> > > > +
> > > > + /*
> > > > + * !last_update_time means we've passed through
> > > > + * migrate_task_rq_fair() indicating we migrated.
> > > > + *
> > > > + * IOW we're enqueueing a task on a new CPU.
> > > > + */
> > > > + if (!p->se.avg.last_update_time)
> > > > + flags |= SCHED_CPUFREQ_MIGRATION;
> > > > +
> > > > + cpufreq_update_util(rq, flags);
> > > > + }
> > > > +
> > > > hrtick_update(rq);
> > > > }
> > > >
> > > > @@ -5456,10 +5443,12 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > update_cfs_group(se);
> > > > }
> > > >
> > > > + /* The task is no more visible from the root cfs_rq */
> > > > if (!se)
> > > > sub_nr_running(rq, 1);
> > > >
> > > > util_est_dequeue(&rq->cfs, p, task_sleep);
> > > > + cpufreq_update_util(rq, 0);
> > >
> > > One question about this change. In enqueue, throttle and unthrottle - you are
> > > conditionally calling cpufreq_update_util incase the task was
> > > visible/not-visible in the hierarchy.
> > >
> > > But in dequeue you're unconditionally calling it. Seems a bit inconsistent.
> > > Is this because of util_est or something? Could you add a comment here
> > > explaining why this is so?
> >
> > The big question I have is incase se != NULL, then its still visible at the
> > root RQ level.
>
> My understanding it that you get !se at dequeue time when we are
> dequeuing a task from a throttled RQ. Isn't it?

I don't think so? !se means the RQ is not throttled.

> Thus, this means you are dequeuing a throttled task, I guess for
> example because of a migration.
> However, the point is that a task dequeue from a throttled RQ _is
> already_ not visible from the root RQ, because of the sub_nr_running()
> done by throttle_cfs_rq().

Yes, that's what I was wondering: if it's already not visible, then why
call schedutil? I felt we should call schedutil only if it's visible, like
you were doing for the other paths.
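
To make the question concrete, here is a toy user-space model of the two
dequeue variants. It is only a sketch, not kernel code: struct toy_rq, the
toy_* functions and the reached_root flag are all made up for illustration,
with reached_root standing in for the "!se" condition after the hierarchy
walk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these types or helpers exist in the kernel. */
struct toy_rq {
	unsigned int nr_running;   /* tasks visible from the root */
	unsigned int freq_updates; /* how often the frequency hook was called */
};

/* Stand-in for cpufreq_update_util(): it only counts invocations. */
static void toy_cpufreq_update(struct toy_rq *rq)
{
	rq->freq_updates++;
}

/*
 * Gated variant, mirroring the enqueue side: update the frequency only if
 * the task was visible from the root (reached_root models "!se").
 */
static void toy_dequeue_gated(struct toy_rq *rq, bool reached_root)
{
	if (!reached_root)
		return; /* walk stopped early: task was already not visible */

	assert(rq->nr_running > 0);
	rq->nr_running--;
	toy_cpufreq_update(rq);
}

/* Variant as in the quoted dequeue hunk: the update is unconditional. */
static void toy_dequeue_unconditional(struct toy_rq *rq, bool reached_root)
{
	if (reached_root) {
		assert(rq->nr_running > 0);
		rq->nr_running--;
	}
	toy_cpufreq_update(rq);
}
```

With reached_root == false, the gated variant leaves both counters alone,
while the unconditional one still bumps freq_updates even though nothing
visible from the root has changed in this model.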
>
> > In that case should we still call the util_est_dequeue and the
> > cpufreq_update_util?
>
> I had a better look at the different code paths and I've possibly come
> up with some interesting observations. Lemme try to resume theme here.
>
> First of all, we need to distinguish from estimated utilization
> updates and schedutil updates, since they respond to two very
> different goals.

I agree with your assessments below and about not calling cpufreq when the
CPU is about to idle.
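
Just to spell out how I read the idle case, here is a rough user-space
sketch. It is an assumption-laden toy, not the patch: struct toy_cpu and the
toy_* helpers are hypothetical, and "about to idle" is modeled simply as no
runnable tasks being left after the dequeue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a CPU's runqueue state; not a kernel structure. */
struct toy_cpu {
	unsigned int nr_running;   /* runnable tasks on this CPU */
	unsigned int freq_updates; /* frequency update requests issued */
};

/* Assumption: the CPU is about to idle once nothing is left to run. */
static bool toy_about_to_idle(const struct toy_cpu *cpu)
{
	return cpu->nr_running == 0;
}

/* Dequeue one task and skip the frequency update if the CPU goes idle. */
static void toy_dequeue(struct toy_cpu *cpu)
{
	assert(cpu->nr_running > 0);
	cpu->nr_running--;

	if (toy_about_to_idle(cpu))
		return; /* no point asking for a new frequency right now */

	cpu->freq_updates++;
}
```

Dequeuing from a CPU with two runnable tasks requests one update in this
model; dequeuing the last task requests none.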
thanks!
- Joel