[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220510184436.fdgzzcfqqevinx5p@wubuntu>
Date: Tue, 10 May 2022 19:44:36 +0100
From: Qais Yousef <qais.yousef@....com>
To: Lukasz Luba <lukasz.luba@....com>
Cc: Xuewen Yan <xuewen.yan@...soc.com>, dietmar.eggemann@....com,
rafael@...nel.org, viresh.kumar@...aro.org, mingo@...hat.com,
peterz@...radead.org, vincent.guittot@...aro.org,
rostedt@...dmis.org, linux-kernel@...r.kernel.org,
di.shen@...soc.com, Xuewen Yan <xuewen.yan94@...il.com>
Subject: Re: [PATCH] sched: Take thermal pressure into account when determine
rt fits capacity
On 05/10/22 18:44, Lukasz Luba wrote:
[...]
> To properly answer this question we probably have to analyze the timings
> and this update path - how often it is actually called. Keep in mind
> we are going to solve CPU capacity inversion for RT class, which
> contains latency sensitive tasks. In this approach the information
This was an attempt for a generic inversion detection. We update
rq->cpu_capacity which is used by capacity_of() in the same path.
I didn't feel brave to write a quick patch in the topology code, but we can
certainly do the detection there in topology_update_thermal_pressure().
> about HW status is coming from this CFS load balance path.
> What if that load balance is not called that often as RT might require?
> What if there is a light load on CPUs, but GPU caused them to throttle,
> reducing capacity by a decent chunk e.g. 50%?
> That would translate to some RT periodic task which takes 2ms every
> 8ms to take 4ms, while maybe on other less power hungry CPU it could
> take 3ms.
>
> The usage of thermal_load_avg() in the scale_rt_capacity() looks OK
> for the CFS, but might not be from the RT class point of view.
> The RT class might want to realize faster that CPUs have changed the
> capacity.
> Maybe it's OK with that patch [1] and boot config shifter=-5, but in
> default boot config for shifter=0 we can suffer for hundreds of ms
> running on lower capacity cpu (which is quite high number of frames
> nowadays).
>
> Without a research and experiments data I'm afraid this is too
> big step to make, with this CFS load balance path.
I think Xuewen didn't want to use thermal_load_avg(), and that's the question
I deferred.
>
> >
> > >
> > > > +
> > > > + rq->cpu_capacity_inverted = 0;
> > > > +
> > > > + for_each_possible_cpu(cpu) {
> > > > + unsigned long cap = arch_scale_cpu_capacity(cpu);
> > > > +
> > > > + if (capacity_orig <= cap)
> > > > + continue;
>
> The search loop here assumes that other CPUs (fortunately not in the
> same freq domain) don't suffer due to reduced capacity. This might be
> not true - when we have ~1 Watt budget for all CPUs in the system and
> single big core can use 3-4W at max or single mid core ~1.2W.
I defined capacity inversion against capacity_orig. IMHO that's the sensible
definition to make.
Would be good to hear more/other suggestions.
>
> > > > +
> > > > + if (cap > inv_cap) {
> > > > + rq->cpu_capacity_inverted = inv_cap;
> > > > + break;
> > > > + }
> > > > + }
> > > > +
> > > > + }
> > > >
> > > > sdg->sgc->capacity = capacity;
> > > > sdg->sgc->min_capacity = capacity;
> > > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > > > index 8dccb34eb190..bfe84c870bf9 100644
> > > > --- a/kernel/sched/sched.h
> > > > +++ b/kernel/sched/sched.h
> > > > @@ -992,6 +992,7 @@ struct rq {
> > > >
> > > > unsigned long cpu_capacity;
> > > > unsigned long cpu_capacity_orig;
> > > > + unsigned long cpu_capacity_inverted;
> > > >
> > > > struct callback_head *balance_callback;
> > > >
> > > > @@ -2807,6 +2808,11 @@ static inline unsigned long capacity_orig_of(int cpu)
> > > > return cpu_rq(cpu)->cpu_capacity_orig;
> > > > }
> > > >
> > > > +static inline unsigned long cpu_in_capacity_inversion(int cpu)
> > > > +{
> > > > + return cpu_rq(cpu)->cpu_capacity_inverted;
> > > > +}
> > > > +
> > > > /**
> > > > * enum cpu_util_type - CPU utilization type
> > > > * @FREQUENCY_UTIL: Utilization used to select frequency
> > > >
> > > >
> > > > --->8---
> > >
> > > The patch is amazing for me, and the complexity is not too high. Would
> > > you please push the patch?
> > > I think the idea is yours, I don't want to use it as my patch v2.
> >
> > I'd be happy to add a commit message so that you can include it in your v2.
> >
> > First, I'd like to hear from Vincent and Lukasz they're happy with this
> > approach.
> >
> > I've been trying to think how we can do this generically but can't find an
> > alternative to the extra loop or additional fallback_cpu_mask. Maybe the mask
> > is okay if we protect it with sched_asymmetric_cpucapacity static key..
> >
>
> I'm sorry Qais, I see that you are trying to bring this
> real-CPU-capacity information into RT, but the source and quality of
> this information IMO might matter. I cannot help you w/o experiment
> results of your proposed approach.
The question I was posing here is whether to handle thermal only in inversion
case as I was suggesting or do better. We are still trickling through the
details, but first, I wanted to make sure there's no objection to this
direction (detect inversion and bail out in rt_task_fits_capacity() for cpus in
capacity inversion).
Cheers
--
Qais Yousef
Powered by blists - more mailing lists