Message-ID: <20240126004724.tuh7ahpoysm3hw3x@airbuntu>
Date: Fri, 26 Jan 2024 00:47:24 +0000
From: Qais Yousef <qyousef@...alina.io>
To: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Ingo Molnar <mingo@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
linux-kernel@...r.kernel.org,
Pierre Gondois <Pierre.Gondois@....com>
Subject: Re: [PATCH v4 1/2] sched/fair: Check a task has a fitting cpu when
updating misfit
On 01/25/24 10:35, Dietmar Eggemann wrote:
> On 24/01/2024 22:43, Qais Yousef wrote:
> > On 01/23/24 18:07, Dietmar Eggemann wrote:
> >> On 22/01/2024 19:02, Qais Yousef wrote:
> >>> On 01/22/24 09:59, Dietmar Eggemann wrote:
> >>>> On 05/01/2024 23:20, Qais Yousef wrote:
> >>>>> From: Qais Yousef <qais.yousef@....com>
>
> [...]
>
> >>>> What happens when we hotplug out all CPUs of one CPU capacity value?
> >>>> IMHO, we don't call asym_cpu_capacity_scan() with !new_topology
> >>>> (partition_sched_domains_locked()).
> >>>
> >>> Right. I missed that. We can add another intersection check against
> >>> cpu_active_mask.
> >>>
> >>> I am assuming the skipping was done by design, not a bug that needs fixing?
> >>> I see for suspend (cpuhp_tasks_frozen) the domains are rebuilt, but not for
> >>> hotplug.
> >>
> >> IMHO, it's by design. We setup asym_cap_list only when new_topology is
> >> set (update_topology_flags_workfn() from init_cpu_capacity_callback() or
> >> topology_init_cpu_capacity_cppc()). I.e. when the (max) CPU capacity can
> >> change.
> >> In all the other !new_topology cases we check `has_asym |= sd->flags &
> >> SD_ASYM_CPUCAPACITY` and set sched_asym_cpucapacity accordingly in
> >> build_sched_domains(). Before we always reset sched_asym_cpucapacity in
> >> detach_destroy_domains().
> >> But now we would have to keep asym_cap_list in sync with the active CPUs
> >> I guess.
> >
> > Okay, so you suggest we need to update the code to keep it in sync. Let's see
> > first if Vincent is satisfied with this list traversal or we need to go another
> > way :-)
>
> Yes, if preventing the 'increase of balance_interval' will cure this
> issue as well, then this will definitely be the less invasive fix.
>
> Can you not easily do a 'perf bench sched messaging -g X -l Y' test on
> your M1 to get some numbers behind this additional list traversal in
> pick_next_task_fair()?
I can do. But I noticed that sometimes there are unexplained variations in
numbers when moving between kernels, and I am not 100% sure whether they are
due to random unrelated changes in caching behavior or due to something I've
done; i.e. they get better or worse in unexpected ways. I run `perf bench
sched pipe`, which is similar enough, I guess? I don't know how to fill these
-g and -l numbers sensibly.
I had issues when running perf to collect stats. But maybe I wasn't specifying
the right options. I will try again.
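For reference, something like the below is what I would try first; AFAIU -g is
the number of sender/receiver groups and -l the number of loops, but the
concrete values are just guesses I haven't validated on this machine:

	perf bench sched messaging -g 20 -l 1000

I'll see which values give stable run-to-run numbers before reading too much
into the results.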
>
> > I think it is worth having this asym_capacity list available. It seemed several
> > times we needed it and just a little work is required to make it available for
> > potential future users. Even if we don't merge immediately.
>
> I agree. It would give us this ordered (by max CPU capacity) list of
> CPUs to iterate over.
Okay. I need to figure out how to fix this hotplug issue to keep the list in
sync.
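Just to illustrate the direction I have in mind, a rough sketch could look
like the below. The helper name is made up, it assumes asym_cap_list and
cpu_capacity_span() are visible to the caller (which is what this series is
making available), and it ignores locking/RCU details:

/*
 * Rough sketch only: a task is flagged as misfit only if a fitting CPU
 * that is both allowed and active actually exists, so hotplugging out a
 * whole capacity level stops producing stale misfit state.
 */
static inline bool task_has_fitting_active_cpu(struct task_struct *p)
{
	struct asym_cap_data *entry;

	list_for_each_entry(entry, &asym_cap_list, link) {
		const struct cpumask *span = cpu_capacity_span(entry);
		unsigned int cpu;

		/* Skip capacity levels the task has no affinity to. */
		if (!cpumask_intersects(span, p->cpus_ptr))
			continue;

		/* Pick an active CPU at this level; skip if all are offline. */
		cpu = cpumask_first_and(span, cpu_active_mask);
		if (cpu >= nr_cpu_ids)
			continue;

		/*
		 * All CPUs in the span share the same max capacity, so one
		 * CPU is representative of the whole level.
		 */
		if (task_fits_cpu(p, cpu))
			return true;
	}

	return false;
}

The alternative is to rebuild/trim the list itself on hotplug so readers don't
need the cpu_active_mask check; I still need to work out which of the two is
cleaner.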
Thanks
--
Qais Yousef