Message-ID: <20230227205725.dipvh3i7dvyrv4tv@airbuntu>
Date:   Mon, 27 Feb 2023 20:57:25 +0000
From:   Qais Yousef <qyousef@...alina.io>
To:     Dietmar Eggemann <dietmar.eggemann@....com>,
        Juri Lelli <juri.lelli@...hat.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Waiman Long <longman@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>, tj@...nel.org,
        linux-kernel@...r.kernel.org, luca.abeni@...tannapisa.it,
        claudio@...dence.eu.com, tommaso.cucinotta@...tannapisa.it,
        bristot@...hat.com, mathieu.poirier@...aro.org,
        cgroups@...r.kernel.org,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Wei Wang <wvw@...gle.com>, Rick Yiu <rickyiu@...gle.com>,
        Quentin Perret <qperret@...gle.com>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Vasily Gorbik <gor@...ux.ibm.com>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Sudeep Holla <sudeep.holla@....com>,
        Zefan Li <lizefan.x@...edance.com>, linux-s390@...r.kernel.org,
        x86@...nel.org
Subject: Re: [PATCH v3] sched: cpuset: Don't rebuild root domains on
 suspend-resume

On 02/24/23 16:14, Dietmar Eggemann wrote:
> On 23/02/2023 16:38, Qais Yousef wrote:
> 
> IMHO the patch title is misleading since what you want to avoid in
> certain cases is that the RD DL accounting is updated.

The code calls it rebuild_root_domains() ..

> 
> > On 02/06/23 22:14, Qais Yousef wrote:
> >> Commit f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting information")

.. and so is the original patch title.

I think the commit message explains this well enough, and I renamed the
function to be more descriptive too.

> >> enabled rebuilding root domain on cpuset and hotplug operations to
> >> correct deadline accounting.
> >>
> >> Rebuilding root domain is a slow operation and we see 10+ ms delays
> >> on suspend-resume because of that (worst case captures 20ms which
> >> happens often).
> >>
> >> Since nothing is expected to change on suspend-resume operation, skip
> >> rebuilding the root domains to regain some of the time lost.
> >>
> >> Achieve this by refactoring the code to pass whether dl accounting needs
> >> an update to rebuild_sched_domains(). And while at it, rename
> >> rebuild_root_domains() to update_dl_rd_accounting() which I believe is
> >> a more representative name since we are not really rebuilding the root
> >> domains, but rather updating dl accounting at the root domain.
> >>
> >> Some users of rebuild_sched_domains() will skip dl accounting update
> >> now:
> >>
> >> 	* Update sched domains when relaxing the domain level in cpuset
> >> 	  which only impacts searching level in load balance
> 
> This one is cpuset related. (1)
> 
> >> 	* update sched domains when cpufreq governor changes and we need
> >> 	  to create the perf domains
> 
> This one is drivers/base/arch_topology.c [arm/arm64/...] related. (2)
> 
> There are several levels of passing this `update_dl_accounting`
> information through. I guess it looks like this:
> 
> 					update_dl_accounting
> 
> arm/arm64/riscv/parisc specific:
> update_topology_flags_workfn()		true
> rebuild_sched_domains_energy()		false (2)
> 
> cpuset_hotplug_workfn()                 cpus_updated ||
>                          force_rebuild == CPUSET_FORCE_REBUILD_PRS_ERROR
> 
> ->rebuild_sched_domains(update_dl_accounting)
> 
>   update_cpumasks_hier()		true
>   update_relax_domain_level()		false (1)
>   update_flag()				true
>   update_prstate()			true
> 
>   ->rebuild_sched_domains_locked(update_dl_accounting)
> 
>     ->partition_and_rebuild_sched_domains(..., update_dl_accounting)
> 
>         if (update_dl_accounting)
>           update_dl_rd_accounting()
> 
> 
> There is already a somehow hidden interface for `sd/rd rebuild`
> 
>   int __weak arch_update_cpu_topology(void)
> 
> which lets partition_sched_domains_locked() figure out whether sched
> domains have to be rebuilt.
> 
> But in your case it is more on the interface `cpuset/hotplug -> sd/rd
> rebuild` and not only `arch -> sd/rd rebuild`.
> 
> IMHO, it would be still nice to have only one way to tell `sd/rd
> rebuild` what to do and what not to do during sd/rd/(pd) rebuild.

IIUC you're suggesting introducing some new mechanism to detect whether hotplug
has led to a CPU disappearing and using that instead? Are you saying I can use
arch_update_cpu_topology() for that? Something like this?

	diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
	index e5ddc8e11e5d..60c3dcf06f0d 100644
	--- a/kernel/cgroup/cpuset.c
	+++ b/kernel/cgroup/cpuset.c
	@@ -1122,7 +1122,7 @@ partition_and_rebuild_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
	 {
		mutex_lock(&sched_domains_mutex);
		partition_sched_domains_locked(ndoms_new, doms_new, dattr_new);
	-       if (update_dl_accounting)
	+       if (arch_update_cpu_topology())
			update_dl_rd_accounting();
		mutex_unlock(&sched_domains_mutex);
	 }

I am not keen on this. arm64 seems to just read a value without a side effect.
But x86 resets this value when it is read, so we can't read it twice in the
same call tree and I'd have to extract it.
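
To illustrate the read-twice problem, here is a minimal userspace sketch. It is
not kernel code: topology_changed, peek_topology_update(),
consume_topology_update() and rebuild_and_update() are hypothetical names that
only model the two behaviours described above (a plain read versus a read that
resets the flag).

```c
#include <stdbool.h>

/* Hypothetical stand-in for an arch-private "topology changed" flag. */
static bool topology_changed;

/* Models the arm64-like behaviour: plain read, no side effect. */
static bool peek_topology_update(void)
{
	return topology_changed;
}

/* Models the x86-like behaviour: the read also resets the flag. */
static bool consume_topology_update(void)
{
	bool ret = topology_changed;

	topology_changed = false;
	return ret;
}

/* Counters standing in for the two consumers of the flag. */
static int rebuild_count;
static int dl_update_count;

/*
 * Hypothetical caller with two consumers in one call tree. The flag is
 * read exactly once and the cached value is handed to both, because a
 * second consume_topology_update() call would always return false.
 */
static void rebuild_and_update(void)
{
	bool changed = consume_topology_update();

	if (changed)
		rebuild_count++;	/* stands in for the sd/rd rebuild */
	if (changed)
		dl_update_count++;	/* stands in for the dl accounting update */
}
```

With the peek variant both consumers could query the flag independently; with
the consuming variant only the first reader ever sees true, which is why the
value would have to be read once and passed down.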

The better solution that was discussed before is to not iterate through every
task in the system and let cpuset track when dl tasks are added to it and do
smarter iteration. ATM even if there are no dl tasks in the system we'll
blindly go through every task in the hierarchy to update nothing.
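
A rough sketch of that idea, again as a userspace model and not the actual
cpuset code: struct cpuset_model, nr_dl_tasks, cs_attach() and
walk_for_dl_accounting() are all made-up names, and the per-task accounting is
reduced to counting visited tasks.

```c
#include <stddef.h>

/* Hypothetical, simplified task: only what the sketch needs. */
struct task {
	int is_deadline;	/* non-zero for a deadline task */
	struct task *next;
};

/* Hypothetical cpuset model that tracks how many dl tasks it holds. */
struct cpuset_model {
	int nr_dl_tasks;
	struct task *tasks;
};

/* Attach a task and keep the dl task counter in sync. */
static void cs_attach(struct cpuset_model *cs, struct task *t)
{
	t->next = cs->tasks;
	cs->tasks = t;
	if (t->is_deadline)
		cs->nr_dl_tasks++;
}

/*
 * Walk the cpuset's tasks for dl accounting and return how many tasks
 * were visited. If the cpuset holds no dl tasks, skip the walk entirely.
 */
static int walk_for_dl_accounting(const struct cpuset_model *cs)
{
	const struct task *t;
	int visited = 0;

	if (!cs->nr_dl_tasks)
		return 0;

	for (t = cs->tasks; t; t = t->next)
		visited++;	/* real code would add dl bandwidth here */

	return visited;
}
```

A cpuset holding only non-deadline tasks costs a single counter check here
instead of a full walk over its tasks.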

But I'll leave that to Juri to address if he wants. The original change
introduced a regression, and people have noticed it when phones cycle through
suspend-resume (screen unlock). Juri - could you please chip in on how you want
to address this regression? In theory I should be just a reporter, but I'm
trying my best to help ;-)


Cheers

--
Qais Yousef
