linux-kernel - Re: [PATCH v7] sched: Consolidate cpufreq updates

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240902125815.vu3s25ciib34eu3a@airbuntu>
Date: Mon, 2 Sep 2024 13:58:15 +0100
From: Qais Yousef <qyousef@...alina.io>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Christian Loehle <christian.loehle@....com>,
	"Rafael J. Wysocki" <rafael@...nel.org>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	Hongyan Xia <hongyan.xia2@....com>,
	John Stultz <jstultz@...gle.com>, linux-pm@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7] sched: Consolidate cpufreq updates

On 09/02/24 14:30, Vincent Guittot wrote:
> On Sun, 1 Sept 2024 at 19:51, Qais Yousef <qyousef@...alina.io> wrote:
> >
> > On 08/13/24 10:27, Vincent Guittot wrote:
> > > On Tue, 13 Aug 2024 at 10:25, Vincent Guittot
> > > <vincent.guittot@...aro.org> wrote:
> > > >
> > > > On Mon, 5 Aug 2024 at 17:35, Christian Loehle <christian.loehle@....com> wrote:
> > > > > Hi Qais,
> > > > > the idea of SCHED_CPUFREQ_FORCE_UPDATE and the possiblity of spamming
> > > > > freq updates still bothered me so let me share my thoughts even though
> > > > > it might be niche enough for us not to care.
> > > > >
> > > > > 1. On fast_switch systems, assuming they are fine with handling the
> > > > > actual updates, we have a bit more work on each context_switch() and
> > > > > some synchronisation, too. That should be fine, if anything there's
> > > > > some performance regression in a couple of niche cases.
> > > > >
> > > > > 2. On !fast_switch systems this gets more interesting IMO. So we have
> > > > > a sugov DEADLINE task wakeup for every (in a freq-diff resulting)
> > > > > update request. This task will preempt whatever and currently will
> > > > > pretty much always be running on the CPU it ran last on (so first CPU
> > > > > of the PD).
> > > >
> > > > The !fast_switch is a bit of concern for me too but not for the same
> > > > reason and maybe the opposite of yours IIUC your proposal below:
> > > >
> > > > With fast_switch we have the following sequence:
> > > >
> > > > sched_switch() to task A
> > > > cpufreq_driver_fast_switch -> write new freq target
> > > > run task A
> > > >
> > > > This is pretty straight forward but we have the following sequence
> > > > with !fast_switch
> > > >
> > > > sched_switch() to task A
> > > > queue_irq_work -> raise an IPI on local CPU
> > > > Handle IPI -> wakeup and queue sugov dl worker on local CPU (always
> > > > with 1 CPU per PD)
> > > > sched_switch() to sugov dl task
> > > > __cpufreq_driver_target() which can possibly block on a lock
> > > > sched_switch() to task A
> > > > run task A
> > > >
> > >
> > > sent a bit too early
> > >
> > > > We can possibly have 2 context switch and one IPi for each "normal"
> > > > context switch which is not really optimal
> > >
> > > It would be good to find a way to skip the spurious back and forth
> > > between the normal task and sugov
> >
> > Hmm I think we use affinity to keep the sugov running on policy->related_cpus.
> > Relaxing this will make it less of a problem, but won't eliminate it.
> 
> yes, but it's not a problem of relaxing affinity here

If we have 1 CPU per PD, then relaxing affinity will allow it to run anywhere.
I am just this will be safe on all platforms of course.

But yeah, I don't think this is a solution anyway but the simplest thing to
make it harder to hit.

> The problem is that the 1st switch to task A will be preempted by
> sugov so the 1st switch is useless. You should call cpufreq_update
> before switching to A so that we skip the useless switch to task A and
> directly switch to sugov 1st then task A

Can we do this safely after we pick task A, but before we do the actual context
switch? One of the reasons I put this too late is because there's a late call
to balance_calbacks() that can impact the state of the rq and important to take
into account based on my previous testing and analysis.

Any reason we need to run the sugov worker as DL instead for example being
a softirq?