linux-kernel - Re: [PATCH v5] sched: Consolidate cpufreq updates

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKfTPtCHfnDAD-p_ScqHh7cZ=7AXDTw3fE0+ynMHNvCjuG8ogw@mail.gmail.com>
Date: Wed, 5 Jun 2024 14:22:38 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Qais Yousef <qyousef@...alina.io>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Viresh Kumar <viresh.kumar@...aro.org>, 
	Ingo Molnar <mingo@...nel.org>, Peter Zijlstra <peterz@...radead.org>, 
	Juri Lelli <juri.lelli@...hat.com>, Steven Rostedt <rostedt@...dmis.org>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Ben Segall <bsegall@...gle.com>, 
	Mel Gorman <mgorman@...e.de>, Daniel Bristot de Oliveira <bristot@...hat.com>, 
	Valentin Schneider <vschneid@...hat.com>, Christian Loehle <christian.loehle@....com>, 
	Hongyan Xia <hongyan.xia2@....com>, John Stultz <jstultz@...gle.com>, linux-pm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5] sched: Consolidate cpufreq updates

Hi Qais,

On Sun, 2 Jun 2024 at 00:40, Qais Yousef <qyousef@...alina.io> wrote:
>
> On 05/30/24 11:46, Qais Yousef wrote:
>
> > +static __always_inline void
> > +__update_cpufreq_ctx_switch(struct rq *rq, struct task_struct *prev)
> > +{
>
> I found a problem here. We should check if prev was sugov task. I hit a
> corner case where we were constantly switching between RT task and sugov.
>
>         if (prev && prev->dl.flags & SCHED_FLAG_SUGOV) {
>                 /* Sugov just did an update, don't be too aggressive */
>                 return;
>         }
>

I reran my test with this v5 and the fix above but the problem is
still there, it waits for the next tick to update the frequency
whereas the cpu was idle.

Also continuing here the discussion started on v2:

I agree that in the current implementation we are probably calling way
too much cpufreq_update, we can optimize some sequences and using the
context switch is a good way to get a better sampling but this is not
enough and we still need to call cpufreq_update in some other case
involving enqueue. The delay of waiting for the next tick is not
acceptable nor sustainable especially with 250 and lower HZ but I'm
pretty sure it would be the same for some system using 1000HZ. IIUC
new HW is becoming much more efficient at updating the frequency so it
would not be a problem for this new system to update performance more
frequently especially when it ends up being as simple as writing a
value in a memory region without waiting for it to be applied (like
cpufreq fast_switch methods). All this to say that always/only waiting
for context switch or tick might be suitable for your case but it
doesn't look like the right solution for all devices and systems

> > +#ifdef CONFIG_CPU_FREQ
> > +     /*
> > +      * RT and DL should always send a freq update. But we can do some
> > +      * simple checks to avoid it when we know it's not necessary.
> > +      *
> > +      * iowait_boost will always trigger a freq update too.
> > +      *
> > +      * Fair tasks will only trigger an update if the root cfs_rq has
> > +      * decayed.
> > +      *
> > +      * Everything else should do nothing.
> > +      */
> > +     switch (current->policy) {
> > +     case SCHED_NORMAL:
> > +     case SCHED_BATCH:
> > +             if (unlikely(current->in_iowait)) {
> > +                     cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT | SCHED_CPUFREQ_FORCE_UPDATE);
> > +                     return;
> > +             }
> > +
> > +#ifdef CONFIG_SMP
> > +             if (unlikely(rq->cfs.decayed)) {
> > +                     rq->cfs.decayed = false;
> > +                     cpufreq_update_util(rq, 0);
> > +                     return;
> > +             }
> > +#endif
> > +             return;
> > +     case SCHED_FIFO:
> > +     case SCHED_RR:
> > +             if (prev && rt_policy(prev->policy)) {
> > +#ifdef CONFIG_UCLAMP_TASK
> > +                     unsigned long curr_uclamp_min = uclamp_eff_value(current, UCLAMP_MIN);
> > +                     unsigned long prev_uclamp_min = uclamp_eff_value(prev, UCLAMP_MIN);
> > +
> > +                     if (curr_uclamp_min == prev_uclamp_min)
> > +#endif
> > +                             return;
> > +             }
> > +#ifdef CONFIG_SMP
> > +             /* Stopper task masquerades as RT */
> > +             if (unlikely(current->sched_class == &stop_sched_class))
> > +                     return;
> > +#endif
> > +             cpufreq_update_util(rq, SCHED_CPUFREQ_FORCE_UPDATE);
> > +             return;
> > +     case SCHED_DEADLINE:
> > +             if (current->dl.flags & SCHED_FLAG_SUGOV) {
> > +                     /* Ignore sugov kthreads, they're responding to our requests */
> > +                     return;
> > +             }
> > +             cpufreq_update_util(rq, SCHED_CPUFREQ_FORCE_UPDATE);
> > +             return;
> > +     default:
> > +             return;
> > +     }
> > +#endif
> > +}