[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bd7f537a-87a5-43e6-bde1-acb82dd4ec1e@arm.com>
Date: Wed, 27 Mar 2024 13:35:01 +0000
From: Hongyan Xia <hongyan.xia2@....com>
To: Qais Yousef <qyousef@...alina.io>, "Rafael J. Wysocki"
<rafael@...nel.org>, Viresh Kumar <viresh.kumar@...aro.org>,
Ingo Molnar <mingo@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>
Cc: Steven Rostedt <rostedt@...dmis.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Ben Segall
<bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Christian Loehle <christian.loehle@....com>, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched: Consolidate cpufreq updates
On 24/03/2024 02:01, Qais Yousef wrote:
> Improve the interaction with cpufreq governors by making the
> cpufreq_update_util() calls more intentional.
>
> At the moment we send them when load is updated for CFS, bandwidth for
> DL and at enqueue/dequeue for RT. But this can lead to too many updates
> sent in a short period of time and potentially be ignored at a critical
> moment due to the rate_limit_us in schedutil.
>
> For example, simultaneous task enqueue on the CPU where 2nd task is
> bigger and requires higher freq. The trigger to cpufreq_update_util() by
> the first task will lead to dropping the 2nd request until tick. Or
> another CPU in the same policy triggers a freq update shortly after.
>
> Updates at enqueue for RT are not strictly required. Though they do help
> to reduce the delay for switching the frequency and the potential
> observation of lower frequency during this delay. But current logic
> doesn't intentionally (at least to my understanding) try to speed up the
> request.
>
> To help reduce the amount of cpufreq updates and make them more
> purposeful, consolidate them into these locations:
>
> 1. context_switch()
> 2. task_tick_fair()
> 3. {attach, detach}_entity_load_avg()
> 4. update_blocked_averages()
>
> The update at context switch should help guarantee that DL and RT get
> the right frequency straightaway when they're RUNNING. As mentioned
> though the update will happen slightly after enqueue_task(); though in
> an ideal world these tasks should be RUNNING ASAP and this additional
> delay should be negligible. For fair tasks we need to make sure we send
> a single update for every decay for the root cfs_rq. Any changes to the
> rq will be deferred until the next task is ready to run, or we hit TICK.
> But we are guaranteed the task is running at a level that meets its
> requirements after enqueue.
>
> To guarantee RT and DL tasks updates are never missed, we add a new
> SCHED_CPUFREQ_FORCE_UPDATE to ignore the rate_limit_us. If we are
> already running at the right freq, the governor will end up doing
> nothing, but we eliminate the risk of the task ending up accidentally
> running at the wrong freq due to rate_limit_us.
There may be two things in this patch:
1. Have well-defined, centralized places where we update CPU frequency.
2. The FORCE_UPDATE flag.
I agree that at the moment, frequency updates inside the scheduler are
scattered around in many places, and they can be called consecutively in
a short period of time. Defining those places explicitly instead of
triggering frequency updates here and there sounds like a good idea, so
I definitely support 1.
Not sure about 2. I think rate limit is there for a reason, although I
don't have that many platforms to test on to know whether forcing the
update is a problem.
>
> [...]
Powered by blists - more mailing lists