Message-ID: <ZL2Z8InSLmI5GU9L@localhost.localdomain>
Date: Sun, 23 Jul 2023 23:21:52 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: Anna-Maria Behnsen <anna-maria@...utronix.de>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
"Gautham R. Shenoy" <gautham.shenoy@....com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
"Rafael J . Wysocki" <rafael@...nel.org>
Subject: Re: Stopping the tick on a fully loaded system

(Adding Rafael in Cc)

On Thu, Jul 20, 2023 at 03:00:37PM +0200, Anna-Maria Behnsen wrote:
> I also had a look at teo. It makes things better but does not solve the
> underlying problem that I see here - please correct me if I missed
> something or if I'm simply wrong:
>
> Yes, the governors have to decide in the end whether it makes sense to
> stop the tick or not. For this decision, the governors require information
> about the current state of the core and how long it will probably have
> nothing to do. At the moment the governors therefore call
> tick_nohz_get_sleep_length(). This first checks whether the tick can be
> stopped. Then it takes into account whether rcu, irq_work, or arch_work need
> the CPU or a timer softirq is pending. If none of this is true, then the
> timers are checked. So tick_nohz_get_sleep_length() isn't based only on
> timers already.

Right, but those things (RCU, irq_work, etc.) act kind of like timers here,
and they should be considered exceptions.

The timer infrastructure shouldn't take idle activity into account;
this is really a job for the cpuidle governors.

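The order of checks described in the quoted paragraph can be pictured with a
small user-space model. This is only a sketch: struct cpu_model,
model_sleep_length() and the 4 ms tick period are hypothetical names and
values made up for illustration, not the kernel's actual code or data
structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tick period: 4 ms, as with HZ=250. */
#define MODEL_TICK_NS 4000000ULL

/* Hypothetical snapshot of the inputs named in the thread. */
struct cpu_model {
	bool can_stop_tick;         /* is stopping the tick allowed at all? */
	bool rcu_needs_cpu;
	bool irq_work_pending;
	bool arch_needs_cpu;
	bool timer_softirq_pending;
	uint64_t next_timer_ns;     /* time until the earliest timer expires */
};

/* Returns the modelled sleep length in nanoseconds. */
static uint64_t model_sleep_length(const struct cpu_model *c)
{
	assert(c != NULL);

	/* 1. If the tick cannot be stopped, the next tick bounds the sleep. */
	if (!c->can_stop_tick)
		return MODEL_TICK_NS;

	/* 2. The "exceptions": each acts like a timer due at the next tick. */
	if (c->rcu_needs_cpu || c->irq_work_pending ||
	    c->arch_needs_cpu || c->timer_softirq_pending)
		return MODEL_TICK_NS;

	/* 3. Only then do the real timers decide. */
	return c->next_timer_ns;
}
```

Note that nothing in this model looks at how busy the CPU is; in the view
above, that judgement stays with the cpuidle governor.
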
> The information about the sleep length from the scheduler's perspective is
> completely missing in the existing check for the probable sleep
> length.
>
> Sure, teo takes scheduler utilization into account directly in the
> governor. But it is not clear to me why the CPU utilization
> check is done after asking for the possible sleep length where timers are
> taken into account. If the CPU is busy anyway, the information generated by
> tick_nohz_next_event() is irrelevant. And when the CPU is not busy, then it
> makes sense to ask for the sleep length also from a timer perspective.
>
> When this CPU utilization check is implemented directly inside the
> governor, every governor has to implement it on its own. So wouldn't it
> make sense to implement a "how utilized is the CPU from a scheduler
> perspective" check in one place and use this as the first check in
> tick_nohz_get_sleep_length()/tick_nohz_next_event()?
>
Well, beyond that, there might be other situations where the governor may
decide not to stop the tick even if tick_nohz_next_event() says it's possible
to do so. That's the purpose of having that next event as an input among many
others for the cpuidle governors.
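
To illustrate "an input among many others", here is a toy governor decision
in user-space C. Everything in it is an assumption made for the sketch:
struct governor_inputs, the 80% utilization threshold, the 4 ms tick and
the idle prediction field do not correspond to any real governor's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GOV_TICK_NS       4000000ULL /* hypothetical 4 ms tick */
#define GOV_UTIL_BUSY_PCT 80U        /* hypothetical "busy" threshold */

/* Hypothetical set of inputs a governor might weigh. */
struct governor_inputs {
	uint64_t sleep_length_ns;   /* what the tick code says is possible */
	uint64_t predicted_idle_ns; /* the governor's own guess from history */
	unsigned int cpu_util_pct;  /* scheduler utilization, 0..100 */
};

/* Returns true if this toy governor would stop the tick. */
static bool model_governor_stops_tick(const struct governor_inputs *in)
{
	assert(in != NULL);

	/* A busy CPU is expected to wake up soon: keep the tick. */
	if (in->cpu_util_pct >= GOV_UTIL_BUSY_PCT)
		return false;

	/* A short predicted idle period is not worth stopping the tick for. */
	if (in->predicted_idle_ns < GOV_TICK_NS)
		return false;

	/* The next event is the last input checked, not the only one. */
	return in->sleep_length_ns > GOV_TICK_NS;
}
```

Even with a far-away next event, the first two checks can veto stopping the
tick, which is the situation discussed next.
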

As such, calling tmigr_cpu_deactivate() at next tick _evaluation_ time instead of
tick _stop_ time is always going to be problematic.

Can we fix that and call tmigr_cpu_deactivate() from tick_nohz_stop_tick()
instead? This will change the locking scenario a bit because
tick_nohz_stop_tick() doesn't hold the base lock. Is that a problem though?
In the worst case, a remote tick happens and handles the earliest timer
for the current CPU while it's between tick_nohz_next_event() and
tick_nohz_stop_tick(), but then the current CPU would just propagate
an earlier deadline than needed. No big deal.

Though I could be overlooking some race or something else making that
not possible, of course...
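
A rough user-space sketch of the split being suggested, with evaluation kept
free of side effects and the deactivation done only when the tick is really
stopped. All names here (struct tick_model, model_next_event(),
model_stop_tick(), model_idle_enter()) are hypothetical, the hier_active
flag merely stands in for what tmigr_cpu_deactivate() would change, and
locking is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU state for the sketch. */
struct tick_model {
	bool tick_stopped;
	bool hier_active;       /* still active in the modelled hierarchy */
	uint64_t next_event_ns; /* earliest event seen at evaluation time */
};

/* Evaluation step: reports the next event and changes nothing. */
static uint64_t model_next_event(const struct tick_model *t)
{
	assert(t != NULL);
	return t->next_event_ns;
}

/* Commit step: the only place where the CPU is marked inactive. */
static void model_stop_tick(struct tick_model *t)
{
	assert(t != NULL);
	t->hier_active = false; /* stands in for tmigr_cpu_deactivate() */
	t->tick_stopped = true;
}

/* Idle entry: governor_wants_stop stands in for the governor's verdict. */
static void model_idle_enter(struct tick_model *t, bool governor_wants_stop)
{
	(void)model_next_event(t); /* evaluation always happens */

	if (governor_wants_stop)
		model_stop_tick(t);    /* deactivation only on a real stop */
}
```

In this model, a governor that decides to keep the tick leaves the CPU
marked active, which is the behaviour the evaluation-time call cannot give.
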
Thanks.