Message-ID: <CAJZ5v0j727j13rDRhMk5HHR8WqhWqo5ySdyjSXctZW-Rizur3A@mail.gmail.com>
Date: Tue, 23 Sep 2025 19:25:27 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Linux PM <linux-pm@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>, Peter Zijlstra <peterz@...radead.org>,
Christian Loehle <christian.loehle@....com>
Subject: Re: [PATCH v1 3/3] cpuidle: governors: menu: Special-case nohz_full CPUs
On Thu, Sep 18, 2025 at 5:07 PM Frederic Weisbecker <frederic@...nel.org> wrote:
>
> On Thu, Sep 11, 2025 at 07:07:42PM +0200, Rafael J. Wysocki wrote:
> > On Thu, Sep 11, 2025 at 4:17 PM Frederic Weisbecker <frederic@...nel.org> wrote:
> > > So, when !tick_nohz_full_cpu(dev->cpu), what is the purpose of this tick-stopped
> > > special case?
> > >
> > > Is it because the next dynamic tick is a better prediction than the typical
> > > interval once the tick is stopped?
> >
> > When !tick_nohz_full_cpu(dev->cpu), the tick is a safety net against
> > getting stuck in a shallow idle state for too long. In that case, if
> > the tick is stopped, the safety net is not there and it is better to
> > use a deep state.
>
> Right.
>
> > However, data->next_timer_ns is a lower limit for the idle state
> > target residency because this is when the next timer is going to
> > trigger.
>
> Ok.
>
> >
> > > Does that mean we might become more "pessimistic" concerning the predicted idle
> > > time for nohz_full CPUs?
> >
> > Yes, and it's not just that we might: we do, unless the idle periods in the
> > workload are "long".
>
> Ok.
>
> >
> > > I guess C-states that are too shallow are still better than ones that are too
> > > deep, but there should be a word about that side effect (if any).
> >
> > Yeah, I agree.
> >
> > That said, on a nohz_full CPU there is no safety net against getting
> > stuck in a shallow idle state because the tick is not present. That's
> > why currently the governors don't allow shallow states to be used on
> > nohz_full CPUs.
> >
> > The lack of a safety net is generally not a problem when the CPU has
> > been isolated to run something doing real work all the time, with
> > possible idle periods in the workload, but there are people who
> > isolate CPUs for energy-saving reasons and don't run anything on them
> > on purpose. For those folks, the current behavior to select deep idle
> > states every time is actually desirable.
>
> So far I haven't heard from anybody using nohz_full for power savings. If
> you have, I'd be curious about it.
There is a project called LPMD that does this:
https://github.com/intel/intel-lpmd
> Whether a task runs tickless or not, it
> still runs and the CPU isn't sleeping. Also, CPU 0 stays periodic on nohz_full,
> which alone is a problem for power saving, but it also prevents a whole package
> from entering low-power mode on NUMA.
That's not a problem for the above because it uses "isolation" for
taking some specific CPUs out of use (CPU0 is never one of them
AFAICS).
Also, it does depend on idle governors always putting those CPUs into
deep idle states.
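The selection rule discussed above (tick as a safety net, next timer as the bound once the tick is stopped) can be sketched roughly as below. This is a simplified, hypothetical model, not the code in drivers/cpuidle/governors/menu.c: `struct idle_state`, `deepest_state_within()`, `select_state()` and the residency numbers in `example_states` are made up for illustration. It shows why an idle nohz_full CPU, whose tick is always stopped, ends up in a deep state whenever its next timer is far away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified idle state (not the kernel's struct cpuidle_state).
 * States are ordered from shallowest to deepest. */
struct idle_state {
	const char *name;
	uint64_t target_residency_ns;
};

/* Illustrative table; the residency values are invented for this sketch. */
static const struct idle_state example_states[] = {
	{ "POLL", 0 },
	{ "C1", 2000 },      /* 2 us */
	{ "C6", 600000 },    /* 600 us */
};

/* Return the deepest state whose target residency fits in duration_ns.
 * State 0 is assumed to always be usable. */
static int deepest_state_within(const struct idle_state *states, int count,
				uint64_t duration_ns)
{
	int idx = 0;

	assert(count > 0);
	for (int i = 1; i < count; i++) {
		if (states[i].target_residency_ns > duration_ns)
			break;
		idx = i;
	}
	return idx;
}

/* While the tick runs, it will wake the CPU anyway, so the (possibly short)
 * predicted idle duration is trusted. Once the tick is stopped, which is
 * always the case on an idle nohz_full CPU, nothing but the next timer will
 * wake the CPU, so the time until that timer is used instead. */
static int select_state(const struct idle_state *states, int count,
			bool tick_stopped, uint64_t predicted_ns,
			uint64_t next_timer_ns)
{
	uint64_t duration_ns = tick_stopped ? next_timer_ns : predicted_ns;

	/* The idle period cannot outlast the next timer event. */
	if (duration_ns > next_timer_ns)
		duration_ns = next_timer_ns;
	return deepest_state_within(states, count, duration_ns);
}
```

With a 1 us prediction and a timer 10 ms away, this picks POLL while the tick is running but C6 once the tick is stopped.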
> Let's say it's not optimized toward power saving...
Oh well ...
> > So there are two use cases that cannot be addressed at once and I'm
> > thinking about adding a control knob to allow the user to decide which
> > way to go.
>
> I'm tempted to say we should focus on avoiding states that are too deep,
> at the expense of sometimes picking ones that are too shallow. Of course I'm not entirely
> comfortable with the idea because nohz_full CPUs may be idle for a while
> on some workloads. And everyone deserves a rest at some point after
> a long day.
Right.
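The control knob mentioned earlier in the thread does not exist; the following hypothetical sketch only illustrates how such a setting could choose between the two behaviors for a nohz_full CPU whose tick is stopped. The names `nohz_full_idle_policy` and `idle_duration_bound()` are invented. One option keeps bounding the idle duration by the next timer (deep states, which LPMD relies on); the other trusts the governor's prediction (possibly too-shallow states, with no tick to get the CPU out of them).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical knob; no such setting exists in the kernel. */
enum nohz_full_idle_policy {
	NOHZ_FULL_IDLE_DEEP,      /* current behavior: rely on the next timer */
	NOHZ_FULL_IDLE_PREDICTED, /* trust the predicted idle duration */
};

/* Return the idle duration that state selection should be based on. */
static uint64_t idle_duration_bound(enum nohz_full_idle_policy policy,
				    bool nohz_full, bool tick_stopped,
				    uint64_t predicted_ns, uint64_t next_timer_ns)
{
	assert(policy == NOHZ_FULL_IDLE_DEEP ||
	       policy == NOHZ_FULL_IDLE_PREDICTED);

	/* Tick stopped and the prediction is not trusted: without a safety
	 * net, only the next timer event bounds the idle period. */
	if (tick_stopped && !(nohz_full && policy == NOHZ_FULL_IDLE_PREDICTED))
		return next_timer_ns;

	/* Otherwise use the prediction, still capped by the next timer. Note
	 * that on a nohz_full CPU a too-short prediction may leave the CPU in
	 * a shallow state until the next timer fires. */
	return predicted_ns < next_timer_ns ? predicted_ns : next_timer_ns;
}
```

The trade-off is the one discussed above: the "deep" setting serves CPUs isolated to save energy, while the "predicted" setting avoids overly deep states (and their exit latency) for workloads with short idle periods.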
> I guess forcibly restarting the tick upon idle entry would probably be
> bad for tiny idle round-trips?
It wouldn't exactly be cheap in terms of latency, I think.
> As for such a knob, I'm not sure anybody would use it.
Fair enough.