Message-ID: <CAJZ5v0ijNkUQdTGRUHRUQKKeEzCR354CGkf-L2oUsG51bnU5oA@mail.gmail.com>
Date: Mon, 20 Oct 2025 15:18:22 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Christian Loehle <christian.loehle@....com>
Cc: Doug Smythies <dsmythies@...us.net>, "Rafael J. Wysocki" <rafael@...nel.org>,
Sergey Senozhatsky <senozhatsky@...omium.org>, Linux PM <linux-pm@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Artem Bityutskiy <artem.bityutskiy@...ux.intel.com>, Tomasz Figa <tfiga@...omium.org>
Subject: Re: RE: [PATCH v1] cpuidle: governors: menu: Predict longer idle time when in doubt
On Sun, Oct 19, 2025 at 4:45 PM Christian Loehle
<christian.loehle@....com> wrote:
>
> On 10/18/25 16:10, Doug Smythies wrote:
> > Hi all,
> >
> > I have been following and testing these menu.c changes over the last months,
> > but never reported back on this email list because:
> > 1.) I never found anything significant to report.
> > 2.) I always seemed to be a week or more behind the conversations.
>
> Your input is always appreciated!
Indeed.
> >
> > On 2025.10.18 04:47 Rafael wrote:
> >> On Fri, Oct 17, 2025 at 8:37 PM Christian Loehle wrote:
> >>> On 10/17/25 10:39, Rafael J. Wysocki wrote:
> >>>> On Fri, Oct 17, 2025 at 10:22 AM Christian Loehle wrote:
> >>>>> On 10/16/25 17:25, Rafael J. Wysocki wrote:
> >>>>>> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >>>>>>
> >>>>>> It is reported that commit 85975daeaa4d ("cpuidle: menu: Avoid discarding
> >>>>>> useful information") led to a performance regression on Intel Jasper Lake
> >>>>>> systems because it reduced the time spent by CPUs in idle state C7 which
> >>>>>> is correlated to the maximum frequency the CPUs can get to because of an
> >>>>>> average running power limit [1].
> >
> > I would like to understand Sergey's benchmark test better, and even try
> > to repeat the results on my test system. I would also like to try to
> > separate the variables in an attempt to isolate potential contributors.
> >
> > To eliminate the PL1 effect, limit the CPU frequency to 2300 MHz and repeat
> > the test. To eliminate potential CPU frequency scaling contributions, use the
> > performance CPU frequency scaling governor. Both changes at once would
> > be an acceptable first step.
> >
> > Sergey: Would you be willing to do that test?
> > Sergey: Could you provide more details about your test?
>
> +1
> Depending on what the actual test does maybe offlining CPUs and comparing would
> be interesting too (if this means that we never reach throttling on this system).
While it would be kind of interesting to know the test details, I
don't think that this is just one test.

Sergey mentioned several different symptoms in his initial message:

https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/

which kind of indicates several different tests regressing, so this
appears to be a whole-platform issue.
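
For reference, the two changes Doug suggests above (capping the CPU
frequency at 2300 MHz and selecting the performance governor) could be
applied through the standard cpufreq sysfs files roughly as sketched
below. This is only a sketch: the helper name pin_cpufreq() and its
parameters are made up for illustration, it needs root to write to
/sys, and the list of available governors depends on the cpufreq
driver in use.

```python
from pathlib import Path


def pin_cpufreq(max_khz, governor="performance",
                sysfs_root="/sys/devices/system/cpu"):
    """Hypothetical helper: set the governor and a frequency cap on every CPU.

    max_khz is in kHz because that is the unit scaling_max_freq uses,
    so 2300 MHz is passed as 2_300_000.
    """
    changed = []
    # Match cpu0, cpu1, ... but not sibling directories such as cpuidle.
    for policy in sorted(Path(sysfs_root).glob("cpu[0-9]*/cpufreq")):
        (policy / "scaling_governor").write_text(governor + "\n")
        (policy / "scaling_max_freq").write_text(f"{max_khz}\n")
        changed.append(policy.parent.name)
    return changed


if __name__ == "__main__":
    # Assumed values taken from the suggestion above: 2300 MHz cap.
    print(pin_cpufreq(2_300_000))
```

Running the test with and without the cap should help to tell apart
the contributions of PL1 and frequency scaling from the effect of the
idle state selection itself.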
> > From the turbostat data of the other day, it seems that the system was
> > only power throttled for about 25 seconds for each test. What we don't know
> > is how long the test took overall or the magnitude of any contributions from
> > the power limit throttling.
>
> If I didn't mess up it should be >800s, at least from the sum of idle time
> Sergey provided. (excludes active time)
> That makes the powerthrottling story less plausible IMO.
Quite evidently, there is a correlation between the max CPU ("busy")
frequency and the time spent in core C7 on that system.

The only explanation that I can offer is a firmware mechanism turning
spare power into a CPU boost.

RAPL is such a mechanism and it doesn't throttle strictly speaking,
but it prevents the CPU package (in the case of PL1) from using more
energy than it is allowed to use over a given time frame. One way to
achieve that is to allow CPUs to run fast at the beginning of the
measurement window and then throttle them below a certain power level,
but it is not the only way and it is not likely to be used. Moreover,
it is unlikely that the time spent in C7 will affect that because that
time is not known when the measurement window starts.
Another approach is to keep the package power on a "trajectory" to
meet the goal and adjust periodically given what all of the CPUs are
doing. In that case, it will throttle sometimes when the direction of
changes is mispredicted, but overall it will set OPPs with a certain
expectation regarding the trend.
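
To illustrate why time spent in a deep idle state could translate into
a higher "busy" frequency under an average power limit, here is a toy
model. It is not how the RAPL firmware is implemented (that is not
public as far as I know); the function names, the exponentially
weighted average and all of the numbers are assumptions made for the
sake of the example.

```python
def busy_power_budget(pl1_watts, idle_fraction, idle_watts):
    """Average power available while busy, so that the average over the
    whole window equals pl1_watts (plain arithmetic, assumed model)."""
    if not 0.0 <= idle_fraction < 1.0:
        raise ValueError("idle_fraction must be in [0, 1)")
    return (pl1_watts - idle_fraction * idle_watts) / (1.0 - idle_fraction)


def simulate_power_caps(demand_watts, pl1_watts, alpha):
    """Keep a running (EWMA) average of package power at or below pl1_watts.

    demand_watts: power the CPUs would draw in each interval if unconstrained.
    alpha: weight of the newest interval in the running average (0 < alpha <= 1).
    Returns the power granted in each interval and the final running average.
    """
    avg = 0.0
    granted = []
    for demand in demand_watts:
        # Largest power for this interval that keeps the average at the limit.
        cap = (pl1_watts - (1.0 - alpha) * avg) / alpha
        power = min(demand, max(cap, 0.0))
        avg = (1.0 - alpha) * avg + alpha * power
        granted.append(power)
    return granted, avg


if __name__ == "__main__":
    # Made-up numbers: 6 W limit, 0.5 W while the package is idle.
    for idle in (0.0, 0.25, 0.5, 0.75):
        print(idle, busy_power_budget(6.0, idle, 0.5))
    # A long idle period followed by a burst leaves room for a boost.
    print(simulate_power_caps([0.5] * 20 + [20.0] * 5, 6.0, 0.1)[0][-5:])
```

In this model, the more time the package spends in a low-power idle
state, the lower the running average is and the more power is
available when the CPUs become busy again, which is consistent with the
correlation between C7 residency and the max frequency mentioned
above.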
Also, on some platforms high-turbo OPPs are "locked" when deep core
idle states (typically C6 and above) are not utilized, but I'm not
aware of that being done on Jasper Lake.