Message-ID: <CAJZ5v0jCT5exCOz1gmHN+gXaamn-W0Yg0g8KN77vB5tUmsGFOw@mail.gmail.com>
Date: Thu, 5 Feb 2026 20:27:29 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Sumit Gupta <sumitg@...dia.com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Mario Limonciello <mario.limonciello@....com>,
Russell Haley <yumpusamongus@...il.com>, "zhenglifeng (A)" <zhenglifeng1@...wei.com>,
pierre.gondois@....com, viresh.kumar@...aro.org, ionela.voinescu@....com,
corbet@....net, rdunlap@...radead.org, ray.huang@....com,
gautham.shenoy@....com, perry.yuan@....com, zhanjie9@...ilicon.com,
linux-pm@...r.kernel.org, linux-acpi@...r.kernel.org,
linux-doc@...r.kernel.org, acpica-devel@...ts.linux.dev,
linux-kernel@...r.kernel.org, linux-tegra@...r.kernel.org, treding@...dia.com,
jonathanh@...dia.com, vsethi@...dia.com, ksitaraman@...dia.com,
sanjayc@...dia.com, nhartman@...dia.com, bbasu@...dia.com
Subject: Re: [PATCH v7 4/7] ACPI: CPPC: add APIs and sysfs interface for min/max_perf
On Thu, Feb 5, 2026 at 8:21 PM Sumit Gupta <sumitg@...dia.com> wrote:
>
> >>>>>>>>>>> Hi Sumit,
> >>>>>>>>>>>
> >>>>>>>>>>> I am thinking that maybe it is better to call these two sysfs
> >>>>>>>>>>> interfaces 'min_freq' and 'max_freq', as users read and write kHz
> >>>>>>>>>>> instead of raw values.
> >>>>>>>>>> Thanks for the suggestion.
> >>>>>>>>>> Kept min_perf/max_perf to match the CPPC register names
> >>>>>>>>>> (MIN_PERF/MAX_PERF), making it clear to users familiar with
> >>>>>>>>>> CPPC what's being controlled.
> >>>>>>>>>> The kHz unit is documented in the ABI.
> >>>>>>>>>>
> >>>>>>>>>> Thank you,
> >>>>>>>>>> Sumit Gupta
> >>>>>>>>> On my x86 machine with kernel 6.18.5, the kernel is exposing raw values:
> >>>>>>>>>
> >>>>>>>>>> grep . /sys/devices/system/cpu/cpu0/acpi_cppc/*
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/feedback_ctrs:ref:342904018856568 del:437439724183386
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/guaranteed_perf:63
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf:88
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq:0
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf:36
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf:1
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq:3900
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf:62
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf:62
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/wraparound_time:18446744073709551615
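The feedback_ctrs line above packs two counters into one attribute: a reference counter and a delivered counter. The ACPI CPPC definition of delivered performance over an interval is reference_perf multiplied by the ratio of the delivered-counter delta to the reference-counter delta, so the result again comes out in abstract CPPC units, not kHz. A minimal userspace sketch, assuming the "ref:<n> del:<n>" format shown above; the function names are hypothetical and counter wraparound (see wraparound_time) is ignored:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Parse one feedback_ctrs value of the form "ref:<u64> del:<u64>".
 * Returns 0 on success, -1 if the line does not match. */
int parse_fb_ctrs(const char *line, uint64_t *ref, uint64_t *del)
{
	assert(line && ref && del);
	return sscanf(line, "ref:%" SCNu64 " del:%" SCNu64, ref, del) == 2 ? 0 : -1;
}

/*
 * Average delivered performance between two samples, in the same abstract
 * CPPC units as reference_perf: reference_perf * delta_del / delta_ref.
 * Wraparound is ignored, and the multiplication can overflow for very
 * large deltas; a real implementation would have to guard against both.
 */
uint64_t delivered_perf(uint64_t reference_perf, uint64_t ref0, uint64_t del0,
			uint64_t ref1, uint64_t del1)
{
	uint64_t dref = ref1 - ref0;
	uint64_t ddel = del1 - del0;

	return dref ? reference_perf * ddel / dref : 0;
}
```

With reference_perf = 62 from the listing and made-up deltas of 1000 reference ticks and 1500 delivered ticks, delivered_perf() returns 93, a CPPC performance level rather than a frequency.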
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> It would be surprising for a nearby sysfs interface with very
> >>>>>>>>> similar names to use kHz instead.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> Russell Haley
> >>>>>>>> I can rename them to either of the following:
> >>>>>>>> - min/max_freq: might be confused with scaling_min/max_freq.
> >>>>>>>> - min/max_perf_freq: keeps the CPPC register association clear.
> >>>>>>>>
> >>>>>>>> Rafael, any preferences here?
> >>>>>>> On x86 the units in CPPC are not kHz and there is no easy, reliable
> >>>>>>> way to convert them to kHz.
> >>>>>>>
> >>>>>>> Everything under /sys/devices/system/cpu/cpu0/acpi_cppc/ needs to be
> >>>>>>> in CPPC units, not kHz (unless, of course, kHz are CPPC units).
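To make the unit problem concrete, below is a minimal userspace sketch of a two-point linear mapping from CPPC performance levels to kHz. It is not the kernel's cppc_perf_to_khz(), whose exact logic may differ; the struct and function names are hypothetical. It assumes firmware reports both lowest_freq and nominal_freq in MHz (as in the acpi_cppc listing above, where nominal_freq is 3900). With the quoted x86 values, lowest_freq is 0, so the line cannot be anchored and the sketch refuses to convert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the CPPC capabilities exposed under acpi_cppc/. */
struct cppc_caps_sketch {
	uint64_t lowest_perf;
	uint64_t nominal_perf;
	uint64_t lowest_freq;  /* MHz; 0 when firmware does not report it */
	uint64_t nominal_freq; /* MHz; 0 when firmware does not report it */
};

/*
 * Map an abstract CPPC performance level to kHz along the straight line
 * through (lowest_perf, lowest_freq) and (nominal_perf, nominal_freq).
 * Returns false when the anchor points are missing or unusable, i.e. when
 * there is no reliable way to do the conversion.
 */
bool sketch_perf_to_khz(const struct cppc_caps_sketch *c, uint64_t perf,
			uint64_t *khz)
{
	assert(c && khz);

	if (!c->lowest_freq || !c->nominal_freq ||
	    c->nominal_perf <= c->lowest_perf ||
	    c->nominal_freq < c->lowest_freq || perf < c->lowest_perf)
		return false;

	uint64_t span_khz = (c->nominal_freq - c->lowest_freq) * 1000;
	uint64_t span_perf = c->nominal_perf - c->lowest_perf;

	/* Integer division truncates, so results are rounded down. */
	*khz = c->lowest_freq * 1000 + (perf - c->lowest_perf) * span_khz / span_perf;
	return true;
}
```

With made-up anchors of lowest_perf = 1 at 400 MHz and nominal_perf = 62 at 3900 MHz, perf 62 maps to 3900000 kHz and highest_perf 88 maps to 5391803 kHz, a figure that only holds if performance really is linear in frequency above nominal, which nothing in CPPC guarantees.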
> >>>>>
> >>>>> In v1 [1], these controls were added under acpi_cppc sysfs.
> >>>>> After discussion, they were moved under cpufreq, and [2] was merged
> >>>>> first. The decision to use the frequency scale instead of raw perf
> >>>>> values was made for consistency with other cpufreq interfaces, as
> >>>>> discussed in v3 [3].
> >>>>>
> >>>>> CPPC units in our case are also not kHz. The kHz conversion uses the
> >>>>> existing cppc_perf_to_khz()/cppc_khz_to_perf() helpers, which are
> >>>>> already used by cppc_cpufreq attributes, so the conversion behavior
> >>>>> is consistent with existing cpufreq interfaces.
> >>>>>
> >>>>> [1] https://lore.kernel.org/lkml/076c199c-a081-4a7f-956c-f395f4d5e156@nvidia.com/
> >>>>> [2] https://lore.kernel.org/all/20250507031941.2812701-1-zhenglifeng1@huawei.com/
> >>>>> [3] https://lore.kernel.org/lkml/80e16de0-63e4-4ead-9577-4ebba9b1a02d@nvidia.com/
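Going the other way, a kHz value written to min_perf/max_perf has to be turned back into an integer performance level before it can be programmed into MIN_PERF/MAX_PERF. The sketch below uses the same kind of hypothetical two-point line (it is not the kernel's cppc_khz_to_perf(), whose rounding may differ; all names are made up) to show that a written kHz value is quantized, so the value read back can differ from the value written.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical anchor points for a linear perf <-> kHz mapping. */
struct cppc_line_sketch {
	uint64_t lowest_perf, nominal_perf;
	uint64_t lowest_freq, nominal_freq; /* MHz */
};

/* perf -> kHz along the line through the two anchors, rounding down. */
uint64_t line_perf_to_khz(const struct cppc_line_sketch *c, uint64_t perf)
{
	assert(perf >= c->lowest_perf && c->nominal_perf > c->lowest_perf);
	return c->lowest_freq * 1000 +
	       (perf - c->lowest_perf) * (c->nominal_freq - c->lowest_freq) * 1000 /
	       (c->nominal_perf - c->lowest_perf);
}

/* kHz -> perf along the same line, rounding down and clamping at lowest_perf. */
uint64_t line_khz_to_perf(const struct cppc_line_sketch *c, uint64_t khz)
{
	uint64_t low_khz = c->lowest_freq * 1000;

	assert(c->nominal_freq > c->lowest_freq);
	if (khz <= low_khz)
		return c->lowest_perf;
	return c->lowest_perf +
	       (khz - low_khz) * (c->nominal_perf - c->lowest_perf) /
	       ((c->nominal_freq - c->lowest_freq) * 1000);
}
```

With made-up anchors of perf 1 at 400 MHz and perf 62 at 3900 MHz, writing 2000000 kHz selects perf 28, which reads back as 1949180 kHz; only values that land exactly on a performance level, such as 3900000 kHz for perf 62, survive the round trip unchanged.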
> >>>>>
> >>>>>
> >>>>>> That said, the new attributes will show up elsewhere.
> >>>>>>
> >>>>>> So why do you need to add these things in the first place?
> >>>>> Currently there's no sysfs interface to dynamically control the
> >>>>> MIN_PERF/MAX_PERF bounds when using autonomous mode. Such controls
> >>>>> would help users tune power and performance at runtime.
> >>>> So what about scaling_min_freq and scaling_max_freq?
> >>>>
> >>>> intel_pstate uses them for an analogous purpose.
> >>> FWIW same thing for amd_pstate.
> >>>
> >> intel_pstate and amd_pstate seem to use setpolicy() to update
> >> scaling_min/max_freq and program MIN_PERF/MAX_PERF.
> > That's one possibility.
> >
> > intel_pstate has a "cpufreq-compatible" mode (in which case it is
> > called intel_cpufreq) and still uses HWP (which is the underlying
> > mechanism for CPPC on Intel platforms).
> >
> >> However, as discussed in v5 [1], cppc_cpufreq cannot switch to
> >> a setpolicy-based approach because:
> >> - We need per-CPU control of auto_sel: with setpolicy, we can't
> >>   dynamically disable auto_sel for individual CPUs and fall back to
> >>   target() (no target hook is available).
> >>   intel_pstate and amd_pstate seem to set HW autonomous mode for
> >>   all CPUs, not per-CPU.
> >> - We need to retain the target() callback: the CPPC spec allows
> >>   desired_perf to be used even when autonomous selection is enabled.
> > intel_pstate in the "cpufreq-compatible" mode updates its HWP min and
> > max limits when .target() (or .fast_switch() or .adjust_perf()) is
> > called.
> >
> > I guess that would not be sufficient in cppc_cpufreq for some reason?
> >
> >> [1] https://lore.kernel.org/lkml/66f58f43-631b-40a0-8d42-4e90cd24b757@arm.com/
>
> We can do the same as intel_cpufreq. The CPPC spec allows setting
> MIN_PERF/MAX_PERF even when autonomous selection is disabled, so we
> would always have to update them from the policy limits in target().
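A minimal model of that approach, assuming policy limits have already been converted from kHz to CPPC perf units: on every request the callback reprograms MIN_PERF and MAX_PERF from the policy limits, then clamps the requested level into that window. The struct and function names are hypothetical stand-ins, not cppc_cpufreq code, and the registers are plain struct fields rather than real CPPC accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for one CPU's CPPC control registers. */
struct cppc_ctrls_sketch {
	uint32_t min_perf;
	uint32_t max_perf;
	uint32_t desired_perf;
};

/*
 * Model of a ->target()-style path that, like the intel_cpufreq behavior
 * described for HWP, rewrites the min/max limits on every call and then
 * clamps the requested performance level into [policy_min, policy_max].
 */
void sketch_target(struct cppc_ctrls_sketch *regs, uint32_t policy_min,
		   uint32_t policy_max, uint32_t requested)
{
	assert(regs && policy_min <= policy_max);

	regs->min_perf = policy_min;
	regs->max_perf = policy_max;

	if (requested < policy_min)
		requested = policy_min;
	else if (requested > policy_max)
		requested = policy_max;
	regs->desired_perf = requested;
}
```

If the registers start out holding hypothetical firmware values of 36 and 62, and the policy limits are the hardware capability bounds 1 and 88 from the x86 listing, the very first call leaves MIN_PERF = 1 and MAX_PERF = 88, so the firmware values are gone before the user has touched anything.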
>
> However, this would override BIOS-configured MIN_PERF/MAX_PERF values.
> Since policy->min/max are set from hardware capabilities during init,
> any governor would overwrite the BIOS bounds with the policy limits
> (hardware capability bounds) on its first frequency request, even when
> the user hasn't explicitly changed scaling_min/max_freq.
>
> Does intel_cpufreq also override BIOS-configured HWP min/max values?
Yes, it does.
> Should we preserve BIOS-configured values until the user explicitly
> changes scaling_min/max_freq?
Why would that be useful?
> Is there any mechanism in the cpufreq core to detect explicit user
> changes to scaling_min/max_freq?
Not today, but since scaling_min/max_freq have their own freq QoS
requests, it should be doable if need be.
In any case, I would very much prefer using the existing
scaling_min/max_freq interface, even if that would require some
additional plumbing, to adding new sysfs attributes pretty much for
the same purpose that would only be used by one driver.
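As a rough illustration of the "doable if need be" remark: if the per-policy requests behind scaling_min/max_freq keep a recognizable default value until the user writes them, a driver could keep firmware-provided limits until an explicit user limit appears. The sentinels, struct and function below are hypothetical and only loosely model that idea in perf units; they are not the cpufreq core's freq QoS API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinels meaning "the user never wrote this limit". */
#define SKETCH_MIN_UNSET 0u
#define SKETCH_MAX_UNSET UINT32_MAX

struct limit_state_sketch {
	uint32_t fw_min_perf;   /* MIN_PERF as found at init (e.g. set by BIOS) */
	uint32_t fw_max_perf;   /* MAX_PERF as found at init */
	uint32_t user_min_perf; /* from scaling_min_freq, or SKETCH_MIN_UNSET */
	uint32_t user_max_perf; /* from scaling_max_freq, or SKETCH_MAX_UNSET */
};

/*
 * Choose the MIN_PERF/MAX_PERF values to program: an explicit user limit
 * wins, otherwise the firmware value is kept. If the resulting minimum
 * exceeds the maximum, this sketch lets the maximum win.
 */
void sketch_effective_limits(const struct limit_state_sketch *s,
			     uint32_t *min_perf, uint32_t *max_perf)
{
	assert(s && min_perf && max_perf);

	*min_perf = s->user_min_perf != SKETCH_MIN_UNSET ? s->user_min_perf
							  : s->fw_min_perf;
	*max_perf = s->user_max_perf != SKETCH_MAX_UNSET ? s->user_max_perf
							  : s->fw_max_perf;
	if (*min_perf > *max_perf)
		*min_perf = *max_perf;
}
```

One consequence of using sentinels: a user minimum of 0 cannot be told apart from "unset", which is harmless here because 0 imposes no lower bound anyway, but it shows why this kind of detection needs some extra plumbing rather than just a new attribute.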