linux-kernel - Re: [PATCH v2 1/1] arm64: dts: rockchip: rk3528: Add CPU frequency scaling support

Message-ID: <CABjd4YzVqOssjMH0VUiUJF78dV9n6Dq6EipWYqUXoxHu6ehUSw@mail.gmail.com>
Date: Wed, 16 Jul 2025 19:48:09 +0400
From: Alexey Charkov <alchark@...il.com>
To: Chukun Pan <amadeus@....edu.cn>
Cc: conor+dt@...nel.org, devicetree@...r.kernel.org, heiko@...ech.de, 
	jonas@...boo.se, krzk+dt@...nel.org, linux-arm-kernel@...ts.infradead.org, 
	linux-kernel@...r.kernel.org, linux-rockchip@...ts.infradead.org, 
	ziyao@...root.org
Subject: Re: [PATCH v2 1/1] arm64: dts: rockchip: rk3528: Add CPU frequency
 scaling support

On Wed, Jul 16, 2025 at 6:30 PM Chukun Pan <amadeus@....edu.cn> wrote:
>
> Hi,
>
> > > There has often been the argument that selecting a frequency that has
> > > the same voltage as a faster frequency does not save any power.
> > >
> > > Hence I remember that we dropped slower frequencies on other socs
> > > that share the same voltage with a higher frequency.
>
> Sorry, but the Rockchip SoC .dtsi files don't seem to have dropped
> slower frequencies that share the same voltage?
>
> rk3562.dtsi: CPU 408MHz -  816MHz 825mV | GPU 300MHz - 600MHz 825mV
> rk3566.dtsi: CPU 408MHz -  816MHz 850mV | GPU 200MHz - 400MHz 850mV
> rk3576.dtsi: CPU 408MHz - 1200MHz 700mV | GPU 300MHz - 600MHz 700mV

They were dropped for RK3588 when the CPU OPPs I submitted were
merged, following feedback from Daniel Lezcano [1]. GPU OPPs
for RK3588 were merged before that discussion, so they still carry
"lower frequency, same voltage" entries.

[1] https://lore.kernel.org/all/731aac66-f698-4a1e-b9ee-46a7f24ecae5@linaro.org/

> > I.e. here the mainline kernel will always choose opp-1008000000 as
> > long as the regulator voltage is 875000 uV, unless explicitly
> > prevented from doing so by userspace. Whereas the BSP kernel [1] would
> > request different frequencies for different silicon, e.g.
> > opp-1200000000 for a silicon specimen with a leakage value of L4 and
> > opp-1416000000 for a silicon specimen with a leakage value of L8 - all
> > for the same regulator voltage of 875000 uV.
> >
> > So my 2 cents would be: no added benefit in having "lower frequency,
> > same voltage" OPPs defined here until we implement an OPP driver
> > reading the NVMEM-programmed leakage values and selecting different
> > *-L* voltages for each OPP depending on those. Once the drivers
> > support this, those OPPs can be added together with
> > leakage-specific voltages (opp-microvolt-L0..11).
>
> I assume this has nothing to do with the NVMEM driver?

The BSP code has an OPP driver (for v6 BSP kernels) or a custom
cpufreq driver (for v5 BSP kernels) that reads the factory-determined
leakage value of the particular silicon specimen from an OTP (NVMEM)
cell and selects a different initial voltage for each OPP based on it.
Those voltages come from the opp-microvolt-L0..11 DT properties, where
-L0..11 represent different leakage values; the default is the plain
opp-microvolt property without an -L* suffix.

These are then further tuned in the BSP code by stepping over adjacent
regulator voltages until the PVTPLL provides a frequency closest to
what the OPP defines. The kernel then uses this calibrated voltage for
the particular OPP instead of the original DT provided one, as it
better suits the particular silicon specimen and operating conditions
(PVTPLL = process, voltage and temperature adjusting PLL).

The calibration process can result in different voltages for
PVTPLL-based OPPs even if the DT lists the same voltage for each, so it
might make sense to list those OPPs once calibration support gets
included in the mainline driver code. Until then they won't be used by
any of the energy-aware governors and would only add bloat.

> From [1], we can see that a given board uses the same voltage for
> every OPP from 408MHz to 1008MHz. Actual testing confirms this:
>
> The first  board: CPU 408MHz - 1008MHz all at 850mV | 1200MHz 862mV
> The second board: CPU 408MHz - 1008MHz all at 875mV | 1200MHz 875mV
>
> [1] https://github.com/rockchip-linux/kernel/blob/develop-5.10/arch/arm64/boot/dts/rockchip/rk3528.dtsi#L227-L271
>
> > Right now OPP values with frequencies lower than 1008000000 won't be
> > selected by any of the energy-aware cpufreq governors anyway, because
> > their voltages are the same. Exercise for the reader: try to convince
> > e.g. the "schedutil" governor to select anything below 1008000000
> > without touching
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq :) This may
> > change if OPP tuning logic is implemented such as in [2]: that will
> > try and find the _voltage_ resulting in PVTPLL providing a frequency
> > closest to what cpufreq requested, and use that for the in-memory OPP
> > table instead of what was provided by the DTS.
>
> Thanks for the clarification, so should we remove 408MHz, 600MHz and
> 816MHz from the opp-table? Is this also the case with GPU's opp-table?

I would say yes, drop for now and add them later when the need arises
(i.e. once we have driver support for OPP calibration using
PVTM/PVTPLL feedback).

They do not hurt per se, but they result in something unused right now
becoming part of the device tree ABI, thus limiting options to rethink
stuff later.

Best regards,
Alexey