linux-kernel - [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230527132747.83196-1-frank@oltmanns.dev>
Date:   Sat, 27 May 2023 15:27:44 +0200
From:   Frank Oltmanns <frank@...manns.dev>
To:     linux-arm-kernel@...ts.infradead.org, linux-clk@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-sunxi@...ts.linux.dev
Cc:     Frank Oltmanns <frank@...manns.dev>,
        Andre Przywara <andre.przywara@....com>,
        Chen-Yu Tsai <wens@...e.org>, Icenowy Zheng <icenowy@...c.io>,
        Jernej Skrabec <jernej.skrabec@...il.com>,
        Maxime Ripard <mripard@...nel.org>,
        Michael Turquette <mturquette@...libre.com>,
        Rob Herring <robh@...nel.org>,
        Samuel Holland <samuel@...lland.org>,
        Stephen Boyd <sboyd@...nel.org>
Subject: [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks

I would like to bring your attention to the current process of setting the rate
of an NKM clock. As it stands, when setting the rate of an NKM clock, the rate
nearest but less than or equal to the requested rate is found, instead of the
nearest rate. Moreover, ccu_nkm_find_best() is called multiple times (footnote
[1]) when setting a rate, each time iterating over all combinations of n, k, and
m.

In response to this, I propose the following refinements to optimize the NKM
clock setting:
 a. when finding the best rate use the nearest rate, even if it is greater than
    the requested rate (PATCH 1)
 b. utilize binary search to find the best rate by going through a
    precalculated, ordered list of all meaningful combinations of n, k, and m
    (PATCH 2)

To illustrate, consider an NKM clock with min_n = 1, max_n = 16, min_k = 2,
max_k = 4, min_m = 1, and max_m = 16. With the current approach, we have to go
through 1024 combinations, of which only 275 are actually meaningful (the
remaining 749 are combinations that result in the same frequency as other
combinations). So, when selecting from these sorted 275 combinations we find the
closest rate after 9 instead of 1024 steps.

As an example, I calculated the table off-line for the pll-mipi clock of
sun50i-a64 (PATCH 3). However, I have identified two other potential strategies:
 2. calculate before first use and persist for subsequent uses (i.e. never free,
    see footnote [2])
 3. calculate before first use and free after setting the rate.

Each approach carries its own merits. The one I implemented is the most
efficient in terms of computation time but consumes the most memory. The second
saves compute time after the initial use, while the third minimizes memory usage
at the cost of additional computation.

The motivation for these proposed changes lies in the current behavior of rate
selection for NKM clocks, which doesn't observe the CLK_SET_RATE_PARENT flag.
I.e. it does not select a different rate for the parent clock to find the
optimal rate. I believe this is likely due to the fact that selecting a new rate
is quite compute intense, as it would involve iterating or calculating rates for
the parent clock and then testing each rate with different n, k, and m
combinations.

As an example, if the parent is an NM clock, we would have to work through the
combinations of the parent's factors (the parent's n) and divisor (the parent's
m). This results in five nested loops to evaluate all possible rates, an effort
that escalates if the parent could also influence the grandparent's rate. In my
example case (sun50i-a64) the pll-mipi's parent (pll-video0) has 2048
combinations of n and m, of which 1266 are meaningful because the others result
in the same frequency for pll-video0.

If we can come up with a way to iterate over the possible rates of a parent,
this would eventually allow us to make NKM clocks obey the CLK_SET_RATE_PARENT
flag. Because it would only require 11,349 (9 * 1,266) steps instead of
2,097,152 (1,024 * 2,048).

Things I considered but don't have a clear answer to:
 - Is there a specific reason, why currently only clock rates less than the
   requested rate are considered when setting a new rate?
 - Do you think it is worth the memory and increased complexity to be able to
   change the parent's clock rate?

I look forward to hearing your thoughts on these proposed changes. Thank you for
your time and consideration.

Footnotes:
[1] Multiple times because ccu_nkm_find_best is (indirectly) called from clk.c
in
 - clk_core_req_round_rate_nolock()
 - clk_calc_new_rates() (which in turn is called three times for reasons that
   currently elude me)
 - clk_change_rate

[2] Actually, we could free the memory in a new ccu_nkm_terminate() function,
which could become part of ccu_nkm_ops. But if my code searching skills don't
betray me, there is currently no other clock that implements the terminate
function.

Frank Oltmanns (3):
  clk: sunxi-ng: nkm: Minimize difference when finding rate
  clk: sunxi-ng: Implement precalculated NKM rate selection
  clk: sunxi-ng: sun50i-a64: Precalculate NKM combinations for pll-mipi

 drivers/clk/sunxi-ng/ccu-sun50i-a64.c | 281 ++++++++++++++++++++++++++
 drivers/clk/sunxi-ng/ccu_mux.c        |   2 +-
 drivers/clk/sunxi-ng/ccu_nkm.c        |  97 +++++++--
 drivers/clk/sunxi-ng/ccu_nkm.h        |  26 +++
 4 files changed, 385 insertions(+), 21 deletions(-)

-- 
2.40.1