Message-ID: <f906f85f-b110-4328-b177-02fcdf7ffe53@nvidia.com>
Date: Wed, 10 Dec 2025 15:03:50 +0000
From: Jon Hunter <jonathanh@...dia.com>
To: Aaron Kling <webgeek1234@...il.com>
Cc: Krzysztof Kozlowski <krzk@...nel.org>, Rob Herring <robh@...nel.org>,
Conor Dooley <conor+dt@...nel.org>, Thierry Reding
<thierry.reding@...il.com>, linux-kernel@...r.kernel.org,
devicetree@...r.kernel.org, linux-tegra@...r.kernel.org
Subject: Re: [PATCH v4 3/5] memory: tegra186-emc: Support non-bpmp icc scaling
On 10/12/2025 05:06, Aaron Kling wrote:
...
> Let me try to iterate the potential issues I've seen stated here. If
> I'm missing anything, please fill in the blanks.
>
> 1) If this change is applied without the related dt change and the
> pcie driver is loaded, the emc clock can become stuck at the lowest
> rate. This is caused by the pcie driver providing icc data, but
> nothing else is. So the very low requested bandwidth results in the
> emc clock being set very low. I'm not sure there is a 'fix' for this,
> beyond making sure the dt change is merged to ensure that the cpufreq
> driver provides bandwidth info, causing the emc driver to select a
> more reasonable emc clock rate. This is a similar situation to what's
> currently blocking the tegra210 actmon series. I don't think there is
> a way for the drivers to know if icc data is missing/wrong. The
> scaling is doing exactly what it's told based on the icc routing given
> in the dt.
So this is the fundamental issue that must be fixed. We can't allow the
PCIe driver to slow the system down. I think Krzysztof suggested that we
need some way to determine whether the necessary ICC clients are
present/registered for ICC to work. Admittedly, I have no idea if there
is a simple way to do this, but we need something like that.
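As a rough illustration of the failure mode in #1 and of the kind of
guard being asked for, here is a toy model in Python. It is not driver
code: the rate table, the bytes-per-cycle factor, the client names and
the REQUIRED_CLIENTS set are all assumptions made up for the example.

```python
# Toy model only. Rates, BYTES_PER_CYCLE and client names are
# illustrative assumptions, not values taken from any Tegra driver.
EMC_RATES_HZ = [40_800_000, 204_000_000, 800_000_000, 1_866_000_000]
BYTES_PER_CYCLE = 16

# Hypothetical set of clients that must have voted before the
# aggregated bandwidth is trusted.
REQUIRED_CLIENTS = {"cpu"}


def pick_emc_rate(requests_bps):
    """Return the lowest supported rate that covers the summed requests."""
    total_bps = sum(requests_bps.values())
    needed_hz = total_bps / BYTES_PER_CYCLE
    for rate in EMC_RATES_HZ:
        if rate >= needed_hz:
            return rate
    return EMC_RATES_HZ[-1]


def pick_emc_rate_guarded(requests_bps):
    """Stay at the maximum rate until every required client has voted."""
    if not REQUIRED_CLIENTS.issubset(requests_bps):
        return EMC_RATES_HZ[-1]
    return pick_emc_rate(requests_bps)


# Only PCIe votes: the unguarded selection drops to the lowest rate.
only_pcie = {"pcie": 100_000_000}
low = pick_emc_rate(only_pcie)
safe = pick_emc_rate_guarded(only_pcie)
```

With only the PCIe vote present, the unguarded selection lands on the
lowest entry in the table, while the guarded variant stays at the
highest entry until the (hypothetical) cpu client has registered a
request. Whether such a check is practical in the ICC framework is
exactly the open question above.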
> 2) Jon, you report that even with both this change and the related dt
> change, the issue is still not fixed. But then you posted a log
> showing that the emc rate is set to max. If the issue is that emc rate
> is too low, then how can debugfs report that the rate is max? For
> reference, everything scales as expected for me given this change plus
> the dt change on both p2771 and p3636+p3509.
To clarify, this broke the boot test on Tegra194 because the boot was
too slow. However, this also broke the EMC test on Tegra186 because
setting the frequency via debugfs failed. So these are two different
failures on two different devices. I am guessing the EMC test would also
fail on Tegra194, but given that it does not boot, we did not get that
far.
> 3) If icc is requesting enough bandwidth to set the emc clock to a
> high value, then a user tries to set debugfs max_freq to a lower
> value, this code will reject the change. I do not believe this is an
> issue unique to this code. tegra20-emc, tegra30-emc, and tegra124-emc
> all have this same flow. And so does my proposed change to
> tegra210-emc-core in the actmon series. This is why I asked if
> tegra124 ran this test, to see if the failure was unique. If this is
> not a unique failure, then I'd argue that all instances need to be
> changed, not just this one, which would cause diverging results
> depending on the soc being utilized. A lot of the work I'm doing is to
> try to bring unity and feature parity to all the tegra socs I'm
> working on. I don't want to cause even more divergence.
Yes, that is a fair point. However, we need to detect this in the
tegra-tests so that we know that this will not work. It would be nice if
we could disable ICC from userspace and then run the test.
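To make the interaction in #3 concrete, here is a small Python sketch of
the flow as described above: each source (ICC, debugfs) holds a min/max
request, and a new request is rejected if the combined minimum would
exceed the combined maximum. The class, method names and rates are
hypothetical and only model the behaviour discussed in this thread.

```python
# Toy model of per-source min/max rate requests. Names and numbers
# are illustrative assumptions, not taken from the EMC drivers.
HW_MIN_HZ = 40_800_000
HW_MAX_HZ = 1_866_000_000


class RateRequests:
    def __init__(self, hw_min, hw_max):
        # Every source starts out with the full hardware range.
        self.requests = {
            "icc": [hw_min, hw_max],
            "debug": [hw_min, hw_max],
        }

    def request(self, source, new_min, new_max):
        """Apply a request; return False if the ranges no longer overlap."""
        others = [r for s, r in self.requests.items() if s != source]
        lo = max([new_min] + [r[0] for r in others])
        hi = min([new_max] + [r[1] for r in others])
        if lo > hi:
            return False  # rejected, existing requests are kept
        self.requests[source] = [new_min, new_max]
        return True


emc = RateRequests(HW_MIN_HZ, HW_MAX_HZ)

# ICC asks for enough bandwidth to pin the clock at the maximum rate.
icc_ok = emc.request("icc", HW_MAX_HZ, HW_MAX_HZ)

# A debugfs-style attempt to cap the rate lower is then rejected.
capped = emc.request("debug", HW_MIN_HZ, 800_000_000)

# If the ICC request could be dropped first, the same cap would succeed.
icc_released = emc.request("icc", HW_MIN_HZ, HW_MAX_HZ)
capped_after = emc.request("debug", HW_MIN_HZ, 800_000_000)
```

In this model the cap fails while the ICC request holds the minimum at
the top rate and succeeds once that request is released, which is why
being able to disable ICC from userspace would let the test run.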
Bottom line here is that #1 is the problem that needs to be fixed.
Jon
--
nvpublic