Message-ID: <856447ae-4338-471d-a71f-a34aed749ac7@nvidia.com>
Date: Wed, 10 Dec 2025 04:08:04 +0000
From: Jon Hunter <jonathanh@...dia.com>
To: Aaron Kling <webgeek1234@...il.com>
Cc: Krzysztof Kozlowski <krzk@...nel.org>, Rob Herring <robh@...nel.org>,
 Conor Dooley <conor+dt@...nel.org>, Thierry Reding
 <thierry.reding@...il.com>, linux-kernel@...r.kernel.org,
 devicetree@...r.kernel.org, linux-tegra@...r.kernel.org
Subject: Re: [PATCH v4 3/5] memory: tegra186-emc: Support non-bpmp icc scaling


On 21/11/2025 18:17, Aaron Kling wrote:
> On Fri, Nov 21, 2025 at 5:21 AM Jon Hunter <jonathanh@...dia.com> wrote:
>>
>>
>> On 12/11/2025 07:21, Aaron Kling wrote:
>>> On Wed, Nov 12, 2025 at 12:18 AM Jon Hunter <jonathanh@...dia.com> wrote:
>>>>
>>>>
>>>> On 11/11/2025 23:17, Aaron Kling wrote:
>>>>
>>>> ...
>>>>
>>>>> Alright, I think I've got the picture of what's going on now. The
>>>>> standard arm64 defconfig enables the t194 pcie driver as a module. And
>>>>> my simple busybox ramdisk that I use for mainline regression testing
>>>>> isn't loading any modules. If I set the pcie driver to built-in, I
>>>>> replicate the issue. And I don't see the issue on my normal use case,
>>>>> because I have the dt changes as well.
>>>>>
>>>>> So it appears that the pcie driver submits icc bandwidth. And without
>>>>> cpufreq submitting bandwidth as well, the emc driver gets a very low
>>>>> number and thus sets a very low emc freq. The question becomes... what
>>>>> to do about it? If the related dt changes were submitted to
>>>>> linux-next, everything should fall into place. And I'm not sure where
>>>>> this falls on the severity scale since it doesn't full out break boot
>>>>> or prevent operation.
>>>>
>>>> Where are the related DT changes? If we can get these into -next and
>>>> lined up to be merged for v6.19, then that is fine. However, we should
>>>> not merge this for v6.19 without the DT changes.
>>>
>>> The dt changes are here [0].
>>
>> To confirm, applying the DT changes does not fix this for me. Thierry is
>> having a look at this to see if there is a way to fix this.
>>
>> BTW, I have also noticed that Thierry's memory frequency test [0] is
>> also failing on Tegra186. The test simply tries to set the frequency via
>> the sysfs and this is now failing. I am seeing ...
>>
>> memory: emc: - available rates: (* = current)
>> memory: emc:   -   40800000
>> memory: emc:   -   68000000
>> memory: emc:   -  102000000
>> memory: emc:   -  204000000
>> memory: emc:   -  408000000
>> memory: emc:   -  665600000
>> memory: emc:   -  800000000
>> memory: emc:   - 1062400000
>> memory: emc:   - 1331200000
>> memory: emc:   - 1600000000
>> memory: emc:   - 1866000000 *
>> memory: emc: - testing:
>> memory: emc:   -   40800000...OSError: [Errno 34] Numerical result out of range
> 
> Question. Does this test run and pass on jetson-tk1? I based the
> tegra210 and tegra186 [0] code on tegra124 [1]. And I don't see a
> difference in the flow now. What appears to be happening is that icc
> is reporting a high bandwidth, setting the emc min_freq to something
> like 1600MHz. Then debugfs is having max_freq set to something low
> like 40.8MHz. Then the linked code block fails because the higher of
> the min_freqs is greater than the lower of the max_freqs. But if this
> same test is run on jetson-tk1, I don't see how it passes. Unless
> maybe the t124 actmon is consistently setting min freqs during the
> tests.

We don't currently run this test on Tegra124, but we could certainly try.
I don't recall whether there was an issue that prevented us from doing so.
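
The conflict described in the quoted paragraph above can be sketched in
user-space C. This is a minimal sketch, not the driver code: the names
emc_sketch, rate_source and emc_sketch_request are hypothetical, and it
assumes that each source (icc, debugfs) holds a [min, max] window and that a
request is rejected with -ERANGE when the combined windows no longer overlap,
which would match the "Errno 34" seen in the test output.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical request sources, named for illustration only. */
enum rate_source { SRC_ICC, SRC_DEBUGFS, SRC_COUNT };

/* One [min, max] frequency window, in Hz, per source. */
struct rate_request {
	unsigned long min_rate;
	unsigned long max_rate;
};

struct emc_sketch {
	struct rate_request req[SRC_COUNT];
};

/* Start with every source unconstrained. */
static void emc_sketch_init(struct emc_sketch *emc)
{
	for (int i = 0; i < SRC_COUNT; i++) {
		emc->req[i].min_rate = 0;
		emc->req[i].max_rate = ULONG_MAX;
	}
}

/*
 * Try to replace the window of one source. The effective minimum is the
 * highest of all minimums and the effective maximum is the lowest of all
 * maximums. Returns 0 on success, or -ERANGE if the minimum would exceed
 * the maximum; in that case the stored window is left unchanged.
 */
static int emc_sketch_request(struct emc_sketch *emc, enum rate_source src,
			      unsigned long new_min, unsigned long new_max)
{
	unsigned long min_rate = 0, max_rate = ULONG_MAX;

	for (int i = 0; i < SRC_COUNT; i++) {
		unsigned long lo = (i == (int)src) ? new_min : emc->req[i].min_rate;
		unsigned long hi = (i == (int)src) ? new_max : emc->req[i].max_rate;

		if (lo > min_rate)
			min_rate = lo;
		if (hi < max_rate)
			max_rate = hi;
	}

	if (min_rate > max_rate)
		return -ERANGE;

	emc->req[src].min_rate = new_min;
	emc->req[src].max_rate = new_max;
	return 0;
}
```

Under these assumptions, once SRC_ICC holds a floor of 1600000000, a
SRC_DEBUGFS request with a ceiling of 40800000 returns -ERANGE, while a
ceiling of 1866000000 is accepted.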

> An argument could be made that any attempt to set debugfs should win a
> conflict with icc. That could be done. But if that needs to be done here,
> I'd argue that it needs to be replicated across all other applicable emc
> drivers too.

The bottom line is that we cannot regress anything that was working before.

Jon

-- 
nvpublic

