Message-ID: <CALHNRZ-PGV9OcuB4aGsqw+aj5xUpRTEd4_+v7=j9=oMo9rk0oQ@mail.gmail.com>
Date: Thu, 18 Dec 2025 15:20:30 -0600
From: Aaron Kling <webgeek1234@...il.com>
To: Jon Hunter <jonathanh@...dia.com>
Cc: Krzysztof Kozlowski <krzk@...nel.org>, Rob Herring <robh@...nel.org>, Conor Dooley <conor+dt@...nel.org>, 
	Thierry Reding <thierry.reding@...il.com>, linux-kernel@...r.kernel.org, 
	devicetree@...r.kernel.org, linux-tegra@...r.kernel.org
Subject: Re: [PATCH v4 3/5] memory: tegra186-emc: Support non-bpmp icc scaling

On Thu, Dec 18, 2025 at 1:25 PM Aaron Kling <webgeek1234@...il.com> wrote:
>
> On Thu, Dec 18, 2025 at 5:12 AM Jon Hunter <jonathanh@...dia.com> wrote:
> >
> >
> > On 17/12/2025 22:44, Aaron Kling wrote:
> >
> > ...
> >
> > >> Thanks I added all these on top of next-20251216 (as that is the latest
> > >> I have tested) and Tegra194 fails to boot. We always include all the
> > >> modules in the rootfs that is being tested. You can see the boot log
> > >> here [0]. We are using an NFS rootfs for testing and I see a message
> > >> related to the NFS server not responding. I am guessing something is
> > >> running too slow again because the only thing I changed was adding your
> > >> patches. The test harness reports it is timing out ...
> > >>
> > >> FAILED: Linux Boot Test 1
> > >>          Test Owner(s): N/A
> > >>          Execution Time 219.31 sec
> > >>          Test TIMEOUT reached. Test did not report results in 120 secs
> > >>          Percent passed so far: 0.0
> > >
> > > Okay, so. Modules are in the rootfs, none get copied to the initramfs?
> > > And the rootfs is on nfs? And for this failure, nfs never gets
> > > mounted. So... for this case, no modules get loaded, implying that
> > > whatever is happening is happening with the built-in drivers. Which
> > > means this case isn't pcie related. Are there any modifications to the
> > > defconfig? It appears that there must be, to have dwc-eth-dwmac
> > > available. I will see if I can trigger anything when using ethernet.
> >
> > If you look at the boot log you will see ...
> >
> > [    7.839012] Root device found: nfs
> > [    7.908307] Ethernet interface: eth0
> > [    7.929765] IP Address: 192.168.99.2
> > [    8.173978] Rootfs mounted over nfs
> > [    8.306291] Switching from initrd to actual rootfs
> >
> > So it does mount the rootfs and so the modules would be loaded. I
>
> But the bottom of the log says:
> [ 188.360095] nfs: server 192.168.99.1 not responding, still trying
>
> So does it mount nfs and load modules, and *then* fail to talk to the
> nfs server? That doesn't make any sense. And I don't see any logs from
> driver probes after the rootfs line. And there's sync_state lines
> stating that pcie among others isn't available.
>
> > believe that PCIe is definitely loaded because that is what I observed
> > before. And yes there are a few modifications to the defconfig that we
> > make on top (that have been added over the years for various reasons) ...
> >
> > CONFIG_ARM64_PMEM=y
> > CONFIG_BROADCOM_PHY=y
> > CONFIG_DWMAC_DWC_QOS_ETH=y
> > CONFIG_EEPROM_AT24=m
> > CONFIG_EXTRA_FIRMWARE="nvidia/tegra210/xusb.bin nvidia/tegra186/xusb.bin
> > nvidia/tegra194/xusb.bin rtl_nic/rtl8153a-3.fw rtl_nic/rtl8168h-2.fw"
> > CONFIG_EXTRA_FIRMWARE_DIR="${KERNEL_FW_DIR}"
> > CONFIG_MARVELL_PHY=y
> > CONFIG_R8169=y
> > CONFIG_RANDOMIZE_BASE=n
> > CONFIG_SERIAL_TEGRA_TCU=y
> > CONFIG_SERIAL_TEGRA_TCU_CONSOLE=y
> > CONFIG_STAGING=y
> > CONFIG_STAGING_MEDIA=y
> > CONFIG_STMMAC_ETH=y
> > CONFIG_STMMAC_PLATFORM=y
> > CONFIG_USB_RTL8152=y
> > CONFIG_VIDEO_TEGRA=m
> > CONFIG_VIDEO_TEGRA_TPG=y
> > CONFIG_DWMAC_TEGRA=y
>
> I will incorporate these to a build and see if I get any different results.
>
> > Looking at the boot log I see ...
> >
> > [    3.854658] cpu cpu0: cpufreq_init: failed to get clk: -2
> > [    3.854927] cpu cpu0: cpufreq_init: failed to get clk: -2
> > [    3.855218] cpu cpu2: cpufreq_init: failed to get clk: -2
> > [    3.858438] cpu cpu2: cpufreq_init: failed to get clk: -2
> > [    3.863987] cpu cpu4: cpufreq_init: failed to get clk: -2
> > [    3.869741] cpu cpu4: cpufreq_init: failed to get clk: -2
> > [    3.875006] cpu cpu6: cpufreq_init: failed to get clk: -2
> > [    3.880725] cpu cpu6: cpufreq_init: failed to get clk: -2
> > [    3.886018] cpufreq-dt cpufreq-dt: failed register driver: -19
> >
> > So actually, I am now wondering if this is the problem?
>
> These lines are from cpufreq-dt trying to manage the cpu's directly,
> which it's not supposed to do. tegra194-cpufreq is supposed to manage
> them. I see these lines as well, when things are operating as
> expected. The real driver doesn't log anything, but the policies are
> visible in sysfs. I did a little bit of digging previously to see if I
> could remove the log churn, but was unable to do so. I would have to
> double check to be completely sure, but I am fairly certain I saw
> these lines before my changes as well. It's something that would be
> good to get fixed, but I don't think it's operable here.

Turns out this is actually semi-relevant. There's a blocklist in the
cpufreq-dt driver that includes all tegra archs <= t234 except for
t186 and t194. If I add t194 to that list, then the log lines go away.
However, that does not fix the nfs boot issue. I was finally able to
replicate it by setting up my own nfs rootfs. Fwiw, this series is not
what triggers it; the dt series is. Before the dt series, nfsroot
boots as expected. After it, the reported issue happens. After adding
t194 to the cpufreq-dt blocklist, the issue still happens. But... if I
add "blacklist=cpufreq-dt" to the kernel bootargs, nfs works again. I
don't get this.

So, summary:
* Adding opp tables to the cpu nodes causes cpufreq-dt to try to
handle cpufreq for the soc
* Adding tegra194 to the cpufreq-dt-platdev blocklist stops log
messages about the attempt
* However, the ethernet driver is still affected even with that
change, showing watchdog timeouts and adapter resets
* Blacklisting the cpufreq-dt driver entirely prevents the issue
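
To make the first two points concrete, here is a minimal userspace
sketch of the decision as I understand it: cpufreq-dt gets registered
when the cpu node has an opp table and the SoC compatible is not in
the blocklist. This is not the kernel code. The function names, the
flag, and the table contents are assumptions for illustration, and the
real driver matches device tree nodes rather than plain strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of blocked SoCs; not copied from the kernel. */
static const char *const blocklist[] = {
	"nvidia,tegra20",
	"nvidia,tegra210",
	"nvidia,tegra234",
};

/* Entries proposed in this thread (hypothetical until a patch lands). */
static const char *const proposed[] = {
	"nvidia,tegra186",
	"nvidia,tegra194",
};

static bool in_table(const char *soc, const char *const *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(soc, tbl[i]) == 0)
			return true;
	return false;
}

/* True if the SoC is blocked, optionally counting the proposed entries. */
static bool soc_is_blocked(const char *soc, bool with_proposed)
{
	if (in_table(soc, blocklist, sizeof(blocklist) / sizeof(blocklist[0])))
		return true;
	return with_proposed &&
	       in_table(soc, proposed, sizeof(proposed) / sizeof(proposed[0]));
}

/* cpufreq-dt is only a candidate when the cpu node has an opp table. */
static bool cpufreq_dt_would_register(const char *soc, bool cpu_has_opp_v2,
				      bool with_proposed)
{
	return cpu_has_opp_v2 && !soc_is_blocked(soc, with_proposed);
}
```

With this model, "nvidia,tegra194" with an opp table and without the
proposed entries returns true, which corresponds to the cpufreq_init
log lines above. With the proposed entries it returns false. The model
says nothing about why the ethernet driver is still affected.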

I'm not sure what to make of this. Does anyone have thoughts? I will
separately send a patch to add t186 and t194 to the cpufreq-dt-platdev
blocklist, as this needs to happen in any case.

Aaron
