linux-kernel - Re: TI K3 AM69 Kernel Panic when PCIe Controller is Enabled

Message-ID: <ddc4e2df0a5593d4a6051057c6406db338f4c0ba.camel@ti.com>
Date: Thu, 6 Nov 2025 12:13:07 +0530
From: Siddharth Vadapalli <s-vadapalli@...com>
To: João Paulo Gonçalves
	<jpaulo.silvagoncalves@...il.com>
CC: Nishanth Menon <nm@...com>, Vignesh Raghavendra <vigneshr@...com>, "Kishon
 Vijay Abraham I" <kishon@...com>, Swapnil Jakhade <sjakhade@...ence.com>,
	Andrew Davis <afd@...com>, Francesco Dolcini <francesco@...cini.it>,
	João Paulo Gonçalves
	<joao.goncalves@...adex.com>, <linux-arm-kernel@...ts.infradead.org>,
	<linux-kernel@...r.kernel.org>, Siddharth Vadapalli <s-vadapalli@...com>
Subject: Re: TI K3 AM69 Kernel Panic when PCIe Controller is Enabled

On Wed, 2025-11-05 at 11:10 -0300, João Paulo Gonçalves wrote:
> Hi Siddharth,
> 
> > The E2E thread above leads to another one where the issue was claimed to be
> > seen only with the usage of an external reference clock, and it was fixed
> > with the usage of the internal reference clock. Does this hold true for the
> > board that you are using as well?
> 
> No, we changed to use the internal reference clocks on the current
> hardware revision (sent upstream on [1]) and still have the same issue.
> Please look at the PCIe nodes in [1] so you can confirm this. For
> example:
> 
> //file k3-am69-aquila.dtsi
> /* Aquila PCIE_1 */
> &pcie0_rc {
> 	pinctrl-names = "default";
> 	pinctrl-0 = <&pinctrl_pcie0_reset>;
> 	clocks = <&k3_clks 332 0>, <&serdes1 CDNS_TORRENT_REFCLK_DRIVER>;
> 	clock-names = "fck", "pcie_refclk";
> 	num-lanes = <2>;
> 	phy-names = "pcie-phy";
> 	phys = <&serdes1_pcie0_2l_link>;
> 	reset-gpios = <&main_gpio0 32 GPIO_ACTIVE_HIGH>;
> 	ti,syscon-acspcie-proxy-ctrl = <&acspcie1_proxy_ctrl 0x3>;
> 	status = "disabled";
> };
> 
> [1] https://lore.kernel.org/lkml/20251104144915.60445-1-francesco@dolcini.it/

Thank you for the details. From the logs shared in your email at:
https://lore.kernel.org/r/pod3anzbqdwl3l2zldz4sd47rtbruep72ehaf7kwcuh2bgflb2@y4ox65e66mkj/
the following lines make me suspect that the issue is related to PCIe ASPM
(Active State Power Management):

[    7.480637] pci 0000:01:00.0: ASPM: DT platform, enabling L0s-up L0s-dw L1 ASPM-L1.1 ASPM-L1.2 PCI-PM-L1.1 PCI-PM-L1.2
[    7.493685] pci 0000:01:00.0: ASPM: DT platform, enabling ClockPM

I have two suggestions:

1. Disable ASPM using the Linux kernel command-line option:
	pcie_aspm=off
   If the bootloader that you are using is U-Boot, you could run:
	setenv optargs pcie_aspm=off
   at the U-Boot prompt before booting Linux.

2. I saw an ASPM issue back in 2022 and narrowed it down to the Data Link
   Layer being inactive when the PCIe core in Linux accesses the
   Configuration Space of the PCIe Endpoint:
   https://lore.kernel.org/r/faa13ac2-27b6-94f3-ecde-60256bbbda1b@ti.com/
   The fix for it is the patch to which I replied in the message linked
   above. The direct link to the patch is:
   https://lore.kernel.org/r/20220602065544.2552771-1-nathan@nathanrossi.com/
   It modifies the ASPM driver to wait for a sufficient time if the PCIe
   Controller doesn't have the ability to report the Data Link Layer state
   (this is the case for the PCIe Controller on the AM69 and other K3 SoCs
   from TI).
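
For the first suggestion, it helps to confirm after boot that the option actually reached the kernel. The following is only a rough sketch, assuming a POSIX shell on the target and, for the last command, that pciutils is installed. The helper name has_aspm_off is made up for illustration and is not part of any existing tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel command line
# contains the exact token "pcie_aspm=off".
has_aspm_off() {
    case " $1 " in
        *" pcie_aspm=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the command line of the running kernel.
if has_aspm_off "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "pcie_aspm=off is present on the kernel command line"
else
    echo "pcie_aspm=off is NOT present on the kernel command line"
fi

# ASPM policy as seen by the kernel (only present with CONFIG_PCIEASPM).
cat /sys/module/pcie_aspm/parameters/policy 2>/dev/null || true

# Per-device link state; LnkCap/LnkCtl lines mention ASPM.
# Usually needs root to show the capability fields.
lspci -vv 2>/dev/null | grep -i 'ASPM' || true
```

If the first message says the option is missing, the bootloader environment did not pass it through and the U-Boot variable carrying the extra arguments on your board needs to be checked.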

Please test both suggestions and let me know the results.

Regards,
Siddharth.