Message-ID: <9a9280e7-d29a-475a-83fa-671acfab9d92@packett.cool>
Date: Mon, 1 Dec 2025 03:48:13 -0300
From: Val Packett <val@...kett.cool>
To: Manivannan Sadhasivam <mani@...nel.org>,
Bjorn Helgaas <helgaas@...nel.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@....qualcomm.com>,
bhelgaas@...gle.com, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org, Konrad Dybcio
<konrad.dybcio@....qualcomm.com>,
Alexey Bogoslavsky <Alexey.Bogoslavsky@...disk.com>,
Jeffrey Lien <Jeff.Lien@...disk.com>, Avinash M N <Avinash.M.N@...disk.com>
Subject: Re: [PATCH v2] PCI: Add quirk to disable ASPM L1 for Sandisk SN740
NVMe SSDs
On 11/25/25 2:21 AM, Manivannan Sadhasivam wrote:
> [..]
> There are a couple of points that made me convince myself:
>
> * Other X1E laptops are working fine with ASPM L1.
> * This laptop has WCN785x WiFi/BT combo card connected to the other controller
> instance and L1 is working fine for it.
> * There is no known issue with ASPM L1 in X1E chipsets.
>
> Because of these, I was so certain that the NVMe is the fault here.

There is *a* known issue with ASPM L1 on X1E, reported by many users
on #aarch64-laptops, which we discussed in another thread.
But it is a full system freeze, **not** a correctable AER message, and
it definitely happens with a variety of SSDs on various laptops. I
personally have had it happen with both the SN740 and an SK Hynix drive,
on a Latitude 7455. It's an SSD-only issue (disabling ASPM just for the
drive, while keeping it on for the WiFi, was enough to get to month-long
uptime) but not specific to any SSD model.

One bit of news I have about it is that I recently started using EL2
(slbounce), and I did see something that looked like that hang. But
unlike in EL1, right before the reboot the panic LED did start blinking.
So if that was indeed from the same issue, I should now be able to
capture it in pstore (if pstore works; I'm trying blk with sdhc instead
of efi now). Maybe QHEE was eating the fault and crashing itself, since
it "owns" the PCIe IOMMU when it's running (???)
~val