Message-ID: <82848c80-15e0-4c0e-a3f6-821a7f4778a5@riscstar.com>
Date: Tue, 28 Oct 2025 14:10:22 -0500
From: Alex Elder <elder@...cstar.com>
To: Johannes Erdfelt <johannes@...felt.com>, robh@...nel.org,
krzk+dt@...nel.org, conor+dt@...nel.org, bhelgaas@...gle.com,
lpieralisi@...nel.org, kwilczynski@...nel.org, mani@...nel.org,
vkoul@...nel.org, kishon@...nel.org, dlan@...too.org, guodong@...cstar.com,
pjw@...nel.org, palmer@...belt.com, aou@...s.berkeley.edu, alex@...ti.fr,
p.zabel@...gutronix.de, christian.bruel@...s.st.com, shradha.t@...sung.com,
krishna.chundru@....qualcomm.com, qiang.yu@....qualcomm.com,
namcao@...utronix.de, thippeswamy.havalige@....com, inochiama@...il.com,
devicetree@...r.kernel.org, linux-pci@...r.kernel.org,
linux-phy@...ts.infradead.org, spacemit@...ts.linux.dev,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/7] Introduce SpacemiT K1 PCIe phy and host controller
On 10/28/25 1:42 PM, Johannes Erdfelt wrote:
> On Tue, Oct 28, 2025, Aurelien Jarno <aurelien@...el32.net> wrote:
>> Hi Alex,
>>
>> On 2025-10-17 11:21, Alex Elder wrote:
>>> On 10/16/25 11:47 AM, Aurelien Jarno wrote:
>>>> Hi Alex,
>>>>
>>>> On 2025-10-13 10:35, Alex Elder wrote:
>>>>> This series introduces a PHY driver and a PCIe driver to support PCIe
>>>>> on the SpacemiT K1 SoC. The PCIe implementation is derived from a
>>>>> Synopsys DesignWare PCIe IP. The PHY driver supports one combination
>>>>> PCIe/USB PHY as well as two PCIe-only PHYs. The combo PHY port uses
>>>>> one PCIe lane, and the other two ports each have two lanes. All PCIe
>>>>> ports operate at 5 GT/second.
>>>>>
>>>>> The PCIe PHYs must be configured using a value that can only be
>>>>> determined using the combo PHY, operating in PCIe mode. To allow
>>>>> that PHY to be used for USB, the calibration step is performed by
>>>>> the PHY driver automatically at probe time. Once this step is done,
>>>>> the PHY can be used for either PCIe or USB.
>>>>>
>>>>> Version 2 of this series incorporates suggestions made during the
>>>>> review of version 1. Specific highlights are detailed below.
>>>>
>>>> With the issues mentioned in patch 4 fixed, this patchset works fine for
>>>> me. That said, I had to disable ASPM by passing pcie_aspm=off on the
>>>> command line, as it is now enabled by default since 6.18-rc1 [1]. At
>>>> this stage, I am not sure if it is an issue with my NVME drive or an
>>>> issue with the controller.
>>>
>>> Can you describe what symptoms you had that required you to pass
>>> "pcie_aspm=off" on the kernel command line?
>>>
>>> I see these lines in my boot log related to ASPM (and added by
>>> the commit you link to), for both pcie1 and pcie2:
>>>
>>> pci 0000:01:00.0: ASPM: DT platform, enabling L0s-up L0s-dw L1 ASPM-L1.1 ASPM-L1.2 PCI-PM-L1.1 PCI-PM-L1.2
>>> pci 0000:01:00.0: ASPM: DT platform, enabling ClockPM
>>>
>>> . . .
>>>
>>> nvme nvme0: pci function 0000:01:00.0
>>> nvme 0000:01:00.0: enabling device (0000 -> 0002)
>>> nvme nvme0: allocated 64 MiB host memory buffer (16 segments).
>>> nvme nvme0: 8/0/0 default/read/poll queues
>>> nvme0n1: p1
>>>
>>> My NVMe drive on pcie1 works correctly.
>>> https://www.crucial.com/ssd/p3/CT1000P3SSD8
>>>
>>> root@...anapif3:~# df /a
>>> Filesystem      1K-blocks     Used Available Use% Mounted on
>>> /dev/nvme0n1p1  960302804 32063304 879385040   4% /a
>>> root@...anapif3:~#
>>
>> Sorry for the delay; it took me time to test a few more things and
>> different SSDs. First of all, I still see the issue with your v3 on top
>> of v6.18-rc3, which includes some fixes for ASPM support [1].
>>
>> I have tried 3 different SSDs; none of them works, but the
>> symptoms differ, although all of them are related to ASPM
>> (pcie_aspm=off works around the issue).
>>
>> With a Fox Spirit PM18 SSD (Silicon Motion, Inc. SM2263EN/SM2263XT
>> controller), I do not get any further than this:
>> [ 5.196723] nvme nvme0: pci function 0000:01:00.0
>> [ 5.198843] nvme 0000:01:00.0: enabling device (0000 -> 0002)
>>
>> With a WD Blue SN570 SSD, I get this:
>> [ 5.199513] nvme nvme0: pci function 0000:01:00.0
>> [ 5.201653] nvme 0000:01:00.0: enabling device (0000 -> 0002)
>> [ 5.270334] nvme nvme0: allocated 32 MiB host memory buffer (8 segments).
>> [ 5.277624] nvme nvme0: 8/0/0 default/read/poll queues
>> [ 19.192350] nvme nvme0: using unchecked data buffer
>> [ 48.108400] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
>> [ 48.113885] nvme nvme0: Does your device have a faulty power saving mode enabled?
>> [ 48.121346] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
>> [ 48.176878] nvme0n1: I/O Cmd(0x2) @ LBA 0, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
>> [ 48.181926] I/O error, dev nvme0n1, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
>> [ 48.243670] nvme 0000:01:00.0: enabling device (0000 -> 0002)
>> [ 48.246914] nvme nvme0: Disabling device after reset failure: -19
>> [ 48.280495] Buffer I/O error on dev nvme0n1, logical block 0, async page read
>>
>>
>> Finally with a PNY CS1030 SSD (Phison PS5015-E15 controller), I get this:
>> [ 5.215631] nvme nvme0: pci function 0000:01:00.0
>> [ 5.220435] nvme 0000:01:00.0: enabling device (0000 -> 0002)
>> [ 5.329565] nvme nvme0: allocated 64 MiB host memory buffer (16 segments).
>> [ 66.540485] nvme nvme0: I/O tag 28 (401c) QID 0 timeout, disable controller
>> [ 66.585245] nvme 0000:01:00.0: probe with driver nvme failed with error -4
>>
>> Note that I also tested this last SSD on a VisionFive 2 board with exactly
>> the same kernel (I just moved the SSD and booted), and it works fine with ASPM
>> enabled (confirmed with lspci).
>
> I have been testing this patchset recently as well, but on an Orange Pi
> RV2 board instead (and an extra RV2 specific patch to enable power to
> the M.2 slot).
>
> I ran into the same symptoms you had ("QID 0 timeout" after about 60
> seconds). However, I'm using an Intel 600p. I can confirm my NVME drive
> seems to work fine with the "pcie_aspm=off" workaround as well.
I don't see this problem, and haven't tried to reproduce it yet.
Mani told me I needed to add these lines to ensure the "runtime
PM hierarchy of PCIe chain" won't be "broken":
pm_runtime_set_active()
pm_runtime_no_callbacks()
devm_pm_runtime_enable()
Just out of curiosity, could you try with those lines added
just before these assignments in k1_pcie_probe()?
k1->pci.dev = dev;
k1->pci.ops = &k1_pcie_ops;
dw_pcie_cap_set(&k1->pci, REQ_RES);
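That is, something along these lines placed right before that block
(just a sketch, not tested here; I'm assuming the probe function
already has the usual "dev" pointer and an "int ret" local, that
<linux/pm_runtime.h> is included, and I haven't verified whether the
return values really need to be checked this way):
	/* Mark the device active and enable runtime PM so the runtime
	 * PM hierarchy of the PCIe chain isn't broken; this device
	 * itself needs no runtime PM callbacks.
	 */
	ret = pm_runtime_set_active(dev);
	if (ret)
		return ret;
	pm_runtime_no_callbacks(dev);
	ret = devm_pm_runtime_enable(dev);
	if (ret)
		return ret;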
I doubt it will fix what you're seeing, but I'm working on something
else at the moment, so I haven't dug into this myself yet.
Thanks.
-Alex
> Of note, I don't have this problem with the vendor 6.6.63 kernel.
>
>>> I basically want to know if there's something I should do with this
>>> driver to address this. (Mani, can you explain?)
>>
>> I am not sure on my side how to debug that. What I know is that it is
>> linked to ASPM L1; L0 works fine. In other words, the SSDs work fine
>> with this patch:
>>
>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> index 79b9651584737..1a134ec68b591 100644
>> --- a/drivers/pci/pcie/aspm.c
>> +++ b/drivers/pci/pcie/aspm.c
>> @@ -801,8 +801,8 @@ static void pcie_aspm_override_default_link_state(struct pcie_link_state *link)
>>  	if (of_have_populated_dt()) {
>>  		if (link->aspm_support & PCIE_LINK_STATE_L0S)
>>  			link->aspm_default |= PCIE_LINK_STATE_L0S;
>> -		if (link->aspm_support & PCIE_LINK_STATE_L1)
>> -			link->aspm_default |= PCIE_LINK_STATE_L1;
>> +//		if (link->aspm_support & PCIE_LINK_STATE_L1)
>> +//			link->aspm_default |= PCIE_LINK_STATE_L1;
>>  		override = link->aspm_default & ~link->aspm_enabled;
>>  		if (override)
>>  			pci_info(pdev, "ASPM: default states%s%s\n",
>>
>> I can test more things if needed, but I don't know where to start.
>
> I'm not a PCIe expert, but I'm more than happy to test as well.
>
> JE
>