Message-ID: <d5d44b8b-94a5-4830-8f61-0b8cbd72889c@riscstar.com>
Date: Fri, 31 Oct 2025 08:38:04 -0500
From: Alex Elder <elder@...cstar.com>
To: Manivannan Sadhasivam <mani@...nel.org>
Cc: robh@...nel.org, krzk+dt@...nel.org, conor+dt@...nel.org,
bhelgaas@...gle.com, lpieralisi@...nel.org, kwilczynski@...nel.org,
vkoul@...nel.org, kishon@...nel.org, dlan@...too.org, guodong@...cstar.com,
pjw@...nel.org, palmer@...belt.com, aou@...s.berkeley.edu, alex@...ti.fr,
p.zabel@...gutronix.de, christian.bruel@...s.st.com, shradha.t@...sung.com,
krishna.chundru@....qualcomm.com, qiang.yu@....qualcomm.com,
namcao@...utronix.de, thippeswamy.havalige@....com, inochiama@...il.com,
devicetree@...r.kernel.org, linux-pci@...r.kernel.org,
linux-phy@...ts.infradead.org, spacemit@...ts.linux.dev,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 5/7] PCI: spacemit: introduce SpacemiT PCIe host driver
On 10/31/25 1:05 AM, Manivannan Sadhasivam wrote:
> On Wed, Oct 29, 2025 at 07:10:10PM -0500, Alex Elder wrote:
>> On 10/28/25 2:06 AM, Manivannan Sadhasivam wrote:
>>> On Mon, Oct 27, 2025 at 05:24:38PM -0500, Alex Elder wrote:
>>>> On 10/26/25 11:55 AM, Manivannan Sadhasivam wrote:
>>>>> On Mon, Oct 13, 2025 at 10:35:22AM -0500, Alex Elder wrote:
>>>>>> Introduce a driver for the PCIe host controller found in the SpacemiT
>>>>>> K1 SoC. The hardware is derived from the Synopsys DesignWare PCIe IP.
>>>>>> The driver supports three PCIe ports that operate at PCIe gen2 transfer
>>>>>> rates (5 GT/sec). The first port uses a combo PHY, which may be
>>>>>> configured for use for USB 3 instead.
>>>>>>
>>>>>> Signed-off-by: Alex Elder <elder@...cstar.com>
>>>>>> ---
>>
>> . . .
>>
>>>>>> + ret = devm_regulator_get_enable(dev, "vpcie3v3-supply");
>>>>>> + if (ret)
>>>>>> + return dev_err_probe(dev, ret,
>>>>>> + "failed to get \"vpcie3v3\" supply\n");
>>>>>
>>>>> As mentioned in the bindings patch, you should rely on the PWRCTRL_SLOT driver
>>>>> to handle the power supplies. It is not yet handling the PERST#, but I have a
>>>>> series floating for that:
>>>>> https://lore.kernel.org/linux-pci/20250912-pci-pwrctrl-perst-v3-0-3c0ac62b032c@oss.qualcomm.com/
>>>>
>>>> I think that just means that I'll define a DT node compatible with
>>>> "pciclass,0604", and in that node I'll specify the vpcie3v3-supply
>>>> property. That will cause that (pwrctrl) device to get and enable
>>>> the supply before the "real" PCIe device probes.
>>>>
>>>
>>> Right.
>>>
>>>> And once your PERST work gets merged into the PCI power control
>>>> framework, a callback will allow that to assert PERST# as needed
>>>> surrounding power transitions. (But I won't worry about that
>>>> for now.)
>>>>
>>>
>>> I'm still nervous to say that you should not worry about it (about not
>>> deasserting PERST# at the right time) as it goes against the PCIe spec.
>>> Current pwrctrl platforms supporting PERST# are working fine due to sheer luck.
>>>
>>> So it would be better to leave the pwrctrl driver out of the equation now and
>>> enable the supply in this driver itself. Later, once my pwrctrl rework gets
>>> merged, I will try to switch this driver to use it.
>>
>> As I understand it, PERST# should only be deasserted after
>> all power rails are known to be stable.
>>
>
> Yes
>
>> This driver enables the regulator during probe, shortly
>> before calling dw_pcie_host_init(). That function calls
>> back to k1_pcie_init(), which enables clocks, deasserts
>> resets, and initializes the PHY before it changes the
>> PERST# state.
>>
>> By "changing the PERST# state" I mean it is asserted
>> (driven low), then deasserted after 100 milliseconds
>> (PCIE_T_PVPERL_MS).
>>
>> I have two questions on this:
>> - You say the PCI spec talks about the "right time" to
>> deassert PERST# (relative to power). Is that at all
>> related to PCIE_T_PVPERL_MS?
>
> The PCI CEM spec says that PERST# should be deasserted at least 100ms after the
> power becomes stable. But with the current pwrctrl design, the host controller

So it *is* related to that delay, but the concern you have is about
the pwrctrl design. Simply probing the pwrctrl device enables all
its regulators, but that probe is not synchronized with the host
controller driver. In this driver, PERST# is deasserted in the
dw_pcie_host_ops->init callback, which could be called *before*
the pwrctrl probe, and we want it to occur only *after* it.

I find on my system that (at least based on printk() time stamps)
the PERST# is deasserted *before* regulator_bulk_enable() in
pci_pwrctrl_slot_probe() has completed. So you're absolutely
right about the problem. The NVMe device doesn't probe until
at least 5 seconds later.

I don't want to rely on this sketchy behavior, so here is what
I plan to do (please tell me whether you agree).

I will back out the change that moves the regulator into the
root port. I will still use the pwrctrl model--meaning the
root port will still be compatible with "pciclass,0604"--but
because it will define no regulators or clocks, it just won't
do anything of value for now.

Later, when you add the callback to the pwrctrl framework,
we can implement that callback and then move the regulator
definitions into the root port, and no longer enable them
in the host controller driver.
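
In other words, for now the host controller probe keeps something
along these lines (just a sketch; the error handling mirrors the v2
patch above, and note the ID passed to devm_regulator_get_enable()
should be "vpcie3v3" without the "-supply" suffix, since the
regulator core appends that when it looks up the DT property):

	/* Enable the slot supply here until the pwrctrl rework lands */
	ret = devm_regulator_get_enable(dev, "vpcie3v3");
	if (ret)
		return dev_err_probe(dev, ret,
				     "failed to enable \"vpcie3v3\" supply\n");
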
> deasserts the PERST# even before the pwrctrl probe. So this is in violation of
> the spec. But depending on the endpoint device design, this might not cause any
> issue as PERST# is a level triggered signal. So once the endpoint boots up, it
> will see the PERST# deassert and will start working. I'm not justifying the
> current design, but just mentioning that you might not see any issue.

I'm not seeing any issue, but it's true that the sequence of events
is not compliant with what you describe from the spec.
> That being said, we are going to submit a series that reworks pwrctrl framework
> such that each controller can call an API to probe pwrctrl drivers. This way,
> host controller driver can make sure that the device will get powered ON before
> it deasserts the PERST#.

For DesignWare-based devices that might mean a new dw_pcie_host_ops
callback function, or maybe the DWC core can just defer calling the
->init callback until it knows power is enabled and stable.
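
Purely to illustrate what I'm imagining (nothing like this exists
today; the hook name, k1_pcie, to_k1_pcie(), and the perst
descriptor are all made up for the example):

/* Hypothetical hook: called by the core (or reworked pwrctrl)
 * only after slot power is known to be enabled and stable.
 */
static void k1_pcie_slot_power_ready(struct dw_pcie_rp *pp)
{
	struct k1_pcie *k1 = to_k1_pcie(to_dw_pcie_from_pp(pp));

	/* Satisfy T_PVPERL, then release the endpoint from reset */
	msleep(PCIE_T_PVPERL_MS);
	gpiod_set_value_cansleep(k1->perst, 0);	/* deassert PERST# */
}
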
>> - I notice that PERST# is in a deasserted state at the
>> time I assert it in this sequence. Do you see any
>> reason I should assert it early as an initialization
>> step, or is asserting it and holding it there for
>> 100 msec sufficient?
>>
>
> You should assert PERST# before doing any controller initialization sequence as
> that may affect the endpoint. Once PERST# is asserted, it will cause the
> endpoint to 'reset'. So you do your initialization sequence and deassert it once
> done.
Basically the probe function does basic setup, then calls
dw_pcie_host_init(). The next thing that's expected is
the ->init callback from the DesignWare core code, and
that's where PERST# is asserted, then deasserted after
some other actions.
I think I'll assert PERST# (and delay 100 msec) a little
earlier though.
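
Roughly, the ->init path would then look like this (just a sketch;
k1_pcie, to_k1_pcie(), the perst descriptor, and k1_pcie_power_on()
are placeholder names for what the driver does, and I'm assuming the
PERST# GPIO is described as active-low in DT, so value 1 means
"asserted"):

static int k1_pcie_init(struct dw_pcie_rp *pp)
{
	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
	struct k1_pcie *k1 = to_k1_pcie(pci);
	int ret;

	/* Hold the endpoint in reset while the port is brought up */
	gpiod_set_value_cansleep(k1->perst, 1);

	ret = k1_pcie_power_on(k1);	/* clocks, resets, PHY */
	if (ret)
		return ret;

	/* Delay PCIE_T_PVPERL_MS before deasserting PERST# */
	msleep(PCIE_T_PVPERL_MS);
	gpiod_set_value_cansleep(k1->perst, 0);

	return 0;
}
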
. . .
>>>> In dw_handle_msi_irq(), "num_ctrls" is computed based on
>>>> num_vectors / MAX_MSI_IRQS_PER_CTRL=32. A loop then
>>>> iterates over those "controllers"(?) to handle each bit
>>>> set in their corresponding register.
>>>>
>>>> This seems OK. Can you explain why you think it's wrong?
>>>>
>>>
>>> So both 'ctrl' and 'msi' IRQs are interrelated. Each 'ctrl' can have up to 32 MSI
>>> vectors only. If your platform supports more than 32 MSI vectors, like 256, then
>>> the platform DT should provide 8 'msi' IRQs.
>>
>> I have asked SpacemiT about this, specifically whether there
>> are additional interrupts (I don't think there are), or if
>> not that, additional registers to support MSI 32+ (see
>> below). In their downstream driver they handle interrupts
>> differently. I suspect num_vectors needs to be set to 32
>> (or I'll leave it unset and take the default).
This was changed to use the default (32 MSI vectors) in v4
of this series. I haven't heard back from SpacemiT but I'm
pretty sure this is correct.
>> In the DesignWare driver, there are up to 8 "ctrls", and each
>> ctrl has 32 bit positions representing 32 MSI vectors. Each
>> can have an msi_irq defined. An msi_irq is always set up for
>> ctrl 0.
>>
>> For any ctrl with an msi_irq assigned, dw_pcie_msi_host_init()
>> sets its interrupt handler to dw_chained_msi_isr(), which just
>> calls dw_handle_msi_irq().
>>
>> The way dw_handle_msi_irq() works, a single ctrl apparently can
>> handle up to 256 MSI vectors, as long as the block of 3 registers
>> that manage the ctrl (ENABLE, MASK, and STATUS presumably) are
>> consecutive in I/O memory for consecutive ctrls.
>>
>
> I'm not sure how you came up with this observation. dw_handle_msi_irq() loops
> over the 'status' using find_next_bit() of size MAX_MSI_IRQS_PER_CTRL, which is
> 32. So I don't see how a single ctrl can handle up to 256 vectors.
This doesn't matter, because you say it's not correct, but I'll
explain what I meant.
I'm saying *if* there were 8 consecutive sets of 3 registers for
these ctrls:

        ----------
ctrl0:  | ENABLE |
        |--------|
        | STATUS |
        |--------|
        | MASK   |
        |--------|
ctrl1:  | ENABLE |
        |--------|
        | STATUS |
        |--------|
        | MASK   |
        |--------|
          ...

then they could all be handled by a single interrupt, based on
the way the loop works.
If every ctrl has its own interrupt, then the interrupt handler
could just handle the one ctrl's 32 possible interrupts.
This loop structure is why I thought it was OK to have 256 MSI
vectors with the one handler.
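
For reference, the loop in dw_handle_msi_irq() that I'm describing
looks roughly like this (condensed from memory, so don't hold me to
the exact code):

	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
	unsigned long val;
	u32 status, num_ctrls;
	int i, pos;

	num_ctrls = pp->num_vectors / MAX_MSI_IRQS_PER_CTRL;
	for (i = 0; i < num_ctrls; i++) {
		/* Each ctrl's STATUS register is one block further on */
		status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS +
					   i * MSI_REG_CTRL_BLOCK_SIZE);
		if (!status)
			continue;

		/* Handle every vector bit set for this ctrl */
		val = status;
		pos = 0;
		while ((pos = find_next_bit(&val, MAX_MSI_IRQS_PER_CTRL,
					    pos)) != MAX_MSI_IRQS_PER_CTRL) {
			generic_handle_domain_irq(pp->irq_domain,
					i * MAX_MSI_IRQS_PER_CTRL + pos);
			pos++;
		}
	}
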
>> I looked for other examples. I see that "pcie-fu740.c", which
>> supports compatible "sifive,fu740-pcie", sets num_vectors to
>> MAX_MSI_IRQS, but "fu740-c000.dtsi" defines just one "msi"
>> interrupt. And "pci-dra7xx.c" seems to do something similar,
>> and maybe "pcie-rcar-gen4.c" too.
>>
>
> Yes. But I think those are mistakes. I will submit patches to fix them.
OK. They reinforced my thought that this was doing the right
thing. The warning you added makes it clear it is not.
Thank you very much for your explanation, Mani.
-Alex
>>> Currently the driver is not strict about this requirement. I will send a patch
>>> to print an error message if this requirement is not satisfied.
>>>
>>>>>> +
>>>>>> + platform_set_drvdata(pdev, k1);
>>>>>> +
>>>>>
>>>>> For setting the correct runtime PM state of the controller, you should do:
>>>>>
>>>>> pm_runtime_set_active()
>>>>> pm_runtime_no_callbacks()
>>>>> devm_pm_runtime_enable()
>>>>
>>>> OK, that's easy enough.
>>>>
>>>>> This will fix the runtime PM hierarchy of PCIe chain (from host controller to
>>>>> client drivers). Otherwise, it will be broken.
>>>> Is this documented somewhere? (It wouldn't surprise me if it
>>>> is and I just missed it.)
>>>>
>>>
>>> Sorry no. It is on my todo list. But I'm getting motivation now.
>>>
>>>> This driver has as its origins some vendor code, and I simply
>>>> removed the runtime PM calls. I didn't realize something would
>>>> be broken without making pm_runtime*() calls.
>>>>
>>>
>>> It is the PM framework requirement to mark the device as 'active' to allow it to
>>> participate in runtime PM. If you do not mark it as 'active' and 'enable' it,
>>> the framework will not allow propagating the runtime PM changes before *this*
>>> device. For instance, consider the generic PCI topology:
>>>
>>> PCI controller (platform device)
>>> |
>>> PCI host bridge
>>> |
>>> PCI Root Port
>>> |
>>> PCI endpoint device
>>>
>>> If the runtime PM is not enabled for the PCI Root Port, then if the PCI endpoint
>>> device runtime suspends, it will not trigger runtime suspend for the Root Port
>>> and also for the PCI controller. Also, since the runtime PM framework doesn't
>>> have the visibility of the devices underneath the bus (like endpoint), it may
>>> assume that no devices (children) are currently active and may trigger runtime
>>> suspend of the Root Port (parent) even though the endpoint device could be
>>> 'active'.
>>
>> So this basically marks this controller as a pass-through device that
>> doesn't itself change state for runtime PM, but still communicates that
>> somewhere at or below it there might be devices that do participate.
>
> Yes.
>
> - Mani
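
(As a note to self, the runtime PM sequence suggested above would go
near the end of probe, something like this sketch, right after the
platform_set_drvdata() call shown earlier:)

	platform_set_drvdata(pdev, k1);

	/* Mark the controller active and enable runtime PM (with no
	 * callbacks of its own) so that runtime PM state propagates
	 * properly along the controller -> Root Port -> endpoint chain.
	 */
	ret = pm_runtime_set_active(dev);
	if (ret)
		return ret;
	pm_runtime_no_callbacks(dev);

	ret = devm_pm_runtime_enable(dev);
	if (ret)
		return ret;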