lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <3e1ffe72-b6a4-45cc-a053-190077818f19@collabora.com>
Date: Wed, 5 Nov 2025 12:32:04 +0100
From: AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>
To: Chen-Yu Tsai <wenst@...omium.org>, Bartosz Golaszewski <brgl@...ev.pl>,
 Manivannan Sadhasivam <mani@...nel.org>
Cc: Matthias Brugger <matthias.bgg@...il.com>,
 Ryder Lee <ryder.lee@...iatek.com>, Jianjun Wang
 <jianjun.wang@...iatek.com>, Lorenzo Pieralisi <lpieralisi@...nel.org>,
 Krzysztof Wilczyński <kwilczynski@...nel.org>,
 Rob Herring <robh@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
 linux-pci@...r.kernel.org, linux-mediatek@...ts.infradead.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI: mediatek-gen3: Ignore link up timeout

Il 05/11/25 10:21, Chen-Yu Tsai ha scritto:
> On Wed, Nov 5, 2025 at 4:45 PM AngeloGioacchino Del Regno
> <angelogioacchino.delregno@...labora.com> wrote:
>>
>> Il 05/11/25 07:28, Chen-Yu Tsai ha scritto:
>>> As mentioned in commit 886a9c134755 ("PCI: dwc: Move link handling into
>>> common code") come up later" in the code, it is possible for link up to
>>> occur later:
>>>
>>>     Let's standardize this to succeed as there are usecases where devices
>>>     (and the link) appear later even without hotplug. For example, a
>>>     reconfigured FPGA device.
>>>
>>> Another case for this is the new PCIe power control stuff. The power
>>> control mechanism only gets triggered in the PCI core after the driver
>>> calls into pci_host_probe(). The power control framework then triggers
>>> a bus rescan. In most driver implementations, this sequence happens
>>> after link training. If the driver errors out when link training times
>>> out, it will never get to the point where the device gets turned on.
>>>
>>> Ignore the link up timeout, and lower the error message down to a
>>> warning.
>>>
>>> This makes PCIe devices that have not-always-on power rails work.
>>> However there may be some reversal of PCIe power sequencing, since now
>>> the PERST# and clocks are enabled in the driver, while the power is
>>> applied afterwards.
>>>
>>> Signed-off-by: Chen-Yu Tsai <wenst@...omium.org>
>>
>> Ok, that's sensible.
>>
>> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>
>>
>>> ---
>>> The change works to get my PCIe WiFi device working, but I wonder if
>>> the driver should expose more fine grained controls for the link clock
>>> and PERST# (when it is owned by the controller and not just a GPIO) to
>>> the power control framework. This applies not just to this driver.
>>>
>>> The PCI standard says that PERST# should hold the device in reset until
>>> the power rails are valid or stable, i.e. at their designated voltages.
>>
>> I completely agree with all of the above - and I can imagine multiple PCI-Express
>> controller drivers doing the same as what's being done in MTK Gen3.
>>
>> This means that the boot process may get slowed down by the port startup sequence
>> on multiple PCI-Express controllers (again not just MediaTek) and it's something
>> that must be resolved in some way... with the fastest course of action imo being
>> giving controller drivers knowledge of whether there's any device that is expected
>> to be powered off at that time (in order to at least avoid all those waits that
>> are expected to fail).
> 
> That also requires some refactoring, since all the drivers _wait_ for link
> up before going into the PCI core, which does the actual child node parsing.
> 
> I would like some input from Bartosz, who introduced the PCI power control
> framework, and Manivannan, who added slot power control.
> 
>> P.S.: Chen-Yu, did you check if the same applies to the MTK previous gen driver?
>>         Could you please check and eventually send a commit to do the same there?
> 
> My quick survey last week indicated that all the drivers except for the
> dwc family error out if link up timed out.
> 
> I don't have any hardware for the older generation though. And it looks
> like for the previous gen, the driver performs even worse, since it can
> support multiple slots, and each slot is brought up sequentially. A slot
> is discarded if link up times out. And the whole driver errors out if no
> slots are working.
> 

Hey, that's bold.

If only one driver (DWC) is working okay, there's something wrong that must be
fixed before that behavior change goes upstream (which it already did, ugh).

This needs attention from both Bartosz and Mani really-right-now.

I'm not sure about possible good solutions, and unfortunately I don't really have
any time to explore, so I'm not spitting any words on that - leaving this to both
Bartosz and Mani as that's also the right thing to do anyway.

Angelo

> 
> ChenYu
> 
>> Cheers,
>> Angelo
>>
>>> ---
>>>    drivers/pci/controller/pcie-mediatek-gen3.c | 13 +++++++++----
>>>    1 file changed, 9 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c
>>> index 75ddb8bee168..5bdb312c9f9b 100644
>>> --- a/drivers/pci/controller/pcie-mediatek-gen3.c
>>> +++ b/drivers/pci/controller/pcie-mediatek-gen3.c
>>> @@ -504,10 +504,15 @@ static int mtk_pcie_startup_port(struct mtk_gen3_pcie *pcie)
>>>                ltssm_index = PCIE_LTSSM_STATE(val);
>>>                ltssm_state = ltssm_index >= ARRAY_SIZE(ltssm_str) ?
>>>                              "Unknown state" : ltssm_str[ltssm_index];
>>> -             dev_err(pcie->dev,
>>> -                     "PCIe link down, current LTSSM state: %s (%#x)\n",
>>> -                     ltssm_state, val);
>>> -             return err;
>>> +             dev_warn(pcie->dev,
>>> +                      "PCIe link down, current LTSSM state: %s (%#x)\n",
>>> +                      ltssm_state, val);
>>> +
>>> +             /*
>>> +              * Ignore the timeout, as the link may come up later,
>>> +              * such as when the PCI power control enables power to the
>>> +              * device, at which point it triggers a rescan.
>>> +              */
>>>        }
>>>
>>>        mtk_pcie_enable_msi(pcie);
>>
>>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ