Message-ID: <cc018fc6-eef4-48b8-a754-f1e5fbce5eab@topic.nl>
Date: Wed, 11 Jun 2025 09:00:44 +0200
From: Mike Looijmans <mike.looijmans@...ic.nl>
To: Bjorn Helgaas <helgaas@...nel.org>
CC: linux-pci@...r.kernel.org, Bjorn Helgaas <bhelgaas@...gle.com>,
 Krzysztof Wilczyński <kwilczynski@...nel.org>,
 Lorenzo Pieralisi <lpieralisi@...nel.org>,
 Manivannan Sadhasivam <mani@...nel.org>, Michal Simek
 <michal.simek@....com>, Rob Herring <robh@...nel.org>,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 1/2] PCI: xilinx: Wait for link-up status during
 initialization


Met vriendelijke groet / kind regards,

Mike Looijmans
System Expert


TOPIC Embedded Products B.V.
Materiaalweg 4, 5681 RJ Best
The Netherlands

T: +31 (0) 499 33 69 69
E: mike.looijmans@...ic.nl
W: www.topic.nl

On 10-06-2025 21:12, Bjorn Helgaas wrote:
> On Tue, Jun 10, 2025 at 04:39:03PM +0200, Mike Looijmans wrote:
>> When the driver loads, the transceiver and endpoint may still be setting
>> up a link. Wait for that to complete before continuing. This fixes that
>> the PCIe core does not work when loading the PL bitstream from
>> userspace. Existing reference designs worked because the endpoint and
>> PL were initialized by a bootloader. If the endpoint power and/or reset
>> is supplied by the kernel, or if the PL is programmed from within the
>> kernel, the link won't be up yet and the driver just has to wait for
>> link training to finish.
>> +static int xilinx_pci_wait_link_up(struct xilinx_pcie *pcie)
>> +{
>> +	u32 val;
>> +
>> +	/*
>> +	 * PCIe r6.0, sec 6.6.1 provides 100ms timeout. Since this is FPGA
>> +	 * fabric, we're more lenient and allow 200 ms for link training.
> Does this FPGA fabric refer to the Root Port or to the Endpoint?  We
> should know whether this issue is common to all xilinx Root Ports or
> specific to certain Endpoints.

The FPGA is the Root Port. The endpoint is usually some generic PCIe device,
such as an NVMe drive or a Wi-Fi card.


> I assume that even if we wait for the link to come up and then wait
> PCIE_T_RRS_READY_MS before sending config requests, this Endpoint is
> still not ready to return an RRS response?  I'm looking at this text
> from sec 6.6.1:

My initial finding was that the endpoint is usually ready well within 100 ms.

The issue at hand is that Xilinx assumed their proprietary bootloader would
have taken care of the power, reset and clock signals and of programming the
FPGA. Thus, by the time this driver probes, seconds later, the link would
already be up.

In our system, reset, clock and power are under kernel control, so the
endpoint (e.g. NVMe) has only just been powered up, and the root complex (in
the FPGA) was also powered up just a millisecond ago. So the driver would
always see "link down" at startup and give up.

Analysis showed that the PCIe Root Port was simply still training the link,
and all that is required to make the system work is to wait for the link to
be established.
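
To illustrate the idea outside the kernel, here is a rough userspace sketch of
"poll the status bit until it is set or a deadline passes". It is not the
driver code: the fake_port struct, the simulated clock, the LNKUP_BIT value
and the poll_link_up() helper are all made up for illustration, and the real
patch relies on readl_poll_timeout() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position, chosen for the sketch only. */
#define LNKUP_BIT 0x1u

/* Simulated port: the link reports "up" once the fake clock reaches
 * trained_at_us. No real hardware or kernel API is involved. */
struct fake_port {
	uint64_t now_us;        /* simulated current time */
	uint64_t trained_at_us; /* time at which link training finishes */
};

/* Stand-in for reading the status register. */
static uint32_t fake_read_status(const struct fake_port *p)
{
	return p->now_us >= p->trained_at_us ? LNKUP_BIT : 0;
}

/*
 * Poll until the link-up bit is set or timeout_us has elapsed.
 * Returns 0 if the link came up in time, -1 on timeout.
 * "Sleeping" just advances the simulated clock by sleep_us.
 */
static int poll_link_up(struct fake_port *p, uint64_t sleep_us,
			uint64_t timeout_us)
{
	uint64_t deadline = p->now_us + timeout_us;

	/* A zero step would never advance the simulated clock. */
	assert(sleep_us > 0);

	for (;;) {
		if (fake_read_status(p) & LNKUP_BIT)
			return 0;
		if (p->now_us >= deadline)
			return -1;
		p->now_us += sleep_us;
	}
}
```

With a 2000 us poll interval and a 200000 us timeout (the 200 ms mentioned in
the patch comment), a simulated link that trains after 50 ms is detected,
while one that needs several seconds makes the helper give up at the
deadline.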


>    Unless Readiness Notifications mechanisms are used, the Root Complex
>    and/or system software must allow at least 1.0 s following exit from
>    a Conventional Reset of a device, before determining that the device
>    is broken if it fails to return a Successful Completion status for a
>    valid Configuration Request. This period is independent of how
>    quickly Link training completes.
>
>    Note: This delay is analogous to the Trhfa parameter specified for
>    PCI/PCI-X, and is intended to allow an adequate amount of time for
>    devices which require self initialization.
>
> It seems like the PCI core RRS handling should already account for
> this 1.0 s period.
>
>> +	 */
>> +	return readl_poll_timeout(pcie->reg_base + XILINX_PCIE_REG_PSCR, val,
>> +			(val & XILINX_PCIE_REG_PSCR_LNKUP), 2 * USEC_PER_MSEC,
>> +			2 * PCIE_T_RRS_READY_MS * USEC_PER_MSEC);
>> +}


