linux-kernel - Re: [PATCH] PCI: apple: Reset the port for 100ms on probe

Message-ID: <87o86h7pex.wl-maz@kernel.org>
Date:   Thu, 18 Nov 2021 10:01:58 +0000
From:   Marc Zyngier <maz@...nel.org>
To:     Pali Rohár <pali@...nel.org>
Cc:     Bjorn Helgaas <helgaas@...nel.org>, linux-kernel@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org, linux-pci@...r.kernel.org,
        kernel-team@...roid.com, Alyssa Rosenzweig <alyssa@...enzweig.io>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: [PATCH] PCI: apple: Reset the port for 100ms on probe

On Wed, 17 Nov 2021 20:28:59 +0000,
Pali Rohár <pali@...nel.org> wrote:
> 
> Hello!
> 
> On Wednesday 17 November 2021 14:12:45 Bjorn Helgaas wrote:
> > [+cc Pali]
> > 
> > On Wed, Nov 17, 2021 at 04:00:53PM +0000, Marc Zyngier wrote:
> > > While the Apple PCIe driver works correctly when directly booted
> > > from the firmware, it fails to initialise when the kernel is booted
> > > from a bootloader using PCIe such as u-boot.
> > > 
> > > That's beacuse we're missing a proper reset of the port (we only
> > > clear the reset, but never assert it).
> > 
> > s/beacuse/because/
> > 
> > > Bring the port back to life by wiggling the #PERST pin for 100ms
> > > (as per the spec).
> > 
> > I cc'd Pali because I think he's interested in consolidating or
> > somehow rationalizing delays like this.
> > 
> > If we have a specific spec reference here, I think it would help that
> > effort.  I *think* it's PCIe r5.0, sec 6.6.1, which mentions the 100ms
> > along with some additional constraints, like waiting 100ms after Link
> > training completes for ports that support > 5.0 GT/s, whether
> > Readiness Notifications are used, and CRS Software Visiblity.
> 
> This is not 100ms timeout "after link training completes".
> 
> Timeout in this patch is between flipping PERST# signal, so timeout
> means how long needs to be endpoint card in reset state. And this
> timeout cannot be controller specific. In past I have tried to find this
> timeout in specifications, I was not able. Some summary is in my email:
> https://lore.kernel.org/linux-pci/20210310110535.zh4pnn4vpmvzwl5q@pali/
> 
> So I would like to know, why was chosen 100ms for msleep() in this
> patch?

Excellent question. I went back to my notes (and the spec), and it
looks like I have mistakenly conflated *two* delays here:

- The post-#PERST delay, which is 100ms, and which is *not* what this
  patch is doing while it really should be doing it. This is
  documented in the base PCIe spec (in Rev 2.0, this is part of
  6.6.1). The amusing part is that on this HW, it seems that only the
  delay from the falling edge matters (which is why I didn't spot the
  issue).

- The duration of the power-on #PERST assertion (Tpvperl), which is
  also 100ms, and documented in the PCIe Card Electromechanical
  Specification (Rev 1.0a, 2.2 and 2.2.1).

There is also a third delay (Tperst-clk) which represents the time
required for the clock to ramp up before releasing #PERST. No, there
is no value associated with this.
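
As a rough illustration only (this is not code from the patch), the delays
listed above could be written down as named constants. The macro and
function names below are made up for this sketch. Tperst-clk is left
without a value because none is given here.

```c
/* Illustrative names only; not taken from the patch under discussion. */

/* Duration of the power-on #PERST assertion (Tpvperl), per the
 * PCIe Card Electromechanical Specification. */
#define SKETCH_T_PVPERL_MS	100

/* Delay after #PERST deassertion, per the base PCIe spec (sec 6.6.1). */
#define SKETCH_T_POST_PERST_MS	100

/* Tperst-clk (clock ramp-up before releasing #PERST) is deliberately
 * not defined: no value is associated with it in this discussion. */

/* Total sleep time if both delays are applied at their nominal value. */
static inline unsigned int sketch_reset_total_ms(void)
{
	return SKETCH_T_PVPERL_MS + SKETCH_T_POST_PERST_MS;
}
```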

Having come to my senses, and with these constraints in mind, this is
what I currently have in my tree:

	/* Engage #PERST */
	gpiod_set_value(reset, 0);

	ret = apple_pcie_setup_refclk(pcie, port);
	if (ret < 0)
		return ret;

	/* Hold #PERST for 100ms as per the electromechanical spec */
	msleep(100);
	rmw_set(PORT_PERST_OFF, port->base + PORT_PERST);
	gpiod_set_value(reset, 1);
	/* Wait for 100ms after #PERST deassertion before anything else */
	msleep(100);

Yes, this is totally overkill, as I assume that each port has gone
through a complete power-off and is only slowly coming back from the
dead.

In practice, I can completely remove the initial Tpvperl delay (we
have been powered-on for a long time already, and the clock is stable
when we come back from setting it up), and cut the second one by half
without observing any ill effect (though I feel safer keeping it to
its nominal value).
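
To make the trade-off concrete, here is a small user-space model of the
sequence with both delays as parameters. It is only a sketch: struct
sim_port, sim_sleep() and sim_reset_port() are hypothetical stand-ins, and
the real code uses gpiod_set_value(), msleep() and the driver's own refclk
setup instead.

```c
/* Hypothetical user-space model of the reset ordering; none of these
 * names exist in the driver. */
struct sim_port {
	int perst_asserted;		/* 1 while #PERST is held active */
	unsigned int elapsed_ms;	/* simulated time spent sleeping */
};

/* Stand-in for msleep(): only accounts for the time. */
static void sim_sleep(struct sim_port *p, unsigned int ms)
{
	p->elapsed_ms += ms;
}

/*
 * hold_ms: how long #PERST stays asserted (the Tpvperl-style delay)
 * post_ms: how long to wait after #PERST is released
 */
static int sim_reset_port(struct sim_port *p, unsigned int hold_ms,
			  unsigned int post_ms)
{
	p->perst_asserted = 1;		/* engage #PERST */

	/* The refclk setup would run here, with #PERST still asserted. */

	sim_sleep(p, hold_ms);		/* hold #PERST */
	p->perst_asserted = 0;		/* release #PERST */
	sim_sleep(p, post_ms);		/* wait before touching the port */

	return 0;
}
```

With this model, sim_reset_port(&p, 100, 100) corresponds to the nominal
sequence (200ms of sleeping in total), while sim_reset_port(&p, 0, 50)
corresponds to the reduced variant described above (50ms).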

If nobody screams, I'll respin something shortly.

	M.

-- 
Without deviation from the norm, progress is not possible.