Date:   Mon, 3 Apr 2023 12:02:21 +0800
From:   Wu Zongyong <wuzongyong@...ux.alibaba.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     sdonthineni@...dia.com, bhelgaas@...gle.com,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        wllenyj@...ux.alibaba.com, wutu.xq2@...ux.alibaba.com,
        gerry@...ux.alibaba.com
Subject: Re: [RFC PATCH] PCI: avoid SBR for NVIDIA T4

On Fri, Mar 31, 2023 at 10:11:15AM +0800, Wu Zongyong wrote:
> On Thu, Mar 30, 2023 at 10:49:26AM -0500, Bjorn Helgaas wrote:
> > On Thu, Mar 30, 2023 at 10:10:16AM +0800, Wu Zongyong wrote:
> > > On Wed, Mar 29, 2023 at 12:05:15PM -0500, Bjorn Helgaas wrote:
> > > > On Wed, Mar 29, 2023 at 07:58:45PM +0800, Wu Zongyong wrote:
> > > > > Secondary bus reset fails if an NVIDIA T4 card is directly attached to a
> > > > > root port.
> > > > 
> > > > Is this only a problem when direct attached to a Root Port?  Why would
> > > > that be?  If it's *not* related to being directly under a Root Port,
> > > > don't mention that at all.
> > >
> > > Yes, this problem occurs only when the T4 card is directly attached to a
> > > Root Port.
> > > I have tested it with a T4 card attached to a PCIe Switch or a PCI Bridge,
> > > and it works well.
> > 
> > From an electrical and protocol point of view, the device should not
> > be able to tell the difference, so Lukas' suggestion that it may be
> > related to reset delays seems very pertinent.
> I will test it with the commits mentioned above,
> but it may take some time since it is not easy to replace the kernel in our
> environment.

I have tested Lukas' suggestion, and it did not fix the problem for T4 cards.

My base kernel is v5.10, and I cherry-picked the following patches:

  730643d33e2d ("PCI/PM: Resume subordinate bus in bus type callbacks")
  8ef0217227b4 ("PCI/PM: Observe reset delay irrespective of bridge_d3")
  ac91e6980563 ("PCI: Unify delay handling for reset and resume")

Are there any other patches I should apply?

> > 
> > Bjorn
