Message-ID: <f06efdd89b418ad831ee3a5da766f3f0a584050c.camel@kernel.crashing.org>
Date:   Wed, 09 Jan 2019 18:24:00 +1100
From:   Benjamin Herrenschmidt <benh@...nel.crashing.org>
To:     Alexey Kardashevskiy <aik@...abs.ru>,
        Jason Gunthorpe <jgg@...pe.ca>,
        David Gibson <david@...son.dropbear.id.au>
Cc:     Leon Romanovsky <leon@...nel.org>, linux-rdma@...r.kernel.org,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        sbest@...hat.com, saeedm@...lanox.com, alex.williamson@...hat.com,
        paulus@...ba.org, linux-pci@...r.kernel.org, bhelgaas@...gle.com,
        ogerlitz@...lanox.com, linuxppc-dev@...ts.ozlabs.org,
        davem@...emloft.net, tariqt@...lanox.com
Subject: Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

On Wed, 2019-01-09 at 15:53 +1100, Alexey Kardashevskiy wrote:
> It is "A PCI completion timeout occurred for an outstanding PCI-E
> transaction".
> 
> This is how I bind the device to vfio:
> 
> echo vfio-pci > '/sys/bus/pci/devices/0000:01:00.0/driver_override'
> echo vfio-pci > '/sys/bus/pci/devices/0000:01:00.1/driver_override'
> echo '0000:01:00.0' > '/sys/bus/pci/devices/0000:01:00.0/driver/unbind'
> echo '0000:01:00.1' > '/sys/bus/pci/devices/0000:01:00.1/driver/unbind'
> echo '0000:01:00.0' > /sys/bus/pci/drivers/vfio-pci/bind
> echo '0000:01:00.1' > /sys/bus/pci/drivers/vfio-pci/bind
> 
> 
> and I noticed that EEH only happens with the last command. The order
> (.0,.1 or .1,.0) does not matter; it seems that putting one function
> into D3 is fine, but putting the other one into D3 when the first one
> is already in D3 produces EEH. And I do not recall ever seeing this on
> the Firestone machine. Weird.

Putting all functions into D3 is what allows the device to actually go
into D3.
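
Since the device as a whole only reaches D3 once every function is in D3, this is consistent with the observation above that only the last bind triggers EEH, whichever order is used. A minimal shell sketch for checking where each function of a multi-function device stands before that last bind could look like this. It assumes the usual layout under /sys/bus/pci/devices; the `power_state` attribute it reads only exists on kernels newer than the ones in this thread (the sketch prints "unknown" where it is missing). The function name `show_functions` and the `SYSFS` override are hypothetical, added only for illustration and testing.

```shell
# Hypothetical helper: list every function of one PCI device
# (e.g. all of 0000:01:00.*) with its bound driver and, if the
# kernel exposes it, the kernel's view of its PCI power state.
# SYSFS defaults to /sys; overriding it is only meant for testing.
show_functions() {
    slot="$1"                              # domain:bus:device, e.g. 0000:01:00
    root="${SYSFS:-/sys}/bus/pci/devices"
    for dev in "$root/$slot".*; do
        [ -e "$dev" ] || continue          # glob matched nothing
        fn=$(basename "$dev")
        if [ -L "$dev/driver" ]; then
            # "driver" is a symlink into /sys/bus/pci/drivers/<name>
            drv=$(basename "$(readlink "$dev/driver")")
        else
            drv="none"
        fi
        if [ -r "$dev/power_state" ]; then
            state=$(cat "$dev/power_state")
        else
            state="unknown"                # attribute absent on older kernels
        fi
        echo "$fn driver=$drv power_state=$state"
    done
}
```

For example, running `show_functions 0000:01:00` after binding only 0000:01:00.0 to vfio-pci should show one function on vfio-pci and the other still on its previous driver, i.e. the next bind is the one that lets the whole device go to D3.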

Does it work with other devices? We do have that bug on early P9
revisions where the attempt to bring the link to L1 as part of the
D3 process fails in horrible ways. I thought P8 would be OK, but maybe
not...

Otherwise, it might be that our timeouts are too low (you may want to
talk to our PCIe guys internally).
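
One standard, software-visible place to look when suspecting timeouts is the Completion Timeout Value field (bits 3:0) and the Completion Timeout Disable bit (bit 4) of the PCIe Device Control 2 register, at offset 0x28 in the PCI Express capability. The sketch below decodes that field from the hex word printed by `setpci -s <BDF> CAP_EXP+28.w`, using the ranges defined in the PCIe Base Specification. It is only an illustration: the completion timeout that matters is the requester's (for CPU-initiated MMIO and config reads, the root complex side), and whether the POWER PHB's internal timer is governed by this field is an assumption that would need confirming with the PCIe folks mentioned above. `decode_cto` is a hypothetical helper name.

```shell
# Hypothetical helper: decode the Completion Timeout fields of the
# PCIe Device Control 2 register from the hex word setpci prints,
# e.g. decode_cto "$(setpci -s 0000:01:00.0 CAP_EXP+28.w)"
decode_cto() {
    word=$((0x$1))
    # Bit 4: Completion Timeout Disable overrides the value field
    if [ $(( ($word >> 4) & 1 )) -eq 1 ]; then
        echo "disabled"
        return
    fi
    # Bits 3:0: Completion Timeout Value (ranges per the PCIe spec)
    case $(( $word & 0xf )) in
        0)  echo "default (50 us - 50 ms)" ;;
        1)  echo "50 us - 100 us" ;;
        2)  echo "1 ms - 10 ms" ;;
        5)  echo "16 ms - 55 ms" ;;
        6)  echo "65 ms - 210 ms" ;;
        9)  echo "260 ms - 900 ms" ;;
        10) echo "1 s - 3.5 s" ;;
        13) echo "4 s - 13 s" ;;
        14) echo "17 s - 64 s" ;;
        *)  echo "reserved" ;;
    esac
}
```

Reading the register with setpci normally requires root; the decode itself only masks bits, so other Device Control 2 bits (ARI forwarding, for instance) do not affect the result.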

Cheers,
Ben.
