Date:   Tue, 7 Jul 2020 12:08:15 +0300
From:   Shay Drory <shayd@...lanox.com>
To:     Niklas Schnelle <schnelle@...ux.ibm.com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Stefan Raspl <raspl@...ibm.com>
Subject: Re: mlx5 hot unplug regression on z/VM

Hello Mr. Schnelle.

I have reviewed the code and the log, and I think I understand the bug.
As far as I understand, the bug is, as you pointed out in your mail [1], the switched call order of the two functions.
Running mlx5_drain_health_wq() first prevents new health work from being queued, so when we then call
mlx5_unregister_device() the driver is unaware that the VF might be missing.
I will start working on a patch to fix this.
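
To illustrate the ordering issue described above, here is a minimal toy model in C. It is only a sketch of my reading of the problem, not driver code: every toy_* name and the struct are hypothetical and do not exist in mlx5_core, and the real health and unregister paths are far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical, not the mlx5_core API. */
struct toy_dev {
	bool vf_present;      /* false once the VF has been hot unplugged */
	bool health_draining; /* true after the health workqueue was drained */
	bool known_missing;   /* set by health work when it sees the VF is gone */
};

/* Stand-in for queueing health work; refused once the queue is drained. */
bool toy_queue_health_work(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (dev->health_draining)
		return false;
	if (!dev->vf_present)
		dev->known_missing = true;
	return true;
}

/* Stand-in for draining: from now on no new health work is accepted. */
void toy_drain_health_wq(struct toy_dev *dev)
{
	assert(dev != NULL);
	dev->health_draining = true;
}

/*
 * Stand-in for unregistering: a health check is attempted during teardown.
 * Returns true if teardown knew that the device was already gone.
 */
bool toy_unregister_device(struct toy_dev *dev)
{
	assert(dev != NULL);
	(void)toy_queue_health_work(dev);
	return dev->known_missing;
}

/* Simulate a hot unplug of an already missing VF with either call order. */
bool toy_hot_unplug(bool drain_first)
{
	struct toy_dev dev = {
		.vf_present = false,
		.health_draining = false,
		.known_missing = false,
	};
	bool aware;

	if (drain_first) {
		toy_drain_health_wq(&dev);
		aware = toy_unregister_device(&dev);
	} else {
		aware = toy_unregister_device(&dev);
		toy_drain_health_wq(&dev);
	}
	return aware;
}
```

In this model, toy_hot_unplug(true) (drain first) leaves the teardown unaware that the VF is missing, while toy_hot_unplug(false) (unregister first) lets the health work notice it.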

[1] https://lkml.org/lkml/2020/6/12/376

On 7/6/2020 19:12, Niklas Schnelle wrote:

> Hi Mr. Drory, Hi Netdev List,
>
> I'm the PCI Subsystem maintainer for Linux on IBM Z and since v5.8-rc1
> we've been seeing a regression with hot unplug of ConnectX-4 VFs
> from z/VM guests. In -rc1 this still looked like a simple issue and
> I wrote the following mail:
> https://lkml.org/lkml/2020/6/12/376
> sadly since I think -rc2 I've not been able to get this working consistently
> anymore (it did work consistently with the change described above on -rc1).
> In his answer Saeed Mahameed pointed me to your commits as dealing with
> similar issues so I wanted to get some input on how to debug this
> further.
>
> The commands I used to test this are as follows (on a z/VM guest running
> vanilla debug_defconfig v5.8-rc4 installed on Fedora 31) and you find the resulting
> dmesg attached to this mail:
>
> # vmcp q pcif  // query for available PCI devices
> # vmcp attach pcif <FID> to \* // where <FID> is one of the ones listed by the above command
> # vmcp detach pcif <FID> // This does a hot unplug and is where things start going wrong
>
> I guess you don't have access to hardware but I'll be happy to assist
> as good as I can since digging on my own I sadly really don't know
> enough about the mlx5_core driver to make more progress.
>
> Best regards,
> Niklas Schnelle

