Date: Tue, 7 Jul 2020 12:08:15 +0300
From: Shay Drory <shayd@...lanox.com>
To: Niklas Schnelle <schnelle@...ux.ibm.com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Stefan Raspl <raspl@...ibm.com>
Subject: Re: mlx5 hot unplug regression on z/VM
Hello Mr. Schnelle.
I have reviewed the code and the log, and I think I understand what the bug is.
As far as I understand, the bug is the one you pointed out in your mail [1]: the call order of the two functions needs to be switched.
Running mlx5_drain_health_wq() prevents new health work from being queued, so when we then call
mlx5_unregister_device(), the driver is unaware that the VF might be missing.
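To illustrate the ordering problem, here is a rough userspace model. It is only a sketch, not mlx5_core code: every name in it (model_dev, model_drain_health_wq, model_unregister_device, model_teardown) is hypothetical, and it assumes the simplified picture above, where a queued health work is the only thing that tells the driver the VF has disappeared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the teardown ordering.
 * None of these names are real mlx5_core symbols. */
struct model_dev {
	bool drop_new_health_work; /* set by the "drain" step */
	bool hw_present;           /* false once the VF is hot-unplugged */
	bool in_error_state;       /* set only by a health work that ran */
};

/* Stand-in for queueing health work: refused once draining has begun. */
static bool model_queue_health_work(struct model_dev *dev)
{
	if (dev->drop_new_health_work)
		return false;
	/* The health work runs and notices the missing device. */
	if (!dev->hw_present)
		dev->in_error_state = true;
	return true;
}

/* Stand-in for the drain step: block any further health work. */
static void model_drain_health_wq(struct model_dev *dev)
{
	dev->drop_new_health_work = true;
}

/* Stand-in for unregistering, which talks to the device during teardown.
 * Returns true if teardown can proceed normally: either the device is
 * present, or the driver has learned that it is gone. */
static bool model_unregister_device(struct model_dev *dev)
{
	/* A command to the absent device fails; try to report it via
	 * health work. */
	if (!dev->hw_present)
		model_queue_health_work(dev);
	return dev->hw_present || dev->in_error_state;
}

/* Tear down an already-unplugged VF in either order. */
static bool model_teardown(bool drain_first)
{
	struct model_dev dev = { .drop_new_health_work = false,
				 .hw_present = false,
				 .in_error_state = false };
	bool aware;

	/* The scenario under test: the VF is already gone. */
	assert(!dev.hw_present);

	if (drain_first) {
		model_drain_health_wq(&dev);
		aware = model_unregister_device(&dev);
	} else {
		aware = model_unregister_device(&dev);
		model_drain_health_wq(&dev);
	}
	return aware;
}
```

In this model, model_teardown(true) (drain first) returns false, meaning the driver never learns the VF is missing, while model_teardown(false) (unregister first, then drain) returns true.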
I will start working on a patch to fix this.
[1] https://lkml.org/lkml/2020/6/12/376
On 7/6/2020 19:12, Niklas Schnelle wrote:
> Hi Mr. Drory, Hi Netdev List,
>
> I'm the PCI Subsystem maintainer for Linux on IBM Z and since v5.8-rc1
> we've been seeing a regression with hot unplug of ConnectX-4 VFs
> from z/VM guests. In -rc1 this still looked like a simple issue and
> I wrote the following mail:
> https://lkml.org/lkml/2020/6/12/376
> Sadly, since -rc2 (I think) I have not been able to get this working consistently
> anymore (it did work consistently with the change described above on -rc1).
> In his answer Saeed Mahameed pointed me to your commits as dealing with
> similar issues so I wanted to get some input on how to debug this
> further.
>
> The commands I used to test this are as follows (on a z/VM guest running
> vanilla debug_defconfig v5.8-rc4 installed on Fedora 31); you will find the resulting
> dmesg attached to this mail:
>
> # vmcp q pcif // query for available PCI devices
> # vmcp attach pcif <FID> to \* // where <FID> is one of the ones listed by the above command
> # vmcp detach pcif <FID> // This does a hot unplug and is where things start going wrong
>
> I guess you don't have access to the hardware, but I'll be happy to assist
> as well as I can, since, digging on my own, I sadly don't know
> enough about the mlx5_core driver to make more progress.
>
> Best regards,
> Niklas Schnelle