Message-ID: <e26c2616-69cf-78f3-81db-972a0d461bc9@linux.ibm.com>
Date: Wed, 20 Jan 2021 12:10:53 -0500
From: Matthew Rosato <mjrosato@...ux.ibm.com>
To: Niklas Schnelle <schnelle@...ux.ibm.com>,
alex.williamson@...hat.com, cohuck@...hat.com
Cc: pmorel@...ux.ibm.com, borntraeger@...ibm.com, hca@...ux.ibm.com,
gor@...ux.ibm.com, gerald.schaefer@...ux.ibm.com,
linux-s390@...r.kernel.org, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/4] vfio-pci/zdev: Introduce the zPCI I/O vfio region
On 1/20/21 8:21 AM, Niklas Schnelle wrote:
>
>
> On 1/19/21 9:02 PM, Matthew Rosato wrote:
>> Some s390 PCI devices (e.g. ISM) perform I/O operations that have very
> .. snip ...
>> +
>> +static size_t vfio_pci_zdev_io_rw(struct vfio_pci_device *vdev,
>> +				  char __user *buf, size_t count,
>> +				  loff_t *ppos, bool iswrite)
>> +{
> ... snip ...
>> +	/*
>> +	 * For now, the largest allowed block I/O is advertised as PAGE_SIZE,
>> +	 * and cannot exceed a page boundary - so a single page is enough. The
>> +	 * guest should have validated this but let's double-check that the
>> +	 * request will not cross a page boundary.
>> +	 */
>> +	if (((region->req.gaddr & ~PAGE_MASK)
>> +	    + region->req.len - 1) & PAGE_MASK) {
>> +		return -EIO;
>> +	}
>> +
>> +	mutex_lock(&zdev->lock);
>
> I plan on using the zdev->lock to prevent concurrent zPCI device
> removal/configuration state changes between zPCI availability/error
> events and enable_slot()/disable_slot() and /sys/bus/pci/devices/<dev>/recover.
>
> With that use in place, taking it here causes a deadlock when doing
> "echo 0 > /sys/bus/pci/slots/<fid>/power" from the host for an ISM device
> attached to a guest.
>
> This is because the (soft) hot unplug will cause vfio to notify QEMU, which
> sends a deconfiguration request to the guest, which then tries to
> gracefully shut down the device. During that shutdown the device will
> be accessed, running into this code path, which then blocks on
> the lock held by the disable_slot() code, which in turn waits on vfio
> releasing the device.
>
Oof, good catch. The primary purpose of acquiring the zdev lock here
was to ensure that the region only processes one operation at a time,
and when I wrote this initially the zdev lock seemed like a reasonable
candidate :)
> Alex may correct me if I'm wrong, but I think vfio should instead
> be holding the PCI device lock via pci_device_lock(pdev).
>
OK, I can have a look at this. Thanks.
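
Thinking out loud on an alternative: if all this lock really needs to
guarantee is that the region handles one operation at a time, a mutex
that is private to the region (rather than zdev->lock) would keep this
path from ever waiting on a lock held by the disable_slot()/removal
code. Rough sketch only -- the vfio_zdev_io_region struct and the
vfio_pci_zdev_do_io() helper below are placeholders, not code from this
series; the region lookup is the same as in the other vfio-pci region
handlers:

struct vfio_zdev_io_region {
	/* ... whatever request state the region already carries ... */
	struct mutex lock;	/* serializes use of this region only */
};

static size_t vfio_pci_zdev_io_rw(struct vfio_pci_device *vdev,
				  char __user *buf, size_t count,
				  loff_t *ppos, bool iswrite)
{
	unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
	struct vfio_zdev_io_region *io = vdev->region[i].data;
	size_t ret;

	/* one operation in flight per region; zdev->lock is not taken */
	mutex_lock(&io->lock);
	ret = vfio_pci_zdev_do_io(vdev, io, buf, count, ppos, iswrite);
	mutex_unlock(&io->lock);

	return ret;
}

(with mutex_init(&io->lock) done once when the region is registered.)
That keeps the serialization local to the region, so the hotplug and
removal paths never show up in this lock chain at all.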
> The annotated trace with my test code looks as follows:
>
> [ 618.025091] Call Trace:
> [ 618.025093] [<00000007c1a139e0>] __schedule+0x360/0x960
> [ 618.025104] [<00000007c1a14760>] schedule_preempt_disabled+0x60/0x100
> [ 618.025107] [<00000007c1a16b48>] __mutex_lock+0x358/0x880
> [ 618.025110] [<00000007c1a170a2>] mutex_lock_nested+0x32/0x40
> [ 618.025112] [<000003ff805a3948>] vfio_pci_zdev_io_rw+0x168/0x310 [vfio_pci]
> [ 618.025120] [<000003ff8059b2b0>] vfio_pci_write+0xd0/0xe0 [vfio_pci]
> [ 618.025124] [<00000007c0fa5392>] __s390x_sys_pwrite64+0x112/0x360
> [ 618.025129] [<00000007c1a0aaf6>] __do_syscall+0x116/0x190
> [ 618.025132] [<00000007c1a1deda>] system_call+0x72/0x98
> [ 618.025137] 1 lock held by qemu-system-s39/1315:
> [ 618.025139] #0: 000000008524b4e8 (&zdev->lock){....}-{3:3}, at: vfio_pci_zdev_io_rw+0x168/0x310 [vfio_pci]
> [ 618.025151]
> Showing all locks held in the system:
> [ 618.025166] 1 lock held by khungtaskd/99:
> [ 618.025168] #0: 00000007c1ed4748 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.0+0x0/0x210
> [ 618.025194] 6 locks held by zsh/1190:
> [ 618.025196] #0: 0000000095fc0488 (sb_writers#3){....}-{0:0}, at: __do_syscall+0x116/0x190
> [ 618.025226] #1: 00000000975bf090 (&of->mutex){....}-{3:3}, at: kernfs_fop_write+0x9a/0x240
> [ 618.025236] #2: 000000008584be78 (kn->active#245){....}-{0:0}, at: kernfs_fop_write+0xa6/0x240
> [ 618.025243] #3: 000000008524b4e8 (&zdev->lock){....}-{3:3}, at: disable_slot+0x32/0x130 <-------------------------------------|
> [ 618.025252] #4: 00000007c1f53468 (pci_rescan_remove_lock){....}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0x26/0x240 |
> [ 618.025260] #5: 0000000085d8a1a0 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x32/0x1d0 |
> [ 618.025271] 1 lock held by qemu-system-s39/1312: D
> [ 618.025273] 1 lock held by qemu-system-s39/1313: E
> [ 618.025275] #0: 00000000d47e80d0 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm] A
> [ 618.025322] 1 lock held by qemu-system-s39/1314: D
> [ 618.025324] #0: 00000000d34700d0 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm] |
> [ 618.025345] 1 lock held by qemu-system-s39/1315: |
> [ 618.025347] #0: 000000008524b4e8 (&zdev->lock){....}-{3:3}, at: vfio_pci_zdev_io_rw+0x168/0x310 [vfio_pci] <------------------|
> [ 618.025355] 1 lock held by qemu-system-s39/1317:
> [ 618.025357] #0: 00000000d34480d0 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm]
> [ 618.025378] 1 lock held by qemu-system-s39/1318:
> [ 618.025380] #0: 00000000d34380d0 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm]
> [ 618.025400] 1 lock held by qemu-system-s39/1319:
> [ 618.025403] #0: 00000000d47e8a90 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm]
> [ 618.025424] 2 locks held by zsh/1391:
> [ 618.025426] #0: 00000000d4a708a0 (&tty->ldisc_sem){....}-{0:0}, at: tty_ldisc_ref_wait+0x34/0x70
> [ 618.025435] #1: 0000038002fc72f0 (&ldata->atomic_read_lock){....}-{3:3}, at: n_tty_read+0xc8/0xa50
>
>
>> +
>> +	ret = get_user_pages_fast(region->req.gaddr & PAGE_MASK, 1, 0, &gpage);
>> +	if (ret <= 0) {
>> +		count = -EIO;
> ... snip ...
>
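
As an aside for anyone reading along: the hunk snipped above is the
usual get_user_pages_fast() pin/map/put pattern, combined with the
single-page restriction from the comment in the first hunk. A generic,
self-contained illustration of that pattern (not code from this patch;
the helper name and parameters are made up):

#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/string.h>

/* Pin one user page, copy 'len' bytes out of it, and release it. */
static int copy_from_pinned_page(unsigned long uaddr, void *dst, size_t len)
{
	struct page *page;
	void *kaddr;
	int ret;

	/* request must fit within a single page, as the region enforces */
	if ((uaddr & ~PAGE_MASK) + len > PAGE_SIZE)
		return -EINVAL;

	/* returns the number of pages pinned, or a negative errno */
	ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, 0, &page);
	if (ret <= 0)
		return -EIO;

	kaddr = kmap(page);
	memcpy(dst, kaddr + (uaddr & ~PAGE_MASK), len);
	kunmap(page);

	/* drop the reference taken by get_user_pages_fast() */
	put_page(page);

	return 0;
}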