Date:   Wed, 20 Jan 2021 14:21:59 +0100
From:   Niklas Schnelle <schnelle@...ux.ibm.com>
To:     Matthew Rosato <mjrosato@...ux.ibm.com>,
        alex.williamson@...hat.com, cohuck@...hat.com
Cc:     pmorel@...ux.ibm.com, borntraeger@...ibm.com, hca@...ux.ibm.com,
        gor@...ux.ibm.com, gerald.schaefer@...ux.ibm.com,
        linux-s390@...r.kernel.org, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/4] vfio-pci/zdev: Introduce the zPCI I/O vfio region



On 1/19/21 9:02 PM, Matthew Rosato wrote:
> Some s390 PCI devices (e.g. ISM) perform I/O operations that have very
... snip ...
> +
> +static size_t vfio_pci_zdev_io_rw(struct vfio_pci_device *vdev,
> +				  char __user *buf, size_t count,
> +				  loff_t *ppos, bool iswrite)
> +{
... snip ...
> +	/*
> +	 * For now, the largest allowed block I/O is advertised as PAGE_SIZE,
> +	 * and cannot exceed a page boundary - so a single page is enough.  The
> +	 * guest should have validated this but let's double-check that the
> +	 * request will not cross a page boundary.
> +	 */
> +	if (((region->req.gaddr & ~PAGE_MASK)
> +			+ region->req.len - 1) & PAGE_MASK) {
> +		return -EIO;
> +	}
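(Side note for readers following the check above: the arithmetic can be
exercised in userspace. The sketch below mirrors it with a hypothetical
crosses_page() helper and an assumed 4 KiB page size; it is an
illustration, not the patch's code.)

```c
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Mirrors the quoted check: a request of len bytes (len > 0) starting at
 * guest address gaddr is rejected when its last byte lands on a
 * different page than its first byte. The low bits of gaddr give the
 * in-page offset; adding len - 1 yields the offset of the last byte,
 * and any bits left after masking with PAGE_MASK mean the request
 * spilled past the page boundary. */
static int crosses_page(uint64_t gaddr, uint64_t len)
{
	return (((gaddr & ~PAGE_MASK) + len - 1) & PAGE_MASK) != 0;
}
```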
> +
> +	mutex_lock(&zdev->lock);

I plan on using zdev->lock to prevent concurrent zPCI device
removal/configuration state changes between zPCI availability/error
events, enable_slot()/disable_slot() and /sys/bus/pci/devices/<dev>/recover.

With that use in place, taking the lock here causes a deadlock when doing
"echo 0 > /sys/bus/pci/slots/<fid>/power" from the host for an ISM device
attached to a guest.

This is because the (soft) hot unplug causes vfio to notify QEMU, which
sends a deconfiguration request to the guest, which then tries to
gracefully shut down the device. During that shutdown the device is
accessed, running into this code path, which then blocks on the lock
held by the disable_slot() code, which in turn waits for vfio to
release the device.

Alex may correct me if I'm wrong, but I think vfio should instead be
holding the PCI device lock via pci_dev_lock(pdev).
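(For illustration only: the cycle described above can be reproduced in
miniature with an ordinary userspace mutex. This is not kernel code and
the names are made up; it just shows why the second acquisition of
zdev->lock can never succeed while the disable_slot() side holds it.)

```c
#include <errno.h>
#include <pthread.h>

/* Returns the result of trying to take zdev_lock a second time while it
 * is already held: the disable_slot() side takes the lock and (in the
 * real trace) then waits for vfio, while the vfio_pci_zdev_io_rw() side
 * tries to take the same lock. With a blocking mutex_lock() this hangs
 * forever; trylock makes the failure observable instead. */
static int second_acquire(void)
{
	static pthread_mutex_t zdev_lock = PTHREAD_MUTEX_INITIALIZER;
	int err;

	pthread_mutex_lock(&zdev_lock);          /* disable_slot() side   */
	err = pthread_mutex_trylock(&zdev_lock); /* vfio I/O path side    */
	pthread_mutex_unlock(&zdev_lock);
	return err;
}
```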

The annotated trace with my test code looks as follows:

[  618.025091] Call Trace:
[  618.025093]  [<00000007c1a139e0>] __schedule+0x360/0x960
[  618.025104]  [<00000007c1a14760>] schedule_preempt_disabled+0x60/0x100
[  618.025107]  [<00000007c1a16b48>] __mutex_lock+0x358/0x880
[  618.025110]  [<00000007c1a170a2>] mutex_lock_nested+0x32/0x40
[  618.025112]  [<000003ff805a3948>] vfio_pci_zdev_io_rw+0x168/0x310 [vfio_pci]
[  618.025120]  [<000003ff8059b2b0>] vfio_pci_write+0xd0/0xe0 [vfio_pci]
[  618.025124]  [<00000007c0fa5392>] __s390x_sys_pwrite64+0x112/0x360
[  618.025129]  [<00000007c1a0aaf6>] __do_syscall+0x116/0x190
[  618.025132]  [<00000007c1a1deda>] system_call+0x72/0x98
[  618.025137] 1 lock held by qemu-system-s39/1315:
[  618.025139]  #0: 000000008524b4e8 (&zdev->lock){....}-{3:3}, at: vfio_pci_zdev_io_rw+0x168/0x310 [vfio_pci]
[  618.025151]
               Showing all locks held in the system:
[  618.025166] 1 lock held by khungtaskd/99:
[  618.025168]  #0: 00000007c1ed4748 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.0+0x0/0x210
[  618.025194] 6 locks held by zsh/1190:
[  618.025196]  #0: 0000000095fc0488 (sb_writers#3){....}-{0:0}, at: __do_syscall+0x116/0x190
[  618.025226]  #1: 00000000975bf090 (&of->mutex){....}-{3:3}, at: kernfs_fop_write+0x9a/0x240
[  618.025236]  #2: 000000008584be78 (kn->active#245){....}-{0:0}, at: kernfs_fop_write+0xa6/0x240
[  618.025243]  #3: 000000008524b4e8 (&zdev->lock){....}-{3:3}, at: disable_slot+0x32/0x130 <-------------------------------------|
[  618.025252]  #4: 00000007c1f53468 (pci_rescan_remove_lock){....}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0x26/0x240   |
[  618.025260]  #5: 0000000085d8a1a0 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x32/0x1d0                              |
[  618.025271] 1 lock held by qemu-system-s39/1312:                                                                               D
[  618.025273] 1 lock held by qemu-system-s39/1313:                                                                               E
[  618.025275]  #0: 00000000d47e80d0 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm]                              A
[  618.025322] 1 lock held by qemu-system-s39/1314:                                                                               D
[  618.025324]  #0: 00000000d34700d0 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm]                              |
[  618.025345] 1 lock held by qemu-system-s39/1315:                                                                               |
[  618.025347]  #0: 000000008524b4e8 (&zdev->lock){....}-{3:3}, at: vfio_pci_zdev_io_rw+0x168/0x310 [vfio_pci] <------------------|
[  618.025355] 1 lock held by qemu-system-s39/1317:
[  618.025357]  #0: 00000000d34480d0 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm]
[  618.025378] 1 lock held by qemu-system-s39/1318:
[  618.025380]  #0: 00000000d34380d0 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm]
[  618.025400] 1 lock held by qemu-system-s39/1319:
[  618.025403]  #0: 00000000d47e8a90 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm]
[  618.025424] 2 locks held by zsh/1391:
[  618.025426]  #0: 00000000d4a708a0 (&tty->ldisc_sem){....}-{0:0}, at: tty_ldisc_ref_wait+0x34/0x70
[  618.025435]  #1: 0000038002fc72f0 (&ldata->atomic_read_lock){....}-{3:3}, at: n_tty_read+0xc8/0xa50


> +
> +	ret = get_user_pages_fast(region->req.gaddr & PAGE_MASK, 1, 0, &gpage);
> +	if (ret <= 0) {
> +		count = -EIO;
... snip ...
