Date:   Wed, 20 Jan 2021 12:10:53 -0500
From:   Matthew Rosato <mjrosato@...ux.ibm.com>
To:     Niklas Schnelle <schnelle@...ux.ibm.com>,
        alex.williamson@...hat.com, cohuck@...hat.com
Cc:     pmorel@...ux.ibm.com, borntraeger@...ibm.com, hca@...ux.ibm.com,
        gor@...ux.ibm.com, gerald.schaefer@...ux.ibm.com,
        linux-s390@...r.kernel.org, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/4] vfio-pci/zdev: Introduce the zPCI I/O vfio region

On 1/20/21 8:21 AM, Niklas Schnelle wrote:
> 
> 
> On 1/19/21 9:02 PM, Matthew Rosato wrote:
>> Some s390 PCI devices (e.g. ISM) perform I/O operations that have very
> .. snip ...
>> +
>> +static size_t vfio_pci_zdev_io_rw(struct vfio_pci_device *vdev,
>> +				  char __user *buf, size_t count,
>> +				  loff_t *ppos, bool iswrite)
>> +{
> ... snip ...
>> +	/*
>> +	 * For now, the largest allowed block I/O is advertised as PAGE_SIZE,
>> +	 * and cannot exceed a page boundary - so a single page is enough.  The
>> +	 * guest should have validated this but let's double-check that the
>> +	 * request will not cross a page boundary.
>> +	 */
>> +	if (((region->req.gaddr & ~PAGE_MASK)
>> +			+ region->req.len - 1) & PAGE_MASK) {
>> +		return -EIO;
>> +	}
>> +
>> +	mutex_lock(&zdev->lock);
> 
> I plan on using zdev->lock to prevent concurrent zPCI device
> removal/configuration state changes between zPCI availability/error
> events, enable_slot()/disable_slot(), and
> /sys/bus/pci/devices/<dev>/recover.
> 
> With that use in place, using it here causes a deadlock when doing
> "echo 0 > /sys/bus/pci/slots/<fid>/power" from the host for an ISM
> device attached to a guest.
> 
> This is because the (soft) hot unplug will cause vfio to notify QEMU,
> which sends a deconfiguration request to the guest, which then tries to
> gracefully shut down the device. During that shutdown the device is
> accessed, running into this code path, which then blocks on the lock
> held by the disable_slot() code, which in turn waits for vfio to
> release the device.
> 

Oof, good catch.  The primary purpose of acquiring the zdev lock here
was to ensure that the region only processes one operation at a time;
when I wrote this initially, the zdev lock seemed like a reasonable
candidate :)
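
Thinking about it, a lock private to the region itself would do that
job without touching zdev->lock at all.  Rough sketch (untested, and
vfio_zdev_io_data plus its fields are made-up names for illustration):

	/* hypothetical private data hung off the vfio region */
	struct vfio_zdev_io_data {
		struct mutex lock;	/* serializes operations on this region */
		/* ... current request state ... */
	};

	static size_t vfio_pci_zdev_io_rw(struct vfio_pci_device *vdev,
					  char __user *buf, size_t count,
					  loff_t *ppos, bool iswrite)
	{
		unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
		struct vfio_zdev_io_data *data = vdev->region[i].data;

		mutex_lock(&data->lock);	/* no interaction with hotplug paths */
		/* ... validate the request and perform the I/O as before ... */
		mutex_unlock(&data->lock);
		return count;
	}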

> Alex may correct me if I'm wrong, but I think vfio should instead be
> holding the PCI device lock via pci_dev_lock(pdev).
> 

OK, I can have a look at this.  Thanks.
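
If it helps, here is roughly what I would try based on that (completely
untested; pci_dev_lock() is essentially device_lock() plus config-space
access locking, and a trylock variant would let this path bail out
instead of blocking while the device is mid-removal):

	struct pci_dev *pdev = vdev->pdev;

	if (!device_trylock(&pdev->dev))
		return -EAGAIN;	/* removal/reconfiguration in progress */
	/* ... process the region request ... */
	device_unlock(&pdev->dev);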


> The annotated trace with my test code looks as follows:
> 
> [  618.025091] Call Trace:
> [  618.025093]  [<00000007c1a139e0>] __schedule+0x360/0x960
> [  618.025104]  [<00000007c1a14760>] schedule_preempt_disabled+0x60/0x100
> [  618.025107]  [<00000007c1a16b48>] __mutex_lock+0x358/0x880
> [  618.025110]  [<00000007c1a170a2>] mutex_lock_nested+0x32/0x40
> [  618.025112]  [<000003ff805a3948>] vfio_pci_zdev_io_rw+0x168/0x310 [vfio_pci]
> [  618.025120]  [<000003ff8059b2b0>] vfio_pci_write+0xd0/0xe0 [vfio_pci]
> [  618.025124]  [<00000007c0fa5392>] __s390x_sys_pwrite64+0x112/0x360
> [  618.025129]  [<00000007c1a0aaf6>] __do_syscall+0x116/0x190
> [  618.025132]  [<00000007c1a1deda>] system_call+0x72/0x98
> [  618.025137] 1 lock held by qemu-system-s39/1315:
> [  618.025139]  #0: 000000008524b4e8 (&zdev->lock){....}-{3:3}, at: vfio_pci_zdev_io_rw+0x168/0x310 [vfio_pci]
> [  618.025151]
>                 Showing all locks held in the system:
> [  618.025166] 1 lock held by khungtaskd/99:
> [  618.025168]  #0: 00000007c1ed4748 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.0+0x0/0x210
> [  618.025194] 6 locks held by zsh/1190:
> [  618.025196]  #0: 0000000095fc0488 (sb_writers#3){....}-{0:0}, at: __do_syscall+0x116/0x190
> [  618.025226]  #1: 00000000975bf090 (&of->mutex){....}-{3:3}, at: kernfs_fop_write+0x9a/0x240
> [  618.025236]  #2: 000000008584be78 (kn->active#245){....}-{0:0}, at: kernfs_fop_write+0xa6/0x240
> [  618.025243]  #3: 000000008524b4e8 (&zdev->lock){....}-{3:3}, at: disable_slot+0x32/0x130 <-------------------------------------|
> [  618.025252]  #4: 00000007c1f53468 (pci_rescan_remove_lock){....}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0x26/0x240   |
> [  618.025260]  #5: 0000000085d8a1a0 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x32/0x1d0                              |
> [  618.025271] 1 lock held by qemu-system-s39/1312:                                                                               D
> [  618.025273] 1 lock held by qemu-system-s39/1313:                                                                               E
> [  618.025275]  #0: 00000000d47e80d0 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm]                              A
> [  618.025322] 1 lock held by qemu-system-s39/1314:                                                                               D
> [  618.025324]  #0: 00000000d34700d0 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm]                              |
> [  618.025345] 1 lock held by qemu-system-s39/1315:                                                                               |
> [  618.025347]  #0: 000000008524b4e8 (&zdev->lock){....}-{3:3}, at: vfio_pci_zdev_io_rw+0x168/0x310 [vfio_pci] <------------------|
> [  618.025355] 1 lock held by qemu-system-s39/1317:
> [  618.025357]  #0: 00000000d34480d0 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm]
> [  618.025378] 1 lock held by qemu-system-s39/1318:
> [  618.025380]  #0: 00000000d34380d0 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm]
> [  618.025400] 1 lock held by qemu-system-s39/1319:
> [  618.025403]  #0: 00000000d47e8a90 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x780 [kvm]
> [  618.025424] 2 locks held by zsh/1391:
> [  618.025426]  #0: 00000000d4a708a0 (&tty->ldisc_sem){....}-{0:0}, at: tty_ldisc_ref_wait+0x34/0x70
> [  618.025435]  #1: 0000038002fc72f0 (&ldata->atomic_read_lock){....}-{3:3}, at: n_tty_read+0xc8/0xa50
> 
> 
>> +
>> +	ret = get_user_pages_fast(region->req.gaddr & PAGE_MASK, 1, 0, &gpage);
>> +	if (ret <= 0) {
>> +		count = -EIO;
> ... snip ...
> 

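For anyone following along through the snips: modulo error handling,
this path boils down to a check/pin/map/unpin sequence, roughly like
the below (reconstructed around the lines quoted above; the kmap and
cleanup details are my paraphrase, not verbatim from the patch):

	struct page *gpage;
	void *base;
	int ret;

	/* the offset within the page plus the length must stay inside one
	 * page; if the last byte spills over, masking with PAGE_MASK is
	 * non-zero and the request is rejected
	 */
	if (((region->req.gaddr & ~PAGE_MASK) + region->req.len - 1) & PAGE_MASK)
		return -EIO;

	/* pin the single guest page backing the request; gup_flags is 0
	 * because the data flows from guest memory to the device
	 */
	ret = get_user_pages_fast(region->req.gaddr & PAGE_MASK, 1, 0, &gpage);
	if (ret <= 0)
		return -EIO;

	base = kmap(gpage);
	/* ... perform the I/O from base + (region->req.gaddr & ~PAGE_MASK) ... */
	kunmap(gpage);
	put_page(gpage);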