Date:   Tue, 8 Nov 2022 04:52:16 +0000
From:   Wei Gong <gongwei833x@...il.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     linux-kernel@...r.kernel.org, Bjorn Helgaas <bhelgaas@...gle.com>,
        linux-pci@...r.kernel.org
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> virtio uses the same driver for VFs and PFs.  Accordingly,
> pci_device_is_present is used to detect device presence. This function
> isn't currently working properly for VFs since it attempts reading
> device and vendor ID. As VFs are present if and only if PF is present,
> just return the value for that device.
> 
> Reported-by: Wei Gong <gongwei833x@...il.com>
> Signed-off-by: Michael S. Tsirkin <mst@...hat.com>
> ---
> 
> Wei Gong, thanks for your testing of the RFC!
> As I made a small change, would appreciate re-testing.
> 
> Thanks!
> 
> changes from RFC:
> 	use pci_physfn() wrapper to make the code build without PCI_IOV
> 
> 
>  drivers/pci/pci.c | 5 +++++
>  1 file changed, 5 insertions(+)

Tested-by: Wei Gong <gongwei833x@...il.com>

Retest done; everything works well.

I would rephrase the bug description as follows.

According to the SR-IOV specification, the Vendor ID and Device ID
fields of all VFs return FFFFh when read. So when a VF device goes
through pci_device_is_present(), it is misjudged as a surprise removal.
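
The patch body isn't quoted above, but going by the changelog (and the
pci_physfn() note for builds without PCI_IOV), the change to
pci_device_is_present() in drivers/pci/pci.c presumably ends up looking
something like this sketch:

bool pci_device_is_present(struct pci_dev *pdev)
{
	u32 v;

	/* A VF's Vendor/Device ID registers read as FFFFh per the SR-IOV
	 * spec, so answer the presence question for its PF instead.
	 */
	if (pdev->is_virtfn)
		pdev = pci_physfn(pdev);

	if (pci_dev_is_disconnected(pdev))
		return false;
	return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);
}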

When I/O is issued on the VF and the virtio_blk VF devices are then
disabled normally, the disable operation hangs, and the virtio-blk
device I/O hangs with it.

task:bash            state:D stack:    0 pid: 1773 ppid:  1241 flags:0x00004002
Call Trace:
 <TASK>
 __schedule+0x2ee/0x900
 schedule+0x4f/0xc0
 blk_mq_freeze_queue_wait+0x69/0xa0
 ? wait_woken+0x80/0x80
 blk_mq_freeze_queue+0x1b/0x20
 blk_cleanup_queue+0x3d/0xd0
 virtblk_remove+0x3c/0xb0 [virtio_blk]
 virtio_dev_remove+0x4b/0x80
 device_release_driver_internal+0x103/0x1d0
 device_release_driver+0x12/0x20
 bus_remove_device+0xe1/0x150
 device_del+0x192/0x3f0
 device_unregister+0x1b/0x60
 unregister_virtio_device+0x18/0x30
 virtio_pci_remove+0x41/0x80
 pci_device_remove+0x3e/0xb0
 device_release_driver_internal+0x103/0x1d0
 device_release_driver+0x12/0x20
 pci_stop_bus_device+0x79/0xa0
 pci_stop_and_remove_bus_device+0x13/0x20
 pci_iov_remove_virtfn+0xc5/0x130
 ? pci_get_device+0x4a/0x60
 sriov_disable+0x33/0xf0
 pci_disable_sriov+0x26/0x30
 virtio_pci_sriov_configure+0x6f/0xa0
 sriov_numvfs_store+0x104/0x140
 dev_attr_store+0x17/0x30
 sysfs_kf_write+0x3e/0x50
 kernfs_fop_write_iter+0x138/0x1c0
 new_sync_write+0x117/0x1b0
 vfs_write+0x185/0x250
 ksys_write+0x67/0xe0
 __x64_sys_write+0x1a/0x20
 do_syscall_64+0x61/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f21bd1f3ba4
RSP: 002b:00007ffd34a24188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f21bd1f3ba4
RDX: 0000000000000002 RSI: 0000560305040800 RDI: 0000000000000001
RBP: 0000560305040800 R08: 000056030503fd50 R09: 0000000000000073
R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
R13: 00007f21bd2de760 R14: 00007f21bd2da5e0 R15: 00007f21bd2d99e0

When virtio_blk is performing I/O, there are two stages (sketched below):
1. Dispatch I/O: queue_usage_counter++;
2. I/O is completed after the interrupt is received: queue_usage_counter--;
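
In blk-mq terms, this counter is the request queue's q_usage_counter
percpu ref. Heavily simplified from block/blk-core.c, the two stages are
roughly:

/* Stage 1: dispatch takes a reference on the queue. */
int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
{
	/* simplified: the real code waits here if the queue is frozen */
	if (!percpu_ref_tryget_live(&q->q_usage_counter))
		return -ENODEV;
	return 0;
}

/* Stage 2: completion drops it, via blk_mq_free_request(). */
void blk_queue_exit(struct request_queue *q)
{
	percpu_ref_put(&q->q_usage_counter);
}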

When the virtio_blk VFs are disabled:
  if (!pci_device_is_present(pci_dev))
    virtio_break_device(&vp_dev->vdev);
the virtqueues of the VF devices are marked broken. When the interrupt
then signals I/O completion, the virtqueue is seen as broken, so the
completed I/O is never processed and queue_usage_counter is never
decremented. Since the disk is being removed at the same time, the queue
is frozen and must wait for queue_usage_counter to drop to zero.
Therefore the removal hangs.
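
For reference, the wait that never finishes in the trace above is
blk_mq_freeze_queue_wait(), which on kernels of this vintage is
essentially:

void blk_mq_freeze_queue_wait(struct request_queue *q)
{
	/* sleeps until every outstanding queue reference (q_usage_counter)
	 * has been dropped by I/O completion
	 */
	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
}

With the VF's virtqueues marked broken, those completions never arrive,
so this wait never returns.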



Thanks,
Wei Gong
