Message-ID: <20250418100444.5bc9cd97.alex.williamson@redhat.com>
Date: Fri, 18 Apr 2025 10:04:44 -0600
From: Alex Williamson <alex.williamson@...hat.com>
To: Sairaj Kodilkar <sarunkod@....com>
Cc: Bjorn Helgaas <helgaas@...nel.org>, kvm@...r.kernel.org,
 linux-kernel@...r.kernel.org, bhelgaas@...gle.com, will@...nel.org,
 joro@...tes.org, robin.murphy@....com, iommu@...ts.linux.dev,
 linux-pci@...r.kernel.org, vasant.hegde@....com,
 suravee.suthikulpanit@....com, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [bug report] Potential DEADLOCK due to
 vfio_pci_mmap_huge_fault()

On Thu, 17 Apr 2025 11:21:13 -0500
Bjorn Helgaas <helgaas@...nel.org> wrote:

> [+cc Thomas since msi_setup_device_data() is in a call path]
> 
> On Thu, Apr 17, 2025 at 05:22:00PM +0530, Sairaj Kodilkar wrote:
> > Hi everyone,
> > I am seeing the following errors on the host when I run FIO tests inside
> > the guest on the latest upstream kernel. This causes the guest to hang.
> > Can anyone help with this?

Can you please elaborate more on the configuration and failure?  The
lockdep splat identifies a potential deadlock; the chances of actually
hitting it appear to be low, and the detection of the deadlock sequence
shouldn't affect the guest.  What device(s) are being assigned, what
version of QEMU is in use, how is FIO being used, and is the lockdep
issue only encountered when FIO is run in the guest?

I've not seen anything similar with lockdep enabled in my testing, and I
don't know why or how FIO in the guest would be unique in triggering
this detection.

> > I have done some cursory analysis of the trace, and it seems the culprit
> > is `vfio_pci_mmap_huge_fault()`. I think the following scenario is
> > causing the deadlock.
> > 
> >      CPU0                           CPU1                               CPU2
> > (Trying to do                       (Performing a sysfs operation      (Receives a fault during
> >  vfio_pci_set_msi_trigger())         on /sys/bus/pci/devices/<devid>)   vfio_pin_pages_remote())
> >
> > ===================================================================================================
> > (A) vdev->memory_lock
> >     (vfio_msi_enable())
> >                                     (C) root->kernfs_rwsem
> >                                         (kernfs_fop_readdir())
> > (B) root->kernfs_rwsem
> >     (kernfs_add_one())
> >                                                                        (E) mm->mmap_lock
> >                                                                            (do_user_addr_fault())
> >                                     (D) mm->mmap_lock
> >                                         (do_user_addr_fault())
> >                                                                        (F) vdev->memory_lock
> >                                                                            (vfio_pci_mmap_huge_fault())

Hmm, it's not evident to me how to resolve this either.  Thanks for the
report; I'll continue to puzzle over it.  Thanks,

Alex

> > Here there is a circular dependency: A->B->C->D->E->F->A.
> > Please let me know if anyone else has encountered this. I will be happy to help!
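
To make the shape of that cycle concrete, here's a minimal userspace
analogue: three pthread mutexes standing in for vdev->memory_lock,
root->kernfs_rwsem, and mm->mmap_lock.  This is only a sketch of the
A->B / C->D / E->F ordering above, not the kernel code; the names and
the sleep()-based interleaving are illustrative.  Run it and all three
threads block in a circular wait, which is exactly the scenario lockdep
is warning about:

/* deadlock-demo.c: userspace sketch of the reported lock cycle.
 * lock_a ~ vdev->memory_lock, lock_b ~ root->kernfs_rwsem,
 * lock_c ~ mm->mmap_lock (all hypothetical stand-ins).
 */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* memory_lock  */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* kernfs_rwsem */
static pthread_mutex_t lock_c = PTHREAD_MUTEX_INITIALIZER; /* mmap_lock    */

/* CPU0 analogue: MSI enable path, takes (A) then wants (B). */
static void *cpu0(void *unused)
{
        pthread_mutex_lock(&lock_a);
        sleep(1);                    /* let the others take their first lock */
        pthread_mutex_lock(&lock_b); /* blocks: cpu1 already holds lock_b */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
}

/* CPU1 analogue: sysfs readdir faulting on its user buffer, (C) then (D). */
static void *cpu1(void *unused)
{
        pthread_mutex_lock(&lock_b);
        sleep(1);
        pthread_mutex_lock(&lock_c); /* blocks: cpu2 already holds lock_c */
        pthread_mutex_unlock(&lock_c);
        pthread_mutex_unlock(&lock_b);
        return NULL;
}

/* CPU2 analogue: page pinning faulting into the huge-fault handler,
 * takes (E) then wants (F). */
static void *cpu2(void *unused)
{
        pthread_mutex_lock(&lock_c);
        sleep(1);
        pthread_mutex_lock(&lock_a); /* blocks: cpu0 holds lock_a -> cycle closed */
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_c);
        return NULL;
}

int main(void)
{
        pthread_t t[3];

        pthread_create(&t[0], NULL, cpu0, NULL);
        pthread_create(&t[1], NULL, cpu1, NULL);
        pthread_create(&t[2], NULL, cpu2, NULL);
        pthread_join(t[0], NULL);    /* never returns once all three block */
        pthread_join(t[1], NULL);
        pthread_join(t[2], NULL);
        return 0;
}

Build with "gcc -pthread deadlock-demo.c"; the process hangs and never
exits, mirroring the three-CPU scenario above.
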
> > ---------------------------------------------------------------------------------
> > 
> > [ 1457.982233] ======================================================
> > [ 1457.989494] WARNING: possible circular locking dependency detected
> > [ 1457.996764] 6.15.0-rc1-0af2f6be1b42-1744803490343 #1 Not tainted
> > [ 1458.003842] ------------------------------------------------------
> > [ 1458.011105] CPU 0/KVM/8259 is trying to acquire lock:
> > [ 1458.017107] ff27171d80a8e960 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_add_one+0x34/0x380
> > [ 1458.027027]
> > [ 1458.027027] but task is already holding lock:
> > [ 1458.034273] ff27171e19663918 (&vdev->memory_lock){++++}-{4:4}, at: vfio_pci_memory_lock_and_enable+0x2c/0x90 [vfio_pci_core]
> > [ 1458.047221]
> > [ 1458.047221] which lock already depends on the new lock.
> > [ 1458.047221]
> > [ 1458.057506]
> > [ 1458.057506] the existing dependency chain (in reverse order) is:
> > [ 1458.066629]
> > [ 1458.066629] -> #2 (&vdev->memory_lock){++++}-{4:4}:
> > [ 1458.074509]        __lock_acquire+0x52e/0xbe0
> > [ 1458.079778]        lock_acquire+0xc7/0x2e0
> > [ 1458.084764]        down_read+0x35/0x270
> > [ 1458.089437]        vfio_pci_mmap_huge_fault+0xac/0x1c0 [vfio_pci_core]
> > [ 1458.097135]        __do_fault+0x30/0x180
> > [ 1458.101918]        do_shared_fault+0x2d/0x1b0
> > [ 1458.107189]        do_fault+0x41/0x390
> > [ 1458.111779]        __handle_mm_fault+0x2f6/0x730
> > [ 1458.117339]        handle_mm_fault+0xd8/0x2a0
> > [ 1458.122606]        fixup_user_fault+0x7f/0x1d0
> > [ 1458.127963]        vaddr_get_pfns+0x129/0x2b0 [vfio_iommu_type1]
> > [ 1458.135073]        vfio_pin_pages_remote+0xd4/0x430 [vfio_iommu_type1]
> > [ 1458.142771]        vfio_pin_map_dma+0xd4/0x350 [vfio_iommu_type1]
> > [ 1458.149979]        vfio_dma_do_map+0x2dd/0x450 [vfio_iommu_type1]
> > [ 1458.157183]        vfio_iommu_type1_ioctl+0x126/0x1c0 [vfio_iommu_type1]
> > [ 1458.165076]        __x64_sys_ioctl+0x94/0xc0
> > [ 1458.170250]        do_syscall_64+0x72/0x180
> > [ 1458.175320]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 1458.181944]
> > [ 1458.181944] -> #1 (&mm->mmap_lock){++++}-{4:4}:
> > [ 1458.189446]        __lock_acquire+0x52e/0xbe0
> > [ 1458.194703]        lock_acquire+0xc7/0x2e0
> > [ 1458.199676]        down_read_killable+0x35/0x280
> > [ 1458.205229]        lock_mm_and_find_vma+0x96/0x280
> > [ 1458.210979]        do_user_addr_fault+0x1da/0x710
> > [ 1458.216638]        exc_page_fault+0x6d/0x200
> > [ 1458.221814]        asm_exc_page_fault+0x26/0x30
> > [ 1458.227274]        filldir64+0xee/0x170
> > [ 1458.231963]        kernfs_fop_readdir+0x102/0x2e0
> > [ 1458.237620]        iterate_dir+0xb1/0x2a0
> > [ 1458.242509]        __x64_sys_getdents64+0x88/0x130
> > [ 1458.248282]        do_syscall_64+0x72/0x180
> > [ 1458.253371]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 1458.260006]
> > [ 1458.260006] -> #0 (&root->kernfs_rwsem){++++}-{4:4}:
> > [ 1458.268012]        check_prev_add+0xf1/0xca0
> > [ 1458.273187]        validate_chain+0x610/0x6f0
> > [ 1458.278452]        __lock_acquire+0x52e/0xbe0
> > [ 1458.283711]        lock_acquire+0xc7/0x2e0
> > [ 1458.288678]        down_write+0x32/0x1d0
> > [ 1458.293442]        kernfs_add_one+0x34/0x380
> > [ 1458.298588]        kernfs_create_dir_ns+0x5a/0x90
> > [ 1458.304214]        internal_create_group+0x11e/0x2f0
> > [ 1458.310131]        devm_device_add_group+0x4a/0x90
> > [ 1458.315860]        msi_setup_device_data+0x60/0x110
> > [ 1458.321679]        pci_setup_msi_context+0x19/0x60
> > [ 1458.327398]        __pci_enable_msix_range+0x19d/0x640
> > [ 1458.333513]        pci_alloc_irq_vectors_affinity+0xab/0x110
> > [ 1458.340211]        vfio_pci_set_msi_trigger+0x8c/0x230 [vfio_pci_core]
> > [ 1458.347883]        vfio_pci_core_ioctl+0x2a6/0x420 [vfio_pci_core]
> > [ 1458.355164]        vfio_device_fops_unl_ioctl+0x81/0x140 [vfio]
> > [ 1458.362155]        __x64_sys_ioctl+0x93/0xc0
> > [ 1458.367295]        do_syscall_64+0x72/0x180
> > [ 1458.372336]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 1458.378932]
> > [ 1458.378932] other info that might help us debug this:
> > [ 1458.378932]
> > [ 1458.388965] Chain exists of:
> > [ 1458.388965]   &root->kernfs_rwsem --> &mm->mmap_lock --> &vdev->memory_lock
> > [ 1458.388965]
> > [ 1458.402717]  Possible unsafe locking scenario:
> > [ 1458.402717]
> > [ 1458.410064]        CPU0                    CPU1
> > [ 1458.415495]        ----                    ----
> > [ 1458.420939]   lock(&vdev->memory_lock);
> > [ 1458.425597]                                lock(&mm->mmap_lock);
> > [ 1458.432683]                                lock(&vdev->memory_lock);
> > [ 1458.440153]   lock(&root->kernfs_rwsem);
> > [ 1458.444905]
> > [ 1458.444905]  *** DEADLOCK ***
> > [ 1458.444905]
> > [ 1458.452589] 2 locks held by CPU 0/KVM/8259:
> > [ 1458.457627]  #0: ff27171e196636b8 (&vdev->igate){+.+.}-{4:4}, at: vfio_pci_core_ioctl+0x28a/0x420 [vfio_pci_core]
> > [ 1458.469499]  #1: ff27171e19663918 (&vdev->memory_lock){++++}-{4:4}, at: vfio_pci_memory_lock_and_enable+0x2c/0x90 [vfio_pci_core]
> > [ 1458.483306]
> > [ 1458.483306] stack backtrace:
> > [ 1458.488927] CPU: 169 UID: 0 PID: 8259 Comm: CPU 0/KVM Not tainted 6.15.0-rc1-0af2f6be1b42-1744803490343 #1 PREEMPT(voluntary)
> > [ 1458.488933] Hardware name: AMD Corporation RUBY/RUBY, BIOS RRR100EB 12/05/2024
> > [ 1458.488936] Call Trace:
> > [ 1458.488940]  <TASK>
> > [ 1458.488944]  dump_stack_lvl+0x78/0xe0
> > [ 1458.488954]  print_circular_bug+0xd5/0xf0
> > [ 1458.488965]  check_noncircular+0x14c/0x170
> > [ 1458.488970]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 1458.488976]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 1458.488980]  ? find_held_lock+0x32/0x90
> > [ 1458.488986]  ? local_clock_noinstr+0xd/0xc0
> > [ 1458.489001]  check_prev_add+0xf1/0xca0
> > [ 1458.489006]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 1458.489015]  validate_chain+0x610/0x6f0
> > [ 1458.489027]  __lock_acquire+0x52e/0xbe0
> > [ 1458.489032]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 1458.489035]  ? __lock_release+0x15d/0x2a0
> > [ 1458.489046]  lock_acquire+0xc7/0x2e0
> > [ 1458.489051]  ? kernfs_add_one+0x34/0x380
> > [ 1458.489060]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 1458.489063]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 1458.489067]  ? __lock_release+0x15d/0x2a0
> > [ 1458.489080]  down_write+0x32/0x1d0
> > [ 1458.489085]  ? kernfs_add_one+0x34/0x380
> > [ 1458.489090]  kernfs_add_one+0x34/0x380
> > [ 1458.489100]  kernfs_create_dir_ns+0x5a/0x90
> > [ 1458.489107]  internal_create_group+0x11e/0x2f0
> > [ 1458.489118]  devm_device_add_group+0x4a/0x90
> > [ 1458.489128]  msi_setup_device_data+0x60/0x110
> > [ 1458.489136]  pci_setup_msi_context+0x19/0x60
> > [ 1458.489144]  __pci_enable_msix_range+0x19d/0x640
> > [ 1458.489150]  ? pci_conf1_read+0x4e/0xf0
> > [ 1458.489154]  ? find_held_lock+0x32/0x90
> > [ 1458.489162]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 1458.489165]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 1458.489172]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 1458.489176]  ? mark_held_locks+0x40/0x70
> > [ 1458.489182]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 1458.489191]  pci_alloc_irq_vectors_affinity+0xab/0x110
> > [ 1458.489206]  vfio_pci_set_msi_trigger+0x8c/0x230 [vfio_pci_core]
> > [ 1458.489222]  vfio_pci_core_ioctl+0x2a6/0x420 [vfio_pci_core]
> > [ 1458.489231]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 1458.489241]  vfio_device_fops_unl_ioctl+0x81/0x140 [vfio]
> > [ 1458.489252]  __x64_sys_ioctl+0x94/0xc0
> > [ 1458.489262]  do_syscall_64+0x72/0x180
> > [ 1458.489269]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 1458.489273] RIP: 0033:0x7f0898724ded
> > [ 1458.489279] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
> > [ 1458.489282] RSP: 002b:00007f08965622a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > [ 1458.489286] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0898724ded
> > [ 1458.489289] RDX: 00007f07800f6d00 RSI: 0000000000003b6e RDI: 000000000000001e
> > [ 1458.489291] RBP: 00007f08965622f0 R08: 00007f07800008e0 R09: 0000000000000001
> > [ 1458.489293] R10: 0000000000000007 R11: 0000000000000246 R12: 0000000000000000
> > [ 1458.489295] R13: fffffffffffffb28 R14: 0000000000000007 R15: 00007ffd0ef83ae0
> > [ 1458.489315]  </TASK>  
> 

