Date:   Fri, 21 Jun 2019 14:19:10 +0100
From:   Chris Wilson <chris@...is-wilson.co.uk>
To:     Peter Xu <peterx@...hat.com>, iommu@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org
Cc:     joro@...tes.org, peterx@...hat.com,
        Lu Baolu <baolu.lu@...ux.intel.com>, dave.jiang@...el.com
Subject: Re: [PATCH] Revert "iommu/vt-d: Fix lock inversion between iommu->lock and
 device_domain_lock"

Quoting Peter Xu (2019-06-21 03:32:05)
> This reverts commit 7560cc3ca7d9d11555f80c830544e463fcdb28b8.
> 
> With 5.2.0-rc5 I can easily trigger this with lockdep and iommu=pt:
> 
>     ======================================================
>     WARNING: possible circular locking dependency detected
>     5.2.0-rc5 #78 Not tainted
>     ------------------------------------------------------
>     swapper/0/1 is trying to acquire lock:
>     00000000ea2b3beb (&(&iommu->lock)->rlock){+.+.}, at: domain_context_mapping_one+0xa5/0x4e0
>     but task is already holding lock:
>     00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
>     which lock already depends on the new lock.
>     the existing dependency chain (in reverse order) is:
>     -> #1 (device_domain_lock){....}:
>            _raw_spin_lock_irqsave+0x3c/0x50
>            dmar_insert_one_dev_info+0xbb/0x510
>            domain_add_dev_info+0x50/0x90
>            dev_prepare_static_identity_mapping+0x30/0x68
>            intel_iommu_init+0xddd/0x1422
>            pci_iommu_init+0x16/0x3f
>            do_one_initcall+0x5d/0x2b4
>            kernel_init_freeable+0x218/0x2c1
>            kernel_init+0xa/0x100
>            ret_from_fork+0x3a/0x50
>     -> #0 (&(&iommu->lock)->rlock){+.+.}:
>            lock_acquire+0x9e/0x170
>            _raw_spin_lock+0x25/0x30
>            domain_context_mapping_one+0xa5/0x4e0
>            pci_for_each_dma_alias+0x30/0x140
>            dmar_insert_one_dev_info+0x3b2/0x510
>            domain_add_dev_info+0x50/0x90
>            dev_prepare_static_identity_mapping+0x30/0x68
>            intel_iommu_init+0xddd/0x1422
>            pci_iommu_init+0x16/0x3f
>            do_one_initcall+0x5d/0x2b4
>            kernel_init_freeable+0x218/0x2c1
>            kernel_init+0xa/0x100
>            ret_from_fork+0x3a/0x50
> 
>     other info that might help us debug this:
>      Possible unsafe locking scenario:
>            CPU0                    CPU1
>            ----                    ----
>       lock(device_domain_lock);
>                                    lock(&(&iommu->lock)->rlock);
>                                    lock(device_domain_lock);
>       lock(&(&iommu->lock)->rlock);
> 
>      *** DEADLOCK ***
>     2 locks held by swapper/0/1:
>      #0: 00000000033eb13d (dmar_global_lock){++++}, at: intel_iommu_init+0x1e0/0x1422
>      #1: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
> 
>     stack backtrace:
>     CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5 #78
>     Hardware name: LENOVO 20KGS35G01/20KGS35G01, BIOS N23ET50W (1.25 ) 06/25/2018
>     Call Trace:
>      dump_stack+0x85/0xc0
>      print_circular_bug.cold.57+0x15c/0x195
>      __lock_acquire+0x152a/0x1710
>      lock_acquire+0x9e/0x170
>      ? domain_context_mapping_one+0xa5/0x4e0
>      _raw_spin_lock+0x25/0x30
>      ? domain_context_mapping_one+0xa5/0x4e0
>      domain_context_mapping_one+0xa5/0x4e0
>      ? domain_context_mapping_one+0x4e0/0x4e0
>      pci_for_each_dma_alias+0x30/0x140
>      dmar_insert_one_dev_info+0x3b2/0x510
>      domain_add_dev_info+0x50/0x90
>      dev_prepare_static_identity_mapping+0x30/0x68
>      intel_iommu_init+0xddd/0x1422
>      ? printk+0x58/0x6f
>      ? lockdep_hardirqs_on+0xf0/0x180
>      ? do_early_param+0x8e/0x8e
>      ? e820__memblock_setup+0x63/0x63
>      pci_iommu_init+0x16/0x3f
>      do_one_initcall+0x5d/0x2b4
>      ? do_early_param+0x8e/0x8e
>      ? rcu_read_lock_sched_held+0x55/0x60
>      ? do_early_param+0x8e/0x8e
>      kernel_init_freeable+0x218/0x2c1
>      ? rest_init+0x230/0x230
>      kernel_init+0xa/0x100
>      ret_from_fork+0x3a/0x50
> 
> domain_context_mapping_one() takes device_domain_lock first and then
> the iommu lock, while dmar_insert_one_dev_info() does the reverse.
> 
> That appears to have been introduced by commit:
> 
> 7560cc3ca7d9 ("iommu/vt-d: Fix lock inversion between iommu->lock and
>               device_domain_lock", 2019-05-27)
> 
> So far I have not been able to figure out how the previous deadlock
> was triggered (I cannot find the iommu lock being taken before the
> call to iommu_flush_dev_iotlb()). However, I'm fairly sure that change
> is incomplete at best: it does not fix all the call sites, so we are
> still taking the two locks in different orders. Reverting it is the
> clean option for now, since it restores a single consistent ordering:
> always take device_domain_lock first, then the iommu lock.
> 
> We can keep trying to find the real culprit mentioned in
> 7560cc3ca7d9, but for now I think we should revert it to fix the
> current breakage.
> 
> CC: Joerg Roedel <joro@...tes.org>
> CC: Lu Baolu <baolu.lu@...ux.intel.com>
> CC: dave.jiang@...el.com
> Signed-off-by: Peter Xu <peterx@...hat.com>
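
To spell out the cycle lockdep is pointing at: domain_context_mapping_one()
nests the iommu lock inside device_domain_lock, while after 7560cc3ca7d9 the
dmar_insert_one_dev_info() path nests them the other way round, so two CPUs
can each hold one lock while waiting for the other. A minimal userspace
sketch of that ABBA pattern (plain pthreads, illustrative names only, not
the actual intel-iommu code; ThreadSanitizer flags the inversion
deterministically, while the actual hang is timing-dependent):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t device_domain_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t iommu_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Ordering the revert restores: device_domain_lock, then the iommu
     * lock (the domain_context_mapping_one() path above). */
    static void *map_path(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&device_domain_lock);
            pthread_mutex_lock(&iommu_lock);
            pthread_mutex_unlock(&iommu_lock);
            pthread_mutex_unlock(&device_domain_lock);
        }
        return NULL;
    }

    /* Ordering 7560cc3ca7d9 introduced: the iommu lock, then
     * device_domain_lock (the dmar_insert_one_dev_info() path). */
    static void *insert_path(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&iommu_lock);
            pthread_mutex_lock(&device_domain_lock);
            pthread_mutex_unlock(&device_domain_lock);
            pthread_mutex_unlock(&iommu_lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;

        pthread_create(&a, NULL, map_path, NULL);
        pthread_create(&b, NULL, insert_path, NULL);
        pthread_join(a, NULL);   /* hangs here if the ABBA window is hit */
        pthread_join(b, NULL);
        puts("no deadlock this run (the hazard is probabilistic)");
        return 0;
    }

This is also why lockdep reports it at boot even without an actual hang: it
builds the dependency graph from every ordering it observes, so one pass
through each path is enough to close the cycle.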

I've run this through our CI, which was also reporting the inversion, so
Tested-by: Chris Wilson <chris@...is-wilson.co.uk>
-Chris
