Message-ID: <1cf01e76-88d2-c65f-0b54-b85e6da0d720@arm.com>
Date: Fri, 1 Jul 2022 09:18:08 +0100
From: Robin Murphy <robin.murphy@....com>
To: Baolu Lu <baolu.lu@...ux.intel.com>,
Joerg Roedel <joro@...tes.org>,
David Woodhouse <dwmw2@...radead.org>
Cc: kevin.tian@...el.com, ashok.raj@...el.com,
linux-kernel@...r.kernel.org, iommu@...ts.linux-foundation.org,
cai@....pw, jacob.jun.pan@...el.com
Subject: Re: [PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in
probe_acpi_namespace_devices()

On 2022-07-01 08:19, Baolu Lu wrote:
> On 2022/6/29 21:03, Robin Murphy wrote:
>> On 2019-06-12 01:28, Lu Baolu wrote:
>>> The drhd and device scope list should be iterated with the
>>> iommu global lock held. Otherwise, a suspicious RCU usage
>>> message will be displayed.
>>>
>>> [ 3.695886] =============================
>>> [ 3.695917] WARNING: suspicious RCU usage
>>> [ 3.695950] 5.2.0-rc2+ #2467 Not tainted
>>> [ 3.695981] -----------------------------
>>> [ 3.696014] drivers/iommu/intel-iommu.c:4569 suspicious
>>> rcu_dereference_check() usage!
>>> [ 3.696069]
>>> other info that might help us debug this:
>>>
>>> [ 3.696126]
>>> rcu_scheduler_active = 2, debug_locks = 1
>>> [ 3.696173] no locks held by swapper/0/1.
>>> [ 3.696204]
>>> stack backtrace:
>>> [ 3.696241] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc2+
>>> #2467
>>> [ 3.696370] Call Trace:
>>> [ 3.696404] dump_stack+0x85/0xcb
>>> [ 3.696441] intel_iommu_init+0x128c/0x13ce
>>> [ 3.696478] ? kmem_cache_free+0x16b/0x2c0
>>> [ 3.696516] ? __fput+0x14b/0x270
>>> [ 3.696550] ? __call_rcu+0xb7/0x300
>>> [ 3.696583] ? get_max_files+0x10/0x10
>>> [ 3.696631] ? set_debug_rodata+0x11/0x11
>>> [ 3.696668] ? e820__memblock_setup+0x60/0x60
>>> [ 3.696704] ? pci_iommu_init+0x16/0x3f
>>> [ 3.696737] ? set_debug_rodata+0x11/0x11
>>> [ 3.696770] pci_iommu_init+0x16/0x3f
>>> [ 3.696805] do_one_initcall+0x5d/0x2e4
>>> [ 3.696844] ? set_debug_rodata+0x11/0x11
>>> [ 3.696880] ? rcu_read_lock_sched_held+0x6b/0x80
>>> [ 3.696924] kernel_init_freeable+0x1f0/0x27c
>>> [ 3.696961] ? rest_init+0x260/0x260
>>> [ 3.696997] kernel_init+0xa/0x110
>>> [ 3.697028] ret_from_fork+0x3a/0x50
>>>
>>> Fixes: fa212a97f3a36 ("iommu/vt-d: Probe DMA-capable ACPI name space
>>> devices")
>>> Signed-off-by: Lu Baolu <baolu.lu@...ux.intel.com>
>>> ---
>>> drivers/iommu/intel-iommu.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>> index 19c4c387a3f6..84e650c6a46d 100644
>>> --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -4793,8 +4793,10 @@ int __init intel_iommu_init(void)
>>> cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead",
>>> NULL,
>>> intel_iommu_cpu_dead);
>>> + down_read(&dmar_global_lock);
>>> if (probe_acpi_namespace_devices())
>>> pr_warn("ACPI name space devices didn't probe correctly\n");
>>> + up_read(&dmar_global_lock);
>>
>> Doing a bit of archaeology here, is this actually broken? If any ANDD
>> entries exist, we'd end up doing:
>>
>> down_read(&dmar_global_lock)
>> probe_acpi_namespace_devices()
>> -> iommu_probe_device()
>> -> iommu_create_device_direct_mappings()
>> -> iommu_get_resv_regions()
>> -> intel_iommu_get_resv_regions()
>> -> down_read(&dmar_global_lock)
>>
>> I'm wondering whether this might explain why my bus_set_iommu series
>> prevented Baolu's machine from booting, since "iommu: Move bus setup
>> to IOMMU device registration" creates the same condition where we end
>> up in get_resv_regions (via bus_iommu_probe() this time) from the same
>> task that already holds dmar_global_lock. Of course that leaves me
>> wondering how it *did* manage to boot OK on my Xeon box, but maybe
>> there's a config difference or dumb luck at play?
>
> This is really problematic. Where is the latest bus_set_iommu series
> located? I'd like to take a closer look at what happened here. Perhaps
> in two weeks? I'm busy preparing Intel IOMMU patches for v5.20
> these days.

I've prepared an up-to-date series here:
https://gitlab.arm.com/linux-arm/linux-rm/-/tree/bus-set-iommu-v3
but I've been hesitant to post it without trying to make *some* progress
on your breakage. I think last time I was just testing with
x86_64_defconfig, so I'll double-check it with lockdep this afternoon.
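
For illustration, the nested down_read() in the call chain quoted above can be modelled in userspace. This is only a toy sketch, not kernel code: struct toy_rwsem and every toy_*() helper are hypothetical names, and the model assumes a fair reader/writer semaphore in which new readers queue behind a waiting writer. Under that assumption the nested read is granted when no writer is queued and blocks forever when one is, which is one possible reading of why the same call chain could boot on one machine and hang on another.

```c
#include <stdbool.h>

/* Toy, single-threaded model of a fair rwsem (hypothetical, not kernel code). */
struct toy_rwsem {
	int readers;         /* read holds currently granted */
	bool writer_waiting; /* a writer is queued behind the readers */
};

/* Try to take the semaphore for reading.
 * Returns false where a real task would have to sleep. */
static bool toy_down_read(struct toy_rwsem *sem)
{
	/* Assumed fairness rule: new readers wait behind a queued writer. */
	if (sem->writer_waiting)
		return false;
	sem->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	sem->readers--;
}

/* A writer shows up; it must wait while any reader holds the semaphore. */
static void toy_writer_arrives(struct toy_rwsem *sem)
{
	if (sem->readers > 0)
		sem->writer_waiting = true;
}

/* Model one task taking the read side twice, as in the quoted call chain.
 * Returns true if the nested acquisition is granted, false if the task
 * would block on a semaphore it already holds (self-deadlock). */
static bool toy_nested_read(bool writer_between)
{
	struct toy_rwsem sem = { 0, false };
	bool nested_ok;

	toy_down_read(&sem);             /* outer: intel_iommu_init() */
	if (writer_between)
		toy_writer_arrives(&sem);    /* another task wants the write side */
	nested_ok = toy_down_read(&sem); /* nested: intel_iommu_get_resv_regions() */

	if (nested_ok)
		toy_up_read(&sem);
	toy_up_read(&sem);
	return nested_ok;
}
```

In this model toy_nested_read(false) is granted and toy_nested_read(true) is not. The model says nothing about whether a writer can actually be queued at that point during boot; that is the part a lockdep-enabled run should answer.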

Thanks,
Robin.