linux-kernel - Re: [PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ffca1789-1e96-ae01-74a0-942fecb9caac@linux.intel.com>
Date:   Fri, 1 Jul 2022 15:19:14 +0800
From:   Baolu Lu <baolu.lu@...ux.intel.com>
To:     Robin Murphy <robin.murphy@....com>,
        Joerg Roedel <joro@...tes.org>,
        David Woodhouse <dwmw2@...radead.org>
Cc:     baolu.lu@...ux.intel.com, kevin.tian@...el.com,
        ashok.raj@...el.com, linux-kernel@...r.kernel.org,
        iommu@...ts.linux-foundation.org, cai@....pw,
        jacob.jun.pan@...el.com
Subject: Re: [PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in
 probe_acpi_namespace_devices()

On 2022/6/29 21:03, Robin Murphy wrote:
> On 2019-06-12 01:28, Lu Baolu wrote:
>> The drhd and device scope list should be iterated with the
>> iommu global lock held. Otherwise, a suspicious RCU usage
>> message will be displayed.
>>
>> [    3.695886] =============================
>> [    3.695917] WARNING: suspicious RCU usage
>> [    3.695950] 5.2.0-rc2+ #2467 Not tainted
>> [    3.695981] -----------------------------
>> [    3.696014] drivers/iommu/intel-iommu.c:4569 suspicious 
>> rcu_dereference_check() usage!
>> [    3.696069]
>>                 other info that might help us debug this:
>>
>> [    3.696126]
>>                 rcu_scheduler_active = 2, debug_locks = 1
>> [    3.696173] no locks held by swapper/0/1.
>> [    3.696204]
>>                 stack backtrace:
>> [    3.696241] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2467
>> [    3.696370] Call Trace:
>> [    3.696404]  dump_stack+0x85/0xcb
>> [    3.696441]  intel_iommu_init+0x128c/0x13ce
>> [    3.696478]  ? kmem_cache_free+0x16b/0x2c0
>> [    3.696516]  ? __fput+0x14b/0x270
>> [    3.696550]  ? __call_rcu+0xb7/0x300
>> [    3.696583]  ? get_max_files+0x10/0x10
>> [    3.696631]  ? set_debug_rodata+0x11/0x11
>> [    3.696668]  ? e820__memblock_setup+0x60/0x60
>> [    3.696704]  ? pci_iommu_init+0x16/0x3f
>> [    3.696737]  ? set_debug_rodata+0x11/0x11
>> [    3.696770]  pci_iommu_init+0x16/0x3f
>> [    3.696805]  do_one_initcall+0x5d/0x2e4
>> [    3.696844]  ? set_debug_rodata+0x11/0x11
>> [    3.696880]  ? rcu_read_lock_sched_held+0x6b/0x80
>> [    3.696924]  kernel_init_freeable+0x1f0/0x27c
>> [    3.696961]  ? rest_init+0x260/0x260
>> [    3.696997]  kernel_init+0xa/0x110
>> [    3.697028]  ret_from_fork+0x3a/0x50
>>
>> Fixes: fa212a97f3a36 ("iommu/vt-d: Probe DMA-capable ACPI name space 
>> devices")
>> Signed-off-by: Lu Baolu <baolu.lu@...ux.intel.com>
>> ---
>>   drivers/iommu/intel-iommu.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 19c4c387a3f6..84e650c6a46d 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -4793,8 +4793,10 @@ int __init intel_iommu_init(void)
>>       cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead", NULL,
>>                 intel_iommu_cpu_dead);
>> +    down_read(&dmar_global_lock);
>>       if (probe_acpi_namespace_devices())
>>           pr_warn("ACPI name space devices didn't probe correctly\n");
>> +    up_read(&dmar_global_lock);
> 
> Doing a bit of archaeology here, is this actually broken? If any ANDD 
> entries exist, we'd end up doing:
> 
>    down_read(&dmar_global_lock)
>    probe_acpi_namespace_devices()
>    -> iommu_probe_device()
>       -> iommu_create_device_direct_mappings()
>          -> iommu_get_resv_regions()
>             -> intel_iommu_get_resv_regions()
>                -> down_read(&dmar_global_lock)
> 
> I'm wondering whether this might explain why my bus_set_iommu series 
> prevented Baolu's machine from booting, since "iommu: Move bus setup to 
> IOMMU device registration" creates the same condition where we end up in 
> get_resv_regions (via bus_iommu_probe() this time) from the same task 
> that already holds dmar_global_lock. Of course that leaves me wondering 
> how it *did* manage to boot OK on my Xeon box, but maybe there's a 
> config difference or dumb luck at play?

This is really problematic. Where does the latest bus_set_iommu series
locate? I'd like to take a closer look at what happened here. Perhaps
two weeks later? I'm busy with preparing Intel IOMMU patches for v5.20
these days.

Best regards,
baolu