linux-kernel - Re: I got an IOMMU IO page fault. What to do now?

Message-ID: <82fccb9d-43e8-4485-0ddb-7ff260f3ed32@arm.com>
Date:   Wed, 27 Oct 2021 18:18:54 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     Paul Menzel <pmenzel@...gen.mpg.de>
Cc:     x86@...nel.org, Xinhui Pan <Xinhui.Pan@....com>,
        LKML <linux-kernel@...r.kernel.org>,
        amd-gfx@...ts.freedesktop.org, iommu@...ts.linux-foundation.org,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Alex Deucher <alexander.deucher@....com>,
        it+linux-iommu@...gen.mpg.de, Thomas Gleixner <tglx@...utronix.de>,
        Christian König <christian.koenig@....com>,
        Christian König <ckoenig.leichtzumerken@...il.com>,
        Jörg Rödel <joro@...tes.org>,
        Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Subject: Re: I got an IOMMU IO page fault. What to do now?

On 27/10/2021 5:45 pm, Paul Menzel wrote:
> Dear Robin,
> 
> 
> On 25.10.21 18:01, Robin Murphy wrote:
>> On 2021-10-25 12:23, Christian König wrote:
> 
>>> not sure how the IOMMU gives out addresses, but the printed ones look 
>>> suspicious to me. Something like we are using an invalid address like 
>>> -1 or similar.
>>
>> FWIW those look like believable DMA addresses to me, assuming that the 
>> DMA mapping APIs are being backed by iommu_dma_ops and the device has a 
>> 40-bit DMA mask, since the IOVA allocator works top-down.
>>
>> Likely causes are either a race where the dma_unmap_*() call happens 
>> before the hardware has really stopped accessing the relevant 
>> addresses, or the device's DMA mask has been set larger than it should 
>> be, and thus the upper bits have been truncated in the round-trip 
>> through the hardware.
>>
>> Given the addresses involved, my suspicions would initially lean 
>> towards the latter case - the faults are in the very topmost pages 
>> which imply they're the first things mapped in that range. The other 
>> contributing factor is the trick that the IOVA allocator plays for 
>> PCI devices, where it tries to prefer 32-bit addresses. Thus you're 
>> only likely to see this happen once you already have ~3.5-4GB of live 
>> DMA-mapped memory to exhaust the 32-bit IOVA space (minus some 
>> reserved areas) and start allocating from the full DMA mask. You 
>> should be able to check that with a 5.13 or newer kernel by booting 
>> with "iommu.forcedac=1" and seeing if it breaks immediately 
>> (unfortunately with an older kernel you'd have to manually hack 
>> iommu_dma_alloc_iova() to the same effect).
> 
> I booted Linux 5.15-rc7 with `iommu.forcedac=1` and the system booted, 
> and I could log in remotely over SSH. Please find the Linux kernel 
> messages attached. (The system logs say lightdm failed to start, but it 
> might be some other issue due to a change in the operating system.)
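A toy model may make the allocation policy in the quoted message more concrete. This is a hypothetical simplification in C, not the kernel's iommu_dma_alloc_iova(): the names toy_iova_space, toy_init() and toy_alloc() are invented for illustration. It ignores freeing, reserved regions and the real free-range tree, and it assumes 4 KiB pages and a 40-bit DMA mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Modelled on the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))
#define TOY_PAGE_SIZE 4096ULL
#define TOY_4GIB (DMA_BIT_MASK(32) + 1)

/*
 * Hypothetical toy model, NOT the kernel's IOVA allocator: one top-down
 * bump cursor below 4 GiB and one between 4 GiB and the DMA mask.
 */
struct toy_iova_space {
	uint64_t next_32;   /* lowest IOVA handed out so far below 4 GiB */
	uint64_t next_full; /* lowest IOVA handed out so far at/above 4 GiB */
};

static void toy_init(struct toy_iova_space *s, unsigned int mask_bits)
{
	/* Masks of 64 bits would overflow the +1 below; keep the toy simple. */
	assert(mask_bits > 32 && mask_bits < 64);
	s->next_32 = TOY_4GIB;
	s->next_full = DMA_BIT_MASK(mask_bits) + 1;
}

/*
 * Allocate size bytes (rounded up to whole pages) and return the IOVA,
 * or 0 if both ranges are exhausted. A 32-bit address is tried first
 * unless forcedac is set, mirroring the policy described above.
 */
static uint64_t toy_alloc(struct toy_iova_space *s, uint64_t size, bool forcedac)
{
	assert(size > 0);
	size = (size + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);

	/* Prefer 32-bit addresses; page 0 is never handed out. */
	if (!forcedac && s->next_32 >= size + TOY_PAGE_SIZE) {
		s->next_32 -= size;
		return s->next_32;
	}
	/* Fall back to the full DMA mask, still allocating top-down. */
	if (s->next_full - TOY_4GIB >= size) {
		s->next_full -= size;
		return s->next_full;
	}
	return 0;
}
```

With a 40-bit mask, a fresh toy space returns 0xfffff000 for the first page, but 0xfffffff000 when forcedac is set; without forcedac it only reaches 0xfffffff000 once the 32-bit range is used up. That is why a bad DMA mask can stay hidden until roughly 4 GiB of IOVA space is live, and why iommu.forcedac=1 should expose it straight away.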

OK, that looks like it's made the GPU blow up straight away, which is 
what I was hoping for (and also appears to reveal another bug where it's 
not handling probe failure very well - possibly trying to remove a 
non-existent audio device?). Lightdm presumably fails to start because 
it doesn't find any display devices, since amdgpu failed to probe.

If you can boot the same kernel without "iommu.forcedac" and get a 
successful probe and working display, that will imply that it is 
managing to work OK with 32-bit DMA addresses, at which point I'd have 
to leave it to Christian and Alex to figure out exactly where DMA 
addresses are getting mangled. The only thing that stands out to me is 
the reference to "gfx_v6_0", which makes me wonder whether it's related 
to gmc_v6_0_sw_init() where a 44-bit DMA mask gets set. If so, that 
would suggest that either this particular model of GPU is more limited 
than expected, or that SoC only has 40 bits of address wired up between 
the PCI host bridge and the IOMMU.
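The truncation theory can be checked with a little address arithmetic. The sketch below is hypothetical: truncate_to_wired_bits() and top_page() are invented helpers, DMA_BIT_MASK() is modelled on the kernel macro of the same name, and it assumes 4 KiB pages, the 44-bit DMA mask set in gmc_v6_0_sw_init(), and only 40 address bits actually wired up between the GPU and the IOMMU.

```c
#include <assert.h>
#include <stdint.h>

/* Modelled on the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical model of an address travelling over a path that only
 * carries the low wired_bits bits: the upper bits are silently dropped.
 */
static uint64_t truncate_to_wired_bits(uint64_t dma_addr, unsigned int wired_bits)
{
	return dma_addr & DMA_BIT_MASK(wired_bits);
}

/*
 * Topmost page-aligned IOVA under a mask_bits-wide DMA mask, i.e. where
 * a top-down allocator places its first allocation in that range.
 */
static uint64_t top_page(unsigned int mask_bits, uint64_t page_size)
{
	return DMA_BIT_MASK(mask_bits) & ~(page_size - 1);
}
```

Under those assumptions, the first allocation above 4 GiB (0xffffffff000) arrives at the IOMMU as 0xfffffff000, the topmost page of a 40-bit space, which the allocator has not mapped. Any IOVA below 4 GiB passes through unchanged, which is consistent with the device working until the 32-bit space runs out.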

Cheers,
Robin.