Message-ID: <809b0d12-c5ce-2364-268f-f0c4564414c9@arm.com>
Date: Fri, 24 Jun 2022 15:49:47 +0100
From: Robin Murphy <robin.murphy@....com>
To: Joerg Roedel <joro@...tes.org>
Cc: will@...nel.org, iommu@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org, hch@....de, john.garry@...wei.com
Subject: Re: [PATCH v2] iommu/dma: Add config for PCI SAC address trick

On 2022-06-24 14:28, Joerg Roedel wrote:
> On Thu, Jun 23, 2022 at 12:41:00PM +0100, Robin Murphy wrote:
>> On 2022-06-23 12:33, Joerg Roedel wrote:
>>> On Wed, Jun 22, 2022 at 02:12:39PM +0100, Robin Murphy wrote:
>>>> Thanks for your bravery!
>>>
>>> It already starts, with that patch I am getting:
>>>
>>> xhci_hcd 0000:02:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0xff00ffffffefe000 flags=0x0000]
>>>
>>> In my kernel log. The device is an AMD XHCI controller and seems to
>>> function normally after boot. The message disappears with
>>> iommu.forcedac=0.
>>>
>>> Need to look more into that...
>>
>> Given how amd_iommu_domain_alloc() sets the domain aperture, presumably the
>> DMA address allocated was 0xffffffffffefe000? Odd that it gets bits punched
>> out in the middle rather than simply truncated off the top as I would have
>> expected :/
>
> So even more weird, as a workaround I changed the AMD IOMMU driver to
> allocate a 4-level page-table and limit the DMA aperture to 48 bits. I
> still get the same message.

Hmm, in that case my best guess would be that somewhere between the
device itself and the IOMMU input it's trying to sign-extend the address
from bit 47 or lower, but for whatever reason bits 55:48 get lost.
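
To make that guess concrete, here is a minimal sketch of the arithmetic. It assumes (hypothetically, this value is not in the log) that the IOVA handed out under the 48-bit aperture was 0x0000ffffffefe000; sign_extend() and clear_bits() are illustrative helpers, not kernel functions. Sign-extending from bit 47 and then dropping bits 55:48 reproduces the address in the fault message:

```python
MASK64 = (1 << 64) - 1

def sign_extend(addr, sign_bit):
    """Replicate bit `sign_bit` of addr into every higher bit of a 64-bit value."""
    low_mask = (1 << (sign_bit + 1)) - 1
    value = addr & low_mask
    if value & (1 << sign_bit):
        value |= MASK64 & ~low_mask
    return value

def clear_bits(addr, hi, lo):
    """Clear bits hi..lo (inclusive) of a 64-bit value."""
    mask = ((1 << (hi - lo + 1)) - 1) << lo
    return addr & ~mask & MASK64

# Assumption: the IOVA allocated within the 48-bit aperture.
iova = 0x0000ffffffefe000

extended = sign_extend(iova, 47)          # what a sign-extending agent would emit
faulting = clear_bits(extended, 55, 48)   # what is left if bits 55:48 are lost

print(hex(extended))  # 0xffffffffffefe000
print(hex(faulting))  # 0xff00ffffffefe000
```

The second value matches the address in the logged IO_PAGE_FAULT, which is consistent with the guess but of course does not prove where the bits go missing.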

Comparing the PCI xHCI I have to hand, mine (with nothing plugged in)
only has 6 pages mapped for its command ring and other stuff. Thus
unless it's sharing that domain with other devices, to be accessing
something down in the second MB of IOVA space suggests that this
probably isn't the very first access it's made, and therefore it would
almost certainly have to be the endpoint emitting a corrupted address,
but only for certain operations.
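
As a rough illustration of the "second MB" remark, the sketch below works out how far below the top of the aperture the address sits. It assumes a 48-bit aperture, 4 KiB pages, a hypothetical IOVA of 0x0000ffffffefe000 (inferred from the fault address, not taken from the log), and that IOVAs are handed out from the top of the space downwards; offset_below_top() is an illustrative helper only:

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages

def offset_below_top(iova, aperture_bits):
    """Distance in bytes from iova up to the end of the aperture."""
    return (1 << aperture_bits) - iova

# Assumption: the IOVA behind the faulting address.
iova = 0x0000ffffffefe000

offset = offset_below_top(iova, 48)
pages_from_top = offset >> PAGE_SHIFT
mib_index = offset >> 20  # 0 = topmost MiB, 1 = second MiB, ...

print(hex(offset))      # 0x102000
print(pages_from_top)   # 258
print(mib_index)        # 1
```

Six pages allocated from the top would cover only the last 24 KiB, so under these assumptions an address 258 pages down would belong to a later mapping.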

FWIW I'd be inclined to turn on DMA debug and call
debug_dma_dump_mappings() from the IOMMU fault handler, and/or add a bit
of tracing to all the DMA mapping/allocation sites in the xHCI driver,
to see what the offending address most likely represents.

Robin.