Message-ID: <7f1797b1-cd50-3c8d-59ff-8ce82ef1adb4@arm.com>
Date: Thu, 8 Jun 2023 19:02:06 +0100
From: Robin Murphy <robin.murphy@....com>
To: Alexander Duyck <alexander.duyck@...il.com>,
Ashok Raj <ashok_raj@...ux.intel.com>
Cc: Baolu Lu <baolu.lu@...ux.intel.com>,
LKML <linux-kernel@...r.kernel.org>,
linux-pci <linux-pci@...r.kernel.org>, iommu@...ts.linux.dev,
Ashok Raj <ashok.raj@...el.com>
Subject: Re: Question about reserved_regions w/ Intel IOMMU

On 2023-06-08 18:10, Alexander Duyck wrote:
> On Thu, Jun 8, 2023 at 8:40 AM Ashok Raj <ashok_raj@...ux.intel.com> wrote:
>>
>> On Thu, Jun 08, 2023 at 07:33:31AM -0700, Alexander Duyck wrote:
>>> On Wed, Jun 7, 2023 at 8:05 PM Baolu Lu <baolu.lu@...ux.intel.com> wrote:
>>>>
>>>> On 6/8/23 7:03 AM, Alexander Duyck wrote:
>>>>> On Wed, Jun 7, 2023 at 3:40 PM Alexander Duyck
>>>>> <alexander.duyck@...il.com> wrote:
>>>>>>
>>>>>> I am running into a DMA issue that appears to be a conflict between
>>>>>> ACS and IOMMU. As per the documentation I can find, the IOMMU is
>>>>>> supposed to create reserved regions for MSI and the memory window
>>>>>> behind the root port. However looking at reserved_regions I am not
>>>>>> seeing that. I only see the reservation for the MSI.
>>>>>>
>>>>>> So for example with an enabled NIC and iommu enabled w/o passthru I am seeing:
>>>>>> # cat /sys/bus/pci/devices/0000\:83\:00.0/iommu_group/reserved_regions
>>>>>> 0x00000000fee00000 0x00000000feefffff msi
>>>>>>
>>>>>> Shouldn't there also be a memory window for the region behind the root
>>>>>> port to prevent any possible peer-to-peer access?
>>>>>
>>>>> Since the iommu portion of the email bounced I figured I would fix
>>>>> that and provide some additional info.
>>>>>
>>>>> I added some instrumentation to the kernel to dump the resources found
>>>>> in iova_reserve_pci_windows. From what I can tell it is finding the
>>>>> correct resources for the Memory and Prefetchable regions behind the
>>>>> root port. It seems to be calling reserve_iova which is successfully
>>>>> allocating an iova to reserve the region.
>>>>>
>>>>> However still no luck on why it isn't showing up in reserved_regions.
>>>>
>>>> Perhaps I can ask the opposite question, why it should show up in
>>>> reserved_regions? Why does the iommu subsystem block any possible
>>>> peer-to-peer DMA access? Isn't that a decision of the device driver?
>>>>
>>>> The iova_reserve_pci_windows() you've seen is for kernel DMA interfaces
>>>> which is not related to peer-to-peer accesses.
>>>
>>> The problem is if the IOVA overlaps with the physical addresses of
>>> other devices that can be routed to via ACS redirect. As such if ACS
>>> redirect is enabled a host IOVA could be directed to another device on
>>> the switch instead. To prevent that we need to reserve those addresses
>>> to avoid address space collisions.
>
> Our test case is just to perform DMA to/from the host on one device on
> a switch and what we are seeing is that when we hit an IOVA that
> matches up with the physical address of the neighboring device's BAR0
> then we are seeing an AER followed by a hot reset.
>
>> Any untranslated address from a device must be forwarded to the IOMMU when
>> ACS is enabled, correct? I guess if you want true p2p, then you would need
>> to map so that the hpa turns into the peer address, but it's always a round
>> trip to IOMMU.
>
> This assumes all parts are doing the Request Redirect "correctly". In
> our case there is a PCIe switch we are trying to debug and we have a
> few working theories. One concern I have is that the switch may be
> throwing an ACS violation for us using an address that matches a
> neighboring device instead of redirecting it to the upstream port. If
> we pull the switch and just run on the root complex the issue seems to
> be resolved so I started poking into the code which led me to the
> documentation pointing out what is supposed to be reserved based on
> the root complex and MSI regions.
>
> As a part of going down that rabbit hole I realized that the
> reserved_regions seems to only list the MSI reservation. However after
> digging a bit deeper it seems like there is code to reserve the memory
> behind the root complex in the IOVA but it doesn't look like that is
> visible anywhere and is the piece I am currently trying to sort out.
> What I am working on is trying to figure out if the system that is
> failing is actually reserving that memory region in the IOVA, or if
> that is somehow not happening in our test setup.

How old's the kernel? Before 5.11, intel-iommu wasn't hooked up to
iommu-dma so didn't do quite the same thing - it only reserved whatever
specific PCI memory resources existed at boot, rather than the whole
window as iommu-dma does. Either way, ftrace on reserve_iova() (or just
whack a print in there) should suffice to see what's happened.
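
A minimal sketch of the ftrace route, assuming tracefs is mounted at
/sys/kernel/tracing (older setups use /sys/kernel/debug/tracing) and that
reserve_iova is listed in available_filter_functions. The TRACEFS variable
and the trace_reserve_iova helper are names made up for illustration. Note
that this only catches calls made after tracing is switched on; if the
reservations happen during boot, putting "ftrace=function
ftrace_filter=reserve_iova" on the kernel command line may be needed instead.

```shell
# Location of tracefs; override TRACEFS if it is mounted elsewhere (assumption).
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

# Hypothetical helper: restrict the function tracer to reserve_iova().
# Needs root when pointed at the real tracefs.
trace_reserve_iova() {
    # Bail out if the function is not traceable on this kernel.
    if ! grep -qw reserve_iova "$TRACEFS/available_filter_functions"; then
        echo "reserve_iova is not traceable here" >&2
        return 1
    fi
    echo 0 > "$TRACEFS/tracing_on"                  # pause while reconfiguring
    echo reserve_iova > "$TRACEFS/set_ftrace_filter" # trace only this function
    echo function > "$TRACEFS/current_tracer"        # plain function tracer
    echo 1 > "$TRACEFS/tracing_on"                  # start tracing
}
```

After calling trace_reserve_iova and triggering a fresh domain setup (for
example by rebinding the device's driver), "cat $TRACEFS/trace" should show
one entry per call. The function tracer does not record the arguments, so
for the actual ranges a print in reserve_iova() is still the simpler option.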
Robin.
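
For the reserved_regions side of the question, a rough sketch of checking
whether a given address (say, a neighbouring device's BAR0) falls inside
anything the group reports. It assumes the "<start> <end> <type>" line
format shown in the output quoted above, with inclusive hex bounds. The
addr_is_reserved helper is hypothetical, and it says nothing about ranges
that are reserved only in the IOVA allocator, which is the case being
discussed here.

```shell
# Hypothetical helper: succeed if ADDR lies inside any range in FILE.
# Usage: addr_is_reserved FILE ADDR    (ADDR in hex, e.g. 0xfee01000)
# The body runs in a subshell so its variables do not leak to the caller.
# Shell arithmetic is signed 64-bit, so addresses at or above 2^63 are not
# handled correctly.
addr_is_reserved() (
    file=$1
    addr=$(( $2 ))
    while read -r start end type; do
        [ -n "$start" ] || continue          # skip blank lines
        if [ "$addr" -ge "$(( $start ))" ] && [ "$addr" -le "$(( $end ))" ]; then
            echo "$2 is inside $start-$end ($type)"
            return 0
        fi
    done < "$file"
    return 1
)
```

Typical use would be something like
"addr_is_reserved /sys/bus/pci/devices/0000:83:00.0/iommu_group/reserved_regions 0x<BAR0 address>",
with the BAR address taken from lspci -v or the device's resource file.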