Message-ID: <CAKgT0UfTzExYZGNCEXCJaS7huWDxwoC3Z_2JCzJHAgr9Qyxmsg@mail.gmail.com>
Date:   Thu, 8 Jun 2023 10:10:54 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Ashok Raj <ashok_raj@...ux.intel.com>
Cc:     Baolu Lu <baolu.lu@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-pci <linux-pci@...r.kernel.org>, iommu@...ts.linux.dev,
        Ashok Raj <ashok.raj@...el.com>
Subject: Re: Question about reserved_regions w/ Intel IOMMU

On Thu, Jun 8, 2023 at 8:40 AM Ashok Raj <ashok_raj@...ux.intel.com> wrote:
>
> On Thu, Jun 08, 2023 at 07:33:31AM -0700, Alexander Duyck wrote:
> > On Wed, Jun 7, 2023 at 8:05 PM Baolu Lu <baolu.lu@...ux.intel.com> wrote:
> > >
> > > On 6/8/23 7:03 AM, Alexander Duyck wrote:
> > > > On Wed, Jun 7, 2023 at 3:40 PM Alexander Duyck
> > > > <alexander.duyck@...il.com> wrote:
> > > >>
> > > >> I am running into a DMA issue that appears to be a conflict between
> > > >> ACS and IOMMU. As per the documentation I can find, the IOMMU is
> > > >> supposed to create reserved regions for MSI and the memory window
> > > >> behind the root port. However looking at reserved_regions I am not
> > > >> seeing that. I only see the reservation for the MSI.
> > > >>
> > > >> So for example with an enabled NIC and iommu enabled w/o passthru I am seeing:
> > > >> # cat /sys/bus/pci/devices/0000\:83\:00.0/iommu_group/reserved_regions
> > > >> 0x00000000fee00000 0x00000000feefffff msi
> > > >>
> > > >> Shouldn't there also be a memory window for the region behind the root
> > > >> port to prevent any possible peer-to-peer access?
> > > >
> > > > Since the iommu portion of the email bounced I figured I would fix
> > > > that and provide some additional info.
> > > >
> > > > I added some instrumentation to the kernel to dump the resources found
> > > > in iova_reserve_pci_windows. From what I can tell it is finding the
> > > > correct resources for the Memory and Prefetchable regions behind the
> > > > root port. It seems to be calling reserve_iova which is successfully
> > > > allocating an iova to reserve the region.
> > > >
> > > > However still no luck on why it isn't showing up in reserved_regions.
> > >
> > > Perhaps I can ask the opposite question: why should it show up in
> > > reserved_regions? Why does the iommu subsystem block any possible peer-
> > > to-peer DMA access? Isn't that a decision of the device driver?
> > >
> > > The iova_reserve_pci_windows() you've seen is for kernel DMA interfaces
> > > which is not related to peer-to-peer accesses.
> >
> > The problem is if the IOVA overlaps with the physical addresses of
> > other devices that can be routed to via ACS redirect. As such if ACS
> > redirect is enabled a host IOVA could be directed to another device on
> > the switch instead. To prevent that we need to reserve those addresses
> > to avoid address space collisions.

Our test case just performs DMA to/from the host on one device behind a
switch. When we hit an IOVA that matches the physical address of the
neighboring device's BAR0, we see an AER followed by a hot reset.
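
To make the collision described above concrete, here is a minimal Python
sketch that checks whether a DMA address range overlaps any BAR of a
neighboring device. It assumes the sysfs PCI `resource` file layout of one
"start end flags" triple per line in hex. The function names and the sample
addresses are made up for illustration and are not taken from the failing
system.

```python
def parse_resource(text):
    """Parse the contents of a sysfs PCI `resource` file.

    Returns a list of (start, end) tuples, skipping unused all-zero entries.
    """
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        start, end = int(fields[0], 16), int(fields[1], 16)
        if start == 0 and end == 0:
            continue  # unpopulated BAR slot
        ranges.append((start, end))
    return ranges


def overlapping_ranges(iova, length, ranges):
    """Return every (start, end) range that intersects [iova, iova + length)."""
    last = iova + length - 1
    return [(s, e) for (s, e) in ranges if iova <= e and s <= last]


# Hypothetical contents of a neighbor's `resource` file (illustrative only).
SAMPLE_RESOURCE = """\
0x00000000c8000000 0x00000000c8ffffff 0x0000000000140204
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000c9000000 0x00000000c900ffff 0x0000000000140204
"""

bars = parse_resource(SAMPLE_RESOURCE)
# A 4 KiB mapping whose IOVA falls inside the first sample BAR.
hits = overlapping_ranges(0xc8001000, 0x1000, bars)
```

On a real system the text would come from reading
/sys/bus/pci/devices/<BDF>/resource for each neighbor on the switch; a
non-empty result for an IOVA handed out by the DMA API would be the case we
are worried about.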

> Any untranslated address from a device must be forwarded to the IOMMU when
> ACS is enabled, correct? I guess if you want true p2p, then you would need
> to map so that the hpa turns into the peer address, but it's always a round
> trip to the IOMMU.

This assumes all parts are doing the Request Redirect "correctly". In
our case there is a PCIe switch we are trying to debug, and we have a
few working theories. One concern I have is that the switch may be
throwing an ACS violation when we use an address that matches a
neighboring device, instead of redirecting the request to the upstream
port. If we pull the switch and just run on the root complex, the issue
seems to be resolved. So I started poking into the code, which led me to
the documentation describing what is supposed to be reserved based on
the root complex and MSI regions.

While going down that rabbit hole I realized that reserved_regions
seems to list only the MSI reservation. After digging a bit deeper, it
looks like there is code to reserve the memory behind the root complex
in the IOVA space, but that reservation doesn't appear to be visible
anywhere, and that is the piece I am currently trying to sort out.
What I am trying to figure out is whether the failing system is actually
reserving that memory region in the IOVA space, or whether that is
somehow not happening in our test setup.
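
For reference, a rough Python sketch of the check I have in mind: parse the
reserved_regions output and ask whether a given window is covered by any
listed entry. The MSI line is the one quoted earlier in this thread; the
window addresses and the helper names are hypothetical. Note that a window
missing from this file does not by itself prove the IOVA allocator skipped
the reservation, since the reservations made in iova_reserve_pci_windows do
not seem to be exported here.

```python
def parse_reserved_regions(text):
    """Parse iommu_group/reserved_regions lines of the form 'start end type'."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        regions.append((int(fields[0], 16), int(fields[1], 16), fields[2]))
    return regions


def is_covered(window_start, window_end, regions):
    """True if a single listed region fully contains [window_start, window_end]."""
    return any(s <= window_start and window_end <= e for s, e, _ in regions)


# Output reported earlier in the thread for the NIC's IOMMU group.
REPORTED = "0x00000000fee00000 0x00000000feefffff msi\n"

regions = parse_reserved_regions(REPORTED)

# Hypothetical memory window behind the root port (illustrative addresses).
window_listed = is_covered(0xc8000000, 0xc8ffffff, regions)
msi_listed = is_covered(0xfee00000, 0xfeefffff, regions)
```

With the output we have today, only the MSI range comes back as covered,
which matches what I am seeing by hand.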
