Message-ID: <20150501044443.GO24886@voom.redhat.com>
Date: Fri, 1 May 2015 14:44:43 +1000
From: David Gibson <david@...son.dropbear.id.au>
To: Benjamin Herrenschmidt <benh@...nel.crashing.org>
Cc: Alexey Kardashevskiy <aik@...abs.ru>,
linuxppc-dev@...ts.ozlabs.org, Paul Mackerras <paulus@...ba.org>,
Alex Williamson <alex.williamson@...hat.com>,
Gavin Shan <gwshan@...ux.vnet.ibm.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple
groups in one container if possible
On Fri, May 01, 2015 at 10:46:08AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2015-04-30 at 19:33 +1000, Alexey Kardashevskiy wrote:
> > On 04/30/2015 05:22 PM, David Gibson wrote:
> > > On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
> > >> At the moment only one group per container is supported.
> > >> POWER8 CPUs have a more flexible design and allow having 2 TCE tables per
> > >> IOMMU group so we can relax this limitation and support multiple groups
> > >> per container.
> > >
> > > It's not obvious why allowing multiple TCE tables per PE has any
> > > bearing on allowing multiple groups per container.
> >
> >
> > This patchset is a global TCE tables rework (patches 1..30, roughly) with 2
> > outcomes:
> > 1. reusing the same IOMMU table for multiple groups - patch 31;
> > 2. allowing dynamic create/remove of IOMMU tables - patch 32.
> >
> > I can remove this one from the patchset and post it separately later but
> > since 1..30 aim to support both 1) and 2), I'd think I better keep them all
> > together (might explain some of changes I do in 1..30).
>
> I think you are talking past each other :-)
>
> But yes, having 2 tables per group is orthogonal to the ability of
> having multiple groups per container.
>
> The latter is made possible on P8 in large part because each PE has its
> own DMA address space (unlike P5IOC2 or P7IOC where a single address
> space is segmented).
>
> Also, on P8 you can actually make the TVT entries point to the same
> table in memory, thus removing the need to duplicate the actual
> tables (though you still have to duplicate the invalidations). I would
> however recommend only sharing the table that way within a chip/node.
>
> .../..
>
> > >>
> > >> -1) Only one IOMMU group per container is supported as an IOMMU group
> > >> -represents the minimal entity which isolation can be guaranteed for and
> > >> -groups are allocated statically, one per a Partitionable Endpoint (PE)
> > >> +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
> > >> +container is supported as an IOMMU table is allocated at the boot time,
> > >> +one table per a IOMMU group which is a Partitionable Endpoint (PE)
> > >> (PE is often a PCI domain but not always).
>
> > > I thought the more fundamental problem was that different PEs tended
> > > to use disjoint bus address ranges, so even by duplicating put_tce
> > > across PEs you couldn't have a common address space.
>
> Yes. This is the problem with P7IOC and earlier. It *could* be doable on
> P7IOC by making them the same PE but let's not go there.
>
> > Sorry, I am not following you here.
> >
> > By duplicating put_tce, I can have multiple IOMMU groups on the same
> > virtual PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple
> > groups per container" does this, the address ranges will be the same.
>
> But that is only possible on P8 because only there do we have separate
> address spaces between PEs.
>
> > What I cannot do on p5ioc2 is programming the same table to multiple
> > physical PHBs (or I could but it is very different than IODA2 and pretty
> > ugly and might not always be possible because I would have to allocate
> > these pages from some common pool and face problems like fragmentation).
>
> And P7IOC has a similar issue. The DMA address top bits index the
> window on P7IOC within a shared address space. It's possible to
> configure a TVT to cover multiple devices but with very serious
> limitations.

Ok. To check my understanding, does this sound reasonable:

  * The table_group more-or-less represents a PE, but in a way you can
    reference without first knowing the specific IOMMU hardware type.

  * When attaching multiple groups to the same container, the first PE
    (i.e. table_group) attached is used as a representative so that
    subsequent groups can be checked for compatibility with the first
    PE and therefore with all PEs currently included in the container.

      - This is why the table_group appears in some places where it
        doesn't seem sensible from a pure object ownership point of
        view.
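
To make the second point concrete, here is a rough sketch of the "first PE as representative" check as I understand it. This is not the code from the patch: the struct names, the fields being compared and the fixed-size array are all hypothetical simplifications, and the real compatibility criteria may well differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a PE's table_group. */
struct table_group_sketch {
	unsigned int max_dynamic_windows; /* assumed capability field */
	unsigned int max_levels;          /* assumed capability field */
	unsigned long pgsizes;            /* assumed bitmask of page sizes */
};

#define SKETCH_MAX_GROUPS 8

/* Hypothetical container: just remembers which groups are attached. */
struct container_sketch {
	const struct table_group_sketch *groups[SKETCH_MAX_GROUPS];
	size_t ngroups;
};

/* Two groups are treated as compatible if their capabilities match. */
static bool groups_compatible(const struct table_group_sketch *a,
			      const struct table_group_sketch *b)
{
	return a->max_dynamic_windows == b->max_dynamic_windows &&
	       a->max_levels == b->max_levels &&
	       a->pgsizes == b->pgsizes;
}

/*
 * Attach a group. Only groups[0] is compared against the newcomer:
 * every group already attached passed the same check against it, so
 * (with an equality-style test) matching the first implies matching all.
 * Returns 0 on success, -1 if the container is full or incompatible.
 */
static int container_attach(struct container_sketch *c,
			    const struct table_group_sketch *tg)
{
	if (c->ngroups == SKETCH_MAX_GROUPS)
		return -1;
	if (c->ngroups > 0 && !groups_compatible(c->groups[0], tg))
		return -1;
	c->groups[c->ngroups++] = tg;
	return 0;
}
```

Note that the "check only the first" shortcut is only sound if compatibility is transitive, which holds for the equality test assumed here.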
--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson