Message-ID: <YJzAsBNF1irJxRGg@yekko>
Date: Thu, 13 May 2021 16:01:20 +1000
From: David Gibson <david@...son.dropbear.id.au>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Alex Williamson <alex.williamson@...hat.com>,
"Liu, Yi L" <yi.l.liu@...el.com>,
Jacob Pan <jacob.jun.pan@...ux.intel.com>,
Auger Eric <eric.auger@...hat.com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
"Tian, Kevin" <kevin.tian@...el.com>,
LKML <linux-kernel@...r.kernel.org>,
Joerg Roedel <joro@...tes.org>,
Lu Baolu <baolu.lu@...ux.intel.com>,
David Woodhouse <dwmw2@...radead.org>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
Tejun Heo <tj@...nel.org>, Li Zefan <lizefan@...wei.com>,
Johannes Weiner <hannes@...xchg.org>,
Jean-Philippe Brucker <jean-philippe@...aro.com>,
Jonathan Corbet <corbet@....net>,
"Raj, Ashok" <ashok.raj@...el.com>, "Wu, Hao" <hao.wu@...el.com>,
"Jiang, Dave" <dave.jiang@...el.com>,
Alexey Kardashevskiy <aik@...abs.ru>
Subject: Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and
allocation APIs
On Tue, May 04, 2021 at 03:15:37PM -0300, Jason Gunthorpe wrote:
> On Tue, May 04, 2021 at 01:54:55PM +1000, David Gibson wrote:
> > On Mon, May 03, 2021 at 01:05:30PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Apr 29, 2021 at 01:20:22PM +1000, David Gibson wrote:
> > > > > There is a certain appeal to having some
> > > > > 'PPC_TCE_CREATE_SPECIAL_IOASID' entry point that has a wack of extra
> > > > > information like windows that can be optionally called by the viommu
> > > > > driver and it remains well defined and described.
> > > >
> > > > Windows really aren't ppc specific. They're absolutely there on x86
> > > > and everything else as well - it's just that people are used to having
> > > > a window at 0..<something largish> that you can often get away with
> > > > treating it sloppily.
> > >
> > > My point is this detailed control seems to go on to more than just
> > > windows. As you say the vIOMMU is emulating specific HW that needs to
> > > have kernel interfaces to match it exactly.
> >
> > It's really not that bad. The case of emulating the PAPR vIOMMU on
> > something else is relatively easy, because all updates to the IO page
> > tables go through hypercalls. So, as long as the backend IOMMU can
> > map all the IOVAs that the guest IOMMU can, then qemu's implementation
> > of those hypercalls just needs to put an equivalent mapping in the
> > backend, which it can do with a generic VFIO_DMA_MAP.
>
> So you also want the PAPR vIOMMU driver to run on, say, an ARM IOMMU?
Well, I don't want to preclude it in the API. I'm not sure about that
specific example, but in most cases it should be possible to run the
PAPR vIOMMU on an x86 IOMMU backend. Obviously only something you'd
want to do for testing and experimentation, but it could be quite
useful for that.
> > vIOMMUs with page tables in guest memory are harder, but only really
> > in the usual ways that a vIOMMU of that type is harder (needs cache
> > mode or whatever). At whatever point you need to shadow from the
> > guest IO page tables to the host backend, you can again do that with
> > generic maps, as long as the backend supports the necessary IOVAs, and
> > has an IO page size that's equal to or a submultiple of the vIOMMU
> > page size.
>
> But this definitely all becomes HW specific.
>
> For instance I want to have an ARM vIOMMU driver it needs to do some
>
> ret = ioctl(ioasid_fd, CREATE_NESTED_IOASID, [page table format is ARMvXXX])
> if (ret == -EOPNOTSUPP)
> ret = ioctl(ioasid_fd, CREATE_NORMAL_IOASID, ..)
> // and do completely different and more expensive emulation
>
> I can get a little bit of generality, but at the end of the day the
> IOMMU must create a specific HW layout of the nested page table, if it
> can't, it can't.
Erm.. I don't really know how your IOASID interface works here. I'm
thinking about the VFIO interface where maps and unmaps are via
explicit ioctl()s, which provides an obvious point to do translation
between page table formats.
But.. even if you're exposing page tables to userspace.. with hardware
that has explicit support for nesting you can probably expose the hw
tables directly, which is great for the cases where that works. But
surely for older IOMMUs which don't do nesting you must have some way
of shadowing guest IO page tables to host IO page tables, to translate
GPA to HPA at least? If you're doing that, I don't see that
converting page table format is really any harder.
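
As a rough illustration of the shadowing step, here is a minimal C sketch
of turning one decoded guest IO page table entry into a mapping a generic
backend map call could consume. Everything in it is hypothetical: the
struct names, shadow_guest_entry() and the flat RAM region table are
made up for the example and are not part of VFIO, the IOASID proposal or
qemu. It assumes the guest entry has already been decoded from whatever
format the vIOMMU uses, which is where any format conversion would happen.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one guest IO page table entry, already decoded (IOVA -> GPA). */
struct guest_iopte {
    uint64_t iova;
    uint64_t gpa;
    uint64_t size;          /* guest IO page size in bytes */
};

/* Hypothetical: one contiguous chunk of guest RAM (GPA -> host address). */
struct ram_region {
    uint64_t gpa;
    uint64_t hpa;
    uint64_t size;
};

/* Hypothetical: what would be handed to a generic backend map operation. */
struct backend_map {
    uint64_t iova;
    uint64_t hpa;
    uint64_t size;
};

/* Translate a guest physical range to a host address, or fail with -EFAULT. */
static int gpa_to_hpa(const struct ram_region *ram, size_t nr_ram,
                      uint64_t gpa, uint64_t size, uint64_t *hpa)
{
    for (size_t i = 0; i < nr_ram; i++) {
        if (gpa >= ram[i].gpa && size <= ram[i].size &&
            gpa - ram[i].gpa <= ram[i].size - size) {
            *hpa = ram[i].hpa + (gpa - ram[i].gpa);
            return 0;
        }
    }
    return -EFAULT;
}

/*
 * Shadow one guest entry. The backend IO page size must be equal to or a
 * submultiple of the guest IO page size, otherwise the entry cannot be
 * expressed with backend mappings and we return -EINVAL.
 */
static int shadow_guest_entry(const struct guest_iopte *e,
                              const struct ram_region *ram, size_t nr_ram,
                              uint64_t backend_pgsize,
                              struct backend_map *out)
{
    uint64_t hpa;
    int ret;

    if (backend_pgsize == 0 || e->size % backend_pgsize != 0 ||
        e->iova % backend_pgsize != 0)
        return -EINVAL;

    ret = gpa_to_hpa(ram, nr_ram, e->gpa, e->size, &hpa);
    if (ret)
        return ret;
    if (hpa % backend_pgsize != 0)
        return -EINVAL;

    out->iova = e->iova;
    out->hpa = hpa;
    out->size = e->size;
    return 0;
}
```

The guest page table format only matters to the code that produces the
guest_iopte; the GPA to HPA step and the backend mapping are the same
whatever the format.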
> > > I'm remarking that trying to unify every HW IOMMU implementation that
> > > ever has/will exist into a generic API complete enough to allow the
> > > vIOMMU to be created is likely to result in an API too complicated to
> > > understand..
> >
> > Maybe not every one, but I think we can get a pretty wide range with a
> > reasonable interface.
>
> It sounds like a reasonable guideline is if the feature is actually
> general to all IOMMUs and can be used by qemu as part of a vIOMMU
> emulation when compatible vIOMMU HW is not available.
>
> Having 'requested window' support that isn't actually implemented in
> every IOMMU is going to mean the PAPR vIOMMU emulation won't work,
> defeating the whole point of making things general?
The trick is that you don't necessarily need dynamic window support in
the backend to emulate it in the vIOMMU. If your backend has fixed
windows, then you emulate request window as:

    if (requested window is within backend windows)
        no-op;
    else
        return ERROR;

It might not be a theoretically complete emulation of the vIOMMU, but
it can support in-practice usage. In particular it works pretty well
if your backend has a nice big IOVA range (like x86 IOMMUs) but your
guest platform typically uses relatively small IOVA windows. PAPR on
x86 is exactly that... well.. possibly not the 64-bit window, but
because of old PAPR platforms that didn't support that, we can choose
not to advertise that and guests will cope.
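
A minimal C sketch of that fixed-window emulation follows. The names
(struct iova_window, emulate_request_window()) and the choice of -EINVAL
as the error are assumptions made for the example, not an existing kernel
or qemu interface. Windows are treated as inclusive [start, last] ranges.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: an inclusive IOVA range [start, last]. */
struct iova_window {
    uint64_t start;
    uint64_t last;
};

/* True if 'req' lies entirely inside 'win'. */
static bool window_contains(const struct iova_window *win,
                            const struct iova_window *req)
{
    return req->start >= win->start && req->last <= win->last;
}

/*
 * Emulate a guest "request window" call on a backend whose windows are
 * fixed. If one backend window already covers the request there is
 * nothing to do (return 0); otherwise the request cannot be honoured and
 * we return -EINVAL.
 */
static int emulate_request_window(const struct iova_window *backend,
                                  size_t nr_backend,
                                  const struct iova_window *req)
{
    if (req->start > req->last)
        return -EINVAL;

    for (size_t i = 0; i < nr_backend; i++) {
        if (window_contains(&backend[i], req))
            return 0;
    }
    return -EINVAL;
}
```

With a single large backend window starting at 0, a small guest window
low in the IOVA space succeeds as a no-op, while a guest window placed
far above the backend's range is rejected. Note that this simple version
also rejects a request that straddles two adjacent backend windows; a
fuller emulation might merge adjacent windows first.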
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson