Message-ID: <BN9PR11MB54330217565C687A7275AE458C859@BN9PR11MB5433.namprd11.prod.outlook.com>
Date: Wed, 27 Oct 2021 02:32:57 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: David Gibson <david@...son.dropbear.id.au>
CC: "Liu, Yi L" <yi.l.liu@...el.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"jasowang@...hat.com" <jasowang@...hat.com>,
"kwankhede@...dia.com" <kwankhede@...dia.com>,
"hch@....de" <hch@....de>,
"jean-philippe@...aro.org" <jean-philippe@...aro.org>,
"Jiang, Dave" <dave.jiang@...el.com>,
"Raj, Ashok" <ashok.raj@...el.com>,
"corbet@....net" <corbet@....net>,
"jgg@...dia.com" <jgg@...dia.com>,
"parav@...lanox.com" <parav@...lanox.com>,
"alex.williamson@...hat.com" <alex.williamson@...hat.com>,
"lkml@...ux.net" <lkml@...ux.net>,
"dwmw2@...radead.org" <dwmw2@...radead.org>,
"Tian, Jun J" <jun.j.tian@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"lushenming@...wei.com" <lushenming@...wei.com>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"robin.murphy@....com" <robin.murphy@....com>
Subject: RE: [RFC 11/20] iommu/iommufd: Add IOMMU_IOASID_ALLOC/FREE
> From: David Gibson <david@...son.dropbear.id.au>
> Sent: Monday, October 25, 2021 1:05 PM
>
> > > > For above cases a [base, max] hint can be provided by the user per
> > > > Jason's recommendation.
> > >
> > > Provided at which stage?
> >
> > IOMMU_IOASID_ALLOC
>
> Ok. I have mixed thoughts on this. Doing this at ALLOC time was my
> first instinct as well. However with Jason's suggestion that any of a
> number of things could disambiguate multiple IOAS attached to a
> device, I wonder if it makes more sense for consistency to put base
> address at attach time, as with PASID.
In that case the base address provided at attach time is used as an
address space ID similar to a PASID, which imho is orthogonal to the
generic [base, size] info for the IOAS itself. The 2nd base sort of
becomes an offset on top of the first base in the ppc case.
> >
> > regarding live migration with vfio devices, it's still at an early
> > stage. there are tons of compatibility-check open issues to be addressed
> > before it can be widely deployed. this might just add another annoying
> > item to that long list...
>
> So, yes, live migration with VFIO is limited; unfortunately this
> still affects us even if we don't (currently) have VFIO devices. The
> problem arises from the combination of two limitations:
>
> 1) Live migration means that we can't dynamically select guest visible
> IOVA parameters at qemu start up time. We need to get consistent
> guest visible behaviour for a given set of qemu options, so that we
> can migrate between them.
>
> 2) Device hotplug means that we don't know if a PCI domain will have
> VFIO devices on it when we start qemu. So, we don't know if host
> limitations on IOVA ranges will affect the guest or not.
>
> Together these mean that the best we can do is to define a *fixed*
> (per machine type) configuration based on qemu options only. That is,
> defined by the guest platform we're trying to present, only, never
> host capabilities. We can then see if that configuration is possible
> on the host and pass or fail. It's never safe to go the other
> direction and take host capabilities and present those to the guest.
>
That is just one userspace policy. We don't want to design a uAPI
just for one specific userspace implementation. In concept the
userspace could:
1) use DMA-API-like map/unmap, i.e. letting the IOVA address space
be managed by the kernel;
* suitable for simple applications, e.g. dpdk.
2) manage the IOVA address space with a *fixed* layout:
* fail device passthrough at MAP_DMA if a conflict is detected
between the mapped range and device-specific IOVA holes
* suitable for VMs where live migration is a major concern
* potential problem with vIOMMU, since the guest is unaware
of host constraints; undefined behavior may occur if guest
IOVA addresses happen to overlap with host IOVA holes.
* ppc is special, as you need to claim guest IOVA ranges in
the host. But that's not the case for other emulated IOMMUs.
3) manage the IOVA address space with host constraints:
* create the IOVA layout by combining qemu options with the IOVA
holes of all boot-time passthrough devices
* reject a hotplugged device if its IOVA holes conflict with
the initial IOVA layout
* suitable for vIOMMU, since host constraints can be further
reported to the guest
* suitable for VMs without a live-migration requirement, e.g. in
many client virtualization scenarios
* suboptimal for VM live migration, due to the compatibility limitation
Overall the proposed uAPI will provide:
1) a simple DMA-API-like mapping protocol for a kernel-managed IOVA
address space;
2) a vfio-like mapping protocol for a user-managed IOVA address space:
a) check for IOVA conflicts in the MAP_DMA ioctl;
b) allow the user to query available IOVA ranges.
Then it's entirely up to user policy how to utilize those ioctls.
Thanks
Kevin