Message-ID: <MWHPR11MB1886469C0136C6523AB158B68C3B9@MWHPR11MB1886.namprd11.prod.outlook.com>
Date: Fri, 4 Jun 2021 08:38:26 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Alex Williamson <alex.williamson@...hat.com>,
Jason Gunthorpe <jgg@...dia.com>
CC: Jean-Philippe Brucker <jean-philippe@...aro.org>,
"Jiang, Dave" <dave.jiang@...el.com>,
"Raj, Ashok" <ashok.raj@...el.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
Jonathan Corbet <corbet@....net>,
Robin Murphy <robin.murphy@....com>,
LKML <linux-kernel@...r.kernel.org>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
David Gibson <david@...son.dropbear.id.au>,
Kirti Wankhede <kwankhede@...dia.com>,
"David Woodhouse" <dwmw2@...radead.org>,
Jason Wang <jasowang@...hat.com>
Subject: RE: [RFC] /dev/ioasid uAPI proposal
> From: Alex Williamson <alex.williamson@...hat.com>
> Sent: Friday, June 4, 2021 5:44 AM
>
> > Based on that observation we can say as soon as the user wants to use
> > an IOMMU that does not support DMA_PTE_SNP in the guest we can still
> > share the IO page table with IOMMUs that do support DMA_PTE_SNP.
Page table sharing between incompatible IOMMUs is not a critical
thing. I prefer disallowing sharing in such a case as the starting point,
i.e. the user needs to create separate IOASIDs for such devices.
>
> If your goal is to prioritize IO page table sharing, sure. But because
> we cannot atomically transition from one to the other, each device is
> stuck with the page tables it has, so the history of the VM becomes a
> factor in the performance characteristics.
>
> For example if device {A} is backed by an IOMMU capable of blocking
> no-snoop and device {B} is backed by an IOMMU which cannot block
> no-snoop, then booting VM1 with {A,B} and later removing device {B}
> would result in ongoing wbinvd emulation versus a VM2 only booted with
> {A}.
>
> Type1 would use separate IO page tables (domains/ioasids) for these such
> that VM1 and VM2 have the same characteristics at the end.
>
> Does this become user defined policy in the IOASID model? There's
> quite a mess of exposing sufficient GET_INFO for an IOASID for the user
> to know such properties of the IOMMU, plus maybe we need mapping flags
> equivalent to IOMMU_CACHE exposed to the user, preventing sharing an
> IOASID that could generate IOMMU faults, etc.
IOMMU_CACHE is a fixed attribute of a given IOMMU. So it's better to
convey this info to userspace via GET_INFO for a device_label, before
creating any IOASID. But overall I agree that careful thinking is required
about how to organize the reporting of such info (per-fd, per-device,
per-ioasid) to userspace.
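For instance (purely a sketch; the struct and flag names below are made
up for this discussion and are not part of the proposal), the per-device
report could carry a capability bit like:

#include <linux/types.h>

/* hypothetical per-device info returned by IOASID_GET_DEV_INFO */
struct ioasid_device_info {
	__u32	argsz;
	__u32	flags;
#define IOASID_DEV_INFO_ENFORCE_SNOOP	(1 << 0) /* IOMMU can block no-snoop */
	__u32	dev_label;	/* label assigned at bind time */
};

Userspace would then check IOASID_DEV_INFO_ENFORCE_SNOOP before deciding
the page table format of the IOASID it is about to create.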
>
> > > > It doesn't solve the problem to connect kvm to AP and kvmgt though
> > >
> > > It does not, we'll probably need a vfio ioctl to gratuitously announce
> > > the KVM fd to each device. I think some devices might currently fail
> > > their open callback if that linkage isn't already available though, so
> > > it's not clear when that should happen, ie. it can't currently be a
> > > VFIO_DEVICE ioctl as getting the device fd requires an open, but this
> > > proposal requires some availability of the vfio device fd without any
> > > setup, so presumably that won't yet call the driver open callback.
> > > Maybe that's part of the attach phase now... I'm not sure, it's not
> > > clear when the vfio device uAPI starts being available in the process
> > > of setting up the ioasid. Thanks,
> >
> > At a certain point we maybe just have to stick to backward compat, I
> > think. Though it is useful to think about green field alternates to
> > try to guide the backward compat design..
>
> I think more to drive the replacement design; if we can't figure out
> how to do something other than backwards compatibility trickery in the
> kernel, it's probably going to bite us. Thanks,
>
I'm a bit lost on the desired flow in your minds. Here is one flow based
on my understanding of this discussion. Please comment on whether it
matches your thinking:
0) ioasid_fd is created and registered to KVM via KVM_ADD_IOASID_FD;
1) Qemu binds dev1 to ioasid_fd;
2) Qemu calls IOASID_GET_DEV_INFO for dev1. This will carry IOMMU_CACHE
info, i.e. whether the underlying IOMMU can enforce snoop;
3) Qemu plans to create a gpa_ioasid and attach dev1 to it. Here Qemu
needs to figure out whether dev1 wants to do no-snoop. This might
be based on a fixed vendor/class list or be specified by the user;
4) gpa_ioasid = ioctl(ioasid_fd, IOASID_ALLOC); At this point a 'snoop'
flag is specified to decide the page table format, which is supposed
to match dev1;
5) Qemu attaches dev1 to gpa_ioasid via VFIO_ATTACH_IOASID. At this
point, snoop/no-snoop is specified again. If it is not supported by the
related IOMMU, or differs from what gpa_ioasid has, the attach fails.
6) Call KVM to update the snoop requirement via KVM_UPDATE_IOASID_FD.
This triggers ioasidfd_for_each_ioasid();
Later when dev2 is attached to gpa_ioasid, the same flow is followed. This
implies that KVM_UPDATE_IOASID_FD is called only when a new IOASID is
created or an existing IOASID is destroyed, because all devices under an
IOASID should have the same snoop requirement.
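To make the above flow concrete, here is a rough sketch of the Qemu side
(error handling omitted). All ioctl request numbers, struct layouts and
flags are assumptions made up for illustration, and the VFIO_BIND_IOASID_FD
name is my own placeholder; the other ioctl names follow this thread, but
none of this is existing ABI:

#include <fcntl.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* hypothetical uAPI, matching the sketch earlier in this mail */
struct ioasid_device_info {
	__u32	argsz;
	__u32	flags;
#define IOASID_DEV_INFO_ENFORCE_SNOOP	(1 << 0)
	__u32	dev_label;
};
struct ioasid_alloc { __u32 flags; __u32 ioasid; };
#define IOASID_ALLOC_ENFORCE_SNOOP	(1 << 0)
struct vfio_attach_ioasid { int ioasid_fd; __u32 ioasid; __u32 flags; };

#define IOASID_GET_DEV_INFO	_IOWR('i', 0x00, struct ioasid_device_info)
#define IOASID_ALLOC		_IOWR('i', 0x01, struct ioasid_alloc)
#define VFIO_BIND_IOASID_FD	_IOW('v', 0x70, int)
#define VFIO_ATTACH_IOASID	_IOW('v', 0x71, struct vfio_attach_ioasid)
#define KVM_ADD_IOASID_FD	_IOW('k', 0xe0, int)
#define KVM_UPDATE_IOASID_FD	_IOW('k', 0xe1, int)

int setup_gpa_ioasid(int kvm_fd, int dev1_fd, __u32 dev1_label,
		     bool dev1_wants_nosnoop)
{
	int ioasid_fd = open("/dev/ioasid", O_RDWR);

	/* 0) register the ioasid_fd with KVM */
	ioctl(kvm_fd, KVM_ADD_IOASID_FD, &ioasid_fd);

	/* 1) bind dev1 to the ioasid_fd */
	ioctl(dev1_fd, VFIO_BIND_IOASID_FD, &ioasid_fd);

	/* 2) per-device info, incl. whether the IOMMU can enforce snoop */
	struct ioasid_device_info info = {
		.argsz = sizeof(info),
		.dev_label = dev1_label,
	};
	ioctl(ioasid_fd, IOASID_GET_DEV_INFO, &info);

	/* 3) policy: enforce snoop unless dev1 wants to do no-snoop
	 *    (per a vendor/class list or a user option) */
	bool enforce = !dev1_wants_nosnoop &&
		       (info.flags & IOASID_DEV_INFO_ENFORCE_SNOOP);

	/* 4) the snoop flag decides the page table format */
	struct ioasid_alloc alloc = {
		.flags = enforce ? IOASID_ALLOC_ENFORCE_SNOOP : 0,
	};
	ioctl(ioasid_fd, IOASID_ALLOC, &alloc);

	/* 5) attach dev1; the kernel fails this on a snoop mismatch */
	struct vfio_attach_ioasid at = {
		.ioasid_fd = ioasid_fd,
		.ioasid = alloc.ioasid,
		.flags = alloc.flags,
	};
	ioctl(dev1_fd, VFIO_ATTACH_IOASID, &at);

	/* 6) let KVM re-evaluate the wbinvd/no-snoop requirement
	 *    (ioasidfd_for_each_ioasid() on the kernel side) */
	ioctl(kvm_fd, KVM_UPDATE_IOASID_FD, &ioasid_fd);

	return alloc.ioasid;
}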
Thanks
Kevin