Message-ID: <BN9PR11MB543328B13905017E11355AC78CB99@BN9PR11MB5433.namprd11.prod.outlook.com>
Date: Fri, 15 Oct 2021 01:01:38 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>
CC: Alex Williamson <alex.williamson@...hat.com>,
"Liu, Yi L" <yi.l.liu@...el.com>, "hch@....de" <hch@....de>,
"jasowang@...hat.com" <jasowang@...hat.com>,
"joro@...tes.org" <joro@...tes.org>,
"jean-philippe@...aro.org" <jean-philippe@...aro.org>,
"parav@...lanox.com" <parav@...lanox.com>,
"lkml@...ux.net" <lkml@...ux.net>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"lushenming@...wei.com" <lushenming@...wei.com>,
"eric.auger@...hat.com" <eric.auger@...hat.com>,
"corbet@....net" <corbet@....net>,
"Raj, Ashok" <ashok.raj@...el.com>,
"yi.l.liu@...ux.intel.com" <yi.l.liu@...ux.intel.com>,
"Tian, Jun J" <jun.j.tian@...el.com>, "Wu, Hao" <hao.wu@...el.com>,
"Jiang, Dave" <dave.jiang@...el.com>,
"jacob.jun.pan@...ux.intel.com" <jacob.jun.pan@...ux.intel.com>,
"kwankhede@...dia.com" <kwankhede@...dia.com>,
"robin.murphy@....com" <robin.murphy@....com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"dwmw2@...radead.org" <dwmw2@...radead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
"david@...son.dropbear.id.au" <david@...son.dropbear.id.au>,
"nicolinc@...dia.com" <nicolinc@...dia.com>
Subject: RE: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO
> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Thursday, October 14, 2021 11:43 PM
>
> > > > I think the key is whether other archs allow driver to decide DMA
> > > > coherency and indirectly the underlying I/O page table format.
> > > > If yes, then I don't see a reason why such decision should not be
> > > > given to userspace for passthrough case.
> > >
> > > The choice all comes down to if the other arches have cache
> > > maintenance instructions in the VM that *don't work*
> >
> > It looks like vfio always sets IOMMU_CACHE on all platforms as long as
> > the iommu supports it (true on all platforms except the Intel iommu that
> > is dedicated to the GPU):
> >
> > vfio_iommu_type1_attach_group()
> > {
> > 	...
> > 	if (iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
> > 		domain->prot |= IOMMU_CACHE;
> > 	...
> > }
> >
> > Should the above be set according to whether a device is coherent?
>
> For IOMMU_CACHE there are two questions related to the overloaded
> meaning:
>
> - Should VFIO ask the IOMMU to use non-coherent DMA (ARM meaning)
> This depends on how the VFIO user expects to operate the DMA.
> If the VFIO user can issue cache maintenance ops then IOMMU_CACHE
> should be controlled by the user. I have no idea what platforms
> support user space cache maintenance ops.
But just as you said for the Intel meaning below, even if those ops are
privileged, a uAPI can be provided to support such usage if necessary.
>
> - Should VFIO ask the IOMMU to suppress no-snoop (Intel meaning)
> This depends if the VFIO user has access to wbinvd or not.
>
> wbinvd is a privileged instruction so normally userspace will not
> be able to access it.
>
> Per Paolo's recommendation there should be a uAPI someplace that
> allows userspace to issue wbinvd - basically the suppress no-snoop
> is also user controllable.
>
> The two things are very similar and ultimately are a choice userspace
> should be making.
yes
>
> From something like a qemu perspective things are more murky - eg on
> ARM qemu needs to co-ordinate with the guest. Whatever IOMMU_CACHE
> mode VFIO is using must match the device coherent flag in the Linux
> guest. I'm guessing all Linux guest VMs only use coherent DMA for all
> devices today. I don't know if the cache maintenance ops are even
> permitted in an ARM VM.
>
I'll leave it to Jean to confirm. If only coherent DMA can be used in
the guest on other platforms, then I suppose VFIO should not blindly set
IOMMU_CACHE, and in concept it should refuse to assign a non-coherent
device, since no coordination with the guest exists today.
So the bottom line is that we'll keep this no-snoop thing Intel-specific.
For the basic skeleton we won't support no-snoop, so the user needs to
set the enforce-snoop flag when creating an IOAS, as this RFC v1 does.
We also need to introduce a new flag in the kernel instead of abusing
IOMMU_CACHE. Other platforms may need a fix to deny non-coherent
devices for now (depending on the open question above).
Thanks
Kevin