Message-ID: <20211014154259.GT2744544@nvidia.com>
Date: Thu, 14 Oct 2021 12:42:59 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: "Tian, Kevin" <kevin.tian@...el.com>
Cc: Alex Williamson <alex.williamson@...hat.com>,
"Liu, Yi L" <yi.l.liu@...el.com>, "hch@....de" <hch@....de>,
"jasowang@...hat.com" <jasowang@...hat.com>,
"joro@...tes.org" <joro@...tes.org>,
"jean-philippe@...aro.org" <jean-philippe@...aro.org>,
"parav@...lanox.com" <parav@...lanox.com>,
"lkml@...ux.net" <lkml@...ux.net>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"lushenming@...wei.com" <lushenming@...wei.com>,
"eric.auger@...hat.com" <eric.auger@...hat.com>,
"corbet@....net" <corbet@....net>,
"Raj, Ashok" <ashok.raj@...el.com>,
"yi.l.liu@...ux.intel.com" <yi.l.liu@...ux.intel.com>,
"Tian, Jun J" <jun.j.tian@...el.com>, "Wu, Hao" <hao.wu@...el.com>,
"Jiang, Dave" <dave.jiang@...el.com>,
"jacob.jun.pan@...ux.intel.com" <jacob.jun.pan@...ux.intel.com>,
"kwankhede@...dia.com" <kwankhede@...dia.com>,
"robin.murphy@....com" <robin.murphy@....com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"dwmw2@...radead.org" <dwmw2@...radead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
"david@...son.dropbear.id.au" <david@...son.dropbear.id.au>,
"nicolinc@...dia.com" <nicolinc@...dia.com>
Subject: Re: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO
On Thu, Oct 14, 2021 at 09:11:58AM +0000, Tian, Kevin wrote:
> But in both cases cache maintenance instructions are available from
> guest p.o.v and no coherency semantics would be violated.
You've described how Intel's solution papers over the problem.
In part wbinvd is defined to restore CPU cache coherence after a
no-snoop DMA. Having wbinvd NOP breaks this contract.
To counteract the broken wbinvd the IOMMU completely prevents the use
of no-snoop DMA. It converts no-snoop requests to snooped ones instead.
The driver thinks it has no-snoop. The platform appears to support
no-snoop. The driver issues wbinvd - but all of it does nothing.
I don't think any of this is even remotely related to what ARM is doing
here. ARM has neither the broken VM cache ops, nor the IOMMU ability
to suppress no-snoop.
> > > I think the key is whether other archs allow driver to decide DMA
> > > coherency and indirectly the underlying I/O page table format.
> > > If yes, then I don't see a reason why such decision should not be
> > > given to userspace for passthrough case.
> >
> > The choice all comes down to if the other arches have cache
> > maintenance instructions in the VM that *don't work*
>
> Looks vfio always sets IOMMU_CACHE on all platforms as long as
> iommu supports it (true on all platforms except intel iommu which
> is dedicated for GPU):
>
> vfio_iommu_type1_attach_group()
> {
> ...
> if (iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
> domain->prot |= IOMMU_CACHE;
> ...
> }
>
> Should above be set according to whether a device is coherent?
For IOMMU_CACHE there are two questions related to the overloaded
meaning:

 - Should VFIO ask the IOMMU to use non-coherent DMA (ARM meaning)?
   This depends on how the VFIO user expects to operate the DMA.
   If the VFIO user can issue cache maintenance ops then IOMMU_CACHE
   should be controlled by the user. I have no idea what platforms
   support user space cache maintenance ops.

 - Should VFIO ask the IOMMU to suppress no-snoop (Intel meaning)?
   This depends on whether the VFIO user has access to wbinvd or not.
   wbinvd is a privileged instruction so normally userspace will not
   be able to access it.
   Per Paolo's recommendation there should be a uAPI someplace that
   allows userspace to issue wbinvd, so suppressing no-snoop is
   basically also user controllable.

The two things are very similar and ultimately are a choice userspace
should be making.
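To make the two questions concrete, here is a minimal sketch of the
decision logic in C. All names in it (user_caps, user_request,
dma_policy, decide_policy) are hypothetical and are not part of VFIO,
iommufd or any kernel API. It also assumes the conservative default is
a coherent mapping with snooping enforced, and that a relaxed mode is
granted only when the user both asks for it and has the matching cache
maintenance capability.

```c
#include <stdbool.h>

/* Hypothetical: what the VFIO user is able to do on this platform. */
struct user_caps {
    bool can_cache_maint; /* can issue cache maintenance ops (ARM question) */
    bool can_wbinvd;      /* can issue wbinvd through some uAPI (Intel question) */
};

/* Hypothetical: what the VFIO user asks for. */
struct user_request {
    bool want_noncoherent; /* wants non-coherent DMA mappings */
    bool want_no_snoop;    /* wants no-snoop DMA left enabled */
};

/* Hypothetical: the resulting policy for the IOMMU domain. */
struct dma_policy {
    bool coherent_mapping; /* ARM meaning of IOMMU_CACHE */
    bool enforce_snoop;    /* Intel meaning: suppress no-snoop */
};

static struct dma_policy decide_policy(struct user_caps caps,
                                       struct user_request req)
{
    struct dma_policy p;

    /* Non-coherent DMA is only safe if the user can maintain caches. */
    p.coherent_mapping = !(caps.can_cache_maint && req.want_noncoherent);

    /* No-snoop is only safe if the user has a working wbinvd. */
    p.enforce_snoop = !(caps.can_wbinvd && req.want_no_snoop);

    return p;
}
```

The two fields are computed independently on purpose: the sketch treats
the ARM and Intel meanings as separate user choices rather than one
overloaded flag.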
From something like a qemu perspective things are more murky, e.g. on
ARM qemu needs to co-ordinate with the guest. Whatever IOMMU_CACHE
mode VFIO is using must match the device coherent flag in the Linux
guest. I'm guessing all Linux guest VMs only use coherent DMA for all
devices today. I don't know if the cache maintenance ops are even
permitted in an ARM VM.
Jason