Message-ID: <BN9PR11MB543333AD3C81312115686AAA8CA39@BN9PR11MB5433.namprd11.prod.outlook.com>
Date: Thu, 23 Sep 2021 03:10:47 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>,
Alex Williamson <alex.williamson@...hat.com>
CC: "Liu, Yi L" <yi.l.liu@...el.com>, "hch@....de" <hch@....de>,
"jasowang@...hat.com" <jasowang@...hat.com>,
"joro@...tes.org" <joro@...tes.org>,
"jean-philippe@...aro.org" <jean-philippe@...aro.org>,
"parav@...lanox.com" <parav@...lanox.com>,
"lkml@...ux.net" <lkml@...ux.net>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"lushenming@...wei.com" <lushenming@...wei.com>,
"eric.auger@...hat.com" <eric.auger@...hat.com>,
"corbet@....net" <corbet@....net>,
"Raj, Ashok" <ashok.raj@...el.com>,
"yi.l.liu@...ux.intel.com" <yi.l.liu@...ux.intel.com>,
"Tian, Jun J" <jun.j.tian@...el.com>, "Wu, Hao" <hao.wu@...el.com>,
"Jiang, Dave" <dave.jiang@...el.com>,
"jacob.jun.pan@...ux.intel.com" <jacob.jun.pan@...ux.intel.com>,
"kwankhede@...dia.com" <kwankhede@...dia.com>,
"robin.murphy@....com" <robin.murphy@....com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"dwmw2@...radead.org" <dwmw2@...radead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
"david@...son.dropbear.id.au" <david@...son.dropbear.id.au>,
"nicolinc@...dia.com" <nicolinc@...dia.com>
Subject: RE: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO
> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Thursday, September 23, 2021 7:50 AM
>
> On Wed, Sep 22, 2021 at 03:24:07PM -0600, Alex Williamson wrote:
> > On Sun, 19 Sep 2021 14:38:38 +0800
> > Liu Yi L <yi.l.liu@...el.com> wrote:
> >
> > > +struct iommu_device_info {
> > > + __u32 argsz;
> > > + __u32 flags;
> > > +#define IOMMU_DEVICE_INFO_ENFORCE_SNOOP (1 << 0) /* IOMMU enforced snoop */
> >
> > Is this too PCI specific, or perhaps too much of the mechanism rather
Isn't snoop vs. !snoop a general concept, not PCI-specific?
> > than the result? ie. should we just indicate if the IOMMU guarantees
> > coherent DMA? Thanks,
>
> I think the name of "coherent DMA" for this feature inside the kernel
> is very, very confusing. We already have something called coherent dma
> and this usage on Intel has nothing at all to do with that.
>
> In fact it looks like this confusing name has already caused
> implementation problems as I see dma-iommu, is connecting
> dev->dma_coherent to IOMMU_CACHE! eg in dma_info_to_prot(). This is
> completely wrong if IOMMU_CACHE is linked to no_snoop.
>
> And ARM seems to have fallen out of step with x86 as the ARM IOMMU
> drivers are mapping IOMMU_CACHE to ARM_LPAE_PTE_MEMATTR_OIWB,
> ARM_LPAE_MAIR_ATTR_IDX_CACHE
>
> The SMMU spec for ARMv8 is pretty clear:
>
> 13.6.1.1 No_snoop
>
> Support for No_snoop is system-dependent and, if implemented, No_snoop
> transforms a final access attribute of a Normal cacheable type to
> Normal-iNC-oNC-OSH downstream of (or appearing to be performed
> downstream of) the SMMU. No_snoop does not transform a final access
> attribute of any-Device.
>
> Meaning setting ARM_LPAE_MAIR_ATTR_IDX_CACHE from IOMMU_CACHE does NOT
> block non-snoop, in fact it *enables* it - the reverse of what Intel
> is doing!
Checking the code:
	if (data->iop.fmt == ARM_64_LPAE_S2 ||
	    data->iop.fmt == ARM_32_LPAE_S2) {
		if (prot & IOMMU_MMIO)
			pte |= ARM_LPAE_PTE_MEMATTR_DEV;
		else if (prot & IOMMU_CACHE)
			pte |= ARM_LPAE_PTE_MEMATTR_OIWB;
		else
			pte |= ARM_LPAE_PTE_MEMATTR_NC;
It sets the attribute to WB for IOMMU_CACHE and to NC (Non-cacheable)
for !IOMMU_CACHE. The main difference between Intel and ARM is that Intel
allows both snoop and non-snoop traffic by default, with one additional bit
to enforce snoop, while ARM requires explicit SMMU configuration for snoop
and for non-snoop respectively.
	} else {
		if (prot & IOMMU_MMIO)
			pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV
				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
		else if (prot & IOMMU_CACHE)
			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
	}
Same for this one. The MAIR_ELx register is programmed to
ARM_LPAE_MAIR_ATTR_WBRWA for the IDX_CACHE index. I'm not sure why it
doesn't use IDX_NC when !IOMMU_CACHE, though.
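To make the two branches easier to compare, here is a rough standalone model of the selection logic quoted above. It is only a sketch: every sketch_* name and the SKETCH_* flag values are made up for illustration and are not the kernel's definitions. It returns a symbolic attribute instead of building a PTE, and it reports "unset" for the non-stage-2 case with neither flag, since the quoted code leaves the attribute index field untouched there.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the prot flags; the bit values are assumptions. */
#define SKETCH_IOMMU_CACHE	(1u << 0)
#define SKETCH_IOMMU_MMIO	(1u << 1)

/* Symbolic result of the attribute selection (hypothetical names). */
enum sketch_memattr {
	SKETCH_ATTR_DEV,	/* device memory */
	SKETCH_ATTR_WB,		/* write-back cacheable */
	SKETCH_ATTR_NC,		/* non-cacheable, set explicitly */
	SKETCH_ATTR_UNSET,	/* attribute index field left untouched */
};

/*
 * Mirrors the if/else chain quoted above: MMIO is checked first, then
 * CACHE. Only the stage-2 branch has an explicit non-cacheable fallback.
 */
enum sketch_memattr sketch_prot_to_attr(bool stage2, unsigned int prot)
{
	if (prot & SKETCH_IOMMU_MMIO)
		return SKETCH_ATTR_DEV;
	if (prot & SKETCH_IOMMU_CACHE)
		return SKETCH_ATTR_WB;
	return stage2 ? SKETCH_ATTR_NC : SKETCH_ATTR_UNSET;
}

/* Small usage check of the model. */
void sketch_attr_selfcheck(void)
{
	assert(sketch_prot_to_attr(true, SKETCH_IOMMU_CACHE) == SKETCH_ATTR_WB);
	assert(sketch_prot_to_attr(true, 0) == SKETCH_ATTR_NC);
	assert(sketch_prot_to_attr(false, 0) == SKETCH_ATTR_UNSET);
}
```

In both branches IOMMU_CACHE selects a write-back attribute; the branches differ only in what happens when neither flag is set.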
>
> So this is all a mess.
>
> Better to start clear and unambiguous names in the uAPI and someone
> can try to clean up the kernel eventually.
>
> The required behavior for iommufd is to have the IOMMU ignore the
> no-snoop bit so that Intel HW can disable wbinvd. This bit should be
> clearly documented for its exact purpose and if other arches also have
> instructions that need to be disabled if snoop TLPs are allowed then
> they can re-use this bit. It appears ARM does not have this issue and
> does not need the bit.
Disabling wbinvd is one purpose. IMO the more important point is that
IOMMU vendors use different PTE formats for snoop and !snoop. As long as
we want to allow userspace to opt in when there is an isoch performance
requirement (unlike current vfio, which always chooses the snoop format
if available), such a mechanism is required for all vendors.
When creating an ioas there could be three snoop modes:
1) snoop for all attached devices;
2) non-snoop for all attached devices;
3) device-selected snoop;
Intel supports 1) <enforce-snoop on> and 3) <enforce-snoop off>. Snoop
and non-snoop devices can be attached to the same ioas in 3).
ARM supports 1) <snoop format> and 2) <nonsnoop format>. Snoop devices
and non-snoop devices must be attached to different ioas's, in 1) and 2)
respectively.
Then the device info should report:
/* iommu enforced snoop */
+#define IOMMU_DEVICE_INFO_ENFORCE_SNOOP (1 << 0)
/* iommu enforced nonsnoop */
+#define IOMMU_DEVICE_INFO_ENFORCE_NONSNOOP (1 << 1)
/* device selected snoop */
+#define IOMMU_DEVICE_INFO_DEVICE_SNOOP (1 << 2)
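As a rough illustration of how userspace might consume these flags, the sketch below picks a snoop mode for a new ioas from the reported bits. The three macros are the ones proposed above; the sketch_* helpers and the selection policy are hypothetical and only one of several possible choices, not part of the proposed uAPI.

```c
#include <assert.h>
#include <stdbool.h>

/* Flags as proposed above. */
#define IOMMU_DEVICE_INFO_ENFORCE_SNOOP		(1 << 0)
#define IOMMU_DEVICE_INFO_ENFORCE_NONSNOOP	(1 << 1)
#define IOMMU_DEVICE_INFO_DEVICE_SNOOP		(1 << 2)

/*
 * Hypothetical policy: return the single mode bit to use for a new ioas,
 * or 0 if the reported flags offer nothing suitable.
 */
unsigned int sketch_pick_mode(unsigned int flags, bool want_nonsnoop)
{
	if (!want_nonsnoop)
		return flags & IOMMU_DEVICE_INFO_ENFORCE_SNOOP;
	/* Prefer mode 3), which lets the device choose per transaction. */
	if (flags & IOMMU_DEVICE_INFO_DEVICE_SNOOP)
		return IOMMU_DEVICE_INFO_DEVICE_SNOOP;
	return flags & IOMMU_DEVICE_INFO_ENFORCE_NONSNOOP;
}

/* Only mode 3) lets snoop and non-snoop devices share one ioas. */
bool sketch_mixed_devices_allowed(unsigned int mode)
{
	return mode == IOMMU_DEVICE_INFO_DEVICE_SNOOP;
}

/* Usage check with the Intel and ARM capability sets described above. */
void sketch_snoop_selfcheck(void)
{
	unsigned int intel = IOMMU_DEVICE_INFO_ENFORCE_SNOOP |
			     IOMMU_DEVICE_INFO_DEVICE_SNOOP;
	unsigned int arm = IOMMU_DEVICE_INFO_ENFORCE_SNOOP |
			   IOMMU_DEVICE_INFO_ENFORCE_NONSNOOP;

	assert(sketch_pick_mode(intel, true) == IOMMU_DEVICE_INFO_DEVICE_SNOOP);
	assert(sketch_pick_mode(arm, true) == IOMMU_DEVICE_INFO_ENFORCE_NONSNOOP);
	assert(sketch_pick_mode(arm, false) == IOMMU_DEVICE_INFO_ENFORCE_SNOOP);
}
```

With this policy an Intel-style device ends up in mode 3) when non-snoop traffic is wanted, while an ARM-style device needs a separate non-snoop ioas.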
>
> What ARM is doing with IOMMU_CACHE is unclear to me, and I'm unclear
> if/how iommufd should expose it as a controllable PTE flag. The ARM
>
Based on the above analysis, I think the ARM usage of IOMMU_CACHE
doesn't need to change.
Thanks
Kevin