linux-kernel - RE: [PATCH v7 1/3] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <BN9PR11MB52764F692E78C1B396D175DD8CB8A@BN9PR11MB5276.namprd11.prod.outlook.com>
Date:   Fri, 24 Nov 2023 03:00:45 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     Jason Gunthorpe <jgg@...dia.com>
CC:     "Liu, Yi L" <yi.l.liu@...el.com>,
        "joro@...tes.org" <joro@...tes.org>,
        "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
        "robin.murphy@....com" <robin.murphy@....com>,
        "baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
        "cohuck@...hat.com" <cohuck@...hat.com>,
        "eric.auger@...hat.com" <eric.auger@...hat.com>,
        "nicolinc@...dia.com" <nicolinc@...dia.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "mjrosato@...ux.ibm.com" <mjrosato@...ux.ibm.com>,
        "chao.p.peng@...ux.intel.com" <chao.p.peng@...ux.intel.com>,
        "yi.y.sun@...ux.intel.com" <yi.y.sun@...ux.intel.com>,
        "peterx@...hat.com" <peterx@...hat.com>,
        "jasowang@...hat.com" <jasowang@...hat.com>,
        "shameerali.kolothum.thodi@...wei.com" 
        <shameerali.kolothum.thodi@...wei.com>,
        "lulu@...hat.com" <lulu@...hat.com>,
        "suravee.suthikulpanit@....com" <suravee.suthikulpanit@....com>,
        "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
        "Duan, Zhenzhong" <zhenzhong.duan@...el.com>,
        "joao.m.martins@...cle.com" <joao.m.martins@...cle.com>,
        "Zeng, Xin" <xin.zeng@...el.com>,
        "Zhao, Yan Y" <yan.y.zhao@...el.com>
Subject: RE: [PATCH v7 1/3] iommufd: Add data structure for Intel VT-d stage-1
 cache invalidation

> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Wednesday, November 22, 2023 9:26 PM
> 
> On Wed, Nov 22, 2023 at 04:58:24AM +0000, Tian, Kevin wrote:
> > then we just define hwpt 'cache' invalidation in vtd always refers to
> > both iotlb and devtlb. Then viommu just needs to call invalidation
> > uapi once when emulating virtual iotlb invalidation descriptor
> > while emulating the following devtlb invalidation descriptor
> > as a nop.
> 
> In principle ATC and IOMMU TLB invalidations should not always be
> linked.
> 
> Any scenario that allows devices to share an IOTLB cache tag requires
> fewer IOMMU TLB invalidations than ATC invalidations.

as long as the host iommu driver has the same knowledge then it will
always do the right thing.

e.g. one iotlb entry shared by 4 devices.

guest issues:
	1) iotlb invalidation
	2) devtlb invalidation for dev1
	3) devtlb invalidation for dev2
	4) devtlb invalidation for dev3
	5) devtlb invalidation for dev4

intel-viommu calls HWPT cache invalidation for 1) and treats 2-5) as nop.

intel-iommu driver internally knows the iotlb is shared by 4 devices (given
the same domain is attached to those devices) to handle HWPT
cache invalidation:

	1) iotlb invalidation
	2) devtlb invalidation for dev1
	3) devtlb invalidation for dev2
	4) devtlb invalidation for dev3
	5) devtlb invalidation for dev4

this is a good optimization by reducing 5 syscalls to 1, with the 
assumption that the guest shouldn't expect any deterministic
behavior before 5) is completed to bring iotlb/devtlbs in sync.

another alternative is to have guest batch 1-5) in one request which
allows viommu to batch them in one invalidation call too. But
this is an orthogonal optimization in guest which we don't want
to rely on.

> 
> I like the view of this invalidation interface as reflecting the
> actual HW and not trying to be smarter an real HW.

the guest-oriented interface e.g. viommu reflects the HW.

uAPI is kind of viommu internal implementation. IMHO it's not
a bad thing to make it smarter as long as no guest observable
breakage.

> 
> I'm fully expecting that Intel will adopt an direct-DMA flush queue
> like SMMU and AMD have already done as a performance optimization. In
> this world it makes no sense that the behavior of the direct DMA queue
> and driver mediated queue would be different.
> 

that's a orthogonal topic. I don't think the value of direct-DMA flush
queue should prevent possible optimization in the mediation path
(as long as guest-expected deterministic behavior is sustained).