Message-ID: <20231213124055.GR3014157@nvidia.com>
Date: Wed, 13 Dec 2023 08:40:55 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: Yi Liu <yi.l.liu@...el.com>,
"Giani, Dhaval" <Dhaval.Giani@....com>,
Vasant Hegde <vasant.hegde@....com>,
Suravee Suthikulpanit <suravee.suthikulpanit@....com>,
joro@...tes.org, alex.williamson@...hat.com, kevin.tian@...el.com,
robin.murphy@....com, baolu.lu@...ux.intel.com, cohuck@...hat.com,
eric.auger@...hat.com, kvm@...r.kernel.org, mjrosato@...ux.ibm.com,
chao.p.peng@...ux.intel.com, yi.y.sun@...ux.intel.com,
peterx@...hat.com, jasowang@...hat.com,
shameerali.kolothum.thodi@...wei.com, lulu@...hat.com,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org, zhenzhong.duan@...el.com,
joao.m.martins@...cle.com, xin.zeng@...el.com, yan.y.zhao@...el.com
Subject: Re: [PATCH v6 0/6] iommufd: Add nesting infrastructure (part 2/2)
On Tue, Dec 12, 2023 at 12:05:41PM -0800, Nicolin Chen wrote:
> > > // iommufd_private.h
> > >
> > > enum iommufd_object_type {
> > > 	...
> > > +	IOMMUFD_OBJ_VIOMMU,
> > > 	...
> > > };
> > >
> > > +struct iommufd_viommu {
> > > +	struct iommufd_object obj;
> > > +	struct iommufd_hwpt_paging *hwpt;
> > > +	struct xarray devices;
> > > +};
> > >
> > > struct iommufd_hwpt_paging {
> > > 	...
> > > +	struct list_head viommu_list;
> > > 	...
> > > };
> >
> > I'd probably first try to go backwards and link the hwpt to the
> > viommu.
>
> I think a VM should have only one hwpt_paging object but one or
> more viommu objects, so we could do only viommu->hwpt_paging
> and hwpt_paging->viommu_list. How would we go backwards?
That is probably how things would work, but I don't know if it makes
sense to enforce it in the kernel logic.

Point the S2 at a list of the viommu objects it is linked to.
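Roughly something like this (just a sketch; the field names are made
up and not necessarily what the series would use):

struct iommufd_viommu {
	struct iommufd_object obj;
	struct iommufd_hwpt_paging *hwpt;	/* the S2 it is built on */
	struct list_head hwpt_item;		/* entry in hwpt->viommu_list */
	struct xarray devices;			/* vSID -> device/pSID */
};

/* at viommu allocation, after taking a reference on the S2: */
list_add_tail(&viommu->hwpt_item, &hwpt_paging->viommu_list);

That way the S2 can walk every viommu built on top of it without the
kernel having to assume there is only one.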
> > > struct iommufd_group {
> > > 	...
> > > +	struct iommufd_viommu *viommu; // should we attach to viommu instead of hwpt?
> > > 	...
> > > };
> >
> > No. Attach is a statement of translation so you still attach to the HWPT.
>
> OK. It's probably not necessary since we know which piommu the
> device is behind. And we only need to link viommu and piommu,
> right?
Yes
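i.e. the viommu would additionally just record which physical IOMMU it
was created against, something like (the field name is illustrative,
assuming the physical instance is represented by struct iommu_device):

struct iommufd_viommu {
	...
	struct iommu_device *iommu_dev;	/* the piommu backing this viommu */
	...
};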
> > In the second version maybe we have the xarray, or maybe we just push
> > the xarray to the eventual viommu series.
>
> I think I still don't get the purpose of the xarray here.
> It was needed previously because a cache invalidate per hwpt
> didn't know which device it was for. Now IOMMUFD_DEV_INVALIDATE knows.
>
> Maybe it's related to that narrative "logically we could have
> multiple mappings per iommufd" that you mentioned above. Mind
> elaborating a bit?
>
> In my mind, a viommu is allocated by the VMM per piommu, by detecting
> the piommu_id via hw_info. In that case, a viommu can only have
> one unique device list. If IOMMUFD_DEV_INVALIDATE passes in the
> dev_id, we don't really need a mapping of vRID to pRID in a multi-
> viommu case either? In other words, the VMM already has a mapping
> from vRID to dev_id, so it could just call the DEV_INVALIDATE ioctl
> directly?
The xarray exists to optimize the invalidation flow.
For SW you can imagine issuing an invalidation against the viommu
itself, and *all* commands, be they ASID or ATC invalidations, could be
processed in one shot. The xarray allows converting the vSID to the
pSID to process the ATC invalidations, and the viommu object forces a
single VMID to handle them. Whether we want to do this, I don't know.

For HW, the xarray holds the vSID to pSID mapping that must be
programmed into the HW that operates the dedicated invalidation queue.
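Just to illustrate the SW idea (the entry struct, field and helper
names below are all made up):

struct viommu_vdev {
	u32 psid;			/* physical stream ID */
	/* ... */
};

/*
 * Convert a guest-visible stream ID to the physical one via the
 * per-viommu xarray, e.g. while processing an ATC invalidation.
 */
static int viommu_vsid_to_psid(struct iommufd_viommu *viommu, u32 vsid,
			       u32 *psid)
{
	struct viommu_vdev *vdev = xa_load(&viommu->devices, vsid);

	if (!vdev)
		return -ENOENT;
	*psid = vdev->psid;
	return 0;
}

The entries would presumably be populated when the VMM adds a device
to the viommu along with its vSID.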
Jason