Message-ID: <Zk0ftlf3f4gBaNgy@nvidia.com>
Date: Tue, 21 May 2024 15:27:02 -0700
From: Nicolin Chen <nicolinc@...dia.com>
To: Jason Gunthorpe <jgg@...dia.com>
CC: <will@...nel.org>, <robin.murphy@....com>, <kevin.tian@...el.com>,
	<suravee.suthikulpanit@....com>, <joro@...tes.org>,
	<linux-kernel@...r.kernel.org>, <iommu@...ts.linux.dev>,
	<linux-arm-kernel@...ts.infradead.org>, <linux-tegra@...r.kernel.org>,
	<yi.l.liu@...el.com>, <eric.auger@...hat.com>, <vasant.hegde@....com>,
	<jon.grimm@....com>, <santosh.shukla@....com>, <Dhaval.Giani@....com>,
	<shameerali.kolothum.thodi@...wei.com>
Subject: Re: [PATCH RFCv1 07/14] iommufd: Add viommu set/unset_dev_id ops

On Tue, May 21, 2024 at 03:24:48PM -0300, Jason Gunthorpe wrote:
> On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > So, you want a proxy S1 domain for a device to attach to, in case
> > of a stage-2-only setup, because an S2 domain will no longer have
> > a VMID, since it's shared among viommus. In the SMMU driver case,
> > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > attach to an S2 domain but always to a nested S1 domain, right?
> 
> That seems like a simple solution to the VMID lifetime, but it means
> the kernel has to decode more types of vSTE.

Yea. For vSTE=abort, we would likely need a nested blocking domain too?

> > > Functionally we could use that global nesting domain
> > > to deliver the DEV_INVALIDATE too.
> > 
> > If my narrative above is correct, the device is actually still
> > attached to S2 domain via a proxy nested S1 domain. What cache
> > do we need to invalidate except S2 mappings in this case?
> 
> qemu needs a reliable place to send the invalidation commands to (ie
> what ID to provide).
> 
> If we add IOMMU_VIOMMU_INVALIDATE then the ID is the viommu id.
> 
> If we enhance IOMMU_HWPT_INVALIDATE then the ID is the identity
> nesting domain.
> 
> Either case leads to the viommu object in the kernel.

Ooohh! I didn't connect the dots that far. Yes. This turns
IOMMU_HWPT_INVALIDATE back into its very first version, which
supported device cache flushes. Though using IOMMU_HWPT_INVALIDATE
feels a bit rule-breaking now, since it assumes the nested HWPT
keeps a vdev_id lookup table somewhere among its associated
objects...

> I don't know if there is merit one way or the other. A more specific
> API surface is nice, but the two APIs would be complete duplicates.
> 
> So maybe:
> 
> #define IOMMU_VIOMMU_INVALIDATE IOMMU_HWPT_INVALIDATE
> 
> As documentation and have the kernel just detect based on the type of
> the passed ID?

Yea, the only difference is viommu_id vs. hwpt_id, which we can
document.

Then in this case we'd have two mostly identical uAPIs for the
SMMU driver to use. Should we implement both?

> > > > So again, yes, it makes sense to me that we move viommu and the
> > > > set_dev_id to the nested series, and then drop DEV_INVALIDATE.
> > > 
> > > I would like to do this bit by bit. viommu is a big series on its own.
> > > 
> > > DEV_INVALIDATE is fine, it just can't do ATS invalidation.
> > 
> > I am not very sure about AMD.
> 
> AMD will need the same vRID -> pRID stuff and we want that to run on
> the VIOMMU
> 
> > Same question: any other case can we use the DEV_INVALIDATE for?
> 
> DEV_INVALIDATE was interesting before the viommu idea because
> userspace could process each invalidation command and when it reaches
> ATS it would invoke the correct DEV_INVALIDATE.

Agreed. That helped the VMM a lot in dispatching the invalidation
requests.

> With viommu we expect ATS supporting drivers to support viommu and
> then to do vRID -> pRID in the other invalidation paths. In this case
> I don't see a reason to do DEV_INVALIDATE right now.

Yea. I guessed so.

> > > We can add ATS invalidation after either as an enhancement as part of
> > > adding the VIOMMU either as DEV_INVALIDATE or VIOMMU_INVALIDATE (or
> > > both)
> > 
> > Yea, maybe step by step like this:
> > 
> > Part-1 VIOMMU_ALLOC and VIOMMU_ATTACH
> > Part-2 VIOMMU_SET/UNSET_VDEV_ID
> > Part-3 VIOMMU_INVALIDATE
> > Part-4 VQUEUE_ALLOC
> > ...
> 
> So we have this stuff still open:
>  - Identity STE with PASID (part 2b)
>  - IOMMU_GET_HW_INFO (part 3)
>  - IOMMU_HWPT_ALLOC_NEST_PARENT (part 3)
>  - IOMMU_HWPT_DATA_ARM_SMMUV3 (part 3)
>  - IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3
>  - VIOMMU_ALLOC, VIOMMU_ATTACH
>  - VIOMMU_INVALIDATE
>  - VIOMMU_SET/UNSET_VDEV_ID
>  - VQUEUE_ALLOC / vCMDQ
> 
> I feel like IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3 is a reasonable fit
> for part 3. Then part 4 would be VIOMMU_ALLOC -> VIOMMU_SET/UNSET_VDEV_ID,
> which brings ATS support to the API.

There is some conflict between passing in a viommu_id/viommu vs. a
parent hwpt_id/domain for a nested domain allocation. Do you think
that should be addressed later in the VIOMMU series vs. in part-3?

More specifically, I have two drafts in my viommu series:
87a659e65229 WAR: iommufd: Allow pt_it to carry viommu_id
7c5fd8f50bc9 WAR pass in viommu pointer to domain_alloc_user op

I know that these two only make sense with VIOMMU_ALLOC. Yet, will
there be a problem if part-3 establishes nested domain allocation
with a parent domain/hwpt in the uAPI, and we then change it later?
Will we end up supporting both for backward compatibility?

> vCMDQ hypervisor support would sit on top of that with just VQUEUE?

Yea.

Thanks
Nicolin
