Message-ID: <BN9PR11MB52764B70054A41BC169AFB318C709@BN9PR11MB5276.namprd11.prod.outlook.com>
Date:   Thu, 9 Dec 2021 02:58:28 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     Jason Gunthorpe <jgg@...dia.com>,
        Jean-Philippe Brucker <jean-philippe@...aro.org>
CC:     Eric Auger <eric.auger@...hat.com>,
        Lu Baolu <baolu.lu@...ux.intel.com>,
        Joerg Roedel <joro@...tes.org>,
        "peter.maydell@...aro.org" <peter.maydell@...aro.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "vivek.gautam@....com" <vivek.gautam@....com>,
        "kvmarm@...ts.cs.columbia.edu" <kvmarm@...ts.cs.columbia.edu>,
        "eric.auger.pro@...il.com" <eric.auger.pro@...il.com>,
        "Raj, Ashok" <ashok.raj@...el.com>,
        "maz@...nel.org" <maz@...nel.org>,
        "vsethi@...dia.com" <vsethi@...dia.com>,
        "zhangfei.gao@...aro.org" <zhangfei.gao@...aro.org>,
        "will@...nel.org" <will@...nel.org>,
        "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
        "wangxingang5@...wei.com" <wangxingang5@...wei.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "lushenming@...wei.com" <lushenming@...wei.com>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        "robin.murphy@....com" <robin.murphy@....com>
Subject: RE: [RFC v16 1/9] iommu: Introduce attach/detach_pasid_table API

> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Thursday, December 9, 2021 2:31 AM
> 
> On Wed, Dec 08, 2021 at 05:20:39PM +0000, Jean-Philippe Brucker wrote:
> > On Wed, Dec 08, 2021 at 08:56:16AM -0400, Jason Gunthorpe wrote:
> > > From a progress perspective I would like to start with simple 'page
> > > tables in userspace', ie no PASID in this step.
> > >
> > > 'page tables in userspace' means an iommufd ioctl to create an
> > > iommu_domain where the IOMMU HW is directly traversing a
> > > device-specific page table structure in user space memory. All the HW
> > > today implements this by using another iommu_domain to allow the IOMMU
> > > HW DMA access to user memory - ie nesting or multi-stage or whatever.
> > >
> > > This would come along with some ioctls to invalidate the IOTLB.
> > >
> > > I'm imagining this step as a iommu_group->op->create_user_domain()
> > > driver callback which will create a new kind of domain with
> > > domain-unique ops. Ie map/unmap related should all be NULL as those
> > > are impossible operations.
> > >
> > > From there the usual struct device (ie RID) attach/detach stuff needs
> > > to take care of routing DMAs to this iommu_domain.
> > >
> > > Step two would be to add the ability for an iommufd using driver to
> > > request that a RID&PASID is connected to an iommu_domain. This
> > > connection can be requested for any kind of iommu_domain, kernel owned
> > > or user owned.
> > >
> > > I don't quite have an answer how exactly the SMMUv3 vs Intel
> > > difference in PASID routing should be resolved.
> >
> > In SMMUv3 the user pgd is always stored in the PASID table (actually
> > called "context descriptor table" but I want to avoid confusion with
> > the VT-d "context table"). And to access the PASID table, the SMMUv3 first
> > translates its GPA into a PA using the stage-2 page table. For userspace to
> > pass individual pgds to the kernel, as opposed to passing whole PASID
> > tables, the host kernel needs to reserve GPA space and map it in stage-2,
> > so it can store the PASID table in there. Userspace manages GPA space.
> 
> It is what I thought.. So in the SMMUv3 spec the STE is completely in
> kernel memory, but it points to an S1ContextPtr that must be an IPA if
> the "stage 1 translation tables" are IPA. Only via S1ContextPtr can we
> decode the substream?
> 
> So in SMMUv3 land we don't really ever talk about PASID, we have a
> 'user page table' that is bound to an entire RID and *all* PASIDs.
> 
> While Intel would have a 'user page table' that is only bound to a RID
> & PASID
> 
> Certainly it is not a difference we can hide from userspace.

Concept-wise it is still a 'user page table' with a vendor-specific format.

Taking your earlier analogy, it's just a single 84-bit address space
(20-bit PASID + 64-bit VA) per RID.
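
The 84-bit figure is simply 20 + 64. A minimal sketch of that view, assuming
a two-field representation (the struct and helper names are hypothetical and
do not come from any kernel header):

```c
#include <assert.h>
#include <stdint.h>

#define PASID_BITS 20u
#define VA_BITS    64u
#define PASID_MAX  ((1u << PASID_BITS) - 1u)

/* Hypothetical: one address in the per-RID 84-bit space. */
struct rid_addr {
	uint32_t pasid; /* upper 20 bits: selects the address space */
	uint64_t va;    /* lower 64 bits: virtual address within it */
};

/* Total width of the combined per-RID address space, in bits. */
static unsigned int rid_addr_bits(void)
{
	return PASID_BITS + VA_BITS;
}

/* Build an address; the PASID must fit in 20 bits. */
static struct rid_addr rid_addr_make(uint32_t pasid, uint64_t va)
{
	struct rid_addr a;

	assert(pasid <= PASID_MAX);
	a.pasid = pasid;
	a.va = va;
	return a;
}
```

Under this view a RID owns one such space, and a PASID is only the top part
of the address rather than a separate object to attach.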

So what we require is still one unified ioctl in your step-1 to support a
per-RID 'user page table'.

For ARM it's SMMU's PASID table format. There is no step-2 since PASID
is already within the address space covered by the user PASID table.

For Intel it's VT-d's 1st level page table format. Moving to step-2 then
allows multiple 'user page tables' to be connected to RID & PASID.
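
A rough sketch of how one step-1 request could carry either format. This is
an illustration only: the enum, the struct and both helpers are hypothetical
names, not an existing or proposed iommufd uAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format tags for a per-RID 'user page table'. */
enum user_pgtable_format {
	USER_PGTABLE_SMMUV3_PASID_TABLE, /* ARM: PASID table, spans all PASIDs */
	USER_PGTABLE_VTD_FIRST_LEVEL,    /* Intel: one 1st level page table */
};

/* Hypothetical argument of a unified step-1 attach request. */
struct user_pgtable_attach {
	enum user_pgtable_format format;
	uint64_t base; /* guest address of the table root */
};

/* True when the table already covers every PASID of the RID. */
static bool covers_all_pasids(const struct user_pgtable_attach *req)
{
	assert(req != NULL);
	return req->format == USER_PGTABLE_SMMUV3_PASID_TABLE;
}

/* Step-2 (attach to RID & PASID) only applies when PASIDs are not covered. */
static bool step2_applicable(const struct user_pgtable_attach *req)
{
	return !covers_all_pasids(req);
}
```

With this shape, an SMMU request implies the whole device, while a VT-d
request leaves per-PASID attachment to a later step.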

> 
> > This would be easy for a single pgd. In this case the PASID table has a
> > single entry and userspace could just pass one GPA page during
> > registration. However it isn't easily generalized to full PASID support,
> > because managing a multi-level PASID table will require runtime GPA
> > allocation, and that API is awkward. That's why we opted for "attach PASID
> > table" operation rather than "attach page table" (back then the choice was
> > easy since VT-d used the same concept).
> 
> I think the entire context descriptor table should be in userspace,
> and filled in by userspace, as part of the userspace page table.
> 
> The kernel API should accept the S1ContextPtr IPA and all the parts of
> the STE that relate to defining the layout of what the S1Context
> points to, and that's it.
> 
> We should have another mode where the kernel owns everything, and the
> S1ContextPtr is a PA with Stage 2 bypassed.

I guess this is for usages like DPDK. In that case yes, we can have a
unified ioctl since the kernel manages everything including the PASID
table.

> 
> That part is fine, the more open question is what does the driver
> interface look like when userspace tells something like vfio-pci to
> connect to this thing. At some level the attaching device needs to
> authorize iommufd to take the entire PASID table and RID.

As long as the SMMU driver advertises support for only the step-1 ioctl,
this authorization should already be implied.

> 
> Specifically we cannot use this thing with a mdev, while the Intel
> version of a userspace page table can be.

Yes. Supporting mdev is the whole reason why Intel puts the PASID
table in host physical address space to be managed by the kernel.

> 
> Maybe that is just some 'allow whole device' flag in an API
> 

As I said, I feel this special flag is not required as long as the
vendor iommu driver only supports your step-1 interface, which
implies 'allow whole device' for ARM.

Thanks
Kevin
