lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 23 Sep 2021 09:14:58 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     Jason Gunthorpe <jgg@...dia.com>
CC:     "Liu, Yi L" <yi.l.liu@...el.com>,
        "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
        "hch@....de" <hch@....de>,
        "jasowang@...hat.com" <jasowang@...hat.com>,
        "joro@...tes.org" <joro@...tes.org>,
        "jean-philippe@...aro.org" <jean-philippe@...aro.org>,
        "parav@...lanox.com" <parav@...lanox.com>,
        "lkml@...ux.net" <lkml@...ux.net>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "lushenming@...wei.com" <lushenming@...wei.com>,
        "eric.auger@...hat.com" <eric.auger@...hat.com>,
        "corbet@....net" <corbet@....net>,
        "Raj, Ashok" <ashok.raj@...el.com>,
        "yi.l.liu@...ux.intel.com" <yi.l.liu@...ux.intel.com>,
        "Tian, Jun J" <jun.j.tian@...el.com>, "Wu, Hao" <hao.wu@...el.com>,
        "Jiang, Dave" <dave.jiang@...el.com>,
        "jacob.jun.pan@...ux.intel.com" <jacob.jun.pan@...ux.intel.com>,
        "kwankhede@...dia.com" <kwankhede@...dia.com>,
        "robin.murphy@....com" <robin.murphy@....com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        "dwmw2@...radead.org" <dwmw2@...radead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
        "david@...son.dropbear.id.au" <david@...son.dropbear.id.au>,
        "nicolinc@...dia.com" <nicolinc@...dia.com>
Subject: RE: [RFC 11/20] iommu/iommufd: Add IOMMU_IOASID_ALLOC/FREE

> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Wednesday, September 22, 2021 10:09 PM
> 
> On Wed, Sep 22, 2021 at 03:40:25AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@...dia.com>
> > > Sent: Wednesday, September 22, 2021 1:45 AM
> > >
> > > On Sun, Sep 19, 2021 at 02:38:39PM +0800, Liu Yi L wrote:
> > > > This patch adds IOASID allocation/free interface per iommufd. When
> > > > allocating an IOASID, userspace is expected to specify the type and
> > > > format information for the target I/O page table.
> > > >
> > > > This RFC supports only one type
> (IOMMU_IOASID_TYPE_KERNEL_TYPE1V2),
> > > > implying a kernel-managed I/O page table with vfio type1v2 mapping
> > > > semantics. For this type the user should specify the addr_width of
> > > > the I/O address space and whether the I/O page table is created in
> > > > an iommu enfore_snoop format. enforce_snoop must be true at this
> point,
> > > > as the false setting requires additional contract with KVM on handling
> > > > WBINVD emulation, which can be added later.
> > > >
> > > > Userspace is expected to call IOMMU_CHECK_EXTENSION (see next
> patch)
> > > > for what formats can be specified when allocating an IOASID.
> > > >
> > > > Open:
> > > > - Devices on PPC platform currently use a different iommu driver in vfio.
> > > >   Per previous discussion they can also use vfio type1v2 as long as there
> > > >   is a way to claim a specific iova range from a system-wide address
> space.
> > > >   This requirement doesn't sound PPC specific, as addr_width for pci
> > > devices
> > > >   can be also represented by a range [0, 2^addr_width-1]. This RFC
> hasn't
> > > >   adopted this design yet. We hope to have formal alignment in v1
> > > discussion
> > > >   and then decide how to incorporate it in v2.
> > >
> > > I think the request was to include a start/end IO address hint when
> > > creating the ios. When the kernel creates it then it can return the
> >
> > is the hint single-range or could be multiple-ranges?
> 
> David explained it here:
> 
> https://lore.kernel.org/kvm/YMrKksUeNW%2FPEGPM@yekko/
> 
> qeumu needs to be able to chooose if it gets the 32 bit range or 64
> bit range.
> 
> So a 'range hint' will do the job
> 
> David also suggested this:
> 
> https://lore.kernel.org/kvm/YL6%2FbjHyuHJTn4Rd@yekko/
> 
> So I like this better:
> 
> struct iommu_ioasid_alloc {
> 	__u32	argsz;
> 
> 	__u32	flags;
> #define IOMMU_IOASID_ENFORCE_SNOOP	(1 << 0)
> #define IOMMU_IOASID_HINT_BASE_IOVA	(1 << 1)
> 
> 	__aligned_u64 max_iova_hint;
> 	__aligned_u64 base_iova_hint; // Used only if
> IOMMU_IOASID_HINT_BASE_IOVA
> 
> 	// For creating nested page tables
> 	__u32 parent_ios_id;
> 	__u32 format;
> #define IOMMU_FORMAT_KERNEL 0
> #define IOMMU_FORMAT_PPC_XXX 2
> #define IOMMU_FORMAT_[..]
> 	u32 format_flags; // Layout depends on format above
> 
> 	__aligned_u64 user_page_directory;  // Used if parent_ios_id != 0
> };
> 
> Again 'type' as an overall API indicator should not exist, feature
> flags need to have clear narrow meanings.

currently the type is aimed to differentiate three usages:

- kernel-managed I/O page table
- user-managed I/O page table
- shared I/O page table (e.g. with mm, or ept)

we can remove 'type', but is FORMAT_KENREL/USER/SHARED a good
indicator? their difference is not about format.

> 
> This does both of David's suggestions at once. If quemu wants the 1G
> limited region it could specify max_iova_hint = 1G, if it wants the
> extend 64bit region with the hole it can give either the high base or
> a large max_iova_hint. format/format_flags allows a further

Dave's links didn't answer one puzzle from me. Does PPC needs accurate
range information or be ok with a large range including holes (then let
the kernel to figure out where the holes locate)?

> device-specific escape if more specific customization is needed and is
> needed to specify user space page tables anyhow.

and I didn't understand the 2nd link. How does user-managed page
table jump into this range claim problem? I'm getting confused...

> 
> > > ioas works well here I think. Use ioas_id to refer to the xarray
> > > index.
> >
> > What about when introducing pasid to this uAPI? Then use ioas_id
> > for the xarray index
> 
> Yes, ioas_id should always be the xarray index.
> 
> PASID needs to be called out as PASID or as a generic "hw description"
> blob.

ARM doesn't use PASID. So we need a generic blob, e.g. ioas_hwid?

and still we have both ioas_id (iommufd) and ioasid (ioasid.c) in the
kernel. Do we want to clear this confusion? Or possibly it's fine because
ioas_id is never used outside of iommufd and iommufd doesn't directly
call ioasid_alloc() from ioasid.c?

> 
> kvm's API to program the vPASID translation table should probably take
> in a (iommufd,ioas_id,device_id) tuple and extract the IOMMU side
> information using an in-kernel API. Userspace shouldn't have to
> shuttle it around.

the vPASID info is carried in VFIO_DEVICE_ATTACH_IOASID uAPI. 
when kvm calls iommufd with above tuple, vPASID->pPASID is
returned to kvm. So we still need a generic blob to represent
vPASID in the uAPI.

> 
> I'm starting to feel like the struct approach for describing this uAPI
> might not scale well, but lets see..
> 
> Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ