lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YVaqIB32MzppOcxl@yekko>
Date:   Fri, 1 Oct 2021 16:26:40 +1000
From:   "david@...son.dropbear.id.au" <david@...son.dropbear.id.au>
To:     "Tian, Kevin" <kevin.tian@...el.com>
Cc:     Jason Gunthorpe <jgg@...dia.com>, "Liu, Yi L" <yi.l.liu@...el.com>,
        "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
        "hch@....de" <hch@....de>,
        "jasowang@...hat.com" <jasowang@...hat.com>,
        "joro@...tes.org" <joro@...tes.org>,
        "jean-philippe@...aro.org" <jean-philippe@...aro.org>,
        "parav@...lanox.com" <parav@...lanox.com>,
        "lkml@...ux.net" <lkml@...ux.net>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "lushenming@...wei.com" <lushenming@...wei.com>,
        "eric.auger@...hat.com" <eric.auger@...hat.com>,
        "corbet@....net" <corbet@....net>,
        "Raj, Ashok" <ashok.raj@...el.com>,
        "yi.l.liu@...ux.intel.com" <yi.l.liu@...ux.intel.com>,
        "Tian, Jun J" <jun.j.tian@...el.com>, "Wu, Hao" <hao.wu@...el.com>,
        "Jiang, Dave" <dave.jiang@...el.com>,
        "jacob.jun.pan@...ux.intel.com" <jacob.jun.pan@...ux.intel.com>,
        "kwankhede@...dia.com" <kwankhede@...dia.com>,
        "robin.murphy@....com" <robin.murphy@....com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        "dwmw2@...radead.org" <dwmw2@...radead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
        "nicolinc@...dia.com" <nicolinc@...dia.com>
Subject: Re: [RFC 11/20] iommu/iommufd: Add IOMMU_IOASID_ALLOC/FREE

On Thu, Sep 23, 2021 at 09:14:58AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@...dia.com>
> > Sent: Wednesday, September 22, 2021 10:09 PM
> > 
> > On Wed, Sep 22, 2021 at 03:40:25AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@...dia.com>
> > > > Sent: Wednesday, September 22, 2021 1:45 AM
> > > >
> > > > On Sun, Sep 19, 2021 at 02:38:39PM +0800, Liu Yi L wrote:
> > > > > This patch adds IOASID allocation/free interface per iommufd. When
> > > > > allocating an IOASID, userspace is expected to specify the type and
> > > > > format information for the target I/O page table.
> > > > >
> > > > > This RFC supports only one type
> > (IOMMU_IOASID_TYPE_KERNEL_TYPE1V2),
> > > > > implying a kernel-managed I/O page table with vfio type1v2 mapping
> > > > > semantics. For this type the user should specify the addr_width of
> > > > > the I/O address space and whether the I/O page table is created in
> > > > > an iommu enfore_snoop format. enforce_snoop must be true at this
> > point,
> > > > > as the false setting requires additional contract with KVM on handling
> > > > > WBINVD emulation, which can be added later.
> > > > >
> > > > > Userspace is expected to call IOMMU_CHECK_EXTENSION (see next
> > patch)
> > > > > for what formats can be specified when allocating an IOASID.
> > > > >
> > > > > Open:
> > > > > - Devices on PPC platform currently use a different iommu driver in vfio.
> > > > >   Per previous discussion they can also use vfio type1v2 as long as there
> > > > >   is a way to claim a specific iova range from a system-wide address
> > space.
> > > > >   This requirement doesn't sound PPC specific, as addr_width for pci
> > > > devices
> > > > >   can be also represented by a range [0, 2^addr_width-1]. This RFC
> > hasn't
> > > > >   adopted this design yet. We hope to have formal alignment in v1
> > > > discussion
> > > > >   and then decide how to incorporate it in v2.
> > > >
> > > > I think the request was to include a start/end IO address hint when
> > > > creating the ios. When the kernel creates it then it can return the
> > >
> > > is the hint single-range or could be multiple-ranges?
> > 
> > David explained it here:
> > 
> > https://lore.kernel.org/kvm/YMrKksUeNW%2FPEGPM@yekko/
> > 
> > qeumu needs to be able to chooose if it gets the 32 bit range or 64
> > bit range.
> > 
> > So a 'range hint' will do the job
> > 
> > David also suggested this:
> > 
> > https://lore.kernel.org/kvm/YL6%2FbjHyuHJTn4Rd@yekko/
> > 
> > So I like this better:
> > 
> > struct iommu_ioasid_alloc {
> > 	__u32	argsz;
> > 
> > 	__u32	flags;
> > #define IOMMU_IOASID_ENFORCE_SNOOP	(1 << 0)
> > #define IOMMU_IOASID_HINT_BASE_IOVA	(1 << 1)
> > 
> > 	__aligned_u64 max_iova_hint;
> > 	__aligned_u64 base_iova_hint; // Used only if
> > IOMMU_IOASID_HINT_BASE_IOVA
> > 
> > 	// For creating nested page tables
> > 	__u32 parent_ios_id;
> > 	__u32 format;
> > #define IOMMU_FORMAT_KERNEL 0
> > #define IOMMU_FORMAT_PPC_XXX 2
> > #define IOMMU_FORMAT_[..]
> > 	u32 format_flags; // Layout depends on format above
> > 
> > 	__aligned_u64 user_page_directory;  // Used if parent_ios_id != 0
> > };
> > 
> > Again 'type' as an overall API indicator should not exist, feature
> > flags need to have clear narrow meanings.
> 
> currently the type is aimed to differentiate three usages:
> 
> - kernel-managed I/O page table
> - user-managed I/O page table
> - shared I/O page table (e.g. with mm, or ept)
> 
> we can remove 'type', but is FORMAT_KENREL/USER/SHARED a good
> indicator? their difference is not about format.

To me "format" indicates how the IO translation information is
encoded.  We potentially have two different encodings: from userspace
to the kernel and from the kernel to the hardware.  But since this is
the userspace API, it's only the userspace to kernel one that matters
here.

In that sense, KERNEL, is a "format": we encode the translation
information as a series of IOMAP operations to the kernel, rather than
as an in-memory structure.

> > This does both of David's suggestions at once. If quemu wants the 1G
> > limited region it could specify max_iova_hint = 1G, if it wants the
> > extend 64bit region with the hole it can give either the high base or
> > a large max_iova_hint. format/format_flags allows a further
> 
> Dave's links didn't answer one puzzle from me. Does PPC needs accurate
> range information or be ok with a large range including holes (then let
> the kernel to figure out where the holes locate)?

I need more specifics to answer that.  Are you talking from a
userspace PoV, a guest kernel's or the host kernel's?  In general I
think requiring userspace to locate and work aronud holes is a bad
idea.  If userspace requests a range, it should get *all* of that
range.

The ppc case is further complicated because there are multiple ranges
and each range could have separate IO page tables.  In practice
non-kernel managed IO pagetables are likely to be hard on ppc (or at
least rely on firmware/hypervisor interfaces which don't exist yet,
AFAIK).  But even then, the underlying hardware page table format can
affect the minimum pagesize of each range, which could be different.

How all of this interacts with PASIDs I really haven't figured out.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ