Date:   Thu, 3 Jun 2021 09:56:20 -0300
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Lu Baolu <baolu.lu@...ux.intel.com>
Cc:     David Gibson <david@...son.dropbear.id.au>,
        "Tian, Kevin" <kevin.tian@...el.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Joerg Roedel <joro@...tes.org>,
        David Woodhouse <dwmw2@...radead.org>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "Alex Williamson (alex.williamson@...hat.com)" 
        <alex.williamson@...hat.com>, Jason Wang <jasowang@...hat.com>,
        Eric Auger <eric.auger@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        "Raj, Ashok" <ashok.raj@...el.com>,
        "Liu, Yi L" <yi.l.liu@...el.com>, "Wu, Hao" <hao.wu@...el.com>,
        "Jiang, Dave" <dave.jiang@...el.com>,
        Jacob Pan <jacob.jun.pan@...ux.intel.com>,
        Jean-Philippe Brucker <jean-philippe@...aro.org>,
        Kirti Wankhede <kwankhede@...dia.com>,
        Robin Murphy <robin.murphy@....com>
Subject: Re: [RFC] /dev/ioasid uAPI proposal

On Thu, Jun 03, 2021 at 02:50:11PM +0800, Lu Baolu wrote:
> Hi David,
> 
> On 6/3/21 1:54 PM, David Gibson wrote:
> > On Tue, Jun 01, 2021 at 07:09:21PM +0800, Lu Baolu wrote:
> > > Hi Jason,
> > > 
> > > On 2021/5/29 7:36, Jason Gunthorpe wrote:
> > > > > /*
> > > > > >     * Bind a user-managed I/O page table with the IOMMU
> > > > >     *
> > > > >     * Because user page table is untrusted, IOASID nesting must be enabled
> > > > >     * for this ioasid so the kernel can enforce its DMA isolation policy
> > > > >     * through the parent ioasid.
> > > > >     *
> > > > >     * Pgtable binding protocol is different from DMA mapping. The latter
> > > > >     * has the I/O page table constructed by the kernel and updated
> > > > >     * according to user MAP/UNMAP commands. With pgtable binding the
> > > > > >     * whole page table is created and updated by userspace, thus a different
> > > > > >     * set of commands is required (bind, iotlb invalidation, page fault, etc.).
> > > > >     *
> > > > >     * Because the page table is directly walked by the IOMMU, the user
> > > > > >     * must use a format compatible with the underlying hardware. It can
> > > > >     * check the format information through IOASID_GET_INFO.
> > > > >     *
> > > > >     * The page table is bound to the IOMMU according to the routing
> > > > >     * information of each attached device under the specified IOASID. The
> > > > >     * routing information (RID and optional PASID) is registered when a
> > > > >     * device is attached to this IOASID through VFIO uAPI.
> > > > >     *
> > > > >     * Input parameters:
> > > > >     *	- child_ioasid;
> > > > >     *	- address of the user page table;
> > > > >     *	- formats (vendor, address_width, etc.);
> > > > >     *
> > > > >     * Return: 0 on success, -errno on failure.
> > > > >     */
> > > > > #define IOASID_BIND_PGTABLE		_IO(IOASID_TYPE, IOASID_BASE + 9)
> > > > > #define IOASID_UNBIND_PGTABLE	_IO(IOASID_TYPE, IOASID_BASE + 10)
> > > > Also feels backwards, why wouldn't we specify this, and the required
> > > > page table format, during alloc time?
> > > > 
> > > Thinking of the required page table format, perhaps we should shed more
> > > light on the page table of an IOASID. So far, an IOASID might represent
> > > one of the following page tables (might be more):
> > > 
> > >   1) an IOMMU format page table (a.k.a. iommu_domain)
> > >   2) a user application CPU page table (SVA for example)
> > >   3) a KVM EPT (future option)
> > >   4) a VM guest managed page table (nesting mode)
> > > 
> > > This version only covers 1) and 4). Do you think we need to support 2),
> > > Isn't (2) the equivalent of using the host-managed pagetable
> > then doing a giant MAP of all your user address space into it?  But
> > maybe we should identify that case explicitly in case the host can
> > optimize it.
> 
> Conceptually, yes. Current SVA implementation just reuses the
> application's cpu page table w/o map/unmap operations.

The key distinction is faulting, and this goes back to the importance
of having the device tell drivers/iommu what TLPs it is generating.

A #1 table with a map of 'all user space memory' does not have IO DMA
faults. The pages should be pinned and this object should be
compatible with any DMA user.

A #2/#3 table allows page faulting, and it can only be used with a
device that supports the page faulting protocol. For instance a PCI
device needs to say it is running in ATS mode and supports PRI. This
is where you might fit in CAPI generically.

As with the case in my other email, the kind of TLPs the device
generates is only known by the driver when it connects to the IOASID
and must be communicated to the IOMMU so it knows how to set things
up. ATS/PRI w/ faulting is a very different setup than simple RID
matching.
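
To make the distinction concrete, here is a rough sketch of the kind of
check that could run when a device is attached. It is only an
illustration: every name in it (pgtable_kind, dev_tlp_info,
attach_allowed) is hypothetical and none of it is part of the proposed
uAPI. It assumes the driver reports whether the device runs in ATS mode
and supports PRI, and it only covers table kinds #1 to #3, since #4 is
not classified above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not part of the proposed uAPI. */

/* Page table kinds, numbered as in the list quoted above (#4 left out). */
enum pgtable_kind {
	PGTABLE_IOMMU_DOMAIN = 1,	/* #1: kernel-built table, pages pinned */
	PGTABLE_USER_CPU     = 2,	/* #2: application CPU page table (SVA) */
	PGTABLE_KVM_EPT      = 3,	/* #3: KVM EPT */
};

/* What the driver tells drivers/iommu about the TLPs its device generates. */
struct dev_tlp_info {
	bool ats_enabled;	/* device is running in ATS mode */
	bool pri_supported;	/* device supports PRI */
};

/* A #2/#3 table may take IO page faults; a #1 table does not. */
static bool pgtable_can_fault(enum pgtable_kind kind)
{
	return kind == PGTABLE_USER_CPU || kind == PGTABLE_KVM_EPT;
}

/* Assumption: "supports the page faulting protocol" means ATS plus PRI. */
static bool dev_can_fault(const struct dev_tlp_info *info)
{
	assert(info != NULL);
	return info->ats_enabled && info->pri_supported;
}

/*
 * Attach-time check: a non-faulting table is compatible with any DMA
 * user, a faulting table only with a device that can handle faults.
 */
static bool attach_allowed(enum pgtable_kind kind,
			   const struct dev_tlp_info *info)
{
	if (!pgtable_can_fault(kind))
		return true;
	return dev_can_fault(info);
}
```

With this sketch, a plain device with neither ATS nor PRI would be
accepted on a #1 table and refused on a #2 or #3 table, while a device
reporting both would be accepted on all three.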

Jason
