Date:   Thu, 3 Jun 2021 15:09:10 +1000
From:   David Gibson <david@...son.dropbear.id.au>
To:     "Tian, Kevin" <kevin.tian@...el.com>
Cc:     Jason Gunthorpe <jgg@...dia.com>,
        Jean-Philippe Brucker <jean-philippe@...aro.org>,
        "Alex Williamson (alex.williamson@...hat.com)" 
        <alex.williamson@...hat.com>, "Raj, Ashok" <ashok.raj@...el.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        Jonathan Corbet <corbet@....net>,
        Robin Murphy <robin.murphy@....com>,
        LKML <linux-kernel@...r.kernel.org>,
        Kirti Wankhede <kwankhede@...dia.com>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        "Jiang, Dave" <dave.jiang@...el.com>,
        David Woodhouse <dwmw2@...radead.org>,
        Jason Wang <jasowang@...hat.com>
Subject: Re: [RFC] /dev/ioasid uAPI proposal

On Thu, Jun 03, 2021 at 01:29:58AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Thursday, June 3, 2021 12:09 AM
> > 
> > On Wed, Jun 02, 2021 at 01:33:22AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@...dia.com>
> > > > Sent: Wednesday, June 2, 2021 1:42 AM
> > > >
> > > > On Tue, Jun 01, 2021 at 08:10:14AM +0000, Tian, Kevin wrote:
> > > > > > From: Jason Gunthorpe <jgg@...dia.com>
> > > > > > Sent: Saturday, May 29, 2021 1:36 AM
> > > > > >
> > > > > > On Thu, May 27, 2021 at 07:58:12AM +0000, Tian, Kevin wrote:
> > > > > >
> > > > > > > IOASID nesting can be implemented in two ways: hardware nesting and
> > > > > > > software nesting. With hardware support the child and parent I/O page
> > > > > > > tables are walked consecutively by the IOMMU to form a nested
> > > > > > > translation. When it's implemented in software, the ioasid driver is
> > > > > > > responsible for merging the two-level mappings into a single-level
> > > > > > > shadow I/O page table. Software nesting requires both the child and
> > > > > > > parent page tables to be operated through the DMA mapping protocol, so
> > > > > > > any change at either level can be captured by the kernel to update the
> > > > > > > corresponding shadow mapping.
> > > > > >
> > > > > > Why? A SW emulation could do this synchronization during invalidation
> > > > > > processing if invalidation contained an IOVA range.
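A minimal sketch of the synchronization Jason suggests, under loudly stated assumptions: plain Python dicts keyed by page-aligned address stand in for the child (IOVA to guest physical) and parent (guest physical to host physical) I/O page tables, and `resync_shadow` and `PAGE_SIZE` are hypothetical names, not part of the proposed /dev/ioasid uAPI. When an invalidation carries an IOVA range, only the shadow entries in that range are recomputed by composing the two levels.

```python
PAGE_SIZE = 4096  # assumed page granularity for this sketch


def resync_shadow(child, parent, shadow, iova, length):
    """Recompute shadow entries for every page overlapping [iova, iova + length).

    Hypothetical helper. Table layout assumed for illustration only:
    child:  dict, page-aligned IOVA -> intermediate (guest physical) address
    parent: dict, intermediate address -> host physical address
    shadow: dict, page-aligned IOVA -> host physical address (collapsed table)
    """
    start = iova - (iova % PAGE_SIZE)
    for page in range(start, iova + length, PAGE_SIZE):
        ipa = child.get(page)
        hpa = parent.get(ipa) if ipa is not None else None
        if hpa is None:
            # Not present at one of the levels: drop any stale shadow entry.
            shadow.pop(page, None)
        else:
            # Present at both levels: store the composed translation.
            shadow[page] = hpa
```

For example, after the guest removes IOVA 0x1000 from its (child) table and invalidates that page, `resync_shadow(child, parent, shadow, 0x1000, PAGE_SIZE)` drops the stale shadow entry and leaves every other page untouched, so the kernel never needs to see the individual guest PTE writes.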
> > > > >
> > > > > In this proposal we differentiate between host-managed and
> > > > > user-managed I/O page tables. If host-managed, the user is expected to
> > > > > use map/unmap cmds explicitly upon any change required to the page
> > > > > table. If user-managed, the user first binds its page table to the
> > > > > IOMMU and then uses invalidation cmds to flush the IOTLB when necessary
> > > > > (invalidation is typically not required when changing a PTE from
> > > > > non-present to present).
> > > > >
> > > > > We expect the user to use map+unmap or bind+invalidate respectively,
> > > > > instead of mixing them together. Following this policy, map+unmap
> > > > > must be used at both levels for software nesting, so that changes at
> > > > > either level are captured in time to synchronize the shadow mapping.
> > > >
> > > > map+unmap or bind+invalidate is a policy of the IOASID itself set when
> > > > it is created. If you put two different types in a tree then each IOASID
> > > > must continue to use its own operation mode.
> > > >
> > > > I don't see a reason to force all IOASIDs in a tree to be consistent??
> > >
> > > only for software nesting. With hardware support the parent uses map
> > > while the child uses bind.
> > >
> > > Yes, the policy is specified per IOASID. But if the policy violates the
> > > requirement in a specific nesting mode, then nesting should fail.
> > 
> > I don't get it.
> > 
> > If the IOASID is a page table then it is bind/invalidate. SW or not SW
> > doesn't matter at all.
> > 
> > > >
> > > > A software emulated two level page table where the leaf level is a
> > > > bound page table in guest memory should continue to use
> > > > bind/invalidate to maintain the guest page table IOASID even though it
> > > > is a SW construct.
> > >
> > > with software nesting the leaf should be a host-managed page table
> > > (or metadata). A bind/invalidate protocol doesn't require the user
> > > to notify the kernel of every page table change.
> > 
> > The purpose of invalidate is to inform the implementation that the
> > page table has changed so it can flush the caches. If the page table
> > is changed and invalidation is not issued then the implementation
> > is free to ignore the changes.
> > 
> > In this way the SW mode is the same as a HW mode with an infinite
> > cache.
> > 
> > The collapsed shadow page table is really just a cache.
> > 
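To make the "infinite cache" view concrete, here is a toy behavioural model, not a description of how a real shadow table is built: a real implementation would populate the shadow eagerly while processing the invalidation, whereas this sketch fills lazily on lookup for brevity. `CollapsedShadow` and its `walk` callback are hypothetical names. Every walk result, including "not present", is kept until the user invalidates that range, so a page table change that is never followed by an invalidation is simply not observed.

```python
PAGE_SIZE = 4096  # assumed page granularity for this sketch


class CollapsedShadow:
    """Hypothetical model of a collapsed shadow table treated purely as a cache.

    `walk` is any callable returning the current translation for a
    page-aligned IOVA, or None if the page is not present. Every result,
    including "not present" (None), is cached until invalidated.
    """

    def __init__(self, walk):
        self.walk = walk
        self.cache = {}

    def translate(self, iova):
        page = iova - (iova % PAGE_SIZE)
        if page not in self.cache:
            # Cache miss: walk the page table once and remember the result,
            # even when the page is not present.
            self.cache[page] = self.walk(page)
        hpa = self.cache[page]
        return None if hpa is None else hpa + (iova % PAGE_SIZE)

    def invalidate(self, iova, length):
        # Forget every cached page overlapping [iova, iova + length).
        start = iova - (iova % PAGE_SIZE)
        for page in range(start, iova + length, PAGE_SIZE):
            self.cache.pop(page, None)
```

Usage: with `guest = {}` and `shadow = CollapsedShadow(guest.get)`, a lookup of 0x1234 returns None; setting `guest[0x1000] = 0x80000` changes nothing until `shadow.invalidate(0x1000, PAGE_SIZE)` is issued, after which the lookup yields 0x80234.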
> 
> OK. One additional thing is that we may need a 'caching_mode'
> flag reported by /dev/ioasid, indicating whether invalidation is
> required when changing a PTE from non-present to present. For
> hardware nesting it's not reported, as the hardware IOMMU will walk
> the guest page table on an IOTLB miss. For software nesting,
> caching_mode is reported, so the user must issue an invalidation
> upon any change to the guest page table so that the kernel can
> update the shadow page table in a timely manner.

For the first cut, I'd have the API assume that invalidates are
*always* required.  Some bypass to avoid them in cases where they're
not needed can be an additional extension.
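A minimal user-side sketch of the rule being discussed, assuming a hypothetical `caching_mode` flag along the lines Kevin suggests (it is not an existing /dev/ioasid field). Defaulting it to True gives David's first-cut rule that invalidates are always required; a reported value of False is the later bypass extension, allowing the user to skip the invalidation when an entry goes from non-present to present.

```python
def invalidation_required(old_present, caching_mode=True):
    """Decide whether the user must invalidate after updating one PTE.

    Hypothetical helper for illustration only.
    caching_mode=True: conservative first-cut rule, always invalidate.
    caching_mode=False: not-present entries are assumed never to be cached
    (e.g. hardware nesting, where the IOMMU walks the guest table on an
    IOTLB miss), so only an entry that was previously present can be stale.
    """
    if caching_mode:
        return True
    return old_present
```

Keeping the conservative answer as the default means an implementation that never reports the flag stays correct, and the optimization can be added later without changing existing users.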

> Following this and your other comment with David, we will mark
> host-managed vs. guest-managed explicitly for the I/O page table
> of each IOASID. Whether map+unmap or bind+invalidate applies is
> decided by which owner the user specifies.
> 
> Thanks
> Kevin
> 
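A sketch of the per-IOASID policy described above, where the owner is fixed when the IOASID is created and determines which operations are accepted. The `Owner` enum, the `IOASID` class and the command names are hypothetical illustrations, not the proposed uAPI. In a hardware-nested tree the host-managed parent would accept map/unmap while the guest-managed child accepts bind/invalidate, each keeping its own mode.

```python
from enum import Enum


class Owner(Enum):
    HOST = "host-managed"    # kernel owns the I/O page table
    GUEST = "guest-managed"  # user binds its own page table


# Hypothetical command names, one protocol per owner type.
ALLOWED_OPS = {
    Owner.HOST: {"map", "unmap"},
    Owner.GUEST: {"bind", "invalidate"},
}


class IOASID:
    def __init__(self, owner):
        # The owner, and therefore the operation mode, is set at creation.
        self.owner = owner

    def check_op(self, op):
        """Reject any operation that does not match this IOASID's owner."""
        if op not in ALLOWED_OPS[self.owner]:
            raise ValueError(
                f"{op!r} is not permitted on a {self.owner.value} IOASID")
```

Rejecting a mismatched command up front means users cannot mix map+unmap with bind+invalidate on the same IOASID, which is the policy Kevin describes.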

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
