Message-ID: <MWHPR11MB18865A92E12E7C2F6422987C8C089@MWHPR11MB1886.namprd11.prod.outlook.com>
Date:   Wed, 23 Jun 2021 07:57:19 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     Jean-Philippe Brucker <jean-philippe@...aro.org>,
        David Gibson <david@...son.dropbear.id.au>
CC:     "Alex Williamson (alex.williamson@...hat.com)" 
        <alex.williamson@...hat.com>, "Raj, Ashok" <ashok.raj@...el.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        Jonathan Corbet <corbet@....net>,
        Robin Murphy <robin.murphy@....com>,
        LKML <linux-kernel@...r.kernel.org>,
        Kirti Wankhede <kwankhede@...dia.com>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        "Jason Gunthorpe" <jgg@...dia.com>,
        "Jiang, Dave" <dave.jiang@...el.com>,
        "David Woodhouse" <dwmw2@...radead.org>,
        Jason Wang <jasowang@...hat.com>
Subject: RE: [RFC] /dev/ioasid uAPI proposal

> From: Jean-Philippe Brucker
> Sent: Saturday, June 19, 2021 1:04 AM
> 
> On Thu, Jun 17, 2021 at 01:00:14PM +1000, David Gibson wrote:
> > On Thu, Jun 10, 2021 at 06:37:31PM +0200, Jean-Philippe Brucker wrote:
> > > On Tue, Jun 08, 2021 at 04:31:50PM +1000, David Gibson wrote:
> > > > For the qemu case, I would imagine a two stage fallback:
> > > >
> > > >     1) Ask for the exact IOMMU capabilities (including pagetable
> > > >        format) that the vIOMMU has.  If the host can supply, you're
> > > >        good
> > > >
> > > >     2) If not, ask for a kernel managed IOAS.  Verify that it can map
> > > >        all the IOVA ranges the guest vIOMMU needs, and has an equal or
> > > >        smaller pagesize than the guest vIOMMU presents.  If so,
> > > >        software emulate the vIOMMU by shadowing guest io pagetable
> > > >        updates into the kernel managed IOAS.
> > > >
> > > >     3) You're out of luck, don't start.
> > > >
> > > > For both (1) and (2) I'd expect it to be asking this question *after*
> > > > saying what devices are attached to the IOAS, based on the virtual
> > > > hardware configuration.  That doesn't cover hotplug, of course, for
> > > > that you have to just fail the hotplug if the new device isn't
> > > > supportable with the IOAS you already have.
> > >
> > > Yes. So there is a point in time when the IOAS is frozen, and cannot take
> > > in new incompatible devices. I think that can support the usage I had in
> > > mind. If the VMM (non-QEMU, let's say) wanted to create one IOASID FD per
> > > feature set it could bind the first device, freeze the features, then bind
> >
> > Are you thinking of this "freeze the features" as an explicitly
> > triggered action?  I have suggested that an explicit "ENABLE" step
> > might be useful, but that hasn't had much traction from what I've
> > seen.
> 
> Seems like we do need an explicit enable step for the flow you described
> above:
> 
> a) Bind all devices to an ioasid. Each bind succeeds.

Let's use consistent terms in this discussion. :)

'bind' a device to an IOMMU fd (a container of I/O address spaces).

'attach' a device to an IOASID (representing an I/O address space
within the IOMMU fd).

> b) Ask for a specific set of features for this aggregate of device. Ask
>    for (1), fall back to (2), or abort.
> c) Boot the VM
> d) Hotplug a device, bind it to the ioasid. We're long past negotiating
>    features for the ioasid, so the host needs to reject the bind if the
>    new device is incompatible with what was requested at (b)
> 
> So a successful request at (b) would be the point where we change the
> behavior of bind.

Per Jason's recommendation, v2 will move to a new model:

a) Bind all devices to an IOMMU fd:
        - The user should provide a 'device_cookie' that identifies each
          bound device in subsequent uAPI calls.

b) Successful binding allows the user to check the capability/format info
     per device_cookie (GET_DEVICE_INFO) before creating any IOASID:
        - Sample capability info:
               * VFIO type1 map: supported page sizes, permitted IOVA ranges, etc.;
               * IOASID nesting: hardware nesting vs. software nesting;
               * User-managed page table: vendor-specific formats;
               * User-managed PASID table: vendor-specific formats;
               * vPASID: whether delegated to the user; if kernel-managed,
                 whether per-RID or global;
               * coherency: what the kernel's default policy is, and whether
                 the user is allowed to change it;
               * ...
        - The exact details can be finalized when the code is implemented;

c) When creating a new IOASID, the user should specify a format which
    is compatible with one or more devices which will be attached to this
    IOASID right after.

d) Attaching a device which has an incompatible format to this IOASID
     is simply rejected. Whether it's hotplugged doesn't matter.
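
To make the a)-d) flow concrete, here is a toy user-space model in Python.
It is only a sketch of the semantics described above, not the proposed
uAPI: the class and method names (IommuFd, bind, get_device_info,
alloc_ioasid, attach), the DeviceInfo fields and the format strings are all
hypothetical, and "compatible" is reduced to an exact format-string match.

```python
from dataclasses import dataclass


class IncompatibleFormat(Exception):
    """Raised when a device cannot use the format of an IOASID."""


@dataclass(frozen=True)
class DeviceInfo:
    # Hypothetical subset of what GET_DEVICE_INFO might report.
    page_table_format: str
    page_sizes: frozenset


class IommuFd:
    """Toy model of an IOMMU fd: a container of I/O address spaces."""

    def __init__(self):
        self._devices = {}      # device_cookie -> DeviceInfo
        self._ioasids = {}      # ioasid -> (format, set of attached cookies)
        self._next_ioasid = 1

    def bind(self, device_cookie, info):
        # Step a): binding records the device; no compatibility check yet.
        if device_cookie in self._devices:
            raise ValueError("device_cookie already in use")
        self._devices[device_cookie] = info

    def get_device_info(self, device_cookie):
        # Step b): per-cookie capability/format query.
        return self._devices[device_cookie]

    def alloc_ioasid(self, fmt):
        # Step c): the format is fixed when the IOASID is created.
        ioasid = self._next_ioasid
        self._next_ioasid += 1
        self._ioasids[ioasid] = (fmt, set())
        return ioasid

    def attach(self, device_cookie, ioasid):
        # Step d): the same check applies at boot and at hotplug time.
        fmt, attached = self._ioasids[ioasid]
        if self._devices[device_cookie].page_table_format != fmt:
            raise IncompatibleFormat(
                f"device {device_cookie} does not support format {fmt!r}")
        attached.add(device_cookie)
```

In this model there is no state change at boot: a hotplugged device goes
through the same bind/attach path and is rejected by the same check.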

Qemu is expected to query capability/format information for all devices
according to what the specified vIOMMU model requires, and then decide
whether to fail vIOMMU creation if there is no strict match or to fall
back to a hybrid model with software emulation to bridge the gap. In any
case, before a new I/O address space is created, Qemu should have a clear
picture of what format is required for a given set of to-be-attached
devices, and whether multiple IOASIDs are required because these devices
have incompatible formats.
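
The decision described here could look roughly like the Python sketch
below. It is an illustration under assumptions, not Qemu code: the function
name plan_ioasids, the "kernel-managed" format label and the representation
of each device's capabilities as a set of format strings are all made up
for this example.

```python
from collections import defaultdict

# Hypothetical label for a kernel-managed I/O page table, used when the
# VMM has to shadow guest page table updates in software.
KERNEL_MANAGED = "kernel-managed"


def plan_ioasids(device_formats, viommu_format, allow_software_fallback=True):
    """Group device cookies by the IOASID format each one would use.

    device_formats maps device_cookie -> set of formats reported for that
    device. Returns {format: [cookies]}; more than one key means multiple
    IOASIDs are required. Raises RuntimeError if a device can be served
    neither natively nor through the software fallback.
    """
    groups = defaultdict(list)
    for cookie, formats in sorted(device_formats.items()):
        if viommu_format in formats:
            # Strict match: the host can supply the vIOMMU format.
            groups[viommu_format].append(cookie)
        elif allow_software_fallback and KERNEL_MANAGED in formats:
            # Hybrid model: software emulation bridges the gap.
            groups[KERNEL_MANAGED].append(cookie)
        else:
            raise RuntimeError(
                f"device {cookie} cannot back vIOMMU format {viommu_format!r}")
    return dict(groups)
```

Because the plan is computed before any IOASID is created, a failure here
maps to failing vIOMMU creation up front rather than discovering the
mismatch at attach time.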

With this model we don't need a separate 'enable' step.  

> 
> Since the kernel needs a form of feature check in any case, I still have a
> preference for aborting the bind at (a) if the device isn't exactly
> compatible with other devices already in the ioasid, because it might be
> simpler to implement in the host, but I don't feel strongly about this.

This is covered by d). Actually, with all the format information available,
Qemu should not even attempt to attach an incompatible device in the
first place, though the kernel will still do this simple check under the hood.

Thanks
Kevin
