Date:   Tue, 11 Aug 2020 20:36:18 -0600
From:   Alex Williamson <alex.williamson@...hat.com>
To:     "Tian, Kevin" <kevin.tian@...el.com>
Cc:     Jason Gunthorpe <jgg@...dia.com>,
        "Jiang, Dave" <dave.jiang@...el.com>,
        "vkoul@...nel.org" <vkoul@...nel.org>,
        "Dey, Megha" <megha.dey@...el.com>,
        "maz@...nel.org" <maz@...nel.org>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "rafael@...nel.org" <rafael@...nel.org>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "hpa@...or.com" <hpa@...or.com>,
        "Pan, Jacob jun" <jacob.jun.pan@...el.com>,
        "Raj, Ashok" <ashok.raj@...el.com>,
        "Liu, Yi L" <yi.l.liu@...el.com>, "Lu, Baolu" <baolu.lu@...el.com>,
        "Kumar, Sanjay K" <sanjay.k.kumar@...el.com>,
        "Luck, Tony" <tony.luck@...el.com>,
        "Lin, Jing" <jing.lin@...el.com>,
        "Williams, Dan J" <dan.j.williams@...el.com>,
        "kwankhede@...dia.com" <kwankhede@...dia.com>,
        "eric.auger@...hat.com" <eric.auger@...hat.com>,
        "parav@...lanox.com" <parav@...lanox.com>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        "netanelg@...lanox.com" <netanelg@...lanox.com>,
        "shahafs@...lanox.com" <shahafs@...lanox.com>,
        "yan.y.zhao@...ux.intel.com" <yan.y.zhao@...ux.intel.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "Ortiz, Samuel" <samuel.ortiz@...el.com>,
        "Hossain, Mona" <mona.hossain@...el.com>,
        "dmaengine@...r.kernel.org" <dmaengine@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: Re: [PATCH RFC v2 00/18] Add VFIO mediated device support and
 DEV-MSI support for the idxd driver

On Wed, 12 Aug 2020 01:58:00 +0000
"Tian, Kevin" <kevin.tian@...el.com> wrote:

> > From: Alex Williamson <alex.williamson@...hat.com>
> > Sent: Wednesday, August 12, 2020 1:01 AM
> > 
> > On Mon, 10 Aug 2020 07:32:24 +0000
> > "Tian, Kevin" <kevin.tian@...el.com> wrote:
> >   
> > > > From: Jason Gunthorpe <jgg@...dia.com>
> > > > Sent: Friday, August 7, 2020 8:20 PM
> > > >
> > > > On Wed, Aug 05, 2020 at 07:22:58PM -0600, Alex Williamson wrote:
> > > >  
> > > > > If you see this as an abuse of the framework, then let's identify those
> > > > > specific issues and come up with a better approach.  As we've discussed
> > > > > before, things like basic PCI config space emulation are acceptable
> > > > > overhead and low risk (imo) and some degree of register emulation is
> > > > > well within the territory of an mdev driver.  
> > > >
> > > > What troubles me is that idxd already has a direct userspace interface
> > > > to its HW, and does userspace DMA. The purpose of this mdev is to
> > > > provide a second direct userspace interface that is a little different
> > > > and trivially plugs into the virtualization stack.  
> > >
> > > No. Userspace DMA and subdevice passthrough (what mdev provides)
> > > > are two distinct usages IMO (at least in the idxd context), and this
> > > > might be the main divergence between us, so let me elaborate here.
> > > If we could reach consensus in this matter, which direction to go
> > > would be clearer.
> > >
> > > > First, a passthrough interface has some unique requirements
> > > > which are not commonly observed in a userspace DMA interface, e.g.:
> > >
> > > - Tracking DMA dirty pages for live migration;
> > > - A set of interfaces for using SVA inside guest;
> > > 	* PASID allocation/free (on some platforms);
> > > 	* bind/unbind guest mm/page table (nested translation);
> > > 	* invalidate IOMMU cache/iotlb for guest page table changes;
> > > 	* report page request from device to guest;
> > > 	* forward page response from guest to device;
> > > - Configuring irqbypass for posted interrupt;
> > > - ...
> > >
> > > Second, a passthrough interface requires delegating raw controllability
> > > of subdevice to guest driver, while the same delegation might not be
> > > > required for implementing a userspace DMA interface (especially for
> > > > modern devices which support SVA). For example, idxd allows the following
> > > > settings per wq (guest driver may configure them in any combination):
> > > 	- put in dedicated or shared mode;
> > > 	- enable/disable SVA;
> > > 	- Associate guest-provided PASID to MSI/IMS entry;
> > > 	- set threshold;
> > > 	- allow/deny privileged access;
> > > 	- allocate/free interrupt handle (enlightened for guest);
> > > 	- collect error status;
> > > 	- ...
> > >
> > > We plan to support idxd userspace DMA with SVA. The driver just needs
> > > to prepare a wq with a predefined configuration (e.g. shared, SVA,
> > > etc.), bind the process mm to IOMMU (non-nested) and then map
> > > the portal to userspace. The goal that userspace can do DMA to
> > > associated wq doesn't change the fact that the wq is still *owned*
> > > and *controlled* by kernel driver. However as far as passthrough
> > > is concerned, the wq is considered 'owned' by the guest driver thus
> > > we need an interface which can support low-level *controllability*
> > > from guest driver. It is sort of a mess in uAPI when mixing the
> > > two together.
> > >
> > > Based on above two reasons, we see distinct requirements between
> > > userspace DMA and passthrough interfaces, at least in idxd context
> > > (though other devices may have less distinction in-between). Therefore,
> > > we didn't see the value/necessity of reinventing the wheel that mdev
> > > > already handles well to evolve a simple application-oriented userspace
> > > DMA interface to a complex guest-driver-oriented passthrough interface.
> > > The complexity of doing so would incur far more kernel-side changes
> > > than the portion of emulation code that you've been concerned about...
> > >  
> > > >
> > > > I don't think VFIO should be the only entry point to
> > > > virtualization. If we say the universe of devices doing user space DMA
> > > > must also implement a VFIO mdev to plug into virtualization then it
> > > > > will be a lot of mdevs.  
> > >
> > > > Certainly VFIO will not be the only entry point, and this has to be a
> > > > case-by-case decision.  If a userspace DMA interface can be easily
> > > adapted to be a passthrough one, it might be the choice. But for idxd,
> > > we see mdev a much better fit here, given the big difference between
> > > what userspace DMA requires and what guest driver requires in this hw.
> > >  
> > > >
> > > > I would prefer to see that the existing userspace interface have the
> > > > extra needed bits for virtualization (eg by having appropriate
> > > > internal kernel APIs to make this easy) and all the emulation to build
> > > > the synthetic PCI device be done in userspace.  
> > >
> > > In the end what decides the direction is the amount of changes that
> > > we have to put in kernel, not whether we call it 'emulation'. For idxd,
> > > adding special passthrough requirements (guest SVA, dirty tracking,
> > > etc.) and raw controllability to the simple userspace DMA interface
> > > is for sure making kernel more complex than reusing the mdev
> > > framework (plus some degree of emulation mockup behind). Not to
> > > mention the merit of uAPI compatibility with mdev...  
> > 
> > I agree with a lot of this argument, exposing a device through a
> > userspace interface versus allowing user access to a device through a
> > userspace interface are different levels of abstraction and control.
> > In an ideal world, perhaps we could compose one from the other, but I
> > don't think the existence of one is proof that the other is redundant.
> > That's not to say that mdev/vfio isn't ripe for abuse in this space,
> > but I'm afraid the test for that abuse is probably much more subtle.
> > 
> > I'll also remind folks that LPC is coming up in just a couple short
> > weeks and this might be something we should discuss (virtually)
> > in-person.  uconf CfPs are currently open. </plug>   Thanks,
> >   
> 
> Yes, LPC is a good place to reach consensus. btw I saw there is 
> already one VFIO topic called "device assignment/sub-assignment".
> Do you think this can be covered under that topic, or does it
> make more sense as a new one?

All the things listed in the CFP are only potential topics to get ideas
flowing; there is currently no proposal to talk about sub-assignment.
I'd suggest submitting separate topics for each and if we run into time
constraints we can ask that they might be combined together.  Thanks,

Alex
