Date:   Wed, 12 Aug 2020 11:28:12 +0800
From:   Jason Wang <>
To:     "Tian, Kevin" <>,
        Jason Gunthorpe <>,
        Alex Williamson <>
Cc:     "Jiang, Dave" <>,
        "Dey, Megha" <>,
        "Pan, Jacob jun" <>,
        "Raj, Ashok" <>,
        "Liu, Yi L" <>, "Lu, Baolu" <>,
        "Kumar, Sanjay K" <>,
        "Luck, Tony" <>,
        "Lin, Jing" <>,
        "Williams, Dan J" <>,
        "Hansen, Dave" <>,
        "Ortiz, Samuel" <>,
        "Hossain, Mona" <>
Subject: Re: [PATCH RFC v2 00/18] Add VFIO mediated device support and DEV-MSI
 support for the idxd driver

On 2020/8/10 3:32 PM, Tian, Kevin wrote:
>> From: Jason Gunthorpe <>
>> Sent: Friday, August 7, 2020 8:20 PM
>> On Wed, Aug 05, 2020 at 07:22:58PM -0600, Alex Williamson wrote:
>>> If you see this as an abuse of the framework, then let's identify those
>>> specific issues and come up with a better approach.  As we've discussed
>>> before, things like basic PCI config space emulation are acceptable
>>> overhead and low risk (imo) and some degree of register emulation is
>>> well within the territory of an mdev driver.
>> What troubles me is that idxd already has a direct userspace interface
>> to its HW, and does userspace DMA. The purpose of this mdev is to
>> provide a second direct userspace interface that is a little different
>> and trivially plugs into the virtualization stack.
> No. Userspace DMA and subdevice passthrough (what mdev provides)
> are two distinct usages IMO (at least in the idxd context). This might
> be the main divergence between us, so let me elaborate a bit here.
> If we can reach consensus on this matter, the direction to take
> will be clearer.
> First, a passthrough interface has some unique requirements
> which are not commonly seen in a userspace DMA interface, e.g.:
> - Tracking DMA dirty pages for live migration;
> - A set of interfaces for using SVA inside guest;
> 	* PASID allocation/free (on some platforms);
> 	* bind/unbind guest mm/page table (nested translation);
> 	* invalidate IOMMU cache/iotlb for guest page table changes;
> 	* report page request from device to guest;
> 	* forward page response from guest to device;
> - Configuring irqbypass for posted interrupt;
> - ...
> Second, a passthrough interface requires delegating raw control
> of the subdevice to the guest driver, while the same delegation might not
> be required for implementing a userspace DMA interface (especially for
> modern devices which support SVA). For example, idxd allows the following
> settings per wq (the guest driver may configure them in any combination):
> 	- put in dedicated or shared mode;
> 	- enable/disable SVA;
> 	- associate a guest-provided PASID with an MSI/IMS entry;
> 	- set threshold;
> 	- allow/deny privileged access;
> 	- allocate/free interrupt handle (enlightened for guest);
> 	- collect error status;
> 	- ...
> We plan to support idxd userspace DMA with SVA. The driver just needs
> to prepare a wq with a predefined configuration (e.g. shared, SVA,
> etc.), bind the process mm to the IOMMU (non-nested) and then map
> the portal to userspace. Allowing userspace to do DMA to the associated
> wq doesn't change the fact that the wq is still *owned* and *controlled*
> by the kernel driver. However, as far as passthrough is concerned,
> the wq is considered 'owned' by the guest driver, so we need an
> interface which can support low-level *controllability* from the
> guest driver. Mixing the two together makes a mess of the uAPI.
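
The contrast drawn above, between one predefined wq configuration owned by the kernel and arbitrary guest-chosen combinations, can be sketched as a small model. This is a hypothetical illustration only: `WqMode`, `WqConfig`, `USERSPACE_DMA_CFG` and `fits_userspace_dma` are invented names and do not correspond to the idxd driver's data structures or to any kernel API; the threshold value is an arbitrary placeholder.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


# Hypothetical model of a few per-wq settings listed above; illustrative
# names only, not the idxd driver's real structures.
class WqMode(Enum):
    DEDICATED = "dedicated"
    SHARED = "shared"


@dataclass(frozen=True)
class WqConfig:
    mode: WqMode
    sva_enabled: bool
    threshold: int
    priv_allowed: bool
    pasid: Optional[int] = None  # guest-provided PASID tied to an MSI/IMS entry


# Userspace DMA path: the kernel driver prepares one predefined configuration
# (shared mode with SVA, as in the example above) and keeps ownership of the wq.
# threshold=0 is an arbitrary placeholder value.
USERSPACE_DMA_CFG = WqConfig(mode=WqMode.SHARED, sva_enabled=True,
                             threshold=0, priv_allowed=False)


def fits_userspace_dma(req: WqConfig) -> bool:
    """Return True if `req` matches the predefined userspace DMA configuration.

    The PASID is ignored because on this path the kernel assigns it when it
    binds the process mm. Any other combination means the requester wants raw
    control of the wq, i.e. the passthrough case.
    """
    return (req.mode, req.sva_enabled, req.threshold, req.priv_allowed) == (
        USERSPACE_DMA_CFG.mode, USERSPACE_DMA_CFG.sva_enabled,
        USERSPACE_DMA_CFG.threshold, USERSPACE_DMA_CFG.priv_allowed)


# A guest driver may pick any combination, e.g. dedicated mode, SVA off,
# its own PASID and privileged access allowed.
guest_cfg = WqConfig(mode=WqMode.DEDICATED, sva_enabled=False,
                     threshold=8, priv_allowed=True, pasid=5)
print(fits_userspace_dma(USERSPACE_DMA_CFG))  # True
print(fits_userspace_dma(guest_cfg))          # False
```

The point the sketch mirrors is that the userspace DMA path only ever exposes one kernel-chosen configuration, so any guest-requested combination outside it needs an interface that delegates raw control of the wq.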

So a userspace driver like DPDK could use both uAPIs?

> Based on the above two reasons, we see distinct requirements between
> userspace DMA and passthrough interfaces, at least in the idxd context
> (though other devices may have less distinction in between). Therefore,
> we don't see the value or necessity of reinventing the wheel that mdev
> already handles well, i.e. evolving a simple application-oriented userspace
> DMA interface into a complex guest-driver-oriented passthrough interface.
> Doing so would incur far more kernel-side changes than the portion
> of emulation code that you've been concerned about...
>> I don't think VFIO should be the only entry point to
>> virtualization. If we say the universe of devices doing user space DMA
>> must also implement a VFIO mdev to plug into virtualization then it
>> will be a lot of mdevs.
> Certainly VFIO will not be the only entry point, and this has to be a
> case-by-case decision.

The problem is that if we tie all controls to the VFIO uAPI, other
subsystems like vDPA are likely to duplicate them. I wonder if there is a
way to decouple vSVA from the VFIO uAPI?

>   If a userspace DMA interface can be easily
> adapted to be a passthrough one, it might be the choice.

It's not that easy even for VFIO, which required a lot of new uAPIs and
infrastructure (e.g. mdev) to be invented.

> But for idxd,
> we see mdev as a much better fit here, given the big difference between
> what userspace DMA requires and what the guest driver requires in this hw.

A weak point of mdev is that it can't serve kernel subsystems other than
VFIO. In this case, you need some other infrastructure (like [1]) to do this.

(For idxd, you probably don't need this, but it's pretty common in the
case of networking or storage devices.)

>> I would prefer to see that the existing userspace interface have the
>> extra needed bits for virtualization (e.g. by having appropriate
>> internal kernel APIs to make this easy) and all the emulation to build
>> the synthetic PCI device be done in userspace.
> In the end, what decides the direction is the amount of changes that
> we have to put in the kernel, not whether we call it 'emulation'. For idxd,
> adding special passthrough requirements (guest SVA, dirty tracking,
> etc.) and raw controllability to the simple userspace DMA interface
> would certainly make the kernel more complex than reusing the mdev
> framework (plus some degree of emulation mockup behind it). Not to
> mention the merit of uAPI compatibility with mdev...
> Thanks
> Kevin
