Message-ID: <MWHPR11MB1645C60468BC6C6009C3DDE28CBF0@MWHPR11MB1645.namprd11.prod.outlook.com>
Date: Wed, 13 May 2020 08:30:15 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Jason Gunthorpe <jgg@...lanox.com>,
"Raj, Ashok" <ashok.raj@...el.com>
CC: Alex Williamson <alex.williamson@...hat.com>,
"Jiang, Dave" <dave.jiang@...el.com>,
"vkoul@...nel.org" <vkoul@...nel.org>,
"megha.dey@...ux.intel.com" <megha.dey@...ux.intel.com>,
"maz@...nel.org" <maz@...nel.org>,
"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"rafael@...nel.org" <rafael@...nel.org>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"hpa@...or.com" <hpa@...or.com>,
"Pan, Jacob jun" <jacob.jun.pan@...el.com>,
"Liu, Yi L" <yi.l.liu@...el.com>, "Lu, Baolu" <baolu.lu@...el.com>,
"Kumar, Sanjay K" <sanjay.k.kumar@...el.com>,
"Luck, Tony" <tony.luck@...el.com>,
"Lin, Jing" <jing.lin@...el.com>,
"Williams, Dan J" <dan.j.williams@...el.com>,
"kwankhede@...dia.com" <kwankhede@...dia.com>,
"eric.auger@...hat.com" <eric.auger@...hat.com>,
"parav@...lanox.com" <parav@...lanox.com>,
"dmaengine@...r.kernel.org" <dmaengine@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
Paolo Bonzini <pbonzini@...hat.com>
Subject: RE: [PATCH RFC 00/15] Add VFIO mediated device support and IMS
support for the idxd driver.
> From: Jason Gunthorpe
> Sent: Saturday, May 9, 2020 8:21 PM
> > > putting emulation code back into them, except in a more dangerous
> > > kernel location. This does not seem like a net win to me.
> >
> > Its not a whole lot of emulation right? mdev are soft partitioned. There is
> > just a single PF, but we can create a separate partition for the guest using
> > PASID along with the normal BDF (RID). And exposing a consistent PCI like
> > interface to user space you get everything else for free.
> >
> > Yes, its not SRIOV, but giving that interface to user space via VFIO, we get
> > all of that functionality without having to reinvent a different way to do it.
> >
> > vDPA went the other way, IIRC, they went and put a HW implementation of
> what
> > virtio is in hardware. So they sort of fit the model. Here the instance
> > looks and feels like real hardware for the setup and control aspect.
>
> VDPA and this are very similar, of course it depends on the exact HW
> implementation.
>
Hi, Jason,

I have a few more thoughts below; let's see whether they make sense to you.

When talking about virtualization, the target here is an unmodified guest
kernel driver, which expects to see the raw controllability of queues as
defined by the device spec. In idxd, such controllability includes
enabling/disabling SVA, dedicated vs. shared WQ, size, threshold,
privilege, fault mode, max batch size, and many other attributes. Different
guest OSes have their own policies for using all or part of the available
controllability.
When talking about applications, we care about providing an efficient
programming interface to userspace. For example, with uacce we allow an
application to submit vaddr-based workloads to a reserved WQ, bypassing
the kernel. But it's not necessary to export the raw controllability of
the reserved WQ to userspace; we still rely on the kernel driver to
configure it, including bind_mm. I'm not sure whether uacce would want to
evolve into a generic queue management system covering non-SVA and all
vendor-specific raw capabilities as expected by all kinds of guest kernel
drivers. That sounds not worthwhile at this point, given that we already
have a highly efficient SVA interface for user applications.
That is why we started with mdev as an evolutionary approach. Mdev is
introduced to expose the raw controllability of a subdevice (WQ or ADI)
to the guest. It builds a channel between the guest kernel driver and the
host kernel driver, using the device spec as the uAPI by sticking to the
MMIO interface, and all virtualization-related setup is consolidated in
VFIO. The drawback, as you pointed out, is putting some degree of
emulation code in the kernel. But as explained earlier, it is only a
small portion of code. Moreover, most registers are emulated as simple
memory reads/writes, while the remaining logic mostly belongs to raw
controllability (e.g. the cmd register) that the host driver grants to
the guest and thus must propagate to the device. For the latter part, I
would call it 'mediation' rather than 'emulation', and it is required
with whatever uAPI is used.
If in the future there is a requirement to delegate raw WQ controllability
to pure userspace applications for DMA engines, and there is a
well-defined uAPI covering a large common set of controllability across
multiple vendors, we will look at that option for sure.
From the above p.o.v., I feel vdpa is a different story. virtio/vhost has
a well-established ecosystem between guest and host. The userspace VMM
already emulates all the controllability defined in the virtio spec, and
the host kernel already supports the vhost uAPI for vring setup, iotlb
management, etc. Extending that path for data path offloading sounds like
a reasonable choice for vdpa...
Thanks
Kevin