lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <MWHPR11MB1645C60468BC6C6009C3DDE28CBF0@MWHPR11MB1645.namprd11.prod.outlook.com>
Date:   Wed, 13 May 2020 08:30:15 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     Jason Gunthorpe <jgg@...lanox.com>,
        "Raj, Ashok" <ashok.raj@...el.com>
CC:     Alex Williamson <alex.williamson@...hat.com>,
        "Jiang, Dave" <dave.jiang@...el.com>,
        "vkoul@...nel.org" <vkoul@...nel.org>,
        "megha.dey@...ux.intel.com" <megha.dey@...ux.intel.com>,
        "maz@...nel.org" <maz@...nel.org>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "rafael@...nel.org" <rafael@...nel.org>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "hpa@...or.com" <hpa@...or.com>,
        "Pan, Jacob jun" <jacob.jun.pan@...el.com>,
        "Liu, Yi L" <yi.l.liu@...el.com>, "Lu, Baolu" <baolu.lu@...el.com>,
        "Kumar, Sanjay K" <sanjay.k.kumar@...el.com>,
        "Luck, Tony" <tony.luck@...el.com>,
        "Lin, Jing" <jing.lin@...el.com>,
        "Williams, Dan J" <dan.j.williams@...el.com>,
        "kwankhede@...dia.com" <kwankhede@...dia.com>,
        "eric.auger@...hat.com" <eric.auger@...hat.com>,
        "parav@...lanox.com" <parav@...lanox.com>,
        "dmaengine@...r.kernel.org" <dmaengine@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        Paolo Bonzini <pbonzini@...hat.com>
Subject: RE: [PATCH RFC 00/15] Add VFIO mediated device support and IMS
 support for the idxd driver.

> From: Jason Gunthorpe
> Sent: Saturday, May 9, 2020 8:21 PM
> > > putting emulation code back into them, except in a more dangerous
> > > kernel location. This does not seem like a net win to me.
> >
> > Its not a whole lot of emulation right? mdev are soft partitioned. There is
> > just a single PF, but we can create a separate partition for the guest using
> > PASID along with the normal BDF (RID). And exposing a consistent PCI like
> > interface to user space you get everything else for free.
> >
> > Yes, its not SRIOV, but giving that interface to user space via VFIO, we get
> > all of that functionality without having to reinvent a different way to do it.
> >
> > vDPA went the other way, IRC, they went and put a HW implementation of
> what
> > virtio is in hardware. So they sort of fit the model. Here the instance
> > looks and feels like real hardware for the setup and control aspect.
> 
> VDPA and this are very similar, of course it depends on the exact HW
> implementation.
> 

Hi, Jason,

I have more thoughts below. let's see whether making sense to you.

When talking about virtualization, here the target is unmodified guest 
kernel driver which expects seeing the raw controllability of queues 
as defined by device spec. In idxd, such controllability includes enable/
disable SVA, dedicated or shared WQ, size, threshold, privilege, fault 
mode, max batch size, and many other attributes. Different guest OS 
has its own policy of using all or partial available controllability. 

When talking about application, we care about providing an efficient
programming interface to userspace. For example with uacce, we
allow an application to submit vaddr-based workloads to a reserved
WQ with kernel bypassed. But it's not necessary to export the raw
controllability of the reserved WQ to userspace, and we still rely on
kernel driver to configure it including bind_mm. I'm not sure whether 
uacce would like to evolve as a generic queue management system
including non-SVA and all vendor specific raw capabilities as 
expected by all kinds of guest kernel drivers. It sounds like not 
worthwhile at this point, given that we already have an highly efficient 
SVA interface for user applications.

That is why we start with mdev as an evolutionary approach. Mdev is 
introduced to expose raw controllability of a subdevice (WQ or ADI) to 
guest. It build a channel between guest kernel driver and host kernel 
driver and uses device spec as the uAPI by sticking to the mmio interface.
and all virtualization related setups are just consolidated together in vfio. 
the drawback, as you pointed out, is putting some degree of emulation
code in the kernel. But as explained earlier, they are only small portion of
code. Moreover, most registers are emulated as simple memory read/
write, while the remaining logic mostly belongs to raw controllability 
(e.g. cmd register) that host driver grants to the guest thus must 
propagate to the device. For the latter part, I would call it more as 
'mediation' instead of 'emulation', as required in whatever uapi would 
be used.

If in the future, there do have such requirement of delegating raw
WQ controllability to pure userspace applications for DMA engines, 
and there is be a well-defined uAPI to cover a large common set of 
controllability across multiple vendors, we will look at that option for
sure.

>From above p.o.v, I feel vdpa is a different story. virtio/vhost has a 
well established eco-system between guest and host. The user
space VMM already emulates all available controllability as defined 
in virtio spec. Host kernel already supports vhost uAPI for vring
setup, iotlb management, etc. Extending that path for data path
offloading sounds a reasonable choice for vdpa...

Thanks
Kevin

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ