Message-ID: <20201107001207.GA2620339@nvidia.com>
Date:   Fri, 6 Nov 2020 20:12:07 -0400
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Dan Williams <dan.j.williams@...el.com>
CC:     "Raj, Ashok" <ashok.raj@...el.com>,
        "Tian, Kevin" <kevin.tian@...el.com>,
        "Jiang, Dave" <dave.jiang@...el.com>,
        Bjorn Helgaas <helgaas@...nel.org>,
        "vkoul@...nel.org" <vkoul@...nel.org>,
        "Dey, Megha" <megha.dey@...el.com>,
        "maz@...nel.org" <maz@...nel.org>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
        "Pan, Jacob jun" <jacob.jun.pan@...el.com>,
        "Liu, Yi L" <yi.l.liu@...el.com>, "Lu, Baolu" <baolu.lu@...el.com>,
        "Kumar, Sanjay K" <sanjay.k.kumar@...el.com>,
        "Luck, Tony" <tony.luck@...el.com>,
        "jing.lin@...el.com" <jing.lin@...el.com>,
        "kwankhede@...dia.com" <kwankhede@...dia.com>,
        "eric.auger@...hat.com" <eric.auger@...hat.com>,
        "parav@...lanox.com" <parav@...lanox.com>,
        "rafael@...nel.org" <rafael@...nel.org>,
        "netanelg@...lanox.com" <netanelg@...lanox.com>,
        "shahafs@...lanox.com" <shahafs@...lanox.com>,
        "yan.y.zhao@...ux.intel.com" <yan.y.zhao@...ux.intel.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "Ortiz, Samuel" <samuel.ortiz@...el.com>,
        "Hossain, Mona" <mona.hossain@...el.com>,
        "dmaengine@...r.kernel.org" <dmaengine@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: Re: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection

On Fri, Nov 06, 2020 at 03:47:00PM -0800, Dan Williams wrote:

> Also feel free to straighten me out (Jason or Ashok) if I've botched
> the understanding of this.

It is pretty simple when you get down to it.

We have a new kernel API that Thomas added:

  pci_subdevice_msi_create_irq_domain()

This creates an IRQ domain that hands out addr/data pairs that
trigger interrupts.

On bare metal the addr/data pairs from the IRQ domain are programmed
into the HW in some HW specific way by the device driver that calls
the above function.

Under (KVM) virtualization the addr/data pair the IRQ domain hands out
doesn't work. It is some fake thing.

To make this work on normal MSI/MSI-X the VMM implements emulation of
the standard MSI/MSI-X programming and swaps the fake addr/data pair
for a real one obtained from the hypervisor IRQ domain.
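
The swap described above can be modelled in a few lines of plain C.
This is only a userspace sketch of the idea, not KVM or QEMU code:
struct msix_pair, struct vmm_msix_model, model_host_pair() and
vmm_trap_msix_write() are all hypothetical names, and the addr/data
values are arbitrary placeholders.

```c
#include <stdint.h>

#define MODEL_NVEC 4

/* One addr/data pair, as stored in an MSI-X table entry. */
struct msix_pair {
	uint64_t addr;
	uint32_t data;
};

/* Hypothetical VMM state: what the guest wrote vs. what the HW gets. */
struct vmm_msix_model {
	struct msix_pair guest_view[MODEL_NVEC];
	struct msix_pair hw_table[MODEL_NVEC];
};

/* Stand-in for the hypervisor IRQ domain; the values are made up. */
struct msix_pair model_host_pair(unsigned int vec)
{
	struct msix_pair p;

	p.addr = 0xfee00000ull + ((uint64_t)vec << 12);
	p.data = 0x4000u + vec;
	return p;
}

/*
 * Trap handler for a guest write to MSI-X table entry 'vec'.
 * The fake guest pair is remembered so reads return what was written,
 * but the hardware is programmed with a real pair from the host.
 */
int vmm_trap_msix_write(struct vmm_msix_model *v, unsigned int vec,
			struct msix_pair guest)
{
	if (vec >= MODEL_NVEC)
		return -1;

	v->guest_view[vec] = guest;
	v->hw_table[vec] = model_host_pair(vec);
	return 0;
}
```

This only works because the MSI-X table layout is standard, so the VMM
knows exactly which writes to trap.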

To "deal" with this issue the SIOV spec suggests adding a per-device
PCI Capability that says "IMS works", which means either:
 - This is bare metal, so of course it works
 - The VMM is trapping and emulating whatever the device specific IMS
   programming is.

The idea is that a VMM can never advertise the IMS cap flag to the
guest unless the VMM provides a device-specific driver that does
device-specific emulation to capture the addr/data pair. Remember, IMS
doesn't say how to program the addr/data pair! Every device is unique!

On something like IDXD this emulation is not so hard; on something
like mlx5 it is completely unworkable. Further, we never do
emulation on our devices: they always pass native hardware through,
even for SIOV-like cases.

In the end pci_subdevice_msi_create_irq_domain() is a platform
function. Either it should work completely on every device with no
device-specific emulation required in the VMM, or it should not work
at all and return -EOPNOTSUPP.
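
As a rough illustration of that contract, here is a userspace model.
It is not the kernel function: struct platform_model, struct
ims_domain_model and model_create_ims_domain() are hypothetical
names, and "has_msg_hypercall" stands for a VMM interface that, as
far as this thread goes, is only a proposal.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of the platform, not a kernel structure. */
struct platform_model {
	bool is_guest;          /* running under a VMM */
	bool has_msg_hypercall; /* VMM can hand out working addr/data pairs */
};

struct ims_domain_model {
	bool usable;
};

/*
 * All-or-nothing: either the platform can hand out pairs that work
 * for every device, or domain creation fails. There is no per-device
 * capability check anywhere in this path.
 */
int model_create_ims_domain(const struct platform_model *p,
			    struct ims_domain_model *d)
{
	d->usable = false;

	/* A guest without a way to get real pairs cannot support IMS. */
	if (p->is_guest && !p->has_msg_hypercall)
		return -EOPNOTSUPP;

	d->usable = true;
	return 0;
}
```

The decision depends only on the platform, never on which device is
asking.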

The only sane way to implement this generically is for the VMM to
provide a hypercall to obtain a real *working* addr/data pair(s) and
then have the platform hand those out from
pci_subdevice_msi_create_irq_domain(). 

All IMS device drivers will work correctly. No VMM device emulation is
ever needed to translate addr/data pairs.
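
A sketch of why that holds, again as a plain C model rather than real
kernel or VMM code. model_hcall_get_msg() stands in for the proposed
hypercall, and the two "devices" with their storage layouts are
invented purely to show that the drivers can differ while the pair
they program is already real.

```c
#include <stdint.h>

struct msg_model {
	uint64_t addr;
	uint32_t data;
};

/* Stand-in for the hypercall; the returned values are arbitrary. */
int model_hcall_get_msg(unsigned int irq, struct msg_model *msg)
{
	msg->addr = 0xfee00000ull + ((uint64_t)irq << 12);
	msg->data = 0x4000u + irq;
	return 0;
}

/* Hypothetical device A: keeps addr and data in separate fields. */
struct dev_a_model {
	uint64_t addr;
	uint32_t data;
};

/* Hypothetical device B: packs data, addr low, addr high into words. */
struct dev_b_model {
	uint32_t words[3];
};

/* Each driver programs its hardware in its own device-specific way. */
void dev_a_program(struct dev_a_model *d, const struct msg_model *m)
{
	d->addr = m->addr;
	d->data = m->data;
}

void dev_b_program(struct dev_b_model *d, const struct msg_model *m)
{
	d->words[0] = m->data;
	d->words[1] = (uint32_t)(m->addr & 0xffffffffu);
	d->words[2] = (uint32_t)(m->addr >> 32);
}
```

Nothing in the model needs to understand either device's layout except
that device's own driver.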

Earlier in this thread Kevin said Hyper-V already works this way,
even for MSI/MSI-X. To me this says it is fundamentally a KVM platform
problem and it should not be solved by PCI capability flags.

Jason
