Message-ID: <20201108234142.GD2620339@nvidia.com>
Date:   Sun, 8 Nov 2020 19:41:42 -0400
From:   Jason Gunthorpe <jgg@...dia.com>
To:     "Raj, Ashok" <ashok.raj@...el.com>
CC:     Dan Williams <dan.j.williams@...el.com>,
        "Tian, Kevin" <kevin.tian@...el.com>,
        "Jiang, Dave" <dave.jiang@...el.com>,
        Bjorn Helgaas <helgaas@...nel.org>,
        "vkoul@...nel.org" <vkoul@...nel.org>,
        "Dey, Megha" <megha.dey@...el.com>,
        "maz@...nel.org" <maz@...nel.org>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
        "Pan, Jacob jun" <jacob.jun.pan@...el.com>,
        "Liu, Yi L" <yi.l.liu@...el.com>, "Lu, Baolu" <baolu.lu@...el.com>,
        "Kumar, Sanjay K" <sanjay.k.kumar@...el.com>,
        "Luck, Tony" <tony.luck@...el.com>,
        "jing.lin@...el.com" <jing.lin@...el.com>,
        "kwankhede@...dia.com" <kwankhede@...dia.com>,
        "eric.auger@...hat.com" <eric.auger@...hat.com>,
        "parav@...lanox.com" <parav@...lanox.com>,
        "rafael@...nel.org" <rafael@...nel.org>,
        "netanelg@...lanox.com" <netanelg@...lanox.com>,
        "shahafs@...lanox.com" <shahafs@...lanox.com>,
        "yan.y.zhao@...ux.intel.com" <yan.y.zhao@...ux.intel.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "Ortiz, Samuel" <samuel.ortiz@...el.com>,
        "Hossain, Mona" <mona.hossain@...el.com>,
        "dmaengine@...r.kernel.org" <dmaengine@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: Re: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection

On Sun, Nov 08, 2020 at 10:11:24AM -0800, Raj, Ashok wrote:

> > On (kvm) virtualization the addr/data pair the IRQ domain hands out
> > doesn't work. It is some fake thing.
> 
> Is it really some fake thing? I thought the vCPU and vector are real
> for a guest, and the VMM ensures that when interrupts are delivered they go either way.

It is fake in the sense it is programmed into no hardware.
 
It is real in the sense it is an ABI contract with the VMM.
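
As a point of reference, this is roughly what that addr/data pair looks
like on x86: the architected MSI encoding for fixed delivery / edge
trigger / physical destination. A minimal sketch only; the helper names
are mine, not kernel code.

#include <stdint.h>

#define MSI_ADDR_BASE	0xfee00000u	/* fixed x86 MSI address window */

static inline uint32_t msi_addr(uint8_t dest_apic_id)
{
	return MSI_ADDR_BASE | ((uint32_t)dest_apic_id << 12);
}

static inline uint32_t msi_data(uint8_t vector)
{
	return vector;		/* fixed delivery mode, edge triggered */
}

The point is that the guest can be handed such a pair and the contract
holds even though no physical MSI entry ever contains those values.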

> > On something like IDXD this emulation is not so hard, on something
> > like mlx5 this is completely unworkable. Further we never do
> > emulation on our devices, they always pass native hardware through,
> > even for SIOV-like cases.
> 
> So is that true for interrupts too? 

There is no *mlx5* emulation. We ride on the generic MSI emulation KVM
is doing.

> Possibly you have the interrupt entries sitting in memory resident
> on the device?

For SRIOV, yes. The appeal of IMS is to move away from that.

> Don't we need the VMM to ensure they are brokered by VMM in either
> one of the two ways above?

Yes, no matter what, the VMM has to know the guest wants an interrupt
routed in and set up the VMM part of the equation. With SRIOV this is
all done with the MSI trapping.
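
In rough C, the shape of that trap path, purely as a conceptual sketch
(every name below is a hypothetical stand-in, not any real VMM's API):

#include <stdint.h>

struct msix_shadow {
	uint64_t guest_addr;	/* what the guest wrote (0xfee... range) */
	uint32_t guest_data;	/* guest's chosen vector                 */
};

/* Stand-ins for the VMM's real routing/programming plumbing. */
static void route_host_irq_to_guest(uint64_t gaddr, uint32_t gdata)
{
	(void)gaddr; (void)gdata;
}

static void program_physical_msix(const struct msix_shadow *e)
{
	(void)e;
}

/*
 * The VMM intercepts the guest's MMIO write to the VF's MSI-X table,
 * records the guest's view, and programs the real entry itself with
 * host-chosen values.
 */
static void vmm_msix_write_trap(struct msix_shadow *e, uint64_t addr,
				uint32_t data)
{
	e->guest_addr = addr;
	e->guest_data = data;
	route_host_irq_to_guest(addr, data);
	program_physical_msix(e);
}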

> What if the guest creates some addr in the 0xfee... range how do we
> take care of interrupt remapping and such without any VMM assist?

Not sure I understand this?

> That's true. Probably this can work the same even for MSIx types too then?

Yes, once you have the ability to hypercall to create the addr/data
pair, then it can work with MSI too and the VMM can stop emulating. It
would be a nice bit of uniformity to close this, but switching the VMM
from legacy to new mode is going to be tricky, I fear.
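
Purely to sketch the shape such an interface could take (nothing below
exists today; the hypercall number, struct and names are invented for
illustration only):

#include <stdint.h>

#define HC_ALLOC_MSI_ROUTE	0x42	/* made-up hypercall number */

struct hc_msi_route {
	uint32_t vcpu;		/* guest CPU the interrupt should target */
	uint32_t vector;	/* guest vector                          */
	uint64_t addr;		/* OUT: addr the guest writes into IMS   */
	uint32_t data;		/* OUT: data the guest writes into IMS   */
};

The guest driver would ask the VMM for the pair and write it straight
into its IMS slot, with no trap-and-emulate in the path.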

> I agree with the overall idea and we should certainly take that into
> consideration when we need IMS in guest support and in context of
> interrupt remapping.

The issue with things, as they sit now, is SRIOV.

If any driver starts using pci_subdevice_msi_create_irq_domain() then
it fails if the VF is assigned to a guest with SRIOV. This is a real
and important use case for many devices today!

The "solution" can't be to go back and retroactively change every
shipping device to add PCI capability blocks, and ensure that every
existing VMM strips them out before assigning the device (including
Hyper-V!!)  :(

Jason
