lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d69953378bd1fdcdda54a2fbe285f6c0b1484e8a.camel@infradead.org>
Date:   Sun, 08 Nov 2020 19:36:34 +0000
From:   David Woodhouse <dwmw2@...radead.org>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Jason Gunthorpe <jgg@...dia.com>,
        Dan Williams <dan.j.williams@...el.com>
Cc:     "Raj, Ashok" <ashok.raj@...el.com>,
        "Tian, Kevin" <kevin.tian@...el.com>,
        "Jiang, Dave" <dave.jiang@...el.com>,
        Bjorn Helgaas <helgaas@...nel.org>,
        "vkoul@...nel.org" <vkoul@...nel.org>,
        "Dey, Megha" <megha.dey@...el.com>,
        "maz@...nel.org" <maz@...nel.org>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
        "Pan, Jacob jun" <jacob.jun.pan@...el.com>,
        "Liu, Yi L" <yi.l.liu@...el.com>, "Lu, Baolu" <baolu.lu@...el.com>,
        "Kumar, Sanjay K" <sanjay.k.kumar@...el.com>,
        "Luck, Tony" <tony.luck@...el.com>,
        "jing.lin@...el.com" <jing.lin@...el.com>,
        "kwankhede@...dia.com" <kwankhede@...dia.com>,
        "eric.auger@...hat.com" <eric.auger@...hat.com>,
        "parav@...lanox.com" <parav@...lanox.com>,
        "rafael@...nel.org" <rafael@...nel.org>,
        "netanelg@...lanox.com" <netanelg@...lanox.com>,
        "shahafs@...lanox.com" <shahafs@...lanox.com>,
        "yan.y.zhao@...ux.intel.com" <yan.y.zhao@...ux.intel.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "Ortiz, Samuel" <samuel.ortiz@...el.com>,
        "Hossain, Mona" <mona.hossain@...el.com>,
        "dmaengine@...r.kernel.org" <dmaengine@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: Re: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection

On Sun, 2020-11-08 at 19:47 +0100, Thomas Gleixner wrote:
> This only works when the guest OS actually knows that it runs in a
> VM. If the guest can't figure that out, i.e. via CPUID, this cannot be
> solved because from the guest OS view that's the same as running on bare
> metal. Obviously on bare metal the Vector domain can and must handle
> this.
> 
> So this needs some thought.

The problem here is that Intel implemented interrupt remapping in a way
which is anathema to structured, ordered IRQ domains.

When a guest writes an MSI message (addr/data) to the MSI table of a
PCI device which has been assigned to that guest, it *doesn't* properly
inherit the MSI composition from a parent irqdomain which knows about
the (host-side) IOMMU.

What actually happens is the hypervisor *traps* the writes to the
device's MSI table, and translates them *then*. In *precisely* the
fashion which we're trying to avoid for IMS.

Now, you can imagine a world where it wasn't like this, where
Remappable Format MSI messages don't exist, and where we let guests
write native MSI message to the device without trapping — and where the
IOMMU then sees the incoming interrupt and has to map the APIC ID to a
*virtual* CPU for that guest, based on the PCI source-id of the device.

In that world, IMS would work naturally. But that isn't how Intel
designed interrupt remapping. They *designed* to have to trap and
translate as the message is written to the device.

So it does look like we're going to need a hypercall interface to
compose an MSI message on behalf of the guest, for IMS to use. In fact
PCI devices assigned to a guest could use that too, and then we'd only
need to trap-and-remap any attempt to write a Compatibility Format MSI
to the device's MSI table, while letting Remappable Format messages get
written directly.

We'd also need a way for an OS running on bare metal to *know* that
it's on bare metal and can just compose MSI messages for itself. Since
we do expect bare metal to have an IOMMU, perhaps that is just a
feature flag on the IOMMU?

That or Intel needs to fix the IOMMU to do proper virtualisation and
actually translate "Compatibility Format" MSIs for a guest too.

Download attachment "smime.p7s" of type "application/x-pkcs7-signature" (5174 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ