Date:   Fri, 6 Nov 2020 09:48:34 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     Jason Gunthorpe <jgg@...dia.com>
CC:     "Jiang, Dave" <dave.jiang@...el.com>,
        Bjorn Helgaas <helgaas@...nel.org>,
        "vkoul@...nel.org" <vkoul@...nel.org>,
        "Dey, Megha" <megha.dey@...el.com>,
        "maz@...nel.org" <maz@...nel.org>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
        "Pan, Jacob jun" <jacob.jun.pan@...el.com>,
        "Raj, Ashok" <ashok.raj@...el.com>,
        "Liu, Yi L" <yi.l.liu@...el.com>, "Lu, Baolu" <baolu.lu@...el.com>,
        "Kumar, Sanjay K" <sanjay.k.kumar@...el.com>,
        "Luck, Tony" <tony.luck@...el.com>,
        "jing.lin@...el.com" <jing.lin@...el.com>,
        "Williams, Dan J" <dan.j.williams@...el.com>,
        "kwankhede@...dia.com" <kwankhede@...dia.com>,
        "eric.auger@...hat.com" <eric.auger@...hat.com>,
        "parav@...lanox.com" <parav@...lanox.com>,
        "rafael@...nel.org" <rafael@...nel.org>,
        "netanelg@...lanox.com" <netanelg@...lanox.com>,
        "shahafs@...lanox.com" <shahafs@...lanox.com>,
        "yan.y.zhao@...ux.intel.com" <yan.y.zhao@...ux.intel.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "Ortiz, Samuel" <samuel.ortiz@...el.com>,
        "Hossain, Mona" <mona.hossain@...el.com>,
        "dmaengine@...r.kernel.org" <dmaengine@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: RE: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection

> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Wednesday, November 4, 2020 9:54 PM
> 
> On Wed, Nov 04, 2020 at 01:34:08PM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@...dia.com>
> > > Sent: Wednesday, November 4, 2020 8:40 PM
> > >
> > > On Wed, Nov 04, 2020 at 03:41:33AM +0000, Tian, Kevin wrote:
> > > > > From: Jason Gunthorpe <jgg@...dia.com>
> > > > > Sent: Tuesday, November 3, 2020 8:44 PM
> > > > >
> > > > > On Tue, Nov 03, 2020 at 02:49:27AM +0000, Tian, Kevin wrote:
> > > > >
> > > > > > > There is a missing hypercall to allow the guest to do this on its own,
> > > > > > > presumably it will someday be fixed so IMS can work in guests.
> > > > > >
> > > > > > Hypercall is VMM specific, while IMS cap provides a VMM-agnostic
> > > > > > interface so any guest driver (if following the spec) can seamlessly
> > > > > > work on all hypervisors.
> > > > >
> > > > > It is a *VMM* issue, not PCI. Adding a PCI cap to describe a VMM
> > > > > issue is architecturally wrong.
> > > > >
> > > > > IMS *can not work* in any hypervisor without some special
> > > > > hypercall. Just block it in the platform code and forget about the PCI
> > > > > cap.
> > > > >
> > > >
> > > > It's per-device thing instead of platform thing. If the VMM understands
> > > > the IMS format of a specific device and virtualize it to the guest,
> > >
> > > Please no! Adding device specific emulation is just going down deeper
> > > into this bad architecture.
> > >
> > > Interrupts is a platform issue. Using emulation of MSI to dynamically
> >
> > Interrupt controller is a platform issue. Interrupt source is about device.
> 
> The interrupt controller is responsible to create an addr/data pair
> for an interrupt message. It sets the message format and ensures it
> routes to the proper CPU interrupt handler. Everything about the
> addr/data pair is owned by the platform interrupt controller.
> 
> Devices do not create interrupts. They only trigger the addr/data pair
> the platform gives them.

I guess we are just viewing it from different angles. On the x86 platform,
an MSI/IMS-capable device directly composes interrupt messages, with the
addr/data pair filled in by the OS. If no IOMMU interrupt remapping is
enabled in between, the message goes straight to the CPU. Your description
is possibly from the software side, e.g. describing the hierarchical IRQ
domain concept?

> 
> > > insert vectors to a VM was a reasonable, but hacky thing. Now it needs
> > > proper platform support.
> >
> > why is MSI emulation a hacky thing? isn't it defined by PCISIG? I guess
> > that I must misunderstand your real point here...
> 
> It means the interrupt controller in the VM's platform is a fiction,
> the addr/data pairs it creates are not real.
> 
> A PCI device assigned to a VM is supposed to be fully contained by the
> IOMMU, interrupts included, so there is no reason to do MSI emulation
> if the VM's interrupt controller is aware of what addr/data pairs it
> can use with the device - eg by getting them through a hypercall. This
> is much cleaner and supports things like IMS

I agree with this point; this is just how pci-hyperv.c works. In concept,
a Linux guest driver should be able to use IMS when running on Hyper-V.
There is no such mechanism for KVM yet, but we may need something similar
one day. Until that happens, the guest could simply disallow devmsi by
default in the platform code (inventing a hypercall just for 'disable'
doesn't make sense) and ignore the IMS cap. One small open question is
whether this can be done in one central place: detection of running as a
guest is done in arch-specific code, so do we need to disable devmsi for
every arch?
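The guest-side default proposed above reduces to a simple policy check. This is a minimal sketch that assumes a hypothetical `struct platform_facts` and `devmsi_allowed()`. In Linux, the "running as guest" fact comes from arch-specific code, which is the open question raised above.

```c
#include <stdbool.h>

/* Hypothetical platform facts gathered at boot. */
struct platform_facts {
	bool running_as_guest;   /* e.g. the CPUID "hypervisor present" bit on x86 */
	bool has_irq_hypercall;  /* the VMM can hand out real addr/data pairs */
};

/* Sketch of the policy: bare metal may use device-specific MSI (IMS); a
 * guest may use it only when the VMM offers a hypercall, pci-hyperv style,
 * to obtain the addr/data pairs. Otherwise devmsi stays disabled. */
static bool devmsi_allowed(const struct platform_facts *p)
{
	if (!p->running_as_guest)
		return true;
	return p->has_irq_hypercall;
}
```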

But when talking about virtualization, it's not good to make assumptions
about guest behavior. It's perfectly sane to run a guest OS that doesn't
implement any PV support (and thus doesn't know it is running in a VM) but
does support IMS. In such a scenario, the IMS cap allows the hypervisor to
tell the guest driver to use MSI instead of IMS, as long as the driver
follows the device spec. In this regard I don't think the IMS cap will be
a short-term thing, although Linux may choose not to use it.
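The case for keeping the IMS cap can be illustrated with a spec-following driver in a non-PV guest that decides between IMS and MSI. This is only a sketch. The `IMS_CAP_SUPPORTED` bit and `pick_irq_mode()` are hypothetical, and the real capability layout comes from the device/SIOV spec and the patch under review, not from this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "IMS supported" bit in the device's IMS capability register. */
#define IMS_CAP_SUPPORTED 0x1u

enum irq_mode { IRQ_MODE_MSI, IRQ_MODE_IMS };

/* The driver reads the IMS capability as the device spec requires. A
 * hypervisor that cannot virtualize IMS clears this bit in the virtual
 * config space. An unmodified guest driver then falls back to MSI without
 * knowing it runs in a VM. */
static enum irq_mode pick_irq_mode(uint32_t ims_cap_reg, bool platform_allows_devmsi)
{
	if ((ims_cap_reg & IMS_CAP_SUPPORTED) && platform_allows_devmsi)
		return IRQ_MODE_IMS;
	return IRQ_MODE_MSI;
}
```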

> 
> Trying to do IMS emulation is nutz, the entire point of IMS is the
> device can do what it likes, and emulating that is not going to be
> feasible. For instance go read the discussion I had with Thomas how an
> object-centric device would manage interrupts.
> 

Do you mind providing the link? There were lots of discussions between
you and Thomas, and I failed to locate the exact mail when searching for
the above keywords.

Thanks
Kevin
