Message-ID: <MWHPR11MB1886811339F7873A8E34549A8C089@MWHPR11MB1886.namprd11.prod.outlook.com>
Date: Wed, 23 Jun 2021 06:12:05 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Thomas Gleixner <tglx@...utronix.de>,
Alex Williamson <alex.williamson@...hat.com>
CC: Jason Gunthorpe <jgg@...dia.com>,
"Dey, Megha" <megha.dey@...el.com>,
"Raj, Ashok" <ashok.raj@...el.com>,
"Pan, Jacob jun" <jacob.jun.pan@...el.com>,
"Jiang, Dave" <dave.jiang@...el.com>,
"Liu, Yi L" <yi.l.liu@...el.com>, "Lu, Baolu" <baolu.lu@...el.com>,
"Williams, Dan J" <dan.j.williams@...el.com>,
"Luck, Tony" <tony.luck@...el.com>,
"Kumar, Sanjay K" <sanjay.k.kumar@...el.com>,
LKML <linux-kernel@...r.kernel.org>, KVM <kvm@...r.kernel.org>,
Kirti Wankhede <kwankhede@...dia.com>,
"Peter Zijlstra" <peterz@...radead.org>,
Marc Zyngier <maz@...nel.org>,
"Bjorn Helgaas" <helgaas@...nel.org>
Subject: RE: Virtualizing MSI-X on IMS via VFIO
> From: Thomas Gleixner <tglx@...utronix.de>
> Sent: Wednesday, June 23, 2021 7:59 AM
>
[...]
> I only can assume that back then the main problem was vector exhaustion
> on the host and to avoid allocating memory for interrupt descriptors
> etc, right?
>
> The host vector exhaustion problem was that each MSIX vector consumed a
> real CPU vector which is a limited resource on X86. This is no longer
> the case today:
>
> 1) pci_msix_enable[range]() consumes exactly zero CPU vectors from
> the allocatable range independent of the number of MSIX vectors
> it allocates, unless it is in multi-queue managed mode where it
> will actually reserve a vector (maybe two) per CPU.
>
> But for devices which are not using that mode, they just
> opportunistically "reserve" vectors.
>
> All entries are initialized with a special system vector which
> when raised will emit a nastigram in dmesg.
>
> 2) request_irq() actually allocates a CPU vector from the
> allocatable vector space which can obviously still fail, which is
> perfectly fine.
>
> So the only downside today of allocating more MSI-X vectors than
> necessary is memory consumption for the irq descriptors.
Curious about the IRTE entry when IRQ remapping is enabled. Is it also
allocated at request_irq() time?
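
To make sure I read the lazy model correctly, from a plain driver's
perspective it looks roughly like below (a minimal sketch; the foo_*
names are made up and error unwinding is omitted):

#include <linux/pci.h>
#include <linux/interrupt.h>

static irqreturn_t foo_handler(int irq, void *data)
{
        return IRQ_HANDLED;
}

static int foo_setup_irqs(struct pci_dev *pdev, unsigned int nvec)
{
        int ret, i;

        /*
         * Phase 1: set up all MSI-X entries upfront. This consumes no
         * CPU vectors from the allocatable range.
         */
        ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_MSIX);
        if (ret < 0)
                return ret;

        /*
         * Phase 2: each request_irq() claims a real CPU vector and is
         * the only step that can fail due to vector exhaustion.
         */
        for (i = 0; i < nvec; i++) {
                ret = request_irq(pci_irq_vector(pdev, i), foo_handler,
                                  0, "foo", pdev);
                if (ret)
                        return ret;
        }
        return 0;
}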
>
> Though for virtualization there is still another problem:
>
> Even if all possible MSI-X vectors for a passthrough PCI device would
> be allocated upfront independent of the actual usage in the guest,
> then there is still the issue of request_irq() failing on the host
> once the guest decides to use any of those interrupts.
>
> It's a halfway reasonable argument, by some definition of reasonable,
> that this case would be a host system configuration problem and the
> admin who overcommitted is responsible for the consequences.
>
> Where the only reasonable consequence is to kill the guest right there
> because there is no mechanism to actually tell it that the host ran out
> of resources.
>
> Not at all a pretty solution but, contrary to the status quo, it is
> well defined. The most important aspect is that it is well defined
> for the case of success:
>
> If it succeeds then there is no way that already requested interrupts
> can be lost or end up being redirected to the legacy PCI irq due to
> clearing the MSIX enable bit, which is a gazillion times better than
> the "let's hope it works" based tinkerware we have now.
Fair enough.
>
> So, aside of the existing VFIO/PCI/MSIX thing being just half thought
> out, even thinking about proliferating this with IMS is bonkers.
>
> IMS is meant to avoid the problem of MSI-X which needs to disable MSI-X
> in order to expand the number of vectors. The use cases are going to be
> even more dynamic than the usual device drivers, so the lost interrupt
> issue will be much more likely to trigger.
>
> So no, we are not going to proliferate this complete ignorance of how
> MSI-X actually works and just cram another "feature" into code which is
> known to be incorrect.
>
So the correct flow is like below:

guest::enable_msix()
    trapped_by_host()
        pci_alloc_irq_vectors(); // for all possible vMSI-X entries
        pci_enable_msix();

guest::unmask()
    trapped_by_host()
        request_irqs();

The first trap calls a new VFIO ioctl, e.g. VFIO_DEVICE_ALLOC_IRQS.
The second trap can reuse the existing VFIO_DEVICE_SET_IRQS, which just
does request_irq() if the specified irqs have been allocated.
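
On the host side the two traps could be wired up roughly like below
(a rough sketch; VFIO_DEVICE_ALLOC_IRQS does not exist yet, and the
vfio_msix_* helpers and handler are made-up names for illustration):

#include <linux/pci.h>
#include <linux/interrupt.h>
#include <linux/eventfd.h>

/* Illustrative handler: forward the interrupt to the guest via eventfd */
static irqreturn_t vfio_msix_handler(int irq, void *arg)
{
        eventfd_signal(arg, 1);
        return IRQ_HANDLED;
}

/* Trap of guest::enable_msix() -> proposed VFIO_DEVICE_ALLOC_IRQS */
static int vfio_msix_alloc(struct pci_dev *pdev, unsigned int nvec)
{
        /*
         * Allocate and enable all possible vMSI-X entries upfront.
         * No host CPU vectors are consumed at this point.
         */
        return pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_MSIX);
}

/* Trap of guest::unmask() -> existing VFIO_DEVICE_SET_IRQS */
static int vfio_msix_unmask(struct pci_dev *pdev, unsigned int index,
                            struct eventfd_ctx *trigger)
{
        /*
         * Only now is a real CPU vector claimed, so this is the only
         * step that can fail on vector exhaustion. The MSI-X enable
         * bit is never toggled, so no requested interrupt can be lost.
         */
        return request_irq(pci_irq_vector(pdev, index), vfio_msix_handler,
                           0, "vfio-msix", trigger);
}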
Then map IMS to this flow:

guest::enable_msix()
    trapped_by_host()
        msi_domain_alloc_irqs(); // for all possible vMSI-X entries
        for_all_allocated_irqs(i)
            pci_update_msi_desc_id(i, default_pasid); // a new helper func

guest::unmask(entry#0)
    trapped_by_host()
        request_irqs();
        ims_array_irq_startup(); // write msi_desc.id (default_pasid) to ims entry

guest::set_msix_perm(entry#1, guest_sva_pasid)
    trapped_by_host()
        pci_update_msi_desc_id(1, host_sva_pasid);

guest::unmask(entry#1)
    trapped_by_host()
        request_irqs();
        ims_array_irq_startup(); // write msi_desc.id (host_sva_pasid) to ims entry

Does the above match your thoughts?
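
Purely as a sketch of the intended data flow, the new helper could look
like below (neither pci_update_msi_desc_id() nor a 'pasid' field in
struct msi_desc exists today; both are assumptions, with the cached value
then written into the IMS entry by ims_array_irq_startup()):

#include <linux/device.h>
#include <linux/msi.h>

static int pci_update_msi_desc_id(struct device *dev, unsigned int index,
                                  u32 pasid)
{
        struct msi_desc *desc;
        unsigned int i = 0;

        /* Walk the descriptors allocated by msi_domain_alloc_irqs() */
        for_each_msi_entry(desc, dev) {
                if (i++ == index) {
                        desc->pasid = pasid; /* assumed new field */
                        return 0;
                }
        }
        return -ENOENT;
}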
Thanks
Kevin