[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7c5626d2-ad80-24eb-0b89-402562156135@deltatee.com>
Date: Mon, 29 Nov 2021 16:52:35 -0700
From: Logan Gunthorpe <logang@...tatee.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Bjorn Helgaas <helgaas@...nel.org>,
Marc Zygnier <maz@...nel.org>,
Alex Williamson <alex.williamson@...hat.com>,
Kevin Tian <kevin.tian@...el.com>,
Megha Dey <megha.dey@...el.com>,
Ashok Raj <ashok.raj@...el.com>, linux-pci@...r.kernel.org,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jon Mason <jdmason@...zu.us>,
Dave Jiang <dave.jiang@...el.com>,
Allen Hubbe <allenbh@...il.com>, linux-ntb@...glegroups.com,
linux-s390@...r.kernel.org, Heiko Carstens <hca@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ibm.com>
Subject: Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
On 2021-11-29 4:31 p.m., Jason Gunthorpe wrote:
> On Mon, Nov 29, 2021 at 03:27:20PM -0700, Logan Gunthorpe wrote:
>
>> In most cases, the NTB code needs more interrupts than the hardware
>> actually provides for in its MSI-X table. That's what PCI_IRQ_VIRTUAL is
>> for: it allows the driver to request more interrupts than the hardware
>> advertises (ie. pci_msix_vec_count()). These extra interrupts are
>> created, but get flagged with msi_attrib.is_virtual which ensures
>> functions that program the MSI-X table don't try to write past the end
>> of the hardware's table.
>
> AFAICT what you've described is what Intel is calling IMS in other
> contexts.
>
> IMS is fundamentally a way to control MSI interrupt descriptors that
> are not accessed through PCI SIG compliant means. In this case the NTB
> driver has to do its magic to relay the addr/data pairs to the real
> MSI storage in the hidden devices.
With current applications, it isn't that there is real "MSI storage"
anywhere; the device on the other side of the bridge is always another
Linux host which holds the address (or rather mw offset) and data in
memory to use when it needs to trigger the interrupt of the other
machine. There are many prototypes and proprietary messes that try to
have other PCI devices (ie NVMe, etc) behind the non-transparent bridge;
but the Linux subsystem has no support for this.
> PCI_IRQ_VIRTUAL should probably be fully replaced by the new dynamic
> APIs in the fullness of time..
Perhaps, I don't really know much about IMS or how close a match it is.
>> Existing NTB hardware does already have what's called a doorbell which
>> provides the same functionally as the above technique. However, existing
>> hardware implementations of doorbells have significant latency and thus
>> slow down performance substantially. Implementing the MSI interrupts as
>> described above increased the performance of ntb_transport by more than
>> three times[1].
>
> Does the doorbell scheme allow as many interrupts?
No, but for current applications there are plenty of doorbells.
Switchtec hardware (and I think other hardware) typically have 64
doorbells for the entire network (they must be split among the number of
hosts in the network; a two host system could have 32 per host). The NTB
subsystem in Linux only currently supports 2 hosts, but switchtec
hardware supports up to 48 hosts, in which case you might only have 1
doorbell per host and that might be limiting depending on the application.
Logan
Powered by blists - more mailing lists