[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210114200825.GR4147@nvidia.com>
Date: Thu, 14 Jan 2021 16:08:25 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Alexander Duyck <alexander.duyck@...il.com>
CC: Leon Romanovsky <leon@...nel.org>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Saeed Mahameed <saeedm@...dia.com>,
Jakub Kicinski <kuba@...nel.org>,
linux-pci <linux-pci@...r.kernel.org>,
<linux-rdma@...r.kernel.org>, Netdev <netdev@...r.kernel.org>,
Don Dutile <ddutile@...hat.com>,
Alex Williamson <alex.williamson@...hat.com>
Subject: Re: [PATCH mlx5-next v1 2/5] PCI: Add SR-IOV sysfs entry to read
number of MSI-X vectors
On Thu, Jan 14, 2021 at 11:24:12AM -0800, Alexander Duyck wrote:
> > As you say BAR and MSI vector table are not so different, it seems
> > perfectly in line with current PCI sig thinking to allow resizing the
> > MSI as well
>
> The resizing of a BAR has an extended capability that goes with it and
> isn't something that the device can just do on a whim. This patch set
> is not based on implementing some ECN for resizable MSI-X tables. I
> view it as arbitrarily rewriting the table size for a device after it
> is created.
The only difference is resizing the BAR is backed by an ECN, and this
is an extension. The device does not "do it on a whim" the OS tells it
when to change the size, exactly like for BAR resizing.
> In addition Leon still hasn't answered my question on preventing the
> VF driver from altering entries beyond the ones listed in the table.
Of course this is blocked, the FW completely revokes the HW resource
backing the vectors.
> From what I can tell, the mlx5 Linux driver never reads the MSI-X
> flags register so it isn't reading the MSI-X size either.
I don't know why you say that. All Linux drivers call into something
like pci_alloc_irq_vectors() requesting a maximum # of vectors and
that call returns the actual allocated. Drivers can request more
vectors than the system provides, which is what mlx5 does.
Under the call chain of pci_alloc_irq_vectors() it calls
pci_msix_vec_count() which does
pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control);
return msix_table_size(control);
And eventually uses that to return the result to the driver.
So, yes, it reads the config space and ensures it doesn't allocate
more vectors than that.
Every driver using MSI does this in Linux.
Adjusting config space *directly* limits the number of vectors the
driver allocates.
You should be able to find the call chain in mlx5 based on the above
guidance.
> At a minimum I really think we need to go through and have a clear
> definition on when updating the MSI-X table size is okay and when it
> is not. I am not sure just saying to not update it when a driver isn't
> attached is enough to guarantee all that.
If you know of a real issue then please state it, other don't fear
monger "maybe" issues that don't exist.
> What we are talking about is the MSI-X table size. Not the number of
> MSI-X vectors being requested by the device driver. Those are normally
> two seperate things.
Yes, table size is what is critical. The number of entries in that BAR
memory is what needs to be controlled.
> > The standards based way to communicate that limit is the MSI table cap
> > size.
>
> This only defines the maximum number of entries, not how many have to be used.
A driver can't use entries beyond the cap. We are not trying to
reclaim vectors that are available but not used by the OS.
> I'm not offering up a non-standard way to do this. Just think about
> it. If I have a device that defines an MSI-X table size of 2048 but
> makes use of only one vector how would that be any different than what
> I am suggesting where you size your VF to whatever the maximum is you
> need but only make use of some fixed number from the hardware.
That is completely different, in the hypervisor there is no idea how
many vectors a guest OS will create. The FW is told to only permit 1
vector. How is the guest to know this if we don't update the config
space *as the standard requires* ?
> I will repeat what I said before. Why can't this be handled via the
> vfio interface?
1) The FW has to be told of the limit and everything has to be in sync
If the FW is out of sync with the config space then everything
breaks if the user makes even a small mistake - for instance
forgetting to use the ioctl to override vfio. This is needlessly
frail and complicated.
2) VFIO needs to know how to tell the FW the limit so it can override
the config space with emulation. This is all device specific and I
don't see that adding an extension to vfio is any better than
adding one here.
3) VFIO doesn't cover any other driver that binds to the VF, so
this "solution" leaves the normal mlx5_core functionally broken on
VFs which is a major regression.
Overall the entire idea to have the config space not reflect the
actual current device configuration seems completely wrong to me - why
do this? For what gain? It breaks everything.
Jason
Powered by blists - more mailing lists