Message-ID: <20201118183830.GA917484@nvidia.com>
Date: Wed, 18 Nov 2020 14:38:30 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: David Ahern <dsahern@...il.com>
CC: Parav Pandit <parav@...dia.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
Jiri Pirko <jiri@...dia.com>,
"dledford@...hat.com" <dledford@...hat.com>,
Leon Romanovsky <leonro@...dia.com>,
Saeed Mahameed <saeedm@...dia.com>,
"kuba@...nel.org" <kuba@...nel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
Vu Pham <vuhuong@...dia.com>
Subject: Re: [PATCH net-next 03/13] devlink: Support add and delete devlink
port

On Wed, Nov 18, 2020 at 11:03:24AM -0700, David Ahern wrote:
> With Connectx-4 Lx for example the netdev can have at most 63 queues

What netdev calls a queue is really "can the device deliver
interrupts and packets to a given per-CPU queue", and it covers a
whole spectrum of smaller limits: the RSS scheme, the # of available
interrupts, the device's ability to create queues, etc.

CX4Lx can create a huge number of queues, but it hits one of these
limits, which means netdev's specific usage can't scale up. Other
stuff like RDMA doesn't have the same limits, and has tonnes of
queues.
What seems to be needed is a resource controller concept like cgroup
has for processes. The system is really organized into a tree:

                physical device
                    mlx5_core
               /     |      \     \          (aux bus)
          netdev   rdma    vdpa    SF etc
                                    |        (aux bus)
                                mlx5_core
                                 /     \     (aux bus)
                             netdev   vdpa

And it does make a lot of sense to start to talk about limits at each
tree level.

e.g. the top of the tree may have 128 physical interrupts. With 128 CPU
cores that isn't enough interrupts to support all of those things
concurrently.

So the user may want to configure:
- The first level netdev only gets 64
- The 3rd level mlx5_core gets 32
- The final level vdpa gets 8

Other stuff has to fight it out with the remaining shared interrupts.
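
To make that concrete, here is a minimal sketch of how a parent driver
could expose those per-level budgets through the existing devlink
resource API. devlink_resource_register() and
devlink_resource_size_params_init() are the real devlink calls (mlxsw
uses them for its KVD memory); the resource names, the IDs and the
128/64/32/8 sizes are just the example numbers above:

/*
 * Rough sketch only: expose the per-level interrupt budgets as nested
 * devlink resources.  The API calls are the existing devlink ones; the
 * names, IDs and sizes are invented for this example.
 */
#include <net/devlink.h>

enum {
        EXAMPLE_RES_IRQS = 1,           /* all 128 physical vectors */
        EXAMPLE_RES_IRQS_NETDEV,        /* first level netdev */
        EXAMPLE_RES_IRQS_SF,            /* nested mlx5_core (SF) */
        EXAMPLE_RES_IRQS_SF_VDPA,       /* vdpa inside the SF */
};

static int example_register_irq_resources(struct devlink *devlink)
{
        struct devlink_resource_size_params params;
        int err;

        devlink_resource_size_params_init(&params, 0, 128, 1,
                                          DEVLINK_RESOURCE_UNIT_ENTRY);

        /* top of the tree: 128 physical interrupts */
        err = devlink_resource_register(devlink, "irqs", 128,
                                        EXAMPLE_RES_IRQS,
                                        DEVLINK_RESOURCE_ID_PARENT_TOP,
                                        &params);
        if (err)
                return err;

        /* first level netdev only gets 64 */
        err = devlink_resource_register(devlink, "netdev", 64,
                                        EXAMPLE_RES_IRQS_NETDEV,
                                        EXAMPLE_RES_IRQS, &params);
        if (err)
                return err;

        /* nested mlx5_core gets 32 ... */
        err = devlink_resource_register(devlink, "sf", 32,
                                        EXAMPLE_RES_IRQS_SF,
                                        EXAMPLE_RES_IRQS, &params);
        if (err)
                return err;

        /* ... and its vdpa child gets 8 */
        return devlink_resource_register(devlink, "vdpa", 8,
                                         EXAMPLE_RES_IRQS_SF_VDPA,
                                         EXAMPLE_RES_IRQS_SF, &params);
}

The admin could then inspect the hierarchy with 'devlink resource show'
and move budgets around with 'devlink resource set ... size ...'
followed by a devlink reload; the open question is how the result
feeds into each level's actual interrupt allocation.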

In netdev land the # of interrupts governs the # of queues. For RDMA
the # of interrupts limits the CPU affinities for queues. For VDPA the
# of interrupts limits the # of VMs that can use VT-d.

The same story repeats for other, less general resources: mlx5 also
has consumption of limited BAR space and of some limited memory
elements. These numbers are much bigger and may not need explicit
governing, but the general concept holds.

It would be very nice if the limit could be injected when the aux
device is created but before the driver is bound. I'm not sure how to
manage that, though.

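One rough way to sketch it, assuming the parent wraps struct
auxiliary_device in its own structure: write the budget into the
wrapper before auxiliary_device_add(), so the value exists before any
driver can possibly bind. The auxiliary bus calls below are the real
<linux/auxiliary_bus.h> API; struct example_adev, irq_budget and
example_spawn_vdpa_dev() are made-up names:

/*
 * Rough sketch only: fix the limit in the wrapper around
 * struct auxiliary_device before auxiliary_device_add(), so it is
 * already set by the time any driver can bind.
 */
#include <linux/auxiliary_bus.h>
#include <linux/slab.h>

struct example_adev {
        struct auxiliary_device adev;
        u32 irq_budget;         /* limit decided by the parent */
};

static void example_adev_release(struct device *dev)
{
        struct auxiliary_device *adev = to_auxiliary_dev(dev);

        kfree(container_of(adev, struct example_adev, adev));
}

static int example_spawn_vdpa_dev(struct device *parent, u32 irq_budget)
{
        struct example_adev *edev;
        int err;

        edev = kzalloc(sizeof(*edev), GFP_KERNEL);
        if (!edev)
                return -ENOMEM;

        /* the limit is in place before the device is ever visible */
        edev->irq_budget = irq_budget;

        edev->adev.name = "vdpa";
        edev->adev.dev.parent = parent;
        edev->adev.dev.release = example_adev_release;

        err = auxiliary_device_init(&edev->adev);
        if (err) {
                kfree(edev);
                return err;
        }

        err = auxiliary_device_add(&edev->adev);
        if (err)
                auxiliary_device_uninit(&edev->adev);
        return err;
}

The aux driver's probe() can container_of() back to the wrapper and
size its interrupt usage from irq_budget. What this doesn't answer is
how the devlink-configured number gets there generically, without
every parent driver inventing its own glue, which is the part that
still needs an idea.
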
I assume other devices will be different, maybe some devices have a
limit on the number of total queues, or a limit on the number of
VDPA or RDMA devices.

Jason