Message-ID: <5c47f5de-c3cf-4921-9e8c-efc8b55f1d7f@linux.dev>
Date: Sat, 6 Apr 2024 11:05:41 +0200
From: Zhu Yanjun <yanjun.zhu@...ux.dev>
To: Parav Pandit <parav@...dia.com>, netdev@...r.kernel.org,
davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, corbet@....net, dw@...idwei.uk,
kalesh-anakkur.purayil@...adcom.com
Cc: saeedm@...dia.com, leon@...nel.org, jiri@...nulli.us, shayd@...dia.com,
danielj@...dia.com, dchumak@...dia.com, linux-doc@...r.kernel.org,
linux-rdma@...r.kernel.org
Subject: Re: [net-next v4 0/2] devlink: Add port function attribute for IO EQs
On 2024/4/6 3:05, Parav Pandit wrote:
> Currently, PCI SFs and VFs use IO event queues to deliver per-channel
> netdev events. The number of netdev channels is a function of the IO
> event queues. Similarly, for an RDMA device, the number of completion
> vectors is a function of the IO event queues. Currently, an
> administrator on the hypervisor has no means to provision the number
> of IO event queues for an SF or VF device; the device/firmware picks
> some arbitrary value for them. As a result, the number of SF netdev
> channels is unpredictable, and consequently so is the performance.
>
> This short series introduces a new port function attribute: max_io_eqs.
> The goal is to provide administrators at the hypervisor level with the
> ability to provision the maximum number of IO event queues for a
> function. This gives the administrator the control to provision the
> right number of IO event queues and obtain predictable performance.
>
> Example of an administrator provisioning (setting) the maximum number
> of IO event queues when using switchdev mode:
>
> $ devlink port show pci/0000:06:00.0/1
> pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
> function:
> hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10
>
> $ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20
>
> $ devlink port show pci/0000:06:00.0/1
> pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
> function:
> hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20
>
> This sets the corresponding maximum IO event queues of the function
> before it is enumerated. Thus, when the VF/SF driver reads the
> capability from the device, it sees the value provisioned by the
> hypervisor. The driver is then able to configure the number of channels
> for the net device, as well as the number of completion vectors
> for the RDMA device. The device/firmware also honors the provisioned
> value, so any VF/SF driver attempt to create IO EQs beyond the
> provisioned value results in an error.
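>
> For illustration, a hedged sketch of the guest-side view (the netdev
> name eth1 below is hypothetical, not taken from this series):
>
>   # Inside the VM/container, the driver's channel ceiling follows
>   # the provisioned max_io_eqs value (20 in the example above).
>   $ ethtool -l eth1                # "Pre-set maximums" reflect the cap
>   $ ethtool -L eth1 combined 20    # within the provisioned IO EQs
>   $ ethtool -L eth1 combined 32    # expected to fail: exceeds the cap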
>
> With the above setting, the administrator is able to achieve 2x
> performance on SFs with 20 channels. In the second example, when an SF
> was provisioned for a container with 2 CPUs, the administrator
> provisioned only 2 IO event queues, thereby saving device resources.
>
Is the following paragraph the same as the above paragraph?
> With the above settings now in place, the administrator achieved 2x
> performance with the SF device with 20 channels. In the second example,
> when the SF was provisioned for a container with 2 CPUs, the administrator
> provisioned only 2 IO event queues, thereby saving device resources.
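>
> A hedged sketch of that second case (the SF port index /2 below is
> hypothetical, not taken from this series):
>
>   # Provision only 2 IO event queues for an SF backing a 2-CPU
>   # container, so the device does not reserve unused EQ resources.
>   $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 2
>   $ devlink port show pci/0000:06:00.0/2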
>
> changelog:
> v2->v3:
> - limited lines to 80 chars in devlink
> - addressed Jakub's comment in the mlx5 driver: fixed a missing mutex
>   unlock on the error path
> v1->v2:
> - limited comment to 80 chars per line in the header file
> - reordered set function variables for reverse christmas tree
> - addressed comments from Kalesh
> - fixed a missing kfree in the get call
> - return an error code on get cmd failure
> - fixed a copy-paste error in the error message on set cmd failure
>
> Parav Pandit (2):
> devlink: Support setting max_io_eqs
> mlx5/core: Support max_io_eqs for a function
>
> .../networking/devlink/devlink-port.rst | 33 +++++++
> .../mellanox/mlx5/core/esw/devlink_port.c | 4 +
> .../net/ethernet/mellanox/mlx5/core/eswitch.h | 7 ++
> .../mellanox/mlx5/core/eswitch_offloads.c | 97 +++++++++++++++++++
> include/net/devlink.h | 14 +++
> include/uapi/linux/devlink.h | 1 +
> net/devlink/port.c | 53 ++++++++++
> 7 files changed, 209 insertions(+)
>