Message-ID: <20210202181401.66f4359f@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Tue, 2 Feb 2021 18:14:01 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Yishai Hadas <yishaih@...dia.com>
Cc: <netdev@...r.kernel.org>, <davem@...emloft.net>,
<parav@...dia.com>, <saeedm@...dia.com>
Subject: Re: [PATCH net-next 0/2] devlink: Add port function attribute to
enable/disable roce

On Mon, 1 Feb 2021 19:51:50 +0200 Yishai Hadas wrote:
> Currently mlx5 PCI VF and SF are enabled by default for RoCE
> functionality.
>
> Currently a user does not have the ability to disable RoCE for a PCI
> VF/SF device before such device is enumerated by the driver.
>
> The user is also unable to make such a setting in a smartnic scenario
> for a VF from the smartnic.
>
> The current 'enable_roce' device knob can only be set at driverinit
> time. By this time the device is already created and firmware has
> already allocated the necessary system memory for supporting RoCE.
>
> When RoCE is disabled for the PCI VF/SF device, it saves 1 Mbyte of
> system memory per function. Such saving is helpful when running on a
> low memory embedded platform with many VFs or SFs.
>
> Therefore, it is desired to empower user to disable RoCE functionality
> before a PCI SF/VF device is enumerated.

You say that the user on the VF/SF side wants to save memory, yet
the control knob is on the eswitch instance side, correct?

> This is achieved by extending existing 'port function' object to control
> capabilities of a function. This enables users to control capability of
> the device before enumeration.
>
> Example of a user disabling RoCE for a VF when using switchdev
> mode:
>
> $ devlink port show pci/0000:06:00.0/1
> pci/0000:06:00.0/1: type eth netdev pf0vf0 flavour pcivf controller 0
>     pfnum 0 vfnum 0 external false splittable false
>   function:
>     hw_addr 00:00:00:00:00:00 roce on
>
> $ devlink port function set pci/0000:06:00.0/1 roce off
>
> $ devlink port show pci/0000:06:00.0/1
> pci/0000:06:00.0/1: type eth netdev pf0vf0 flavour pcivf controller 0
>     pfnum 0 vfnum 0 external false splittable false
>   function:
>     hw_addr 00:00:00:00:00:00 roce off
>
> FAQs:
> -----
> 1. What does roce on/off do?
> Ans: It disables RoCE capability of the function before it's enumerated,
> so when the driver reads the capability from the device firmware, it is
> disabled.
> At this point RDMA stack will not be able to create UD, QP1, RC, XRC
> type of QPs. When RoCE is disabled, the GID table of all ports of the
> device is disabled in the device and software stack.
>
> 2. How is the roce 'port function' option different from existing
> devlink param?
> Ans: The RoCE attribute at the port function level disables the RoCE
> capability at the specific function level, while enable_roce only does
> so at the software level.
>
> 3. Why is this option for disabling only RoCE and not the whole RDMA
> device?
> Ans: Because the user still wants to use the RDMA device for non-RoCE
> commands in a more memory-efficient way.

What are those "non-RoCE commands" that user may want to use "in a more
efficient way"?