Message-ID: <39dbf7f6-76e0-4319-97d8-24b54e788435@nvidia.com>
Date: Thu, 1 Feb 2024 11:16:22 -0800
From: William Tu <witu@...dia.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Jacob Keller <jacob.e.keller@...el.com>, bodong@...dia.com,
jiri@...dia.com, netdev@...r.kernel.org, saeedm@...dia.com,
"aleksander.lobakin@...el.com" <aleksander.lobakin@...el.com>,
ecree.xilinx@...il.com, Yossi Kuperman <yossiku@...dia.com>,
William Tu <u9012063@...il.com>
Subject: Re: [RFC PATCH v3 net-next] Documentation: devlink: Add devlink-sd
On 1/31/24 3:17 PM, Jakub Kicinski wrote:
> On Wed, 31 Jan 2024 15:02:58 -0800 William Tu wrote:
>>> I just did a grep on METADATA_HW_PORT_MUX and assumed bnxt, ice and nfp
>>> all do buffer sharing. You're saying you mux Tx queues but not Rx
>>> queues? Or I need to actually read the code instead of grepping? :)
>> I guess bnxt, ice, nfp are doing tx buffer sharing?
> I'm not familiar with ice. I'm 90% sure bnxt shares both Rx and Tx.
> I'm 99.9% sure nfp does.
>
> It'd be great if you could do the due diligence rather than guessing
> given that you're proposing uAPI extension :(
>
(sorry again, HTML was detected in my previous email)
Due diligence below:
Summary
======
1. VF-reps that simply "use" the PF-rep's queues for rx and tx:
sfc, ice (under review), bnxt, and nfp
2. VF-reps that have their own rx/tx queues:
ice (1 rx/tx queue per rep), mlx5 (multiple rx/tx queues)
case 1: no way to prioritize an important VF-rep's traffic
case 2: scaling to 1k reprs wastes lots of memory.
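(rough back-of-envelope for case 2, with assumed numbers: even just one
Rx ring of 1024 descriptors per rep, with a 2KB buffer posted per
descriptor, is 1024 reps * 1024 * 2KB ~= 2GB of mostly-idle Rx buffers,
before counting Tx rings, completion queues, and NAPI state)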
Details
=======
Based on reading the code around each driver's repr open and napi_poll
paths, and searching commit messages.
ICE
---
Has 1 dedicated rx/tx ring pair for each VF-rep. Patches from Michal are
under review to make VF-reps share the PF-rep's queue.
see:
ice_eswitch_remap_rings_to_vectors sets up Tx and Rx rings for each rep:
"Each port representor will have dedicated 1 Tx/Rx ring pair, so number
of rings pair is equal to number of VFs."
Later it sets up 1 NAPI context for each rep, see ice_eswitch_setup_repr.
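Roughly, the per-rep layout looks like this (a sketch with made-up
names, not the actual ice structs; only the NAPI calls are the real
kernel API). Every rep owns a ring pair and a NAPI context, so memory
grows linearly with the number of reps:

/* Sketch of the dedicated-ring layout; the struct and its fields are
 * made up, not the real ice data structures. */
#include <linux/netdevice.h>

struct rep_ring;			/* stand-in for a Tx/Rx descriptor ring */

struct rep_priv {
	struct net_device *netdev;	/* the VF-rep netdev */
	struct rep_ring *tx_ring;	/* dedicated Tx ring for this rep */
	struct rep_ring *rx_ring;	/* dedicated Rx ring for this rep */
	struct napi_struct napi;	/* one NAPI context per rep */
};

static int rep_napi_poll(struct napi_struct *napi, int budget)
{
	int work_done = 0;	/* would clean this rep's own rings here */

	if (work_done < budget)
		napi_complete_done(napi, work_done);
	return work_done;
}

static void rep_setup_napi(struct rep_priv *rep)
{
	/* one poll instance per representor */
	netif_napi_add(rep->netdev, &rep->napi, rep_napi_poll);
}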
BNXT
----
No dedicated rx/tx rings for reps; VF-rep rx/tx shares the PF's rx/tx rings.
see:
commit ee5c7fb34047: bnxt_en: add vf-rep RX/TX and netdev implementation
"This patch introduces the RX/TX and a simple netdev implementation for
VF-reps. The VF-reps use the RX/TX rings of the PF."
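For the Tx side, that sharing is the METADATA_HW_PORT_MUX pattern you
grepped for: the rep's ndo_start_xmit attaches a metadata dst naming the
egress port and the lower dev, then transmits on the PF netdev's queues.
A sketch of the pattern (generic names, not the exact bnxt code; the
metadata-dst calls are the real kernel API):

#include <linux/netdevice.h>
#include <net/dst_metadata.h>

/* Sketch only: 'rep_priv' and the helpers are made up. */
struct rep_priv {
	struct metadata_dst *dst;	/* METADATA_HW_PORT_MUX dst for this rep */
};

static int rep_init_dst(struct rep_priv *rep, struct net_device *pf_netdev,
			u32 hw_port_id)
{
	rep->dst = metadata_dst_alloc(0, METADATA_HW_PORT_MUX, GFP_KERNEL);
	if (!rep->dst)
		return -ENOMEM;
	rep->dst->u.port_info.port_id = hw_port_id;	/* which hw port/VF */
	rep->dst->u.port_info.lower_dev = pf_netdev;	/* whose rings we borrow */
	return 0;
}

static netdev_tx_t rep_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct rep_priv *rep = netdev_priv(dev);

	/* tag the skb with the rep's egress port ... */
	skb_dst_drop(skb);
	dst_hold((struct dst_entry *)rep->dst);
	skb_dst_set(skb, (struct dst_entry *)rep->dst);
	/* ... and send it out on the PF netdev's (shared) Tx queues */
	skb->dev = rep->dst->u.port_info.lower_dev;
	dev_queue_xmit(skb);

	return NETDEV_TX_OK;
}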
NFP
---
VF-reps use the PF's rx/tx queues.
see:
https://lore.kernel.org/netdev/20170621095933.GC6691@vergenet.net/T/
“The PF netdev acts as a lower-device which sends and receives packets to
and from the firmware. The representors act as upper-devices. For TX
representors attach a metadata dst to the skb which is used by the PF
netdev to prepend metadata to the packet before forwarding the firmware. On
RX the PF netdev looks up the representor based on the prepended metadata”
and
nfp_flower_spawn_vnic_reprs
nfp_abm_spawn_repr -> nfp_repr_alloc_mqs
nfp_repr_open -> nfp_app_repr -> nfp_flower_repr_netdev_open
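For the RX half described in the quote, the generic shape is: parse the
prepended metadata, look up the rep netdev, flip skb->dev, and hand the
skb up. A sketch (rep_lookup_by_port() is a hypothetical helper standing
in for the driver-specific table lookup; the delivery calls are the real
API):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* hypothetical, driver-specific helper: map the prepended-metadata port
 * id to a rep netdev (real drivers keep a table indexed by hw port) */
static struct net_device *rep_lookup_by_port(struct net_device *pf_netdev,
					     u32 port_id);

static void pf_rx_deliver_to_rep(struct net_device *pf_netdev,
				 struct sk_buff *skb, u32 port_id)
{
	struct net_device *rep_netdev;

	rep_netdev = rep_lookup_by_port(pf_netdev, port_id);
	if (!rep_netdev) {
		dev_kfree_skb_any(skb);
		return;
	}

	/* make the packet look like it arrived on the representor;
	 * assumes the PF path already stripped the metadata and did
	 * eth_type_trans() */
	skb->dev = rep_netdev;
	netif_receive_skb(skb);
}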
SFC
---
VF-reps use the parent PF's queues; the parent PF receives packets
destined for the VF-rep.
see:
efx_ef100_rep_poll
efx_ef100_rep_open -> efx_ef100_rep_poll
“commit 9fe00c8 sfc: ef100 representor RX top half”
Representor RX uses a NAPI context driven by a 'fake interrupt': when
the parent PF receives a packet destined for the representor, it adds
it to an SKB list (efv->rx_list), and schedules NAPI if the 'fake
interrupt' is primed.
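A sketch of that fake-interrupt hand-off (field names invented, the
skb-queue/NAPI primitives are the standard ones; the real driver handles
the priming race more carefully than this):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Sketch only: rx_list is assumed to have been set up with
 * skb_queue_head_init() and napi with netif_napi_add(). */
struct rep_rx {
	struct sk_buff_head rx_list;	/* skbs queued by the parent PF */
	struct napi_struct napi;
	bool primed;			/* is the 'fake interrupt' armed? */
};

/* PF Rx path ("top half"): queue the skb and kick the rep's NAPI */
static void pf_rx_to_rep(struct rep_rx *rep, struct sk_buff *skb)
{
	skb_queue_tail(&rep->rx_list, skb);
	if (READ_ONCE(rep->primed)) {
		WRITE_ONCE(rep->primed, false);
		napi_schedule(&rep->napi);
	}
}

/* rep NAPI poll ("bottom half"): drain the list within budget */
static int rep_poll(struct napi_struct *napi, int budget)
{
	struct rep_rx *rep = container_of(napi, struct rep_rx, napi);
	struct sk_buff *skb;
	int work_done = 0;

	while (work_done < budget && (skb = skb_dequeue(&rep->rx_list))) {
		netif_receive_skb(skb);
		work_done++;
	}

	if (work_done < budget && napi_complete_done(napi, work_done))
		WRITE_ONCE(rep->primed, true);	/* re-arm the fake interrupt */

	return work_done;
}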
MLX5
----
Each VF-rep has its own dedicated rx/tx queues;
all representor queues have descriptors used only by the individual
representor.
see mlx5e_rep_open -> mlx5e_open_locked -> mlx5e_open_channels -> queues