Message-ID: <39dbf7f6-76e0-4319-97d8-24b54e788435@nvidia.com>
Date: Thu, 1 Feb 2024 11:16:22 -0800
From: William Tu <witu@...dia.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Jacob Keller <jacob.e.keller@...el.com>, bodong@...dia.com,
jiri@...dia.com, netdev@...r.kernel.org, saeedm@...dia.com,
"aleksander.lobakin@...el.com" <aleksander.lobakin@...el.com>,
ecree.xilinx@...il.com, Yossi Kuperman <yossiku@...dia.com>,
William Tu <u9012063@...il.com>
Subject: Re: [RFC PATCH v3 net-next] Documentation: devlink: Add devlink-sd
On 1/31/24 3:17 PM, Jakub Kicinski wrote:
> On Wed, 31 Jan 2024 15:02:58 -0800 William Tu wrote:
>>> I just did a grep on METADATA_HW_PORT_MUX and assumed bnxt, ice and nfp
>>> all do buffer sharing. You're saying you mux Tx queues but not Rx
>>> queues? Or I need to actually read the code instead of grepping? :)
>> I guess bnxt, ice, nfp are doing tx buffer sharing?
> I'm not familiar with ice. I'm 90% sure bnxt shares both Rx and Tx.
> I'm 99.9% sure nfp does.
>
> It'd be great if you could do the due diligence rather than guessing
> given that you're proposing uAPI extension :(
>
(sorry again, HTML was detected in my previous email)
Due diligence below:
Summary
======
1. VF-reps that simply "use" the PF-rep's queues for rx and tx:
sfc, ice (under review), bnxt, and nfp
2. VF-reps that have their own rx/tx queues:
ice (1 rx/tx queue per rep), mlx5 (multiple rx/tx queues)
case 1: no way to prioritize an important VF-rep's traffic
case 2: scaling to 1k reprs wastes lots of memory.
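(rough back-of-envelope for case 2, with assumed numbers: even just one
Rx ring of 1024 descriptors per rep, with a 2KB buffer posted per
descriptor, is 1024 reps * 1024 * 2KB ~= 2GB of mostly-idle Rx buffers,
before counting Tx rings, completion queues, and NAPI state)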
Details
=======
Based on reading the code around each driver's repr open and napi_poll
paths, and searching commit messages.
ICE
---
Has 1 dedicated rx/tx ring pair for each VF-rep. Patches from Michal are
under review to make VF-reps share the PF-rep's queue.
see:
ice_eswitch_remap_rings_to_vectors sets up Tx and Rx rings for each rep:
"Each port representor will have dedicated 1 Tx/Rx ring pair, so number
of rings pair is equal to number of VFs."
Later it sets up 1 NAPI context for each rep, see ice_eswitch_setup_repr.
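Roughly, the per-rep layout looks like this (a sketch with made-up
names, not the actual ice structs; only the NAPI calls are the real
kernel API). Every rep owns a ring pair and a NAPI context, so memory
grows linearly with the number of reps:

/* Sketch of the dedicated-ring layout; the struct and its fields are
 * made up, not the real ice data structures. */
#include <linux/netdevice.h>

struct rep_ring;			/* stand-in for a Tx/Rx descriptor ring */

struct rep_priv {
	struct net_device *netdev;	/* the VF-rep netdev */
	struct rep_ring *tx_ring;	/* dedicated Tx ring for this rep */
	struct rep_ring *rx_ring;	/* dedicated Rx ring for this rep */
	struct napi_struct napi;	/* one NAPI context per rep */
};

static int rep_napi_poll(struct napi_struct *napi, int budget)
{
	int work_done = 0;	/* would clean this rep's own rings here */

	if (work_done < budget)
		napi_complete_done(napi, work_done);
	return work_done;
}

static void rep_setup_napi(struct rep_priv *rep)
{
	/* one poll instance per representor */
	netif_napi_add(rep->netdev, &rep->napi, rep_napi_poll);
}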
BNXT
----
No dedicated rx/tx rings for reps; VF-rep rx/tx shares the PF's rx/tx rings.
see:
commit ee5c7fb34047: bnxt_en: add vf-rep RX/TX and netdev implementation
"This patch introduces the RX/TX and a simple netdev implementation for
VF-reps. The VF-reps use the RX/TX rings of the PF."
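For the Tx side, that sharing is the METADATA_HW_PORT_MUX pattern you
grepped for: the rep's ndo_start_xmit attaches a metadata dst naming the
egress port and the lower dev, then transmits on the PF netdev's queues.
A sketch of the pattern (generic names, not the exact bnxt code; the
metadata-dst calls are the real kernel API):

#include <linux/netdevice.h>
#include <net/dst_metadata.h>

/* Sketch only: 'rep_priv' and the helpers are made up. */
struct rep_priv {
	struct metadata_dst *dst;	/* METADATA_HW_PORT_MUX dst for this rep */
};

static int rep_init_dst(struct rep_priv *rep, struct net_device *pf_netdev,
			u32 hw_port_id)
{
	rep->dst = metadata_dst_alloc(0, METADATA_HW_PORT_MUX, GFP_KERNEL);
	if (!rep->dst)
		return -ENOMEM;
	rep->dst->u.port_info.port_id = hw_port_id;	/* which hw port/VF */
	rep->dst->u.port_info.lower_dev = pf_netdev;	/* whose rings we borrow */
	return 0;
}

static netdev_tx_t rep_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct rep_priv *rep = netdev_priv(dev);

	/* tag the skb with the rep's egress port ... */
	skb_dst_drop(skb);
	dst_hold((struct dst_entry *)rep->dst);
	skb_dst_set(skb, (struct dst_entry *)rep->dst);
	/* ... and send it out on the PF netdev's (shared) Tx queues */
	skb->dev = rep->dst->u.port_info.lower_dev;
	dev_queue_xmit(skb);

	return NETDEV_TX_OK;
}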
NFP
---
VF-reps use the PF's rx/tx queues.
see:
https://lore.kernel.org/netdev/20170621095933.GC6691@vergenet.net/T/
“The PF netdev acts as a lower-device which sends and receives packets to
and from the firmware. The representors act as upper-devices. For TX
representors attach a metadata dst to the skb which is used by the PF
netdev to prepend metadata to the packet before forwarding the firmware. On
RX the PF netdev looks up the representor based on the prepended metadata”
and
nfp_flower_spawn_vnic_reprs
nfp_abm_spawn_repr -> nfp_repr_alloc_mqs
nfp_repr_open -> nfp_app_repr -> nfp_flower_repr_netdev_open
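For the RX half described in the quote, the generic shape is: parse the
prepended metadata, look up the rep netdev, flip skb->dev, and hand the
skb up. A sketch (rep_lookup_by_port() is a hypothetical helper standing
in for the driver-specific table lookup; the delivery calls are the real
API):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* hypothetical, driver-specific helper: map the prepended-metadata port
 * id to a rep netdev (real drivers keep a table indexed by hw port) */
static struct net_device *rep_lookup_by_port(struct net_device *pf_netdev,
					     u32 port_id);

static void pf_rx_deliver_to_rep(struct net_device *pf_netdev,
				 struct sk_buff *skb, u32 port_id)
{
	struct net_device *rep_netdev;

	rep_netdev = rep_lookup_by_port(pf_netdev, port_id);
	if (!rep_netdev) {
		dev_kfree_skb_any(skb);
		return;
	}

	/* make the packet look like it arrived on the representor;
	 * assumes the PF path already stripped the metadata and did
	 * eth_type_trans() */
	skb->dev = rep_netdev;
	netif_receive_skb(skb);
}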
SFC
---
VF-reps use the parent PF's queues; the parent PF receives packets
destined for the VF-rep.
see:
efx_ef100_rep_poll
efx_ef100_rep_open -> efx_ef100_rep_poll
“commit 9fe00c8 sfc: ef100 representor RX top half”
Representor RX uses a NAPI context driven by a 'fake interrupt': when
the parent PF receives a packet destined for the representor, it adds
it to an SKB list (efv->rx_list), and schedules NAPI if the 'fake
interrupt' is primed.
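A sketch of that fake-interrupt hand-off (field names invented, the
skb-queue/NAPI primitives are the standard ones; the real driver handles
the priming race more carefully than this):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Sketch only: rx_list is assumed to have been set up with
 * skb_queue_head_init() and napi with netif_napi_add(). */
struct rep_rx {
	struct sk_buff_head rx_list;	/* skbs queued by the parent PF */
	struct napi_struct napi;
	bool primed;			/* is the 'fake interrupt' armed? */
};

/* PF Rx path ("top half"): queue the skb and kick the rep's NAPI */
static void pf_rx_to_rep(struct rep_rx *rep, struct sk_buff *skb)
{
	skb_queue_tail(&rep->rx_list, skb);
	if (READ_ONCE(rep->primed)) {
		WRITE_ONCE(rep->primed, false);
		napi_schedule(&rep->napi);
	}
}

/* rep NAPI poll ("bottom half"): drain the list within budget */
static int rep_poll(struct napi_struct *napi, int budget)
{
	struct rep_rx *rep = container_of(napi, struct rep_rx, napi);
	struct sk_buff *skb;
	int work_done = 0;

	while (work_done < budget && (skb = skb_dequeue(&rep->rx_list))) {
		netif_receive_skb(skb);
		work_done++;
	}

	if (work_done < budget && napi_complete_done(napi, work_done))
		WRITE_ONCE(rep->primed, true);	/* re-arm the fake interrupt */

	return work_done;
}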
MLX5
----
Each VF-rep has its own dedicated rx/tx queues;
all representor queues have descriptors used only by the individual
representor.
see mlx5e_rep_open -> mlx5e_open_locked -> mlx5e_open_channels -> queues